Postprocessing: Jupyter notebooks

In all analyses we paid special attention to excluding any queries that returned incorrect or incomplete results.

Notebook	Description	Input	Output
01_LogsToCSV	Parses benchmarker output logs to generate queryevents.csv files	Log files (see tab Benchmarker Logs)	*_queryevents.csv, files can be found here
02_QueryCorrectness_Interthread	For every RDF system we verify whether a query always returns the same result in every thread	*_queryevents.csv	None
03_QueryCorrectness_CountQueries	Count queries always have #results = 1, so we verify whether they return the same actual result (queries identified via hash)	Dump of full results of benchmarks, files too large for GitHub	CountQueryConsistency.csv has the first result per query
04_ResultsPerQueryDF	Because there are no inconsistencies between threads, the number of results per query can be calculated unambiguously for each query engine	*_queryevents.csv	resultsperquery_csv
05_QueryCorrectness_Intersim_Watdiv	Calculate the number of results per query by consensus (WatDiv)	resultsperquery_csv	Consensus per benchmark, csv_correct/*
06_QueryCorrectness_Intersim_Ontoforce	Calculate the number of results per query by consensus (Ontoforce)	resultsperquery_csv	Consensus per benchmark, csv_correct/*
07_ErrorAnalysis_Ontoforce	Gives an overview of all vendor benchmarks on the Ontoforce benchmark. The overview shows the status of each query execution: success, error, incorrect, timeout, or unknown (engine crashed).	csv_correct/*	Fig 09
08_FeaturesOfFailedQueries_Ontoforce	First attempt at visualizing properties of failed Ontoforce queries using parallel coordinates	csv_correct/*, ontoforce_query_features	None
09_FeaturesOfFailedQueries_Ontoforce_2_FeatureCorrelations	Query Feature Analysis: Which query features are correlated?	ontoforce_query_features	None
10_DTreesPerBenchmark	First Attempt at Decision Tree Analysis: which features determine outcome of a query?	csv_correct/* , ontoforce_query_features	None
11_DTreesForAllSims	Generating Decision Trees for all simulations as one, query engine is a feature!	csv_correct/* , ontoforce_query_features	Trees All Sims
12_DTreesForAllSims2	Generating Decision Trees for all simulations as one, query engine is a feature!	csv_correct/* , ontoforce_query_features	Trees All Sims
13_CachingAnalytics	Studying the effect of caching by comparing the fastest to the slowest run of a query	csv_correct/*	query_events_sorted
14_CachingAnalytics_2	Studying the effect of caching by comparing the fastest to the slowest run of a query	query_events_sorted	Caching Fig.
15_BenchmarkSurvival	Calculating the benchmark survival interval	*_queryevents.csv	Figure BM Survival
16_QueriesThatCrashSimulations	What type of query is the first one to fail, or the first one to crash a query engine?	query_events_sorted	None
17_SingleMultiClientRuntimes	Query runtimes during warmup (single-threaded) vs stress test (5 threads)	query_events_sorted	Figure Server Load
18_RuntimeAnalysisCSV_DiscardIncorrectQueries	Query Runtime Analysis (discarding incorrect queries!)	csv_correct/*	runtime_csv_correct
19_RuntimeVisualAnalysisBoxplots_DiscardIncorrectQueries	Boxplots Query Runtimes	runtime_csv_correct	Figures ResultsI
20_QueryTemplateAnalysisWatdiv_DiscardIncorrectQueries	Query Runtimes Per Query Template (WatDiv)	runtime_csv_correct	Figures ResultsII
21_BenchmarkCostVisualizations_ExcludeIncorrectQueries	Comparing all simulations in terms of runtime cost, taking into account cloud / licensing / runtimes	runtime_csv_correct , sim_cost, loadtimes	Figure ResultsIII
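
Notebooks 05 and 06 derive the number of results per query "by consensus" across simulations. The notebooks themselves define the exact rule; the snippet below is only a minimal sketch of one plausible reading, assuming a strict-majority vote over the result counts reported by the different engines. The function names, the `(engine, query_id, n_results)` row layout, the `min_share` threshold, and the `no_consensus` label are all hypothetical and do not come from the notebooks.

```python
from collections import Counter, defaultdict


def consensus_result_counts(rows, min_share=0.5):
    """Return {query_id: consensus result count, or None if no majority}.

    rows: iterable of (engine, query_id, n_results) tuples (assumed layout).
    min_share: share of engines that must be exceeded (assumed threshold).
    """
    per_query = defaultdict(dict)
    for engine, query_id, n_results in rows:
        per_query[query_id][engine] = n_results

    consensus = {}
    for query_id, by_engine in per_query.items():
        # Most frequent result count among the engines that ran this query.
        value, votes = Counter(by_engine.values()).most_common(1)[0]
        has_majority = votes / len(by_engine) > min_share
        consensus[query_id] = value if has_majority else None
    return consensus


def label_rows(rows, consensus):
    """Attach a correctness label to every (engine, query_id, n_results) row."""
    labelled = []
    for engine, query_id, n_results in rows:
        expected = consensus.get(query_id)
        if expected is None:
            status = "no_consensus"  # hypothetical label, not from the notebooks
        elif n_results == expected:
            status = "correct"
        else:
            status = "incorrect"
        labelled.append((engine, query_id, n_results, status))
    return labelled


# Toy data for illustration only (not benchmark results).
rows = [
    ("engineA", "q1", 10),
    ("engineB", "q1", 10),
    ("engineC", "q1", 7),
    ("engineA", "q2", 1),
    ("engineB", "q2", 2),
]
consensus = consensus_result_counts(rows)
labelled = label_rows(rows, consensus)
```

With this toy input, `q1` gets a consensus of 10 (two of three engines agree), so the `engineC` row is labelled `incorrect`, while `q2` has no majority and stays unresolved. Rows labelled `incorrect` would be the ones dropped before any runtime analysis, in the spirit of the `*_DiscardIncorrectQueries` notebooks.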