Postprocessing: Jupyter notebooks

In all analyses we paid special attention to excluding any queries that returned incorrect or incomplete results.

| Notebook | Description | Input | Output |
| --- | --- | --- | --- |
| 01_LogsToCSV | Parses benchmarker output logs to generate queryevents.csv files | Log files (see tab Benchmarker Logs) | *_queryevents.csv, files can be found here |
| 02_QueryCorrectness_Interthread | For every RDF system, verify that a query always returns the same result in every thread | *_queryevents.csv | None |
| 03_QueryCorrectness_CountQueries | Count queries always have #results = 1, so verify that the count queries return the same actual result (queries identified via hash) | Dump of full results of benchmarks, files too large for Github | CountQueryConsistency.csv has the first result per query |
| 04_ResultsPerQueryDF | No inconsistencies between threads, so the number of results per query can be calculated unambiguously for each query engine | *_queryevents.csv | resultsperquery_csv |
| 05_QueryCorrectness_Intersim_Watdiv | Calculate the number of results per query by consensus (WatDiv) | resultsperquery_csv | Consensus per benchmark, csv_correct/* |
| 06_QueryCorrectness_Intersim_Ontoforce | Calculate the number of results per query by consensus (Ontoforce) | resultsperquery_csv | Consensus per benchmark, csv_correct/* |
| 07_ErrorAnalysis_Ontoforce | Gives an overview of all vendor benchmarks on the Ontoforce benchmark. The overview shows a status for each query execution: success, error, incorrect, timeout or unknown (engine crashed). | csv_correct/* | Fig 09 |
| 08_FeaturesOfFailedQueries_Ontoforce | First attempt at visualizing properties of failed Ontoforce queries using parallel coordinates | csv_correct/*, ontoforce_query_features | None |
| 09_FeaturesOfFailedQueries_Ontoforce_2_FeatureCorrelations | Query feature analysis: which query features are correlated? | ontoforce_query_features | None |
| 10_DTreesPerBenchmark | First attempt at decision tree analysis: which features determine the outcome of a query? | csv_correct/*, ontoforce_query_features | None |
| 11_DTreesForAllSims | Generates decision trees for all simulations as one; the query engine is a feature | csv_correct/*, ontoforce_query_features | Trees All Sims |
| 12_DTreesForAllSims2 | Generates decision trees for all simulations as one; the query engine is a feature | csv_correct/*, ontoforce_query_features | Trees All Sims |
| 13_CachingAnalytics | Studies the effect of caching by comparing the fastest to the slowest run of a query | csv_correct/* | query_events_sorted |
| 14_CachingAnalytics_2 | Studies the effect of caching by comparing the fastest to the slowest run of a query | query_events_sorted | Caching Fig. |
| 15_BenchmarkSurvival | Calculates the benchmark survival interval | *_queryevents.csv | Figure BM Survival |
| 16_QueriesThatCrashSimulations | What type of query is the first one to fail, or the first one to crash a query engine? | query_events_sorted | None |
| 17_SingleMultiClientRuntimes | Query runtimes during warmup (single-threaded) vs stress test (5 threads) | query_events_sorted | Figure Server Load |
| 18_RuntimeAnalysisCSV_DiscardIncorrectQueries | Query runtime analysis (discarding incorrect queries) | csv_correct/* | runtime_csv_correct |
| 19_RuntimeVisualAnalysisBoxplots_DiscardIncorrectQueries | Boxplots of query runtimes | runtime_csv_correct | Figures ResultsI |
| 20_QueryTemplateAnalysisWatdiv_DiscardIncorrectQueries | Query runtimes per query template (WatDiv) | runtime_csv_correct | Figures ResultsII |
| 21_BenchmarkCostVisualizations_ExcludeIncorrectQueries | Compares all simulations in terms of runtime cost, taking into account cloud / licensing / runtimes | runtime_csv_correct, sim_cost, loadtimes | Figure ResultsIII |
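
The inter-thread correctness check described for 02_QueryCorrectness_Interthread could be sketched roughly as follows. This is a minimal illustration and not the notebook's actual code: the field names `engine`, `query`, `thread` and `n_results`, as well as the function name, are hypothetical, because the column layout of the *_queryevents.csv files is not documented here.

```python
from collections import defaultdict


def find_inconsistent_queries(rows):
    """Return queries whose result count differs between executions.

    `rows` is an iterable of dicts, e.g. as produced by csv.DictReader.
    The keys "engine", "query" and "n_results" are assumed names,
    not the real column names of the queryevents files.
    """
    seen = defaultdict(set)
    for row in rows:
        # Collect every distinct result count observed per (engine, query).
        seen[(row["engine"], row["query"])].add(int(row["n_results"]))
    # More than one distinct count means the executions disagree.
    return {key: sorted(counts) for key, counts in seen.items() if len(counts) > 1}


# Toy data: query q2 returns a different count in thread 2.
rows = [
    {"engine": "A", "query": "q1", "thread": 1, "n_results": 10},
    {"engine": "A", "query": "q1", "thread": 2, "n_results": 10},
    {"engine": "A", "query": "q2", "thread": 1, "n_results": 3},
    {"engine": "A", "query": "q2", "thread": 2, "n_results": 4},
]
inconsistent = find_inconsistent_queries(rows)
```

An empty result corresponds to the situation assumed by 04_ResultsPerQueryDF, where the number of results per query is unambiguous for each engine.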