Walking without a Map: Ranking-Based Traversal for Querying Linked Data

Database Research Group
Cheriton School of Computer Science
University of Waterloo

This page provides all digital artifacts related to our paper at the 15th International Semantic Web Conference (ISWC 2016). All content on this page is licensed under the Creative Commons Attribution-Share Alike 3.0 License.

Table of contents:

Documents
Software
Test Webs

Abstract

The emergence of Linked Data on the WWW has spawned research interest in an online execution of declarative queries over this data. A particularly interesting approach is traversal-based query execution which fetches data by traversing data links and, thus, is able to make use of up-to-date data from initially unknown data sources. While the downside of this approach is the delay before the query engine completes a query execution, user-perceived response time may be improved significantly by returning as many elements of the result set as soon as possible. To this end, the query engine requires a traversal strategy that enables the engine to fetch result-relevant data as early as possible. The challenge for such a strategy is that the query engine does not know a priori what data sources will be discovered during the query execution and which of them contain result-relevant data. In this paper, we investigate 16 different approaches to rank traversal steps and achieve a variety of traversal strategies. We experimentally study their impact on response times and compare them to a baseline that resembles a breadth-first traversal. While our experiments show that some of the approaches can achieve noteworthy improvements over the baseline in a significant number of cases, we also observe that for every approach, there is a non-negligible chance to achieve response times that are worse than the baseline.
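
The general idea of ranking traversal steps can be illustrated with a small sketch. This is not code from SQUIN and it does not reproduce any of the 16 approaches studied in the paper. It assumes a toy in-memory link graph (`links`) in place of real URI lookups, and the function names `traverse` and `breadth_first_like` as well as the `relevance` scores are hypothetical. The sketch only shows how a priority queue turns a scoring function into a traversal order, and how a constant score falls back to an order that resembles breadth-first traversal.

```python
import heapq
from itertools import count


def traverse(seeds, links, score):
    """Return the order in which documents are visited, highest score first.

    seeds: iterable of starting URIs.
    links: dict mapping a URI to the URIs it links to (toy stand-in for
           dereferencing a URI and extracting its data links).
    score: function (uri, discovery_index) -> number; larger is fetched earlier.
    """
    tick = count()      # discovery counter, also used as a tie-breaker
    frontier = []       # min-heap of (-score, discovery_index, uri)
    seen = set()

    def enqueue(uri):
        if uri not in seen:
            seen.add(uri)
            n = next(tick)
            heapq.heappush(frontier, (-score(uri, n), n, uri))

    for uri in seeds:
        enqueue(uri)

    order = []
    while frontier:
        _, _, uri = heapq.heappop(frontier)
        order.append(uri)
        for target in links.get(uri, ()):
            enqueue(target)
    return order


def breadth_first_like(uri, discovery_index):
    """Constant score: ties are broken by discovery order, i.e. FIFO."""
    return 0


# Hypothetical toy Web: document "a" links to "b" and "c", and so on.
links = {"a": ["b", "c"], "b": ["d"], "c": ["e"], "d": [], "e": []}

# Hypothetical relevance estimates; a real engine does not know these a priori.
relevance = {"c": 1.0, "e": 0.9}

baseline_order = traverse(["a"], links, breadth_first_like)
ranked_order = traverse(["a"], links, lambda uri, n: relevance.get(uri, 0.0))
```

In this toy setting the ranked order reaches the documents with higher assumed relevance before the baseline does. In practice the quality of the score is the hard part, since the engine cannot know in advance which sources contain result-relevant data.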

Documents

Preprint of the Paper (16 pages)
Extended Version (18 pages, double column)

Software

Query Execution System

To execute conjunctive Linked Data queries, we used our traversal-based query execution system SQUIN, which is Free Software, licensed under the Apache License, Version 2.0.

JAR file of the SQUIN version used for the experiments
Source code package of the SQUIN version used for the experiments
Benchmark package for SQUIN as used for the experiments (includes shell scripts, test queries, and JAR and source code of the benchmark driver and implementations of the different approaches tested)

The SQUIN code depends on Apache Jena (for the experiments we used version 2.10.1) and the Norbert library (we used version 0.3.2). Additionally, the benchmark package depends on the Berlin SPARQL Benchmark tools and on the JUNG framework (we used version 2.0.1).

WODSim

To generate test Webs from a given base dataset and to simulate such a test Web using a Java servlet container (such as Apache Tomcat), we developed the WODSim framework, which is Free Software, licensed under the Apache License, Version 2.0.

SQUIN-WODSim.zip

The WODSim code depends on SQUIN and Apache Jena.

Test Webs

Each of the following packages contains a materialized version of one of the test Webs used for our study. To simulate such a test Web using WODSim (see above), unpack the package and refer to the resulting directory in the configuration file of the WODSim servlet.

W(0,0) (5.9MB)
W(0,0.33) (6.1MB)
W(0,0.66) (6.3MB)
W(0,1) (6.4MB)
W(0.33,0) (6.1MB)
W(0.33,0.33) (6.3MB)
W(0.33,0.66) (6.4MB)
W(0.33,1) (6.5MB)
W(0.66,0) (6.3MB)
W(0.66,0.33) (6.4MB)
W(0.66,0.66) (6.4MB)
W(0.66,1) (6.5MB)
W(1) (6.4MB)
BSBM dataset used as base dataset for generating these test Webs (6.6MB)