Walking without a Map: Ranking-Based Traversal for Querying Linked Data

Olaf Hartig M. Tamer Özsu
Database Research Group
Cheriton School of Computer Science
University of Waterloo

This page provides all digital artifacts related to our paper in the 15th International Semantic Web Conference (ISWC 2016). All content on this page is licensed under the Creative Commons Attribution-Share Alike 3.0 License.

Table of content:


Abstract

The emergence of Linked Data on the WWW has spawned research interest in an online execution of declarative queries over this data. A particularly interesting approach is traversal-based query execution which fetches data by traversing data links and, thus, is able to make use of up-to-date data from initially unknown data sources. While the downside of this approach is the delay before the query engine completes a query execution, user perceived response time may be improved significantly by returning as many elements of the result set as soon as possible. To this end, the query engine requires a traversal strategy that enables the engine to fetch result-relevant data as early as possible. The challenge for such a strategy is that the query engine does not know a priori what data sources will be discovered during the query execution and which of them contain result-relevant data. In this paper, we investigate 16 different approaches to rank traversal steps and achieve a variety of traversal strategies. We experimentally study their impact on response times and compare them to a baseline that resembles a breadth-first traversal. While our experiments show that some of the approaches can achieve noteworthy improvements over the baseline in a significant number of cases, we also observe that for every approach, there is a non-negligible chance to achieve response times that are worse than the baseline.

Documents

PDF file Preprint of the Paper (16 pages)
PDF file Extended Version (18 pages, double column)

Software

Query Execution System

To execute conjunctive Linked Data queries we used our traversal-based query execution system SQUIN, which is Free Software, licensed under the Apache License, Version 2.0.

The SQUIN code depends on Apache Jena (for the experiments we used version 2.10.1) and the Norbert library (we used version 0.3.2). Additionally, the benchmark package depends on the Berlin SPARQL Benchmark tools and on the JUNG framework (we used version 2.0.1)

WODSim

To generate test Webs from a given base dataset and to simulate such a test Web using a Java servlet container (such as Apache Tomcat) we developed the WODSim framework, which is Free Software, licensed under the Apache License, Version 2.0.

The WODSim code depends on SQUIN and Apache Jena.

Test Webs

Each of the following packages contains a materialized version of one of the test Webs used for our study. To simulate such a test Web using WODSim (see above) unpack the package and refer to the obtained directory in the configuration file of the WODSim servlet.