Learning to assess Linked Data Relationships

Note: experiments here refer to the paper “Learning to Assess Linked Data Relationships Using Genetic Programming”, currently submitted to the 15th International Semantic Web Conference (ISWC2016).


CONTEXT

  • Pathfinding techniques to identify entity relationships in Linked Data use informed search methods, i.e. they pre-compute a portion of the graph
  • This violates the Linked Data principle of incremental and serendipitous exploration of the graph through link traversal!
  • We can instead use uninformed (blind) searches, which explore the graph on-the-fly without expensive pre-computation or a priori knowledge

PROBLEM

  • Blind searches require a suitable cost-function to prune the search space
  • We want to find such a cost-function, i.e. a measure that identifies which paths represent the strongest relationships between two entities when performing a Linked Data blind search.
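
To make the role of the cost-function concrete, here is a minimal sketch of a blind, best-first search in which the cost-function decides which partial path to expand next. Everything in it is an assumption for illustration: the toy graph, the names `neighbours`, `path_length_cost` and `blind_search`, and the hop limit are hypothetical, and the hop-count cost is only a placeholder, not one of the functions learned in the paper.

```python
import heapq
from itertools import count

# Toy graph standing in for Linked Data: entity -> list of (property, neighbour).
# In a real blind search these links would be discovered on-the-fly
# by dereferencing each entity's URI (assumption: not shown here).
TOY_GRAPH = {
    "A": [("p1", "B"), ("p2", "C")],
    "B": [("p3", "D")],
    "C": [("p4", "D"), ("p5", "E")],
    "D": [("p6", "E")],
    "E": [],
}


def neighbours(entity):
    """Hypothetical lookup; replace with on-the-fly link traversal."""
    return TOY_GRAPH.get(entity, [])


def path_length_cost(path):
    """Placeholder cost-function: number of hops (lower is better)."""
    return len(path) - 1


def blind_search(start, goal, cost_fn, max_hops=4):
    """Best-first search: always expand the cheapest partial path so far."""
    tie = count()  # tie-breaker so the heap never compares paths directly
    frontier = [(cost_fn([start]), next(tie), [start])]
    while frontier:
        cost, _, path = heapq.heappop(frontier)
        node = path[-1]
        if node == goal:
            return path, cost
        if len(path) - 1 >= max_hops:
            continue  # prune paths that have grown too long
        for _prop, nxt in neighbours(node):
            if nxt in path:
                continue  # avoid cycles
            new_path = path + [nxt]
            heapq.heappush(frontier, (cost_fn(new_path), next(tie), new_path))
    return None, None


result = blind_search("A", "E", path_length_cost)
```

Swapping `path_length_cost` for a different function changes which paths are explored first, which is why the choice of cost-function matters for pruning the search space.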

CHALLENGE

  • Identify the Linked Data structural information (i.e. topological and semantic features) that we need to build such a cost-function.
  • Learn this cost-function empirically (through Genetic Programming), instead of defining it manually
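
As a rough illustration of what learning a cost-function with Genetic Programming operates on, the sketch below represents a candidate cost-function as an expression tree over path features and evaluates it for one path. The feature names (`length`, `avg_degree`, `namespace_count`), the operator set and the example tree are all hypothetical; the actual features, primitives and fitness measure are those described in the paper.

```python
import operator

# Hypothetical operator set; the real primitives are described in the paper.
OPS = {"add": operator.add, "sub": operator.sub, "mul": operator.mul}


def evaluate(tree, features):
    """Evaluate an expression tree against one path's feature values."""
    if isinstance(tree, (int, float)):
        return tree                # numeric constant
    if isinstance(tree, str):
        return features[tree]      # feature terminal, looked up by name
    op, left, right = tree         # internal node: (operator, left, right)
    return OPS[op](evaluate(left, features), evaluate(right, features))


# Example candidate (assumption): length + 0.5 * namespace_count
candidate = ("add", "length", ("mul", 0.5, "namespace_count"))

# Hypothetical feature values for a single path.
path_features = {"length": 3, "avg_degree": 12.0, "namespace_count": 2}

score = evaluate(candidate, path_features)
```

A Genetic Programming run would generate many such trees, score each against the judges' ratings, and recombine the best ones over successive generations.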

Browse and download data

Datasets

Linked Data relationships (paths), scored by 8 judges (CSV format). Download

The paths were created by selecting random entities from Geonames, Yago, VIAF, MusicBrainz, LMDB, and the UNESCO dataset. Download

Genetic Programming runs (see paper for details)

Each run includes

  1. the last evolved population, with its weighted fitness on the training set
  2. the top 10 cost-functions, and their weighted and unweighted fitness on both the training and test sets
  3. the unweighted fitness on the full dataset

T-block (5 runs using Linked Data topological features) Download

N-block (5 runs using Linked Data topological features and namespace variety) Download

S-block (5 runs using Linked Data topological and semantic features) Download

Code available upon request

Comparison with the literature

T-, N-, S-functions compared to measures found in the literature (Normalised Google Distance for each example in the dataset) Download