Note: experiments here refer to the paper “Learning to Assess Linked Data Relationships Using Genetic Programming”, currently submitted to the 15th International Semantic Web Conference (ISWC2016).
- Pathfinding techniques to identify entity relationships in Linked Data use informed search methods, i.e. they pre-compute a portion of the graph
- This violates the Linked Data principle of incremental and serendipitous exploration of the graph through link traversal!
- We can instead use uninformed (blind) searches, which explore the graph on-the-fly without expensive pre-computations or the introduction of a priori knowledge
- Blind searches require a suitable cost-function to prune the search space
- We want to find such a cost-function, i.e. a measure to identify which paths represent the strongest relationships between two entities when performing a Linked Data blind search.
- Identify the Linked Data structural information (i.e. topological and semantic features) that such a cost-function needs.
- Learn this cost-function empirically (through Genetic Programming), instead of defining it manually
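To make the role of the cost-function concrete, here is a minimal sketch of a blind (uniform-cost) search that discovers links on-the-fly and uses a pluggable cost-function to decide which path to extend next. It is an illustration under stated assumptions, not the implementation used in the paper: the toy graph, the `dereference` helper, and the hand-written `step_cost` are all hypothetical, and the sketch assumes that a path's cost is the sum of non-negative per-link costs. In the experiments the cost-function is learned rather than written by hand.

```python
import heapq
from itertools import count

# Toy stand-in for dereferencing an entity URI on the Web of Data.
# Maps an entity to its outgoing (predicate, object) links.
# All identifiers here are hypothetical.
TOY_GRAPH = {
    "ex:A": [("ex:knows", "ex:B"), ("rdf:type", "ex:Person")],
    "ex:B": [("ex:knows", "ex:C")],
    "ex:Person": [("ex:typeOf", "ex:C")],
    "ex:C": [],
}


def dereference(entity):
    """Return the links of an entity; nothing is pre-computed."""
    return TOY_GRAPH.get(entity, [])


def step_cost(subject, predicate, obj):
    """Hypothetical hand-written cost: penalise generic rdf:type links."""
    return 3.0 if predicate == "rdf:type" else 1.0


def uniform_cost_search(start, goal, neighbours, cost, max_expansions=1000):
    """Return (total_cost, path) for the cheapest path found, or None."""
    tie = count()  # tie-breaker so the heap never compares paths
    frontier = [(0.0, next(tie), start, [])]
    best = {start: 0.0}  # cheapest known cost per entity
    expansions = 0
    while frontier and expansions < max_expansions:
        path_cost, _, entity, path = heapq.heappop(frontier)
        if entity == goal:
            return path_cost, path
        if path_cost > best.get(entity, float("inf")):
            continue  # stale entry: a cheaper route was already found
        expansions += 1
        for predicate, obj in neighbours(entity):
            new_cost = path_cost + cost(entity, predicate, obj)
            if new_cost < best.get(obj, float("inf")):
                best[obj] = new_cost
                step = (entity, predicate, obj)
                heapq.heappush(frontier, (new_cost, next(tie), obj, path + [step]))
    return None


result = uniform_cost_search("ex:A", "ex:C", dereference, step_cost)
```

Because the search only calls `neighbours` on entities it actually expands, the graph is explored incrementally. Swapping `step_cost` for a different function changes which relationships are considered strongest without touching the search itself.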
Browse and download data
Linked Data relationships (paths), scored by 8 judges (CSV format). Download
The paths are created by selecting random entities extracted from Geonames, Yago, VIAF, MusicBrainz, LMDB, and the UNESCO dataset. Download
Genetic Programming runs (see paper for details)
Each run includes
- the last evolved population, with its weighted fitness on the training set
- the top 10 cost-functions, and their weighted and unweighted fitness on both the training and test set
- the unweighted fitness on the full dataset
T-block (5 runs using Linked Data topological features) Download
N-block (5 runs using Linked Data topological features and namespace variety) Download
S-block (5 runs using Linked Data topological and semantic features) Download
Code available upon request
Comparison with the literature