Ontology Alignment Evaluation Initiative - OAEI-2009 Campaign

OAEI Oriented Matching - Results 2009


These are the preliminary results of the Oriented Matching track. As the evaluation process, especially for the open-ended track, is still ongoing, these results will be updated and extended until the OM workshop at ISWC'09.

Gold-Standard based Evaluation

Authors: George Vouros (georgev ## aegean :: gr), Vassilis Spiliopoulos (vspiliop ## aegean :: gr), University of the Aegean Greece

Datasets Description


The Gold Standard Based Evaluation Task comprises two datasets. For each pair of ontologies in these datasets a gold standard has been created containing only strict subsumption mappings. Although some of these subsumption relations can be inferred by using the equivalence mappings, a generic algorithm that computes subsumption relations should not rely solely on equivalence mappings, for the following reasons: (a) In the general case equivalence mappings may not exist; this may happen, for instance, when ontologies are specified at different granularity levels, or when ontologies elaborate on non-equivalent elements. (b) If subsumption mappings are derived directly from assessed equivalences, then they cannot be used for filtering wrong equivalences, and vice versa.
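The inference mentioned above — obtaining cross-ontology subsumptions from equivalence mappings combined with each ontology's internal hierarchy — can be sketched as follows. This is an illustrative Python sketch, not the implementation of any participating system; all function names and example concepts are made up:

```python
# Sketch: inferring cross-ontology subsumption mappings from equivalence
# mappings plus each ontology's internal subsumption hierarchy.
# If a (in O1) is equivalent to b (in O2) and b is subsumed by c (in O2),
# then a is subsumed by c; symmetrically for subsumees of a in O1.

def infer_subsumptions(equiv, subs1, subs2):
    """equiv: set of (a, b) pairs, a in O1 equivalent to b in O2.
    subs1/subs2: sets of (child, parent) pairs inside O1/O2.
    Returns inferred (x, y) pairs meaning: x in O1 is subsumed by y in O2."""

    def closure(pairs):
        # Naive transitive closure of a (child, parent) relation.
        closed = set(pairs)
        changed = True
        while changed:
            changed = False
            for (x, y) in list(closed):
                for (y2, z) in list(closed):
                    if y == y2 and (x, z) not in closed:
                        closed.add((x, z))
                        changed = True
        return closed

    c1, c2 = closure(subs1), closure(subs2)
    inferred = set()
    for (a, b) in equiv:
        # a == b and b <= c  =>  a <= c
        for (x, y) in c2:
            if x == b:
                inferred.add((a, y))
        # x <= a and a == b  =>  x <= b
        for (x, y) in c1:
            if y == a:
                inferred.add((x, b))
    return inferred
```

For example, with the equivalence (Person1, Person2), the O1 hierarchy {(Student1, Person1)} and the O2 hierarchy {(Person2, Agent2)}, the sketch yields Person1 ⊑ Agent2 and Student1 ⊑ Person2 — exactly the kind of derived mappings that, per point (b), cannot then serve to filter wrong equivalences.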

Based on the above, it is very interesting to see whether and how the systems participating in OAEI 2009 exploit equivalences to compute subsumptions between the elements of two different ontologies. This must be reported in conjunction with the observations made in the open-ended evaluation of the Oriented Matching track.

In the paragraphs that follow we describe the two datasets of the Gold Standard Based Evaluation Task.

The first dataset (dataset 1) has been derived from the benchmark series of the OAEI 2006 contest and was created for the evaluation of the "Classification-Based Learning of Subsumption Relations" (CSR) method. As a configuration of CSR exploits the properties of concepts (in the cases where properties are used as features), we do not include the OAEI 2006 ontologies whose concepts have no properties. Furthermore, we have excluded from the dataset the OAEI ontologies with no defined subsumption relations among their concepts, because CSR exploits the subsumption relations in the input ontologies to generate training examples. More specifically, all benchmarks (101-304) except 301 to 304 define the second ontology of each pair as an alteration of the same ontology (i.e. the first one, numbered 101). Extensive information on the benchmark series is provided on the OAEI 2006 contest site [1].

The second dataset (dataset 2) is composed of 45 pairs of real-world ontologies coming from the Consensus Workshop track [2] of the OAEI 2006 contest (the pairs result from all pairwise combinations). The ontologies concern the organization of conferences and have been developed within the OntoFarm project [3]. Detailed information on these ontologies is provided on the Consensus Workshop track site [2].

The gold standard for both datasets has been manually created by knowledge engineers. The major guidelines followed for identifying subsumption relations are: (a) use existing equivalence mappings in order to find inferred subsumptions, and (b) understand the "intended meaning" of the concepts (e.g. by inspecting their specifications and the relevant information attached to them). The format of the gold standard is the same as the one used in the benchmark series of the OAEI competition (more information is provided at http://people.kmi.open.ac.uk/marta/oaei09/orientedMatching.html).

Participants


Three systems provided subsumption mappings for the first of the datasets presented above: ASMOV, RiMoM and TaxoMap gave results for the benchmarks dataset. We present these results together with those achieved by CSR (considered as the fourth participant in this task), as well as the results of CSR for the second (Consensus) dataset.

Results

Table 1. Results of all systems when applied to dataset 1

System   |        CSR         |       ASMOV        |       RiMoM        |      TaxoMap
Test     | Prec.  Rec.  F-m.  | Prec.  Rec.  F-m.  | Prec.  Rec.  F-m.  | Prec.  Rec.  F-m.
---------|--------------------|--------------------|--------------------|-------------------
1xx      | 0.97   0.97  0.97  | 1.0    1.0   1.0   | 1.0    1.0   1.0   | NaN    0     NaN
2xx      | 0.84   0.78  0.80  | 0.94   0.94  0.94  | 0.67   0.85  0.69  | 0.84   0.08  0.25
3xx      | 0.66   0.72  0.69  | 0.86   0.60  0.83  | 0.59   0.81  0.64  | 0.72   0.11  0.17
Average  | 0.83   0.79  0.80  | 0.94   0.90  0.93  | 0.69   0.86  0.71  | 0.63   0.07  0.23

Table 1 presents the average precision, recall and f-measure values of each participating system over all tests (Average) and separately for each test category (e.g. 1xx). We observe that in terms of f-measure ASMOV achieves the best results, followed by CSR, RiMoM and then TaxoMap. Also, although CSR has a higher precision than RiMoM, RiMoM has a higher recall. As already commented, it would be interesting to see if and how the systems exploit equivalences (known or inferred) in order to locate subsumption mappings. At this time we can report only on CSR, which does not exploit equivalence mappings.
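The measures reported in Table 1 follow the standard set-based definitions. A minimal sketch, assuming system and gold-standard mappings are represented as sets of (source, target, relation) triples (an illustration, not the official OAEI evaluation tooling):

```python
# Sketch of set-based precision, recall and f-measure over mapping sets.
# NaN is returned for the undefined cases, e.g. an empty system result
# (cf. the NaN entries for TaxoMap on the 1xx tests in Table 1).

def evaluate(system, gold):
    """system, gold: sets of (source, target, relation) mapping triples."""
    correct = len(system & gold)
    precision = correct / len(system) if system else float("nan")
    recall = correct / len(gold) if gold else float("nan")
    # f-measure is undefined when precision is NaN or both values are 0
    if precision != precision or precision + recall == 0:
        f_measure = float("nan")
    else:
        f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure
```

For instance, a system returning two mappings of which one is in a two-element gold standard scores precision 0.5, recall 0.5 and f-measure 0.5; a system returning no mappings scores recall 0 with undefined (NaN) precision and f-measure, matching the TaxoMap 1xx row.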

Table 2. Results of CSR when applied to dataset 2

Ontology pair         Precision  Recall
Iasted-Cmt            0.37       0.67
Cmt-confOf            0.35       0.83
Cmt-Confious          0.25       0.33
confOf-Confious       0.14       0.18
crs_dr-Confious       0.00       0.00
Iasted-Confious       0.08       0.28
OpenConf-Confious     0.19       0.38
Pcs-Confious          0.25       0.38
Cmt-crs_dr            0.33       0.36
confOf-crs_dr         0.00       0.00
confOf-Iasted         0.48       0.41
crs_dr-Iasted         0.18       0.39
OpenConf-Iasted       0.14       0.38
Pcs-Iasted            0.22       0.38
Cmt-OpenConf          0.43       0.43
confOf-OpenConf       0.32       0.38
crs_dr-OpenConf       0.12       0.21
Cmt-Pcs               0.50       0.79
confOf-Pcs            0.10       0.12
crs_dr-Pcs            0.34       0.69
OpenConf-Pcs          0.09       0.21
Cmt-Sigkdd            0.54       0.75
confOf-Sigkdd         0.08       0.31
Confious-Sigkdd       0.29       0.38
crs_dr-Sigkdd         0.00       0.00
Iasted-Sigkdd         0.15       0.82
OpenConf-Sigkdd       0.33       0.39
Pcs-Sigkdd            0.24       0.43
Cmt-Conference        0.26       0.11
confOf-Conference     0.47       0.29
Confious-Conference   0.16       0.33
crs_dr-Conference     0.33       0.40
Iasted-Conference     0.25       0.10
OpenConf-Conference   0.11       0.10
Pcs-Conference        0.11       0.10
Sigkdd-Conference     0.22       0.18
Cmt-ekaw              0.56       0.70
confOf-ekaw           0.57       0.72
Confious-ekaw         0.37       0.67
crs_dr-ekaw           0.26       0.20
Iasted-ekaw           0.52       0.48
OpenConf-ekaw         0.31       0.28
Pcs-ekaw              0.37       0.67
Sigkdd-ekaw           0.68       0.76
Conference-ekaw       0.53       0.63
Average               0.28       0.39

Concerning dataset 2, Table 2 depicts the precision and recall values achieved by CSR for each pair of ontologies in the dataset. We observe that the performance of CSR is worse on this dataset than on the first dataset.

References

[1] Ontology Alignment Evaluation Initiative 2006, http://oaei.ontologymatching.org/2006/.
[2] Consensus Workshop Track, OAEI 2006, http://oaei.ontologymatching.org/2006/conference/.
[3] O. Svab, V. Svatek, P. Berka, D. Rak, P. Tomasek, OntoFarm: Towards an Experimental Collection of Parallel Ontologies, Poster Track of ISWC, Galway, Ireland (2005).

Open-ended Evaluation

The goal of this track is to get a better understanding of the types of semantic mappings that current systems can return as well as to experiment with various evaluation measures that go beyond Gold Standard based evaluations.

Participants

The four systems that participated in this track derived subsumption-type mappings; none of them derived other types of mapping relations. The focus of the participants was mainly on the Benchmark-subs dataset, which is a subset of the Benchmarks dataset used for the Gold Standard based evaluation. Notable exceptions are ASMOV, which provided some results for the Conference dataset, and TaxoMap, which additionally covered the Benchmarks, Anatomy and Directory datasets.

Results

Under construction.