In my previous post I introduced USDL and the W3C Incubator Group associated with it, and I mentioned that, within that group, I was going to work on devising a version of USDL ready for the application of Linked Data principles, the interlinking of USDL data with that hosted in other Linked Data repositories, etc. This is the second post of the series, in which I start reporting on the work I’m undertaking. The work at this stage is essentially exploratory and cannot be considered in any way complete or thorough, but I hope it will still be of interest. Any comments are more than welcome…
USDL is composed of a number of modules (see the specification Web page). In particular, USDL 3.0 milestone 4, which is the version I will be using, contains a foundation module and 7 other modules, namely Service, Participants, Functional, Pricing, Interaction, Legal and Service Level. Creating the RDF version of USDL requires capturing the information covered by these modules, so one needs to proceed module by module. I will focus more on the aspects I am most familiar with, such as the functional ones.
The starting point is the foundation module, which provides the basis for the other ones. In a nutshell, this module covers aspects such as Time, Locations, Resources and Agents. Additionally, it includes some other elements that are necessary for capturing all the information USDL requires, for instance units of measure. As a first step I go quickly through the main elements in this module, identifying vocabularies that could be reused. Through vocabulary reuse we expect to simplify the gathering and modelling of USDL information, to enrich USDL models with externally provided Linked Data, and to promote the uptake of USDL by simplifying its interpretation thanks to the reuse of known properties and concepts. This analysis shall eventually help us devise the RDF version of the module. In the remainder of this post I cover this module, structuring the analysis according to the main topics handled and starting with general concerns.
IdentifiableElement is the super type of all elements in USDL. This notion is directly catered for by RDF where every node (except literals and blank nodes) has its own unique URI. We therefore would not need to take any action to capture it.
ElementDescription is used almost everywhere to capture aspects such as the name, the description, the concept and the language of the element being characterised. Most of these aspects are directly supported by RDF(S) using for instance rdfs:label (one could also use in addition skos:altLabel and skos:prefLabel ), rdfs:comment, dcterms:description from Dublin Core and the internationalisation support in RDF (e.g., @en). The term concept is in USDL, I believe, used for pointing to concepts in external ontologies, which is precisely what sawsdl:modelReference is for, see SAWSDL. It would, however, also be possible to devise as many rdfs:Class as necessary, possibly as part of domain-specific vertical extensions of USDL.
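To make this mapping a bit more tangible, here is a minimal sketch in Python that serialises an ElementDescription as N-Triples using rdfs:label, dcterms:description, language tags and sawsdl:modelReference. The function name, its parameters and the example.org URIs are my own assumptions for illustration; none of them come from USDL itself.

```python
# Illustrative sketch only: describe_element and the example.org URIs are
# assumptions made for this example, not part of USDL.
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
DCTERMS = "http://purl.org/dc/terms/"
SAWSDL = "http://www.w3.org/ns/sawsdl#"


def describe_element(uri, names, descriptions, concepts):
    """Return N-Triples lines for a USDL-like ElementDescription.

    names and descriptions map a language tag (e.g. "en") to text;
    concepts is a list of URIs of concepts in external ontologies.
    """
    def literal(text, lang):
        # Escape backslashes and double quotes, then attach the language tag.
        escaped = text.replace("\\", "\\\\").replace('"', '\\"')
        return f'"{escaped}"@{lang}'

    lines = []
    for lang, text in sorted(names.items()):
        lines.append(f"<{uri}> <{RDFS}label> {literal(text, lang)} .")
    for lang, text in sorted(descriptions.items()):
        lines.append(f"<{uri}> <{DCTERMS}description> {literal(text, lang)} .")
    for concept in concepts:
        lines.append(f"<{uri}> <{SAWSDL}modelReference> <{concept}> .")
    return lines


if __name__ == "__main__":
    for line in describe_element(
        "http://example.org/services/car-rental",
        {"en": "Car rental", "es": "Alquiler de coches"},
        {"en": "Short-term rental of passenger cars"},
        ["http://example.org/onto#RentalService"],
    ):
        print(line)
```

Each name and description becomes a language-tagged literal, while each concept becomes a plain link to an external ontology, which is all that ElementDescription seems to need at this stage.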
ElementDescription also uses keywords. Essentially, the role of this attribute is to support tagging elements. Although tagging, as opposed to annotating using existing ontologies, is somewhat open ended (i.e., you can use any term you may think of), there has been quite some work on creating vocabularies for expressing that a certain resource has been tagged (not for fixing the actual tags that can be used). I won’t go through them in detail; the interested reader is referred to the review by Kim et al. (2008) and to Common Tag for some of the main options. Choosing one vocabulary over the rest is, at the moment, somewhat a matter of taste since there doesn’t seem to be a de facto standard. For our endeavour, a simple tagging mechanism should do the job. Thus, Newman’s Tag Ontology and Common Tag seem like good options. The latter, however, is supported by companies such as Yahoo!, which may make it the more sensible choice for USDL given its potential commercial impact.
Artifact and Artifact Types
Artifact is a generic concept that allows pointing to additional metadata of different kinds, including mimeType, URI, copyright, etc. Dublin Core deals precisely with this. Notably, it includes among others dcterms:type, dcterms:FileFormat, dcterms:identifier, dcterms:rights, dcterms:creator, etc. Artifacts have ArtifactTypes, which could be modelled using 2 main approaches. We could use SKOS to define knowledge organisation systems, or we could use formal ontologies. In the latter case we may want to use an approach similar to that used in WSMO-Lite for Functional Classifications and define a concept ArtifactClassificationRoot that any artifact classification should instantiate with its root concept. This would allow people to devise a variety of Artifact taxonomies based on the requirements within their domain. If this level of flexibility is not required (which may well be the case for Artifact Types), it may be more appropriate to simply embed this hierarchy of types within USDL directly. USDL includes a set of predefined types which we should reuse and capture whichever approach is chosen.
DependencyTarget is the super type for all the elements that can be the target of a dependency. It seems that this will not require specific treatment and would possibly be best approached as a property. Should we need to, we can always add a concept although ontologically this is probably closer to the notion of role that could be played by a given element in a certain situation.
Resource and Resource Types
Resource is a generic concept to represent classes of real-world objects. Again this notion seems to be used in USDL as if it were a role. We would need to dig into other modules to see how it is used and identify the best way to capture this. Among its properties, though, we can already identify a number of attributes which we have dealt with previously, such as name, type and descriptions. Again USDL points here to the notion of classifications, which could be approached in the same way we approached ArtifactType. Concerning the ResourceTypes, USDL defines SoftwareResource, HumanResource, and PhysicalResource as the possible kinds. These concepts are already defined in other vocabularies and could directly be reused, for instance dctypes:Software, dctypes:PhysicalObject, and foaf:Person (see FOAF).
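The reuse suggested for ResourceTypes could be sketched as a simple lookup from the USDL kinds to the external classes. This is only an illustration of the idea: the dictionary, the function name and the example.org URI are assumptions of mine, and the final RDF model may well relate these concepts differently (e.g., via subclassing rather than direct typing).

```python
# Hypothetical mapping from USDL resource kinds to classes in existing
# vocabularies; the names below are illustrative assumptions.
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
DCTYPES = "http://purl.org/dc/dcmitype/"
FOAF = "http://xmlns.com/foaf/0.1/"

RESOURCE_TYPE_CLASSES = {
    "SoftwareResource": DCTYPES + "Software",
    "PhysicalResource": DCTYPES + "PhysicalObject",
    "HumanResource": FOAF + "Person",
}


def type_triple(resource_uri, usdl_kind):
    """Return an (s, p, o) triple typing the resource with the reused class."""
    try:
        cls = RESOURCE_TYPE_CLASSES[usdl_kind]
    except KeyError:
        raise ValueError(f"unknown USDL resource kind: {usdl_kind}") from None
    return (resource_uri, RDF_TYPE, cls)


if __name__ == "__main__":
    print(type_triple("http://example.org/resources/billing-app", "SoftwareResource"))
```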
USDL includes quite a few concepts for handling time including Time (an abstract class), TimeInstant, TimeInterval, AbsolutePointInTime, RelativePointInTime, etc. Time representation and reasoning has been one of those topics that have been addressed quite often by researchers. Indeed, Semantic Web researchers encountered this issue and there has already been quite some work on representing time. In particular, perhaps the most popular for Linked Data is OWL Time which is hosted by W3C.
The Time Ontology defines temporal entities and temporal relationships based on James Allen’s interval temporal algebra (Allen, 1983). It therefore identifies Instants, defines Intervals on the basis of beginning and end Instants, and includes the typical temporal relationships between Instants and between Intervals; see the image (note that the names used in the image are not exactly those used within the ontology, since I’ve reused an older image of mine for my own Time Ontology in OCML, but this should serve as illustration nonetheless).
The notion of duration in the Time Ontology is defined explicitly as composed of a number of values for different units of time. Intervals can have multiple duration descriptions (e.g., 2 minutes and 120 seconds are different duration descriptions) but only have one duration. With respect to durations, USDL provides the notion of DurationInterval, which allows expressing things like “during the next 3 years starting from today”. This notion is directly supported by OWL Time. Finally, OWL Time imports the Timezone ontology, which covers the notion of timezone and provides some basic geographical modelling capabilities concerning regions, etc.
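The distinction between a duration and its descriptions can be illustrated with a small sketch. The class below is a simplification I made up for this post: it only loosely mirrors the idea of a duration description, and it deliberately leaves out months and years since their length in seconds is not fixed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DurationDescription:
    """Simplified, hypothetical stand-in for a duration description."""
    days: int = 0
    hours: int = 0
    minutes: int = 0
    seconds: int = 0

    def total_seconds(self):
        # Collapse the per-unit values into a single number of seconds.
        return ((self.days * 24 + self.hours) * 60 + self.minutes) * 60 + self.seconds


def same_duration(a, b):
    """Two descriptions denote the same duration if they span equal time."""
    return a.total_seconds() == b.total_seconds()


two_minutes = DurationDescription(minutes=2)
hundred_twenty_seconds = DurationDescription(seconds=120)

if __name__ == "__main__":
    print(two_minutes == hundred_twenty_seconds)               # different descriptions
    print(same_duration(two_minutes, hundred_twenty_seconds))  # same duration
```

The two objects differ as descriptions, yet they denote one and the same duration once normalised.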
On the basis of OWL Time, temporal manipulation would be as expressive as it currently is and it would additionally support implementing Allen’s interval temporal algebra easily for reasoning about intervals and instants.
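As a rough illustration of how little machinery the interval algebra needs once intervals are defined by their beginning and end, the following sketch classifies the relation between two intervals. It is not tied to OWL Time: intervals are plain (start, end) pairs of comparable values, and the relation names are the usual ones from Allen’s paper rather than the property names of any ontology.

```python
def allen_relation(a, b):
    """Return the Allen relation that holds from interval a to interval b.

    Intervals are (start, end) pairs of comparable values with start < end.
    """
    (a1, a2), (b1, b2) = a, b
    if a1 >= a2 or b1 >= b2:
        raise ValueError("intervals must be proper: start < end")
    if a2 < b1:
        return "before"
    if a2 == b1:
        return "meets"
    if b2 < a1:
        return "after"
    if b2 == a1:
        return "met-by"
    # From here on the two intervals share more than a single point.
    if a1 == b1 and a2 == b2:
        return "equals"
    if a1 == b1:
        return "starts" if a2 < b2 else "started-by"
    if a2 == b2:
        return "finishes" if a1 > b1 else "finished-by"
    if b1 < a1 and a2 < b2:
        return "during"
    if a1 < b1 and b2 < a2:
        return "contains"
    return "overlaps" if a1 < b1 else "overlapped-by"


if __name__ == "__main__":
    print(allen_relation((1, 3), (3, 5)))
    print(allen_relation((2, 3), (1, 5)))
```

Since the function only relies on comparisons, the same code works for numbers, dates or datetimes, as long as both intervals use the same kind of value.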
There still remain, though, some minor issues which would require further attention:
- The most fine-grained unit in OWL Time is the second, which may not be enough for automated settings
- RecurrentTime is not supported
- TimePattern is not supported. This notion is currently underspecified in USDL and we would therefore disregard it for the time being.
- RelativePointInTime is not directly supported and would need to be covered
One core aspect of the USDL Foundation module concerns location-related entities and relationships. In particular, USDL includes the super type for all location-related entities, namely Location, and the elements PhysicalLocation, GeographicalPoint, PhysicalAddress, AdministrativeArea & AdministrativeAreaType, Polygon, Area, VirtualAddress & VirtualAddressType, MessagingAddress and VirtualRegion.
Currently, perhaps the most reused vocabulary for geographic-related aspects is W3C WGS84, which allows capturing GeographicalPoints on the basis of their latitude, longitude and altitude.
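A GeographicalPoint expressed with this vocabulary could look like the Turtle produced by the sketch below. The helper function and the example.org URI are assumptions made for illustration; only the geo:Point class and the geo:lat, geo:long and geo:alt properties come from the WGS84 vocabulary.

```python
GEO = "http://www.w3.org/2003/01/geo/wgs84_pos#"


def wgs84_point(uri, lat, long, alt=None):
    """Return a Turtle snippet describing a point with the WGS84 vocabulary."""
    if not -90 <= lat <= 90:
        raise ValueError("latitude must be within [-90, 90]")
    if not -180 <= long <= 180:
        raise ValueError("longitude must be within [-180, 180]")
    props = [("geo:lat", lat), ("geo:long", long)]
    if alt is not None:
        props.append(("geo:alt", alt))  # altitude is optional
    body = " ;\n".join(f'    {prop} "{value}"' for prop, value in props)
    return f"@prefix geo: <{GEO}> .\n\n<{uri}> a geo:Point ;\n{body} ."


if __name__ == "__main__":
    print(wgs84_point("http://example.org/locations/office", 52.0406, -0.7594))
```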
In addition to this vocabulary, there is a comprehensive suite of vocabularies devised by Ordnance Survey as part of the Data.gov.uk initiative for the public release of a large quantity of governmental data in the UK. Notably, they have devised:
- the Spatial Relations vocabulary for basic spatial relationships (e.g., contains, touches);
- the Administrative geography and civil voting area ontology covering administrative divisions in the UK (e.g., region, county, etc);
- the Geometry ontology for describing abstract geometries (this one seems rather limited);
- the Postcode ontology which covers the modelling of UK postcodes including aspects such as Postcode Area, Postcode District, etc
Although some of these vocabularies are somewhat specific to the UK, they could be reused, extended or adapted to deal with other countries. However, the level of detail to be captured should be kept to a minimum as far as possible, to avoid highly detailed cross-country definitions which may not be that beneficial for the adoption of USDL.
In addition to this effort, the W3C Geo Incubator Group did also devote some effort which is worth highlighting here. This group produced a study of existing approaches as well as a simple and reusable vocabulary. This vocabulary is perhaps a better option for capturing the basic geometry than the one from Ordnance Survey.
Finally, it is worth noting the ontology produced by Geonames.org. This ontology contains some of the relations previously indicated, but perhaps its most relevant aspect is that it comes together with a large knowledge base of locations and services for accessing this data. It is also used as a source of geographical information by large knowledge bases like Yago. Other relevant sources of information are DBPedia and Freebase. Given that these would mostly be used at the instance level to create links or reuse data, we shall leave these aspects for later steps, once the vocabularies have been defined (although bearing in mind the solutions adopted by these Web sites can help us simplify the linking at a later stage).
The vocabularies listed here could cover to a great extent the location specific requirements for the module, e.g., PhysicalLocation, AdministrativeArea, Polygon, GeographicalPoint. We still require supporting the notion of Address and Virtual Address which we cover in more detail next.
Update: I recently encountered additional, highly relevant information at the GeoLinkedData.es portal. Notably, the FAO ontologies and services and the GML OWL ontology seem to be good candidates as well. The server hosting the GML ontology is down at the moment, though.
In addition to generic and administrative geographic aspects, USDL provides support for capturing addresses, both physical (i.e., postal addresses) and virtual (e.g., email). This notion has largely been addressed by the vCard vocabulary recently submitted to W3C. Although the vocabulary does not explicitly distinguish between physical and virtual addresses, it does include the main communication means (e.g., telephone, email, postal), and if necessary we could easily include this distinction. Although quite detailed, this vocabulary does not cover some of the types, such as URL and IPv4-v6, nor does it cover the notion of VirtualRegions (e.g., URI Templates). These aspects should be added to the vocabulary, and the notion of URI Template could possibly be reused from the hRESTS extension to the Minimal Service Model. I believe, though, that URI Templates would need to be better specified.
Agent represents all the entities that can take active part in the provisioning of a service. This term appears in a number of vocabularies, notably dcterms:Agent and foaf:Agent, to name the main ones. The notion of Agent also concerns organisations, which are covered in other vocabularies, one of which is GoodRelations, e.g., gr:BusinessEntity. Agents may be classified (see previous cases on how to approach this) and they may have Certifications. There are approaches for supporting the modelling of quality certifications such as ISO 9000; see for instance Kim and Fox (2002), part of the work on TOVE. I believe, however, that this would add unnecessary complexity at this level. Perhaps within other modules this becomes more relevant and we may need to revisit this issue.
Handling of Units
Finally, USDL includes some support for handling units. The support included is, however, limited to basic units, and no knowledge is explicitly captured about the relationships between different units for the same dimension, between different dimensions, and the implications from a processing perspective. This limits to a significant extent what the information about units can be used for, and/or requires the use of implicit knowledge within the systems. Currently, there are no established vocabularies for handling units. GoodRelations includes rather limited support through the notions of gr:QuantitativeValue and gr:UnitPriceSpecification. Additionally, there are other approaches like the work in progress on the Units of Measure ontology started within the Ontolog forum, the work on EngMath by Tom Gruber (Gruber and Olsen, 1994), or even my own work derived from EngMath, presented briefly in Pedrinaci and Domingue (2009), which covers the notions of physical quantities, the international system of units, prefixes, dimensions, and additional machinery for automating the manipulation of quantities taking into account their dimensions and units. On the basis of the approaches mentioned above, the handling of units within USDL could in fact be more advanced than currently supported.
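To give an idea of what explicit knowledge about units and dimensions buys us, here is a deliberately tiny sketch. It is not EngMath nor any of the ontologies mentioned above: the UNITS table, the Quantity class and its methods are assumptions of mine, covering just two dimensions with hard-coded conversion factors.

```python
from dataclasses import dataclass

# Hypothetical unit table: symbol -> (dimension, factor to the base unit).
UNITS = {
    "s": ("time", 1.0),
    "min": ("time", 60.0),
    "h": ("time", 3600.0),
    "m": ("length", 1.0),
    "km": ("length", 1000.0),
}


@dataclass(frozen=True)
class Quantity:
    value: float
    unit: str

    @property
    def dimension(self):
        return UNITS[self.unit][0]

    def to(self, unit):
        """Convert to another unit of the same dimension."""
        dimension, factor = UNITS[unit]
        if dimension != self.dimension:
            raise ValueError(f"cannot convert {self.dimension} to {dimension}")
        return Quantity(self.value * UNITS[self.unit][1] / factor, unit)

    def __add__(self, other):
        # Express the other quantity in our unit first; this fails for
        # quantities of a different dimension.
        return Quantity(self.value + other.to(self.unit).value, self.unit)


if __name__ == "__main__":
    print(Quantity(2, "min").to("s"))
    print(Quantity(1, "h") + Quantity(30, "min"))
```

Because each unit carries its dimension, a system can convert between units and reject meaningless operations such as adding a length to a time, rather than relying on implicit knowledge.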
The different parts presented above cover the foundation module of USDL. The analysis carried out should serve as a good basis for modelling this module in RDF. I’m deferring this activity until I go through some of the other core modules, since it is not yet clear to me how some aspects are dealt with in USDL, and the other modules may well clarify this. Nonetheless, I’m glad to see that, apparently, with a bit of care most of this module could be captured by reusing vocabularies, and the result would enable even more advanced processing than that currently enabled by USDL with little (if any?) additional complexity.
More posts covering other parts of USDL coming up soon…
H. L. Kim et al. Review and Alignment of Tag Ontologies for Semantically-Linked Data in Collaborative Tagging Spaces. In 2008 IEEE International Conference on Semantic Computing, pages 315–322, 2008.
J. Allen. Maintaining knowledge about temporal intervals. Communications of the ACM, 26(11), pages 832–843, 1983.
H. M. Kim and M. S. Fox. Using enterprise reference models for automated ISO 9000 compliance evaluation. In Proceedings of the 35th Hawaii International Conference on Systems Science, 2002.
T. R. Gruber and G. R. Olsen. An Ontology for Engineering Mathematics. In J. Doyle, P. Torasso, and E. Sandewall, editors, Fourth International Conference on Principles of Knowledge Representation and Reasoning, pages 258–269, Bonn, Germany, 1994. Morgan Kaufmann.
C. Pedrinaci and J. Domingue. Ontology-based metrics computation for business process analysis. In 4th International Workshop on Semantic Business Process Management (SBPM 2009), Workshop at ESWC 2009, 1 June 2009, Crete, Greece, 2009.