Decisions in industry, government and health care increasingly depend on improved access to and processing of digital information. This has led to a pressing demand for more powerful and flexible information systems. New generation information systems will need to efficiently process large data sets, exploit machine-readable domain knowledge, and answer queries by taking into account both knowledge and data.
Ontology-based information systems (OISs) constitute a rapidly maturing technology with the potential to meet these requirements. An ontology provides a vocabulary of terms that are familiar to the user, together with axioms describing the meaning of those terms. OISs can exploit the rich domain knowledge in an ontology to provide a unified view of the data and enrich query answers with implicit information using an automated reasoner.
Several standards for ontology and query languages have been developed, including RDF, OWL, OWL 2, and SPARQL. OWL and its revision OWL 2 provide a powerful and flexible ontology modelling language that can capture features such as class hierarchies, incomplete information, and negative information. OWL ontologies are being used in an increasing range of applications, and are becoming a core technology for accessing, gathering, and sharing knowledge and data.
Applications involving large amounts of data, however, still pose serious challenges to the applicability of OISs. These challenges typically originate from conflicting application requirements:
- Modelling complex application domains requires rich ontology languages.
- Fine-grained access to information requires powerful query languages.
- Answering queries over large data sets requires scalable reasoners.
- Critical decisions that depend on access to information require query answers that are either complete or whose incompleteness is well understood.
Due to the high worst-case complexity of the relevant reasoning problems, scalability is usually in conflict with the use of powerful ontology and query languages, and many applications give up completeness to achieve the desired scalability. As a result, existing OISs fail to meet one or more of these requirements: they support only weak ontology or query languages, they do not scale to the required volumes of data, or they do not provide guarantees as to the completeness of query answers.
Our goal in this project is to lay the foundations for a new generation of OISs that meet all the aforementioned requirements, thus providing the ideal combination of expressive power, scalability and completeness.
To accomplish such an ambitious goal, we observe that the limitations imposed by the trade-offs between expressivity, scalability and completeness apply at the language level: that is, they involve worst-case complexity bounds that hold over every ontology, query, and data set expressible in the given ontology, query and data modelling languages. The class of ontologies, queries and data sets that are relevant to a particular application is, however, much more restricted. For example, although application data is often unknown or frequently changing, the ontology itself is fixed at design time, or changes infrequently. As a result, a reasoner known to be incomplete in general for given query and ontology languages might yield the same results as a complete reasoner for the application at hand. Identifying such cases is challenging, but it would have tremendous added value: applications could exploit scalable incomplete reasoners while still enjoying completeness guarantees, thus achieving 'the best of both worlds'.
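The observation above can be illustrated with a deliberately tiny sketch. It is not the project's method or any real reasoner's API: the data model (subclass axioms as pairs, data as asserted class memberships), the function names, and the example vocabulary are all hypothetical, chosen only to show how a reasoner that is incomplete in general can still return complete answers for one particular fixed ontology.

```python
from typing import Dict, Set, Tuple

# Hypothetical toy data model (an assumption for illustration only):
# an ontology is a set of subclass axioms (sub, super); a data set maps
# each individual to the classes it is explicitly asserted to belong to.
Axioms = Set[Tuple[str, str]]
Data = Dict[str, Set[str]]


def complete_answers(axioms: Axioms, data: Data, query_class: str) -> Set[str]:
    """Return all individuals in query_class, following subclass axioms
    transitively until no new class memberships can be derived."""
    answers = set()
    for individual, asserted in data.items():
        inferred = set(asserted)
        frontier = list(asserted)
        while frontier:
            cls = frontier.pop()
            for sub, sup in axioms:
                if sub == cls and sup not in inferred:
                    inferred.add(sup)
                    frontier.append(sup)
        if query_class in inferred:
            answers.add(individual)
    return answers


def incomplete_answers(axioms: Axioms, data: Data, query_class: str) -> Set[str]:
    """Cheaper but incomplete in general: applies each subclass axiom
    once to the asserted classes and never chains two axioms together."""
    answers = set()
    for individual, asserted in data.items():
        inferred = set(asserted) | {sup for sub, sup in axioms if sub in asserted}
        if query_class in inferred:
            answers.add(individual)
    return answers


data: Data = {"alice": {"Surgeon"}, "bob": {"Nurse"}}

# An ontology with a two-step chain: the one-step reasoner misses alice.
deep: Axioms = {("Surgeon", "Doctor"), ("Doctor", "Clinician"), ("Nurse", "Clinician")}

# An ontology with no chains: one step is all that is ever needed.
flat: Axioms = {("Surgeon", "Clinician"), ("Nurse", "Clinician")}

deep_gap = complete_answers(deep, data, "Clinician") - incomplete_answers(deep, data, "Clinician")
flat_gap = complete_answers(flat, data, "Clinician") - incomplete_answers(flat, data, "Clinician")
```

In this sketch, `deep_gap` contains the answer that the one-step reasoner loses, while `flat_gap` is empty. Because the `flat` ontology contains no chained axioms, the agreement does not depend on the particular data set, which mirrors the situation where the ontology is fixed at design time and only the data changes.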
We believe that our main goal can be accomplished by designing OISs that are optimised for the ontologies, queries and data sets relevant to the application at hand. Such OISs would maximise scalability while ensuring completeness of query answers, even for rich ontologies, large-scale data sets, and complex user queries.