Sources of semi-structured, overlapping, and semantically related data on the Web are currently proliferating at a phenomenal rate, which has created a demand for more powerful and flexible information systems (ISs). This new generation of ISs will need to integrate incomplete and semi-structured information from heterogeneous sources, employ rich and flexible schemas, and answer queries by taking into account both knowledge and data.

Ontology-based data access has recently been proposed as an architectural principle for such systems. The main idea is to develop a unified view of the data by describing the relevant domain in an ontology, which then provides the vocabulary used to ask queries. The IS can use ontological statements, such as the concept hierarchy, to derive new facts and thus enrich query answers with implicit knowledge. This idea has been incorporated into systems such as QuOnto, Owlgres, ROWLKit, and REQUIEM, and ontology reasoners such as RACER, FaCT++, Pellet, and HermiT.

Such systems suffer from two main problems. First, the modelling capabilities of ontology languages are often insufficient for practical use cases. In order to achieve favourable computational properties, ontology languages are usually capable of describing only tree-shaped relationships; furthermore, with some notable exceptions, they usually support only unary and binary predicates. Finally, ontology languages typically employ the open world assumption; however, when answering queries over large amounts of data, the closed world assumption (CWA) is often more appropriate.

Second, query answering facilities in existing ontology-based ISs typically do not scale to data sets commonly encountered in practice. Up to now, approaches to addressing this problem have focused on reducing the expressivity of the ontology language even further in order to obtain formal tractability guarantees. This exacerbates the first problem (restricted modelling capabilities), while not necessarily delivering robust scalability in practice.

Database theory and practice can provide partial solutions to these problems. In databases, complex domains can be described using dependencies. Dependencies are used in a number of different ways. They are often used as integrity constraints, that is, checks that verify whether a database instance includes all data specified in the domain description; however, dependencies can also be used similarly to ontologies to derive implicit knowledge. Treating dependencies as integrity constraints and answering queries under the CWA has allowed practical relational database management systems (RDBMSs) to scale to very large data sets.

Database techniques alone do not, however, satisfy all the requirements for an ontology-based IS. In particular, dependencies often cannot model arbitrarily large structures and thus do not cover all practical modelling use cases. Furthermore, generalising the query answering techniques used in practical RDBMSs to the case where information-deriving dependencies must be taken into account is still an open problem.

We therefore believe that the next generation of ontology-based ISs should be based on a synthesis and an extension of ontology and database systems and techniques, providing data handling capabilities similar to current RDBMSs, but with schemas that are rich, flexible, and tightly integrated with the data. In order to achieve this ambitious goal, however, a number of challenging fundamental problems must be solved. First, ontology and dependency languages need to be unified in a coherent theoretical framework. Second, it will be necessary to identify fragments of the framework that are likely to exhibit robust scalability but can still support realistic use cases. Third, it will be necessary to devise effective algorithmic techniques that can form the basis of practical ISs.
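
The two readings of a dependency discussed above, as an integrity constraint and as a rule that derives implicit knowledge, can be illustrated with a minimal sketch. The facts, the predicate names, the single dependency "every Professor is an Employee", and the helper functions are all hypothetical and chosen only for illustration; they are not taken from any of the systems named above.

```python
# Hypothetical toy data: each fact is a (predicate, individual) pair.
facts = {("Professor", "alice"), ("Employee", "bob")}

# Hypothetical dependency: every Professor is also an Employee.
dependency = ("Professor", "Employee")


def violations(facts, dependency):
    """Integrity-constraint reading: individuals that break the dependency."""
    body, head = dependency
    return {x for (p, x) in facts if p == body and (head, x) not in facts}


def derive(facts, dependency):
    """Derivation reading: add the facts that the dependency implies."""
    body, head = dependency
    return facts | {(head, x) for (p, x) in facts if p == body}


def answer(facts, predicate):
    """Answer a simple query by looking up the given facts only."""
    return {x for (p, x) in facts if p == predicate}


# Constraint reading: the stored data is rejected as incomplete.
constraint_violations = violations(facts, dependency)

# Query over the stored facts only, versus over the enriched facts.
employees_stored = answer(facts, "Employee")
employees_derived = answer(derive(facts, dependency), "Employee")
```

Under the constraint reading, the missing Employee fact for alice is reported as a violation and the query sees only stored data; under the derivation reading, the same dependency adds alice to the answer. A single application suffices here because there is only one dependency; a set of interacting dependencies would in general need repeated application.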