EPSRC Reference: |
EP/G012407/1 |
Title: |
Structural Comparison of Labelled Graph Data |
Principal Investigator: |
Connor, Dr R |
Other Investigators: |
|
Researcher Co-Investigators: |
|
Project Partners: |
|
Department: |
Computer and Information Sciences |
Organisation: |
University of Strathclyde |
Scheme: |
Standard Research |
Starts: |
01 October 2009 |
Ends: |
30 September 2012 |
Value (£): |
81,067
|
EPSRC Research Topic Classifications: |
Information & Knowledge Mgmt |
|
|
EPSRC Industrial Sector Classifications: |
No relevance to Underpinning Sectors |
|
|
Related Grants: |
|
Panel History: |
Panel Date | Panel Name | Outcome |
08 Sep 2008 | ICT Prioritisation Panel (September 08) | Announced |
|
Summary on Grant Application Form |
Semistructured data is an important data format, essentially embodied by the XML standard. It is increasingly used in many critical applications, especially in business-to-business communications and peer-to-peer traffic on the Internet.
The main advantage of the semistructured format is that it increases the flexibility of the way the data may be structured: as new situations arise, the structure of the data may evolve as well as the values.
For example, given an established stream of business messages about car insurance between multiple brokers and underwriters, one underwriter may decide that the colour of a car is a significant factor not currently included in the data being supplied. They can advertise this fact, and brokers may choose to start asking their clients for the colour of their cars. From this point onwards, a field may start to appear in the messages between brokers and underwriters, the field being optionally included by brokers and optionally acted upon by underwriters where it is present.
When much of this kind of activity takes place, it becomes important to consider the structural attributes of the whole, potentially large, pool of data, as well as the individual items. For example, given a set of data items: do they have anything much in common with each other?; do they all have at least something in common, and if so what?; how different is one given item from another, and are there any others exactly the same as this one?; can one or more clusters of similarly or identically-structured items be identified within the pool?; is the pool of data itself evolving or becoming quiescent, that is, over time, are individual items becoming, on the whole, more different or more similar?
Recent work we have done on the inherent complexity, and thus regularity, of semistructured data items has led us to an observation that we believe will give great insights into how to answer these and other similar questions.
Using some long-established results from Information Theory, we have applied the concept of mechanical entropy to the domain of semistructured data to give a metric for the complexity of individual data items. We have also discovered an efficient way of calculating this, by use of a data structure, the structural fingerprint, which represents the essential structure of the item. We now believe that reapplying this work in the above context will give great leverage in producing useful, quantified answers to the above questions and others, while the use of the structural fingerprint will make it computationally feasible to perform these calculations upon large pools of semistructured data in the global domain of the Internet.
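As a rough illustration of the idea only, the following Python sketch computes an entropy-based complexity value for an XML item and a structural distance between two items. The summary does not define the structural fingerprint or the exact entropy measure, so everything here is an assumption: the fingerprint is stood in for by a count of root-to-element tag paths, the complexity metric by Shannon entropy over those counts, and the comparison by Jensen-Shannon divergence. The function names and the sample car insurance messages are hypothetical.

```python
import math
import xml.etree.ElementTree as ET
from collections import Counter


def path_fingerprint(xml_text):
    """Count root-to-element tag paths in an XML item.

    Hypothetical stand-in for a structural fingerprint: element values
    are ignored, only the shape of the item is recorded.
    """
    counts = Counter()

    def walk(node, prefix):
        path = prefix + "/" + node.tag
        counts[path] += 1
        for child in node:
            walk(child, path)

    walk(ET.fromstring(xml_text), "")
    return counts


def normalise(counts):
    """Turn path counts into a probability distribution."""
    total = sum(counts.values())
    return {path: n / total for path, n in counts.items()}


def entropy(dist):
    """Shannon entropy in bits (assumed complexity metric)."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)


def structural_distance(a, b):
    """Jensen-Shannon divergence between two fingerprints, in bits.

    Zero for identically structured items, at most 1 for items that
    share no paths. This is an assumed comparison, not the project's.
    """
    p, q = normalise(a), normalise(b)
    mix = {k: 0.5 * (p.get(k, 0.0) + q.get(k, 0.0)) for k in set(p) | set(q)}
    return entropy(mix) - 0.5 * (entropy(p) + entropy(q))


# Hypothetical messages: the second adds the optional colour field.
old_msg = "<quote><car><make>Ford</make><model>Focus</model></car></quote>"
new_msg = ("<quote><car><make>Ford</make><model>Focus</model>"
           "<colour>red</colour></car></quote>")

old_fp = path_fingerprint(old_msg)
new_fp = path_fingerprint(new_msg)

print("complexity (old):", entropy(normalise(old_fp)))
print("complexity (new):", entropy(normalise(new_fp)))
print("distance old/new:", structural_distance(old_fp, new_fp))
```

Because the fingerprints are small counters rather than whole documents, summing them across many items gives a pool-level distribution, which is one way the pool-wide questions about similarity and clustering might be approached under these assumptions.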
|
Key Findings |
This information can now be found on Gateway to Research (GtR) http://gtr.rcuk.ac.uk
|
Potential use in non-academic contexts |
This information can now be found on Gateway to Research (GtR) http://gtr.rcuk.ac.uk
|
Impacts |
Description |
This information can now be found on Gateway to Research (GtR) http://gtr.rcuk.ac.uk |
Summary |
|
Date Materialised |
|
|
Sectors submitted by the Researcher |
This information can now be found on Gateway to Research (GtR) http://gtr.rcuk.ac.uk
|
Project URL: |
|
Further Information: |
|
Organisation Website: |
http://www.strath.ac.uk |