EPSRC Reference: |
EP/T022124/1 |
Title: |
QUINTON -- QUerying and INTegrating Over Nested data |
Principal Investigator: |
Benedikt, Professor M |
Other Investigators: |
|
Researcher Co-Investigators: |
|
Project Partners: |
|
Department: |
Computer Science |
Organisation: |
University of Oxford |
Scheme: |
Standard Research |
Starts: |
01 January 2021 |
Ends: |
30 June 2024 |
Value (£): |
1,039,798
|
EPSRC Research Topic Classifications: |
Bioinformatics |
Information & Knowledge Mgmt |
|
EPSRC Industrial Sector Classifications: |
Pharmaceuticals and Biotechnology |
|
|
Related Grants: |
|
Panel History: |
Panel Date | Panel Name | Outcome |
03 Mar 2020 | EPSRC ICT Prioritisation Panel March 2020 | Announced |
|
Summary on Grant Application Form |
It has long been recognized that nested data models -- in which information is modelled as collections of tuples whose attributes may in turn take values that are collections -- are the most natural modelling formalism for a wide variety of information management scenarios. Query languages that support nested data were developed decades ago. But even as emerging applications have made querying nested data more crucial, and even as many of the most important big data management frameworks assume programmatic interfaces based on nested data, processing large-scale nested data remains extremely cumbersome, radically more so than processing flat data. Our research hypothesis is that fundamental problems in querying and integrating nested data need to be resolved for this situation to change.
This project will provide new foundations for both querying and integrating nested data. On the side of querying, we will establish a standard processing pipeline for queries over nested data. This will include a foundational study of the basic transformations involved in any such pipeline, such as the "shredding" of nested queries into relational queries. It will also include the development of algorithms and tools that implement this pipeline, working on top of scalable infrastructure for flat data, such as the Apache Spark project. On the side of integration, we will establish the foundations of specifying and querying virtual data sources consisting of nested data, and develop middleware that can implement queries over virtual data on top of heterogeneous nested data sources.
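As a rough illustration of what "shredding" can mean, the following Python sketch replaces a collection-valued attribute by a label and moves the inner tuples into a separate flat relation keyed by that label. A query over the nested data can then be answered by a query over the flat relations, followed by re-nesting. This is a toy sketch under stated assumptions, not the project's pipeline: the `patients` data, the attribute names, and the helpers `shred`, `unshred`, and `nested_query` are all hypothetical, and plain Python lists stand in for a scalable flat-data engine.

```python
from collections import defaultdict

# Hypothetical nested input: each patient tuple has a collection-valued
# attribute "samples" (illustrative data only).
patients = [
    {"id": 1, "name": "A", "samples": [{"gene": "BRCA1", "score": 0.9},
                                       {"gene": "TP53", "score": 0.4}]},
    {"id": 2, "name": "B", "samples": []},
]


def shred(rows, nested_attr):
    """Split nested rows into a flat parent relation and a flat child relation.

    The inner collection is replaced by an integer label; each inner tuple
    is stored in the child relation together with the label of its owner.
    """
    parent, child = [], []
    for label, row in enumerate(rows):
        flat = {k: v for k, v in row.items() if k != nested_attr}
        flat[nested_attr] = label
        parent.append(flat)
        for item in row[nested_attr]:
            child.append({"label": label, **item})
    return parent, child


def unshred(parent, child, nested_attr):
    """Rebuild nested rows by grouping child tuples under their label."""
    groups = defaultdict(list)
    for item in child:
        groups[item["label"]].append(
            {k: v for k, v in item.items() if k != "label"})
    return [{**row, nested_attr: groups[row[nested_attr]]} for row in parent]


def nested_query(rows):
    """Query on nested data: keep only high-scoring samples per patient."""
    return [{**row, "samples": [s for s in row["samples"] if s["score"] > 0.5]}
            for row in rows]


# The same query expressed over the flat relations: a filter on the child.
parent, child = shred(patients, "samples")
flat_child_result = [c for c in child if c["score"] > 0.5]
flat_result = unshred(parent, flat_child_result, "samples")
nested_result = nested_query(patients)
```

In this sketch the flat version of the query touches only the child relation, which is the kind of operation a relational engine handles well. Patients with no matching samples still appear in the result because the parent relation is left unchanged.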
The impact of QUINTON is both practical and foundational. We will build infrastructure for querying and integration, but we will also investigate the fundamental problems of scalable querying over materialized and virtual data sources, providing the foundations that can guide the research community in future implementations. We will also drill down into a particularly compelling and timely application of nested data integration and management, working with an industrial partner to build components and novel analyses in the area of biomedical data management. Our partner deals with unified interfaces to diverse biomedical data sources -- clinical, imaging, and genomic data -- and their use cases are a perfect fit for the technology we are developing.
|
Key Findings |
This information can now be found on Gateway to Research (GtR) http://gtr.rcuk.ac.uk
|
Potential use in non-academic contexts |
This information can now be found on Gateway to Research (GtR) http://gtr.rcuk.ac.uk
|
Impacts |
Description |
This information can now be found on Gateway to Research (GtR) http://gtr.rcuk.ac.uk |
Summary |
|
Date Materialised |
|
|
Sectors submitted by the Researcher |
This information can now be found on Gateway to Research (GtR) http://gtr.rcuk.ac.uk
|
Project URL: |
|
Further Information: |
|
Organisation Website: |
http://www.ox.ac.uk |