
Details of Grant 

EPSRC Reference: EP/N027426/1
Title: ProvTemp: Provenance templates as a method for facilitating provenance capture and simulating provenance data
Principal Investigator: Curcin, Dr V
Other Investigators:
Researcher Co-Investigators:
Project Partners:
Imperial College London
Department: Health and Social Care Research
Organisation: King's College London
Scheme: First Grant - Revised 2009
Starts: 01 June 2016
Ends: 30 November 2017
Value (£): 100,892
EPSRC Research Topic Classifications:
Information & Knowledge Mgmt
EPSRC Industrial Sector Classifications:
Healthcare Information Technologies
Related Grants:
Panel History:
Panel Date: 15 Mar 2016
Panel Name: EPSRC ICT Prioritisation Panel - Mar 2016
Outcome: Announced
Summary on Grant Application Form
Our world is increasingly driven by data. Medical, economic and political decisions are made based on the results of automatically analysing ever-growing volumes of data. Whether these are patient treatment decisions or stock trading recommendations, if we are to trust the decisions being made, we need insight into the workings of these systems and an understanding of their outputs - referred to as their provenance.

Related to the issue of trust is the concept of reproducibility in scientific discovery, the ultimate test of the validity of findings. Science is now all but impossible without data-intensive infrastructures, but these changes make research harder to verify and follow using traditional "pen-and-paper" methods, and new techniques are required to ensure correctness. A number of recent studies have looked into published research in certain areas, only to find that just a minority could be reproduced using the information provided. Understanding the provenance of the data and processes that we are relying on has never been more critical.

Data provenance is a research field dedicated to the standardised, uniform representation of the network of data products, the tasks that create and use those data, and the human and software actors who perform these tasks - typically represented as provenance graphs. Popular in "computational" disciplines that have long relied on scientific software, provenance is now becoming relevant and necessary to areas which have only recently become data-driven and which operate using multiple disjointed software tools.
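
As a rough illustration of the graph structure described above, the following minimal sketch stores a provenance graph as typed nodes and labelled edges. The node kinds and relation names (`used`, `wasGeneratedBy`, `wasAssociatedWith`) are borrowed from the W3C PROV data model as an assumption; the summary does not say which vocabulary the project uses, and the class name and all identifiers are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ProvGraph:
    """A tiny provenance graph: typed nodes plus labelled, directed edges."""
    nodes: dict = field(default_factory=dict)   # node id -> kind
    edges: list = field(default_factory=list)   # (source, relation, target)

    def add_node(self, node_id, kind):
        # kind: "entity" (data product), "activity" (task) or "agent" (actor)
        self.nodes[node_id] = kind

    def add_edge(self, source, relation, target):
        if source not in self.nodes or target not in self.nodes:
            raise KeyError("both endpoints must be added as nodes first")
        self.edges.append((source, relation, target))

    def inputs_of(self, activity_id):
        """Return the entities that the given activity used."""
        return [t for (s, r, t) in self.edges
                if s == activity_id and r == "used"]


# Hypothetical example: an analysis task reads raw data and writes a report.
graph = ProvGraph()
graph.add_node("raw_data", "entity")
graph.add_node("report", "entity")
graph.add_node("analysis", "activity")
graph.add_node("researcher", "agent")
graph.add_edge("analysis", "used", "raw_data")
graph.add_edge("report", "wasGeneratedBy", "analysis")
graph.add_edge("analysis", "wasAssociatedWith", "researcher")
```

Queries such as `graph.inputs_of("analysis")` then answer the kind of question provenance is meant to support, for example which data a given task relied on.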

To facilitate the adoption of provenance in these disciplines, the ProvTemp project is modelling provenance templates - provenance graph fragments that multiple software tools can compose into a unified, meaningful trace of the research conducted. Scientists define a set of templates describing the research details that need to be captured, and these are then translated into concrete provenance data. This theoretical work has two immediate applications. The first is a method for introducing provenance into scientific environments by integrating with existing software tools, minimising the effort needed for the developers of those tools to start capturing provenance. The second is a mechanism for using the templates to simulate the realistic provenance data they would produce, allowing the templates to be tested to ensure they are sufficiently informative for the intended purpose, e.g. publishing details of a research task or providing a legally required audit trail.
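
The two applications above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the project's implementation: a template is taken to be a list of edges whose endpoints may be variables (marked with a hypothetical `var:` prefix), the relation names are borrowed from the W3C PROV data model, and "simulation" is reduced to drawing variable values at random from pools supplied by the user. All names and values are invented for the example.

```python
import random

# A template: graph edges whose endpoints may be variables ("var:" prefix).
TEMPLATE = [
    ("var:output", "wasGeneratedBy", "var:task"),
    ("var:task", "used", "var:input"),
    ("var:task", "wasAssociatedWith", "var:actor"),
]


def instantiate(template, bindings):
    """Replace every variable in the template with its bound concrete value."""
    def resolve(term):
        if term.startswith("var:"):
            name = term[len("var:"):]
            if name not in bindings:
                raise KeyError(f"no binding for template variable {name!r}")
            return bindings[name]
        return term
    return [(resolve(s), rel, resolve(t)) for (s, rel, t) in template]


def simulate(template, value_pools, runs, seed=0):
    """Generate synthetic traces by drawing each variable from a value pool."""
    rng = random.Random(seed)
    traces = []
    for _ in range(runs):
        bindings = {name: rng.choice(pool)
                    for name, pool in value_pools.items()}
        traces.append(instantiate(template, bindings))
    return traces


# Capture: a software tool supplies bindings and gets concrete provenance.
trace = instantiate(TEMPLATE, {
    "output": "report_v1", "task": "analysis_run_7",
    "input": "raw_data.csv", "actor": "researcher_a",
})

# Simulation: synthetic traces for checking what the template records.
simulated = simulate(TEMPLATE, {
    "output": ["report_v1", "report_v2"],
    "task": ["analysis_run_1", "analysis_run_2"],
    "input": ["raw_data.csv"],
    "actor": ["researcher_a", "researcher_b"],
}, runs=3)
```

In this sketch a tool only has to supply variable bindings rather than build provenance graphs itself, which is the sense in which templates reduce the effort required of tool developers. Uniform random draws are a deliberately naive stand-in for realistic simulated data.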

The ProvTemp approach shall be evaluated by modelling a clinical trial. The medical research community is a typical example of a non-computational discipline becoming increasingly data-driven, and it is currently moving towards big-data-enabled, intelligent infrastructures through the use of data routinely captured in Electronic Health Record (EHR) systems. The trend in medical research is towards Learning Health Systems, which seek to maximise and optimise the use and benefit of EHR data in clinical research and practice. The EU TRANSFoRm project implemented a prototype software infrastructure for the Learning Health System and conducted an international clinical trial driven by EHR data. The ProvTemp approach will replicate the trial execution using provenance templates and examine the resulting provenance data to ensure our method is valid and applicable to future clinical trials.

In addition to the clinical trial work, we shall work closely with the UK's Software Sustainability Institute (SSI), which promotes sustainable software technologies. The SSI shall assist in ensuring that ProvTemp is generalisable and relevant to other scientific disciplines. We shall also engage the public in defining the wider questions around reproducibility and quality of research. Finally, ProvTemp will produce a roadmap for further research, taking stock of the work done and identifying future opportunities.

Key Findings, Potential use in non-academic contexts, Impacts, Sectors submitted by the Researcher
This information can now be found on Gateway to Research (GtR) http://gtr.rcuk.ac.uk