EPSRC Reference: |
EP/V007556/1 |
Title: |
Statistical Network Analysis: Model Selection, Differential Privacy, and Dynamic Structures |
Principal Investigator: |
Yao, Professor Q |
Other Investigators: |
|
Researcher Co-Investigators: |
|
Project Partners: |
|
Department: |
Statistics |
Organisation: |
London School of Economics & Pol Sci |
Scheme: |
Standard Research |
Starts: |
01 January 2021 |
Ends: |
31 December 2023 |
Value (£): |
501,156
|
EPSRC Research Topic Classifications: |
Statistics & Appl. Probability |
|
|
EPSRC Industrial Sector Classifications: |
|
Related Grants: |
|
Panel History: |
|
Summary on Grant Application Form |
In this proposal we tackle some challenging problems in the following three aspects of statistical network analysis.
1. Jittered resampling for selecting network models
The first and arguably the most important step in statistical modelling is to choose an appropriate model for a given data set. While there exist many data-driven model-selection methods in statistics in general, including those based on data reuse (i.e., bootstrap resampling, cross-validation), their application to network data is problematic. Therefore it remains common to choose a network model subjectively. The major difficulty in the reuse of network data is to mimic the underlying probability mechanisms. A few existing attempts include cross-validation under some specific settings. We propose a new `bootstrap jittering' or `jittered resampling' method for selecting an appropriate network model. The method does not impose any specific forms/conditions, therefore providing a generic tool for network model selection.
2. Edge differential privacy for network data
In network data individuals are typically represented by nodes and their inter-relationships are represented by edges. Therefore network data often contain sensitive individual/personal information. On the other hand the information of interest in the data should be perserved. Hence the primary concern for data privacy is two-folded: (a) to release only a sanitized version of the original network data to protect privacy, and (b) the sanitized data should preserves the information of interest such that the analysis based on the sanitized data is still meaningful. This is a vibrant research area now as data privacy becomes ever increasingly sensitive and important with available abundant personal information in digital format in this information age, though the contribution from statistics is still at a preliminary stage. We will adopt the so-called dyadwise randomized response approach. While such a scheme is differentially private, the inference based on the released data is largely unknown. Our initial investigation reveals some attractive features of this approach, suggesting more efficient statistical inference than those based on other data release mechanism. We will further develop this scheme to handle networks with additional node features/attributes (e.g., social networks with additional information on age, gender, hobby, occupation etc).
3. Modelling and forecasting dynamic networks
Most existing statistical inference methods for networks are confined to static network data, though a substantial proportion of real networks are dynamic in nature. Understanding and being able to forecast the changes over time are of immense importance for, e.g., monitoring anomalies in internet traffic networks, predicting demand and setting pricing in electricity supply networks, managing natural resources in environmental readings in sensor networks, and understanding how news and opinion propagates in online social networks. Unfortunately the development of the foundation for dynamic networks is still in its infancy, and the available modelling and inference tools are sparse. As for dealing with dynamic changes of networks, most available techniques are based on the evolution analysis of snapshot networks over time without really modelling the changes dynamically. Although this reflects the fact that most networks change slowly over time, it does not provides any insight on the dynamics underlying the changes and is almost powerless for future prediction for which it is essential to build appropriate stochastic models to capture dynamic dependence and dynamic changes explicitly. Combining recent developments on tensor decomposition and factor-driven dimension reduction with the efficient time series tools such as exponential smoothing and Kalman filters, we will take on this challenge to build some new dynamic models.
|
Key Findings |
This information can now be found on Gateway to Research (GtR) http://gtr.rcuk.ac.uk
|
Potential use in non-academic contexts |
This information can now be found on Gateway to Research (GtR) http://gtr.rcuk.ac.uk
|
Impacts |
Description |
This information can now be found on Gateway to Research (GtR) http://gtr.rcuk.ac.uk |
Summary |
|
Date Materialised |
|
|
Sectors submitted by the Researcher |
This information can now be found on Gateway to Research (GtR) http://gtr.rcuk.ac.uk
|
Project URL: |
|
Further Information: |
|
Organisation Website: |
http://www.lse.ac.uk |