There is a silent but steady revolution happening in all sectors of the economy, from agriculture through manufacturing to services. In virtually all activities in these sectors, processes are constantly monitored and improved via data collection and analysis. While there has been tremendous progress in data collection through a panoply of new sensor technologies, data analysis has proven to be a much more challenging task. Indeed, in many situations, sensors generate data in quantities so large that most of it ends up being discarded. Moreover, sensors often collect different types of data about the same phenomenon, the so-called multimodal data. It is hard, however, to determine how the different types of data relate to each other or, in particular, what one sensing modality reveals about another.
In this project, we address the challenge of making sense of multimodal data, that is, data that refers to the same phenomenon but reveals different aspects of it and is usually presented in different formats. For example, several modalities can be used to diagnose cancer, including blood tests, imaging technologies like magnetic resonance (MR) and computed tomography (CT), genetic data, and family history. Each of these modalities is typically insufficient for an accurate diagnosis on its own but, when considered together, they usually lead to a conclusive diagnosis.
Our starting point is the realization that different sensing modalities have different costs, where "cost" can be financial, related to safety or societal concerns, or both. For instance, in the above example of cancer diagnosis, CT imaging exposes patients to X-ray radiation which, ironically, can itself cause cancer. MR imaging, on the other hand, exposes patients to strong magnetic fields, a procedure that is generally safe. A pertinent question is then whether we can perform both MR and CT imaging, but use a lower dose of radiation in CT (obtaining a low-resolution CT image) and afterward improve the resolution of the CT by leveraging information from the MR. This, of course, requires learning what type of information can be transferred between modalities. Another example is autonomous driving, in which sensors like radar, LiDAR, or infrared cameras, although much more expensive than conventional cameras, collect information that is critical for driving safely. In this case, is it possible to use cheaper, lower-resolution sensors and enhance their output with information from conventional cameras? These examples also show that many scenarios in which we collect multimodal data come with robustness requirements, namely diagnostic accuracy in cancer detection and safety in autonomous driving.
Our goal is then to develop data processing algorithms that effectively capture common information across multimodal data, leverage that common structure to improve reconstruction, prediction, or classification of the costlier (or all) modalities, and are verifiable and robust. We do this by combining learning-based approaches with model-based approaches. In recent years, learning-based approaches, namely deep learning methods, have reached unprecedented performance; they work by extracting information from large datasets. Unfortunately, they are vulnerable to so-called generalization errors, which occur when the data to which they are applied differs significantly from the data used during learning. Model-based methods, on the other hand, tend to be more robust, but generally have poorer performance. The approaches we propose to explore use learning-based techniques to determine correspondences across modalities and extract relevant common information, and then integrate that common information into model-based schemes. Their ultimate goal is to compensate for cost and quality imbalances across the modalities while, at the same time, providing robustness and verifiability.
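To make this division of labor concrete, the following minimal sketch (an illustration of ours, not the project's actual algorithm) reconstructs a signal of a "costly" modality from few, noisy linear measurements, using a co-registered signal from a second modality only to indicate where sharp transitions are likely. All function names, parameters, and the synthetic data are hypothetical; the common information assumed here is shared edge locations, which are plugged as weights into a standard regularized least-squares (model-based) reconstruction.

```python
import numpy as np

def difference_matrix(n):
    """First-order finite-difference operator D of shape (n-1, n)."""
    D = np.zeros((n - 1, n))
    idx = np.arange(n - 1)
    D[idx, idx] = -1.0
    D[idx, idx + 1] = 1.0
    return D

def guided_reconstruction(y, A, guide, lam=5.0, eps=1e-2):
    """Reconstruct x from measurements y = A x + noise of the costly modality.

    Model-based part: regularized least squares
        min_x 0.5 * ||A x - y||^2 + 0.5 * lam * ||W D x||^2,
    where D penalizes differences between neighboring samples. The guide
    modality enters only through the weights W: wherever the guide has a
    strong edge, the corresponding penalty is switched off, so the
    reconstruction is allowed to jump at the same locations.
    """
    n = A.shape[1]
    D = difference_matrix(n)
    guide_edges = np.abs(D @ guide)
    w = 1.0 / (1.0 + guide_edges / eps)      # ~0 at guide edges, ~1 elsewhere
    WD = w[:, None] * D
    # Solve the normal equations of the regularized least-squares problem.
    H = A.T @ A + lam * (WD.T @ WD)
    return np.linalg.solve(H, A.T @ y)

# Tiny synthetic demo (hypothetical data).
rng = np.random.default_rng(0)
n, m = 100, 30
x_true = np.zeros(n)
x_true[40:70] = 1.0                           # piecewise-constant "costly" modality
guide = 2.5 * x_true + 0.3                    # other modality: same edges, different contrast
A = rng.standard_normal((m, n)) / np.sqrt(m)  # under-sampled, cheap acquisition
y = A @ x_true + 0.01 * rng.standard_normal(m)

x_hat = guided_reconstruction(y, A, guide)
print("relative error:", np.linalg.norm(x_hat - x_true) / np.linalg.norm(x_true))
```

In the approaches described above, such a hand-crafted weighting rule would be replaced by correspondences learned from data, but the structure of the computation is the same: the learning component supplies the shared cross-modal structure, while a model-based solver enforces consistency with the measurements of the costlier modality.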