Diagnostic imaging

Diagnostic AI algorithm focuses on privacy protection

02 Jun 2021 Tami Freeman
TUM researchers Rickmer Braren (left) and Daniel Rueckert (right) are exploring the potential of using artificial intelligence for medical image analysis. (Courtesy: Andreas Heddergott/TUM)

Artificial intelligence (AI) techniques are increasingly employed for biomedical data analysis, for applications such as helping clinicians detect cancers in medical images. AI models require large and diverse training datasets, most commonly anonymized or pseudonymized patient data, which are sent to the clinics where the algorithm is being trained. Current anonymization processes, however, provide insufficient protection against re-identification attacks. What’s needed is an improved way to preserve the privacy of sensitive data.

One option is federated learning (FL), a computation technique in which the machine-learning models are distributed to the data owners for decentralized training, rather than centrally aggregating datasets. To truly preserve privacy, however, FL must be augmented by additional privacy-enhancing techniques.
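
The basic mechanics of FL can be sketched in a few lines. The toy federated-averaging loop below is purely illustrative (a one-parameter linear model with made-up hospital data, not PriMIA’s API): each site computes an update on its own data, and only the resulting weights travel back to be averaged.

```python
# Minimal federated-averaging sketch (illustrative; not the PriMIA API).
# Each "hospital" trains locally and only parameter updates leave the site.
import random

def local_update(weights, local_data, lr=0.01):
    """One gradient-descent step on a toy linear model y = w * x."""
    w = weights["w"]
    grad = sum(2 * (w * x - y) * x for x, y in local_data) / len(local_data)
    return {"w": w - lr * grad}

def federated_round(global_weights, hospitals):
    """Send the model out, train locally, average the returned weights."""
    local_models = [local_update(dict(global_weights), data) for data in hospitals]
    avg_w = sum(m["w"] for m in local_models) / len(local_models)
    return {"w": avg_w}

# Three hospitals with private data drawn from the same underlying relation y = 3x
hospitals = [[(x, 3 * x + random.gauss(0, 0.1)) for x in range(1, 6)] for _ in range(3)]
weights = {"w": 0.0}
for _ in range(200):
    weights = federated_round(weights, hospitals)
print(weights)  # w approaches 3 without any raw data being pooled
```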

With this aim, a team headed up at the Technical University of Munich (TUM) has developed PriMIA (privacy-preserving medical image analysis), an open-source software framework that combines several data-protection processes to provide end-to-end privacy-preserving deep learning on multi-institutional medical imaging data.

“To keep patient data safe, it should never leave the clinic where it is collected,” emphasizes project leader and first author Georgios Kaissis in a press statement. Kaissis and collaborators, also from OpenMined and Imperial College London, publish their findings in Nature Machine Intelligence.

Decentralized training

The team tested PriMIA in a real-life case study in which a deep convolutional neural network (CNN) was employed to classify paediatric chest X-rays as normal, viral pneumonia or bacterial pneumonia. To train the CNN, a central server sends the untrained model to three data owners (hospitals), where it is trained on local data; the data owners never have to share their data.
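
As a rough illustration of what each hospital runs once it receives the model, the PyTorch sketch below defines a small three-class CNN and a local training step. The architecture, image size and data here are hypothetical stand-ins, not the network or dataset used in the study.

```python
# Sketch of the local training step a hospital might run on the model it
# receives from the central server (PyTorch; shapes and data are illustrative).
import torch
import torch.nn as nn

class ChestXrayCNN(nn.Module):
    """Small CNN classifying X-rays as normal / viral / bacterial pneumonia."""
    def __init__(self, n_classes=3):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.classifier = nn.Linear(32 * 56 * 56, n_classes)

    def forward(self, x):
        return self.classifier(self.features(x).flatten(1))

def train_locally(model, loader, epochs=1, lr=1e-3):
    """Train on data that never leaves the hospital; return updated weights."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        for images, labels in loader:
            opt.zero_grad()
            loss = loss_fn(model(images), labels)
            loss.backward()
            opt.step()
    return model.state_dict()

# Hypothetical usage with 224x224 grayscale images at one hospital
dummy_loader = [(torch.randn(4, 1, 224, 224), torch.randint(0, 3, (4,))) for _ in range(2)]
local_weights = train_locally(ChestXrayCNN(), dummy_loader)
```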

Intermittently during training, secure multi-party computation (SMPC) is used to securely aggregate the network weight updates, after which the updated model is redistributed for another round of training. This SMPC protocol guarantees that the individual models cannot be exposed by other participants and protects against “stealing” the model. PriMIA also implements differential privacy (DP) to prevent privacy loss of individual patients in the datasets. Training concludes with all participants holding a copy of the fully trained final model.
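
The secure-aggregation idea can be illustrated with toy additive secret sharing: each update is split into random-looking shares, and only the sum over all participants can be reconstructed. The numbers and field size below are illustrative; PriMIA’s actual SMPC protocol is more involved.

```python
# Toy additive secret sharing, the idea behind SMPC secure aggregation
# (illustrative only; not PriMIA's protocol).
import random

Q = 2**61 - 1  # arithmetic is done modulo a large field size

def share(value, n_parties):
    """Split an integer-encoded weight update into n random-looking shares."""
    shares = [random.randrange(Q) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % Q)
    return shares

def reconstruct(shares):
    return sum(shares) % Q

updates = [1200, 950, 1100]                 # hypothetical encoded weight updates
all_shares = [share(u, 3) for u in updates]
# Each party sums only the shares it holds ...
partial_sums = [sum(s[i] for s in all_shares) % Q for i in range(3)]
# ... so no party ever sees an individual update, yet the aggregate is recoverable.
print(reconstruct(partial_sums), sum(updates))  # both print 3250
```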

The researchers examined the computational and classification performance of FL models trained with and without the privacy-enhancing techniques. They compared these against a model trained centrally on the entire pooled dataset (a centralized data-sharing scenario) and against personalized models trained on each individual hospital’s data.

The FL model trained with neither secure aggregation nor DP performed best, demonstrating equivalent classification performance to the centrally trained model. Adding secure aggregation only slightly reduced this performance. Both of these models significantly outperformed two expert radiologists. The DP training procedure significantly reduced the model’s performance, although it still performed similarly to the human observers. The team cites “methods to improve the training of DP models” as a promising direction for future research.
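
Part of why DP training costs accuracy lies in the mechanism itself: per-sample gradients are clipped and noise is added before they are used to update the model. The NumPy sketch below shows that general pattern (in the style of DP-SGD) with arbitrary parameter values; it is not the procedure or settings used in the paper.

```python
# Minimal DP-SGD-style update: clip each per-sample gradient, then add
# Gaussian noise before averaging (illustrative values, not the paper's).
import numpy as np

def dp_gradient(per_sample_grads, clip_norm=1.0, noise_multiplier=1.1):
    """Clip each sample's gradient to clip_norm, sum, add noise, then average."""
    clipped = []
    for g in per_sample_grads:
        norm = np.linalg.norm(g)
        clipped.append(g * min(1.0, clip_norm / (norm + 1e-12)))
    total = np.sum(clipped, axis=0)
    noise = np.random.normal(0, noise_multiplier * clip_norm, size=total.shape)
    return (total + noise) / len(per_sample_grads)

grads = [np.random.randn(10) for _ in range(32)]  # hypothetical per-sample gradients
print(dp_gradient(grads))  # noisier than the plain average, hence the accuracy cost
```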

The personalized models showed drastically diminished performance. This highlights the fact that including larger quantities of more diverse training data from multiple sources, enabled through FL, can lead to models with better classification performance.

Privacy attacks

The researchers also evaluated the framework’s resilience to gradient-based model inversion attacks that aim to reconstruct features or entire dataset records (chest radiographs in this case) and threaten patient privacy. Attacks on the centrally trained model could reconstruct similar radiographs to the original. However, attacks against the FL model trained with secure aggregation or DP were unsuccessful and could not reconstruct any usable data.
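
A typical gradient-based inversion attack optimizes a dummy input until the gradients it produces match the intercepted ones. The PyTorch sketch below follows that “deep leakage from gradients” pattern on a toy model, assuming the attacker knows the true label; it illustrates the attack class, not the team’s evaluation code.

```python
# Sketch of a gradient-inversion attack: optimise a dummy image so that the
# gradients it produces match gradients observed during training (illustrative).
import torch
import torch.nn as nn

model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 3))  # toy victim model
loss_fn = nn.CrossEntropyLoss()

# Gradients the attacker intercepted for one (unknown) training image;
# the true label is assumed known here for simplicity.
real_x = torch.randn(1, 1, 28, 28)
real_y = torch.tensor([1])
real_grads = torch.autograd.grad(loss_fn(model(real_x), real_y), model.parameters())

# The attacker starts from noise and adjusts it to reproduce those gradients.
dummy_x = torch.randn(1, 1, 28, 28, requires_grad=True)
opt = torch.optim.Adam([dummy_x], lr=0.1)
for _ in range(300):
    opt.zero_grad()
    dummy_grads = torch.autograd.grad(
        loss_fn(model(dummy_x), real_y), model.parameters(), create_graph=True
    )
    mismatch = sum(((dg - rg) ** 2).sum() for dg, rg in zip(dummy_grads, real_grads))
    mismatch.backward()
    opt.step()
# When updates are securely aggregated or noised (DP), the intercepted gradients
# no longer correspond to a single patient's image and this reconstruction fails.
```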

Data protection

The researchers note that PriMIA is highly adaptable to a variety of medical imaging analyses. To demonstrate this, they present a supplementary case study focused on liver segmentation in abdominal CT scans. They are convinced that, by safeguarding patient privacy, the technology can make an important contribution to the advancement of digital medicine.

“To train good AI algorithms, we need good data,” says Kaissis. “And we can only obtain these data by properly protecting patient privacy,” adds co-author Daniel Rueckert.
