
Artificial intelligence (AI) holds great potential for a range of data-intensive healthcare tasks: detecting cancer in diagnostic images, segmenting images for adaptive radiotherapy and perhaps one day even fully automating the radiation therapy workflow.
Now, for the first time, a team at Northwestern Medicine in Illinois has integrated a generative AI tool into a live clinical workflow to draft radiology reports on X-ray images. In routine use, the AI model increased documentation efficiency by an average of 15.5%, while maintaining diagnostic accuracy.
Medical images such as X-ray scans play a central role in diagnosing and staging disease. To interpret an X-ray, a patient’s imaging data are typically input into the hospital’s PACS (picture archiving and communication system) and sent to radiology reporting software. The radiologist then reviews and interprets the imaging and clinical data and creates a report to help guide treatment decisions.
To speed up this process, Mozziyar Etemadi and colleagues proposed that generative AI could create a draft report that radiologists could then check and edit, saving them from having to start from scratch. To enable this, the researchers built a generative AI model specifically for radiology at Northwestern, based on historical data from the 12-hospital Northwestern Medicine network.
They then integrated this AI model into the existing radiology clinical workflow, enabling it to receive data from the PACS and generate a draft AI report. Within seconds of image acquisition, this report is available within the reporting software, enabling radiologists to create a final report from the AI-generated draft.
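In outline, such an integration might look like the sketch below (a minimal Python sketch with entirely hypothetical names; the paper does not publish the actual implementation): an event handler receives a newly acquired study from the PACS, calls the generative model and pushes the draft into the reporting software.

```python
# Minimal sketch of the kind of integration described above; all class and
# function names are hypothetical, as the implementation is not published.
from dataclasses import dataclass

@dataclass
class Study:
    study_id: str
    pixel_data: bytes      # radiograph forwarded by the PACS
    clinical_context: str  # indication and relevant history

def generate_draft_report(study: Study) -> str:
    """Stand-in for the generative model: image plus context in, report text out."""
    raise NotImplementedError

def on_study_received(study: Study, reporting_system) -> None:
    """Called when the PACS forwards a newly acquired radiograph."""
    draft = generate_draft_report(study)
    # The draft appears in the reporting software within seconds, where the
    # radiologist reviews, edits and signs the final report.
    reporting_system.attach_draft(study.study_id, draft)
```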
“Radiology is a great fit [for generative AI] because the practice of radiology is inherently generative – radiologists are looking very carefully at images and then generating text to summarize what is in the image,” Etemadi tells Physics World. “This is similar, if not identical, to what generative models like ChatGPT do today. Our [AI model] is unique in that it is far more accurate than ChatGPT for this task, was developed years earlier and is thousands of times less costly.”
Clinical application
The researchers tested their AI model on radiographs obtained at Northwestern hospitals over a five-month period, reporting their findings in JAMA Network Open. They first examined the AI model’s impact on documentation efficiency for 23 960 radiographs. Unlike previous AI investigations that used only chest X-rays, this work covered all anatomies, with 18.3% of radiographs from non-chest sites (including the abdomen, pelvis, spine, and upper and lower extremities).
Use of the AI model increased report completion efficiency by 15.5% on average – reducing mean documentation time from 189.2 s to 159.8 s – with some radiologists achieving gains as high as 40%. The researchers note that this corresponds to a time saving of more than 63 h over the five months, representing a reduction from roughly 79 to 67 radiologist shifts.
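As a quick consistency check, the 15.5% figure follows directly from the reported mean times: (189.2 s − 159.8 s)/189.2 s ≈ 0.155, a saving of 29.4 s per report.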
To assess the quality of the AI-based documentation, they investigated the rate at which addenda (used to rectify reporting errors) were made to the final reports. Addenda were required in 17 model-assisted reports and 16 non-model reports, suggesting that use of AI did not impact the quality of radiograph interpretation.
To further verify this, the team also conducted a peer review analysis – in which a second radiologist rates a report according to how well they agree with its findings and text quality – on 400 chest and 400 non-chest studies, split evenly between AI-assisted and non-assisted reports. The peer review revealed no differences in clinical accuracy or text quality between AI-assisted and non-assisted interpretations, reinforcing radiologists’ ability to create high-quality documentation using the AI tool.
Rapid warning system
Finally, the researchers applied the model to flag unexpected life-threatening pathologies, such as pneumothorax (collapsed lung), using an automated prioritization system that monitors the AI-generated reports. The system exhibited a sensitivity of 72.7% and specificity of 99.9% for detecting unexpected pneumothorax. Importantly, these priority flags were generated between 21 and 45 s after study completion, compared with a median of 24.5 min for radiologist notifications.
Etemadi notes that previous AI systems were designed to detect specific findings and output a “yes” or “no” for each disease type. The team’s new model, on the other hand, creates a full text draft containing detailed comments.
“This precise language can then be searched to make more precise and actionable alerts,” he explains. “For example, we don’t need to know if a patient has a pneumothorax if we already know they have one and it is getting better. This cannot be done with existing systems that just provide a simple yes/no response.”
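As a rough illustration of this idea, a prioritization step could scan each generated draft for new mentions of a critical finding. The sketch below uses hypothetical names and a deliberately simplified keyword-and-negation search; it is not the team’s actual alerting code.

```python
# Illustrative sketch only: the alerting code is not published, and this
# simple keyword/negation search stands in for parsing the model's full text.
import re

PNEUMO = re.compile(r"\bpneumothorax\b", re.IGNORECASE)
NEGATED = re.compile(r"\bno (?:evidence of )?pneumothorax\b", re.IGNORECASE)

def flag_unexpected_pneumothorax(draft_report: str, prior_reports: list[str]) -> bool:
    """Raise a priority flag only for a pneumothorax that is newly described,
    not one that is negated or already documented in earlier reports."""
    if not PNEUMO.search(draft_report) or NEGATED.search(draft_report):
        return False
    # Skip findings that are already known (e.g. a resolving pneumothorax
    # noted on previous studies), as Etemadi describes.
    already_known = any(PNEUMO.search(prior) for prior in prior_reports)
    return not already_known
```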

The team is now working to increase the accuracy of the AI tool so that it can detect more subtle and rare findings, as well as expanding it beyond X-ray images. “We currently have CT working and are looking to expand to MRI, ultrasound, mammography, PET and more, as well as modalities beyond radiology like ophthalmology and dermatology,” says Etemadi.
The researchers conclude that their generative AI tool could help alleviate radiologist shortages, with radiologists and AI collaborating to improve clinical care delivery. They emphasize, though, that the technology won’t replace humans. “You still need a radiologist as the gold standard,” says co-author Samir Abboud in a press statement. “Our role becomes ensuring every interpretation is right for the patient.”