An artificial intelligence (AI)-powered dermatology algorithm that can identify a range of skin conditions doesn’t work effectively on black skin. That’s the finding of researchers in Uganda and Sweden, who tested the software on adults in Uganda. The study highlights the potential risks of AI-based healthcare and the need to ensure diversity in datasets used to train algorithms, the authors explain (bioRxiv 10.1101/826057).
Skin Image Search is an AI app that helps people identify skin conditions. The user uploads a photograph of the problem area and the app suggests the three most likely skin diseases. The latest version of the AI algorithm, released in June this year, has an accuracy of about 80% for the top three suggestions, while its top suggestion is around 45% accurate, according to First Derm, the company that developed the platform. The company also runs a telemedicine service, where uploaded images are assessed by a qualified dermatologist.
The convolutional neural network that powers the AI-based app was trained using more than 300,000 photographs of skin diseases collected by this service, ranging from inflammatory conditions, such as acne and psoriasis, to skin cancers.
Alexander Börve, founder and CEO of First Derm, tells Physics World that he asked his co-authors to help conduct the research because he knew that images of black skin make up only about 5–10% of the company’s database. According to Börve, 70% of the images in the database are from the US, 15% are from the UK and 5% are from Sweden, with the rest coming from elsewhere in the world.
In Uganda, researchers at The Medical Concierge Group tested the app on 123 photographs of skin diseases collected by a local telemedicine company. All images were from adults with Fitzpatrick skin type 6 (dark brown to black), 62% of whom were female and 38% male.
The team used an older version of the app, launched in April 2018, that suggests five – rather than three – possible skin conditions and has an accuracy of 70%. On the Uganda data, however, the AI placed the correct skin condition in its top five suggestions for only 17% of the images. It failed to return any correct suggestions for 102 of the 123 photographs.
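To make the "top five" figure concrete, here is a minimal sketch of how a top-k accuracy can be computed from ranked suggestions. The function name, the toy suggestions and the labels are hypothetical and are not taken from the study or from First Derm's software; only the counts 123 and 102 come from the article.

```python
from typing import Sequence


def top_k_accuracy(
    suggestions: Sequence[Sequence[str]], truths: Sequence[str], k: int
) -> float:
    """Fraction of cases whose true label appears among the first k suggestions."""
    if len(suggestions) != len(truths):
        raise ValueError("suggestions and truths must have the same length")
    if not truths:
        raise ValueError("at least one case is required")
    hits = sum(truth in ranked[:k] for ranked, truth in zip(suggestions, truths))
    return hits / len(truths)


# Hypothetical toy data: three images, each with three ranked suggestions.
ranked = [
    ["dermatitis", "psoriasis", "acne"],
    ["psoriasis", "hair loss", "acne"],
    ["acne", "dermatitis", "psoriasis"],
]
truth = ["dermatitis", "fungal infection", "dermatitis"]

top1 = top_k_accuracy(ranked, truth, k=1)  # only the first image is a hit
top3 = top_k_accuracy(ranked, truth, k=3)  # the first and third images are hits

# Arithmetic from the article: 102 of 123 photographs had no correct suggestion.
reported_hits = 123 - 102
reported_rate = reported_hits / 123
print(f"top-1: {top1:.2f}, top-3: {top3:.2f}")
print(f"reported top-5 hits: {reported_hits} of 123 ({reported_rate:.0%})")
```

The last two lines of arithmetic show that the two figures quoted in the article agree: 21 correct cases out of 123 is about 17%.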
There was also a marked difference in performance across skin diseases. The app worked well for dermatitis, placing it at the top of its suggestions with an accuracy of 80%. But while fungal conditions were the most common in the image set, the software failed to identify any of them, achieving an accuracy of 0% for its top five suggestions. The study authors also note that the app performed slightly better on females than males.
Börve says that the research will serve as a benchmark for the company, so that it can make sure that it improves with future updates. Going forward, he says that the company will be working to increase the diversity of images that the AI is trained on. “Basically, when we train these neural networks, we need at least 500 images per skin disease and when it comes to black skin, we also need 500 images of that skin disease and that skin type,” he explains.
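The threshold Börve describes can be read as a coverage check on the training set: count the images for each combination of disease and skin type and flag any combination below 500. The sketch below illustrates that idea only; the record layout, the function name and the example counts are assumptions, not a description of First Derm's pipeline.

```python
from collections import Counter
from typing import Dict, Iterable, NamedTuple, Sequence, Tuple

MIN_IMAGES = 500  # threshold quoted by Börve


class ImageRecord(NamedTuple):
    """Hypothetical metadata for one training image."""

    disease: str
    fitzpatrick_type: int  # 1 (lightest) to 6 (darkest)


def underrepresented_groups(
    records: Iterable[ImageRecord],
    diseases: Sequence[str],
    skin_types: Sequence[int],
    min_images: int = MIN_IMAGES,
) -> Dict[Tuple[str, int], int]:
    """Return each (disease, skin type) pair with fewer than min_images images."""
    counts = Counter((r.disease, r.fitzpatrick_type) for r in records)
    return {
        (disease, skin_type): counts[(disease, skin_type)]
        for disease in diseases
        for skin_type in skin_types
        if counts[(disease, skin_type)] < min_images
    }


# Made-up counts: plenty of acne images on type 2 skin, few on type 6.
records = [ImageRecord("acne", 2)] * 600 + [ImageRecord("acne", 6)] * 40
gaps = underrepresented_groups(records, diseases=["acne"], skin_types=[2, 6])
print(gaps)
```

Iterating over the expected diseases and skin types, rather than only over the pairs present in the data, means that a combination with no images at all is still reported as a gap.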
Disease presentation is also an issue, Börve says. In Africa, due to issues with healthcare access, patients are often first seen by a doctor when their problem is at a more advanced stage than in Europe and the US. This means that skin diseases may show a different, more advanced presentation than the AI has been trained on.
Börve says that this could be part of the problem with fungal diseases. As fungal conditions progress, he explains, they can affect hair growth and cause the skin to become white. The AI struggles with this, misidentifying such conditions as hair loss or psoriasis.
“This is the problem with artificial intelligence, with the development of these systems, you need correct data and you need a lot of data representing the different diseases,” Börve says.