Data used to build algorithms detecting skin disease is too white
Public skin image datasets used to train algorithms to detect skin problems don’t include enough information about skin tone, according to a new analysis. And within the datasets where skin tone information is available, only a very small number of images are of darker skin, so algorithms built using these datasets might not be as accurate for people who aren’t white.
The study, published today in The Lancet Digital Health, examined 21 freely accessible datasets of images of skin conditions. Combined, they contained over 100,000 images. Just over 1,400 of those images had information attached about the ethnicity of the patient, and only 2,236 had information about skin color. This lack of data limits researchers’ ability to…