A disgusting invasion of privacy.
An AI artist going by the name Lapine says she discovered that private medical photos of hers from nearly ten years ago were included in LAION-5B, an image data set used to train AI.
Lapine made this unnerving discovery using Have I Been Trained, a site that lets artists check whether their work has been used in the image set. When Lapine performed a reverse image search of her face, two of her private medical photos unexpectedly turned up.
“In 2013 a doctor photographed my face as part of clinical documentation,” Lapine explained on Twitter. “He died in 2018 and somehow that image ended up somewhere online and then ended up in the data set — the image that I signed a consent form for my doctor — not for a data set.”
Chain of Custody
LAION-5B is supposed to draw only on publicly available images from the web. By any ethical and reasonable standard, you'd think that would exclude private photos of medical patients. Apparently not.
Somehow, those photos were taken from her doctor’s files and ended up online, and eventually, in LAION’s image set. Ars Technica, following up on Lapine’s discovery, found plenty more “potentially sensitive” images of patients in hospitals.
LAION gathered those images using web scraping, a process where bots scour the internet for content — and who knows what they might dredge up.
A LAION engineer said in the organization’s public Discord that the database doesn’t actually host the images, so the best way to remove one is “to ask for the hosting website to stop hosting it,” as quoted by Ars.
But as Lapine points out, that process often requires you to divulge personal information to the site in question.
In the end, accountability may be tricky to pin down. Did the hospital or doctor err by failing to secure the photos, or are web scrapers like LAION too invasive? That may be a false dichotomy: the answers aren't mutually exclusive.
Regardless, it's bad enough that AIs are assimilating artists' works without their consent. But letting private medical photos end up in AI training data? That should set off everyone's alarm bells. If even those aren't sacrosanct, what is?