Five years ago, my father-in-law leaned his finger too hard on the button of my iPhone, and a burst of 10 holiday snaps of my wife and me was uploaded to Google’s photo cloud. Days later, I received a photo of a moment that had never happened. Without the intervention of any person, an algorithm had taken my wife’s smile from one picture in that burst, and convincingly pasted it on to her face in another photo, so that it seemed as if we had smiled in the same moment. At the time, this was a minor curiosity, and my blog piece about it generated a few interviews and articles.
Fast forward to today, when face recognition is being rolled out into national CCTV systems, Chinese police are using AR goggles that can tag potential offenders in their line-of-sight, and deep fake videos are poised to influence the next American election. Bursts of images of all of us are now continuously scanned by algorithms, the people in those images are electronically identified, and automated technology can easily manipulate images and video to create moments that never happened. My provocative blog entitled The AIs are now rewriting history could be coming true, and it may be time to reconsider what history tells us about image analysis and its effects on society.
The first scientific system for identifying criminals from images was invented by Alphonse Bertillon, who, in 1882, developed not only the mug shot but also a system of physical measurements to uniquely identify criminals. This system was used internationally for many years despite its flaws. For example, in 1903, a black man named Will West was incarcerated in Leavenworth in Kansas only to find another black man with the same name and the same measurements imprisoned there as well. Errors of this sort were inevitable since the Bertillon measures reflected the racial biases of the era; simplifying measures, based on how people look, cannot avoid generalisations about the people portrayed.
Due to mistakes like the Will West case, fingerprints (which were developed in 1892 by Francis Galton, Darwin’s first cousin, the inventor of eugenics, and the inspiration for Bertillon’s work) rapidly replaced the Bertillon system. However, generalisations about offenders based on images persisted. Around the same time as Bertillon was developing his system, Italian criminologist Cesare Lombroso created his method of “criminal anthropology” which directly generalised about the relationship between face measurements and an individual’s criminal potential. While the inherent prejudice in this system has been historically deprecated, criminal anthropology lives on in the form of “offender profiling”.
We now live in a world inundated with image data, provided by dense CCTV, the rise of police body cams, and ubiquitous smartphone cameras. Careful processes of image evaluation by people are increasingly impractical, and therefore, scrutiny of images by AI algorithms is rapidly becoming the only way to cope with this glut of visual data.
Many would assume that algorithms solve the problem of unreliable and biased witnesses because algorithms are thought to be inherently objective. However, the hidden reality is that algorithms rely on simplified metrics, not unlike those in the systems of Bertillon and Lombroso. The difference is that a sea of mathematical and computational techniques now hides those simplifications. Because of their vast complexity, no one can query an algorithm as they could an eyewitness. Thus algorithmic biases can easily be overlooked. Due to flaws in design and biases in the data used to create them, algorithms can be inherently racist, as is being demonstrated in many strands of recent research. These biases are not easily overcome.
Moreover, the reliability of the image streams that algorithms provide is due to become increasingly suspect. “AI-enhanced” imaging is becoming an aspect of cameras, and “deep fake” video techniques will soon be a part of the repertoire of autonomous algorithms. Human eyewitnesses’ vulnerability to incentives and persuasion has long been a problem in their reliability, and likewise, what an algorithm recalls as a real image is liable to be influenced by the incentives of the algorithm’s operators. A stretched criminal justice system, increasingly evaluated on efficiency metrics, may unconsciously turn its algorithms towards pictures that are more likely to get a conviction. Likewise, the algorithms involved in individual social media feeds are likely to deliver videos that reinforce a user’s currently held beliefs.
These effects could lead to a worldview that either dangerously enforces a common, centrally dictated perspective, or shatters reality into opposing viewpoints that polarise and erode trust between people. How can we agree on a shared view of the world when the images we see of it may be entirely manipulated, possibly by forces we cannot see, understand, or control?
Ironically, human trust may be the only solution. Several organisations are now creating technology that can aid users in determining the sources of news content, and these technologies are sure to eventually include ways of determining the sources of images and video. Any source could, of course, manipulate the reality seen in online content, but if people could rely on the regulated standards of institutions (for instance, journalistic standards), it may be possible to create a stable, shared view of the world. Ironically, the dissolution of confidence in the mechanically recorded image may force people to redevelop their faith in one another’s human worldview.
Robert Elliot Smith is an AI engineer and the author of Rage Inside the Machine: The Prejudice of Algorithms, and How to Stop the Internet Making Bigots of Us All (Bloomsbury, £20)