morals in the machine
The first time I took on the role of a lead engineer, a few years ago, I had a really hard time learning how to prioritize and delegate work. For much of my early career, I had simply never needed any planning skills beyond “say yes to everything and work yourself into the ground”. One of the best pieces of professional advice I’ve ever received came during this time, from a mentor who told me to delegate the things I was already good at. If I’m good at something, it means I’m actually equipped to evaluate whether my team is doing a good job. It also means I don’t need the practice as much, so delegating frees me up to improve other skills.
There’s an oft-repeated myth about artificial intelligence that says that since we all know that humans are prone to being racist and sexist, we should figure out how to create moral machines that will treat human beings more equitably than we could. You’ve seen this myth in action if you’ve ever heard someone claim that using automated systems to make sentencing decisions will lead to more fairness in the criminal legal system. But if we all know that humans are racist and sexist and we need the neutrality of machines to save us—in other words, if we should delegate morality to AI—how will we ever know if the machines are doing the job we need them to do? And how will we humans ever get better?
AI is a funny, multifaceted beast. What the media colloquially terms “AI” generally refers to a subtype called “machine learning”. The most common form of machine learning involves collecting large datasets illustrating the kinds of things you would like a machine to be able to do, feeding those datasets into a neural network that has layers upon layers of internal parameters for making decisions, and letting the model iteratively teach itself how to imitate the skill captured in the dataset. For example, if you want to build a speech recognition tool that lets you dictate text messages to your phone, you need to feed your machine learning model thousands of hours of people speaking, along with transcripts of what they said, so that it can learn which sounds correspond to which letter combinations.1
Two of the most common strategies for building a model are supervised and unsupervised learning. In supervised learning, individual data points in a collection are labelled with the “ground truth”: the empirical, factual description of that data. For example, an audio clip in a voice dataset either does or does not feature someone saying “the quick brown fox jumps over the lazy dog”. In unsupervised learning, the model is fed unlabelled data and must discover patterns and groupings in it on its own. In either case, engineers then evaluate the model with test cases that check how “correct” its assessments were, and adjust accordingly.
This means that the strength of a machine learning model rises and falls on the strength of its training data. “Garbage in, garbage out” is a frequent refrain in the industry; if your voice dataset only contains deep tenor voices, the resulting model is going to be pretty bad at recognizing the speech of people who have a higher voice. When models trained on bad data get released into the real world, you end up with things like Twitter’s racist photo cropping algorithm that consistently cropped out Black faces in favour of white faces, or one of the world’s largest natural language models that can’t seem to write about Muslims without calling them terrorists.
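The mechanics above can be sketched in a few lines. This is a deliberately toy model, nothing like a production speech system: every number, threshold, and label below is invented for illustration, and a simple nearest-neighbour lookup stands in for a neural network.

```python
# A toy supervised "recognizer": labelled training pairs of
# (fundamental pitch in Hz, transcript). All data and thresholds
# here are invented for illustration.
training_data = [
    (100.0, "the quick brown fox"),   # deep tenor voices only --
    (115.0, "jumps over"),            # a skewed dataset
    (130.0, "the lazy dog"),
]

def recognize(pitch_hz, max_distance=40.0):
    """Return the transcript of the closest training example,
    or give up if the input is too far from anything seen in training."""
    nearest = min(training_data, key=lambda ex: abs(ex[0] - pitch_hz))
    if abs(nearest[0] - pitch_hz) > max_distance:
        return None  # garbage out: the model has never heard a voice like this
    return nearest[1]

print(recognize(110.0))  # a tenor voice: works fine
print(recognize(240.0))  # a higher voice: the skewed model simply fails
```

The failure isn’t in the lookup logic, which works exactly as designed; it’s in what the training data never contained.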
Every dataset that goes into a training pipeline involves encoding and curation decisions that slice away the nuance of the world it is meant to represent. This is not a knock against computers or an argument for the analog—it’s no different than how our brains ingest the world around us through the filter of what is visible and audible to our physiology, how we focus our limited attention and leaky memory on the things we know how to interpret. But it does mean that the resulting shape of each dataset is determined by who gets to make those curation decisions and their unique histories and worldviews—to say nothing of business pressures to reach for the cheapest and easiest source of training data.
(A research AI named Delphi recently made headlines when users got it to conclude that being white was more morally acceptable than being Black and that genocide was okay as long as it made everyone happy. It may not surprise you to hear that it was trained in part on text scraped from Reddit’s “Am I The Asshole” subreddit.)
“Garbage in, garbage out” implies the possibility of “treasure in, treasure out”. But what gold standard dataset would you be able to feed a model if you wanted to create a moral machine, and who would be equipped to evaluate whether the outputs are correct or not? The truly unanswerable question behind whether it’s possible to build moral machines is: moral according to whom? There’s no such thing as a universal code of ethics; the concept of justice cannot be extricated from the contextual value system it’s applied in. It seems unlikely that machine learning engineers will be able to reconcile what moral philosophers have been yelling at each other about for thousands of years, with no consensus in sight.
Theoretically, a world in which moral AIs make all the difficult judgment calls of life should be paradise, with no human foibles to get in the way of the pursuit of justice. But the idea of a future in which we look to machines for moral guidance has always struck me as deeply dystopian, because it betrays a weary skepticism that humans could ever learn to treat each other better than we have in the past.
Probably just about the only thing you can uncontroversially say about cryptocurrency is that one of its central tenets is “trustlessness”. You’re meant to be able to engage in transactions without being forced to trust in the goodwill of your opposing party, or even a third-party broker like a bank. In its most rudimentary form, a trustless system relies on decentralized technical systems to enforce contractual agreements, and decentralized ledgers that transparently and irreversibly prove the validity of a transaction.
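The “irreversible ledger” part of that promise can be illustrated with a minimal hash chain. This is a sketch of the tamper-evidence principle only, not of any real blockchain protocol (no consensus, no mining, no signatures):

```python
import hashlib

def block_hash(prev_hash, transaction):
    """Each entry commits to its own contents *and* to the entry before it."""
    return hashlib.sha256((prev_hash + transaction).encode()).hexdigest()

def build_ledger(transactions):
    """Chain the hashes: every entry depends on all entries before it."""
    hashes, prev = [], "genesis"
    for tx in transactions:
        prev = block_hash(prev, tx)
        hashes.append(prev)
    return hashes

ledger = build_ledger(["alice pays bob 5", "bob pays carol 2"])
tampered = build_ledger(["alice pays bob 50", "bob pays carol 2"])

# Rewriting history changes every hash from that point onward, so anyone
# holding a copy of the chain can see that the record was altered.
print(ledger[-1] != tampered[-1])  # True
```

The enforcement is purely mechanical: no one has to be trusted to report the alteration, because the arithmetic exposes it.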
There’s a truism in network security that has since percolated into the cryptocurrency world, best summed up by the quote “trust is a vulnerability, and like all vulnerabilities, should be eliminated”. This refers to the need to verify every transaction that takes place in a system rather than assuming that someone is trustworthy based on their user credentials, but my first thought when I encountered this principle was how easily it might also describe a governing philosophy for life.
Not to make a tech essay all about feelings (even though I love to make a tech essay all about feelings), but the network engineers are not wrong that trust entails vulnerability, and vulnerability by definition gives someone else the ability to harm you. Trust is scary. You can never really know what’s in someone else’s heart, and on some level we’re all taking it on faith that we’re not being lied to every minute of every day2.
I can understand the philosophical appeal of a system, financial or otherwise, that promises that you don’t have to trust anyone but seemingly neutral machines. But embedded in that promise is a core belief that everyone is untrustworthy, and that a system based on humans trusting one another will always be worse than one in which we never have to take a risk on interdependence.
I can’t prove one way or another whether this belief is correct. I do know that a worldview that sees human nature as intrinsically bad and immune to improvement is incredibly bleak. It ignores social science research about human altruism, our willingness to collaborate, and how we are shaped by structural incentives that can always be redesigned to reward different behaviour. “Tragedy of the commons” is frequently invoked as evidence of our instinct for selfishness, and like many flawed shorthands that have captured the popular imagination, it comes preloaded with a political agenda—in this case, one that favours privatization and protectionism3.
Of course, no system built by humans can ever actually be trustless, just as no system built by humans can ever actually be neutral. We can’t help but replicate the power structures that govern our lives in the mechanisms we design; Safiya Noble wrote an entire dang book on how search engines, far from being a neutral filter of information, consistently perpetuate the racism of their underlying sources. Even cryptocurrency advocates readily admit that you’re still putting your trust in the technologies that enable those transactions, and by extension the integrity of the people who built them. You are also, wittingly or not, trusting in the further concentration of power to the technical and the well-resourced, in systems architected by an overwhelmingly homogenous group that bears little resemblance to the unbanked that they theoretically want to help.
In Automating Inequality, Virginia Eubanks chronicles an automated intake screening tool implemented in Allegheny County in Pennsylvania that’s meant to help child welfare workers make risk assessments about when to intervene to prevent possible abuse. Child welfare services have a long and ugly history of racist violence, and seem like exactly the kind of system that proponents of ethical AI would want to target for improvement4.
If you look at the intake system’s official documentation, it warns that it’s not meant to be used for decision-making and should not override human judgment. Nevertheless, intake workers tended to see a discrepancy between their own assessment and the computer’s conclusion as a sign that they missed something, rather than that the model might be faulty. This, even though workers often had far more context—about nuisance calls or bigoted neighbours—than the model was designed to account for.
The mere fact of technical systems seems to make us want to trust in their virtue, maybe even at the cost of our own judgment. It might be learned helplessness engendered by the complexity of computers, or it might simply be bias laundering: evading culpability for harms caused by machines by pretending the outcomes are neutral simply because they are digital, instead of taking accountability for the tools we build.
Despite the fallibility of human social workers, at least they are capable of learning. As the parents who have been put under the microscope of the automated intake system told Eubanks: with a human worker you can build a history and a relationship, teach them how you want to be treated, maybe overcome their initial biases. There’s no such negotiation with a stark computer-generated number meant to encapsulate your fitness as a parent.
This inability to evolve independently is common to all models right now, regardless of their level of sophistication. They can be fine-tuned if you retrain them on additional data, but the process requires significant human intervention: the model researchers need to recognize the existence of a bias, devise a strategy for identifying data that would correct this bias, actually acquire the data, re-train all the models, and release the updated versions into the world. This costs significant money, labour, and computing resources that their institutional backers have to decide they’re willing to incur.
In the end, machine learning models still need us to learn. Until they can identify errors in their own judgment, independently seek out additional data, and retrain themselves, they are little more than static snapshots of a very dynamic world. Though perhaps it is reassuring to be given an answer that won’t change as the world around you changes, more quickly every year; perhaps this constancy is exactly the reason we want to trust in these singular evaluations.
Currently, voice recognition systems tend to work better for men than for women because the datasets they’re trained on are often pulled from public sources like YouTube, which by and large tend to feature more men speaking (and thus a lower vocal frequency range). If someone wanted to correct this imbalance, they might try to make sure that the training data includes an equal number of women and men speaking.
Of course, even something as apparently straightforward as this betrays embedded social norms. Gender—even if you expand its scope beyond the painfully limited binary that currently persists in many machine learning datasets—is at best an imperfect proxy for the prevalence of certain pitches and intonations and registers. You could just as easily split speech data into soprano, alto, or tenor by directly analyzing the frequencies present in the audio, which is much closer to a ground truth than a social identity. The fact that the latter categorization is rarely used to label audio datasets speaks to the cultural ubiquity of gender as a tool for grouping people.
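Classifying by measured frequency rather than by identity label is straightforward in principle. Here’s a rough sketch: estimate a pure tone’s fundamental frequency by counting zero crossings, then bucket it by range. The cut-offs below are my own crude assumptions (real vocal ranges overlap considerably), and real speech would need far more robust pitch tracking than this.

```python
import math

def estimate_frequency(samples, sample_rate):
    """Rough pitch estimate: a pure tone at f Hz crosses zero about 2f times/sec."""
    crossings = sum(
        1 for a, b in zip(samples, samples[1:]) if (a < 0) != (b < 0)
    )
    duration = len(samples) / sample_rate
    return crossings / (2 * duration)

def vocal_bucket(freq_hz):
    """Crude, made-up cut-offs for illustration only."""
    if freq_hz < 180:
        return "tenor"
    if freq_hz < 260:
        return "alto"
    return "soprano"

sr = 8000
tone = [math.sin(2 * math.pi * 220 * t / sr) for t in range(sr)]  # 1s at 220 Hz
print(vocal_bucket(estimate_frequency(tone, sr)))  # "alto"
```

Nothing here asks who is speaking; the signal itself carries the ground truth. That the industry labels by gender anyway is a curation decision, not a technical necessity.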
Unfortunately, given how much humans enjoy putting other humans into boxes, even characteristics we might think of as being ground truths can take on moral dimensions. For example, weight and height are basically objective facts, but compare the two against one another for the purposes of life insurance underwriting and the resulting ratio suddenly becomes an “index” by which we make public health policies. BMI (and the obesity epidemic moral panic writ large) is broken in too many ways for me to summarize, but for the purposes of this discussion it’s important to note that researchers have found time and time again that Black people do not experience the same adverse health outcomes at higher BMIs as white people5.
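For the record, the “index” in question is nothing more than this arithmetic (the example numbers are arbitrary):

```python
def bmi(weight_kg, height_m):
    """Body mass index: weight divided by height squared, in kg/m^2."""
    return weight_kg / height_m ** 2

print(round(bmi(70, 1.75), 1))  # 22.9
```

Two objective measurements go in; what comes out is a single number onto which a century of moral and actuarial judgment has been projected.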
Imagine a medical machine learning system that allocates organ donations based on predictions of who will live the longest after receiving said organ, which might well include BMI as one of its parameters for future mortality. Even assuming that such a system would not immediately become captive to the financial interests of insurance companies, would it know to correct its analysis depending on race? Can it account for historical inequities in the quality of medical care received by racialized groups? Will this differential analysis be used to justify providing less medical intervention to racialized groups in other scenarios? And how would such a model even identify race? Would it be self-reported, or would the system try to find “biological” measures of race and fall headlong into scientific racism?
It’s a quick hop and a skip from a simple measure of weight and height to a complex question of identity and how it can be weaponized. Os Keyes writes beautifully about how data science is fundamentally threatening to queer existence, because the encoding of a life into data points that can be slotted into categories is in direct opposition to the fluid and transitional nature of queerness. It’s not clear to me whether any computational system dependent on the concept of ground truth can be compatible with the rich context of human relationships. It’s not clear to me that we should want it to.
I know practically every generation has thought that the end times were nigh, but I also know that one day one generation will be right. For what it’s worth, I don’t think it’s going to be this one, but it’s no wonder that it feels harder and harder to trust in other humans when the world seems to be falling apart around you.
In her famous 1989 lecture series The Real World of Technology, Ursula Franklin suggested that technical systems generally reduce or eliminate reciprocity. Franklin defined this as a contextual give-and-take between people that isn’t predetermined or by design, but which can lead to renegotiations of their joint understanding. Technology, she thought, mediates this interaction and turns it into a one-way flow, leaving little room for response and leaving us more isolated in turn.
Our tendency to trust the decisions of computer systems as perfectly rational despite ample evidence to the contrary feels like an abdication of our responsibility to reciprocate. If our fear of the chaos raging around us leads us to put our trust in machines so that we don’t have to trust one another, we relinquish the reciprocity that lets us advance our shared humanity, and those bleak assumptions about humans being immutably selfish turn into self-fulfilling prophecy.
Nine days out of ten, I am terrified about the future of the world, the rising seas and the burning skies. The only solace I’ve found is in the people who do still believe in reciprocity, the mutual aid efforts that have sprung up in the wake of the systemic failures of governments, the altruism that shines through when disaster unfolds, the same $20 that gets Venmo’d back and forth depending on whose need is greater at the time. This duty of care is, I think, the height of what it means to be human, and our commitments to one another are what shape our senses of self, what we consider right and wrong. Who we are is inextricable from our relationships and our community.
We are so excited by the idea of machines that can write, and create art, and compose music, with seemingly little regard for how many wells of creativity sit untapped because many of us spend the best hours of our days toiling away, and even more can barely fulfill basic needs for food, shelter, and water. I can’t help but wonder how rich our lives could be if we focused a little more on creating conditions that enable all humans to exercise their creativity as much as we would like robots to be able to. Instead of building machines to isolate us from the bleak future we have foretold for ourselves, I would rather spend my energy trying to build the world I actually want to live in. A world that places people above profit, that looks at the promises of automation and artificial intelligence and sees leisure and self-actualization, not increasingly vicious fights in a zero-sum system to work harder for fewer scraps in order to prove that we deserve to live.
In Race After Technology, Ruha Benjamin defines The New Jim Code as:
“the employment of new technologies that reflect and reproduce existing inequities but that are promoted and perceived as more objective or progressive than the discriminatory systems of a previous era”
Machines cannot dismantle the power structures that perpetuate the brutal injustices wrought by entrenched oppression; only we can do that. My dearest wish for intelligent machines is for them to augment our reciprocal relationships, instead of replacing them. I want to harness these incredible innovations for material conditions that give us the breathing room to fight for a better version of ourselves. I want to delegate to them the rote drudgery that we already know how to do, leaving us free to wrestle with the hard questions of ethics and morality, the interstitial frictions in human interactions where progress is made. And I want us to take a breath and trust in each other more than in technology, and consider the possibility that a life well-lived doesn’t have sharp answers to murky questions but moral meaning, jointly created, in communion with one another.
My gratitude to Kathy, Andrey, Natalie, Riley, and Jamie for reading early drafts and providing invaluable feedback.
This is obviously a massive oversimplification. For a good layperson’s overview of the nuances of AI, I recommend You Look Like a Thing and I Love You by Janelle Shane, the creator of the delightful AI Weirdness blog. ↩
The history of the American ecologist who popularized “tragedy of the commons” in the 1960s and who wielded it to promote eugenicist policies is fascinatingly insidious but slightly off-topic, so here’s a good external summary. ↩
I should note that the Allegheny system is a traditional statistical predictive model rather than a machine learning one, but this doesn’t change how we relate to it. ↩
I highly recommend the podcast Maintenance Phase if you’re at all interested in this subject, and their episodes on BMI, the obesity epidemic, and the impact of weight on health are especially good. ↩