Is AI male or female?
Let me tell you a story:
A boy and his father are in a car crash. The father is killed outright, the son needs emergency brain surgery. At the hospital, the senior brain surgeon takes one look and yells at the nearest nurse: “Fetch my second-in-command – I can’t operate on my own son!”
How is this possible?
In several decades of telling this riddle, I’ve been amazed
by the creativity of the solutions proposed: adoption, mistaken identity, baby-boys-swapped-at-the-hospital,
Witness Protection Program, resurrection, gay dads, prosopagnosia, you name it.
Only about one in ten people have come up with what should
be the glaringly obvious answer: the brain surgeon is the boy’s mother.
Why don’t most people get it?
Despite the fact that most English words are on the surface
gender-neutral (we don’t say “brain-surgery-man”), English speakers tend to
make strong gender assumptions about nouns. We unconsciously expect important
jobs, like brain surgeons, CEOs and Ministers of Defence, to be held by men,
just as we expect support-service jobholders to be female – hands up everyone
who assumed the yelled-at nurse in the riddle was a woman.
But we also associate other areas of language with
gender. A character who yells at
subordinates, who is described as senior, who gives orders to a
second-in-command…all these words and phrases nudge the reader towards assuming
maleness. Not convinced? Substitute “squealed”, “junior”, “pleaded”,
“job-sharer” and tell me you don’t assume the person referred to is female.
The truth is, we all take on the assumptions built into our
culture, and despite evidence that there are precious few genuine
gender differences, one of the deepest assumptions in Western
culture is that men and women are dissimilar in aspects that go far beyond the
physical.
But why do those assumptions carry over to robots? Lines of code don’t have a gender!
Ah, but they do.
He-robots and she-robots
Robot-slaves – sorry, digital personal assistants – like
Alexa or Siri have female names and default-female voices. Robot-equals, problem-solving AI like
AlphaGo, may not always be given an official gender but, as with brain surgeons
and generals, the default assumption is that they are male rather than female:
robot-equals who talk, such as IBM Watson, are all given male voices.
We should not be surprised that our human-society
assumptions spill over into the virtual world.
Once code or mechanical parts start doing stuff people do, we treat them
as in some way like us, and that means having a gender, as well as a name and
even the suggestion of a personality.
But the almost-universal demarcation of power-robots as male
and slave-robots as female does not reflect reality. In the physical world one
in eight brain surgery residents is female – not an overwhelming
proportion, but not zero either.
Why aren’t one in eight power-robots female?
A male conspiracy?
The first explanation that comes to mind is robot
programmers’ gender.
Only 22% of people
working on AI identify as female – a third of the female share of the workforce
in other industries – and the pipeline is shrinking rather than expanding: in 1984, 37%
of US graduates in computer science were female; in 2014, just 18%.
Not only are there few women, they don’t get the best AI jobs:
while men are more likely to hold positions such as head of engineering, women
are more often found in junior or support roles such as data analysts or
researchers.
Shockingly, though perhaps unsurprisingly given the above, many
robots simply do not recognise women. An analysis by MIT of the
three most popular facial recognition AI systems found that all consistently identified
men more accurately than women, with dark-skinned women being mislabelled
anything from 21% to 34% more often than light-skinned men.
So are robots sexist and racist?
Perhaps, but that might not be men’s
fault. There’s another relevant factor
that is rarely discussed: maths.
Correlation and causation
Most artificial intelligence, from image recognition to game
strategy to natural language processing, works by calculating probability of
match between a specific example and a general criterion – how likely it is
that a particular photo depicts a cat, say.
This calculation is based on correlations – if almost every cat-labelled
photo shows whiskers and a pink squishy nose, and other photos (of dogs, say)
don’t, the presence of whiskers and a pink squishy nose correlates strongly with
cat-ness – if you see those features, it’s pretty likely you’re looking at Kitty
rather than Rover.
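To make that concrete, here is a minimal sketch in Python of the counting idea. The eight "photos" are invented for illustration and the function name is hypothetical; real image recognition systems are far more elaborate, but the underlying logic of turning co-occurrence into a probability is the same.

```python
from collections import Counter

# Hypothetical toy dataset, invented for illustration:
# (has whiskers and a pink squishy nose?, label)
photos = [
    (True, "cat"), (True, "cat"), (True, "cat"), (False, "cat"),
    (False, "dog"), (False, "dog"), (False, "dog"), (True, "dog"),
]

def probability_of_cat(has_feature, labelled_photos):
    """Share of photos with this feature value that are labelled 'cat'."""
    labels = [label for feature, label in labelled_photos if feature == has_feature]
    if not labels:
        return None  # no examples with this feature value, so no estimate
    return Counter(labels)["cat"] / len(labels)

p_with = probability_of_cat(True, photos)      # feature present
p_without = probability_of_cat(False, photos)  # feature absent
print(p_with, p_without)
```

In this toy set the feature is present in three cat photos and one dog photo, so seeing it makes "cat" three times as likely as "dog". Nothing in the calculation knows what a cat is; it only knows what tends to show up alongside the label.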
There are just two problems with using correlation as the
basis for intelligent decisions. The
first involves the training data you feed the system. If your training data shows largely marmalade
cats, for instance, the trained system may fail to recognise a Siamese – it will
over-correlate orange fur with cat-ness.
Similarly, if it shows mostly face-on pictures of cats, it may pass on a
puss in profile. Size of database honestly
is not what makes the difference: Amazon thought it was being representative by
training its sexist
recruitment AI on a decade’s worth of CVs submitted to the company. Unfortunately these CVs were not
gender-representative, so the system learned to associate maleness with success,
screened out CVs that mentioned the word “women’s”, and was dumped because the
team could not work out how to remove less obvious instances of bias, such as
bonuses awarded to candidates whose CVs featured “masculine” words like “executed”
rather than “collaborated” (or at least how to do it quickly and easily). Getting
representative training data is laborious, time-consuming, expensive and, worst
of all, non-obvious. Not a great
business model in the world of move fast and break things.
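The marmalade-cat problem can be sketched in a few lines of Python. This is a deliberately crude illustration, not how any real system (and certainly not Amazon's) was built: the "classifier" just remembers which label it saw most often for each fur colour, and all the data and names are made up.

```python
from collections import Counter, defaultdict

def train(examples):
    """Count how often each label appears for each fur colour."""
    counts = defaultdict(Counter)
    for colour, label in examples:
        counts[colour][label] += 1
    return counts

def predict(counts, colour, default="unknown"):
    """Return the label most often seen with this colour in training."""
    if colour not in counts:
        return default
    return counts[colour].most_common(1)[0][0]

# Hypothetical skewed training set: every cat in it is marmalade.
skewed = [("orange", "cat")] * 9 + [("cream", "dog")] * 5 + [("brown", "dog")] * 4

# The same set plus some cream-coloured (Siamese-style) cats.
balanced = skewed + [("cream", "cat")] * 8

siamese_on_skewed = predict(train(skewed), "cream")
siamese_on_balanced = predict(train(balanced), "cream")
print(siamese_on_skewed, siamese_on_balanced)
```

Trained on the skewed set, the sketch calls a cream-coloured animal a dog, because it has never seen a cat that wasn't orange. Adding the missing examples fixes it, which is exactly the laborious part.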
The second problem is more fundamental: it’s the difference
between correlation and causation.
Correlation mostly works fine if you’re solving group-level or repeat-data
problems such as how many men’s compared to women’s loos you should provide at
a tech conference; or whether the next programmer to walk up to the coffee
stand is more likely to be a man or a woman (tech conferences bring out the
moronic gamer in all of us). But what we need to know to solve specific
problems is not which features most often accompany success, but which features
cause or drive success: how can I predict the best hire? Which is the right surgical procedure to use on
patient X? How safe is it to give Guilty
Defendant Y a non-custodial sentence? When
and where can police best intervene to stop a potential riot?
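Here is a small Python sketch of how a feature can accompany success without driving it. The records are invented, and the assumption baked into them is loudly artificial: skill alone decides success, and the sample simply happens to contain more high-skill men than high-skill women.

```python
# Hypothetical toy records, invented for illustration: (gender, skill, succeeded).
# Assumption built into the data: skill alone determines success.
records = (
    [("m", "high", True)] * 8 + [("m", "low", False)] * 2
    + [("f", "high", True)] * 2 + [("f", "low", False)] * 8
)

def success_rate(rows):
    """Fraction of rows where the person succeeded."""
    return sum(1 for _, _, ok in rows if ok) / len(rows)

def rate_by_gender(rows, skill=None):
    """Success rate per gender, optionally restricted to one skill level."""
    result = {}
    for gender in ("m", "f"):
        subset = [r for r in rows if r[0] == gender and (skill is None or r[1] == skill)]
        result[gender] = success_rate(subset)
    return result

overall = rate_by_gender(records)              # gender looks predictive
within_high = rate_by_gender(records, "high")  # no gap once skill is fixed
within_low = rate_by_gender(records, "low")    # no gap here either
print(overall, within_high, within_low)
```

Looked at overall, men in this sample succeed four times as often as women, so a correlation-hungry system would happily reward maleness. Compare like with like, at the same skill level, and the gap vanishes: gender was riding along with the real driver, not doing any driving itself.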
The good news about causative factors is that there are not
very many of them – a handful of mental and behavioural factors predict high performance
in the vast majority of jobs, for instance.
The bad news is that identifying those causal factors is not a quick or
easy process. In 2009, Google thought it
had found a way
to predict epidemics; by 2013 it turned out that its systems were doing nothing
of the kind. We may have more data
points from digital sources today – in an online month we can gather an offline
millennium’s worth of data – but some data is time-dependent: it cannot be speeded
up. A top performer only emerges as a top performer months, sometimes years,
post-hire. You can gather all the information
you like at the start, but only time will tell when it comes to results.
Luckily, when it comes to gender and work, time has
told. And the answer is a big, fat zero
causal relationship. Meta-analyses of
almost a century of research studies have been highly consistent in pointing to
the factors that drive superior performance – so much so that I wonder why
Google decided
to reinvent the wheel. All agree:
gender has absolutely no causal relationship with results.
Should we just scrap gender for AI?
That’s what Google seem to be doing.
A few months ago, Gmail started suggesting click’n’paste
text for emails. Personally I
thought the main user gripe would be the inanity of the suggested replies, but
it turned out that the big problem was gender. One of the researchers behind
SmartCompose was allegedly
typing an email referring to an investor (was s/he running a startup on the
side?), when Gmail chirpily suggested “Do you want to meet him?” – assuming,
wrongly this time, that the investor was male.
The Gmail team apparently tried all kinds of whizzo fixes until, like
Amazon, they gave up. But being Google,
their way of giving up was to redefine the English language. Now SmartCompose gives no gendered
suggestions whatsoever, which leads to some pretty awkward phraseology but does
at least cut the sexism off at the screen.
I wonder…for a generation growing up communicating largely
through app text, will their attitudes change if they don’t have “brain surgeon
= male” dinned into them? My teenage daughter
– who as a small child refused to watch any television programme that did not
have at least a simple majority of female characters (“Maman, is that a
girl-dragon or a boy-dragon?”) and insisted all bedtime stories were populated
entirely by female characters – seems less constrained by sexist stereotypes
than I was at her age. We’ve just
learned that screen
time isn’t necessarily bad for children – could it be that screens could
actually be a force for good?
Could robots be our new teachers, our new moral arbiters? Or do they lack the fundamental human touch?
I’ll be looking at that moral superiority issue from a
different standpoint – the robot as artist – in my next blog.
