Is AI male or female?

Let me tell you a story:

A boy and his father are in a car crash. The father is killed outright, the son needs emergency brain surgery. At the hospital, the senior brain surgeon takes one look and yells at the nearest nurse: “Fetch my second-in-command – I can’t operate on my own son!”

How is this possible?

In several decades of telling this riddle, I’ve been amazed by the creativity of the solutions proposed: adoption, mistaken identity, baby-boys-swapped-at-the-hospital, Witness Protection Program, resurrection, gay dads, prosopagnosia, you name it.

Only about one in ten people have come up with what should be the glaringly obvious answer: the brain surgeon is the boy’s mother.

Why don’t most people get it?

Despite the fact that most English words are on the surface gender-neutral (we don’t say “brain-surgery-man”), English speakers tend to make strong gender assumptions about nouns. We unconsciously expect important jobs, like brain surgeons, CEOs and Ministers of Defence, to be held by men; just as we expect support-service jobholders to be female – hands up everyone who assumed the yelled-at nurse in the riddle was a woman.

But we also associate other areas of language with gender. A character who yells at subordinates, who is described as senior, who gives orders to a second-in-command…all these words and phrases nudge the reader towards assuming maleness. Not convinced? Substitute “squealed”, “junior”, “pleaded”, “job-sharer” and tell me you don’t assume the person referred to is female.

The truth is, we all take on the assumptions built into our culture, and despite evidence that there are precious few genuine gender differences, one of the deepest assumptions in Western culture is that men and women are dissimilar in aspects that go far beyond the physical.

But why do those assumptions carry over to robots? Lines of code don’t have a gender!

Ah, but they do.

He-robots and she-robots

Robot-slaves – sorry, digital personal assistants – like Alexa or Siri have female names and default-female voices. Robot-equals, problem-solving AI like AlphaGo, may not always be given an official gender but, as with brain surgeons and generals, the default assumption is that they are male rather than female: robot-equals who talk, such as IBM Watson, are all given male voices.

We should not be surprised that our human-society assumptions spill over into the virtual world. Once code or mechanical parts start doing stuff people do, we treat them as in some way like us, and that means having a gender, as well as a name and even the suggestion of a personality.

But the almost-universal demarcation of power-robots as male and slave-robots as female does not reflect reality. In the physical world one in eight brain surgery residents are female – not an overwhelming proportion, but not zero either.

Why aren’t one in eight power-robots female?

A male conspiracy?

The first explanation that comes to mind is robot programmers’ gender.

Only 22% of people working on AI identify as female – a third of the female share of the workforce in other industries – and the pipeline is shrinking rather than expanding: in 1984, 37% of US graduates in computer science were female; in 2014, just 18%.

Not only are there few women, they don’t get the best AI jobs: while men are more likely to hold positions such as head of engineering, women are more often found in junior or support roles such as data analysts or researchers.

Shockingly, though perhaps unsurprisingly given the above, many robots simply do not recognise women. An analysis by MIT of the three most popular facial recognition AI systems found that all consistently identified men more accurately than women, with dark-skinned women being mislabelled anything from 21% to 34% more often than light-skinned men.

So are robots sexist and racist?

Perhaps, but that might not be men’s fault. There’s another relevant factor that is rarely discussed: maths.

Correlation and causation

Most artificial intelligence, from image recognition to game strategy to natural language processing, works by calculating the probability of a match between a specific example and a general criterion – how likely it is that a particular photo depicts a cat, say. This calculation is based on correlations: if almost every cat-labelled photo shows whiskers and a pink squishy nose, and other photos (of dogs, say) don’t, the presence of whiskers and a pink squishy nose correlates strongly with cat-ness. If you see those features, it’s pretty likely you’re looking at Kitty rather than Rover.
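For the code-curious, here is a minimal sketch of that whiskers-and-nose calculation in Python. Everything in it is an assumption made up for illustration: the eight "photos", the two features and the function name `probability_of_cat`. Real image recognition learns from millions of pixels rather than two yes/no flags, but the underlying move is the same: count how often a set of features turns up alongside a label.

```python
from collections import Counter

# Hypothetical toy data: (has_whiskers, has_pink_nose, label).
PHOTOS = [
    (True, True, "cat"),
    (True, True, "cat"),
    (True, True, "cat"),
    (True, False, "cat"),
    (True, False, "dog"),
    (False, False, "dog"),
    (False, False, "dog"),
    (False, True, "dog"),
]


def probability_of_cat(photos, whiskers, pink_nose):
    """Share of photos with exactly these features that are labelled 'cat'."""
    matching = [
        label
        for has_whiskers, has_pink_nose, label in photos
        if has_whiskers == whiskers and has_pink_nose == pink_nose
    ]
    if not matching:
        return None  # no photo with these features: no evidence either way
    return Counter(matching)["cat"] / len(matching)


print(probability_of_cat(PHOTOS, whiskers=True, pink_nose=True))    # 1.0
print(probability_of_cat(PHOTOS, whiskers=True, pink_nose=False))   # 0.5
print(probability_of_cat(PHOTOS, whiskers=False, pink_nose=False))  # 0.0
```

Note that the function never asks what a cat is. It only counts which features have kept company with the "cat" label in the photos it was shown.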

There are just two problems with using correlation as the basis for intelligent decisions. The first involves the training data you feed the system. If your training data shows largely marmalade cats, for instance, the trained system may fail to recognise a Siamese – it will over-correlate orange fur with cat-ness. Similarly, if it shows mostly face-on pictures of cats, it may pass on a puss in profile. Size of database honestly is not what makes the difference: Amazon thought it was being representative by training its sexist recruitment AI on a decade’s worth of CVs submitted to the company. Unfortunately these CVs were not gender-representative, so the system learned to associate maleness with success and screened out CVs that mentioned the word “women’s”. It was dumped because the team could not work out how to remove less obvious instances of bias, such as bonuses awarded to candidates whose CVs featured “masculine” words like “executed” rather than “collaborated” (or at least how to do it quickly and easily). Getting representative training data is laborious, time-consuming, expensive and, worst of all, non-obvious. Not a great business model in the world of move fast and break things.
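The marmalade-cat problem can be sketched in a few lines of Python too. Again, this is a toy under stated assumptions: the two training sets, the fur colours, the 0.5 cut-off and the function names `cat_rate_by_colour` and `looks_like_cat` are all invented for illustration, and no real system judges cats on fur colour alone. The point is only that the same counting rule gives opposite verdicts on a cream-coloured Siamese depending on what it was trained on.

```python
def cat_rate_by_colour(training):
    """Return {colour: share of training photos of that colour labelled 'cat'}."""
    totals, cats = {}, {}
    for colour, label in training:
        totals[colour] = totals.get(colour, 0) + 1
        cats[colour] = cats.get(colour, 0) + (label == "cat")
    return {colour: cats[colour] / totals[colour] for colour in totals}


def looks_like_cat(rates, colour, threshold=0.5):
    """Call it a cat if at least `threshold` of same-coloured training photos were cats."""
    return rates.get(colour, 0.0) >= threshold


# Hypothetical training sets of (fur_colour, label) pairs.
# SKEWED: nearly every cat photo is a marmalade cat.
SKEWED = (
    [("orange", "cat")] * 9
    + [("cream", "cat")] * 1
    + [("cream", "dog")] * 5
    + [("black", "dog")] * 5
)
# BALANCED: cream cats are as common as orange ones.
BALANCED = (
    [("orange", "cat")] * 5
    + [("cream", "cat")] * 5
    + [("cream", "dog")] * 2
    + [("black", "dog")] * 5
)

# A cream-coloured Siamese walks in.
print(looks_like_cat(cat_rate_by_colour(SKEWED), "cream"))    # False
print(looks_like_cat(cat_rate_by_colour(BALANCED), "cream"))  # True
```

Nothing about the rule changed between the two runs, only the photos it was fed.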

The second problem is more fundamental: it’s the difference between correlation and causation. Correlation mostly works fine if you’re solving group-level or repeat-data problems such as how many men’s compared to women’s loos you should provide at a tech conference; or whether the next programmer to walk up to the coffee stand is more likely to be a man or a woman (tech conferences bring out the moronic gamer in all of us). But what we need to know to solve specific problems is not which features most often accompany success, but which features cause or drive success – How can I predict the best hire? Which is the right surgical procedure to use on patient X? How safe is it to give Guilty Defendant Y a non-custodial sentence? When and where can police best intervene to stop a potential riot?

The good news about causative factors is that there are not very many of them – a handful of mental and behavioural factors predict high performance in the vast majority of jobs, for instance. The bad news is that identifying those causal factors is not a quick or easy process. In 2009, Google thought it had found a way to predict epidemics; by 2013 it turned out that its systems were doing nothing of the kind. We may have more data points from digital sources today – in an online month we can gather an offline millennium’s worth of data – but some data is time-dependent: it cannot be speeded up. A top performer only emerges as a top performer months, sometimes years, post-hire. You can gather all the information you like at the start, but only time will tell when it comes to results.

Luckily, when it comes to gender and work, time has told. And the answer is a big, fat zero causal relationship. Meta-analyses of almost a century of research studies have been highly consistent in pointing to the factors that drive superior performance – so much so that I wonder why Google decided to reinvent the wheel. All agree: gender has absolutely no causal relationship with results.

Should we just scrap gender for AI?

That’s what Google seem to be doing.

A few months ago, Gmail started suggesting click’n’paste text for emails. Personally I thought the main user gripe would be the inanity of the suggested replies, but it turned out that the big problem was gender. One of the researchers behind SmartCompose was allegedly typing an email referring to an investor (was s/he running a startup on the side?), when Gmail chirpily suggested “Do you want to meet him?” – assuming, wrongly this time, that the investor was male. The Gmail team apparently tried all kinds of whizzo fixes until, like Amazon, they gave up. But being Google, their way of giving up was to redefine the English language. Now SmartCompose gives no gendered suggestions whatsoever, which leads to some pretty awkward phraseology but does at least cut the sexism off at the screen.

I wonder…for a generation growing up communicating largely through app text, will their attitudes change if they don’t have “brain surgeon = male” dinned into them? My teenage daughter – who as a small child refused to watch any television programme that did not have at least a simple majority of female characters (“Maman, is that a girl-dragon or a boy-dragon?”) and insisted all bedtime stories were populated entirely by female characters – seems less constrained by sexist stereotypes than I was at her age. We’ve just learned that screen time isn’t necessarily bad for children – could it be that screens could actually be a force for good?

Could robots be our new teachers, our new moral arbiters? Or do they lack the fundamental human touch?

I’ll be looking at that moral superiority issue from a different standpoint – the robot as artist – in my next blog.

Robots and Me