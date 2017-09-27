

Last week, I visited the Wellcome Trust’s Genome Campus, a verdant patch of land in Cambridgeshire, where otters and bats inhabit the woods surrounding the offices of geneticists and computer scientists. It is one of the world’s largest genomics hubs, where scientists are probing the secrets of human DNA.

I met Sumit Jamuar, a former engineer who runs the start-up Global Gene Corp. The company, which has received funding from the Singapore government, wants to sequence anonymised genetic data from the South Asian population, starting with India, where Jamuar is originally from.

He threw some eye-opening stats at me. More than 80 per cent of genomic data, which underpins much of how genetic medicine works, comes from Caucasians. Of the remaining pie, about 14 per cent comes from Asian populations, while African and Hispanic populations together make up a measly 3.5 per cent.

Why does this matter? It means gene-based diagnostic tests and drugs targeted at specific genetic mutations will be less effective — maybe even dangerous — for certain ethnic populations, because of the innate genetic differences in the DNA codes of different races.

For instance, a recent study found that 650,000 African-Americans may have undiagnosed Type 2 diabetes because of a genetic quirk that fools a commonly used diabetic blood test. Other studies indicate that African-Americans have been consistently misdiagnosed with a certain type of heart disease, while Indians may be mistakenly diagnosed with and treated for epilepsy, all because their genomes haven’t been studied as deeply. “This data challenge isn’t just an inconvenience; it’s a matter of life and death,” Jamuar told me.

Researchers at the scientific journal Nature said findings from the journal’s own investigation into the diversity of these data sets “prompted warnings that a much broader range of populations should be investigated to avoid genomic medicine being of benefit merely to ‘a privileged few’”.

This insidious data prejudice made me curious about other unintended biases in the tech world. Several new consumer technologies — often conceived by, built by and tested overwhelmingly on Caucasian males — are flawed due to biases in their design.

In 2014, Danah Boyd, principal researcher at Microsoft Research, penned an article entitled “Is the Oculus Rift sexist?” She described how 3D virtual environments had made her and other female colleagues nauseous. A friend of hers then came across a footnote in an army research paper, which noted that women seemed to get sick at higher rates than men in virtual environments. In 2000, before the Rift was even invented, Boyd published the surprising results of a multiyear study into how male and female brains processed 3D visual stimuli differently. “In other words, men are more likely to use the cues that 3D virtual reality systems relied on,” she wrote.

Last year, roboticist Carol Reiley, co-founder of the US self-driving car start-up Drive.ai, described how she was unable to get a voice-activated surgical robot that she had built using Microsoft speech recognition software to respond to her voice. The system, she wrote in a TechCrunch blog, “had been built mainly by 20-30-year-old men . . . I had to lower my pitch in order for it to work. As a result, I was not able to present my own work . . . a male graduate always had to lead the demonstration.”

Instances of Google’s automated image-labelling system classing African-Americans as gorillas, and Microsoft and HP’s cameras being reportedly unable to track dark-skinned faces, also demonstrate this basic flaw; the systems just hadn’t been trained on enough examples of non-white faces. Imagine what this could mean if the camera system on a self-driving car failed to recognise a darker face on the road.

As data-driven algorithms increasingly underpin the design of new products in medicine, transport and infrastructure, biased training data are not just awkward oddities, but deadly flaws.

Madhumita Murgia is the FT’s European technology correspondent

Illustration by Christopher de Lorenzo