One of Google’s artificial intelligence projects, a natural language processing system called BERT, has been deployed in several areas, including the search engine itself. However, research into BERT has shown that the system appears to pick up biases from the data it is trained on. Chief among these are gender biases tied to professions and difficulty recognizing female pronouns, suggesting that English-language writing has historically skewed toward men.
Historical Data from Decades Ago
BERT’s training involved consuming a large amount of printed text, reaching as far back as researchers could supply. The system developed its view of the language from the way people wrote over those decades. As a result, BERT learned to process language very effectively. The caveat is that it also absorbed the bias built into that writing. AI researchers have long been aware that training on large volumes of data can introduce biases, and Google acknowledged the problem, saying it was taking the necessary measures to address it.
Complex Systems Are Hard to Predict
While language scholars are aware of the unspoken bias that exists in the language, it was unclear how BERT came to adopt the same bias on its own. The only explanation that seemed plausible was that the writing it learned from was to blame. Complex systems like BERT are unpredictable in how they process and output data. The Director of Science at Primer, a San Francisco startup that specializes in natural language processing, says that BERT is indeed a game-changer, but that researchers need to study it the way a biologist studies how a cell performs its functions. By working out how the AI interprets and processes what it reads, researchers stand a better chance of spotting the biases that the language passes on to these systems.
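To make the idea of probing for inherited bias more concrete, here is a minimal sketch in Python. It does not use BERT at all: it is a toy stand-in that counts which pronouns appear alongside a profession in a tiny, made-up corpus. The corpus, the pronoun sets, and the function names `pronoun_counts` and `gender_skew` are all assumptions introduced for illustration, not anything from Google's or Primer's work.

```python
from collections import Counter

# Hypothetical toy corpus, standing in for the decades of text a real
# language model would be trained on. None of this is real training data.
CORPUS = [
    "he is a doctor and he works long hours",
    "the engineer said he fixed the bridge",
    "she is a nurse and she works nights",
    "the nurse said she was tired",
    "he is an engineer",
    "she is a doctor",
]

# Assumed pronoun sets for this illustration.
MALE = {"he", "him", "his"}
FEMALE = {"she", "her", "hers"}


def pronoun_counts(corpus, profession):
    """Count male and female pronouns in sentences that mention `profession`."""
    counts = Counter()
    for sentence in corpus:
        tokens = sentence.lower().split()
        if profession in tokens:
            counts["male"] += sum(t in MALE for t in tokens)
            counts["female"] += sum(t in FEMALE for t in tokens)
    return counts


def gender_skew(corpus, profession):
    """Return a score in [-1, 1]: positive leans male, negative leans female.

    Returns 0.0 when the profession never appears with a gendered pronoun.
    """
    counts = pronoun_counts(corpus, profession)
    total = counts["male"] + counts["female"]
    if total == 0:
        return 0.0
    return (counts["male"] - counts["female"]) / total


if __name__ == "__main__":
    for job in ("doctor", "engineer", "nurse"):
        print(job, gender_skew(CORPUS, job))
```

A system that learns only from text like this has nothing to go on except these statistics, so any skew in the writing becomes a skew in what it predicts. Probing an actual model such as BERT would need the model itself and a far more careful methodology than this word count.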