Google’s algorithm for detecting hate speech is racially biased

Algorithms meant to spot hate speech online are far more likely to label tweets “offensive” if they were posted by people who identify as African-American.

The news: Researchers built two AI systems and tested them on a pair of data sets of more than 100,000 tweets that had been annotated by humans with labels like “offensive,” “none,” or “hate speech.” One of the algorithms incorrectly flagged 46% of inoffensive tweets by African-American authors as offensive. Tests on bigger data sets, including one composed of 5.4 million tweets, found that posts by African-American authors were 1.5 times more likely to be labeled as offensive. When the researchers then tested Google’s Perspective, an AI tool that the company lets anyone use to moderate online discussions, they found similar racial biases.
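
To make the two figures above concrete: the 46% is a false positive rate (the share of tweets humans labeled inoffensive that the system still flagged), and the "1.5 times" is a ratio between groups. The Python below is a minimal sketch of how such numbers could be computed. It is not the researchers' code, and the group names, record layout, and toy records are hypothetical, chosen only for illustration.

```python
from collections import defaultdict


def false_positive_rate_by_group(records):
    """Share of human-labeled inoffensive tweets flagged as offensive, per group.

    records: iterable of (group, human_label, predicted_label) tuples.
    The labels "none" and "offensive" mirror those quoted in the article;
    the tuple layout itself is an assumption made for this sketch.
    """
    flagged = defaultdict(int)
    total = defaultdict(int)
    for group, human_label, predicted_label in records:
        if human_label == "none":  # only tweets humans judged inoffensive
            total[group] += 1
            if predicted_label == "offensive":
                flagged[group] += 1
    return {group: flagged[group] / total[group] for group in total}


# Hypothetical toy records, not data from the study.
toy_records = [
    ("group_a", "none", "offensive"),
    ("group_a", "none", "none"),
    ("group_a", "offensive", "offensive"),  # ignored: not inoffensive
    ("group_b", "none", "none"),
    ("group_b", "none", "none"),
    ("group_b", "none", "offensive"),
    ("group_b", "none", "none"),
]

rates = false_positive_rate_by_group(toy_records)
# How many times more often group_a's inoffensive tweets are flagged.
ratio = rates["group_a"] / rates["group_b"]
print(rates, ratio)
```

In this toy data, half of group_a's inoffensive tweets are flagged against a quarter of group_b's, so the ratio is 2. The study's 1.5 figure may have been computed over all tweets rather than only inoffensive ones; the article does not say.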

A hard balance to strike: Mass shootings perpetrated by white supremacists in the US and New Zealand have led to growing calls from politicians for social-media platforms to do more to weed out hate speech. These studies underline just how complicated a task that is. Whether language is offensive can depend on who’s saying it, and who’s hearing it. For example, a black person using the “N word” is very different from a white person using it. But AI systems do not, and currently cannot, understand that nuance.