
The struggle to maintain linguistic variety in Artificial Intelligence: Watch your words

Sundar Pichai, Google’s chief executive, looked the part of a Silicon Valley tech visionary as he addressed the AI Action Summit in Paris on February 10. Sporting his distinctive geeky glasses and a TED Talk-style headset, Pichai stood on the Grand Palais podium and heralded a new era of technological innovation.

“Using AI techniques, we added more than 110 new languages to Google Translate last year, reaching half a billion speakers worldwide,” announced Pichai, reading from his notes. “This brings our total to 249 languages, including 60 African languages – and more will follow.”

His monotone delivery failed to stir much reaction from the summit attendees, a crowd of international leaders, researchers, NGO representatives, and tech executives.

Joseph Nkalwo Ngoula
© Permanent Mission of Canada

For those advocating for linguistic diversity in AI, Pichai’s statement was a quiet triumph, the result of two years of intense negotiations in the realm of digital diplomacy.

“This demonstrates that our message is resonating with tech companies,” said Joseph Nkalwo Ngoula, digital policy advisor to the International Organisation of La Francophonie at the United Nations in New York.

A Linguistic Divide

Pichai’s remarks stood in stark contrast to the early days of generative AI, which often struggled with non-English languages.

The 2022 launch of OpenAI’s ChatGPT quickly revealed the tool’s limitations for non-English speakers. A query in English would return a detailed response, while the same question in French might result in a brief, apologetic statement.

The root of this divide lies in the way AI tools operate, using large language models trained on vast amounts of English data from the internet.

Although only about 20% of the world’s population speaks English at home, the internet remains predominantly Anglophone, and nearly half of the training data for major AI models is in English. As a result, the linguistic gap in AI-generated content persists today.

Shifting Priorities

“There’s more and more up-to-date information available in English,” explained Mr. Nkalwo Ngoula. The default for AI development, training, and deployment is English, leaving other languages playing catch-up.

Furthermore, AI can ‘hallucinate’, generating incorrect information, when its training data in a given language is inadequate. This can manifest as a model inventing facts, or even careers, for historical figures.

A Black Box Issue

“It’s similar to a black box absorbing data,” said Mr. Nkalwo Ngoula. “The responses may be coherent and structured, but they can be factually wrong.”

Language models also tend to overlook linguistic variation, such as regional dialects and expressions that mix languages, which can confuse AI systems.

La Francophonie’s Shadow Campaign

La Francophonie, which represents 93 states and governments promoting the French language, has made this digital divide a focus of its strategy. It lobbied for linguistic diversity as a central principle in the UN Global Digital Compact, particularly through the Francophone Ambassadors’ Group at the UN.

Despite some progress, challenges remain. Francophone content is often overshadowed by algorithms that favor popularity, and AI training data remains dominated by English. Mr. Nkalwo Ngoula argues that linguistic diversity should remain central to La Francophonie’s advocacy efforts.

Given the rapid development of AI, these concerns need urgent attention to ensure that technology serves all of humanity equitably.

Source: https://news.un.org/feed/view/en/story/2025/03/1161406
