Pumza Fihlani
BBC News in Johannesburg
Although Africa is home to a huge proportion of the world’s languages – well over a quarter according to some estimates – many are missing when it comes to the development of AI.
This is due both to a lack of investment and to a shortage of readily available data.
Most AI tools in use today, such as ChatGPT, are trained on English as well as other European languages and Chinese.
These have vast quantities of online text to draw from.
But because many African languages are mostly spoken rather than written down, there is too little text to train AI that would be useful to speakers of those languages.
For millions across the continent this means being left out.
Researchers who have been trying to address this issue have recently released what is thought to be the largest dataset of African languages to date.
“We think in our own languages, dream in them and interpret the world through them. If technology doesn’t reflect that, a whole group risks being left behind,” the University of Pretoria’s Prof Vukosi Marivathe, who worked on the project, tells the BBC.
“We’re going through this AI revolution, imagining all that can be done with it. Now imagine there’s a part of the population that just doesn’t have that access because all the information is in English.”
The Africa Next Voices project brought together linguists and computer scientists to create AI-ready datasets in 18 African languages.
That may be just a small portion of the more than 2,000 languages estimated to be spoken across the continent, but those involved in the project say they hope to expand it in the future.
Over two years, the team recorded 9,000 hours of speech across Kenya, Nigeria and South Africa, capturing everyday scenarios in farming, health and education.
The languages recorded included Kikuyu and Dholuo in Kenya, Hausa and Yoruba in Nigeria and isiZulu and Tshivenda in South Africa, some of which are spoken by millions of people.
“You need some basis to start off with and that’s what Africa Next Voices is and then people will build on top of that and add their own innovations,” says Prof Marivathe, who led the research in South Africa.
His Kenyan counterpart, computational linguist Lilian Wanzare, says recording the speech on the continent meant creating data aimed at reflecting how people really live and speak.
“We gathered voices from different regions, ages and backgrounds so it’s as inclusive as possible. Big tech can’t always see those nuances,” she says.
The project was made possible by a $2.2m (£1.6m) Gates Foundation grant.
The data will be open access, allowing developers to build tools that translate, transcribe and respond in African languages.
There are already small examples of how AI that works in indigenous languages can help solve real-life challenges in Africa, according to Prof Marivathe.
Farmer Kelebogile Mosime manages a 21-hectare site in Rustenburg, the heart of South Africa’s platinum region.
The 45-year-old works with a small team to cultivate rows of vegetables – including beans, spinach, cauliflower and tomatoes.
She only started farming three years ago, beginning with a cabbage crop. To help solve the problems she runs into, she uses an app called AI-Farmer, which recognises several South African languages, including Sesotho, isiZulu and Afrikaans.
“As someone still learning to farm, you face a lot of challenges,” Ms Mosime says.
“Daily, I see the benefits of being able to use my home language, Setswana, on the app. When I run into problems on the farm, I ask anything and get a useful answer.
“For somebody in the rural areas like me, who is not exposed to technology, it’s useful. I can ask about different options for insect control, and it’s also been useful with diagnosing sick plants,” she says, beaming beneath a wide-brimmed sunhat.
Lelapa AI is a young South African company building AI tools in African languages for banks and telecoms firms.
For its CEO, Pelonomi Moiloa, the AI tools currently available are very restrictive.
“English is the language of opportunity. For many South Africans who don’t speak it, it’s not just inconvenient – it can mean missing out on essential services like healthcare, banking or even government support,” she tells the BBC.
“Language can be a huge barrier. We’re saying it shouldn’t be.”
But this is about more than business and convenience.
For Prof Marivathe, there is also a danger that without African-language initiatives, something else could be lost.
“Language is access to imagination,” he says.
“It’s not just words – it’s history, culture, knowledge. If indigenous languages aren’t included, we lose more than data; we lose ways of seeing and understanding the world.”
Source: https://www.bbc.com/news/articles/crkzgkkpx0lo?at_medium=RSS&at_campaign=rss