Anchoring in the African AI Ecosystem
The emergence of artificial intelligence (AI) in Africa is a matter not only of technological advancement but also of community involvement and collaboration. The WAXAL project is a prime example: it strengthens the African AI ecosystem through local partnerships for collecting and sharing data. By placing African voices and languages at the heart of the initiative, the project sets a precedent for developing AI in an inclusive and effective way.
Collaborative Framework
At the core of the WAXAL project lies a commitment to collaborating with local academic and community organizations. This participatory approach ensures that the data collection effort is led by African stakeholders and tailored to the needs and contexts of the communities it aims to serve. Each partner institution brought distinct contributions to this effort.
Among the most notable partners are Makerere University, which collected Automatic Speech Recognition (ASR) and Text-to-Speech (TTS) data across nine languages, and the University of Ghana, which concentrated on eight languages. Both followed rigorous data collection practices, with local expertise driving the quality of the outputs.
Ownership and Open Access
One of the most distinctive aspects of the WAXAL project’s framework is its commitment to data ownership and open access. Partners retain ownership of the data they collect, while the project is committed to making these datasets openly available. The principle goes beyond accessibility: it shifts control toward local researchers and organizations, empowering them to use the data in their own applications.
This open-access philosophy has already enabled derivative research. For instance, the project facilitated the creation of a cookbook for community-driven collection of impaired speech. The cookbook led to the first open-source dataset of Akan speakers with conditions such as cerebral palsy and stammering. That work also found that in-person, image-prompted elicitation was more effective than traditional text-based prompts for engaging these speakers.
Building Robust Speech Technologies
The WAXAL project has also laid the groundwork for robust ASR and TTS systems attuned to the linguistic diversity of West Africa. One study introduced a 5,000-hour speech corpus for five Ghanaian languages: Akan, Ewe, Dagbani, Dagaare, and Ikposo. Using a controlled crowdsourcing approach, the researchers captured natural, spontaneous speech whose intonation reflects the richness of these languages.
The collaboration has also driven benchmarking work: one study evaluated four state-of-the-art models, including Whisper and XLS-R, across 13 African languages. The analysis shows how performance improves with more training data and highlights the influence of linguistic complexity and domain alignment. Such findings can inform future projects and help ensure that AI systems are tailored to the languages and cultures they serve.
The Need for Comprehensive Evaluation Metrics
Beyond practical applications, the WAXAL project highlights the importance of comprehensive evaluation metrics for speech technology. A recent systematic literature review identified 74 datasets covering 111 African languages, mapping the current state of speech technology on the continent. The review underscores the pressing need for multi-domain conversational corpora and for linguistically informed metrics such as Character Error Rate (CER). Word-level metrics count an entire word as wrong when a single affix or tone mark is wrong, so character-level metrics give a finer-grained, more precise picture of AI performance in morphologically rich and tonal languages.
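CER is the character-level counterpart of Word Error Rate (WER): the minimum number of character substitutions (S), deletions (D) and insertions (I) needed to turn a system's transcript into the reference, divided by the number of characters N in the reference, i.e. CER = (S + D + I) / N. The following is a minimal Python sketch for illustration only, not the evaluation code used by WAXAL or by the review. The function names are hypothetical, and two choices are assumptions that toolkits handle differently: spaces are counted as characters, and both strings are Unicode NFC-normalized so that a precomposed tone-marked letter and its base-letter-plus-combining-mark spelling compare as equal.

```python
import unicodedata
from typing import Sequence


def edit_distance(ref: Sequence, hyp: Sequence) -> int:
    """Levenshtein distance: minimum substitutions + deletions + insertions
    needed to turn `ref` into `hyp` (works on strings or lists of words)."""
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution (or match at no cost)
            )
        prev = curr
    return prev[-1]


def cer(reference: str, hypothesis: str) -> float:
    """Character Error Rate = (S + D + I) / N over characters (spaces included).
    Assumption: NFC normalization so combining diacritics such as tone marks
    compare equal to their precomposed forms."""
    ref = unicodedata.normalize("NFC", reference)
    hyp = unicodedata.normalize("NFC", hypothesis)
    if not ref:
        raise ValueError("reference must be non-empty")
    return edit_distance(ref, hyp) / len(ref)


def wer(reference: str, hypothesis: str) -> float:
    """Word Error Rate: the same computation over whitespace-separated words."""
    ref = unicodedata.normalize("NFC", reference).split()
    hyp = unicodedata.normalize("NFC", hypothesis).split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref, hyp) / len(ref)


if __name__ == "__main__":
    ref, hyp = "the cat sat", "the cats sat"
    print(f"WER = {wer(ref, hyp):.3f}")  # prints "WER = 0.333" (1 wrong word of 3)
    print(f"CER = {cer(ref, hyp):.3f}")  # prints "CER = 0.091" (1 extra char of 11)
```

With the toy English pair above, a one-letter suffix error counts as a fully wrong word under WER but as a single character edit under CER. That difference in granularity is what makes character-level scoring informative when words carry many morphemes or tone diacritics.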
Through this combination of collaboration, open access, and rigorous research, the WAXAL project is not an isolated initiative but a model for the future of AI in Africa. By centering local languages and communities, it paves the way for a more equitable and inclusive AI landscape that honors the continent's linguistic diversity.

