Updates

Progress notes and research updates as ÒreAyò develops. Updated periodically with honest reporting on what's working, what's challenging, and what comes next.

Latest update: December 2024

Stage 1 Progress: Building the Foundations

Summary

Stage 1 focused on building and validating the infrastructure required to train speech models for Nigerian languages. No production models were trained during this stage.

What Stage 1 covered

• Data processing and preparation pipelines
• Training and experimentation infrastructure
• Configuration systems for multilingual research

Stage 1 intentionally excluded model training, speech recognition, and user-facing features.

What now exists

By the end of Stage 1, ÒreAyò has a complete system capable of running speech model training end-to-end in a controlled research environment.

Key learnings

• Nigerian speech data is significantly more limited than expected
• Multilingual balance introduces non-trivial trade-offs
• Early models will require iterative refinement rather than one-off training

These insights will directly inform Stage 2.

Known limitations

Stage 1 does not yet enable:

• Speech recognition or transcription
• Real-world evaluation
• User-facing functionality

This stage was strictly foundational.

What Stage 1 enables

Stage 1 enables systematic experimentation with self-supervised learning for Nigerian languages.

What comes next

Stage 2 will focus on training and evaluating the first production-scale self-supervised speech encoder, while validating whether current data volumes are sufficient.

Stage 1 was about understanding the problem properly. Stage 2 will test whether that understanding holds under real training conditions.

Want to follow along?

Join the waitlist to receive updates as ÒreAyò progresses through research and development.
