- Catalyze the industry to create better low-resource language AI models by creating a popular, highly referenced destination for researchers (and the press) to compare models.
- Provide a place where organizations can determine which AI speech recognition and translation models work best for their particular use case, especially with low-resource languages, starting with Indian languages.
- Provide a long-term home for language data (a HuggingFace-style version).
- Producers: primarily researchers, interested in making better models.
- Consumers: interested in using better models.
- NGOs like ARTPARK, Wadhwani, etc.
- Grassroots NGOs like Avanti, DigitalGreen, Pratham (education), etc.
- Private Orgs like PayTM, Setu, etc.
- Indic language content producers - InShorts, DailyHunt, KukuFM, etc.
- Usage by organisations: get 20-30 organisations using the leaderboard, collect and publish their translations, and cover a diverse set of 8-10 languages.
- Citations (for the white paper).
- Engagement with researchers.
To launch a basic version of the leaderboard with essential functionalities.
- How seriously are you evaluating AI for solutions?
- What is important in the models that you have chosen?
- Do you believe these solutions would require low-resource language data?
- Are you using any AI tools to understand your users today? Choose from OpenAI,
- What common questions would your customers ask in their local language? Just record them and send them over WhatsApp.
- Draft a short email for the asks. Design a questionnaire on how orgs would use the leaderboard; this would also help with demand testing. (e.g. DigitalGreen
- Write a white paper.
- Leaderboard Interface: A simple user interface showcasing basic comparison results.
- Indian Languages Map: Integration of a map displaying the diversity of Indian languages, highlighting the need for low-resource language AI models.
- Initial Model Set: Incorporation of basic AI models for initial testing, focusing on Hindi, Kannada, and Telugu.
- Data Submission Interface (?): Allow organizations to submit their audio samples for testing (Gooey.ai will be mentioned as a provider, but since it is open-source, we can expect other companies to add a similar CTA to this leaderboard).
- Basic Comparison Algorithm: A simple algorithm to compare model performance using submitted data.
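
As a rough illustration of what the basic comparison algorithm could look like for speech recognition, the sketch below ranks models by word error rate (WER) on a shared set of reference transcripts. The choice of WER is an assumption, since this plan does not name a metric, and the function names (`word_edit_distance`, `corpus_wer`, `rank_models`) and model names are hypothetical.

```python
def word_edit_distance(reference: str, hypothesis: str) -> int:
    """Levenshtein distance between two transcripts, counted in whole words."""
    ref_words = reference.split()
    hyp_words = hypothesis.split()
    # prev[j] holds the distance between the first i-1 reference words
    # and the first j hypothesis words.
    prev = list(range(len(hyp_words) + 1))
    for i, ref_word in enumerate(ref_words, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp_words, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1]


def corpus_wer(references: list[str], hypotheses: list[str]) -> float:
    """Total word edits divided by total reference words (lower is better)."""
    if len(references) != len(hypotheses):
        raise ValueError("each reference needs exactly one hypothesis")
    total_words = sum(len(ref.split()) for ref in references)
    if total_words == 0:
        raise ValueError("references contain no words")
    total_edits = sum(
        word_edit_distance(ref, hyp) for ref, hyp in zip(references, hypotheses)
    )
    return total_edits / total_words


def rank_models(
    references: list[str], outputs_by_model: dict[str, list[str]]
) -> list[tuple[str, float]]:
    """Return (model name, WER) pairs sorted from best to worst."""
    scores = {
        name: corpus_wer(references, hyps) for name, hyps in outputs_by_model.items()
    }
    return sorted(scores.items(), key=lambda item: item[1])


# Hypothetical transcripts for two audio samples and two models.
references = ["a b c d", "e f"]
outputs_by_model = {
    "model_b": ["a x c", "e f g"],
    "model_a": ["a b c d", "e f"],
}
ranking = rank_models(references, outputs_by_model)
print(ranking)
```

Scoring at the corpus level (total edits over total reference words) keeps short samples from dominating the ranking. A translation track would need a different metric, and real transcripts would likely need text normalization before scoring.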
Timeline: Cycle 1 - Cycle 2 - March 1st 2023 (4 cycles from January).
- Is the timeline feasible?
- What would this version cost? Does this need sponsorship?
To enhance the leaderboard with more features and broader language support.
- Expanded Language Support: Addition of more low-resource Indian languages.
- Enhanced Comparison Algorithm: Improved algorithm for more accurate model comparisons.
- Partners Integration: Onboard new partners and integrate their models into the leaderboard.
To position the leaderboard as a globally recognized platform for low-resource language AI model comparison.
- Global Language Inclusion: Expand the leaderboard to include low-resource languages from around the world.
- Community Engagement: Foster a community for discussions, feedback, and collaborative improvements.
- Marketing and Outreach: Intensify marketing efforts for global reach and recognition.
Timeline: October 2023.