PeoplePlusAI/Low-Resource-Language-Leaderboard

Goals:

  1. Catalyze the industry to create better low-resource language AI models by creating a popular, highly referenced destination for researchers (and the press) to compare models.
  2. Provide a place where organizations can determine which AI speech recognition and translation models work best for their particular use case, especially with low-resource languages, starting with Indian languages.
  3. Provide a long-term environment for language data (a HuggingFace version).

Personas

  1. Producers. Interested in making better models. Researchers primarily.
  2. Consumers. Interested in using better models.
  • NGOs like ARTPARK, Wadhwani, etc.
  • Grassroots NGOs like Avanti, DigitalGreen, Pratham (education), etc.
  • Private Orgs like PayTM, Setu, etc.
  • Indic language content producers - InShorts, DailyHunt, KukuFM, etc.

How do we measure our success?

  1. Usage by organisations: get 20 to 30 organisations using the leaderboard, collect and publish their translations, and cover a diverse set of 8 to 10 languages.
  2. Citations (for the white paper).
  3. Engagement with researchers.

Version 0 (v0): Initial Launch PoC

Objective:

To launch a basic version of the leaderboard with essential functionalities.

Hypotheses - what we want to get from the organisations.

  1. How seriously are you evaluating AI for solutions?
  2. What is important in the models that you have chosen?
  3. Do you believe these solutions would require low-resource language data?
  4. Are you using any AI tools to understand your users today? Choose from OpenAI,
  5. What are the common questions your customers ask in their local language? Record them and send them over WhatsApp.

Steps:

  1. Write a short email covering the asks. Design a questionnaire on how organisations would use the leaderboard; this would also help with demand testing (e.g. DigitalGreen).
  2. Write a white paper.

Features:

  1. Leaderboard Interface: A simple user interface showcasing basic comparison results.
  2. Indian Languages Map: Integration of a map displaying the diversity of Indian languages, highlighting the need for low-resource language AI models.
  3. Initial Model Set: Incorporation of basic AI models for initial testing, focusing on Hindi, Kannada, and Telugu.
  4. Data Submission Interface (?): Allow organizations to submit their audio samples for testing. Gooey.ai will be mentioned as a provider, but since it is open source, we can expect other companies to add a similar call to action (CTA) to this leaderboard.
  5. Basic Comparison Algorithm: A simple algorithm to compare model performance using submitted data.
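
The README does not say which metric the basic comparison algorithm would use. As a minimal sketch, assuming word error rate (WER) as the metric for speech recognition output, the comparison could look roughly like the following. The function names `word_error_rate` and `rank_models` and the sample format are hypothetical and only serve as illustration.

```python
from collections import defaultdict


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        # Assumption: an empty reference scores 0 only if the output is empty too.
        return 0.0 if not hyp else 1.0

    # prev[j] holds the edit distance between the processed prefix of ref and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution or match
            )
        prev = curr
    return prev[-1] / len(ref)


def rank_models(samples):
    """Rank models by mean per-sample WER, lowest (best) first.

    samples: iterable of (model_name, reference_transcript, model_transcript).
    """
    scores = defaultdict(list)
    for model, reference, hypothesis in samples:
        scores[model].append(word_error_rate(reference, hypothesis))
    averages = {model: sum(vals) / len(vals) for model, vals in scores.items()}
    return sorted(averages.items(), key=lambda item: item[1])
```

This sketch averages WER per sample, which weights short and long utterances equally; a corpus-level WER (total errors over total reference words) is an equally reasonable choice. Scripts without whitespace word boundaries, or translation output, would need a different metric.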

Timeline:

Cycle 1 to Cycle 2: March 1st, 2023 (4 cycles from January).

Open Questions

  1. Is the timeline feasible?
  2. What would this version cost? Does this need sponsorship?

Version 1 (v1): Expanded Capabilities with Sponsorship

Objective:

To enhance the leaderboard with more features and broader language support.

Features:

  1. Expanded Language Support: Addition of more low-resource Indian languages.
  2. Enhanced Comparison Algorithm: Improved algorithm for more accurate model comparisons.
  3. Partners Integration: Onboard new partners and integrate their models into the leaderboard.

Version 2 (v2): Advanced Features and Global Expansion

Objective:

To position the leaderboard as a globally recognized platform for low-resource language AI model comparison.

Features:

  1. Global Language Inclusion: Expand the leaderboard to include low-resource languages from around the world.
  2. Community Engagement: Foster a community for discussions, feedback, and collaborative improvements.
  3. Marketing and Outreach: Intensify marketing efforts for global reach and recognition.

Timeline: October 2023.
