This repository has been archived by the owner on Mar 8, 2023. It is now read-only.

New package Clips-Mitads and Importer for Clips dataset #50

Open
2 of 5 tasks
Mte90 opened this issue Mar 13, 2020 · 1 comment
Labels
dataset help wanted Extra attention is needed

Comments

@Mte90
Member

Mte90 commented Mar 13, 2020

Ref: http://www.clips.unina.it/it/index.jsp

Tasks:

For the first 3 steps

We need to parse the txt file of every recording to generate a single CSV, package this CSV together with all the wav files, and remove the rest of the files.

The new package name is Clips-Mitads, just as a reference.

CSV to create

wav_filename,wav_filesize,transcript
common_voice_it_19574474.wav,175148,ben degna di ammirazione
common_voice_it_19574387.wav,291884,noi possiamo benissimo non ritrovarci in quello che facciamo
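
A minimal sketch of how a CSV with this layout could be written, including the `wav_filesize` column. It is not the script from the gist: the function name `write_clips_csv` and the `(wav_path, transcript)` input pairs are hypothetical, and parsing the Clips txt files into transcripts is deliberately left out because their format is not described here. Collapsing whitespace in the transcript is also an assumption.

```python
import csv
import os


def write_clips_csv(pairs, csv_path):
    """Write one CSV row per (wav_path, transcript) pair.

    `pairs` is a hypothetical input: an iterable of (path to a wav file,
    transcript text) tuples produced by whatever parses the txt files.
    """
    with open(csv_path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(["wav_filename", "wav_filesize", "transcript"])
        for wav_path, transcript in pairs:
            writer.writerow([
                os.path.basename(wav_path),
                os.path.getsize(wav_path),    # file size in bytes
                " ".join(transcript.split()),  # assumption: collapse whitespace
            ])
```

The file size is read from disk at write time, so the wav files need to be in their final location before the CSV is generated.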

Unfinished scripts: https://gist.github.com/Mte90/116e5d8a17973b7bd9bd9050662736dd

  • The CSV is missing the wav filesize
  • The extraction of the rar archives needs to avoid overwriting files and should take the files from the "etichettate" folder if it exists
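
For the extraction point, here is a rough sketch of one way to gather files without overwriting anything. It assumes the rar archives have already been unpacked into a directory tree by some other tool, and that "etichettate" is a folder name inside that tree. The names `unique_destination` and `collect_files` and the `_1`, `_2` suffix policy are hypothetical, not taken from the gist.

```python
import os
import shutil


def unique_destination(dest_dir, filename):
    """Return a path in dest_dir that does not collide with an existing file."""
    stem, ext = os.path.splitext(filename)
    candidate = os.path.join(dest_dir, filename)
    counter = 1
    while os.path.exists(candidate):
        # Assumption: a numeric suffix is an acceptable way to resolve clashes.
        candidate = os.path.join(dest_dir, f"{stem}_{counter}{ext}")
        counter += 1
    return candidate


def collect_files(extracted_dir, dest_dir):
    """Copy files into dest_dir, using "etichettate" folders when any exist.

    dest_dir is assumed to live outside extracted_dir.
    """
    os.makedirs(dest_dir, exist_ok=True)
    labelled = [
        os.path.join(root, name)
        for root, dirs, _ in os.walk(extracted_dir)
        for name in dirs
        if name == "etichettate"
    ]
    # Fall back to the whole tree when no "etichettate" folder is present.
    sources = labelled or [extracted_dir]
    copied = []
    for source in sources:
        for root, _, files in os.walk(source):
            for name in sorted(files):
                target = unique_destination(dest_dir, name)
                shutil.copy2(os.path.join(root, name), target)
                copied.append(target)
    return copied
```

Note that renaming on collision can separate a wav file from its txt file if only one of the two clashes, so a real importer would likely need to rename each pair together.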
@Mte90 Mte90 added the dataset label Mar 13, 2020
@Mte90 Mte90 changed the title from "Importer for CLips dataset audio+text" to "Importer for Clips dataset audio+text" Mar 14, 2020
@Mte90 Mte90 added the help wanted Extra attention is needed label Mar 16, 2020
@Mte90 Mte90 changed the title from "Importer for Clips dataset audio+text" to "New package Clips-Mitads and Importer for Clips dataset" Mar 16, 2020
@Mte90 Mte90 closed this as completed Nov 8, 2020
@nefastosaturo
Collaborator

Reopening it for further analysis.

@nefastosaturo nefastosaturo reopened this Dec 14, 2020