Data quality issues #4
@huangfw Besides, we have performed quality grading on all automatically annotated data in DocGenome, assigning each sample to one of three quality levels: tier-1, tier-2, and tier-3, as shown in Figure 3(b) of the DocGenome paper. You can select the data with a quality level of tier-1.
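As a rough illustration of the tier-1 selection suggested above, here is a minimal sketch. It assumes each sample is available as a dict carrying a quality label under a hypothetical `quality_tier` key; both the key name and the record layout are assumptions for illustration, not the actual DocGenome schema, so adapt them to however the quality levels are published with the dataset.

```python
from typing import Any, Dict, Iterable, List


def select_by_tier(
    records: Iterable[Dict[str, Any]],
    tier: str = "tier-1",
    key: str = "quality_tier",  # hypothetical field name, not the real schema
) -> List[Dict[str, Any]]:
    """Return only the records whose quality label equals `tier`.

    Records without the label are dropped rather than guessed at.
    """
    return [record for record in records if record.get(key) == tier]


# Placeholder records for illustration only; these are not real dataset entries.
example_records = [
    {"sample_id": "example-0001", "quality_tier": "tier-1"},
    {"sample_id": "example-0002", "quality_tier": "tier-3"},
    {"sample_id": "example-0003"},  # no label available
    {"sample_id": "example-0004", "quality_tier": "tier-1"},
]

tier1_records = select_by_tier(example_records)
tier1_ids = [record["sample_id"] for record in tier1_records]
```

Dropping unlabeled samples is the conservative choice when the goal is to avoid annotation problems such as missing boxes; pass a different `tier` to inspect the lower quality levels instead.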
@huangfw Hello, we have added information about the different quality levels of the training set for reference. Later on, we will also describe the various components of the training set in more detail.
Hello, thank you for your impressive work! While visualizing the data, I found that some documents have problems such as missing detection boxes and incomplete formula detection boxes. Could you please help confirm whether there are data quality problems? Thank you!
I downloaded the data from Hugging Face. The randomly selected files are:
astro-ph.CO/1804.05921
astro-ph.CO/1005.1278