Add info attribute to datasets #239

Open
mariosasko opened this issue Dec 13, 2020 · 1 comment
Labels: dataset, feature

mariosasko (Collaborator) commented Dec 13, 2020

As discussed on Slack, it's a good idea to add some metadata to our datasets. This idea is partially inspired by Hugging Face datasets.

To do so, we will define a new DatasetInfo dataclass like this:

from dataclasses import dataclass
from typing import Optional

@dataclass
class DatasetInfo:
    """..."""
    citation: str
    description: str
    homepage: Optional[str] = None

Each dataset will have a new class-level info (or metadata) attribute to store this data. In DatasetABC, this attribute will be set to None.
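A minimal sketch of how the class-level attribute could look. The `DatasetABC` here is a simplified stand-in, not the actual Podium base class, and `ExampleDataset` with its field values is hypothetical:

```python
from abc import ABC
from dataclasses import dataclass
from typing import ClassVar, Optional


@dataclass
class DatasetInfo:
    citation: str
    description: str
    homepage: Optional[str] = None


class DatasetABC(ABC):
    # Simplified stand-in for the real base class: no metadata by default.
    info: ClassVar[Optional[DatasetInfo]] = None


class ExampleDataset(DatasetABC):
    # Hypothetical dataset; the values below are placeholders.
    info = DatasetInfo(
        citation="(placeholder citation)",
        description="A hypothetical dataset used only for illustration.",
    )


# The metadata is readable from the class itself, without loading any data.
print(DatasetABC.info)
print(ExampleDataset.info.description)
```

Keeping `info` at class level means the metadata can be inspected before a dataset is downloaded or instantiated.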

Feel free to comment (e.g. if you think we should include some other metadata).
cc @mttk @ivansmokovic @FilipBolt

mariosasko added the feature and dataset labels Dec 13, 2020
FilipBolt (Collaborator) commented Dec 14, 2020

Some other metadata to potentially add: size in MB and number of rows (for row-like datasets). Citation could be optional, since not all datasets have a published paper.
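For illustration, the suggestions above might translate into something like the following. The field names `size_in_mb` and `num_rows` are assumptions made for this sketch and are not settled in the thread:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DatasetInfo:
    description: str
    # Optional, since not every dataset has a published paper.
    citation: Optional[str] = None
    homepage: Optional[str] = None
    # Hypothetical field names for the proposed extra metadata.
    size_in_mb: Optional[float] = None
    num_rows: Optional[int] = None  # only meaningful for row-like datasets


# A dataset without a paper can simply omit the citation.
info = DatasetInfo(description="A hypothetical dataset without a paper.", num_rows=1000)
print(info)
```

Note that making `citation` optional requires moving it after `description`, because dataclass fields with defaults cannot precede fields without them.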

FilipBolt added this to the 1.1.0 milestone Dec 17, 2020
FilipBolt added this to Pending work in Podium - external release Dec 18, 2020
mttk moved this from Pending work to Deferred in Podium - external release Jan 15, 2021
mttk moved this from Deferred (ext. release) to Pending work in Podium - external release Jan 21, 2021