Subject values: evaluate relevancy of the existing list or update #249

Open
zzacharo opened this issue Nov 6, 2024 · 11 comments

@zzacharo
Contributor

zzacharo commented Nov 6, 2024

@zzacharo zzacharo converted this from a draft issue Nov 6, 2024
@zzacharo zzacharo added this to the SSPN milestone Nov 6, 2024
@PaulinaBaranowska
Collaborator

I will also look into this

@PaulinaBaranowska
Collaborator

  • Add missing subjects to CDS KB (Quantum Technology & Education and Outreach)
  • Find records with subjects in CDS that have values outside of this list (for CDS to provide to SIS)
  • Clean the values (SIS)

@PaulinaBaranowska
Collaborator

Questions:

  • Other => XX (should it be the other way round? We do not need it as we already have Other Subjects?)
  • Particle Physics-Experiment is twice in CDS KB (once as AB and once as e), e should be deleted.

@PaulinaBaranowska PaulinaBaranowska closed this as completed by moving to For review in CDS-RDM - Library tasks Nov 7, 2024
@zzacharo zzacharo reopened this Nov 7, 2024
@ntarocco
Contributor

ntarocco commented Nov 7, 2024

> * Other => XX (should it be the other way round? We do not need it as we already have `Other Subjects`?)

@PaulinaBaranowska in the new CDS, it no longer makes sense to have an Other field, as you can insert free text. Is it OK to drop the Other value in the new CDS (but keep it in the current CDS)?

> * Particle Physics-Experiment is twice in CDS KB (once as `AB` and once as `e`), e should be deleted.

Thanks, we will fix it. We can keep the AB, drop the e and check if we need to bulk-update records. What do you think?

@michamos
Collaborator

michamos commented Nov 7, 2024

@ntarocco do you mean it won't be a controlled vocabulary? or it will be but there is also an escape hatch?

@zzacharo
Contributor Author

zzacharo commented Nov 7, 2024

> @ntarocco do you mean it won't be a controlled vocabulary? or it will be but there is also an escape hatch?

It means that, in the new system, if a user doesn't find the value in the controlled vocabulary, they can always add it as free text.

@michamos
Collaborator

I don't think we want to allow free text subjects here, that defeats the purpose. Can't they use keywords for that? or are subjects and keywords the same thing? It might be useful if you give some more info on how this would look in the schema.

@zzacharo
Contributor Author

In the new system, users will autocomplete from the subjects vocabulary, and if they do not find what they are looking for, they can add it as free text. We store both the controlled values and the free-text values in the subjects field as follows:

"subjects": [
  {
    "id": "Accelerators and Storage Rings",
    "subject": "Accelerators and Storage Rings",
    "scheme": "CERN"
  },
  {
    "subject": "myvalue"
  }
],

Subjects without a specific id are considered keywords. At the moment, they are shown as follows in the record's detail page:

[Screenshot 2024-11-14 at 09 57 33: subjects and keywords on the record's detail page]

You can see for example this record: https://dev-cds-rdm.web.cern.ch/records/mddtr-zvt57
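
The rule described in this comment (a value found in the vocabulary is stored with an id and scheme, anything else is stored as free text and treated as a keyword) can be sketched roughly as below. This is only an illustration, not the actual CDS-RDM implementation: the function names `to_subject_entry` and `split_subjects` are hypothetical, and `VOCABULARY` is a two-item sample standing in for the real list in the CDS KB.

```python
# Hypothetical sample of the controlled vocabulary; the real list lives in the CDS KB.
VOCABULARY = {"Accelerators and Storage Rings", "Particle Physics-Experiment"}


def to_subject_entry(value, vocabulary=VOCABULARY, scheme="CERN"):
    """Build one entry for the subjects field.

    A value found in the vocabulary gets an id and a scheme;
    any other value is stored as free text (a keyword).
    """
    if value in vocabulary:
        return {"id": value, "subject": value, "scheme": scheme}
    return {"subject": value}


def split_subjects(subjects):
    """Separate controlled subjects (with an id) from keywords (without one)."""
    controlled = [s for s in subjects if s.get("id")]
    keywords = [s for s in subjects if not s.get("id")]
    return controlled, keywords


entries = [to_subject_entry(v) for v in ["Accelerators and Storage Rings", "myvalue"]]
controlled, keywords = split_subjects(entries)
```

With the two inputs above, `controlled` holds the "Accelerators and Storage Rings" entry and `keywords` holds the `myvalue` entry, mirroring the JSON example in this comment.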

@michamos
Collaborator

Thanks @zzacharo for the explanation. Note that we rely on the subjects for the annual report stats (see the bottom diagram on the dashboard), but I am not sure to what extent we care about those for the rest of CDS. We will discuss further with @agentilb.

@PaulinaBaranowska
Collaborator

PaulinaBaranowska commented Nov 15, 2024

After some discussion with @agentilb, the approach of autocompleting from the list of subjects, with free-text input as a fallback when users don't find what they are looking for, seems like a good solution.

Would it be possible to extract any values in 65017_a that are not in the Knowledge base and send them to us? We can then clean them or, if that is unnecessary or too much work, migrate them to keywords in the new CDS.
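
For illustration, the extraction requested here amounts to filtering the 65017_a values against the KB list and counting what is left. A minimal sketch, assuming the values are already available as plain strings; the function name `values_outside_kb` and both sample lists are made up for the example and are not real CDS data:

```python
from collections import Counter


def values_outside_kb(record_values, kb_values):
    """Count every value that is not an exact match for a KB entry."""
    kb = set(kb_values)
    return Counter(v for v in record_values if v not in kb)


# Hypothetical samples; the real values come from 65017_a and the CDS KB.
kb_sample = ["Accelerators and Storage Rings", "Quantum Technology", "Education and Outreach"]
values_sample = ["Accelerators and Storage Rings", "accelerators", "Quantum Technology", "accelerators"]

outside = values_outside_kb(values_sample, kb_sample)
```

The comparison is deliberately exact, so variants in casing or spelling are reported rather than silently matched; the counts give a rough idea of how much cleaning each value would need.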

@zzacharo
Contributor Author

zzacharo commented Nov 27, 2024

@PaulinaBaranowska here is the subject_values.csv file with all the available 65017_a values on CDS that do not belong to the KB.

On SSPN, we haven't seen any value outside the controlled vocabulary, so if you agree, we can go ahead with the proposed subject implementation and you can clean the values for the next collections. Is this fine?

@zzacharo zzacharo moved this from For review to In Progress in CDS-RDM - Library tasks Nov 27, 2024