Normalize and make consistent how name/alternate institution queries are crafted in both WoS and Pubmed #1062

Open
peetucket opened this issue May 17, 2019 · 2 comments
peetucket commented May 17, 2019

Currently, the alternate institution lists need to be edited before crafting the query (to remove things like "&", "university", etc.). This is done differently in WoS and Pubmed. The two harvesters also differ in how (or whether) they create alternate name variants to send to the query. We may want to create shared methods so that both harvesters do this consistently, if possible.

@peetucket

See #1060 for the work that added Pubmed query editing.

peetucket commented May 17, 2019

For example, in addition to what we do in the https://github.com/sul-dlss/sul_pub/blob/master/lib/pubmed/query_author.rb class, it appears we already have code doing something similar for the WoS search in this class: https://github.com/sul-dlss/sul_pub/blob/master/lib/agent/author_institution.rb

It strips things like "and" and "university", and it is used here to construct the list of institutions added to the query:

https://github.com/sul-dlss/sul_pub/blob/master/lib/web_of_science/query_author.rb#L40-L42

We also create name variants in https://github.com/sul-dlss/sul_pub/blob/master/lib/agent/author_name.rb, which are used in the WoS queries but which we don't take advantage of in the Pubmed queries.

It would be nice to use the classes in lib/agent for both WoS and Pubmed for consistency.
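To make the name-variant idea concrete, here is a rough sketch of a single variant generator that either harvester could call. `NameVariants` and the variant formats are hypothetical illustrations; they are not the actual behavior of the class in lib/agent/author_name.rb, which may produce different forms.

```ruby
# Hypothetical value object for illustration only; the real author name
# class in lib/agent may generate different variants.
NameVariants = Struct.new(:first, :middle, :last) do
  # Returns unique "Last, First"-style variants, most specific forms included
  # only when a middle name is present.
  def variants
    f = first.to_s.strip
    m = middle.to_s.strip
    l = last.to_s.strip
    return [] if f.empty? || l.empty?

    list = ["#{l}, #{f}", "#{l}, #{f[0]}"]
    unless m.empty?
      list << "#{l}, #{f} #{m}"       # full middle name
      list << "#{l}, #{f} #{m[0]}"    # middle initial
      list << "#{l}, #{f[0]}#{m[0]}"  # both initials
    end
    list.uniq
  end
end

# Both harvesters would start from the same list and then apply their own
# query syntax to it.
puts NameVariants.new('Jane', 'Quinn', 'Doe').variants.inspect
```

With something like this, the WoS and Pubmed query classes would only differ in how they format the shared list, not in how they build it.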

Thoughts on re-using this logic? Note that I believe we ended up stripping "University", "Institution", and "College" in WoS queries for a similar reason (the query was picking up extra results), which is perhaps not a problem for Pubmed. But I wanted to acknowledge a bit of duplication here for consideration.

# Rails console: the institutions the WoS query uses for one author
author = Author.find(37959)
WebOfScience::QueryAuthor.new(author).send(:institutions)
=> ["stanford", "oregon health & science", "washington"]
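As a starting point for discussion, a shared normalizer might look roughly like the sketch below. The module name `InstitutionNormalizer`, its methods, and the stop-word list are all assumptions made for illustration, not the existing lib/agent code. Note that the real WoS output above keeps "&" in "oregon health & science", so the current rules evidently differ from this sketch. The stop words are a parameter so that Pubmed could strip less than WoS if the extra filtering turns out not to be needed there.

```ruby
# Hypothetical shared helper for illustration only; not the actual code in
# lib/agent/author_institution.rb.
module InstitutionNormalizer
  # Assumed list of noise words; the real list would need to be agreed on.
  DEFAULT_STOP_WORDS = %w[university institution college and of the].freeze

  # Lowercases the name, replaces "&" with a space, drops stop words,
  # and collapses whitespace.
  def self.normalize(name, stop_words: DEFAULT_STOP_WORDS)
    name.to_s.downcase
        .gsub('&', ' ')
        .split
        .reject { |word| stop_words.include?(word) }
        .join(' ')
  end

  # Normalizes a list of names, dropping blanks and duplicates.
  def self.normalize_all(names, stop_words: DEFAULT_STOP_WORDS)
    names.map { |n| normalize(n, stop_words: stop_words) }
         .reject(&:empty?)
         .uniq
  end
end

names = ['Stanford University', 'Oregon Health & Science University',
         'University of Washington']
puts InstitutionNormalizer.normalize_all(names).inspect
# A harvester that wants to keep every word passes an empty stop list.
puts InstitutionNormalizer.normalize_all(names, stop_words: []).inspect
```

Each harvester's query class could then call the same method with its own stop-word list, which keeps the stripping rules in one place.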

@peetucket changed the title from "Normalize and make consistent how alternate institution queries are crafted in both WoS and Pubmed" to "Normalize and make consistent how name/alternate institution queries are crafted in both WoS and Pubmed" on Jun 11, 2019