More tests added.
ptth222 committed Oct 16, 2023
1 parent 58734ff commit 4a0bce7
Showing 115 changed files with 38,726 additions and 44,664 deletions.
2 changes: 1 addition & 1 deletion README.rst
@@ -124,7 +124,7 @@ and use the example there to create it initially. The add_authors command can he
with building the Authors section if you already have a csv file with author
information. A good tool to help track down pesky JSON syntax errors is `here <https://csvjson.com/json_validator>`__.
There are also examples in the `example_configs <https://github.com/MoseleyBioinformaticsLab/academic_tracker/tree/main/example_configs>`__
directory of the GitHub repo. There are also more example in the supplemental
directory of the GitHub repo. There are also more examples in the supplemental
material of the paper https://doi.org/10.6084/m9.figshare.19412165.


2 changes: 2 additions & 0 deletions docs/api.rst
@@ -31,5 +31,7 @@ API
:members:
.. automodule:: academic_tracker.webio
:members:
.. automodule:: academic_tracker.emails_and_reports_helpers
:members:


42 changes: 42 additions & 0 deletions docs/changelog.rst
@@ -0,0 +1,42 @@
Change Log
==========

Version 2.0.0
~~~~~~~~~~~~~

Changes
-------
In the 1.0.0 version each source was queried in a certain order and if later sources found the
same publication as a previous one it was simply ignored. Now a best attempt is made to try and
merge information from the previous source with information from later sources. An additional
"queried_sources" attribute was added to the publication object created for each publication to
indicate all of the sources where the publication was found. It is a list field, and each source
is appended to it as it is found.
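
A minimal sketch of the merge behavior described above. The function name ``merge_publication``, the field names other than "queried_sources", and the rule of keeping the earlier value unless it is empty are all assumptions made for illustration. They are not taken from the Academic Tracker source.

```python
def merge_publication(existing, incoming, source):
    """Fill gaps in `existing` with values from `incoming` and record `source`.

    Hypothetical helper: the real merge logic may resolve conflicts differently.
    """
    merged = dict(existing)
    for key, value in incoming.items():
        if key == "queried_sources":
            continue
        # Keep the value from the earlier source unless it is missing or empty.
        if merged.get(key) in (None, "", [], {}):
            merged[key] = value
    # Append the new source to the list of sources where the publication was found.
    sources = list(existing.get("queried_sources", []))
    if source not in sources:
        sources.append(source)
    merged["queried_sources"] = sources
    return merged


# Illustrative records only; the keys besides "queried_sources" are assumed.
pubmed_record = {"title": "Example title", "doi": None, "queried_sources": ["PubMed"]}
crossref_record = {"title": "Example Title", "doi": "10.1000/example"}
merged = merge_publication(pubmed_record, crossref_record, "Crossref")
```

In this sketch the earlier source wins when both sources supply a value, and the later source only fills gaps.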

Enhancements
------------
A "references" attribute was added to the publication object for each publication and the references
for the publication will appear there if available. It is a list of objects that have the attributes
"citation", "title", "PMID", "PMCID", and "DOI". Fields that can't be determined will have a null value.

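As a rough illustration of the shape described above, the following sketch builds one entry of the "references" list. The helper ``make_reference`` and the sample values are hypothetical. Only the five attribute names and the use of null for unknown fields come from the text.

```python
import json

# Attribute names listed in the changelog entry above.
REFERENCE_KEYS = ("citation", "title", "PMID", "PMCID", "DOI")


def make_reference(**known):
    """Return a reference object with every expected key; unknown fields are None."""
    unexpected = set(known) - set(REFERENCE_KEYS)
    if unexpected:
        raise ValueError(f"Unexpected reference fields: {sorted(unexpected)}")
    return {key: known.get(key) for key in REFERENCE_KEYS}


# Placeholder values, not a real citation.
ref = make_reference(title="An example cited work", DOI="10.1000/example")

# None is serialized as JSON null when the publication is written out.
serialized = json.dumps(ref, indent=2)
```
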
More information can now be obtained from PubMed, such as DOI, author affiliations, and author ORCIDs.

Collective authors can now be specified and are handled appropriately when present on information from
queried sources.

All new publication attributes were added to the reporting and the documentation updated.

The raw queries from each source can now be saved using the --save-all-queries option. An "all_results.json"
file will be saved in the output if the option is given.

The --keep-duplicates option was added to reference_search. This allows the user to force the search
not to drop what it deems as duplicates. The default is that they are still dropped automatically, but
this option allows for an override when the program thinks, incorrectly, that 2 references are the same.

Bug Fixes
---------
Crossref publication dates will now have day and month when available. A bug made it so only the year
was captured even if month and day were available.



1 change: 1 addition & 0 deletions docs/index.rst
@@ -23,6 +23,7 @@ Documentation index:
api
license
todo
changelog


Indices and tables
165 changes: 10 additions & 155 deletions docs/jsonschema.rst
@@ -136,7 +136,11 @@ to search for goes. Every author in this section will be queried during author_s

The first_name and last_name attributes are for the author's first and last names
respectively, and are used to validate that the author under search is the same
as the queried author.
as the queried author. There is a special type of author known as a collective author.
A collective author is not an individual, but a group that is published under a collective name.
Use the collective_name attribute to indicate that an author is a collective. This
attribute takes priority, so if it is present the author will be treated as a collective
author even if they have first_name and last_name attributes.
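
The priority rule above can be sketched as follows. The helper names ``is_collective_author`` and ``display_name`` and the sample author entries are assumptions for illustration, not part of the Academic Tracker API.

```python
def is_collective_author(author):
    """An author is treated as a collective whenever collective_name is present."""
    return "collective_name" in author


def display_name(author):
    """Return the collective name if there is one, else the first and last name."""
    if is_collective_author(author):
        # collective_name takes priority even if first_name/last_name are also set.
        return author["collective_name"]
    return f"{author['first_name']} {author['last_name']}"


# Hypothetical entries as they might appear in the Authors section of the config.
individual = {"first_name": "Jane", "last_name": "Doe", "pubmed_name_search": "Jane Doe"}
collective = {
    "collective_name": "Example Consortium",
    "first_name": "Jane",
    "last_name": "Doe",
    "pubmed_name_search": "Example Consortium",
}
names = [display_name(individual), display_name(collective)]
```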

pubmed_name_search is used as the query string when querying sources. This is so
the user can specify exactly what to query rather than simply querying the first
@@ -170,161 +174,12 @@ gen_reports_and_emails_auth

Validating Schema
-----------------
.. code-block:: console

{
"$schema": "https://json-schema.org/draft/2020-12/schema",
"title": "Configuration JSON",
"description": "Input file that contains information for how the program should run.",
"type": "object",
"properties": {
"project_descriptions" : {
"type": "object",
"minProperties": 1,
"additionalProperties": {
"type":"object",
"properties":{
"grants": {"type": "array", "minItems":1, "items": {"type": "string", "minLength": 1}},
"cutoff_year": {"type": "integer"},
"affiliations": {"type": "array", "minItems":1, "items": {"type": "string", "minLength": 1}},
"project_report": {"type": "object",
"properties":{
"columns": {"type": "object",
"minProperties":1,
"additionalProperties": {"type": "string", "minLength":1}},
"sort": {"type": "array", "uniqueItems":True, "items": {"type": "string", "minLength":1}, "minItems":1},
"separator":{"type":"string", "maxLength":1, "minLength":1},
"column_order":{"type":"array", "uniqueItems":True, "items": {"type": "string", "minLength":1}, "minItems":1},
"file_format":{"type":"string", "enum":["csv", "xlsx"]},
"filename":{"type":"string", "minLength":1},
"template": {"type": "string", "minLength":1},
"from_email": {"type": "string", "format": "email"},
"cc_email": {"type": "array", "items": {"type": "string", "format": "email"}},
"to_email": {"type": "array", "items": {"type": "string", "format": "email"}},
"email_body": {"type": "string", "minLength":1},
"email_subject": {"type": "string", "minLength":1},},
"dependentRequired":{
"from_email": ["email_body", "email_subject"],
"to_email": ["from_email", "email_body", "email_subject"]}},
"collaborator_report": {"type": "object",
"properties":{
"columns": {"type": "object",
"minProperties":1,
"additionalProperties": {"type": "string", "minLength":1}},
"sort": {"type": "array", "uniqueItems":True, "items": {"type": "string", "minLength":1}, "minItems":1},
"separator":{"type":"string", "maxLength":1, "minLength":1},
"column_order":{"type":"array", "uniqueItems":True, "items": {"type": "string", "minLength":1}, "minItems":1},
"file_format":{"type":"string", "enum":["csv", "xlsx"]},
"filename":{"type":"string", "minLength":1},
"template": {"type": "string", "minLength":1},
"from_email": {"type": "string", "format": "email"},
"cc_email": {"type": "array", "items": {"type": "string", "format": "email"}},
"to_email": {"type": "array", "items": {"type": "string", "format": "email"}},
"email_body": {"type": "string", "minLength":1},
"email_subject": {"type": "string", "minLength":1},},
"dependentRequired":{
"from_email": ["email_body", "email_subject"],
"to_email": ["from_email", "email_body", "email_subject"]},},
"authors": {"type": "array", "minItems":1, "items": {"type": "string", "minLength": 1}},
},
"required": ["grants", "affiliations"]
}
},
"ORCID_search" : {"type":"object",
"properties": {
"ORCID_key": {"type": "string", "minLength":1},
"ORCID_secret": {"type": "string", "minLength":1}},
"required": ["ORCID_key", "ORCID_secret"]},
"PubMed_search" : {"type":"object",
"properties": {
"PubMed_email": {"type": "string", "format":"email"}},
"required":["PubMed_email"]},
"Crossref_search" : {"type":"object",
"properties": {
"mailto_email": {"type": "string", "format":"email"}},
"required":["mailto_email"]},
"summary_report" : {"type": "object",
"properties":{
"columns": {"type": "object",
"minProperties":1,
"additionalProperties": {"type": "string", "minLength":1}},
"sort": {"type": "array", "uniqueItems":True, "items": {"type": "string", "minLength":1}, "minItems":1},
"separator":{"type":"string", "maxLength":1, "minLength":1},
"column_order":{"type":"array", "uniqueItems":True, "items": {"type": "string", "minLength":1}, "minItems":1},
"file_format":{"type":"string", "enum":["csv", "xlsx"]},
"filename":{"type":"string", "minLength":1},
"template": {"type": "string", "minLength":1},
"from_email": {"type": "string", "format": "email"},
"cc_email": {"type": "array", "items": {"type": "string", "format": "email"}},
"to_email": {"type": "array", "items": {"type": "string", "format": "email"}},
"email_body": {"type": "string", "minLength":1},
"email_subject": {"type": "string", "minLength":1},},
"dependentRequired":{
"from_email": ["email_body", "email_subject", "to_email"]}},
"Authors" : { "type": "object",
"minProperties": 1,
"additionalProperties": {
"type": "object",
"properties":{
"first_name": {"type": "string", "minLength":1},
"last_name":{"type": "string", "minLength":1},
"pubmed_name_search": {"type": "string", "minLength":1},
"email":{"type": "string", "format":"email"},
"ORCID":{"type": "string", "pattern":"^\d{4}-\d{4}-\d{4}-\d{3}[0,1,2,3,4,5,6,7,8,9,X]$"},
"grants": {"type": "array", "minItems":1, "items": {"type": "string", "minLength": 1}},
"cutoff_year": {"type": "integer"},
"affiliations": {"type": "array", "minItems":1, "items": {"type": "string", "minLength": 1}},
"scholar_id": {"type": "string", "minLength":1},
"project_report": {"type": "object",
"properties":{
"columns": {"type": "object",
"minProperties":1,
"additionalProperties": {"type": "string", "minLength":1}},
"sort": {"type": "array", "uniqueItems":True, "items": {"type": "string", "minLength":1}, "minItems":1},
"separator":{"type":"string", "maxLength":1, "minLength":1},
"column_order":{"type":"array", "uniqueItems":True, "items": {"type": "string", "minLength":1}, "minItems":1},
"file_format":{"type":"string", "enum":["csv", "xlsx"]},
"filename":{"type":"string", "minLength":1},
"template": {"type": "string", "minLength":1},
"from_email": {"type": "string", "format": "email"},
"cc_email": {"type": "array", "items": {"type": "string", "format": "email"}},
"to_email": {"type": "array", "items": {"type": "string", "format": "email"}},
"email_body": {"type": "string", "minLength":1},
"email_subject": {"type": "string", "minLength":1},},
"dependentRequired":{
"from_email": ["email_body", "email_subject"],
"to_email": ["from_email", "email_body", "email_subject"]}},
"collaborator_report": {"type": "object",
"properties":{
"columns": {"type": "object",
"minProperties":1,
"additionalProperties": {"type": "string", "minLength":1}},
"sort": {"type": "array", "uniqueItems":True, "items": {"type": "string", "minLength":1}, "minItems":1},
"separator":{"type":"string", "maxLength":1, "minLength":1},
"column_order":{"type":"array", "uniqueItems":True, "items": {"type": "string", "minLength":1}, "minItems":1},
"file_format":{"type":"string", "enum":["csv", "xlsx"]},
"filename":{"type":"string", "minLength":1},
"template": {"type": "string", "minLength":1},
"from_email": {"type": "string", "format": "email"},
"cc_email": {"type": "array", "items": {"type": "string", "format": "email"}},
"to_email": {"type": "array", "items": {"type": "string", "format": "email"}},
"email_body": {"type": "string", "minLength":1},
"email_subject": {"type": "string", "minLength":1},},
"dependentRequired":{
"from_email": ["email_body", "email_subject"],
"to_email": ["from_email", "email_body", "email_subject"]},},
},
"required" : ["first_name", "last_name", "pubmed_name_search"]
}
}
},
"required": ["project_descriptions", "ORCID_search", "PubMed_search", "Crossref_search", "Authors"]
}
.. literalinclude:: ../src/academic_tracker/tracker_schema.py
:start-at: config_schema
:end-before: ## config_end
:language: none



Example