

Feature 213 db load instructions #214

Merged
merged 32 commits into from
Jul 25, 2023


bikegeek
Collaborator

Pull Request Testing

  • Describe testing already performed for these changes:

    Verified scripts and configuration files work on 'mohawk'

  • Recommend testing for the reviewer(s) to perform, including the location of input datasets, and any additional instructions:

    Verify that formatting and instructions are correct in the documentation:

    https://metdataio.readthedocs.io/en/feature_213_db_load_instructions/Users_Guide/load_data.html#

    I used this YAML config file on 'mohawk', with the data in /scratch/mwin (remove the .txt extension before using it):

mohawk_data_loading_config.yaml.txt

  • Do these changes include sufficient documentation updates, ensuring that no errors or warnings exist in the build of the documentation? [Yes]

  • Do these changes include sufficient testing updates? [NA]

  • Will this PR result in changes to the test suite? [No]

    If yes, describe the new output and/or changes to the existing output:

  • Please complete this pull request review before the coordinated release.

Pull Request Checklist

See the METplus Workflow for details.

  • Review the source issue metadata (required labels, projects, and milestone).
  • [x] Complete the PR definition above.
  • Ensure the PR title matches the feature or bugfix branch name.
  • Define the PR metadata, as permissions allow.
    Select: Reviewer(s)
    Select: Organization level software support Project or Repository level development cycle Project
    Select: Milestone as the version that will include these changes
  • After submitting the PR, select Development issue with the original issue number.
  • After the PR is approved, merge your changes. If permissions do not allow this, request that the reviewer do the merge.
  • Close the linked issue and delete your feature or bugfix branch from GitHub.

@bikegeek bikegeek added this to the METdataio-2.1.0 milestone Jul 24, 2023
@bikegeek bikegeek requested a review from jprestop July 24, 2023 18:49
@jprestop jprestop linked an issue Jul 24, 2023 that may be closed by this pull request
@jprestop
Collaborator

@bikegeek I added a Status and Cycle under "Projects" and also linked with the issue under "Development". I will start reviewing soon.

@jprestop
Collaborator

@bikegeek
I am following this part of the instructions:

Generate the new XML specification file by running the following:

cd path-to-METdataio-source/METdataio/METdbLoad/sql/scripts

*Replace path-to-METdataio-source with the location where the METdataio source code is saved.

python generate_xml_spec.py path-to/data_loading_config.yaml

*Replace the path-to with the path to the directory you created to store the copy of the data_loading_config.yaml file, as specified earlier.

I run into an error:

mohawk:jpresto:/d2/personal/jpresto/METdataio/git/METdataio-feature_213_db_load_instructions/METdataio/METdbLoad/sql/scripts> python generate_xml_spec.py /d2/personal/jpresto/METdataio/pr_214/mohawk_data_loading_config.yaml
  File "generate_xml_spec.py", line 22
    db_name: str
           ^
SyntaxError: invalid syntax

I realized that Python was pointing to Python 2:

mohawk:jpresto:/d2/personal/jpresto/METdataio/git/METdataio-feature_213_db_load_instructions/METdataio/METdbLoad/sql/scripts> python --version
Python 2.7.16

so I tried running with Python 3, but then I get a different error:

mohawk:jpresto:/d2/personal/jpresto/METdataio/git/METdataio-feature_213_db_load_instructions/METdataio/METdbLoad/sql/scripts> python3 generate_xml_spec.py /d2/personal/jpresto/METdataio/pr_214/mohawk_data_loading_config.yaml
Traceback (most recent call last):
  File "generate_xml_spec.py", line 7, in <module>
    import yaml
ModuleNotFoundError: No module named 'yaml'

Can you please send the path to the version of Python you used on mohawk for testing? Thanks!
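The two errors above have distinct causes: `db_name: str` is a variable annotation, which Python accepts only from 3.6 onward (PEP 526), and `ModuleNotFoundError: No module named 'yaml'` means that interpreter lacks PyYAML. A minimal pre-flight sketch that checks both for whichever interpreter runs it could look like this. `preflight` is a hypothetical helper, not part of METdataio, and the 3.6 minimum reflects only the annotation syntax, not a documented METdataio requirement.

```python
import importlib.util
import shutil
import sys


def preflight(min_version=(3, 6), modules=("yaml",)):
    """Return a list of problems that would stop generate_xml_spec.py (hypothetical check)."""
    problems = []
    # Variable annotations such as `db_name: str` are a SyntaxError before Python 3.6.
    if sys.version_info[:2] < tuple(min_version):
        problems.append(
            f"Python {sys.version.split()[0]} is older than "
            f"{'.'.join(map(str, min_version))}"
        )
    # find_spec returns None when a top-level module cannot be found.
    for name in modules:
        if importlib.util.find_spec(name) is None:
            problems.append(f"module '{name}' not found for {sys.executable}")
    return problems


if __name__ == "__main__":
    # Compare what a bare `python` resolves to on PATH with the interpreter running this check.
    print("python on PATH:", shutil.which("python"))
    print("running under: ", sys.executable)
    for problem in preflight():
        print("PROBLEM:", problem)
```

Run it with the interpreter you intend to use (for example `python3`); the sketch cannot run under Python 2 itself, because `importlib.util` and f-strings are Python 3 only.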

@bikegeek
Collaborator Author

Here is the yaml env file:

name: metdataio_310
channels:
  - defaults
dependencies:
  - python=3.10
  - mamba
  - lxml=4.9.1
  - numpy=1.24.2
  - pandas=1.5.2
  - pip=22.2.2
  - pymysql=1.0.2
  - pytest=7.2.1
  - python-dateutil=2.8.2
  - pyyaml=6.0
  - ca-certificates
  - certifi
  - openssl
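To confirm that the active interpreter actually carries the versions pinned in this environment file (for instance after creating it with `conda env create -f` on the file), the sketch below compares installed distribution versions against the pins using the standard library's `importlib.metadata` (Python 3.8+). The mapping from conda package names to Python distribution names (`pymysql` to `PyMySQL`, `pyyaml` to `PyYAML`) is an assumption of this sketch, and non-Python packages such as `openssl` are deliberately left out.

```python
from importlib import metadata

# Pins copied from the metdataio_310 environment file; the distribution names
# used as keys are this sketch's assumption about how conda names map to Python ones.
PINS = {
    "lxml": "4.9.1",
    "numpy": "1.24.2",
    "pandas": "1.5.2",
    "PyMySQL": "1.0.2",
    "pytest": "7.2.1",
    "python-dateutil": "2.8.2",
    "PyYAML": "6.0",
}


def check_pins(pins=PINS):
    """Map each missing or mismatched distribution to its installed version (None if absent)."""
    mismatches = {}
    for dist, wanted in pins.items():
        try:
            installed = metadata.version(dist)
        except metadata.PackageNotFoundError:
            installed = None
        # Exact string comparison: "6.0" and "6.0.1" count as different.
        if installed != wanted:
            mismatches[dist] = installed
    return mismatches
```

`check_pins()` returns an empty dict when everything matches; otherwise each mismatched name maps to its installed version, or to `None` if it is not installed at all.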

@jprestop
Collaborator

Thanks @bikegeek. What version of Python did you use on mohawk for testing?

@jprestop
Collaborator

Actually, I don't mean to ask what version, but rather can you please send the path to the Python that you used?

@bikegeek
Collaborator Author

bikegeek commented Jul 25, 2023 via email

@bikegeek
Collaborator Author

bikegeek commented Jul 25, 2023 via email

@jprestop
Collaborator

Thanks @bikegeek, the path is just what I needed!

When running this part of the instructions:

python met_db_load.py /path-to/load_met.xml

* Replace the path-to with the location where the load_met.xml file was saved. This is the same directory you created to save the copy of the data_loading_config.yaml file.

I get an error:

mohawk:jpresto:/d2/personal/jpresto/METdataio/git/METdataio-feature_213_db_load_instructions/METdataio/METdbLoad/ush> /d2/personal/mwin/miniconda3/envs/metdataio_310/bin/python met_db_load.py /d2/personal/jpresto/METdataio/pr_214/load_met.xml
INFO:root:METdbload Version: __version__ = "2.1.0-rc1"
INFO:root:--- *** --- Start METdbLoad --- *** ---
INFO:root:Begin time: 2023-07-25 16:45:25.212910
INFO:root:User name is: jpresto
INFO:root:Reading XML Load file
INFO:root:Database name is: mv_test
INFO:root:Initial number of files: 144
DEBUG:root:[--- Start read_data ---]
DEBUG:root:Lines in /scratch/mwin/grid_stat_000000L_20220206_053500V.stat: 50
... (more DEBUG messages here)
DEBUG:root:Shape of all_stat before transforms: (4244, 129)
DEBUG:root:Shape of all_stat after transforms: (4244, 131)
INFO:root:    >>> Read time: 0:00:02.063934
DEBUG:root:[--- End read_data ---]
ERROR:root:*** (1049, "Unknown database 'mv_test'") in run_sql ***
*** Error when connecting to database

I'm not sure what I need to do - apologies.
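Much of the trouble in this thread came from a bare `python` resolving to Python 2.7 instead of the `metdataio_310` environment. One way to avoid that is to drive both documented steps from a single interpreter. This sketch assumes the directory layout quoted here (`METdataio/METdbLoad/sql/scripts` and `METdataio/METdbLoad/ush` under the source directory); `build_commands` and `run_all` are hypothetical helpers, and the paths passed to them are placeholders.

```python
import os
import subprocess
import sys
from pathlib import Path


def build_commands(source_dir, config_yaml, load_xml, python=sys.executable):
    """Build the two documented commands, pinned to one interpreter instead of a bare `python`."""
    base = Path(source_dir) / "METdataio" / "METdbLoad"
    return [
        # Step 1: generate the XML load specification from the YAML config.
        ([python, "generate_xml_spec.py", os.path.abspath(config_yaml)],
         base / "sql" / "scripts"),
        # Step 2: load the data described by the generated XML file.
        ([python, "met_db_load.py", os.path.abspath(load_xml)], base / "ush"),
    ]


def run_all(commands):
    """Run each command in its working directory; raise CalledProcessError on the first failure."""
    for argv, cwd in commands:
        subprocess.run(argv, cwd=cwd, check=True)
```

For example, `run_all(build_commands(src, cfg, xml))` executed with the environment's own Python runs both steps under that same interpreter. The input paths are made absolute because each step runs in a different working directory.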

@bikegeek
Collaborator Author

My apologies: I had cleaned up the database. I have now re-created it and loaded the schema.
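The `(1049, "Unknown database 'mv_test'")` error meant the server had no database by that name, which re-creating it fixed. A check like the following sketch could confirm that the database named in load_met.xml exists before loading. It assumes a MySQL server reachable with `pymysql` (pinned in the environment file), and `database_exists` and `check_server` are hypothetical helpers. Connecting without selecting a database lets the check report a missing database instead of failing with 1049.

```python
def database_exists(cursor, db_name):
    """Return True if a database named db_name is visible on the connected server."""
    # INFORMATION_SCHEMA.SCHEMATA lists every database the current user can see.
    cursor.execute(
        "SELECT SCHEMA_NAME FROM INFORMATION_SCHEMA.SCHEMATA WHERE SCHEMA_NAME = %s",
        (db_name,),
    )
    return cursor.fetchone() is not None


def check_server(host, user, password, db_name, port=3306):
    """Connect without selecting a database and report whether db_name exists."""
    import pymysql  # imported lazily; pinned as pymysql=1.0.2 in the environment file

    conn = pymysql.connect(host=host, user=user, password=password, port=port)
    try:
        with conn.cursor() as cursor:
            return database_exists(cursor, db_name)
    finally:
        conn.close()
```

If `check_server("localhost", "user", "password", "mv_test")` (placeholder credentials) returned `False`, the database would need to be created and the schema loaded before running met_db_load.py.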

Collaborator

@jprestop jprestop left a comment


@bikegeek, I verified that the formatting and instructions are correct in the documentation to the best of my knowledge. As someone with very little knowledge of this work, I found the instructions easy to follow. Thank you.

I am wondering if it would be helpful to have the error I received:

ERROR:root:*** (1049, "Unknown database 'mv_test'") in run_sql ***
*** Error when connecting to database

in the Troubleshooting section? I defer to your judgement on that, as you will know best. In either case, I approve this request. Thank you for all of your work in putting all of this together! I think it will be very helpful!

@bikegeek
Collaborator Author

Thanks for the suggestion; I added that error to the troubleshooting table. I kept the table's original format, because every attempt to split Error and Solution into separate columns made the text difficult to read:
https://metdataio.readthedocs.io/en/feature_213_db_load_instructions/Users_Guide/load_data.html#load-data
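The troubleshooting table pairs each error with its solution. To find such errors in a long METdbLoad log automatically, a sketch like the following could extract them. The regular expression is inferred from the single error line quoted in this thread and may not match other METdbLoad messages; `parse_db_error`, `scan_log`, and `HINTS` are hypothetical names, and only the 1049 hint comes from this discussion.

```python
import ast
import re

# Pattern inferred from the one METdbLoad error line quoted in this thread:
#   ERROR:root:*** (1049, "Unknown database 'mv_test'") in run_sql ***
ERROR_LINE = re.compile(r"^ERROR:root:\*\*\* (\(.*\)) in (\w+) \*\*\*$")

# Hypothetical hint table; only the 1049 entry is grounded in this thread.
HINTS = {
    1049: "The database does not exist: re-create it and load the schema.",
}


def parse_db_error(line):
    """Return (code, message, function) for a matching log line, else None."""
    match = ERROR_LINE.match(line.strip())
    if match is None:
        return None
    try:
        # The parenthesized part is a Python tuple literal such as (1049, "...").
        code, message = ast.literal_eval(match.group(1))
    except (ValueError, SyntaxError, TypeError):
        return None
    return code, message, match.group(2)


def scan_log(lines):
    """Yield (code, message, function, hint) for every database error in the log lines."""
    for line in lines:
        parsed = parse_db_error(line)
        if parsed is not None:
            code, message, function = parsed
            yield code, message, function, HINTS.get(code, "No hint recorded.")
```

Passing an open log file to `scan_log` works because iterating over a file yields its lines.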

@jprestop
Collaborator

@bikegeek I think it looks great!

@bikegeek bikegeek merged commit 7c8aa20 into develop Jul 25, 2023
@bikegeek bikegeek deleted the feature_213_db_load_instructions branch July 25, 2023 19:29
Labels
None yet
Projects
No open projects
Status: Done
Development

Successfully merging this pull request may close these issues.

Add documentation for loading data into a database
2 participants