Refactor load_mongodump.sh script #150

Merged
merged 15 commits into from
Sep 23, 2024


iantei
Contributor

@iantei iantei commented Sep 10, 2024

Changes to load_mongodump.sh are identical to those in e-mission/op-admin-dashboard#122.

Refactor changes:

Better error handling, more debugging information, and integration with the Docker Compose configuration file for configuring the database host.

…ebugging information, and integration with Docker Compose configuration file for database host configuration.
…/DB_NAME. Add docker-compose.dev.yml to .gitignore. Update README.md to say that docker-compose.dev.yml should not be modified, and to explain the forced push needed if changes are made to docker-compose.dev.yml.
@iantei
Contributor Author

iantei commented Sep 10, 2024

Test Scenario and execution:

Tested four different scenarios using the cortezebikes and vail datasets:

A. Copied the cortezebikes dataset into the em-public-dashboard directory. There was a pre-existing openpath_prod_cortezebikes database in MongoDB. [Some issues]

  1. Ran the load_mongodump.sh script and got E11000 duplicate key errors, so I stopped the script.
  2. Cleared the database manually from inside the Mongo container using db.dropDatabase().
  3. Re-executed the script and still got the error below:
    E11000 duplicate key error collection
Execution and error log:

```
ge_updateable_models index: _id_ dup key: { _id: ObjectId('6562e2cd9627a7b9fa5b80f4') }
2024-09-10T16:51:27.149+0000 continuing through error: E11000 duplicate key error collection: openpath_prod_cortezebikes.Stage_updateable_models index: _id_ dup key: { _id: ObjectId('6562e2cf9627a7b9fa5b80f5') }
2024-09-10T16:51:27.149+0000 continuing through error: E11000 duplicate key error collection: openpath_prod_cortezebikes.Stage_updateable_models index: _id_ dup key: { _id: ObjectId('6562e2d19627a7b9fa5b80f6') }
2024-09-10T16:51:27.149+0000 continuing through error: E11000 duplicate key error collection: openpath_prod_cortezebikes.Stage_updateable_models index: _id_ dup key: { _id: ObjectId('65643417ac99012d039e6f41') }
2024-09-10T16:51:27.152+0000 [########################] openpath_prod_cortezebikes.Stage_updateable_models 2.45GB/2.45GB (100.0%)
2024-09-10T16:51:27.152+0000 restoring indexes for collection openpath_prod_cortezebikes.Stage_updateable_models from metadata
2024-09-10T16:51:32.848+0000 finished restoring openpath_prod_cortezebikes.Stage_updateable_models (5 documents, 542 failures)
2024-09-10T16:54:36.468+0000 finished restoring openpath_prod_cortezebikes.Stage_timeseries (1070889 documents, 2427000 failures)
2024-09-10T16:54:36.476+0000 1663213 document(s) restored successfully. 2460158 document(s) failed to restore.
Database restore complete.
ashrest2-35384s:em-public-dashboard ashrest2$
```

B. Copied the vail dataset into the em-public-dashboard directory. The stage database was empty. [Execution successful]

  1. Ran the script load_mongodump.sh from the em-public-dashboard directory.
  2. Everything ran perfectly.
Execution log:

2024-09-10T17:11:01.450+0000 [######################..] Stage_database.Stage_timeseries 2.05GB/2.16GB (94.9%)
2024-09-10T17:11:04.450+0000 [#######################.] Stage_database.Stage_timeseries 2.14GB/2.16GB (98.7%)
2024-09-10T17:11:06.504+0000 [########################] Stage_database.Stage_timeseries 2.16GB/2.16GB (100.0%)
2024-09-10T17:11:06.504+0000 restoring indexes for collection Stage_database.Stage_timeseries from metadata
2024-09-10T17:12:40.880+0000 finished restoring Stage_database.Stage_timeseries (3165925 documents, 0 failures)
2024-09-10T17:12:40.883+0000 3485167 document(s) restored successfully. 0 document(s) failed to restore.
Database restore complete.
ashrest2-35384s:em-public-dashboard ashrest2$

C. Re-ran the script with the vail dataset. The stage database was not empty. [Execution successful]

  1. Ran the script load_mongodump.sh from em-public-dashboard directory.
  2. Everything ran perfectly.
Execution log:

```
ashrest2-35384s:em-public-dashboard ashrest2$ bash viz_scripts/docker/load_mongodump.sh vail_2022-05-09.tar.gz
Script Directory: viz_scripts/docker
Configuration File Path: viz_scripts/docker/../../docker-compose.dev.yml
MongoDump File Path: vail_2022-05-09.tar.gz
Configuration file details:
-rw-r--r-- 1 ashrest2 NREL_NT\Domain Users 1019 Sep 10 10:05 viz_scripts/docker/../../docker-compose.dev.yml
```

This worked fine.

D. Re-ran the script with the cortezebikes dataset. [Execution successful]

  1. Removed the vail dataset and copied the cortezebikes dataset into the em-public-dashboard directory.
  2. Ran the script load_mongodump.sh from the em-public-dashboard directory.
  3. Everything ran perfectly.
Execution log:

```
2024-09-10T17:49:25.126+0000 [########################] openpath_prod_cortezebikes.Stage_updateable_models 2.45GB/2.45GB (100.0%)
2024-09-10T17:49:28.126+0000 [########################] openpath_prod_cortezebikes.Stage_updateable_models 2.45GB/2.45GB (100.0%)
2024-09-10T17:49:28.908+0000 [########################] openpath_prod_cortezebikes.Stage_updateable_models 2.45GB/2.45GB (100.0%)
2024-09-10T17:49:28.909+0000 restoring indexes for collection openpath_prod_cortezebikes.Stage_updateable_models from metadata
2024-09-10T17:49:50.402+0000 finished restoring openpath_prod_cortezebikes.Stage_updateable_models (5 documents, 0 failures)
2024-09-10T17:49:50.402+0000 4122829 document(s) restored successfully. 0 document(s) failed to restore.
```


This worked fine.

Member

@Abby-Wheelis Abby-Wheelis left a comment

Just a couple of quick notes! I have also tried this, and it seemed to work well!

.gitignore Outdated
@@ -132,3 +132,6 @@ dmypy.json

# Pyre type checker
.pyre/

# docker-compose-dev.yml
Member

should both docker-compose files be excluded?

Contributor Author

I think that's a good idea. I have excluded both docker-compose files.

@@ -25,7 +25,7 @@ services:
depends_on:
- db
environment:
- DB_HOST=db
- DB_HOST=mongo://db/DB_NAME
Member

should we make this change in the other docker-compose file as well?

Member

This also needs to be mongodb and not mongo

Member

@Abby-Wheelis Abby-Wheelis left a comment

A couple more small changes!

.gitignore Outdated
# docker-compose yml
Member

can you get rid of this extra line please?

Contributor Author

Sure, I have removed the comment.

@@ -25,7 +25,7 @@ services:
depends_on:
- db
environment:
- DB_HOST=db
Member

would it be helpful to make these same changes in docker-compose.yml?

Contributor Author

The intent of load_mongodump.sh is to load the dataset into MongoDB.
A couple of observations:

  • load_mongodump.sh reads its configuration from docker-compose.dev.yml:
    CONFIG_FILE="$SCRIPT_DIR/../../docker-compose.dev.yml"
  • We would need to change load_mongodump.sh if we want it to handle both docker-compose.yml and docker-compose.dev.yml.
  • With the current changes we can already load the dataset into the public dashboard database; do you think the additional changes are necessary? Moreover, this matches the op-admin-dashboard implementation.

Please let me know what you think.

Member

My thought was that the updated default format might make it easier to switch the dataset in use when editing the docker file (i.e., updating from mongodb://db/DB_NAME to mongodb://db/openpath_prod_open_acess instead of from db), but it sounds like there are more technical reasons that differ between the two files, so I'm OK with leaving the other dockerfile alone.

Contributor

Just to clarify, users don't need to rename DB_HOST=mongodb://db/DB_NAME manually; the script edits the file directly.

Contributor Author

Reverting to DB_HOST=db so that it's consistent between docker-compose.dev.yml and docker-compose.yml.

Contributor

Similarly, I don't understand why you reverted this change. I am also not sure how it works: load_mongodump.sh expects a mongodb:// URL so that it can change the DB_NAME appropriately. I bet this was not tested, because I don't see how it could work.

Contributor Author

I have documented my testing below #150 (comment)
In load_mongodump.sh:

# Update the docker-compose configuration file with the actual DB_HOST
DB_HOST="mongodb://db/$DB_NAME"
sed -i.bak "s|DB_HOST:.*|DB_HOST: $DB_HOST|" "$CONFIG_FILE"

I understand that it would assign mongodb://db/$DB_NAME to the configuration file.

I have also re-done testing and updated detailed logs below. #150 (comment)
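The reviewer's doubt comes down to the two sed patterns. The minimal sketch below runs the admin-style (DB_HOST:) and public-style (DB_HOST=) substitutions against a throwaway copy of an environment block that uses DB_HOST=db. The temporary file and the database name openpath_prod_example are made up for illustration; only the two sed expressions are taken from the scripts quoted in this thread.

```bash
#!/usr/bin/env bash
# Sketch: compare the "DB_HOST:" and "DB_HOST=" sed patterns against a
# compose fragment written in the public dashboard's "DB_HOST=db" form.
set -euo pipefail

DB_NAME="openpath_prod_example"   # hypothetical database name
DB_HOST="mongodb://db/$DB_NAME"

# Throwaway stand-in for the environment block (not the real docker-compose.dev.yml)
tmp="$(mktemp)"
printf '    environment:\n      - DB_HOST=db\n      - STUDY_CONFIG=stage-program\n' > "$tmp"

# Admin-style pattern: searches for "DB_HOST:", so "DB_HOST=db" is left untouched
colon_result="$(sed "s|DB_HOST:.*|DB_HOST: $DB_HOST|" "$tmp" | grep DB_HOST)"

# Public-style pattern: searches for "DB_HOST=", so the value is rewritten
equals_result="$(sed "s|DB_HOST=.*|DB_HOST=$DB_HOST|" "$tmp" | grep DB_HOST)"

echo "colon pattern : $colon_result"
echo "equals pattern: $equals_result"
rm -f "$tmp"
```

Only the DB_HOST= pattern changes the line; with the DB_HOST: pattern, a file containing DB_HOST=db keeps pointing at the default database.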

@Abby-Wheelis
Member

@iantei Can you mark this as ready for review?

Contributor

@shankari shankari left a comment

> A. Copied the cortezebikes dataset under the em-public-dashboard directory. There was pre-existing openpath_prod_cortezebikes database in the MongoDB. [Some issues]

As you know, this is not supposed to happen. Have you tried to debug this? What are the other logs that you saw, particularly around DB_NAME?

.gitignore Outdated
# Directory of the script
SCRIPT_DIR="$(dirname "$0")"

# Path to the configuration file (one level up)
Contributor

nit: should be two levels up
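The nit is easy to confirm: from viz_scripts/docker/, .. is viz_scripts/ and ../.. is the repository root where docker-compose.dev.yml lives. This minimal sketch rebuilds that layout in a temporary directory; the directory names mirror the paths in this PR, while the temporary root is only a stand-in for an em-public-dashboard checkout.

```bash
#!/usr/bin/env bash
# Sketch: show that CONFIG_FILE must go two levels up from viz_scripts/docker
set -euo pipefail

root="$(mktemp -d)"                      # stand-in for the em-public-dashboard checkout
mkdir -p "$root/viz_scripts/docker"
touch "$root/docker-compose.dev.yml"

SCRIPT_DIR="$root/viz_scripts/docker"    # stand-in for "$(dirname "$0")"

one_up="$SCRIPT_DIR/../docker-compose.dev.yml"     # viz_scripts/docker-compose.dev.yml
two_up="$SCRIPT_DIR/../../docker-compose.dev.yml"  # <repo root>/docker-compose.dev.yml

[ -f "$one_up" ] && one_up_exists=yes || one_up_exists=no
[ -f "$two_up" ] && two_up_exists=yes || two_up_exists=no

echo "one level up : $one_up_exists"
echo "two levels up: $two_up_exists"
rm -rf "$root"
```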

if [ "$#" -ne 1 ]; then
echo "Usage: $0 <mongodump-file>"
echo " <mongodump-file> : The path to the MongoDB dump file to be restored."
echo " run git add -f <docker compose file> after using this command"
Contributor

Please remove this since it contradicts the README instructions. I know it was in the original PR, but was removed in response to review feedback.

Contributor

@iantei the entire if check does not contradict the README, only the last echo. I will fix this before merging so that we don't have to wait for yet another round of reviews.

Contributor Author

I have re-introduced the check while removing the last echo statement.
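A minimal sketch of the check as it stands after this change, wrapped in a function (check_args is a hypothetical name) so it can be exercised on its own. The excerpt in the review stops at the echo lines, so the non-zero return that stops the script early is an assumption; inside load_mongodump.sh it would presumably be an exit 1.

```bash
#!/usr/bin/env bash
# Sketch of the argument check without the removed "git add -f" hint.
# check_args is a hypothetical wrapper; the script itself runs this inline.
check_args() {
  if [ "$#" -ne 1 ]; then
    echo "Usage: $0 <mongodump-file>"
    echo " <mongodump-file> : The path to the MongoDB dump file to be restored."
    return 1   # assumed: the real script would exit here
  fi
}

check_args && echo "no args: ok" || echo "no args: rejected"
check_args dump.tar.gz && echo "one arg: ok" || echo "one arg: rejected"
```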

@iantei
Contributor Author

iantei commented Sep 18, 2024

> As you know, this is not supposed to happen. Have you tried to debug this? What are the other logs that you saw, particularly around DB_NAME?

This was my first time running the script.
I looked up the error code E11000 (duplicate key error collection). The log was mostly filled with that error, and I didn't notice any other errors in particular.
The error usually shows up when a document with an identical unique key (here, _id) already exists when a document is inserted or updated in MongoDB. I am surprised that dropping the database with db.dropDatabase() didn't resolve it.
Since I have other PRs with higher priority, and the subsequent attempts with another dataset and with the same dataset succeeded, I didn't debug it further and documented it for now instead. I can look into it once I am done with the changes related to the baseMode colors and the Inferred label bars.

@iantei
Contributor Author

iantei commented Sep 18, 2024

I tried running the script again under the same conditions for the openpath_prod_cortezebikes database.
Everything ran smoothly.

2024-09-18T20:45:06.740+0000    [########################]  openpath_prod_cortezebikes.Stage_updateable_models  2.45GB/2.45GB  (100.0%)
2024-09-18T20:45:09.739+0000    [########################]  openpath_prod_cortezebikes.Stage_updateable_models  2.45GB/2.45GB  (100.0%)
2024-09-18T20:45:12.738+0000    [########################]  openpath_prod_cortezebikes.Stage_updateable_models  2.45GB/2.45GB  (100.0%)
2024-09-18T20:45:15.740+0000    [########################]  openpath_prod_cortezebikes.Stage_updateable_models  2.45GB/2.45GB  (100.0%)
2024-09-18T20:45:15.992+0000    finished restoring openpath_prod_cortezebikes.Stage_timeseries (3497889 documents, 0 failures)
2024-09-18T20:45:18.643+0000    [########################]  openpath_prod_cortezebikes.Stage_updateable_models  2.45GB/2.45GB  (100.0%)
2024-09-18T20:45:18.644+0000    restoring indexes for collection openpath_prod_cortezebikes.Stage_updateable_models from metadata
2024-09-18T20:45:21.245+0000    finished restoring openpath_prod_cortezebikes.Stage_updateable_models (5 documents, 0 failures)
2024-09-18T20:45:21.246+0000    4122829 document(s) restored successfully. 0 document(s) failed to restore.

@Abby-Wheelis
Member

@iantei it looks like you have addressed the new review comments; can you mark this as not a draft so it is ready for review again?

@iantei iantei marked this pull request as ready for review September 18, 2024 22:49
@iantei
Contributor Author

iantei commented Sep 21, 2024

With docker-compose.dev.yml configured as:

    environment:
      - DB_HOST=db
      - WEB_SERVER_HOST=0.0.0.0
      - CRON_MODE=
      - STUDY_CONFIG=stage-program
Details of re-testing the load_mongodump.sh script:


ashrest2-35384s:em-public-dashboard ashrest2$ bash viz_scripts/docker/load_mongodump.sh openpath-prod-usaid-laos-ev-snapshot-dec-20.tar.gz 
Script Directory: viz_scripts/docker
Configuration File Path: viz_scripts/docker/../../docker-compose.dev.yml
MongoDump File Path: openpath-prod-usaid-laos-ev-snapshot-dec-20.tar.gz
Configuration file details:
-rw-r--r--  1 ashrest2  NREL_NT\Domain Users  1003 Sep 18 13:23 viz_scripts/docker/../../docker-compose.dev.yml
openpath_prod_usaid_laos_ev
Database Name: openpath_prod_usaid_laos_ev
Updated docker-compose file:
version: "3"
services:
  dashboard:
    image: em-pub-dash-dev/frontend
    build:
        context: frontend
        dockerfile: docker/Dockerfile.dev
    depends_on:
      - db
    ports:
      # DASH in numbers
      - "3274:6060"
    volumes:
      - ./frontend:/public
      - ./plots:/public/plots
    networks:
       - emission
  notebook-server:
    image: em-pub-dash-dev/viz-scripts
    build:
      context: viz_scripts
      dockerfile: docker/Dockerfile.dev
      args:
        SERVER_IMAGE_TAG: ${SERVER_IMAGE_TAG}
    depends_on:
      - db
    environment:
      - DB_HOST=db
      - WEB_SERVER_HOST=0.0.0.0
      - CRON_MODE=
      - STUDY_CONFIG=stage-program
    ports:
      # ipynb in numbers
      - "47962:8888"
    networks:
      - emission
    volumes:
      - ./viz_scripts:/usr/src/app/saved-notebooks
      - ./plots:/plots
  db:
    image: mongo:4.4.0
    volumes:
      - mongo-data:/data/db
    networks:
       - emission

networks:
  emission:

volumes:
  mongo-data:

Successfully copied 282MB to em-public-dashboard-db-1:/tmp
Clearing existing database
MongoDB shell version v4.4.0
connecting to: mongodb://127.0.0.1:27017/openpath_prod_usaid_laos_ev?compressors=disabled&gssapiServiceName=mongodb
Implicit session: session { "id" : UUID("***") }
MongoDB server version: 4.4.0
{ "dropped" : "openpath_prod_usaid_laos_ev", "ok" : 1 }
Restoring the dump from openpath-prod-usaid-laos-ev-snapshot-dec-20.tar.gz to database openpath_prod_usaid_laos_ev
dump/
dump/openpath_prod_usaid_laos_ev/
dump/openpath_prod_usaid_laos_ev/Stage_timeseries_error.metadata.json
dump/openpath_prod_usaid_laos_ev/Stage_analysis_timeseries.metadata.json
dump/openpath_prod_usaid_laos_ev/Stage_pipeline_state.metadata.json
dump/openpath_prod_usaid_laos_ev/Stage_usercache.metadata.json
dump/openpath_prod_usaid_laos_ev/Stage_push_token_mapping.metadata.json
dump/openpath_prod_usaid_laos_ev/Stage_timeseries.metadata.json
dump/openpath_prod_usaid_laos_ev/Stage_uuids.metadata.json
dump/openpath_prod_usaid_laos_ev/Stage_Profiles.metadata.json
dump/openpath_prod_usaid_laos_ev/Stage_updateable_models.metadata.json
dump/openpath_prod_usaid_laos_ev/Stage_analysis_timeseries.bson
dump/openpath_prod_usaid_laos_ev/Stage_pipeline_state.bson
dump/openpath_prod_usaid_laos_ev/Stage_timeseries.bson
…
2024-09-21T14:42:59.305+0000    [######################..]  openpath_prod_usaid_laos_ev.Stage_timeseries  2.31GB/2.42GB  (95.4%)
2024-09-21T14:43:02.307+0000    [#######################.]  openpath_prod_usaid_laos_ev.Stage_timeseries  2.34GB/2.42GB  (96.7%)
2024-09-21T14:43:05.305+0000    [#######################.]  openpath_prod_usaid_laos_ev.Stage_timeseries  2.36GB/2.42GB  (97.6%)
2024-09-21T14:43:08.314+0000    [#######################.]  openpath_prod_usaid_laos_ev.Stage_timeseries  2.41GB/2.42GB  (99.5%)
2024-09-21T14:43:09.263+0000    [########################]  openpath_prod_usaid_laos_ev.Stage_timeseries  2.42GB/2.42GB  (100.0%)
2024-09-21T14:43:09.264+0000    restoring indexes for collection openpath_prod_usaid_laos_ev.Stage_timeseries from metadata
2024-09-21T14:45:17.772+0000    finished restoring openpath_prod_usaid_laos_ev.Stage_timeseries (3494052 documents, 0 failures)
2024-09-21T14:45:17.776+0000    4441126 document(s) restored successfully. 0 document(s) failed to restore.
Database restore complete.
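The log shows the script printing the database name (openpath_prod_usaid_laos_ev) before restoring, and the archive unpacks to dump/<database>/. How load_mongodump.sh actually derives that name is not shown in this thread; one plausible approach, sketched here purely as an assumption, is to read the first directory under dump/ from the tarball listing. db_name_from_dump is a hypothetical name, and the tiny archive is a stand-in built on the fly.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: derive the database name from a mongodump tarball laid out
# as dump/<db_name>/<collection>.bson. Not necessarily how load_mongodump.sh does it.
set -euo pipefail

db_name_from_dump() {
  local archive="$1"
  # Print the second path component of the first dump/<db>/... entry; awk reads
  # the whole listing (no early exit), so tar never hits a broken pipe.
  tar -tzf "$archive" | awk -F/ '$1 == "dump" && $2 != "" && !seen { print $2; seen = 1 }'
}

# Build a tiny stand-in archive with the same layout as the one in the log above
work="$(mktemp -d)"
mkdir -p "$work/dump/openpath_prod_usaid_laos_ev"
touch "$work/dump/openpath_prod_usaid_laos_ev/Stage_timeseries.bson"
tar -czf "$work/snapshot.tar.gz" -C "$work" dump

DB_NAME="$(db_name_from_dump "$work/snapshot.tar.gz")"
echo "Database Name: $DB_NAME"
rm -rf "$work"
```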

@iantei
Contributor Author

iantei commented Sep 21, 2024

Re-executed the load_mongodump.sh script after the last commit.


2024-09-21T16:47:36.438+0000    [##################......]  openpath_prod_usaid_laos_ev.Stage_timeseries  1.85GB/2.42GB  (76.6%)
2024-09-21T16:47:39.438+0000    [###################.....]  openpath_prod_usaid_laos_ev.Stage_timeseries  1.93GB/2.42GB  (79.7%)
2024-09-21T16:47:42.438+0000    [###################.....]  openpath_prod_usaid_laos_ev.Stage_timeseries  2.01GB/2.42GB  (83.1%)
2024-09-21T16:47:45.439+0000    [####################....]  openpath_prod_usaid_laos_ev.Stage_timeseries  2.06GB/2.42GB  (85.1%)
2024-09-21T16:47:48.439+0000    [####################....]  openpath_prod_usaid_laos_ev.Stage_timeseries  2.09GB/2.42GB  (86.2%)
2024-09-21T16:47:51.438+0000    [####################....]  openpath_prod_usaid_laos_ev.Stage_timeseries  2.11GB/2.42GB  (87.4%)
2024-09-21T16:47:54.430+0000    [#####################...]  openpath_prod_usaid_laos_ev.Stage_timeseries  2.15GB/2.42GB  (88.7%)
2024-09-21T16:47:57.433+0000    [#####################...]  openpath_prod_usaid_laos_ev.Stage_timeseries  2.20GB/2.42GB  (91.0%)
2024-09-21T16:48:00.430+0000    [######################..]  openpath_prod_usaid_laos_ev.Stage_timeseries  2.23GB/2.42GB  (92.3%)
2024-09-21T16:48:03.430+0000    [######################..]  openpath_prod_usaid_laos_ev.Stage_timeseries  2.29GB/2.42GB  (94.6%)
2024-09-21T16:48:06.431+0000    [######################..]  openpath_prod_usaid_laos_ev.Stage_timeseries  2.31GB/2.42GB  (95.5%)
2024-09-21T16:48:09.430+0000    [#######################.]  openpath_prod_usaid_laos_ev.Stage_timeseries  2.35GB/2.42GB  (97.3%)
2024-09-21T16:48:12.430+0000    [#######################.]  openpath_prod_usaid_laos_ev.Stage_timeseries  2.39GB/2.42GB  (99.0%)
2024-09-21T16:48:14.801+0000    [########################]  openpath_prod_usaid_laos_ev.Stage_timeseries  2.42GB/2.42GB  (100.0%)
2024-09-21T16:48:14.801+0000    restoring indexes for collection openpath_prod_usaid_laos_ev.Stage_timeseries from metadata
2024-09-21T16:49:36.672+0000    finished restoring openpath_prod_usaid_laos_ev.Stage_timeseries (3494052 documents, 0 failures)
2024-09-21T16:49:36.674+0000    4441126 document(s) restored successfully. 0 document(s) failed to restore.
Database restore complete.

Looks good

@iantei
Contributor Author

iantei commented Sep 21, 2024

In the public dashboard, docker-compose.dev.yml uses DB_HOST=xx rather than the DB_HOST:xx form used in the op-admin dashboard.
458e0f8 updates docker-compose.dev.yml automatically with the correct DB_HOST.
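If the admin and public scripts are ever merged, a single substitution step could accept either form. This is a hypothetical helper, not code from either repository: update_db_host is an illustrative name, and the two one-line files are stand-ins for the real compose files. It applies both sed expressions in one pass; neither rewritten line contains the other pattern, so the second expression cannot re-match the first one's output.

```bash
#!/usr/bin/env bash
# Hypothetical helper: rewrite DB_HOST whether the compose file uses the public
# dashboard's "DB_HOST=value" form or the admin dashboard's "DB_HOST: value" form.
set -euo pipefail

update_db_host() {
  local config_file="$1" db_host="$2"
  # Expression 1 handles "DB_HOST=...", expression 2 handles "DB_HOST: ...".
  # -i.bak keeps a backup, as load_mongodump.sh does.
  sed -i.bak \
    -e "s|DB_HOST=.*|DB_HOST=$db_host|" \
    -e "s|DB_HOST:.*|DB_HOST: $db_host|" \
    "$config_file"
}

work="$(mktemp -d)"
printf '      - DB_HOST=db\n' > "$work/public.yml"
printf '      DB_HOST: mongodb://db/DB_NAME\n' > "$work/admin.yml"

update_db_host "$work/public.yml" "mongodb://db/openpath_prod_usaid_laos_ev"
update_db_host "$work/admin.yml" "mongodb://db/openpath_prod_usaid_laos_ev"

public_line="$(cat "$work/public.yml")"
admin_line="$(cat "$work/admin.yml")"
echo "$public_line"
echo "$admin_line"
rm -rf "$work"
```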

@iantei
Contributor Author

iantei commented Sep 21, 2024

Final testing:

2024-09-21T17:42:49.331+0000    [#######################.]  openpath_prod_usaid_laos_ev.Stage_timeseries  2.38GB/2.42GB  (98.2%)
2024-09-21T17:42:51.791+0000    [########################]  openpath_prod_usaid_laos_ev.Stage_timeseries  2.42GB/2.42GB  (100.0%)
2024-09-21T17:42:51.791+0000    restoring indexes for collection openpath_prod_usaid_laos_ev.Stage_timeseries from metadata
2024-09-21T17:44:18.794+0000    finished restoring openpath_prod_usaid_laos_ev.Stage_timeseries (3494052 documents, 0 failures)
2024-09-21T17:44:18.797+0000    4441126 document(s) restored successfully. 0 document(s) failed to restore.
Database restore complete.

Looks good

@shankari
Contributor

@iantei the test here is not just whether the load_mongodump.sh command runs; it will always run. But if it does not modify docker-compose.dev.yml correctly, the public dashboard will not be able to use the newly added data, because it will still try to use Stage_database, which will not exist.

Having said that, I see what you mean by the sed command around DB_HOST. I would suggest that, as part of testing, you report how the docker-compose.dev.yml was edited. I can then merge this change.

In a future commit, I think it would be helpful to unify the way that the environment variables are represented in the public and admin dashboards, but we can deal with that as part of e-mission/e-mission-docs#1082
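One way to report how docker-compose.dev.yml was edited: the script's sed -i.bak leaves the pre-edit file next to it as docker-compose.dev.yml.bak, so a unified diff of the two shows exactly what changed. This minimal sketch uses a temporary stand-in file instead of the real config; the database name is the one from the test logs in this thread.

```bash
#!/usr/bin/env bash
# Sketch: report the edit made to the compose file by diffing it against the
# .bak copy that "sed -i.bak" leaves behind. Uses a temp file, not the real config.
set -euo pipefail

work="$(mktemp -d)"
CONFIG_FILE="$work/docker-compose.dev.yml"
printf '    environment:\n      - DB_HOST=db\n      - STUDY_CONFIG=stage-program\n' > "$CONFIG_FILE"

DB_NAME="openpath_prod_usaid_laos_ev"
DB_HOST="mongodb://db/$DB_NAME"
# Same substitution as the public-dashboard script; -i.bak keeps the original
sed -i.bak "s|DB_HOST=.*|DB_HOST=$DB_HOST|" "$CONFIG_FILE"

# diff exits with status 1 when the files differ; "|| true" keeps set -e from aborting
edit_report="$(diff -u "$CONFIG_FILE.bak" "$CONFIG_FILE" || true)"
echo "$edit_report"
rm -rf "$work"
```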

@shankari
Copy link
Contributor

Please also indicate the diff between the scripts on the admin and public dashboards. I would not have caught that the entire check in #150 (comment) had been removed if it had not been flagged by a comment.

@iantei
Contributor Author

iantei commented Sep 22, 2024

> report how the docker-compose.dev.yml was edited.

In short, 458e0f8 handles assigning the right DB_HOST in the docker-compose.dev.yml file.

Detailed testing scenario:

  • Default docker-compose.dev.yml
services:
  notebook-server:
    environment:
      - DB_HOST=mongodb:db
      ...
      - STUDY_CONFIG=stage-program

Execution steps:


ashrest2-35384s:em-public-dashboard ashrest2$ docker-compose -f docker-compose.dev.yml build
ashrest2-35384s:em-public-dashboard ashrest2$ docker-compose -f docker-compose.dev.yml up
docker exec -it em-public-dashboard-db-1 mongo

show dbs
 openpath_prod_usaid_laos_ev  0.756GB

Detailed execution of load_mongodump.sh script:

Script Directory: viz_scripts/docker
Configuration File Path: viz_scripts/docker/../../docker-compose.dev.yml
MongoDump File Path: openpath-prod-usaid-laos-ev-snapshot-dec-20.tar.gz
Configuration file details:
-rw-r--r--  1 ashrest2  NREL_NT\Domain Users  1011 Sep 22 08:16 viz_scripts/docker/../../docker-compose.dev.yml
openpath_prod_usaid_laos_ev
Database Name: openpath_prod_usaid_laos_ev
Updated docker-compose file:
version: "3"
services:
  dashboard:
    image: em-pub-dash-dev/frontend
    build:
        context: frontend
        dockerfile: docker/Dockerfile.dev
    depends_on:
      - db
    ports:
      # DASH in numbers
      - "3274:6060"
    volumes:
      - ./frontend:/public
      - ./plots:/public/plots
    networks:
       - emission
  notebook-server:
    image: em-pub-dash-dev/viz-scripts
    build:
      context: viz_scripts
      dockerfile: docker/Dockerfile.dev
      args:
        SERVER_IMAGE_TAG: ${SERVER_IMAGE_TAG}
    depends_on:
      - db
    environment:
      - DB_HOST=mongodb://db/openpath_prod_usaid_laos_ev
      - WEB_SERVER_HOST=0.0.0.0
      - CRON_MODE=
      - STUDY_CONFIG=stage-program
    ports:
      # ipynb in numbers
      - "47962:8888"
    networks:
      - emission
    volumes:
      - ./viz_scripts:/usr/src/app/saved-notebooks
      - ./plots:/plots
  db:
    image: mongo:4.4.0
    volumes:
      - mongo-data:/data/db
    networks:
       - emission

networks:
  emission:

volumes:
  mongo-data:

Copying file to Docker container
Successfully copied 282MB to em-public-dashboard-db-1:/tmp
Clearing existing database
MongoDB shell version v4.4.0
connecting to: mongodb://127.0.0.1:27017/openpath_prod_usaid_laos_ev?compressors=disabled&gssapiServiceName=mongodb
Implicit session: session { "id" : UUID("aac1e2ee-fddf-4646-8c46-ce95560c08a2") }
MongoDB server version: 4.4.0
{ "dropped" : "openpath_prod_usaid_laos_ev", "ok" : 1 }
Restoring the dump from openpath-prod-usaid-laos-ev-snapshot-dec-20.tar.gz to database openpath_prod_usaid_laos_ev
dump/
dump/openpath_prod_usaid_laos_ev/
dump/openpath_prod_usaid_laos_ev/Stage_timeseries_error.metadata.json
dump/openpath_prod_usaid_laos_ev/Stage_analysis_timeseries.metadata.json
dump/openpath_prod_usaid_laos_ev/Stage_pipeline_state.metadata.json
dump/openpath_prod_usaid_laos_ev/Stage_usercache.metadata.json
dump/openpath_prod_usaid_laos_ev/Stage_push_token_mapping.metadata.json
...
2024-09-22T15:30:04.312+0000    [######################..]  openpath_prod_usaid_laos_ev.Stage_timeseries  2.26GB/2.42GB  (93.4%)
2024-09-22T15:30:07.311+0000    [######################..]  openpath_prod_usaid_laos_ev.Stage_timeseries  2.32GB/2.42GB  (95.7%)
2024-09-22T15:30:10.311+0000    [#######################.]  openpath_prod_usaid_laos_ev.Stage_timeseries  2.35GB/2.42GB  (97.0%)
2024-09-22T15:30:13.312+0000    [#######################.]  openpath_prod_usaid_laos_ev.Stage_timeseries  2.39GB/2.42GB  (98.9%)
2024-09-22T15:30:16.312+0000    [#######################.]  openpath_prod_usaid_laos_ev.Stage_timeseries  2.41GB/2.42GB  (99.5%)
2024-09-22T15:30:17.671+0000    [########################]  openpath_prod_usaid_laos_ev.Stage_timeseries  2.42GB/2.42GB  (100.0%)
2024-09-22T15:30:17.672+0000    restoring indexes for collection openpath_prod_usaid_laos_ev.Stage_timeseries from metadata
2024-09-22T15:31:36.181+0000    finished restoring openpath_prod_usaid_laos_ev.Stage_timeseries (3494052 documents, 0 failures)
2024-09-22T15:31:36.187+0000    4441126 document(s) restored successfully. 0 document(s) failed to restore.
Database restore complete.

  • After executing the load_mongodump.sh script, the docker-compose.dev.yml file was modified automatically:
services:
  notebook-server:
    environment:
      - DB_HOST=mongodb://db/openpath_prod_usaid_laos_ev
      ...
      - STUDY_CONFIG=stage-program

STUDY_CONFIG was manually changed to usaid-laos-ev.
Executed the following:

docker-compose -f docker-compose.dev.yml build
docker-compose -f docker-compose.dev.yml up
  • Launched the Jupyter notebook and executed the cells to generate the charts.
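The manual STUDY_CONFIG change could be made with the same kind of substitution the script already applies to DB_HOST. This is a hypothetical helper, not part of load_mongodump.sh: set_env_value is an illustrative name, the compose fragment is a temporary stand-in, and usaid-laos-ev is the value used in this test.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not in load_mongodump.sh): set a "KEY=value" entry in the
# compose file's environment list, the same way the script rewrites DB_HOST.
set -euo pipefail

set_env_value() {
  local config_file="$1" key="$2" value="$3"
  # key goes into the regex unescaped (fine for plain names like STUDY_CONFIG);
  # value must not contain "|" or "&", which are special in the replacement.
  sed -i.bak "s|$key=.*|$key=$value|" "$config_file"
}

work="$(mktemp -d)"
CONFIG_FILE="$work/docker-compose.dev.yml"
printf '    environment:\n      - DB_HOST=mongodb://db/openpath_prod_usaid_laos_ev\n      - STUDY_CONFIG=stage-program\n' > "$CONFIG_FILE"

set_env_value "$CONFIG_FILE" STUDY_CONFIG usaid-laos-ev
updated="$(cat "$CONFIG_FILE")"
echo "$updated"
rm -rf "$work"
```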

@iantei
Contributor Author

iantei commented Sep 22, 2024

> Please also indicate the diff between the scripts on the admin and public dashboards.

Known differences between the admin and public dashboards in the layout and contents of the docker-compose file and the load_mongodump.sh script:

A. The docker-compose dev file and load_mongodump.sh are in different locations in each repo:

  • Admin dashboard:
    • op-admin-dashboard/docker-compose-dev.yml
    • op-admin-dashboard/docker/load_mongodump.sh
  • Public dashboard:
    • em-public-dashboard/docker-compose.dev.yml
    • em-public-dashboard/viz_scripts/docker/load_mongodump.sh

B. Different names for the docker-compose dev file:

  • docker-compose-dev.yml - Admin Dashboard
  • docker-compose.dev.yml - Public Dashboard

C. Use of = vs : for assignment of environment variables:

  • Admin Dashboard
    - services:
        environment:
          DB_HOST:mongodb://db/DB_NAME
  • Public Dashboard
    - services:
        environment:
          DB_HOST=db

Differences in the load_mongodump.sh script between the admin and public dashboards:

A. Different relative path from the script to the configuration file:

  • Admin:
# Path to the configuration file (one level up)
CONFIG_FILE="$SCRIPT_DIR/../docker-compose-dev.yml"
  • Public:
# Path to the configuration file (two levels up)
CONFIG_FILE="$SCRIPT_DIR/../../docker-compose.dev.yml"

B. As noted above, the .yml files follow different naming conventions.

C. Different sed commands (DB_HOST: vs. DB_HOST=), because the admin and public dashboard .yml files use different assignment syntax:

  • Admin:
DB_HOST="mongodb://db/$DB_NAME"
sed -i.bak "s|DB_HOST:.*|DB_HOST: $DB_HOST|" "$CONFIG_FILE"
  • Public:
DB_HOST="mongodb://db/$DB_NAME"
sed -i.bak "s|DB_HOST=.*|DB_HOST=$DB_HOST|" "$CONFIG_FILE"

I tried to put the above information in a table, but the formatting didn't work as expected, so I listed the differences instead.

@shankari
Contributor

I was looking for the output of diff, not a hand-written list of differences that does not say how it was generated. If you produced it manually, it may miss some differences, and some of those may be important.

Regardless, this is an improvement on the current load script, and it works, so I am merging it in the spirit of incremental improvements. I would like to see the diff while starting the next round of cleanup.
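A sketch of how the requested diff could be generated mechanically rather than by hand. It assumes the two repositories are checked out side by side (a hypothetical layout) and, to stay self-contained, writes stand-in files holding only the excerpts quoted in the list of differences, so a diff of the real scripts would be longer.

```bash
#!/usr/bin/env bash
# Sketch: produce a unified diff between the admin and public load_mongodump.sh.
# Stand-in files hold only the excerpts quoted in this thread.
set -euo pipefail

work="$(mktemp -d)"
mkdir -p "$work/op-admin-dashboard/docker" "$work/em-public-dashboard/viz_scripts/docker"

cat > "$work/op-admin-dashboard/docker/load_mongodump.sh" <<'EOF'
# Path to the configuration file (one level up)
CONFIG_FILE="$SCRIPT_DIR/../docker-compose-dev.yml"
DB_HOST="mongodb://db/$DB_NAME"
sed -i.bak "s|DB_HOST:.*|DB_HOST: $DB_HOST|" "$CONFIG_FILE"
EOF

cat > "$work/em-public-dashboard/viz_scripts/docker/load_mongodump.sh" <<'EOF'
# Path to the configuration file (two levels up)
CONFIG_FILE="$SCRIPT_DIR/../../docker-compose.dev.yml"
DB_HOST="mongodb://db/$DB_NAME"
sed -i.bak "s|DB_HOST=.*|DB_HOST=$DB_HOST|" "$CONFIG_FILE"
EOF

# diff exits with status 1 when the files differ, which is expected here
script_diff="$(cd "$work" && diff -u \
  op-admin-dashboard/docker/load_mongodump.sh \
  em-public-dashboard/viz_scripts/docker/load_mongodump.sh || true)"
echo "$script_diff"
rm -rf "$work"
```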

@shankari shankari merged commit 3c01450 into e-mission:main Sep 23, 2024
Labels
None yet
Projects
Status: Tasks completed
3 participants