The EnviroMetaAnalysis study queries journal articles published between 2013 and 2023 in a subset of environmental science journals to explore trends in environmental research around the world. The article metadata were gathered from OpenAlex, stored on a MongoDB server, and analyzed with Python in Jupyter notebooks.
This project is designed to support the International Panel on Chemical Pollution, established as part of United Nations Environment Assembly resolution 5/8. This work is completed through the Indiana University FADS Summer 2023 program for the Luddy School of Informatics, Computing, and Engineering.
The src folder contains a script, alex2mongo.py, that queries journal article metadata from OpenAlex and writes it to a local MongoDB server, one document per article. To save time and start working with the dataset directly, follow these steps:
- Install Docker
- Acquire db.tar.gz from the project admin and save it to the ./data/ directory
- From the ./data/ directory, run the following command to extract the MongoDB database files:
    tar xzf db.tar.gz db
- Create and start the container with the following command from the root EnviroMetaAnalysis directory:
    docker run -d --name mongodb_fads \
      -v <absolute-path-to-repo>/EnviroMetaAnalysis/data/db:/data/db \
      -p 27017:27017 mongo:latest
This will create and run a Docker container running MongoDB with the queried data from OpenAlex. The data is reachable at this URI: mongodb://localhost:27017, in the journals collection of the OpenAlexEnvironmental database.
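Once the container is running, the collection can be read from Python. The following is a minimal sketch, assuming `pymongo` is installed and that each stored document keeps the OpenAlex `publication_year` field. The document schema is not described in this README, so that field name and the helper names `get_collection`, `count_by_year`, and `main` are illustrative assumptions.

```python
from collections import Counter

MONGO_URI = "mongodb://localhost:27017"
DB_NAME = "OpenAlexEnvironmental"
COLLECTION_NAME = "journals"


def get_collection(uri=MONGO_URI):
    """Return the journals collection (requires `pip install pymongo`)."""
    # Imported lazily so count_by_year can be used without pymongo installed.
    from pymongo import MongoClient

    client = MongoClient(uri, serverSelectionTimeoutMS=5000)
    return client[DB_NAME][COLLECTION_NAME]


def count_by_year(docs):
    """Count documents per publication year.

    Assumption: documents carry a `publication_year` key, as OpenAlex
    works do. Documents without it are skipped.
    """
    return Counter(
        doc["publication_year"] for doc in docs if "publication_year" in doc
    )


def main():
    """Print the collection size and the number of articles per year."""
    collection = get_collection()
    print("documents:", collection.estimated_document_count())
    # Project only the field we need to keep the transfer small.
    cursor = collection.find({}, {"publication_year": 1, "_id": 0})
    for year, n in sorted(count_by_year(cursor).items()):
        print(year, n)
```

Calling `main()` with the container up should print one line per year. If the connection fails, check that port 27017 is published and that the `./data/db` volume was extracted before the container started.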
flowchart TB
subgraph G["Data Source"]
oa[("OpenAlex\nAPI")]
end
oa--query OpenAlex--> inbound
inbound[[./src/alex2mongo.py]]
inbound--write new records-->B
compose[docker run]--create container-->B
subgraph B["Docker Container"]
mongo[("Mongo Server\nlocalhost:27017")]
volume(["MongoDB Volume\n./data/db"])
volume<-->mongo
end
B--query MongoDB-->C;
subgraph C["Visualizations"]
stats(("Statistics\nWordCloud\nPlots and Graphs"))
end
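
The inbound path in the flowchart (query OpenAlex, then write new records to MongoDB) can be sketched as below. This is not the actual alex2mongo.py, whose journal list and document schema are not shown here. It is an illustration that assumes the OpenAlex `/works` endpoint with cursor paging, filtering by a single journal source ID and by the 2013 to 2023 date range. The function names and the example source ID are hypothetical.

```python
import json
import urllib.parse
import urllib.request

OPENALEX_WORKS_URL = "https://api.openalex.org/works"


def build_works_url(source_id, cursor="*", per_page=200):
    """Build a cursor-paged OpenAlex works URL for one journal, 2013 to 2023."""
    filters = ",".join([
        f"primary_location.source.id:{source_id}",
        "from_publication_date:2013-01-01",
        "to_publication_date:2023-12-31",
        "type:article",
    ])
    query = urllib.parse.urlencode(
        {"filter": filters, "per-page": per_page, "cursor": cursor}
    )
    return f"{OPENALEX_WORKS_URL}?{query}"


def fetch_works(source_id):
    """Yield every matching work, following OpenAlex cursor pagination."""
    cursor = "*"
    while cursor:
        with urllib.request.urlopen(build_works_url(source_id, cursor)) as resp:
            page = json.load(resp)
        yield from page["results"]
        # next_cursor is null on the last page, which ends the loop.
        cursor = page["meta"].get("next_cursor")


def store_works(collection, works):
    """Upsert each work keyed on its OpenAlex id; return how many were processed.

    `collection` is any object with a pymongo-style replace_one method.
    Upserting means a rerun overwrites existing records instead of
    duplicating them.
    """
    processed = 0
    for work in works:
        collection.replace_one({"id": work["id"]}, work, upsert=True)
        processed += 1
    return processed
```

With the container running, a call such as `store_works(collection, fetch_works("S123"))` would load one journal, where `collection` is a pymongo collection for `OpenAlexEnvironmental.journals` and `"S123"` stands in for a real OpenAlex source ID.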