Merge remote-tracking branch 'origin/0.1.x' into dev-Ife
ifeoluwaale committed Dec 25, 2023
2 parents 083cc0e + 9b6bf02 commit 17166ab
Showing 14 changed files with 8,579 additions and 142 deletions.
1 change: 1 addition & 0 deletions .gitignore
@@ -136,3 +136,4 @@ dmypy.json
# GitHub Copilot
"GitHub Copilot"
tests/data/.DS_Store
notebooks/spline_testing.ipynb
160 changes: 95 additions & 65 deletions README.md
@@ -1,49 +1,79 @@
# dplPy
The Dendrochronology Program Library for Python
<p align="center">
<img src="docs/assets/dplpy.png" width="175">

# dplPy - the Dendrochronology Program Library in Python
The Dendrochronology Program Library (DPL) in Python has its roots in both the [original FORTRAN program](https://www.ltrr.arizona.edu/software.html) created by the [legendary Richard Holmes](https://arizona.aws.openrepository.com/handle/10150/262569?show=full) and the subsequent R Project package by Andy Bunn, [dplR](https://github.com/OpenDendro/dplR). Our aim is to provide researchers working with tree-ring data the necessary tools in open-source environments, promoting open science practices, enhancing rigor and transparency in dendrochronology, and eventually allowing reproducible research entirely in a single programming language.

The development of dplPy is supported by a grant from the Paleoclimate program of the US National Science Foundation (AGS-2054516) to Andy Bunn, Kevin Anchukaitis, Ed Cook, and Tyson Swetnam.
<br>


---


## Index

- [Issues](#issues)
- [Requirements](#requirements)
- [Building Environment](#building-environment)
- [Using jupyter](#using-jupyter)
- [Linux, macOS](#linux-macos)
- [dplPy - the Dendrochronology Program Library in Python](#dplpy---the-dendrochronology-program-library-in-python)
- [Index](#index)
- [Requirements](#requirements)
- [Current Version and Changelog](#current-version-and-changelog)
- [Installation](#installation)
- [Building directly from Github](#building-directly-from-github)
- [Using VSCode in your operating system](#using-vscode-in-your-operating-system)
- [Linux or MacOS](#linux-or-macos)
- [Windows](#windows)
- [Functionalities and usage](#functionalities-and-usage)
- [Loading data](#loading-data)
- [Data Summary](#data-summary)
- [Data Stastics](#data-stastics)
- [Data Report](#data-report)
- [Functionalities and Usage](#functionalities-and-usage)
- [Loading data using `readers`](#loading-data-using-readers)
- [Data Summary from `summary`](#data-summary-from-summary)
- [Data Stastics from `stats`](#data-stastics-from-stats)
- [Data Report from `report`](#data-report-from-report)
- [Plotting](#plotting)
- [Detrending using `detrend`](#detrending-using-detrend)
- [Autoregressive (AR) modeling](#autoregressive-ar-modeling)
- [Detrending](#detrending)
- [Chronology](#chronology)

---

## Issues

We're using [ZenHub](https://app.zenhub.com/workspaces/opendendro-60ec698d8790d700171ceee8/board?repos=385244315) to manage our [GitHub Issues](https://github.com/opendendro/dplpy/issues)
- [Build a chronology with `chron`](#build-a-chronology-with-chron)
- [Crossdate with `xdate`](#crossdate-with-xdate)

---

## Requirements

:warning: **Note**: DplPy has been successfully tested on Ubuntu 20, Ubuntu 22, macOS (Intel, M2).

- Python (>=3.10)
- Conda ([Anaconda](https://docs.anaconda.com/anaconda/install/index.html) or [Miniconda](https://docs.conda.io/projects/continuumio-conda/en/latest/user-guide/install/index.html))
- (Suggested) [Mamba](https://mamba.readthedocs.io/en/latest/installation.html)
- (Suggested) [VSCode](https://code.visualstudio.com/)

## Building Environment
Under the hood, dplPy uses `numpy`, `pandas`, `matplotlib`, `statsmodels`, `scipy`, and `csaps`.

:warning: dplPy has been successfully tested thus far on Ubuntu 20, Ubuntu 22, macOS (Intel and M2). Other operating systems may experience unexpected errors or conflicts. Please let the developers know.

## Current Version and Changelog

dplPy is currently at version `v0.1.1` - Changes and new functions are currently merged into the `0.1.x` branch.

## Installation

dplPy is now available to [install via pip](https://pypi.org/project/dplpy/):

```
pip install dplpy
```

You can install a conda virtual environment using the [environment.yml for the project](https://github.com/OpenDendro/dplPy/blob/main/environment.yml):

```
$ conda env create -f environment.yml
```

---

> :warning: **it is recommended to _NOT_ use GitHub Codespaces (as of Mar 2022)**

## Building directly from Github

You can still install dplPy directly from GitHub if you wish:

1\. Clone and change directory to this repository


```
$ git clone https://github.com/OpenDendro/dplPy.git
$ cd dplPy
@@ -69,36 +99,25 @@ $ conda activate dplpy

Your environment should be successfully built.

4\. Your python environment should be able to import `numpy`, `pandas`, `matplotlib`, `statsmodels` and `csaps`:

![env_3](docs/assets/env_3.png)
4\. Your python environment should be able to import `numpy`, `pandas`, `matplotlib`, `statsmodels` and `csaps`.

---

## Using Jupyter
## Using VSCode in your operating system

The Conda environment is essential, as it provides all necessary packages. To execute the code, use Jupyter Notebook.

:warning: **Note**: if using Jupyter from the terminal, you need to ensure that the kernel is findable by doing the following command once the environment is active:

```
python -m ipykernel install --user --name dplpy --display-name "Python (dplpy)"
```

### Linux, MacOS
### Linux or MacOS

1\. In your VSCode terminal, activate the conda environment with `conda activate dplpy3`.
2\. Open a Jupyter Notebook (`<file>.ipynb`) and select the `dplpy3` Kernel when prompted (or from the top right of your screen).
This will automatically load the environment we created.

2\. Open a Jupyter Notebook (`<file>.ipynb`) and select the `dplpy3` Kernel when prompted (or from the top right of your screen). This will automatically load the environment we created.

### Windows

In VSCode:

1\. In your VSCode terminal window, activate the conda environment with `conda activate dplpy3`.
2\. In the same terminal window, start a Jupyter Notebook with `jupyter notebook`. Jupyter will then return URLs that you can copy; *Copy* one of these URLs.

![ipynb_env1](docs/assets/ipynb_env1.jpg)
2\. In the same terminal window, start a Jupyter Notebook with `jupyter notebook`. Jupyter will then return URLs that you can copy; *Copy* one of these URLs.

3\. Open a Jupyter Notebook (`<file>.ipynb`) and from the **bottom right** of the VSCode screen, click **Jupyter Server**;

@@ -114,15 +133,22 @@ A dropdown menu will open from the top of the screen: select Existing and *paste

## Functionalities and Usage

Import the DplPy tool with
Import the dplPy tool with

```
import dplpy as dpl
```

This will load the necessary functions.
or alternatively:

```
import dplpy
```

This will load the package and its functions.


### Loading data
### Loading data using `readers`

- Description: reads data from supported file types (`csv` and `rwl`) and stores them in a dataframe.
- Options:
@@ -132,7 +158,7 @@ This will load the necessary functions.
>>> data = dpl.readers("/path/to/file.rwl", header=True)
```

### Data Summary
### Data Summary from `summary`

- Description: generates a summary of each series recorded in `rwl` and `csv` format files
- Usage Example:
@@ -142,7 +168,7 @@ This will load the necessary functions.
>>> dpl.summary(data)
```

### Data Stastics
### Data Stastics from `stats`

- Description: generates summary statistics for `rwl` and `csv` format files
- Usage Example:
@@ -152,7 +178,7 @@ This will load the necessary functions.
>>> dpl.stats(data)
```

### Data Report
### Data Report from `report`

- Description: generates a report about absent rings in the data set
- Usage Example:
@@ -181,23 +207,8 @@ This will load the necessary functions.
>>> dpl.plot(data[[SERIES_1, SERIES_2, SERIES_3]], type="spag")
```

### Autoregressive (AR) modeling

- Description: Contains methods that fit series to autoregressive models and perform functions related to AR modeling.
- Functions:
  - `autoreg(data['Name of series'], max_lag)`: returns the parameters of the best-fit AR model, with a maximum lag of 5 (default) or another specified number
  - `ar_func(data['Name of series'], max_lag)`: returns the residuals plus the mean of the best-fit AR model, with a maximum lag of 5 (default) or another specified number
- Options:
  - `max_lag`: default 5, can be set to suit the user's needs.
- Usage Example:
```
>>> dpl.autoreg(data[SERIES_1])
# or
>>> dpl.ar_func(data[SERIES_2], max_lag=7)
```

### Detrending

### Detrending using `detrend`

- Description: Detrends a given series or data frame, first by fitting data to curve(s), and then by calculating residuals or differences compared to the original data.
- Options:
- `fit="spline"`: default detrending method.
@@ -218,7 +229,23 @@ This will load the necessary functions.
>>> dpl.detrend(data[[SERIES_1, SERIES_2, SERIES_3]], fit="Hugershoff", method="difference")
```

### Chronology

### Autoregressive (AR) modeling

- Description: Contains methods that fit series to autoregressive models and perform functions related to AR modeling.
- Functions:
  - `autoreg(data['Name of series'], max_lag)`: returns the parameters of the best-fit AR model, with a maximum lag of 5 (default) or another specified number
  - `ar_func(data['Name of series'], max_lag)`: returns the residuals plus the mean of the best-fit AR model, with a maximum lag of 5 (default) or another specified number
- Options:
  - `max_lag`: default 5, can be set to suit the user's needs.
- Usage Example:
```
>>> dpl.autoreg(data[SERIES_1])
# or
>>> dpl.ar_func(data[SERIES_2], max_lag=7)
```

### Build a chronology with `chron`

- Description: creates a mean value chronology for a dataset, typically the ring width indices of a detrended series. **Note: input data has to be detrended first.**
- Options:
@@ -233,3 +260,6 @@ This will load the necessary functions.
# Perform chronology
>>> dpl.chron(rwi_data, biweight=False, plot=False)
```
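
As a rough illustration of what a mean value chronology is, the sketch below averages index values across series for each year using plain Python. It does not call dplPy: the function name `mean_value_chronology` and the dict-based input are assumptions made for this example, and it shows only a plain arithmetic mean, not a robust (biweight) mean.

```python
from statistics import fmean


def mean_value_chronology(rwi_by_series):
    """Average index values across series for each year, skipping gaps (None)."""
    years = sorted({year for series in rwi_by_series.values() for year in series})
    chronology = {}
    for year in years:
        values = [
            series[year]
            for series in rwi_by_series.values()
            if series.get(year) is not None
        ]
        # A year with no usable values stays empty rather than becoming 0.
        chronology[year] = fmean(values) if values else None
    return chronology


# Illustrative ring width indices for two overlapping series.
rwi = {
    "A": {1900: 1.0, 1901: 0.8, 1902: 1.2},
    "B": {1901: 1.0, 1902: 0.8, 1903: 1.1},
}
chronology = mean_value_chronology(rwi)
```

Years covered by only one series simply take that series' value, so sample depth varies along the chronology.
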
### Crossdate with `xdate`
- Description: evaluates the dating accuracy of a set of tree-ring measurements
- Options:
Binary file added docs/assets/dplpy.png
Binary file added docs/assets/nsf.png
30 changes: 0 additions & 30 deletions docs/csv_var.py

This file was deleted.

1 change: 0 additions & 1 deletion dplpy/__init__.py
@@ -28,7 +28,6 @@
import os
import sys

os.chdir(os.path.dirname(os.path.realpath(__file__)))
lpath = os.path.dirname(os.path.realpath(__file__))
sys.path.append(lpath)

4 changes: 2 additions & 2 deletions dplpy/detrend.py
@@ -45,12 +45,12 @@ def detrend(data: pd.DataFrame | pd.Series, fit="spline", method="residual", plo
        res = pd.DataFrame(index=pd.Index(data.index))
        to_add = [res]
        for column in data.columns:
            to_add.append(detrend_series(data[column], column, fit, method, plot, period=None))
            to_add.append(detrend_series(data[column], column, fit, method, plot, period))
        output_df = pd.concat(to_add, axis=1)
        return output_df.rename_axis(data.index.name)

    elif isinstance(data, pd.Series):
        return detrend_series(data, data.name, fit, method, plot)
        return detrend_series(data, data.name, fit, method, plot, period)
    else:
        raise TypeError("argument should be either pandas dataframe or pandas series.")
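
The hunk above changes both call sites so that `period` is passed through to `detrend_series` instead of being overwritten with `None` or omitted. A stand-alone sketch of that pattern follows; the functions are hypothetical stand-ins, not dplPy's actual code.

```python
def process_series(values, period=None):
    """Stand-in worker: reports which period it was actually given."""
    return {"n": len(values), "period": period}


def wrapper_dropping(values, period=None):
    # Bug pattern: the caller's `period` never reaches the worker.
    return process_series(values, period=None)


def wrapper_forwarding(values, period=None):
    # Fixed pattern: the argument is forwarded unchanged.
    return process_series(values, period)


dropped = wrapper_dropping([1.0, 2.0, 3.0], period=30)
forwarded = wrapper_forwarding([1.0, 2.0, 3.0], period=30)
```

In the first wrapper the caller's value of 30 is silently replaced by the worker's default, which is the kind of failure that raises no error and is easy to miss without a test.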

80 changes: 40 additions & 40 deletions dplpy/new.ipynb

Large diffs are not rendered by default.

2 changes: 1 addition & 1 deletion dplpy/xdate.py
@@ -101,7 +101,7 @@ def xdate(data: pd.DataFrame, prewhiten=True, corr="Spearman", slide_period=50,

# evaluation of current series vs chronology of others by segments of years (the bins created earlier)
for range in bins:
print(range)
# print(range) # useful for debugging but not necessary once operational
start = int(re.split("(?<=\d)-", range)[0])
end = int(re.split("(?<=\d)-", range)[1])
if start >= removed.first_valid_index() and end <= removed.last_valid_index():
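
The two `re.split` calls above use a lookbehind so that only a hyphen directly following a digit acts as the separator, which leaves a leading minus sign on a negative year intact. Below is a minimal sketch of the same parsing, assuming bin labels of the form `"start-end"`. The helper name `parse_bin` is hypothetical, and the pattern is written as a raw string so that `\d` is not interpreted as a string escape.

```python
import re

# Split on a hyphen only when it directly follows a digit.
_BIN_SEPARATOR = re.compile(r"(?<=\d)-")


def parse_bin(label):
    """Return (start, end) as integers from a label such as '1900-1949'."""
    start, end = _BIN_SEPARATOR.split(label, maxsplit=1)
    return int(start), int(end)


positive = parse_bin("1900-1949")
negative = parse_bin("-50--1")
```

Splitting once and unpacking both parts also avoids running the same regular expression twice per bin.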