Merge pull request #19 from alphasentaurii/release/0.3.0

Release/0.3.0
spacetelescope · Feb 17, 2022 · adf9abd
2 parents 61c6fa6 + 52993ef
commit adf9abd
Showing 243 changed files with 7,894 additions and 7,556 deletions.
diff --git a/.gitignore b/.gitignore
@@ -13,14 +13,17 @@ __pycache__
 */version.py
 */cython_version.py
 htmlcov
+.benchmarks
 .coverage
 MANIFEST
 .ipynb_checkpoints
 .bashrc
 # Sphinx
 docs/api
 docs/_build
-
+# Pytest (local)
+tmp/
+tests/data/svm/prep/singlevisits/ibl738
 # Packages/installer info
 *.egg
 *.egg-info/
@@ -55,6 +58,9 @@ cal-data/
 # 2021-10-28-1635457222/
 # 2021-08-22-1629663047/
 */dashboard/svm/data/*
+2022-01-16-1642337739.zip
+2022-01-17-1642450556
+models
 # Mac OSX
 .DS_Store
 *.pyc

diff --git a/MANIFEST.IN b/MANIFEST.IN
@@ -1,7 +1,7 @@
 include pyproject.toml
 
 # Include pre-trained neural networks
-include spacekit/skopes/trained_networks/*.zip
+include spacekit/builder/trained_networks/*.zip
 
 # Include the README
 include *.md

diff --git a/README.md b/README.md
@@ -4,55 +4,11 @@
 ![GitHub repo size](https://img.shields.io/github/repo-size/alphasentaurii/spacekit)
 ![GitHub license](https://img.shields.io/github/license/alphasentaurii/spacekit?color=black)
 
+
 Astronomical Data Science and Machine Learning Toolkit
 
-```python
-spacekit
-└── spacekit
-    └── analyzer
-        └── compute.py
-        └── explore.py
-        └── scan.py
-        └── track.py
-    └── builder
-        └── networks.py
-    └── dashboard
-        └── cal
-            └── app.py
-        └── svm
-            └── app.py
-    └── datasets
-        └── hst_cal.py
-        └── hst_svm.py
-        └── k2_exo.py
-    └── extractor
-        └── load.py
-        └── radio.py
-        └── scrape.py
-    └── generator
-        └── augment.py
-        └── draw.py
-    └── preprocessor
-        └── encode.py
-        └── scrub.py
-        └── transform.py
-    └── skopes
-        └── hst
-            └── cal
-                └── train.py
-            └── svm
-                    └── corrupt.py
-                    └── predict.py
-                    └── prep.py
-                    └── train.py
-        └── kepler
-            └── light_curves.py
-        └── trained_networks
-└── setup.py
-└── tests
-└── LICENSE
-└── README.md
-```
+
+![ML Dashboard](./previews/neural-network-graph.png)
 
 ## Setup
 
@@ -70,99 +26,123 @@ $ cd spacekit
 $ pip install -e .
 ```
 
-## Run
 
-**Example: HST Single Visit Mosaic Alignment Classification**
+### Pre-Trained Neural Nets
 
-### Classify new data using pre-trained model:
+**Single Visit Mosaic Alignment (HST)**
 
-1. Preprocess data (scrape from regression test json and fits files, scrub/preprocess dataframe, generate png images for ML)
+[SVM Docs](https://spacekit.readthedocs.io/en/latest/skopes/hst/svm.html)
 
-***from the command line***
+* Preprocessing: ``spacekit.skopes.hst.svm.prep``
+* Predict Image Alignments: ``spacekit.skopes.hst.svm.predict``
+* Train Ensemble Classifier: ``spacekit.skopes.hst.svm.train``
+* Generate synthetic misalignments†: ``spacekit.skopes.hst.svm.corrupt``
+
+*† requires Drizzlepac*
+
+**Calibration Data Pipeline (HST)**
 
-```bash
-$ python -m spacekit.skopes.hst.svm.prep path/to/svmdata -f=svm_data.csv
-```
+[CAL Docs](https://spacekit.readthedocs.io/en/latest/skopes/hst/cal.html)
 
-***from python***
+* ``spacekit.skopes.hst.cal.train``
 
-```python
-from spacekit.skopes.hst.svm.prep import run_preprocessing
-input_path = "/path/to/svm/datasets"
-fname = run_preprocessing(input_path)
-print(fname)
-# svm_data.csv
-
-# This is equivalent to using the default kwargs:
-fname = run_preprocessing(input_path, h5=None, fname="svm_data", output_path=None, json_pattern="*_total*_svm_*.json", crpt=0, draw_images=1)
-print(fname)
-# default is "svm_data.csv"; customize filename and location using kwargs `fname` and `output_path`
-```
 
-Outputs:
-* svm_data.csv
-* raw_svm_data.csv
-* svm_data.h5
-* img/
+**Exoplanet Detection with time-series photometry (K2, TESS)**
 
-2. Generate predictions
+[K2 Docs](https://spacekit.readthedocs.io/en/latest/skopes/kepler/light-curves.html)
 
-***from the command line***
+* ``spacekit.skopes.kepler.light_curves``
 
-```bash
-$ python -m spacekit.skopes.hst.svm.predict svm_data.csv img
-```
 
-***from python***
+### Customizable Model Building Classes
 
-```python
-from spacekit.skopes.hst.svm.predict import predict_alignment
-data_file = "svm_data.csv" # same as `fname` returned in `prep.py` above
-img_path = "img" # default image foldername created above
-predict_alignment(data_file, img_path)
+Build, train and experiment with multiple model iterations using the ``builder.architect.Builder`` classes
 
-# This is equivalent to using the default kwargs:
-predict_alignment(data_file, img_path, model_path=None, output_path=None, size=None)
+Example: Build and train an MLP and 3D CNN ensemble network
+
+- continuous/encoded data for the multi-layer perceptron
+- 3 RGB image "frames" per image input for the CNN
+- Stack mixed inputs and use the outputs of MLP and CNN as inputs for the final ensemble model
+
+```python
+ens = BuilderEnsemble(XTR, YTR, XTS, YTS, name="svm_ensemble")
+ens.build()
+ens.batch_fit()
+
+# Save Training Metrics
+outputs = f"data/{date_timestamp}"
+com = ComputeBinary(builder=ens, res_path=f"{outputs}/results/test")
+com.calculate_results()
 ```
+# Load and plot metrics to evaluate and compare model performance
 
-Outputs:
-* predictions/
-    * clf_report.txt
-    * compromised.txt
-    * predictions.csv
+Analyze and compare results across iterations from metrics saved using ``analyze.compute.Computer`` class objects. Almost all plots are made using plotly and are dynamic/interactive.
 
-----
+```python
+# Load data and metrics
+from spacekit.analyzer.scan import MegaScanner
+res = MegaScanner(perimeter="data/2022-*-*-*")
+res._scan_results()
+```
 
-### Build, train, evaluate new classifier from labeled data
+![ROC](./previews/roc-auc.png)
 
-Run step 1 (prep) above, then:
+![Eval](./previews/model-performance.png)
 
-***from the command line***
 
-```bash
-# Note: there are several option flags you can also include in this command
-$ python -m spacekit.skopes.hst.svm.train svm_data.csv img
-```
+### Preprocessing and Analysis Tools for Space Telescope Instrument Data
 
-***from Python***
+![box](./previews/eda-box-plots.png)
 
 ```python
-# import spacekit training submodule
-from spacekit.skopes.hst.svm.train import run_training
-
-training_data = "svm_data.csv" # preprocessed dataframe (see step 1 above)
-img_path = "img" # preprocessed PNG image files (see step 1 above)
+from spacekit.analyzer.explore import HstCalPlots
+res.load_dataframe()
+hst = HstCalPlots(res.df, group="instr")
+hst.scatter
+```
 
-run_training(training_data, img_path)
+![scatter](./previews/eda-scatterplots.png)
 
-# This is the same as using the default kwargs
-com, val = run_training(
-    training_data, img_path, synth_data=None, norm=0, model_name=None, params=None, output_path=None
-)
 
-# Optional: view plots
-com.draw_plots()
-val.draw_plots()
+```python
+spacekit
+└── spacekit
+    └── analyzer
+        └── compute.py
+        └── explore.py
+        └── scan.py
+        └── track.py
+    └── builder
+        └── architect.py
+        └── blueprints.py
+    └── dashboard
+    └── datasets
+    └── extractor
+        └── load.py
+        └── radio.py
+        └── scrape.py
+    └── generator
+        └── augment.py
+        └── draw.py
+    └── preprocessor
+        └── encode.py
+        └── scrub.py
+        └── transform.py
+    └── skopes
+        └── hst
+            └── cal
+            └── svm
+                └── corrupt.py
+                └── predict.py
+                └── prep.py
+                └── train.py
+        └── kepler
+        └── trained_networks
+└── setup.py
+└── tests
+└── docker
+└── LICENSE
+└── README.md
 ```
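
Note on the `perimeter="data/2022-*-*-*"` argument in the README hunk above: the value looks like a shell-style glob over date-timestamped result directories, the same naming seen in the `.gitignore` hunk (for example `2022-01-17-1642450556`). How `MegaScanner` actually resolves the pattern is not shown in this diff, so the following is only a standard-library sketch of glob matching. The helper `select_runs` and the rule that skips `.zip` archives are assumptions made for illustration, not spacekit behavior.

```python
from fnmatch import fnmatch

# Names in the date-timestamp style that appears in this diff's .gitignore hunk.
candidates = [
    "2021-10-28-1635457222",
    "2022-01-16-1642337739.zip",
    "2022-01-17-1642450556",
    "models",
]


def select_runs(names, pattern="2022-*-*-*"):
    """Return the names matching a shell-style pattern, skipping zip archives.

    Hypothetical helper: spacekit's own selection logic may differ.
    """
    return sorted(
        name for name in names
        if fnmatch(name, pattern) and not name.endswith(".zip")
    )


selected = select_runs(candidates)
print(selected)
```

Because `*` matches any run of characters, the zipped archive would also match the pattern, which is why the sketch filters it out explicitly.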