Energy Draw Tracker

Track and monitor the energy draw of model-training and model-inference experiments on GPUs and CPUs.

Summary

The carbon footprint caused by the energy GPUs and CPUs consume during model training and model inference can be reduced if that consumption is properly tracked and measures are taken to reduce it. This tool monitors and logs GPU/CPU usage during model training and model inference.
Technical Overview

The Energy Draw Tool provides the following features:

- Easy-to-use plugin for all your experiments, requiring very few lines of code.
- Can be used as a callback in your TensorFlow/PyTorch/Keras experiments.
- Usage monitoring with support for multiple devices:
  - GPU
  - CPU: Intel and Apple silicon chips
- Tracks the combined energy draw of experiments run across distributed machines.
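To make the plugin idea concrete, here is a minimal sketch of what such a tracker could look like: sample power periodically while an experiment runs and integrate it into energy. The `EnergyTracker` name, its methods, and the constant 50 W reader are all illustrative assumptions, not the tool's actual API.

```python
import time

class EnergyTracker:
    """Hypothetical sketch of the plugin interface: sample power while an
    experiment runs and integrate it into total energy (joules)."""

    def __init__(self, read_power_watts):
        # read_power_watts: callable returning the device's current draw in W.
        self.read_power_watts = read_power_watts
        self.energy_joules = 0.0
        self._last = None

    def start(self):
        self._last = time.monotonic()

    def sample(self):
        # Energy accumulated = power x elapsed time since the previous sample.
        now = time.monotonic()
        self.energy_joules += self.read_power_watts() * (now - self._last)
        self._last = now

    def stop(self):
        self.sample()
        return self.energy_joules

# Usage with a stubbed, constant 50 W power reader:
tracker = EnergyTracker(read_power_watts=lambda: 50.0)
tracker.start()
time.sleep(0.1)          # stands in for a training or inference step
joules = tracker.stop()  # roughly 50 W x 0.1 s = ~5 J
```

A framework callback (e.g. a Keras `Callback`) could call `sample()` at epoch boundaries and `stop()` at the end of training.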
Rationale

The proposed design is chosen over other designs for a number of reasons:

- Some existing designs do not support the new Apple M-series Mac processors.
- Existing designs cannot monitor an experiment that runs across different machines; the proposed design reports a combined output from multiple machines.
- It is easier to use as a callback in your experiments.
- It can unify multiple solutions, so that more categories of devices can be supported.
Alternatives

One of the best alternative approaches to this design is CodeCarbon, but the following issues arise when running with CodeCarbon:

- CodeCarbon does not support the Apple M-series Mac processors.
- Running experiments on distributed machines requires the CodeCarbon API to be called on every single one of them.
Drawbacks

Implementation would require testing on multiple machines, so the cost of testing would be higher.
Useful References
What similar work have we already successfully completed?
Is this something that has already been built by others? No.
Is there useful academic literature or are there other articles related to this topic? (Provide links.)
Have we built a relevant prototype previously? No.
Do we have a rough mock for the UI/UX? No.
Do we have a schematic for the system? No.
Unresolved Questions
Parts of the System Affected
Future possibilities
Infrastructure

- Detect machine details
- Run energy tracking
- Log the output
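The machine-detection step could look like the following best-effort sketch, which uses only standard-library checks; the returned keys and the heuristic of treating `nvidia-smi` on `PATH` as evidence of an NVIDIA GPU are assumptions for illustration.

```python
import platform
import shutil

def detect_machine():
    """Best-effort sketch of the 'detect machine details' step
    (assumed behaviour, not the tool's actual implementation)."""
    return {
        "os": platform.system(),        # e.g. 'Linux', 'Darwin'
        "arch": platform.machine(),     # e.g. 'x86_64', 'arm64'
        "processor": platform.processor(),
        # Assumption: an NVIDIA GPU is reachable when nvidia-smi is on PATH.
        "nvidia_gpu": shutil.which("nvidia-smi") is not None,
        # Apple silicon identifies itself as Darwin on arm64.
        "apple_silicon": platform.system() == "Darwin"
                         and platform.machine() == "arm64",
    }

details = detect_machine()
```

The result can then be used to pick the right energy backend (e.g. NVML/nvidia-smi for NVIDIA GPUs, RAPL for Intel CPUs).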
Testing

The testing procedure can be done in the following steps:

- A TensorFlow example for model training
- A TensorFlow example for model inference
- A Talos example for hyperparameter tuning
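All three test steps could share one small harness that runs an experiment callable under the tracker and reports wall time and integrated energy. The harness below is a sketch under assumptions: the function name is hypothetical, and the constant 45 W reader stands in for a real power source.

```python
import time

def run_under_tracker(experiment, read_power_watts=lambda: 45.0):
    """Hypothetical test harness: run one experiment (training, inference,
    or tuning) and report elapsed time and integrated energy."""
    start = time.monotonic()
    experiment()                      # e.g. model.fit(...), model.predict(...)
    elapsed = time.monotonic() - start
    return {"seconds": elapsed, "joules": read_power_watts() * elapsed}

# Each test step plugs in its own experiment callable; a short sleep
# stands in for the TensorFlow/Talos example here:
report = run_under_tracker(lambda: time.sleep(0.05))
```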
Documentation

Describe the level of documentation that fulfilling this request involves. Consider both end-user documentation and developer documentation.

- End-user documentation
- Developer documentation
Version History

Version 0.0.1

Recordings

Work Phases

Non-Coding

Implementation
- API
  - GPU tracking, using the nvidia-smi command's features. References:
    - power draw callback
    - GpuStat
  - CPU tracking. References:
    - PyRAPL
    - EnergyUsage
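The two API backends referenced above could be sketched as follows. The `nvidia-smi` query flags (`--query-gpu=power.draw --format=csv,noheader,nounits`) and the Linux RAPL sysfs counter (`/sys/class/powercap/intel-rapl:0/energy_uj`, the same counter PyRAPL reads) are real interfaces; the function names and the graceful `None` fallbacks are assumptions for illustration.

```python
import subprocess
from pathlib import Path

def gpu_power_watts():
    """Current GPU draw in watts via nvidia-smi, or None when no NVIDIA
    GPU/driver is available."""
    try:
        out = subprocess.run(
            ["nvidia-smi", "--query-gpu=power.draw",
             "--format=csv,noheader,nounits"],
            capture_output=True, text=True, check=True,
        ).stdout
        return float(out.strip().splitlines()[0])
    except (OSError, subprocess.CalledProcessError, ValueError, IndexError):
        return None

RAPL_ENERGY = Path("/sys/class/powercap/intel-rapl:0/energy_uj")

def cpu_energy_joules():
    """Cumulative CPU package energy from the Intel RAPL counter
    (Linux only; may need elevated permissions), or None elsewhere."""
    try:
        return int(RAPL_ENERGY.read_text()) / 1e6  # microjoules -> joules
    except OSError:
        return None
```

On machines without the corresponding hardware, both helpers return `None`, so the tracker can skip unavailable backends instead of crashing.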
- Docker
- Distributed Run
- Logging
- Visualisation
- Documentation
Write end-user documentation as well as developer documentation.

- End-user documentation
- Developer documentation
Testing

All the testing can use the Bitcoin price prediction example.

- For model training:
- For model inference:
- For hyperparameter tuning (using Talos):
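The Distributed Run and Logging phases above could combine per-machine logs along these lines: each machine appends one JSON line per measurement, and the combined experiment total is their sum. The `{"host": ..., "joules": ...}` record format is an assumption for illustration, not the tool's actual schema.

```python
import json

def combine_logs(log_lines_per_machine):
    """Sum per-machine energy logs (lists of JSON lines) into a combined
    total and a per-host breakdown."""
    total = 0.0
    per_host = {}
    for lines in log_lines_per_machine:
        for line in lines:
            rec = json.loads(line)
            per_host[rec["host"]] = per_host.get(rec["host"], 0.0) + rec["joules"]
            total += rec["joules"]
    return total, per_host

# Two machines' logs for one distributed experiment:
machine_a = ['{"host": "a", "joules": 120.0}', '{"host": "a", "joules": 30.0}']
machine_b = ['{"host": "b", "joules": 200.0}']
total, per_host = combine_logs([machine_a, machine_b])  # total -> 350.0
```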