[WIP] SamTov measure memory scaling #476

Open
wants to merge 52 commits into
base: main
Changes from 13 commits
Commits
7909e35
remove GPU keyword
PythonFZ Jan 21, 2022
99ff3d4
Add GPU check and include it in the memory_manager
PythonFZ Jan 21, 2022
5abee31
Merge branch 'main' into gpu_batching
SamTov Jan 21, 2022
d1add32
Merge branch 'main' into gpu_batching
PythonFZ Jan 24, 2022
ab1a4ce
Merge branch 'main' into gpu_batching
PythonFZ Jan 24, 2022
2d85c89
start memory measurement modules.
SamTov Jan 24, 2022
4cd6d00
Intiial commit to scaling function updates.
SamTov Jan 25, 2022
eb6283b
Merge branch 'main' into SamTov_Measure_Memory_Scaling
SamTov Jan 25, 2022
bee4387
run black and isort
SamTov Jan 25, 2022
00f5cd2
Merge remote-tracking branch 'origin/SamTov_Measure_Memory_Scaling' i…
SamTov Jan 25, 2022
83d24f6
remove file call in CI.
SamTov Jan 25, 2022
81c867c
Fix additional flake8 import complaint
SamTov Jan 25, 2022
505acff
add config memory testing and include an override for batching.
SamTov Jan 25, 2022
e320755
remove config argument.
SamTov Jan 25, 2022
d898394
resolve flake8 complaint.
SamTov Jan 25, 2022
c974a78
CI profiling
PythonFZ Feb 1, 2022
25d0fe3
CI profiling
PythonFZ Feb 1, 2022
2b97375
update sqlite
PythonFZ Feb 1, 2022
7f51983
typo
PythonFZ Feb 1, 2022
4b23a73
patch ubuntu version
PythonFZ Feb 1, 2022
4729e28
try conda for newer sqlite version
PythonFZ Feb 1, 2022
43ed753
try conda for newer sqlite version
PythonFZ Feb 1, 2022
a83e48e
update sqlite version
PythonFZ Feb 1, 2022
a8c8af8
bugfix
PythonFZ Feb 1, 2022
883474d
add a plot
PythonFZ Feb 1, 2022
8d77a09
plot everything
PythonFZ Feb 1, 2022
0601c84
plot everything
PythonFZ Feb 1, 2022
aedc169
run ADF memory test
PythonFZ Feb 1, 2022
b9ca27f
run ADF memory test
PythonFZ Feb 1, 2022
492370f
Update test_memory.py
PythonFZ Feb 1, 2022
719de34
Update test_memory.py
PythonFZ Feb 1, 2022
3542778
reduce size even further
PythonFZ Feb 1, 2022
1ebdc2f
Update test_memory.py
PythonFZ Feb 1, 2022
e12368c
Update test_memory.py
PythonFZ Feb 1, 2022
869c047
remove print
PythonFZ Feb 2, 2022
4bf249c
Merge branch 'main' into SamTov_Measure_Memory_Scaling
PythonFZ Feb 2, 2022
1713f8a
clean up a bit
PythonFZ Feb 2, 2022
7898a6b
fix black / flake8
PythonFZ Feb 2, 2022
8743d23
add plot function
PythonFZ Feb 2, 2022
327e538
add update to not spam to PR
PythonFZ Feb 2, 2022
427fd13
only run on push
PythonFZ Feb 2, 2022
f49303e
add package
PythonFZ Feb 2, 2022
3b8ad22
small code cleanup + update
PythonFZ Feb 2, 2022
8f14b22
Update lint.yaml
PythonFZ Feb 2, 2022
963580e
add diffusion + fix plots
PythonFZ Feb 2, 2022
de1ef0d
Merge remote-tracking branch 'origin/SamTov_Measure_Memory_Scaling' i…
PythonFZ Feb 2, 2022
b5cf14c
add continue-on-error to still gather the plot at the end.
PythonFZ Feb 2, 2022
d66ca6e
add GK diffusion
PythonFZ Feb 2, 2022
91e06b9
deselect memory by default
PythonFZ Feb 2, 2022
1eddad1
enable memory management
PythonFZ Feb 2, 2022
a262669
add einstein data range test
PythonFZ Feb 2, 2022
a756c6b
run with / without fixture
PythonFZ Feb 2, 2022
Original file line number Diff line number Diff line change
Expand Up @@ -30,6 +30,7 @@
from zinchub import DataHub

import mdsuite as mds
mds.config.memory_fraction = 1
from mdsuite.utils.testing import assertDeepAlmostEqual
SamTov marked this conversation as resolved.


Expand Down
128 changes: 128 additions & 0 deletions CI/memory_scaling/test_scaling_coefficients.py
@@ -0,0 +1,128 @@
"""
MDSuite: A Zincwarecode package.

License
-------
This program and the accompanying materials are made available under the terms
of the Eclipse Public License v2.0 which accompanies this distribution, and is
available at https://www.eclipse.org/legal/epl-v20.html

SPDX-License-Identifier: EPL-2.0

Copyright Contributors to the Zincwarecode Project.

Contact Information
-------------------
email: [email protected]
github: https://github.com/zincware
web: https://zincwarecode.com/

Citation
--------
If you use this module please cite us with:

Summary
-------
Module to test scaling coefficients.
"""
import sqlite3

import numpy as np
import pandas as pd
import pytest

import mdsuite
import mdsuite.transformations


def _build_atomwise(data_scaling: int, system: bool = False):
    """
    Build a numpy array of atom-wise data in steps of MBs.

    Parameters
    ----------
    data_scaling : int
            Number of atoms in the data, i.e. the size of the zeroth axis of the
            array. 1 atom is 1/10 of a MB of data.
    system : bool
            If true, the returned array has shape (n_confs, 3).

    Returns
    -------
    data_array : np.ndarray
            A numpy array of ones whose size is close to 1/10 * data_scaling MBs
            (~98%).

    Notes
    -----
    TODO: When moved to (confs, n_atoms, dim), this will need to be updated to take the
    first column as atoms otherwise the memory scaling will be wrong.

    """
    if system:
        return np.ones((data_scaling * 4096, 3))
    else:
        return np.ones((data_scaling, 4096, 3))


@pytest.fixture()
def mdsuite_project(tmp_path) -> mdsuite.Project:
    """
    Build an MDSuite project with all data stored in a temp directory for easier
    cleanup after the test.

    Returns
    -------
    project : mdsuite.Project
            MDSuite project to be used in the tests.
    """
    project = mdsuite.Project(storage_path=tmp_path.as_posix())

    scaling_sizes = [10, 100, 500, 1000]

    return project


def get_memory_usage(database: str, callable_name: str) -> float:
    """
    Get the memory used from the dumped sql database.

    Parameters
    ----------
    database : str
            Path to the sqlite database that will be read.
    callable_name : str
            Name of the function being measured and therefore, what memory value to
            return.

    Returns
    -------
    memory : float
            memory used during the calculation.
    """
    with sqlite3.connect(database) as db:
        data = pd.read_sql_query("SELECT * from TEST_METRICS", db)

    data = data.loc[data["ITEM"] == callable_name]

    return data["MEM_USAGE"]


def test_rdf_memory(mdsuite_project):
    """
    Test the memory of the RDF.

    Parameters
    ----------
    mdsuite_project : mdsuite.Project
            An mdsuite project with stored files in a tmp directory.

    Returns
    -------

    """
    memory_array = np.zeros((2,))
    mdsuite_project.run.RadialDistributionFunction(plot=False)
    memory = get_memory_usage("pymon.db", test_rdf_memory.__name__)
    memory_array[0] = memory

    print(memory_array)
6 changes: 3 additions & 3 deletions CI/unit_tests/memory_manager/test_memory_manager.py
Expand Up @@ -22,7 +22,7 @@
import unittest

import numpy as np

import mdsuite
from mdsuite.memory_management.memory_manager import MemoryManager


Expand Down Expand Up @@ -146,7 +146,6 @@ def test_get_batch_size(self):
# Test correct returns for 1 batch
self.memory_manager.database = TestDatabase(data_size=500, rows=10, columns=10)
self.memory_manager.data_path = ["Test/Path"]
self.memory_manager.memory_fraction = 0.5
self.memory_manager.machine_properties["memory"] = 50000
batch_size, number_of_batches, remainder = self.memory_manager.get_batch_size(
system=False
Expand Down Expand Up @@ -188,7 +187,8 @@ def test_get_optimal_batch_size(self):
the same value that is passed to it.
"""
data = self.memory_manager._get_optimal_batch_size(10)
self.assertEqual(data, data) # Todo: no shit, sherlock
self.assertEqual(data, 10) # Todo: no shit, sherlock
mdsuite.config.memory_scaling_test = True

def test_compute_atomwise_minibatch(self):
"""
Expand Down
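
The hunk above calls `_get_optimal_batch_size(10)` with a single argument, while the `memory_manager.py` diff in this PR gives the method a second `n_configs` parameter. A possible shape for a test covering both states of the flag is sketched below. It is self-contained and does not import MDSuite: `_Config` and `get_optimal_batch_size` are stand-ins that copy the logic shown in the diff, and the test class name is hypothetical.

```python
import unittest
from dataclasses import dataclass


@dataclass
class _Config:
    """Stand-in for mdsuite.config holding only the flag used here."""

    memory_scaling_test: bool = False


config = _Config()


def get_optimal_batch_size(naive_size: int, n_configs: int) -> int:
    """Copy of the branch shown in the memory_manager.py hunk."""
    if config.memory_scaling_test:
        return n_configs
    return naive_size


class TestOptimalBatchSize(unittest.TestCase):
    def tearDown(self):
        # Reset the module-level flag so it cannot leak into other tests.
        config.memory_scaling_test = False

    def test_default_returns_naive_size(self):
        self.assertEqual(get_optimal_batch_size(10, n_configs=100), 10)

    def test_scaling_test_returns_all_configurations(self):
        config.memory_scaling_test = True
        self.assertEqual(get_optimal_batch_size(10, n_configs=100), 100)
```

The `tearDown` is one way to handle a global flag; the hunk does not show the flag being reset after it is set to `True`.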
2 changes: 1 addition & 1 deletion mdsuite/experiment/experiment.py
Expand Up @@ -607,7 +607,7 @@ def _store_metadata(self, metadata: TrajectoryMetadata, update_with_pubchempy=Fa
----------
metadata: TrajectoryMetadata
update_with_pubchempy: bool
Load data from pubchempy and add it to fill missing infomration
Load data from pubchempy and add it to fill missing information.
"""
# new trajectory: store all metadata and construct a new database
self.temperature = metadata.temperature
Expand Down
38 changes: 26 additions & 12 deletions mdsuite/memory_management/memory_manager.py
Expand Up @@ -23,12 +23,12 @@

Summary
-------
Module to manage the memory use of MDSuite operations.
"""
import logging
from typing import Tuple

import numpy as np
import tensorflow as tf

from mdsuite.database.simulation_database import Database
from mdsuite.utils.meta_functions import get_machine_properties, gpu_available
Expand All @@ -38,6 +38,7 @@
polynomial_scale_function,
quadratic_scale_function,
)
from mdsuite.utils import config

log = logging.getLogger(__name__)

Expand All @@ -58,11 +59,20 @@ class MemoryManager:
Attributes
----------
data_path : list
Path to reference the data in the hdf5 database.
database : Database
Database to look through.
parallel : bool
If true, batch sizes should take into account the use of multiple machines
with shared memory. TODO: This is outdated.
memory_fraction : float
Amount of memory to use TODO: In a perfect scaling, this can be 100 % of the
free memory.
scale_function : dict
Function to use to describe how the memory scaling changes with changing
data size.
gpu : bool
If true, a gpu is available.
"""

def __init__(
Expand Down Expand Up @@ -93,7 +103,8 @@ def __init__(
scale_function : dict
Scaling function to compute the memory scaling of a calculator.
gpu : bool
If true, gpu should be used.
If true, a GPU has been detected and the available memory will be
calculated from the GPU.
offset : int
If data is being loaded from a non-zero point in the database the
offset is used to take this into account. For example, expanding a
Expand All @@ -104,7 +115,6 @@ def __init__(
self.data_path = data_path
self.parallel = parallel
self.database = database
self.memory_fraction = memory_fraction
self.offset = offset

self.machine_properties = get_machine_properties()
Expand All @@ -115,9 +125,6 @@ def __init__(
memory = self.machine_properties["gpu"][item]["memory"]

self.machine_properties["memory"] = memory * 1e6
tf.device("gpu")
else:
tf.device("cpu")

self.batch_size = None
self.n_batches = None
Expand Down Expand Up @@ -209,13 +216,13 @@ def get_batch_size(self, system: bool = False) -> tuple:
)
maximum_loaded_configurations = int(
np.clip(
(self.memory_fraction * self.machine_properties["memory"])
(config.memory_fraction * self.machine_properties["memory"])
/ per_configuration_memory,
1,
n_configs - self.offset,
)
)
batch_size = self._get_optimal_batch_size(maximum_loaded_configurations)
batch_size = self._get_optimal_batch_size(maximum_loaded_configurations, n_configs)
number_of_batches, remainder = divmod((n_configs - self.offset), batch_size)
self.batch_size = batch_size
self.n_batches = number_of_batches
Expand All @@ -241,23 +248,30 @@ def hdf5_load_time(n: int):
return np.log(n)

@staticmethod
def _get_optimal_batch_size(naive_size):
def _get_optimal_batch_size(naive_size, n_configs: int):
"""
Use the open/close and read speeds of the hdf5 database_path as well as the
operation being performed to get an optimal batch size.

This is where the memory scaling test will be enforced.

Parameters
----------
naive_size : int
Naive batch size to be optimized
n_configs : int
Total number of configurations in the database.

Returns
-------
batch_size : int
An optimized batch size
"""
# db_io_time = self.database.get_load_time()
return naive_size
if config.memory_scaling_test:
return n_configs
else:
return naive_size

def _compute_atomwise_minibatch(self, data_range: int):
"""
Expand Down Expand Up @@ -310,7 +324,7 @@ def _compute_atomwise_minibatch(self, data_range: int):
)
batch_size = int(
np.clip(
self.memory_fraction
config.memory_fraction
* self.machine_properties["memory"]
/ per_atom_memory,
1,
Expand All @@ -323,7 +337,7 @@ def _compute_atomwise_minibatch(self, data_range: int):
atom_batch_memory = fraction * per_atom_memory
batch_size = int(
np.clip(
self.memory_fraction
config.memory_fraction
* self.machine_properties["memory"]
/ atom_batch_memory,
1,
Expand Down
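
The batch-size arithmetic touched in this file can be summarised in a small standalone sketch. It assumes only what the hunks above show: the naive batch size is `memory_fraction * memory / per_configuration_memory` clipped to `[1, n_configs - offset]`, the scaling-test flag replaces it with `n_configs`, and `divmod` yields the number of batches and the remainder. The function name `sketch_batch_size` and its flat argument list are hypothetical; the real `MemoryManager` derives `per_configuration_memory` from the database and the scale function, which is not reproduced here.

```python
import numpy as np


def sketch_batch_size(
    memory: float,
    per_configuration_memory: float,
    n_configs: int,
    memory_fraction: float = 0.5,
    offset: int = 0,
    memory_scaling_test: bool = False,
):
    """Standalone illustration of the arithmetic in the diff (not MDSuite code)."""
    naive_size = int(
        np.clip(
            memory_fraction * memory / per_configuration_memory,
            1,
            n_configs - offset,
        )
    )
    # With the scaling-test flag set, all configurations form one batch.
    batch_size = n_configs if memory_scaling_test else naive_size
    number_of_batches, remainder = divmod(n_configs - offset, batch_size)
    return batch_size, number_of_batches, remainder
```

One consequence visible in this sketch, which follows from the hunk as written: with the flag set and a non-zero `offset`, `divmod(n_configs - offset, n_configs)` gives zero full batches and a remainder of `n_configs - offset`.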
14 changes: 14 additions & 0 deletions mdsuite/utils/config.py
Expand Up @@ -23,6 +23,8 @@

Summary
-------
A set of configuration parameters for the MDSuite framework. Includes information
regarding memory fraction, scaling test state, jupyter use and so on.
"""
from dataclasses import dataclass

Expand All @@ -36,10 +38,22 @@ class Config:
bokeh_sizing_mode: str
The way bokeh scales plots.
see bokeh / sizing_mode for more information
jupyter : bool
If true, jupyter is being used.
GPU: bool
TODO I think this is outdated.
memory_scaling_test : bool
If true, a scaling test is being performed and therefore all data is loaded
in a single batch, i.e. the batch size is set to the total number of
configurations. Should typically be accompanied by the memory fraction being
set to 1 as well.
memory_fraction: float
The portion of the available memory to be used.
"""

jupyter: bool = False
GPU: bool = False
memory_scaling_test: bool = False
memory_fraction: float = 0.5
Comment on lines +55 to +56
Member

Maybe we could expand the config to be config.memory.scaling_test = True instead of config.memory_scaling_test = True with additional dataclasses. This way it could be more structured.

Member Author

Yeah, that's an interesting idea. How configuration is managed in general is worth discussing, as it can get quite involved. I think having dataclasses for the different areas, as you mention here, would be very nice.
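
As a rough illustration of the nested layout suggested in this thread, the memory-related fields could be grouped into their own dataclass. This is only a sketch: the class name `MemoryConfig` and the attribute names `memory.scaling_test` and `memory.fraction` are assumptions, and the defaults are copied from the fields in the diff.

```python
from dataclasses import dataclass, field


@dataclass
class MemoryConfig:
    """Hypothetical grouping of the memory-related options."""

    scaling_test: bool = False
    fraction: float = 0.5


@dataclass
class Config:
    """Sketch of a nested configuration; not the MDSuite implementation."""

    jupyter: bool = False
    GPU: bool = False
    bokeh_sizing_mode: str = "stretch_both"
    # default_factory gives every Config instance its own MemoryConfig.
    memory: MemoryConfig = field(default_factory=MemoryConfig)


config = Config()
config.memory.scaling_test = True
config.memory.fraction = 1.0
```

Using `field(default_factory=...)` rather than a shared default instance matters here because dataclasses reject mutable defaults, and a shared instance would make every `Config` see the same memory settings.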

bokeh_sizing_mode: str = "stretch_both"


Expand Down