Memory Management Issue in class ClassificationMetric #526

Open · trivedi-nitesh opened this issue Apr 19, 2024 · 0 comments
trivedi-nitesh commented Apr 19, 2024

I'm using the code below to compute the SPD (Statistical Parity Difference) on the Adult dataset. When get_spd_and_accuracy is called in a loop, memory consumption grows with each iteration in which a ClassificationMetric is instantiated and statistical_parity_difference() is called, and that memory is not released when the function returns.

from aif360.datasets import AdultDataset, GermanDataset, CompasDataset
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from aif360.datasets import StandardDataset, BinaryLabelDataset
from aif360.metrics import ClassificationMetric
from copy import deepcopy
from sklearn.metrics import accuracy_score



def get_spd_and_accuracy(df, protected, target):
  # Prepare the data for training and testing
  train, test = train_test_split(df, test_size=0.2, shuffle=True)
  X_train = train.drop([protected, target], axis=1).values
  y_train = train[target].values
  y_test = test[target].values
  X_test = test.drop([protected, target], axis=1).values

  # Train the model and predict labels for the training and testing data
  lmod = LogisticRegression(solver='liblinear', class_weight='balanced')
  lmod.fit(X_train, y_train)

  y_train_pred = lmod.predict(X_train)
  y_test_pred = lmod.predict(X_test)
  # Prepare the data for the AIF360 metrics
  train_transf = StandardDataset(train,
                                 label_name=target,
                                 favorable_classes=[1],
                                 protected_attribute_names=[protected],
                                 categorical_features=[],
                                 features_to_drop=[],
                                 privileged_classes=[[1.0]])
  train_transf_pred = deepcopy(train_transf)
  # AIF360 keeps labels as an (n, 1) column array, so reshape the predictions to match
  train_transf_pred.labels = y_train_pred.reshape(-1, 1)
  un_p = [{protected: 0.0}]
  p = [{protected: 1.0}]

  # Calculate the Statistical Parity Difference and Accuracy Score
  class_metrics = ClassificationMetric(train_transf, train_transf_pred,
                                       unprivileged_groups=un_p, privileged_groups=p)
  print(round(class_metrics.statistical_parity_difference(), 2))
  print(round(accuracy_score(y_test, y_test_pred), 2))


dataset = AdultDataset()
df = dataset.convert_to_dataframe()[0]
target = df.columns[-1]
protected = 'sex'

for i in range(25):
  get_spd_and_accuracy(df, protected, target) 
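
A possible (if unsatisfying) workaround would be to force a garbage-collection pass after each call. The sketch below reuses the names defined above and only the standard-library gc module; whether this actually releases the memory here is exactly what I'm unsure about.

import gc

for i in range(25):
  get_spd_and_accuracy(df, protected, target)
  # If the ClassificationMetric / StandardDataset objects are only kept alive
  # by reference cycles, an explicit collection pass after each call reclaims
  # them promptly instead of waiting for the cyclic garbage collector.
  gc.collect()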

Any insights or recommendations regarding memory release strategies in this context would be greatly appreciated. Below are snapshots of the increase in memory.

Before executing the code: [Screenshot from 2024-04-19 10-59-07]

During the execution of the code: [Screenshot from 2024-04-19 11-03-13]
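
For completeness, the growth can also be tracked in-process with the standard-library tracemalloc module rather than OS-level screenshots; here is a minimal sketch that reuses the setup above:

import tracemalloc

tracemalloc.start()
baseline = tracemalloc.take_snapshot()

for i in range(25):
  get_spd_and_accuracy(df, protected, target)
  snapshot = tracemalloc.take_snapshot()
  # Report the allocation sites that have grown the most since the baseline
  print(f"iteration {i}:")
  for stat in snapshot.compare_to(baseline, 'lineno')[:5]:
    print("   ", stat)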
