Memory Management Issue in class ClassificationMetric #526

Open · trivedi-nitesh opened this issue Apr 19, 2024 · 0 comments
trivedi-nitesh commented Apr 19, 2024

I'm using the code below to compute the SPD (Statistical Parity Difference) on the Adult dataset. When get_spd_and_accuracy is called in a loop, memory consumption grows with each iteration in which a ClassificationMetric is instantiated and statistical_parity_difference() is called, and that memory is not released when the function returns.

from aif360.datasets import AdultDataset, GermanDataset, CompasDataset
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from aif360.datasets import StandardDataset, BinaryLabelDataset
from aif360.metrics import ClassificationMetric
from copy import deepcopy
from sklearn.metrics import accuracy_score



def get_spd_and_accuracy(df, protected, target):
  # Prepare the data for training and testing
  train, test = train_test_split(df, test_size=0.2, shuffle=True)
  X_train = train.drop([protected, target], axis=1).values
  y_train = train[target].values
  y_test = test[target].values
  X_test = test.drop([protected, target], axis=1).values

  # Train the model and predict labels for the training and testing data
  lmod = LogisticRegression(solver='liblinear', class_weight='balanced')
  lmod.fit(X_train, y_train)

  y_train_pred = lmod.predict(X_train)
  y_test_pred = lmod.predict(X_test)
  # Prepare the data for the AIF360 metrics
  train_transf = StandardDataset(train,
                                 label_name=target,
                                 favorable_classes=[1],
                                 protected_attribute_names=[protected],
                                 categorical_features=[],
                                 features_to_drop=[],
                                 privileged_classes=[[1.0]])
  train_transf_pred = deepcopy(train_transf)
  # AIF360 keeps labels as an (n, 1) column array, so reshape the predictions to match
  train_transf_pred.labels = y_train_pred.reshape(-1, 1)
  un_p = [{protected: 0.0}]
  p = [{protected: 1.0}]

  # Calculate the Statistical Parity Difference and Accuracy Score
  class_metrics = ClassificationMetric(train_transf, train_transf_pred,
                                       unprivileged_groups=un_p, privileged_groups=p)
  print(round(class_metrics.statistical_parity_difference(), 2))
  print(round(accuracy_score(y_test, y_test_pred), 2))


dataset = AdultDataset()
df = dataset.convert_to_dataframe()[0]
target = df.columns[-1]
protected = 'sex'

for i in range(25):
  get_spd_and_accuracy(df, protected, target) 
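
A possible (if unsatisfying) workaround would be to force a garbage-collection pass after each call. The sketch below reuses the names defined above and only the standard-library gc module; whether this actually releases the memory here is exactly what I'm unsure about.

import gc

for i in range(25):
  get_spd_and_accuracy(df, protected, target)
  # If the ClassificationMetric / StandardDataset objects are only kept alive
  # by reference cycles, an explicit collection pass after each call reclaims
  # them promptly instead of waiting for the cyclic garbage collector.
  gc.collect()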

Any insights or recommendations regarding memory release strategies in this context would be greatly appreciated. Below are snapshots of the increase in memory.

Before executing the code: [Screenshot from 2024-04-19 10-59-07]

During the execution of the code: [Screenshot from 2024-04-19 11-03-13]
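
For completeness, the growth can also be tracked in-process with the standard-library tracemalloc module rather than OS-level screenshots; here is a minimal sketch that reuses the setup above:

import tracemalloc

tracemalloc.start()
baseline = tracemalloc.take_snapshot()

for i in range(25):
  get_spd_and_accuracy(df, protected, target)
  snapshot = tracemalloc.take_snapshot()
  # Report the allocation sites that have grown the most since the baseline
  print(f"iteration {i}:")
  for stat in snapshot.compare_to(baseline, 'lineno')[:5]:
    print("   ", stat)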
