Updates for 2024 #63
Pandas notebook updates
Specifying numeric data types
String replacement
# Remove price column symbols
car_sales["Price"] = car_sales["Price"].str.replace(r'[\$\,\.]', '',
                                                    regex=True) # Raw string avoids invalid-escape warnings; regex=True tells pandas to replace using regex
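As a quick check of the replacement above, here is a minimal sketch on a made-up three-row frame (the sample prices are assumptions, not the course data). It differs slightly from the notebook line: it strips only `$` and `,` so the decimal point survives, then converts the column to a numeric dtype, which is one possible way to handle the "Specifying numeric data types" step.

```python
import pandas as pd

# Hypothetical sample data standing in for the course's car sales CSV
car_sales = pd.DataFrame({"Price": ["$4,000.00", "$5,000.00", "$7,250.00"]})

# Raw string pattern; keep the "." so "4000.00" is not turned into "400000"
cleaned = car_sales["Price"].str.replace(r"[\$,]", "", regex=True)

# Convert the cleaned strings to numbers (float first, because of the ".00")
car_sales["Price"] = cleaned.astype(float).astype(int)

print(car_sales["Price"].tolist())
print(car_sales["Price"].dtype)
```

If the decimal point is removed as well (as in the notebook line), the trailing zeros have to be dropped in a separate step before converting.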
Matplotlib notebook updates
General workflow
Trying to plot non-numeric columns
# Note: In previous versions of matplotlib and pandas, having the "Price" column as a string would
# return an error
car_sales["Price"] = car_sales["Price"].astype(str)
# car_sales["Price"] = car_sales["Price"].astype(int) # Turning the Price column into an integer looks better
# Plot a scatter plot (does not look as good as with .astype(int))
car_sales.plot(x="Odometer (KM)", y="Price", kind="scatter");
Seaborn plotting styles namespace change
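The thread only names this change, so the following is a sketch under an assumption: that it refers to newer matplotlib releases shipping the seaborn styles under `seaborn-v0_8-<name>` instead of `seaborn-<name>`. The style names and the sample data are illustrative, not taken from the notebook.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

# Assumption: newer matplotlib exposes "seaborn-v0_8-whitegrid",
# while older releases used "seaborn-whitegrid"
candidates = ["seaborn-v0_8-whitegrid", "seaborn-whitegrid"]

# Pick the first name this installation knows about, else fall back to the default style
style_name = next((s for s in candidates if s in plt.style.available), "default")
plt.style.use(style_name)

fig, ax = plt.subplots()
ax.scatter([11179, 35431, 43696], [7000, 5000, 4000])  # made-up numeric data
ax.set(xlabel="Odometer (KM)", ylabel="Price")
print(style_name)
```

Checking `plt.style.available` first keeps the same notebook cell working across matplotlib versions.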
Scikit-Learn notebook updates
RandomForestClassifier
# Hyperparameter grid RandomizedSearchCV will search over
param_distributions = {"n_estimators": [10, 100, 200, 500, 1000, 1200],
                       "max_depth": [None, 5, 10, 20, 30],
                       "max_features": ["sqrt", "log2", None],
                       "min_samples_split": [2, 4, 6],
                       "min_samples_leaf": [1, 2, 4]}
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV, train_test_split
np.random.seed(42)
# Split into X & y
X = heart_disease.drop("target", axis=1)
y = heart_disease["target"]
# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
# Set n_jobs to -1 to use all available cores on your machine (if this causes errors, try n_jobs=1)
clf = RandomForestClassifier(n_jobs=-1)
# Setup RandomizedSearchCV
rs_clf = RandomizedSearchCV(estimator=clf,
                            param_distributions=param_distributions,
                            n_iter=20, # try 20 models total
                            cv=5, # 5-fold cross-validation
                            verbose=2) # print out results
# Fit the RandomizedSearchCV version of clf
rs_clf.fit(X_train, y_train);
Creation of train/validation/test sets
Changed creation of train/validation/test sets from indexing to random splitting. I find this cleaner and less prone to error.
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
# Set the seed
np.random.seed(42)
# Read in the data
heart_disease = pd.read_csv("../data/heart-disease.csv")
# Split into X (features) & y (labels)
X = heart_disease.drop("target", axis=1)
y = heart_disease["target"]
# Training and test split (70% train, 30% test)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3)
# Create validation and test split by splitting testing data in half (30% test -> 15% validation, 15% test)
X_valid, X_test, y_valid, y_test = train_test_split(X_test, y_test, test_size=0.5)
clf = RandomForestClassifier()
clf.fit(X_train, y_train)
# Make predictions
y_preds = clf.predict(X_valid)
# Evaluate the classifier
baseline_metrics = evaluate_preds(y_valid, y_preds)
baseline_metrics
Pipeline upgrades
pipe_grid = {
    "preprocessor__num__imputer__strategy": ["mean", "median"], # note the double underscore after each prefix "preprocessor__"
    "model__n_estimators": [100, 1000],
    "model__max_depth": [None, 5],
    "model__max_features": ["sqrt"],
    "model__min_samples_split": [2, 4]
}
4.2.1 Classification model evaluation metrics - ROC Curve
from sklearn.metrics import RocCurveDisplay
roc_curve_display = RocCurveDisplay.from_estimator(estimator=clf,
                                                   X=X_test,
                                                   y=y_test)
In the lecture "Hyperparameter tuning with RandomizedSearchCV"
Remove
Read more in the scikit-learn documentation: https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestRegressor.html
Getting
TensorFlow Notebook Updates
Due to changes in workflow and TensorFlow library updates, I'm going to remake the Dog Vision project. It will use the latest version of TensorFlow (2.14.0, as of October 2023). Currently the notebook will be under the
There's an increase in students encountering errors when installing jupyter, or when creating a conda env with jupyter, with Python 3.12 somehow already installed. The error message is always something like this:
You can install jupyter with pip:
However, since Python 3.12 is still very new and the chance of running into compatibility issues is still high, I recommend the following:
i.e., go to the folder you want to create the env in and execute:
Then activate the env you just created.
Then install the libraries you want in that env, i.e.
Working on updates for 2024
Main goals:
See the branch (work in progress): https://github.com/mrdbourke/zero-to-mastery-ml/tree/updates-2023 (this branch will get merged into master once the changes are finished)
TODO
Working on tf.keras here: https://github.com/keras-team/keras-core/issues/223
Done