understanding the data

understanding the data is about

null, missing, duplicates, wrong values (-99,..) & types (numeric/categorical)

outliers

relations

relations -> statistical tests: chi-square, t-test, ANOVA, logistic regression, correlation
bias vs variance -> in data (histogram; skewness, roughly indicated by Δ(median, mean))
outliers -> skewness (Δ(median, mean) as a rough indicator), histogram, IQR fences [q1 - 1.5 * IQR, q3 + 1.5 * IQR] with IQR = q3 - q1, μ ± 3 * σ, or anomaly detection
types -> df.info()
null -> df.isna().sum()
category -> df.describe(include='all')
median -> median == 50th percentile (the 50% row of df.describe())
wrong data -> df.groupby(['field-name', '...']).agg(aggregationFunctions)
imbalanced -> df['field-name'].value_counts()
statistics -> df.describe(include="all")
scatter matrix -> pd.plotting.scatter_matrix(df)
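correlation -> sns.heatmap(df.corr(numeric_only=True))  (numeric_only needs pandas >= 1.5; without it, non-numeric columns raise an error in pandas 2.x)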
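
A minimal sketch of the basic checks from the list above (types, nulls, duplicates, wrong values, imbalance, statistics). The DataFrame, its column names `age` and `city`, and the reading of -99 as a "missing" sentinel are assumptions made up for illustration, not part of the original notes.

```python
import numpy as np
import pandas as pd

# Toy data, invented for illustration only.
df = pd.DataFrame({
    "age": [25, 31, -99, 40, 31, np.nan],
    "city": ["Oslo", "Rome", "Rome", None, "Rome", "Oslo"],
})

df.info()                                   # types and non-null counts
null_counts = df.isna().sum()               # nulls per column
n_duplicates = int(df.duplicated().sum())   # fully duplicated rows
n_sentinel = int((df["age"] == -99).sum())  # assumes -99 encodes "missing"
city_counts = df["city"].value_counts()     # balance of a categorical field

# Treat the assumed sentinel as missing before computing statistics,
# otherwise it drags the mean and the minimum down.
clean = df.copy()
clean["age"] = clean["age"].replace(-99, np.nan)

median_age = clean["age"].median()          # same value as the 50% row below
print(clean.describe(include="all"))
```

Replacing the sentinel first matters: with -99 left in place, the mean and min of `age` describe the encoding, not the data.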

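The two outlier rules from the notes (IQR fences and μ ± 3σ) and the mean-vs-median skew check can be sketched as follows. The series values are invented, with 120 planted as the outlier; the variable names are illustrative only.

```python
import pandas as pd

# Toy values, invented for illustration; 120 is the planted outlier.
s = pd.Series([10, 12, 11, 13, 12, 11, 10, 12, 13, 11, 12, 120])

# IQR rule: flag points outside [q1 - 1.5 * IQR, q3 + 1.5 * IQR].
q1, q3 = s.quantile(0.25), s.quantile(0.75)
iqr = q3 - q1
iqr_outliers = s[(s < q1 - 1.5 * iqr) | (s > q3 + 1.5 * iqr)]

# 3-sigma rule: flag points more than 3 standard deviations from the mean.
mu, sigma = s.mean(), s.std()
sigma_outliers = s[(s - mu).abs() > 3 * sigma]

# Rough skew indicator: a mean well above the median suggests a right tail.
mean_minus_median = mu - s.median()
skew = s.skew()  # sample skewness, for comparison with the rough indicator

print(iqr_outliers.tolist(), sigma_outliers.tolist(), mean_minus_median, skew)
```

Both rules flag 120 here, but only narrowly for the 3σ rule: the outlier inflates σ itself, so on small samples the IQR fences tend to be the more sensitive check.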