Currently, the presence of `NaN`s in a dataset produces a distinguishing event, as for most functions (other than `nanmean`, `nanvar`, `nanstd`) the output will always be `NaN`. Is the best solution for all functions to just ignore `NaN`s (like `nanmean`, etc.)?
For single-dimensional problems, there should be no issue, as removing NaNs is a simple deterministic pre-processing step. For multi-dimensional problems, removing data rows may prove problematic for utility. Is there justification here for doing something fancier, like mapping to a value within the range?
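To make the options concrete, here is a minimal sketch of the three pre-processing choices mentioned above: dropping `NaN`s from a 1-D array, dropping whole rows from a 2-D array, and mapping `NaN`s to a value within the range. The helper names and the choice of the midpoint of the bounds as the fill value are assumptions for illustration only; they are not existing functions in the library, and whether any of these choices is acceptable is exactly the open question.

```python
import numpy as np


def drop_nan_1d(values):
    """Remove NaN entries from a 1-D array (hypothetical helper)."""
    values = np.asarray(values, dtype=float)
    return values[~np.isnan(values)]


def drop_nan_rows(data):
    """Remove every row of a 2-D array that contains at least one NaN."""
    data = np.asarray(data, dtype=float)
    return data[~np.isnan(data).any(axis=1)]


def map_nan_to_midpoint(data, lower, upper):
    """Replace NaNs with the midpoint of the given bounds.

    The midpoint is an assumed, data-independent fill value; `lower` and
    `upper` may be scalars or per-column arrays.
    """
    data = np.asarray(data, dtype=float)
    fill = (np.asarray(lower, dtype=float) + np.asarray(upper, dtype=float)) / 2
    return np.where(np.isnan(data), fill, data)


data = np.array([[1.0, np.nan], [np.nan, 3.0], [4.0, 5.0]])
kept_rows = drop_nan_rows(data)                      # only the last row survives
filled = map_nan_to_midpoint(data, [0, 0], [2, 10])  # NaNs become 1.0 and 5.0
```

In this toy example, dropping rows discards two of the three records, which illustrates the utility concern for multi-dimensional data, while the mapping approach keeps every row at the cost of introducing an arbitrary value.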
`inf` may also need special consideration, although it can usually be handled by clipping the data, since `inf` clips to the upper bound and `-inf` to the lower bound. `NaN` has no obvious value to map to. When the algorithm requires the norm of a row to be clipped (as in `LogisticRegression`), mapping `inf` to a value is no longer trivial. Do we map `inf` to a value that ensures the row's norm matches the clip, or do we also scale the rest of the row?
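The sketch below illustrates the difference using plain NumPy. Bound clipping with `np.clip` handles `inf` and leaves `NaN` untouched, while naive norm clipping of a row containing `inf` produces `NaN`. The two functions after that are hypothetical helpers, not part of the library, corresponding to the two options in the question: `inf_fill_remaining_budget` keeps the finite entries and gives the infinite ones whatever norm is left, and `inf_as_limit` scales the rest of the row too, taking the limit as the infinite entries grow (assuming all of them grow at the same rate). Both assume the row contains no `NaN`.

```python
import numpy as np


def clip_norm_naive(row, clip):
    """Scale a row so that its L2 norm is at most `clip`."""
    row = np.asarray(row, dtype=float)
    norm = np.linalg.norm(row)
    if norm <= clip:
        return row
    with np.errstate(invalid="ignore"):  # inf * 0 yields NaN
        return row * (clip / norm)


def inf_fill_remaining_budget(row, clip):
    """Keep finite entries; infinite entries share the remaining norm."""
    row = np.asarray(row, dtype=float)
    inf_mask = np.isinf(row)
    if not inf_mask.any():
        return clip_norm_naive(row, clip)
    out = np.where(inf_mask, 0.0, row)
    finite_sq = np.dot(out, out)
    if finite_sq >= clip ** 2:
        # No budget left: infinite entries become 0 and the rest is scaled.
        return clip_norm_naive(out, clip)
    budget = np.sqrt((clip ** 2 - finite_sq) / inf_mask.sum())
    out[inf_mask] = np.sign(row[inf_mask]) * budget
    return out


def inf_as_limit(row, clip):
    """Limit of norm clipping as infinite entries grow: finite entries go to 0."""
    row = np.asarray(row, dtype=float)
    inf_mask = np.isinf(row)
    if not inf_mask.any():
        return clip_norm_naive(row, clip)
    out = np.zeros_like(row)
    out[inf_mask] = np.sign(row[inf_mask]) * clip / np.sqrt(inf_mask.sum())
    return out


bounded = np.clip(np.array([-np.inf, 0.5, np.inf, np.nan]), 0.0, 1.0)
naive = clip_norm_naive([3.0, np.inf], 5.0)               # contains NaN
budgeted = inf_fill_remaining_budget([3.0, np.inf], 5.0)  # keeps the 3.0
limit = inf_as_limit([3.0, np.inf], 5.0)                  # zeroes the 3.0
```

Both alternatives return a row whose norm equals the clip, but they disagree on what happens to the finite entries, which is the design choice raised in the question.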