Need explicit missing data treatment #265

Open
jhollway opened this issue Jan 10, 2023 · 1 comment

@jhollway
Collaborator

Re email conversation with Robert Krause

@jhollway added the feature_request label (Requests for new functionality) on Jan 10, 2023
@RWKrause
Collaborator

hej hej,
For those seeing this conversation without having read the email exchange between James and me, the question is: how does migraph treat missing data?

The "true" answer under missing data is (virtually always) NA, i.e., the density etc. are unknown.
I proposed simple default treatments that can be activated:

  1. Where possible, use only observed data. E.g., for density, if the network has 10 actors, and thus 10*10-10=90 cells, and 9 of those are missing because one actor did not provide outgoing data, then the density should be calculated on the 81 observed cells only. This is also possible for reciprocity, transitivity, and many other measures, but (under MCAR) counts (e.g., total degree) will be biased downwards. The other measures will also be biased (see Krause et al. 2020), but the bias will be smaller.

  2. Impute the median (e.g., 0 in a sparse binary network). This is the default for several functions in other packages, but it induces bias. For valued graphs, impute the mean.

  3. Anja Žnidaršič has done a lot of work on imputation with k-nearest neighbors for valued graphs. Mostly focused on blockmodels, but it is likely to produce better results than median or mean imputation and is still relatively simple to implement and fast to run compared to more complex treatments (e.g., imputation by Bayesian ERGM).
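To make the difference between treatments 1 and 2 concrete, here is a minimal sketch in plain Python (not migraph code). It rebuilds the 10-actor example from item 1, with nine arbitrarily chosen observed ties, and compares density on observed cells with density after median imputation. The function names, the `NA` marker, and the example ties are all assumptions made for illustration.

```python
from statistics import median

NA = None  # hypothetical marker for an unobserved tie variable


def make_example():
    """Directed binary network with 10 actors; actor 0 reported no outgoing ties."""
    n = 10
    net = [[0] * n for _ in range(n)]
    # Nine observed ties, chosen arbitrarily for the illustration.
    for i, j in [(1, 2), (2, 1), (3, 4), (4, 3), (5, 6),
                 (6, 7), (7, 2), (8, 9), (9, 1)]:
        net[i][j] = 1
    # Actor 0's nine outgoing tie variables are missing.
    for j in range(1, n):
        net[0][j] = NA
    return net


def off_diagonal(net):
    """All tie variables except self-ties, as a flat list."""
    n = len(net)
    return [net[i][j] for i in range(n) for j in range(n) if i != j]


def density_observed(net):
    """Treatment 1: density over observed cells only."""
    observed = [x for x in off_diagonal(net) if x is not NA]
    return sum(observed) / len(observed), len(observed)


def impute_median(net):
    """Treatment 2: replace each missing cell with the median of observed cells."""
    fill = median(x for x in off_diagonal(net) if x is not NA)
    n = len(net)
    return [[fill if (i != j and net[i][j] is NA) else net[i][j]
             for j in range(n)] for i in range(n)]


net = make_example()
dens, n_observed = density_observed(net)
print(n_observed)  # 81 observed cells out of 90
print(dens)        # 9 ties / 81 observed cells

imputed = impute_median(net)
print(sum(off_diagonal(imputed)) / len(off_diagonal(imputed)))  # 9 ties / 90 cells
```

In this sparse example the median is 0, so median imputation is the same as null-tie imputation and pulls the density down from 9/81 to 9/90.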

James further remarked that the migraph style is to make any such treatment very explicit, i.e., to call a separate function rather than set a function argument: node_degree(missingNet) ≠ node_degree(na_to_null(missingNet))
If that is the style, I am not going to argue about it. It probably makes programming the functions easier, because one needs to code the missing data treatments only once and does not need to copy and paste them into every function. It will make user code longer, though pipe operators might help here?
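The explicit style can be sketched as follows, again in plain Python rather than migraph's R API. `na_to_null` and `node_out_degree` here are hypothetical stand-ins that only mirror the names used above. The measure returns NA for any node with unobserved outgoing ties, and the treatment is a separate step applied beforehand.

```python
NA = None  # hypothetical marker for an unobserved tie variable


def na_to_null(net):
    """Explicit treatment: replace every missing tie variable with a null tie (0)."""
    return [[0 if x is NA else x for x in row] for row in net]


def node_out_degree(net):
    """Out-degree per node; NA if any of the node's outgoing ties is unobserved."""
    degrees = []
    for i, row in enumerate(net):
        ties = [x for j, x in enumerate(row) if j != i]
        degrees.append(NA if any(x is NA for x in ties) else sum(ties))
    return degrees


missing_net = [
    [0, NA, NA],  # node 0 provided no outgoing data
    [1, 0, 1],
    [0, 1, 0],
]

print(node_out_degree(missing_net))              # [None, 2, 1]
print(node_out_degree(na_to_null(missing_net)))  # [0, 2, 1]
```

The two calls give different answers for node 0, which is the point of keeping the treatment visible in the calling code.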
However, I think that measurement functions and plots should detect missing data and print a warning that a) missing data exist and b) how they are treated by that specific function (the user should be able to toggle the warning off so as not to spam the console in loops or on HPCs).
For plots, it might be an idea to mark nodes with missing data by default, so that their isolation or lack of outdegree is not mistaken for observed isolation. This could be done with an extra color, fill, or shape. I would argue that unmarked null-tie or mean/mode imputation should be avoided in visualizations, because it is likely to create a false image of the network. That is, I would impute null ties so that the functions work, but mark those nodes.
Missing data on specific ties, aka item non-response (i.e., only some tie variables for some dyads are missing, but not necessarily entire out-degrees), is more problematic for visualization. Depending on the data, the tie variable itself might be imputed with, say, a mean value and displayed in the graph with a different shade and form; e.g., in a binary network with black ties, missing tie variables could be displayed as dotted grey lines? These options could be the default, and if someone does not want that, they can just call na_to_null() on the graph beforehand.
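A minimal sketch of the warning and node-marking ideas, in plain Python and under the same assumptions as before: the `warn` argument, the function names, and the style labels are all hypothetical and not part of migraph.

```python
import warnings

NA = None  # hypothetical marker for an unobserved tie variable


def nodes_missing_outgoing(net):
    """Indices of nodes whose outgoing tie variables are all unobserved."""
    flagged = []
    for i, row in enumerate(net):
        ties = [x for j, x in enumerate(row) if j != i]
        if ties and all(x is NA for x in ties):
            flagged.append(i)
    return flagged


def density(net, warn=True):
    """Density on observed cells; warns about missing data unless warn=False."""
    n = len(net)
    cells = [net[i][j] for i in range(n) for j in range(n) if i != j]
    n_missing = sum(x is NA for x in cells)
    if n_missing and warn:
        warnings.warn(
            f"{n_missing} of {len(cells)} tie variables are missing; "
            "density is computed on observed cells only."
        )
    observed = [x for x in cells if x is not NA]
    return sum(observed) / len(observed) if observed else NA


def node_styles(net):
    """Per-node plot label so that unobserved senders can be drawn differently."""
    flagged = set(nodes_missing_outgoing(net))
    return ["missing" if i in flagged else "observed" for i in range(len(net))]


example = [
    [0, NA, NA],  # node 0 provided no outgoing data
    [1, 0, 1],
    [0, 1, 0],
]

print(node_styles(example))           # ['missing', 'observed', 'observed']
print(density(example, warn=False))   # 3 ties / 4 observed cells = 0.75
```

A plotting function could map the "missing" label to a distinct color, fill, or shape. With `warn=False` the same computation runs silently, which suits loops or HPC jobs.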

Just some ideas here...

Cheers
Robert

@jhollway self-assigned this on Jan 12, 2023