falsify_graph #1216

Closed
nicolavizioli opened this issue Jun 27, 2024 · 8 comments
Labels
question Further information is requested stale

Comments

@nicolavizioli

Hi all,
I'm trying to use the SCM module, in particular falsify_graph.

The code (I'm using Streamlit) is the following:

auto_assignment_summary = gcm.auto.assign_causal_mechanisms(
    scm, data_pd, override_models=True, quality=gcm.auto.AssignmentQuality.GOOD
)
gcm.fit(scm, data_pd)

if st.session_state["choice"] == "lingam":  # causal graph chosen = lingam
    result = falsify_graph((st.session_state["lingam_graph_nx"]), data_pd, plot_histogram=False, suggestions=True, n_permutations=100)
    st.text(result)
else:  # causal graph chosen = pc
    result = falsify_graph((st.session_state["pc_graph_nx"]), data_pd, plot_histogram=False, suggestions=True, n_permutations=100)
    st.text(result)

where "lingam_graph_nx" and "pc_graph_nx" are causal NetworkX graphs and data_pd is a pandas DataFrame.

Expected behavior
After fitting the GCM, I get the following error:

AssertionError: 0 must be list, set or str. Got <class 'int'> instead!

Traceback:

File "C:\Users\11612880\Anaconda3\envs\causal\lib\site-packages\streamlit\runtime\scriptrunner\script_runner.py", line 600, in run_script
    exec(code, module.__dict__)
File "C:\Users\11612880\Desktop\Causal-Discovery-PoC\pages\9_📊
SCM.py", line 144, in <module>
    result = falsify_graph((st.session_state["pc_graph_nx"]), data_pd, plot_histogram=False, suggestions=True, n_permutations=100)
File "C:\Users\11612880\Anaconda3\envs\causal\lib\site-packages\dowhy\gcm\falsify.py", line 618, in falsify_graph
    summary_given = run_validations(
File "C:\Users\11612880\Anaconda3\envs\causal\lib\site-packages\dowhy\gcm\falsify.py", line 412, in run_validations
    m_summary = m(causal_graph=causal_graph)
File "C:\Users\11612880\Anaconda3\envs\causal\lib\site-packages\dowhy\gcm\falsify.py", line 138, in validate_lmc
    if not (node, non_desc, parents) in p_values_memory:
File "C:\Users\11612880\Anaconda3\envs\causal\lib\site-packages\dowhy\gcm\falsify.py", line 89, in __contains__
    X, Y = (_to_frozenset(i) for i in item[:2])
File "C:\Users\11612880\Anaconda3\envs\causal\lib\site-packages\dowhy\gcm\falsify.py", line 89, in <genexpr>
    X, Y = (_to_frozenset(i) for i in item[:2])
File "C:\Users\11612880\Anaconda3\envs\causal\lib\site-packages\dowhy\gcm\falsify.py", line 979, in _to_frozenset
    assert (

Version information:

  • DoWhy version [e.g. 0.11]


@nicolavizioli nicolavizioli added the question Further information is requested label Jun 27, 2024
@bloebp
Member

bloebp commented Jun 27, 2024

Might be an issue with the variable names in the dataframe. Can you try the following after loading/creating the data:
data_pd.columns = [str(col) for col in data_pd.columns]

@nicolavizioli
Author

Thanks, I tried that, but now I get this error:
AssertionError: 2 must be list, set or str. Got <class 'int'> instead!

the type of variable names in the dataframe is:
Index(['0', '1', '2', '3', '4', '5', '6', '7', '8', '9', '10'], dtype='object')
<class 'pandas.core.indexes.base.Index'>

so I don't know if this is the problem. Many thanks.
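
Since the column labels printed above are already strings, the integer in the assertion message may be coming from the graph's node names rather than from the DataFrame. A minimal sketch for checking both sides follows; `non_string_labels` is a hypothetical helper written for this illustration and is not part of DoWhy, and the two label lists are made up.

```python
from typing import Hashable, Iterable, List


def non_string_labels(labels: Iterable[Hashable]) -> List[Hashable]:
    """Return every label that is not a str, in the original order."""
    return [label for label in labels if not isinstance(label, str)]


# Column labels like the ones printed above: all strings, so nothing is reported.
column_labels = ["0", "1", "2"]
# Node names of a graph built with integer nodes (assumed here for illustration).
node_labels = [0, 1, 2]

print(non_string_labels(column_labels))  # []
print(non_string_labels(node_labels))    # [0, 1, 2]
```

In the setting of this issue, the same check would presumably be applied to `data_pd.columns` and to the `.nodes` of the graph that is passed to falsify_graph.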

@bloebp
Member

bloebp commented Jun 28, 2024

Ok, do you think you can provide a small example snippet to reproduce this (you probably don't need the streamlit context here)? I can take a closer look then.

@nicolavizioli
Author

OK, thanks.

I have an nx graph (its nodes are ints), from which I build the SCM:

scm = gcm.StructuralCausalModel(st.session_state["pc_graph_nx"])

Then, to assign the causal mechanisms, I run:

data_pd = pd.DataFrame(data_agg) #dataframe of my data
data_pd.columns = [i for i in range(len(data_pd.columns))] #define the columns as integer
data_pd = check_and_convert_categorical(data_pd, threshold=2) #use a function to get categorical variables
auto_assignment_summary = gcm.auto.assign_causal_mechanisms(
    scm, data_pd, override_models=True, quality=gcm.auto.AssignmentQuality.GOOD
)
gcm.fit(scm, data_pd)
st.text(auto_assignment_summary) #until now no problem

def convert_nodes_to_str(nx_graph): #convert node names to strings
    mapping = {node: str(node) for node in nx_graph.nodes}
    nx_graph = nx.relabel_nodes(nx_graph, mapping)
    return nx_graph

data_pd.columns = [str(col) for col in data_pd.columns] #as suggested

and finally the code (in streamlit):

  pc_graph_nx=convert_nodes_to_str(st.session_state["pc_graph_nx"])
  print(pc_graph_nx.nodes)
  result = falsify_graph((st.session_state["pc_graph_nx"]), data_pd, plot_histogram=False, suggestions=True, n_permutations=100)
  st.text(result)

with the error:
AssertionError: 1 must be list, set or str. Got <class 'int'> instead!
Traceback:
File "C:\Users\11612880\Anaconda3\envs\causal\lib\site-packages\streamlit\runtime\scriptrunner\script_runner.py", line 600, in run_script
    exec(code, module.__dict__)
File "C:\Users\11612880\Desktop\Causal-Discovery-PoC\pages\9_📊
SCM.py", line 167, in <module>
    result = falsify_graph((st.session_state["pc_graph_nx"]), data_pd, plot_histogram=False, suggestions=True, n_permutations=100)
File "C:\Users\11612880\Anaconda3\envs\causal\lib\site-packages\dowhy\gcm\falsify.py", line 618, in falsify_graph
    summary_given = run_validations(
File "C:\Users\11612880\Anaconda3\envs\causal\lib\site-packages\dowhy\gcm\falsify.py", line 412, in run_validations
    m_summary = m(causal_graph=causal_graph)
File "C:\Users\11612880\Anaconda3\envs\causal\lib\site-packages\dowhy\gcm\falsify.py", line 138, in validate_lmc
    if not (node, non_desc, parents) in p_values_memory:
File "C:\Users\11612880\Anaconda3\envs\causal\lib\site-packages\dowhy\gcm\falsify.py", line 89, in __contains__
    X, Y = (_to_frozenset(i) for i in item[:2])
File "C:\Users\11612880\Anaconda3\envs\causal\lib\site-packages\dowhy\gcm\falsify.py", line 89, in <genexpr>
    X, Y = (_to_frozenset(i) for i in item[:2])
File "C:\Users\11612880\Anaconda3\envs\causal\lib\site-packages\dowhy\gcm\falsify.py", line 979, in _to_frozenset
    assert (

Any idea? Thanks.

@bloebp
Member

bloebp commented Jun 28, 2024

The column conversion needs to happen before calling the assignment and fit, otherwise the variables are expected to be integers, but are strings. I had similar issues before due to integer column names. So, maybe just change:

data_pd.columns = [i for i in range(len(data_pd.columns))] #define the columns as integer

to

data_pd.columns = [str(i) for i in range(len(data_pd.columns))]

But you also need to make sure that the node names in the graph are strings of integers instead of raw integers themselves.
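
A minimal sketch of that order of operations, using a small made-up graph and DataFrame rather than the data from this issue. It renames the columns before anything is fitted and relabels the nodes with `nx.relabel_nodes`, which returns a new graph by default, so the returned graph is the one that has to be passed on. The toy graph, the toy data, and the variable names `int_graph` and `str_graph` are assumptions for illustration.

```python
import networkx as nx
import pandas as pd

# Toy stand-ins for the discovered graph and the data (made up for illustration).
int_graph = nx.DiGraph([(0, 1), (1, 2)])
data_pd = pd.DataFrame([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])  # default columns 0, 1, 2

# 1. Rename the columns to strings before any assignment or fitting.
data_pd.columns = [str(col) for col in data_pd.columns]

# 2. Relabel the nodes to strings. relabel_nodes returns a new graph by default,
#    so keep the result; int_graph itself still has integer nodes.
str_graph = nx.relabel_nodes(int_graph, {node: str(node) for node in int_graph.nodes})

# 3. Check that both sides now use the same string names.
assert all(isinstance(node, str) for node in str_graph.nodes)
assert set(str_graph.nodes) == set(data_pd.columns)

# str_graph and data_pd are what would then be handed to
# gcm.StructuralCausalModel, gcm.fit and falsify_graph (not called here).
```

Note that in the snippet posted earlier in this thread, the relabeled graph is stored in `pc_graph_nx`, while falsify_graph still appears to receive `st.session_state["pc_graph_nx"]`, which holds the integer nodes.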

@nicolavizioli
Author

Now it works. Thanks, much appreciated.


This issue is stale because it has been open for 14 days with no activity.

@github-actions github-actions bot added the stale label Jul 13, 2024

This issue was closed because it has been inactive for 7 days since being marked as stale.

@github-actions github-actions bot closed this as not planned Won't fix, can't repro, duplicate, stale Jul 21, 2024