Describe the bug
I have a set of unit tests that check the functionality of code that uses the fugue_sql API with a DuckDB backend. When I run these tests locally, they all pass without any issue. However, when I run them as part of a GitHub Actions workflow, I frequently encounter a segmentation fault at the following location:
Current thread 0x00007f4e61554740 (most recent call first):
File "/home/runner/.cache/pypoetry/virtualenvs/****-CeyU5fzd-py3.10/lib/python3.10/site-packages/fugue_duckdb/dataframe.py", line 101 in as_arrow
File "/home/runner/.cache/pypoetry/virtualenvs/****-CeyU5fzd-py3.10/lib/python3.10/site-packages/fugue_duckdb/dataframe.py", line 110 in as_local_bounded
File "/home/runner/.cache/pypoetry/virtualenvs/****-CeyU5fzd-py3.10/lib/python3.10/site-packages/fugue/dataframe/dataframe.py", line 90 in as_local
File "/home/runner/.cache/pypoetry/virtualenvs/****-CeyU5fzd-py3.10/lib/python3.10/site-packages/fugue_duckdb/execution_engine.py", line 521 in convert_yield_dataframe
File "/home/runner/.cache/pypoetry/virtualenvs/****-CeyU5fzd-py3.10/lib/python3.10/site-packages/fugue/workflow/_tasks.py", line 147 in set_result
File "/home/runner/.cache/pypoetry/virtualenvs/****-CeyU5fzd-py3.10/lib/python3.10/site-packages/fugue/workflow/_tasks.py", line 293 in execute
File "/home/runner/.cache/pypoetry/virtualenvs/****-CeyU5fzd-py3.10/lib/python3.10/site-packages/adagio/instances.py", line 683 in run
File "/home/runner/.cache/pypoetry/virtualenvs/****-CeyU5fzd-py3.10/lib/python3.10/site-packages/adagio/instances.py", line 171 in run_single
File "/home/runner/.cache/pypoetry/virtualenvs/****-CeyU5fzd-py3.10/lib/python3.10/site-packages/adagio/instances.py", line 155 in run_tasks
File "/home/runner/.cache/pypoetry/virtualenvs/****-CeyU5fzd-py3.10/lib/python3.10/site-packages/adagio/instances.py", line 129 in run
File "/home/runner/.cache/pypoetry/virtualenvs/****-CeyU5fzd-py3.10/lib/python3.10/site-packages/adagio/instances.py", line 270 in run
File "/home/runner/.cache/pypoetry/virtualenvs/****-CeyU5fzd-py3.10/lib/python3.10/site-packages/fugue/workflow/_workflow_context.py", line 54 in run
File "/home/runner/.cache/pypoetry/virtualenvs/****-CeyU5fzd-py3.10/lib/python3.10/site-packages/fugue/workflow/workflow.py", line 1584 in run
File "/home/runner/.cache/pypoetry/virtualenvs/****-CeyU5fzd-py3.10/lib/python3.10/site-packages/fugue/sql/api.py", line 107 in fugue_sql
The function that fails has the following form:
import pandas as pd
import fugue.api as fa

def filter_df(
    df: pd.DataFrame,
    outlets: pd.DataFrame,
    adjustments: pd.DataFrame,
):
    # Two FugueSQL statements: build the join keys, then join them to df.
    query = """
    keys = SELECT DateId, ProductId, LocationId, AdjustmentFactor, AdjustmentType, id
           FROM adjustments INNER JOIN outlets USING (LocationId)
    fdt = SELECT * FROM keys INNER JOIN df USING (DateId, ProductId, LocationId)
    """
    result = fa.fugue_sql(
        query,
        df=df,
        outlets=outlets,
        adjustments=adjustments,
        engine='duckdb',
        as_fugue=True,
    )
    return result.as_pandas()
And I have multiple unit tests that call this function. It's difficult to fully isolate the problem as I can't fully reproduce it locally.
In this instance, I have been able to refactor my function to use the Fugue API, but it would be good to be able to use the fugue_sql API for more complex queries where SQL syntax is more suitable.
One problem I have seen in unit tests with DuckDB is that it can behave strangely when a DuckDB connection is not properly closed at some step, which then causes problems in the steps that follow.
Hi @goodwanghan, thanks for looking into this. I'm currently using 0.7.1 which I believe is the latest version.
It wouldn't surprise me if it's related to a previous DuckDB connection not being properly closed, but for now I will stick with the Fugue API.
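If the unclosed-connection theory is right, one thing that might be worth trying is giving each test its own connection and closing it explicitly. Below is a minimal sketch of that idea, not a confirmed fix. The helper name `scoped_connection` is hypothetical (it is not part of Fugue or DuckDB), and it only assumes that the connection object exposes a `close()` method, which DuckDB connections do.

```python
from contextlib import contextmanager
from typing import Any, Callable, Iterator


@contextmanager
def scoped_connection(connect: Callable[[], Any]) -> Iterator[Any]:
    """Open a connection via `connect` and always close it afterwards.

    Hypothetical helper: `connect` is any zero-argument factory that
    returns an object with a `close()` method, e.g. `duckdb.connect`.
    """
    con = connect()
    try:
        yield con
    finally:
        # Runs even if the test body raises, so the connection
        # cannot leak into the next test.
        con.close()
```

In a test this could be used as `with scoped_connection(duckdb.connect) as con:` and the connection passed on as `engine=con` to `fa.fugue_sql`. That last step assumes Fugue accepts a DuckDB connection object as the engine, which I have not verified against 0.7.1.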
Expected behavior
I would expect these unit tests to run successfully.
Environment (please complete the following information):