Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SNOW-1331404: fetch_pandas_all converts arrays into strings #1922

Open
lostkamp opened this issue Apr 16, 2024 · 9 comments
Open

SNOW-1331404: fetch_pandas_all converts arrays into strings #1922

lostkamp opened this issue Apr 16, 2024 · 9 comments
Assignees
Labels
enhancement The issue is a request for improvement or a new feature status-triage_done Initial triage done, will be further handled by the driver team

Comments

@lostkamp
Copy link

Python version

Python 3.11.7 (main, Mar 12 2024, 09:54:34) [GCC 13.2.0]

Operating system and processor architecture

Linux-6.5.0-27-generic-x86_64-with-glibc2.38

Installed packages

annotated-types==0.6.0
anyio==4.3.0
argon2-cffi==23.1.0
argon2-cffi-bindings==21.2.0
arrow==1.3.0
asn1crypto==1.5.1
asttokens==2.4.1
async-lru==2.0.4
attrs==23.2.0
Babel==2.14.0
beautifulsoup4==4.12.3
bleach==6.1.0
botocore==1.34.81
certifi==2024.2.2
cffi==1.16.0
charset-normalizer==3.3.2
click==8.1.7
colorama==0.4.4
comm==0.2.1
contourpy==1.2.0
cryptography==42.0.5
cycler==0.12.1
debugpy==1.8.1
decorator==5.1.1
defusedxml==0.7.1
docutils==0.16
entrypoints==0.4
executing==2.0.1
fastapi==0.110.1
fastjsonschema==2.19.1
filelock==3.13.1
flake8==4.0.1
flake8-quotes==3.4.0
flakeheaven==3.3.0
fonttools==4.49.0
fqdn==1.5.1
h11==0.14.0
httpcore==1.0.4
httpx==0.27.0
idna==3.6
iniconfig==2.0.0
ipykernel==6.29.3
ipython==8.22.2
isoduration==20.11.0
jedi==0.19.1
Jinja2==3.1.3
jmespath==1.0.1
joblib==1.3.2
json5==0.9.22
jsonpointer==2.4
jsonschema==4.21.1
jsonschema-specifications==2023.12.1
jupyter-events==0.9.0
jupyter-lsp==2.2.4
jupyter_client==8.6.0
jupyter_core==5.7.1
jupyter_server==2.13.0
jupyter_server_terminals==0.5.2
jupyterlab==4.1.2
jupyterlab_pygments==0.3.0
jupyterlab_server==2.25.4
kiwisolver==1.4.5
MarkupSafe==2.1.5
matplotlib==3.8.3
matplotlib-inline==0.1.6
mccabe==0.6.1
mistune==3.0.2
msal==1.28.0
nbclient==0.9.0
nbconvert==7.16.2
nbformat==5.9.2
nest-asyncio==1.6.0
notebook_shim==0.2.4
numpy==1.26.4
overrides==7.7.0
packaging==24.0
pandas==2.2.0
pandocfilters==1.5.1
parso==0.8.3
pexpect==4.9.0
pillow==10.2.0
platformdirs==3.11.0
pluggy==1.4.0
prometheus_client==0.20.0
prompt-toolkit==3.0.43
psutil==5.9.8
ptyprocess==0.7.0
pure-eval==0.2.2
pyarrow==15.0.2
pyasn1==0.6.0
pycodestyle==2.8.0
pycparser==2.21
pydantic==2.6.3
pydantic_core==2.16.3
pyflakes==2.4.0
Pygments==2.17.2
PyJWT==2.8.0
pyOpenSSL==24.1.0
pyparsing==3.1.2
pytest==8.0.1
python-dateutil==2.9.0.post0
python-dotenv==1.0.1
python-json-logger==2.0.7
pytz==2024.1
PyYAML==6.0.1
pyzmq==25.1.2
referencing==0.33.0
requests==2.31.0
rfc3339-validator==0.1.4
rfc3986-validator==0.1.1
rpds-py==0.18.0
rsa==4.7.2
s3transfer==0.10.1
scikit-learn==1.4.1.post1
scipy==1.12.0
seaborn==0.13.2
Send2Trash==1.8.2
six==1.16.0
sniffio==1.3.1
snowflake-connector-python==3.8.1
sortedcontainers==2.4.0
soupsieve==2.5
stack-data==0.6.3
starlette==0.37.2
terminado==0.18.0
threadpoolctl==3.3.0
tinycss2==1.2.1
toml==0.10.2
tomlkit==0.12.4
tornado==6.4
traitlets==5.14.1
types-python-dateutil==2.8.19.20240311
typing_extensions==4.10.0
tzdata==2024.1
uri-template==1.3.0
urllib3==2.2.1
uvicorn==0.28.0
wcwidth==0.2.13
webcolors==1.13
webencodings==0.5.1
websocket-client==1.7.0

What did you do?

Fetching a column of type array turns the arrays into strings in the dataframe, which makes them difficult to parse.


>>> query = 'select array_construct(10, 20, 30) as col'
>>> df = cursor.execute(query).fetch_pandas_all()
>>> df
                        COL
0  [\n  10,\n  20,\n  30\n]
>>> type(df['COL'].iloc[0])
str

What did you expect to see?

I would expect arrays to be represented as a list or tuple of values in the dataframe cell.

Can you set logging to DEBUG and collect the logs?

import logging
import os

for logger_name in ('snowflake.connector',):
    logger = logging.getLogger(logger_name)
    logger.setLevel(logging.DEBUG)
    ch = logging.StreamHandler()
    ch.setLevel(logging.DEBUG)
    ch.setFormatter(logging.Formatter('%(asctime)s - %(threadName)s %(filename)s:%(lineno)d - %(funcName)s() - %(levelname)s - %(message)s'))
    logger.addHandler(ch)
@github-actions github-actions bot changed the title fetch_pandas_all converts arrays into strings SNOW-1331404: fetch_pandas_all converts arrays into strings Apr 16, 2024
@sfc-gh-sghosh
Copy link

Hello @lostkamp ,

Thanks for raising the issue, we are checking it, will update.

Regards,
Sujan

@sfc-gh-sghosh sfc-gh-sghosh self-assigned this Apr 17, 2024
@sfc-gh-sghosh sfc-gh-sghosh added status-triage Issue is under initial triage and removed bug needs triage labels Apr 17, 2024
@sfc-gh-sghosh
Copy link

Hello @lostkamp ,

This is currently the expected behavior. Adding support for returning structured data is currently being worked on.
Will update further.

Regards,
Sujan

@sfc-gh-sghosh sfc-gh-sghosh added status-triage_done Initial triage done, will be further handled by the driver team bug python Pull requests that update Python code and removed status-triage Issue is under initial triage labels Apr 17, 2024
@sfc-gh-dszmolka sfc-gh-dszmolka added enhancement The issue is a request for improvement or a new feature and removed python Pull requests that update Python code bug labels Apr 23, 2024
@sfc-gh-dszmolka
Copy link
Contributor

this should be now resolved i suppose, now that v3.10.0 is released

@lostkamp
Copy link
Author

Hi @sfc-gh-sghosh and @sfc-gh-dszmolka ,

thanks for the update. Unfortunately I am getting exactly the same behavior as before when using v3.10.0.

@sfc-gh-dszmolka
Copy link
Contributor

reopened and we'll look further

@lostkamp
Copy link
Author

lostkamp commented Jul 3, 2024

Hi,
is there any update on this issue?

@sfc-gh-mkeller
Copy link
Collaborator

@lostkamp I believe this is caused by the fact that we don't support unstructured objects. You can however work around this by telling Snowflake the type yourself. Check the following example out:

SELECT
  SYSTEM$TYPEOF(
    array_construct(10, 20, 30)::ARRAY(NUMBER)
  ) AS structured_array,
  SYSTEM$TYPEOF(
    array_construct(10, 20, 30)
  ) AS semi_structured_array;

produces:

STRUCTURED_ARRAY SEMI_STRUCTURED_ARRAY
ARRAY(NUMBER(38,0))[LOB] ARRAY[LOB]

So your code could work this way:

>>> query = 'select array_construct(10, 20, 30) :: array(number) as col'
>>> df = cursor.execute(query).fetch_pandas_all()
>>> df
            COL
0  [10, 20, 30]
>>> type(df['COL'].iloc[0])
<class 'numpy.ndarray'>

@lostkamp
Copy link
Author

@sfc-gh-mkeller thank you for the explanation!

However your code still gives me strings:

>>> query = 'select array_construct(10, 20, 30)::array(number) as col'
>>> df = cursor.execute(query).fetch_pandas_all()
>>> df
                        COL
0  [\n  10,\n  20,\n  30\n]
>>> type(df['COL'].iloc[0])
<class 'str'>

using v3.12.3

@sfc-gh-mkeller
Copy link
Collaborator

Unfortunately @lostkamp as it turns out the code works, but it requires some Snowflake features that are not released yet. One caveat that I want pass along is that as things are now this only works with Iceberg tables.

Let's keep this ticket open to track this feature request!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement The issue is a request for improvement or a new feature status-triage_done Initial triage done, will be further handled by the driver team
Projects
None yet
Development

No branches or pull requests

6 participants