fix: avoid 403 from to_gbq when table has policyTags #356

Merged
tswast merged 3 commits into googleapis:master from b182204971-policy-tags
Mar 30, 2021

tswast
Collaborator

@tswast tswast commented Mar 29, 2021

Follow-up to googleapis/python-bigquery#557

  • closes #xxxx
  • tests added / passed
  • passes nox -s blacken lint
  • docs/source/changelog.rst entry

@@ -0,0 +1,95 @@
"""Module for checking dependency versions and supported features."""
Collaborator Author

I pulled this out of gbq.py, because load.py also needs some of this logic now.

yield remaining_rows
client.load_table_from_file(
chunk_buffer,
if FEATURES.bigquery_has_from_dataframe_with_csv:
Collaborator Author

Not strictly necessary (just the "omit policyTags" logic was), but I thought this might be a good opportunity to use more logic from google-cloud-bigquery, per #339

The CSV encoding in google-cloud-bigquery is still relatively new, so I didn't want to bump our minimum google-cloud-bigquery versions yet. Discussion: #357
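
For readers unfamiliar with the "omit policyTags" idea, here is a rough sketch of the technique, not the code in this PR. It assumes the table schema is available in its REST-style form (a list of field dictionaries, where a field may carry a `policyTags` key and nested `fields`), and that resending those tags in a load job is what triggers the 403 for callers without permission on them. The function name `omit_policy_tags` is hypothetical.

```python
def omit_policy_tags(schema_fields):
    """Return copies of the schema field dicts with 'policyTags' removed.

    Nested RECORD fields (under the 'fields' key) are cleaned recursively.
    The input list and its dictionaries are left unmodified.
    """
    cleaned = []
    for field in schema_fields:
        field = dict(field)  # shallow copy so the caller's dict is untouched
        field.pop("policyTags", None)
        if "fields" in field:
            field["fields"] = omit_policy_tags(field["fields"])
        cleaned.append(field)
    return cleaned
```

The cleaned schema can then be passed along with the load job, leaving whatever policy tags already exist on the destination table as they are.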

@tswast
Collaborator Author

tswast commented Mar 29, 2021

conda-3.7 test failures:

=================================== FAILURES ===================================
_____________________ TestReadGBQIntegration.test_ddl[env] _____________________

self = <tests.system.test_gbq.TestReadGBQIntegration object at 0x7f9dc005a850>
random_dataset = Dataset(DatasetReference('****************', 'pandas_gbq_c4735f6e_fbe3_4373_9148_93ad820ed06e'))
project_id = '****************'

    def test_ddl(self, random_dataset, project_id):
        # Bug fix for https://github.com/pydata/pandas-gbq/issues/45
        df = gbq.read_gbq(
            "CREATE OR REPLACE TABLE {}.test_ddl (x INT64)".format(
>               random_dataset.dataset_id
            )
        )

tests/system/test_gbq.py:635: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
pandas_gbq/gbq.py:857: in read_gbq
    dtypes=dtypes,
pandas_gbq/gbq.py:481: in run_query
    user_dtypes=dtypes,
pandas_gbq/gbq.py:536: in _download_results
    **to_dataframe_kwargs
/opt/conda/envs/test-environment/lib/python3.7/site-packages/google/cloud/bigquery/table.py:1478: in to_dataframe
    return self._to_dataframe_tabledata_list(dtypes, progress_bar=progress_bar)
/opt/conda/envs/test-environment/lib/python3.7/site-packages/google/cloud/bigquery/table.py:1317: in _to_dataframe_tabledata_list
    current_frame = self._to_dataframe_dtypes(page, column_names, dtypes)
/opt/conda/envs/test-environment/lib/python3.7/site-packages/google/cloud/bigquery/table.py:1309: in _to_dataframe_dtypes
    return pandas.DataFrame(columns, columns=column_names)
/opt/conda/envs/test-environment/lib/python3.7/site-packages/pandas/core/frame.py:348: in __init__
    mgr = self._init_dict(data, index, columns, dtype=dtype)
/opt/conda/envs/test-environment/lib/python3.7/site-packages/pandas/core/frame.py:451: in _init_dict
    nan_dtype)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

value = nan, length = 0, dtype = <class 'object'>

    def construct_1d_arraylike_from_scalar(value, length, dtype):
        """
        create a np.ndarray / pandas type of specified shape and dtype
        filled with values
    
        Parameters
        ----------
        value : scalar value
        length : int
        dtype : pandas_dtype / np.dtype
    
        Returns
        -------
        np.ndarray / pandas type of length, filled with value
    
        """
        if is_datetimetz(dtype):
            from pandas import DatetimeIndex
            subarr = DatetimeIndex([value] * length, dtype=dtype)
        elif is_categorical_dtype(dtype):
            from pandas import Categorical
            subarr = Categorical([value] * length, dtype=dtype)
        else:
            if not isinstance(dtype, (np.dtype, type(np.dtype))):
>               dtype = dtype.dtype
E               AttributeError: type object 'object' has no attribute 'dtype'

/opt/conda/envs/test-environment/lib/python3.7/site-packages/pandas/core/dtypes/cast.py:1196: AttributeError
__________________ TestReadGBQIntegration.test_zero_rows[env] __________________

self = <tests.system.test_gbq.TestReadGBQIntegration object at 0x7f9dc01ea890>
project_id = '****************'

    def test_zero_rows(self, project_id):
        # Bug fix for https://github.com/pandas-dev/pandas/issues/10273
        df = gbq.read_gbq(
            'SELECT name, number, (mlc_class = "HU") is_hurricane, iso_time '
            "FROM `bigquery-public-data.noaa_hurricanes.hurricanes` "
            'WHERE iso_time = TIMESTAMP("1900-01-01 00:00:00") ',
            project_id=project_id,
>           credentials=self.credentials,
        )

tests/system/test_gbq.py:662: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
pandas_gbq/gbq.py:857: in read_gbq
    dtypes=dtypes,
pandas_gbq/gbq.py:481: in run_query
    user_dtypes=dtypes,
pandas_gbq/gbq.py:536: in _download_results
    **to_dataframe_kwargs
/opt/conda/envs/test-environment/lib/python3.7/site-packages/google/cloud/bigquery/table.py:1478: in to_dataframe
    return self._to_dataframe_tabledata_list(dtypes, progress_bar=progress_bar)
/opt/conda/envs/test-environment/lib/python3.7/site-packages/google/cloud/bigquery/table.py:1317: in _to_dataframe_tabledata_list
    current_frame = self._to_dataframe_dtypes(page, column_names, dtypes)
/opt/conda/envs/test-environment/lib/python3.7/site-packages/google/cloud/bigquery/table.py:1309: in _to_dataframe_dtypes
    return pandas.DataFrame(columns, columns=column_names)
/opt/conda/envs/test-environment/lib/python3.7/site-packages/pandas/core/frame.py:348: in __init__
    mgr = self._init_dict(data, index, columns, dtype=dtype)
/opt/conda/envs/test-environment/lib/python3.7/site-packages/pandas/core/frame.py:451: in _init_dict
    nan_dtype)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

value = nan, length = 0, dtype = <class 'object'>

    def construct_1d_arraylike_from_scalar(value, length, dtype):
        """
        create a np.ndarray / pandas type of specified shape and dtype
        filled with values
    
        Parameters
        ----------
        value : scalar value
        length : int
        dtype : pandas_dtype / np.dtype
    
        Returns
        -------
        np.ndarray / pandas type of length, filled with value
    
        """
        if is_datetimetz(dtype):
            from pandas import DatetimeIndex
            subarr = DatetimeIndex([value] * length, dtype=dtype)
        elif is_categorical_dtype(dtype):
            from pandas import Categorical
            subarr = Categorical([value] * length, dtype=dtype)
        else:
            if not isinstance(dtype, (np.dtype, type(np.dtype))):
>               dtype = dtype.dtype
E               AttributeError: type object 'object' has no attribute 'dtype'

/opt/conda/envs/test-environment/lib/python3.7/site-packages/pandas/core/dtypes/cast.py:1196: AttributeError
_______________ test_to_gbq_wo_verbose_w_new_pandas_no_warnings ________________

monkeypatch = <_pytest.monkeypatch.MonkeyPatch object at 0x7f9dbf7947d0>
recwarn = WarningsRecorder(record=True)

    def test_to_gbq_wo_verbose_w_new_pandas_no_warnings(monkeypatch, recwarn):
        monkeypatch.setattr(
            type(FEATURES),
            "pandas_has_deprecated_verbose",
            mock.PropertyMock(return_value=True),
        )
        try:
            gbq.to_gbq(
                DataFrame([[1]]), "dataset.tablename", project_id="my-project"
            )
        except gbq.TableCreationError:
            pass
>       assert len(recwarn) == 0
E       assert 1 == 0
E         +1
E         -0

tests/unit/test_gbq.py:172: AssertionError
_______________ test_to_gbq_with_verbose_old_pandas_no_warnings ________________

monkeypatch = <_pytest.monkeypatch.MonkeyPatch object at 0x7f9dbf53b490>
recwarn = WarningsRecorder(record=True)

    def test_to_gbq_with_verbose_old_pandas_no_warnings(monkeypatch, recwarn):
        monkeypatch.setattr(
            type(FEATURES),
            "pandas_has_deprecated_verbose",
            mock.PropertyMock(return_value=False),
        )
        try:
            gbq.to_gbq(
                DataFrame([[1]]),
                "dataset.tablename",
                project_id="my-project",
                verbose=True,
            )
        except gbq.TableCreationError:
            pass
>       assert len(recwarn) == 0
E       assert 1 == 0
E         +1
E         -0

tests/unit/test_gbq.py:190: AssertionError
=============================== warnings summary ===============================
../../opt/conda/envs/test-environment/lib/python3.7/site-packages/pandas/core/dtypes/inference.py:6
  /opt/conda/envs/test-environment/lib/python3.7/site-packages/pandas/core/dtypes/inference.py:6: DeprecationWarning: Using or importing the ABCs from 'collections' instead of from 'collections.abc' is deprecated since Python 3.3,and in 3.9 it will stop working
    from collections import Iterable

../../opt/conda/envs/test-environment/lib/python3.7/site-packages/pandas/core/tools/datetimes.py:3
  /opt/conda/envs/test-environment/lib/python3.7/site-packages/pandas/core/tools/datetimes.py:3: DeprecationWarning: Using or importing the ABCs from 'collections' instead of from 'collections.abc' is deprecated since Python 3.3,and in 3.9 it will stop working
    from collections import MutableMapping

../../opt/conda/envs/test-environment/lib/python3.7/site-packages/pandas/util/testing.py:47
  /opt/conda/envs/test-environment/lib/python3.7/site-packages/pandas/util/testing.py:47: DeprecationWarning: `np.bool` is a deprecated alias for the builtin `bool`. To silence this warning, use `bool` by itself. Doing this will not modify any behavior and is safe. If you specifically wanted the numpy scalar type, use `np.bool_` here.
  Deprecated in NumPy 1.20; for more details and guidance: https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations
    from pandas._libs import testing as _testing

tests/system/test_auth.py:66
  /root/project/tests/system/test_auth.py:66: PytestUnknownMarkWarning: Unknown pytest.mark.local_auth - is this a typo?  You can register custom marks to avoid this warning - for details, see https://docs.pytest.org/en/stable/mark.html
    @pytest.mark.local_auth

tests/system/test_auth.py:82
  /root/project/tests/system/test_auth.py:82: PytestUnknownMarkWarning: Unknown pytest.mark.local_auth - is this a typo?  You can register custom marks to avoid this warning - for details, see https://docs.pytest.org/en/stable/mark.html
    @pytest.mark.local_auth

tests/system/test_auth.py:94
  /root/project/tests/system/test_auth.py:94: PytestUnknownMarkWarning: Unknown pytest.mark.local_auth - is this a typo?  You can register custom marks to avoid this warning - for details, see https://docs.pytest.org/en/stable/mark.html
    @pytest.mark.local_auth

../../opt/conda/envs/test-environment/lib/python3.7/site-packages/pandas/core/indexes/base.py:378: 6 warnings
tests/system/test_gbq.py: 297 warnings
tests/unit/test_gbq.py: 7 warnings
tests/unit/test_load.py: 8 warnings
tests/unit/test_timestamp.py: 5 warnings
  /opt/conda/envs/test-environment/lib/python3.7/site-packages/pandas/core/indexes/base.py:378: DeprecationWarning: `np.bool` is a deprecated alias for the builtin `bool`. To silence this warning, use `bool` by itself. Doing this will not modify any behavior and is safe. If you specifically wanted the numpy scalar type, use `np.bool_` here.
  Deprecated in NumPy 1.20; for more details and guidance: https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations
    elif issubclass(data.dtype.type, np.bool) or is_bool_dtype(data):

tests/system/test_gbq.py: 54 warnings
  /root/project/pandas_gbq/gbq.py:536: UserWarning: A progress bar was requested, but there was an error loading the tqdm library. Please install tqdm to use the progress bar functionality.
    **to_dataframe_kwargs

tests/system/test_gbq.py: 12 warnings
tests/unit/test_gbq.py: 5 warnings
  /opt/conda/envs/test-environment/lib/python3.7/site-packages/pandas/core/frame.py:7616: DeprecationWarning: `np.object` is a deprecated alias for the builtin `object`. To silence this warning, use `object` by itself. Doing this will not modify any behavior and is safe. 
  Deprecated in NumPy 1.20; for more details and guidance: https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations
    if dtype != object and dtype != np.object:

-- Docs: https://docs.pytest.org/en/stable/warnings.html

---------- coverage: platform linux, python 3.7.10-final-0 -----------
Coverage XML written to file /tmp/pytest-cov.xml

=========================== short test summary info ============================
FAILED tests/system/test_gbq.py::TestReadGBQIntegration::test_ddl[env] - Attr...
FAILED tests/system/test_gbq.py::TestReadGBQIntegration::test_zero_rows[env]
FAILED tests/unit/test_gbq.py::test_to_gbq_wo_verbose_w_new_pandas_no_warnings
FAILED tests/unit/test_gbq.py::test_to_gbq_with_verbose_old_pandas_no_warnings
= 4 failed, 162 passed, 8 skipped, 3 deselected, 400 warnings in 233.09s (0:03:53) =

The dtype errors might be fixed by bumping the minimum pandas version. I'm not sure what the extra warning in to_gbq is. Maybe this one?

tests/unit/test_gbq.py: 5 warnings
  /opt/conda/envs/test-environment/lib/python3.7/site-packages/pandas/core/frame.py:7616: DeprecationWarning: `np.object` is a deprecated alias for the builtin `object`. To silence this warning, use `object` by itself. Doing this will not modify any behavior and is safe. 
  Deprecated in NumPy 1.20; for more details and guidance: https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations
    if dtype != object and dtype != np.object:
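
One way to confirm which warning is the extra one would be to record every warning raised by the call and print its category and message. The sketch below uses only the standard library; `collect_warnings` is a hypothetical helper name, and the callable passed to it stands in for the `gbq.to_gbq` call from the failing tests.

```python
import warnings


def collect_warnings(func, *args, **kwargs):
    """Call func and return (category name, message) for each warning raised."""
    with warnings.catch_warnings(record=True) as caught:
        # Record every warning, including ones normally shown only once.
        warnings.simplefilter("always")
        func(*args, **kwargs)
    return [(w.category.__name__, str(w.message)) for w in caught]
```

Inside the failing tests themselves, the entries already captured by the `recwarn` fixture could be inspected in the same way before the `len(recwarn) == 0` assertion.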

@tswast tswast merged commit 853f792 into googleapis:master Mar 30, 2021
@tswast tswast deleted the b182204971-policy-tags branch March 30, 2021 15:21