dev: Pangea v1alpha: Do Not Merge. #2071

Draft: wants to merge 45 commits into main

Commits (45)
1b234c8  Adds ExternalCatalogDatasetOptions to Dataset (chalmerlowe, Sep 4, 2024)
8029213  adds ExternalCatalogTableOptions class and assorted content (chalmerlowe, Sep 5, 2024)
0992bbf  modifies argument names to snake_case (chalmerlowe, Sep 11, 2024)
45ddd89  replaces dtype placeholder with parameter names (chalmerlowe, Sep 11, 2024)
1411460  updates the inclusion of value in properties to use repr version (chalmerlowe, Sep 11, 2024)
20ee950  updates another inclusion of value in properties to use repr version (chalmerlowe, Sep 11, 2024)
bee33ef  updates type check via isinstance() or None (chalmerlowe, Sep 11, 2024)
15acfb3  Merge branch 'main' into add-pangea-classes (chalmerlowe, Sep 11, 2024)
ee69f24  adds tests related to ExternalCatalogDatasetOptions (chalmerlowe, Sep 12, 2024)
aeab931  Merge branch 'main' into add-pangea-classes (chalmerlowe, Sep 12, 2024)
f9d657b  adds test suite for ExternalCatalogTableOptions and minor tweaks else… (chalmerlowe, Sep 12, 2024)
89896a3  corrects Error type of failing test (chalmerlowe, Sep 19, 2024)
c452459  forgive me... a wild mess of tests, tweaks, etc (chalmerlowe, Sep 26, 2024)
199e903  Updates isinstance_or_raise, refines ExternalCatalogDatasetOptions in… (chalmerlowe, Oct 2, 2024)
e238ba0  Updates ExternalCatalogTableOptions and associated tests (chalmerlowe, Oct 2, 2024)
5fc89ae  Tweaks several docstrings (chalmerlowe, Oct 2, 2024)
68d04f0  Adds content related to ForeignTypeInfo (chalmerlowe, Oct 2, 2024)
2a5774e  add new classes and tests (chalmerlowe, Oct 3, 2024)
cbd08c5  Merge branch 'main' into add-pangea-classes (chalmerlowe, Oct 3, 2024)
0fcf424  Update tests/unit/test_schema.py (chalmerlowe, Oct 3, 2024)
d7698d2  Update google/cloud/bigquery/_helpers.py (chalmerlowe, Oct 11, 2024)
43dc45e  updates logic and tests related to _isinstance_or_raise' (chalmerlowe, Oct 11, 2024)
4f117a7  updates from_api_repr and a number of tests and cleans up miscellaneo… (chalmerlowe, Oct 11, 2024)
defa38c  Update google/cloud/bigquery/_helpers.py (chalmerlowe, Oct 14, 2024)
14d1bd8  Most recent round of tweaks and experiments (chalmerlowe, Oct 30, 2024)
1b7ba09  Updates from futures import annotation. (chalmerlowe, Nov 1, 2024)
79bbeb2  Updates from_api_repr() and external_config tests (chalmerlowe, Nov 4, 2024)
d71d904  Updates external_catalog_dataset functions in dataset.py and tests. (chalmerlowe, Nov 4, 2024)
b0a7fb1  Adds fixtures, tests, corrections to classes and tests (chalmerlowe, Nov 6, 2024)
d0d96fa  Updates comments and addes a to_api_repr test (chalmerlowe, Nov 6, 2024)
16e2c2c  Merge branch 'main' into add-pangea-classes (chalmerlowe, Nov 12, 2024)
116de78  Revises test for additional clarity (chalmerlowe, Nov 13, 2024)
5d0b7d6  Merge branch 'add-pangea-classes' into pangea-v1alpha (chalmerlowe, Nov 15, 2024)
e912133  chore: merge contents of main that were in add-pangea-classes into pa… (chalmerlowe, Nov 15, 2024)
0960426  Updates test_dataset to account for coverage (chalmerlowe, Nov 15, 2024)
8e5af84  Removes single line comment (chalmerlowe, Nov 15, 2024)
b67dda2  chore: syncing with main (#2067) (chalmerlowe, Nov 18, 2024)
236455c  fix: Updates mypy and pytype annotations, etc (#2072) (chalmerlowe, Nov 22, 2024)
9fd8854  feat: adds two from_api_tests: StorageDesc & SerDeInfo (#2074) (chalmerlowe, Nov 25, 2024)
8162a2c  feat: adds ForeignTypeInfo test (#2076) (chalmerlowe, Nov 25, 2024)
74beca6  Merge branch 'main' into pangea-v1alpha (chalmerlowe, Nov 25, 2024)
48c8cc6  feat: Adds attributes to SchemaField (#2077) (chalmerlowe, Nov 26, 2024)
1fcbc09  fix: updates tests on getters to resolve coverage issues (#2080) (chalmerlowe, Dec 4, 2024)
07bc30a  Merge branch 'main' into pangea-v1alpha (chalmerlowe, Dec 6, 2024)
bb5c06c  Merge branch 'main' into pangea-v1alpha (chalmerlowe, Dec 10, 2024)
35 changes: 34 additions & 1 deletion google/cloud/bigquery/_helpers.py
@@ -22,7 +22,7 @@
import re
import os
import warnings
from typing import Optional, Union, Any, Tuple, Type

from dateutil import relativedelta
from google.cloud._helpers import UTC # type: ignore
@@ -1004,3 +1004,36 @@ def _verify_job_config_type(job_config, expected_type, param_name="job_config"):
job_config=job_config,
)
)


def _isinstance_or_raise(
value: Any,
dtype: Union[Type, Tuple[Type, ...]],
none_allowed: Optional[bool] = False,
) -> Any:
"""Determine whether a value type matches a given datatype or None.

Args:
value (Any): Value to be checked.
dtype (type): Expected data type or tuple of data types.
        none_allowed (Optional[bool]): Whether ``value`` is allowed to be
            None. Defaults to False.

Returns:
Any: Returns the input value if the type check is successful.

Raises:
TypeError: If the input value's type does not match the expected data type(s).
"""
if none_allowed and value is None:
return value

if isinstance(value, dtype):
return value

or_none = ""
if none_allowed:
or_none = " (or None)"

msg = f"Pass {value} as a '{dtype}'{or_none}. Got {type(value)}."
raise TypeError(msg)
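
For reference, a quick sketch of how this helper behaves; the values here are invented for illustration:

    # Returns the value unchanged when its type matches.
    _isinstance_or_raise("gs://bucket/path", str)        # -> "gs://bucket/path"
    # None passes only when explicitly allowed.
    _isinstance_or_raise(None, dict, none_allowed=True)  # -> None
    # Any other mismatch raises TypeError, naming expected and actual types.
    _isinstance_or_raise(123, str)                       # raises TypeError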
29 changes: 27 additions & 2 deletions google/cloud/bigquery/dataset.py
@@ -23,10 +23,13 @@
import google.cloud._helpers # type: ignore

from google.cloud.bigquery import _helpers
from google.cloud.bigquery._helpers import _isinstance_or_raise
from google.cloud.bigquery.model import ModelReference
from google.cloud.bigquery.routine import Routine, RoutineReference
from google.cloud.bigquery.table import Table, TableReference
from google.cloud.bigquery.encryption_configuration import EncryptionConfiguration
from google.cloud.bigquery.external_config import ExternalCatalogDatasetOptions


from typing import Optional, List, Dict, Any, Union

@@ -530,6 +533,7 @@ class Dataset(object):
"storage_billing_model": "storageBillingModel",
"max_time_travel_hours": "maxTimeTravelHours",
"default_rounding_mode": "defaultRoundingMode",
"external_catalog_dataset_options": "externalCatalogDatasetOptions",
}

def __init__(self, dataset_ref) -> None:
@@ -937,10 +941,31 @@ def _build_resource(self, filter_fields):
"""Generate a resource for ``update``."""
return _helpers._build_resource_from_properties(self, filter_fields)

    @property
    def external_catalog_dataset_options(self):
        """Options defining open source compatible datasets living in the
        BigQuery catalog. Contains metadata of open source database, schema
        or namespace represented by the current dataset."""

        prop = _helpers._get_sub_prop(
            self._properties, ["externalCatalogDatasetOptions"]
        )

        if prop is not None:
            prop = ExternalCatalogDatasetOptions.from_api_repr(prop)
        return prop

    @external_catalog_dataset_options.setter
    def external_catalog_dataset_options(self, value):
        value = _isinstance_or_raise(
            value, ExternalCatalogDatasetOptions, none_allowed=True
        )
        key = self._PROPERTY_TO_API_FIELD["external_catalog_dataset_options"]
        # None is allowed, so only convert an actual options instance to its API repr.
        self._properties[key] = value.to_api_repr() if value is not None else None

table = _get_table_reference
model = _get_model_reference
routine = _get_routine_reference

def __repr__(self):
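
Taken together, the new property and setter let callers attach catalog options to a dataset and read them back. A minimal usage sketch against this branch (project, dataset, and URI names are invented):

    from google.cloud import bigquery
    from google.cloud.bigquery.external_config import ExternalCatalogDatasetOptions

    dataset = bigquery.Dataset("my-project.my_dataset")  # invented IDs
    dataset.external_catalog_dataset_options = ExternalCatalogDatasetOptions(
        default_storage_location_uri="gs://my-bucket/warehouse",  # invented URI
        parameters={"owner": "data-eng"},
    )
    # The setter stores the API representation; the getter rehydrates it.
    assert dataset.external_catalog_dataset_options.parameters == {"owner": "data-eng"}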
22 changes: 21 additions & 1 deletion google/cloud/bigquery/enums.py
@@ -246,6 +246,11 @@ class KeyResultStatementKind:


class StandardSqlTypeNames(str, enum.Enum):
"""Enum of allowed SQL type names in schema.SchemaField.

    Data type names used in GoogleSQL (standard SQL).
"""

def _generate_next_value_(name, start, count, last_values):
return name

@@ -267,6 +272,9 @@ def _generate_next_value_(name, start, count, last_values):
ARRAY = enum.auto()
STRUCT = enum.auto()
RANGE = enum.auto()
# NOTE: FOREIGN acts as a wrapper for data types
# not natively understood by BigQuery unless translated
FOREIGN = enum.auto()


class EntityTypes(str, enum.Enum):
@@ -285,7 +293,10 @@ class EntityTypes(str, enum.Enum):
# See also: https://cloud.google.com/bigquery/data-types#legacy_sql_data_types
# and https://cloud.google.com/bigquery/docs/reference/standard-sql/data-types
class SqlTypeNames(str, enum.Enum):
"""Enum of allowed SQL type names in schema.SchemaField."""
"""Enum of allowed SQL type names in schema.SchemaField.

Datatype used in Legacy SQL.
"""

STRING = "STRING"
BYTES = "BYTES"
@@ -306,6 +317,9 @@ class SqlTypeNames(str, enum.Enum):
DATETIME = "DATETIME"
INTERVAL = "INTERVAL" # NOTE: not available in legacy types
RANGE = "RANGE" # NOTE: not available in legacy types
# NOTE: FOREIGN acts as a wrapper for data types
# not natively understood by BigQuery unless translated
FOREIGN = "FOREIGN"


class WriteDisposition(object):
@@ -344,3 +358,9 @@ class DeterminismLevel:

NOT_DETERMINISTIC = "NOT_DETERMINISTIC"
"""The UDF is not deterministic."""


class RoundingMode(enum.Enum):
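    """Rounding mode options, matching the BigQuery API's RoundingMode enum values."""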
ROUNDING_MODE_UNSPECIFIED = 0
ROUND_HALF_AWAY_FROM_ZERO = 1
ROUND_HALF_EVEN = 2
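
Since both type-name enums subclass str, and StandardSqlTypeNames derives each member's value from its name via _generate_next_value_, the new FOREIGN members compare equal to the plain string. A small illustration (not part of the diff):

    from google.cloud.bigquery.enums import SqlTypeNames, StandardSqlTypeNames

    assert SqlTypeNames.FOREIGN == "FOREIGN"                # str-enum comparison
    assert StandardSqlTypeNames.FOREIGN.value == "FOREIGN"  # value generated from name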
199 changes: 193 additions & 6 deletions google/cloud/bigquery/external_config.py
@@ -18,18 +18,22 @@
Job.configuration.query.tableDefinitions.
"""

from __future__ import absolute_import, annotations

import base64
import copy
from typing import Any, Dict, FrozenSet, Iterable, Optional, Union

from google.cloud.bigquery._helpers import (
_to_bytes,
_bytes_to_json,
_int_or_none,
_str_or_none,
_isinstance_or_raise,
_get_sub_prop,
)
from google.cloud.bigquery.format_options import AvroOptions, ParquetOptions
from google.cloud.bigquery.schema import SchemaField, StorageDescriptor


class ExternalSourceFormat(object):
@@ -1003,3 +1007,186 @@ def from_api_repr(cls, resource: dict) -> "ExternalConfig":
config = cls(resource["sourceFormat"])
config._properties = copy.deepcopy(resource)
return config


class ExternalCatalogDatasetOptions:
"""Options defining open source compatible datasets living in the BigQuery catalog.
Contains metadata of open source database, schema or namespace represented
by the current dataset.

Args:
        default_storage_location_uri (Optional[str]): The storage location URI for all
            tables in the dataset. Equivalent to the hive metastore's database
            locationUri. Maximum length of 1024 characters.
        parameters (Optional[Dict[str, Any]]): A map of key value pairs defining the
            parameters and properties of the open source schema. Maximum size
            of 2 MiB.
"""

def __init__(
self,
default_storage_location_uri: Optional[str] = None,
parameters: Optional[Dict[str, Any]] = None,
):
self._properties: Dict[str, Any] = {}
self.default_storage_location_uri = default_storage_location_uri
self.parameters = parameters

@property
def default_storage_location_uri(self) -> Any:
"""Optional. The storage location URI for all tables in the dataset.
Equivalent to hive metastore's database locationUri. Maximum length of
1024 characters."""

return self._properties.get("defaultStorageLocationUri")

@default_storage_location_uri.setter
def default_storage_location_uri(self, value: str):
value = _isinstance_or_raise(value, str, none_allowed=True)
self._properties["defaultStorageLocationUri"] = value

@property
def parameters(self) -> Any:
"""Optional. A map of key value pairs defining the parameters and
properties of the open source schema. Maximum size of 2Mib."""

return self._properties.get("parameters")

@parameters.setter
    def parameters(self, value: Optional[Dict[str, Any]]):
value = _isinstance_or_raise(value, dict, none_allowed=True)
self._properties["parameters"] = value

def to_api_repr(self) -> dict:
"""Build an API representation of this object.

Returns:
Dict[str, Any]:
A dictionary in the format used by the BigQuery API.
"""
config = copy.deepcopy(self._properties)
return config

@classmethod
def from_api_repr(cls, resource: dict) -> ExternalCatalogDatasetOptions:
"""Factory: constructs an instance of the class (cls)
given its API representation.

Args:
resource (Dict[str, Any]):
API representation of the object to be instantiated.

Returns:
An instance of the class initialized with data from 'resource'.
"""
config = cls()
config._properties = copy.deepcopy(resource)
return config
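
A round-trip sketch for this class as written; the URI and parameter values are made up:

    options = ExternalCatalogDatasetOptions(
        default_storage_location_uri="gs://example-bucket/warehouse",
        parameters={"owner": "analytics"},
    )
    resource = options.to_api_repr()
    # to_api_repr() emits the camelCase keys stored in _properties ...
    assert resource["defaultStorageLocationUri"] == "gs://example-bucket/warehouse"
    # ... and from_api_repr() reconstructs an equivalent instance.
    restored = ExternalCatalogDatasetOptions.from_api_repr(resource)
    assert restored.parameters == {"owner": "analytics"}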


class ExternalCatalogTableOptions:
"""Metadata about open source compatible table. The fields contained in these
options correspond to hive metastore's table level properties.

Args:
        connection_id (Optional[str]): The connection specifying the credentials to be
            used to read external storage, such as Azure Blob, Cloud Storage, or
            S3. The connection is needed to read the open source table from
            BigQuery Engine. The connection_id can have the form
            `{project}.{location}.{connection_id}` or
            `projects/{project}/locations/{location}/connections/{connection_id}`.
        parameters (Optional[Dict[str, Any]]): A map of key value pairs defining the
            parameters and properties of the open source table. Corresponds with
            hive metastore table parameters. Maximum size of 4 MiB.
        storage_descriptor (Optional[StorageDescriptor]): A storage descriptor containing
            information about the physical storage of this table.
"""

def __init__(
self,
connection_id: Optional[str] = None,
        parameters: Optional[Dict[str, Any]] = None,
        storage_descriptor: Optional[StorageDescriptor] = None,
):
        self._properties: Dict[str, Any] = {}
self.connection_id = connection_id
self.parameters = parameters
self.storage_descriptor = storage_descriptor

@property
def connection_id(self):
"""Optional. The connection specifying the credentials to be
used to read external storage, such as Azure Blob, Cloud Storage, or
S3. The connection is needed to read the open source table from
BigQuery Engine. The connection_id can have the form `..` or
`projects//locations//connections/`. (str)
"""
return self._properties.get("connectionId")

@connection_id.setter
def connection_id(self, value: Optional[str]):
value = _isinstance_or_raise(value, str, none_allowed=True)
self._properties["connectionId"] = value

@property
def parameters(self) -> Any:
"""Optional. A map of key value pairs defining the parameters and
        properties of the open source table. Corresponds with hive metastore
        table parameters. Maximum size of 4 MiB.
"""

return self._properties.get("parameters")

@parameters.setter
def parameters(self, value: Union[Dict[str, Any], None]):
value = _isinstance_or_raise(value, dict, none_allowed=True)
self._properties["parameters"] = value

@property
def storage_descriptor(self) -> Any:
"""Optional. A storage descriptor containing information about the
physical storage of this table."""

prop = _get_sub_prop(self._properties, ["storageDescriptor"])

if prop is not None:
            prop = StorageDescriptor.from_api_repr(prop)
return prop

@storage_descriptor.setter
def storage_descriptor(self, value):
value = _isinstance_or_raise(value, StorageDescriptor, none_allowed=True)
if value is not None:
self._properties["storageDescriptor"] = value.to_api_repr()
else:
self._properties["storageDescriptor"] = value

def to_api_repr(self) -> dict:
"""Build an API representation of this object.

Returns:
Dict[str, Any]:
A dictionary in the format used by the BigQuery API.
"""

config = copy.deepcopy(self._properties)
return config

@classmethod
def from_api_repr(cls, resource: dict) -> ExternalCatalogTableOptions:
"""Factory: constructs an instance of the class (cls)
given its API representation.

Args:
resource (Dict[str, Any]):
API representation of the object to be instantiated.

Returns:
An instance of the class initialized with data from 'resource'.
"""
config = cls()
config._properties = copy.deepcopy(resource)
return config

    def __eq__(self, other):
        if not isinstance(other, ExternalCatalogTableOptions):
            return NotImplemented
        return self.to_api_repr() == other.to_api_repr()
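
And a matching sketch for the table-level options; the connection id and parameters are invented:

    table_options = ExternalCatalogTableOptions(
        connection_id="my-project.us.my-connection",
        parameters={"table_format": "iceberg"},
    )
    resource = table_options.to_api_repr()
    assert resource["connectionId"] == "my-project.us.my-connection"
    # __eq__ compares API representations, so a round trip compares equal.
    assert ExternalCatalogTableOptions.from_api_repr(resource) == table_options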
1 change: 0 additions & 1 deletion google/cloud/bigquery/query.py
@@ -1003,7 +1003,6 @@ def __init__(
):
self.name = name
self.range_element_type = self._parse_range_element_type(range_element_type)
print(self.range_element_type.type_._type)
self.start = start
self.end = end
