Geometry column #400
Comments
Now that we have some Geometry support in the We may just need to pass
I think I just hit a regression because of this. Loading a shapefile into BigQuery with geopandas worked back on August 5th, 2021 but now fails with
some of the conda-forge packages in use:
Looks like Is fixing this a matter of getting this mapping to map geopandas types to I might be able to put together a PR if I can get guidance or confirmation on this from someone in the know.
I got good help elsewhere and wanted to follow up with a solution/workaround. My issue can be resolved by converting geopandas'
working_shapefile = pd.DataFrame({
    col: (shapefile[col] if col != 'geometry' else [g.wkt for g in shapefile[col].values])
    for col in shapefile.columns.values
})
Sorry for the spam. Hopefully, this will help someone in the future.
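For readers without the original shapefile, here is a minimal, dependency-free sketch of the same idea: every column is passed through untouched except the geometry column, whose objects are replaced by their WKT strings. `FakePoint` and `geometry_to_wkt` are hypothetical names made up for illustration; `FakePoint` only mimics the `.wkt` attribute that shapely geometries expose and is not a real geometry type.

```python
from dataclasses import dataclass


@dataclass
class FakePoint:
    """Hypothetical stand-in for a shapely geometry; it only mimics `.wkt`."""
    x: float
    y: float

    @property
    def wkt(self) -> str:
        return f"POINT ({self.x!r} {self.y!r})"


def geometry_to_wkt(columns, geometry_column="geometry"):
    """Return a copy of `columns` with geometry objects replaced by WKT strings."""
    return {
        name: [g.wkt for g in values] if name == geometry_column else list(values)
        for name, values in columns.items()
    }


# A tiny column-oriented table, standing in for the GeoDataFrame.
table = {
    "name": ["a", "b"],
    "geometry": [FakePoint(1.0, 2.0), FakePoint(3.5, 4.0)],
}
converted = geometry_to_wkt(table)
```

After this step the geometry column holds plain strings, which is the form the workaround above hands to the loader.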
Here is some extra input that might be helpful in case geometry support is added. The solution by @brews worked for a few rows in my case, but I needed to manually specify the schema to get it to parse the geometry column as GEOGRAPHY. This is what I used:
import geopandas as gpd
from google.cloud import bigquery

client = bigquery.Client()
table_id = "dataset.tablename"
df = gpd.read_file("my_file")

# Determine the schema: map each dtype kind to a BigQuery type
type_dict = {
    'b': 'BOOLEAN',
    'i': 'INTEGER',
    'f': 'FLOAT',
    'O': 'STRING',
    'S': 'STRING',
    'U': 'STRING'
}
# dtypes.items() yields (column name, dtype) pairs
schema = [
    {'name': col_name, 'type': "GEOGRAPHY" if col_name == "geometry" else type_dict.get(col_type.kind, 'STRING')}
    for (col_name, col_type) in df.dtypes.items()
]
# https://cloud.google.com/bigquery/docs/pandas-gbq-migration#loading_a_pandas_dataframe_to_a_table
job_config = bigquery.LoadJobConfig(schema=schema)
job = client.load_table_from_dataframe(
    df.to_wkt(),  # same output as the GitHub issue solution above
    table_id,
    job_config=job_config
)
job.result()
This worked for a small dataset, but for a larger set I quickly started running into all kinds of errors where bq would not accept the polygons:
I found a potential solution using GeoJSON instead of WKT: https://stackoverflow.com/questions/62233152/uploading-to-bigquery-gis-invalid-nesting-loop-1-should-not-contain-loop-0 After quite some experimentation, I found a way to create a df that seemed acceptable to bq, similar to the solution above:
df_json = pd.DataFrame({
    col: (df[col] if col != 'geometry' else df[col].map(lambda x: json.dumps(shapely.geometry.mapping(x))))
    for col in df
})
This led to another error similar to the one before:
I then found that bq has an option to fix this type of data using
import geopandas as gpd
from google.cloud import bigquery

client = bigquery.Client()
table_id = "dataset.tablename"
df = gpd.read_file("my_file")
# Load the geometry as WKT strings first
df.to_wkt().to_gbq(table_id, if_exists="replace")
# Then rewrite the table, parsing the WKT column into GEOGRAPHY with make_valid
cols = ",".join("st_geogfromtext(geometry, make_valid => TRUE) as geometry" if col == "geometry" else col for col in df)
query = f"CREATE OR REPLACE TABLE {table_id} AS SELECT {cols} FROM {table_id}"
# print(query)
query_job = client.query(query)
I thought I'd share this solution in case others have similar issues. It would be nice if
EDIT: I learned that shapely also has make_valid; this still didn't work with
import geopandas as gpd
import json
import pandas as pd
import shapely.geometry
from shapely.validation import make_valid
from google.cloud import bigquery

client = bigquery.Client()
table_id = "dataset.tablename"
df = gpd.read_file("my_file")

# Determine the schema: map each dtype kind to a BigQuery type
type_dict = {
    'b': 'BOOLEAN',
    'i': 'INTEGER',
    'f': 'FLOAT',
    'O': 'STRING',
    'S': 'STRING',
    'U': 'STRING'
}
# dtypes.items() yields (column name, dtype) pairs
schema = [
    {'name': col_name, 'type': "GEOGRAPHY" if col_name == "geometry" else type_dict.get(col_type.kind, 'STRING')}
    for (col_name, col_type) in df.dtypes.items()
]
# Repair each geometry with make_valid, then serialize it as a GeoJSON string
df_json = pd.DataFrame({
    col: (df[col] if col != 'geometry' else df[col].map(lambda x: json.dumps(shapely.geometry.mapping(make_valid(x)))))
    for col in df
})
job_config = bigquery.LoadJobConfig(schema=schema)
job = client.load_table_from_dataframe(
    df_json,
    table_id,
    job_config=job_config
)
job.result()
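For anyone who wants to reuse the make_valid step from the middle snippet of this comment, the query construction can be pulled into a small helper. This is only a sketch: `build_make_valid_query` is a hypothetical name, not part of pandas-gbq or the BigQuery client. It assumes the geometry column was already loaded as WKT strings, and that the table and column names are trusted, since they are interpolated directly into the SQL.

```python
def build_make_valid_query(table_id, columns, geometry_column="geometry"):
    """Build a query that rewrites a table, parsing a WKT column into GEOGRAPHY.

    Hypothetical helper: it only builds the SQL string and does not run it.
    """
    select_list = ", ".join(
        f"ST_GEOGFROMTEXT({col}, make_valid => TRUE) AS {col}"
        if col == geometry_column
        else col
        for col in columns
    )
    return f"CREATE OR REPLACE TABLE {table_id} AS SELECT {select_list} FROM {table_id}"


# Example with placeholder names; pass the result to client.query(...) to run it.
query = build_make_valid_query("dataset.tablename", ["id", "name", "geometry"])
```

Keeping the string building separate from `client.query` makes it easy to print and inspect the statement before running it against a real table.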
https://geopandas.org/docs/reference/api/geopandas.GeoDataFrame.html lists to_gbq, but I think it is just inherited from pandas-gbq; are there any plans to support a geometry column here, or should this be in geopandas?
Right now I think the geometry column gets converted to string in the schema.
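To make the schema point concrete, here is a small sketch of the kind-based type mapping that appears in the comments, written without any pandas dependency. It is not how pandas-gbq actually builds its schema; `build_schema` and `KIND_TO_BQ_TYPE` are hypothetical names, and the example assumes the geometry column reports an object-like dtype kind (`'O'`), which is why it falls back to STRING unless it is special-cased.

```python
# Mapping from NumPy dtype kind codes to BigQuery types, as used in the comments.
KIND_TO_BQ_TYPE = {
    "b": "BOOLEAN",
    "i": "INTEGER",
    "f": "FLOAT",
    "O": "STRING",
    "S": "STRING",
    "U": "STRING",
}


def build_schema(column_kinds, geometry_column=None):
    """Build a list of {'name', 'type'} dicts from column name -> dtype kind.

    Hypothetical helper: if `geometry_column` is given, that column is typed
    as GEOGRAPHY; every other column uses the kind-based mapping.
    """
    schema = []
    for name, kind in column_kinds.items():
        if name == geometry_column:
            bq_type = "GEOGRAPHY"
        else:
            bq_type = KIND_TO_BQ_TYPE.get(kind, "STRING")
        schema.append({"name": name, "type": bq_type})
    return schema


# Assumed kinds for illustration: an integer id, a float area, an object geometry.
kinds = {"id": "i", "area": "f", "geometry": "O"}
default_schema = build_schema(kinds)
explicit_schema = build_schema(kinds, geometry_column="geometry")
```

With no special case the geometry column ends up as STRING, matching the behavior described above; naming it explicitly is what produces GEOGRAPHY.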