-
Notifications
You must be signed in to change notification settings - Fork 369
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add encoding support for shapereader #2231
Conversation
There was a previous suggestion to pass kwargs upstream, but explicitly writing encoding as done here looks right to me to ensure compatibility between Basic and Fiona readers. This will also help close other open issues #739 and #1282 related to handling encoding. |
@@ -131,9 +131,9 @@ class BasicReader: | |||
|
|||
""" | |||
|
|||
def __init__(self, filename, bbox=None): | |||
def __init__(self, filename, bbox=None, encoding='utf-8'): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't love that the defaults between the two readers isn't the same here... I think the **kwargs
solution referenced by @lgolston would help by only sending along the encoding when a user explicitly sets it without needing to worry about the default between Fiona and Pyshp.
@@ -186,10 +186,10 @@ class FionaReader: | |||
|
|||
""" | |||
|
|||
def __init__(self, filename, bbox=None): | |||
def __init__(self, filename, bbox=None, encoding=None): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
def __init__(self, filename, bbox=None, encoding=None): | |
def __init__(self, filename, bbox=None, encoding='utf-8'): |
Thinking about the comment from @greglucas, there are a lot of different options since the two libraries do slightly different things (pyshp raises a TypeError if encoding=None is passed; fiona attempts to detect the encoding by default while pyshp does not; and both accept different kwargs other than encoding). I believe the simplest and most compatible change would be to simply make utf-8 the default for both. This also does not seem to lose much functionality for fiona; for example I tried reading a gbk encoded shapefile and while fiona did not raise an error, it also did not actually read it correctly without specifying the encoding.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we should restrict users to only using utf-8 by default. If users had previously installed Fiona and were reading in other encodings, they would have been able to use the data, but if we force the encoding now then that would break them. I guess there is a question of whether we should call one of these libraries our defacto standard and go with that and then manipulate keyword arguments to be compatible with that...
I still think passing **kwargs
through is the most flexible and least amount of work, very similar to what the stock_img pr is doing by forwarding them on to the underlying function: #2230
Rationale
Based on the discussion of #2041,
shapereader
should supportencoding
in order to read shape files with non-default encoding.Currently, for a shape file with GBK encoding
aomen.shp
(containing the polygon for Macau)will produce error
After this PR, the
encoding
can be specifiedand the results are
Implications
None as far as I know