This repository has been archived by the owner on Dec 10, 2018. It is now read-only.

Utf-8-encoded unicode in thrift definition comments causes failure of thriftpy.load in Python 3 #309

Open
aawilson opened this issue Sep 15, 2017 · 0 comments


aawilson commented Sep 15, 2017

To reproduce, save the following as a .thrift file using an editor that preserves the curly quotation marks as they are, rather than converting them to ASCII equivalents. For reference, my test file was saved as UTF-8:

service PingPong {
    /* Ping to the pong with “funky quotes” y'all */
    string ping(),
}

In a Python 3 environment, run the following:

from thriftpy import load

# path_to_thrift is the path to the .thrift file saved above
load(path_to_thrift)

Observe something like this:

UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 66: character maps to <undefined>

(This was on Windows. Other platforms may report a different codec, or may not hit the problem at all.)
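The byte in the error message can be traced back to the closing curly quote. The sketch below is only an illustration and does not call thriftpy. It assumes the Windows default codec in play was cp1252, which the traceback does not state outright ('charmap' is how Python labels several single-byte codecs). The helper name try_decode is made up for this example.

```python
# The sample comment from the .thrift file, encoded the way the file was saved.
comment = u"/* Ping to the pong with \u201cfunky quotes\u201d y'all */"
raw = comment.encode("utf-8")

# U+201D (right double quotation mark) becomes the bytes E2 80 9D in UTF-8.
closing_quote_bytes = u"\u201d".encode("utf-8")


def try_decode(data, encoding):
    """Return the decoded text, or the UnicodeDecodeError that was raised.

    Hypothetical helper for this illustration only.
    """
    try:
        return data.decode(encoding)
    except UnicodeDecodeError as exc:
        return exc


# Assumption: cp1252 stands in for the Windows locale codec.
# cp1252 has no character assigned to byte 0x9d, so decoding fails there.
cp1252_result = try_decode(raw, "cp1252")

# Decoding with the encoding the file was actually saved in succeeds.
utf8_result = try_decode(raw, "utf-8")
```

The opening quote (E2 80 9C) happens to decode under cp1252, just to the wrong characters, which is why the error points at the closing quote rather than the opening one.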

I was able to fix this locally by adding an "encoding" argument to the open call in parser.py, but the built-in open in Python 2.7 and lower does not accept that argument, so it is not a version-agnostic fix. It could also be wrong if the thrift file were saved in some other encoding, since I doubt the spec actually specifies one. My guess is that file handling will have to be rewritten to open files as binary and decode them explicitly, rather than just passing the contents to the lexer. Simply passing mode='rb' wasn't sufficient, so there is more to do.
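One possible version-agnostic way to read the file is io.open, which accepts an encoding argument on Python 2.6+ and is the same function as the built-in open on Python 3. The following is a minimal sketch, not a tested patch against thriftpy: the function name read_thrift_source is hypothetical, the UTF-8 default is an assumption about how the file was saved, and whether thriftpy's lexer handles the resulting unicode text on Python 2 has not been checked here.

```python
import io
import os
import tempfile


def read_thrift_source(path, encoding="utf-8"):
    """Read a .thrift file as text with an explicit encoding.

    Hypothetical helper: the name and the UTF-8 default are assumptions,
    not part of thriftpy.
    """
    # io.open takes `encoding` on both Python 2.6+ and Python 3,
    # unlike the built-in open on Python 2.
    with io.open(path, "r", encoding=encoding) as fh:
        return fh.read()


# Round-trip demo: write a UTF-8 file containing curly quotes, then read it back.
sample = (
    u"service PingPong {\n"
    u"    /* Ping to the pong with \u201cfunky quotes\u201d y'all */\n"
    u"    string ping(),\n"
    u"}\n"
)
fd, tmp_path = tempfile.mkstemp(suffix=".thrift")
os.close(fd)
try:
    # newline="" keeps "\n" as written, regardless of platform.
    with io.open(tmp_path, "w", encoding="utf-8", newline="") as fh:
        fh.write(sample)
    loaded = read_thrift_source(tmp_path)
finally:
    os.remove(tmp_path)
```

Exposing the encoding as a parameter leaves room for files saved in something other than UTF-8, which is the concern raised above.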
