You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Now that get_reader is its own project, it would be useful to explicitly define its scope and goals. We can always redefine these terms in the future but a working definition can help guide development and prevent scope creep.
The initial motivation for get_reader was to provide a common interface for reading Unicode CSV data across different versions of Python. Reading Unicode CSV data is very different in Python 3 than it was in Python 2.
Here's what I'm thinking for this working definition:
Essential Properties
The get_reader project should:
Provide a common interface for reading tabular data across different versions of Python.
Provide simplified interfaces to multiple data sources that might otherwise have unfamiliar APIs (like a simplified version of the IO tools sub-package in pandas except without the overhead of a dependency as large as pandas).
Be easily vendorable by simply copying it into the other project's directory (no hard third-party dependencies and no modifications to get_reader's source code).
Provide broad support for many different versions of Python.
Read data using memory-efficient iteration (unless explicitly directed to do otherwise)--to support reading data from sources that are larger than available memory.
Non-essential Properties
Provide tools for working with reader and reader-like objects (e.g., ReaderLike for type checking).
Adding an Interface
Before adding an interface (e.g., from_sql(), from_excel(), etc.) it is useful to ask the following questions:
PROs:
Does the interface unify differences across multiple version of Python? Bonus points if it unifies differences between Python 2 and 3.
Can the interface reduce the number of objects a user would otherwise need to manage explicitly (automatically closing files or database cursors)?
Does using the interface take less lines of boilerplate code than it would require to read the data directly? How many lines of boilerplate code does it save? Can it do this reliably without introducing ambiguity or unpredictability?
Does the interface simplify reading data from sources that might otherwise have an unfamiliar API (e.g., DBF, Excel)?
CONs:
Does the interface obfuscate a standard or otherwise well-known API?
Would the feature introduce an API or behavior that is inconsistent with existing interfaces?
Would including the feature compromise the get_reader project's status as a light-weight, easy-to-include dependency?
The text was updated successfully, but these errors were encountered:
Now that
get_reader
is its own project, it would be useful to explicitly define its scope and goals. We can always redefine these terms in the future but a working definition can help guide development and prevent scope creep.The initial motivation for
get_reader
was to provide a common interface for reading Unicode CSV data across different versions of Python. Reading Unicode CSV data is very different in Python 3 than it was in Python 2.Here's what I'm thinking for this working definition:
The text was updated successfully, but these errors were encountered: