Project Scope #1

Closed
shawnbrown opened this issue Aug 24, 2019 · 0 comments


shawnbrown commented Aug 24, 2019

Now that get_reader is its own project, it would be useful to explicitly define its scope and goals. We can always redefine these terms in the future, but a working definition can help guide development and prevent scope creep.

The initial motivation for get_reader was to provide a common interface for reading Unicode CSV data across different versions of Python. Reading Unicode CSV data is very different in Python 3 than it was in Python 2.
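To make that difference concrete, here is a minimal sketch of a version-aware helper. It is not get_reader's actual implementation, and the name `open_csv_reader` is hypothetical. It assumes an ASCII-compatible encoding such as UTF-8: in Python 2 the csv module works on byte strings, so each cell is decoded after parsing, while in Python 3 the file is opened in text mode and csv handles text directly.

```python
import csv
import io
import sys


def open_csv_reader(path, encoding='utf-8'):
    """Return (file_object, row_iterator) for a CSV file of Unicode text.

    Hypothetical helper for illustration only; the caller closes the file.
    """
    if sys.version_info[0] >= 3:
        # Python 3: csv works on text; newline='' lets csv handle line endings.
        fh = io.open(path, 'r', encoding=encoding, newline='')
        return fh, csv.reader(fh)
    # Python 2: csv works on bytes, so parse first and decode each cell after.
    fh = open(path, 'rb')
    rows = ([cell.decode(encoding) for cell in row] for row in csv.reader(fh))
    return fh, rows
```

Either branch yields rows as lists of Unicode strings, which is the kind of common interface described above.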

Here's what I'm thinking for this working definition:

Essential Properties

The get_reader project should:

  1. Provide a common interface for reading tabular data across different versions of Python.
  2. Provide simplified interfaces to multiple data sources that might otherwise have unfamiliar APIs (like a simplified version of the IO tools sub-package in pandas except without the overhead of a dependency as large as pandas).
  3. Be easily vendorable by simply copying it into the other project's directory (no hard third-party dependencies and no modifications to get_reader's source code).
  4. Provide broad support for many different versions of Python.
  5. Read data using memory-efficient iteration (unless explicitly directed to do otherwise) so that it can read from sources that are larger than available memory.
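As a rough illustration of the memory-efficient iteration in item 5, the sketch below yields one row at a time instead of loading the whole file. The function name `iter_csv_rows` is hypothetical, and the code targets Python 3 only for brevity; it uses nothing outside the standard library, in keeping with item 3.

```python
import csv
import io


def iter_csv_rows(path, encoding='utf-8'):
    """Yield CSV rows lazily (hypothetical helper, Python 3 only)."""
    # The with-block closes the file once the generator is exhausted or closed.
    with io.open(path, 'r', encoding=encoding, newline='') as fh:
        for row in csv.reader(fh):
            yield row
```

Because this is a generator, only the current row needs to be held in memory, so the size of the source file does not limit what can be read.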

Non-essential Properties

  1. Provide tools for working with reader and reader-like objects (e.g., ReaderLike for type checking).
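One way such a check might look is sketched below. This is an assumption about what "reader-like" could mean (an iterator of rows, as csv.reader objects are), not the definition used by ReaderLike itself. The name `looks_reader_like` is hypothetical, and the sketch targets Python 3.

```python
import csv
import io
from collections.abc import Iterator


def looks_reader_like(obj):
    """Return True if obj is an iterator that is not a file object.

    Assumption for illustration: file objects are excluded because they
    yield lines of text rather than rows of fields.
    """
    if isinstance(obj, io.IOBase):
        return False
    return isinstance(obj, Iterator)
```

A check like this cannot confirm that each item is a sequence of fields without consuming the iterator, so it is only a rough filter.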

Adding an Interface

Before adding an interface (e.g., from_sql() or from_excel()), it is useful to ask the following questions:

PROs:

  • Does the interface unify differences across multiple versions of Python? Bonus points if it unifies differences between Python 2 and 3.
  • Can the interface reduce the number of objects a user would otherwise need to manage explicitly (automatically closing files or database cursors)?
  • Does using the interface take fewer lines of boilerplate code than reading the data directly would require? How many lines of boilerplate code does it save? Can it do this reliably without introducing ambiguity or unpredictability?
  • Does the interface simplify reading data from sources that might otherwise have an unfamiliar API (e.g., DBF, Excel)?
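To show what "fewer objects to manage" could mean in practice, here is a minimal sketch built on the standard library's sqlite3 module. The function name `rows_from_query` is hypothetical and is not meant to describe how from_sql() is or should be implemented.

```python
import sqlite3


def rows_from_query(connection, query):
    """Yield a header row, then data rows, closing the cursor afterwards.

    Hypothetical helper for illustration only.
    """
    cursor = connection.cursor()
    try:
        cursor.execute(query)
        # cursor.description holds one tuple per column; item 0 is the name.
        yield [column[0] for column in cursor.description]
        for row in cursor:
            yield list(row)
    finally:
        # Runs when the generator is exhausted, closed, or raises an error.
        cursor.close()
```

The caller only handles a connection and an iterator of rows; the cursor never appears in the calling code, and rows are still produced one at a time.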

CONs:

  • Does the interface obfuscate a standard or otherwise well-known API?
  • Would the feature introduce an API or behavior that is inconsistent with existing interfaces?
  • Would including the feature compromise the get_reader project's status as a lightweight, easy-to-include dependency?
@shawnbrown shawnbrown pinned this issue Aug 24, 2019
@shawnbrown shawnbrown changed the title Define Project Scope Project Scope Sep 2, 2019