This repository has been archived by the owner on Jul 11, 2023. It is now read-only.

Goals #1

Closed
pwalsh opened this issue Aug 24, 2015 · 2 comments


pwalsh commented Aug 24, 2015

@rgrp @pudo @danfowler

Based on email discussion with Friedrich last month, it would be really useful to implement a generic tabular data reader library.

  • MessyTables implements one
    • Supports CSV, Excel, HTML, OpenOffice
  • GoodTables implements one
    • Supports CSV and Excel; I also did some initial tests with JSON using ijson
  • Extracting a common lib will make reuse easier, make community contributions easier, and give us a shared base for multiple other data wrangling libs (e.g. I could use it right now in jsontableschema)

MessyTables has a cleaner internal API.

GoodTables is simpler in that it just returns raw cell values, and it always returns them as decoded text rather than bytes (unicode on Py2, str on Py3).

Copying from email between Friedrich and myself, and adding a few things, here is a high level overview of how I'd like a standalone lib to look:

  • Works on Python 2/3
  • Whatever the input (any encoding, any supported format), the output will be a csv-like iterable of data encoded as utf-8 text strings
  • Input formats:
    • CSV
    • Excel
    • JSON
    • ODS
    • Google Spreadsheet?
  • Want to work with text data as text streams - following Python 3 API preferences
  • Want to build libs (data processors) that handle such data around a common format: csv-like iterable (arrays of utf-8 encoded text strings) is good for this - this is how GT works internally
    • Probably an option to get rows as arrays or as dicts
  • Takes an argument to cast values on iteration, based on both JSON Table Schema and JSON Schema
    • Obviously some input formats, like JSON, may already be typed. In this case "casting" might do something like raise a MismatchedTypeError
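
To make the overview above more concrete, here is a minimal sketch of what such a reader could look like. Everything in it is an assumption made for illustration: the names `read_rows`, `cast_value` and `TYPES` are hypothetical, the schema handling covers only a small subset of JSON Table Schema (a `fields` list with `name` and `type`), only CSV and JSON input are handled, and it targets Python 3 only, whereas the proposal calls for Python 2/3 support.

```python
import csv
import io
import json


class MismatchedTypeError(ValueError):
    """Hypothetical error: a value does not fit the type its schema declares."""


# Hypothetical mapping: schema type name -> (caster for text, accepted typed values).
TYPES = {
    "string": (str, (str,)),
    "integer": (int, (int,)),
    "number": (float, (int, float)),
}


def cast_value(value, type_name):
    """Cast a text cell, or check an already-typed value against the schema type."""
    caster, accepted = TYPES[type_name]
    if isinstance(value, str) and type_name != "string":
        try:
            return caster(value)
        except ValueError:
            raise MismatchedTypeError("%r is not a valid %s" % (value, type_name))
    # Already-typed input (e.g. from JSON): verify instead of converting.
    if isinstance(value, bool) or not isinstance(value, accepted):
        raise MismatchedTypeError("%r is not a valid %s" % (value, type_name))
    return value


def read_rows(source, fmt="csv", encoding="utf-8", as_dicts=False, schema=None):
    """Yield data rows from the bytes in `source` as lists or dicts."""
    text = source.decode(encoding)
    if fmt == "csv":
        rows = csv.reader(io.StringIO(text, newline=""))
        headers = next(rows)
    elif fmt == "json":
        records = json.loads(text)  # assumed shape: a list of flat objects
        if not records:
            return
        headers = list(records[0])
        rows = ([record.get(h) for h in headers] for record in records)
    else:
        raise ValueError("unsupported format: %s" % fmt)

    types = {}
    if schema is not None:
        types = {field["name"]: field["type"] for field in schema["fields"]}

    for row in rows:
        if schema is None:
            # Without a schema, every cell comes back as text.
            cells = ["" if v is None else str(v) for v in row]
        else:
            cells = [cast_value(v, types[h]) if h in types else v
                     for h, v in zip(headers, row)]
        yield dict(zip(headers, cells)) if as_dicts else cells
```

Called as `read_rows(b"id,name\n1,alice\n")`, the sketch yields each row as a list of text cells; passing `as_dicts=True` keys each row by header, and passing a schema casts text cells or, for already-typed JSON input, raises `MismatchedTypeError` on a mismatch.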

Using existing code as a base, we should be able to get something up pretty easily I think.


pudo commented Aug 24, 2015

I agree on the thrust of this, but would prefer to make it into a major iteration of messytables. Proposing messytables 2: okfn/messytables#142


pwalsh commented Nov 23, 2015

Closing the goals issue now. We're doing an implementation in #2 and we'll see where it goes from there, in terms of MessyTables2/GoodTables integration.

@pwalsh pwalsh closed this as completed Nov 23, 2015