This repository has been archived by the owner on Jul 11, 2023. It is now read-only.
Supports CSV and Excel; I did some initial tests with JSON using ijson
Extracting out a common lib will make reuse easier, make community contributions easier, and give us a shared base for multiple other data wrangling libs (e.g., I could use it right now in jsontableschema)
MessyTables has a cleaner internal API.
GoodTables is simpler in that it just returns raw cell values, and it always returns them as utf-8 encoded text (unicode on Py2, string on Py3).
Copying from email between Friedrich and myself, and adding a few things, here is a high level overview of how I'd like a standalone lib to look:
Works on Python 2/3
Whatever the input (any encoding, any supported format), the output will be a csv-like iterable of data encoded as utf-8 text strings
Input formats:
CSV
Excel
JSON
ODS
Google Spreadsheet?
Want to work with text data as text streams - following Python 3 API preferences
Want to build libs (data processors) that handle such data around a common format: csv-like iterable (arrays of utf-8 encoded text strings) is good for this - this is how GT works internally
Probably an option to get rows as arrays or as dicts
Takes an argument to cast values on iteration, based on both JSON Table Schema and JSON Schema
Obviously some input formats, like JSON, may already be typed. In this case "casting" might do something like raise a MismatchedTypeError
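To make the list above a bit more concrete, here is a minimal sketch of what the row iteration could look like for the CSV case only. It is not taken from MessyTables or GoodTables: `iter_rows`, the `as_dicts` and `types` parameters, the `CASTERS` mapping, and the shape of `MismatchedTypeError` are all hypothetical names chosen for illustration, and the sketch is written for Python 3 only.

```python
import csv
import io


class MismatchedTypeError(ValueError):
    """Hypothetical error: a cell value cannot be cast to the declared type."""


# Hypothetical mapping from a few schema type names to Python casters.
CASTERS = {"string": str, "integer": int, "number": float}


def _cast(value, type_name, line_no):
    """Cast one cell, turning a failed conversion into MismatchedTypeError."""
    try:
        return CASTERS[type_name](value)
    except ValueError as exc:
        raise MismatchedTypeError(
            "row %d: %r is not a valid %s" % (line_no, value, type_name)
        ) from exc


def iter_rows(text_stream, as_dicts=False, types=None):
    """Yield data rows from a CSV text stream, treating the first row as headers.

    Rows are lists of text by default, or dicts keyed by header if `as_dicts`
    is true. `types` is an optional list of type names, one per column; when
    given, each cell is cast on iteration.
    """
    reader = csv.reader(text_stream)
    headers = next(reader, None)
    if headers is None:
        return  # empty input: nothing to yield
    for line_no, row in enumerate(reader, start=2):
        if types is not None:
            row = [_cast(v, t, line_no) for v, t in zip(row, types)]
        yield dict(zip(headers, row)) if as_dicts else row


# Usage: the same input read as raw text rows, then as cast dicts.
raw = list(iter_rows(io.StringIO("id,name\n1,alice\n2,bob\n")))
cast = list(
    iter_rows(
        io.StringIO("id,name\n1,alice\n2,bob\n"),
        as_dicts=True,
        types=["integer", "string"],
    )
)
```

Casting lazily inside the generator keeps the default path (raw text cells) cheap, and a real implementation would presumably derive `types` from a JSON Table Schema rather than take a bare list.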
Using existing code as a base, we should be able to get something up pretty easily I think.
Closing the goals issue now. We're doing an implementation in #2 and we'll see where it goes from there, in terms of MessyTables2/GoodTables integration.
@rgrp @pudo @danfowler
Based on email discussion with Friedrich last month, it would be really useful to implement a generic tabular data reader library.