CSV adapter difficult to use and not very flexible #259

Open
timkpaine opened this issue May 29, 2024 · 1 comment
Labels: adapter: general, good first issue, type: enhancement

Comments

@timkpaine
Member

Discovered during the hackathon: the current CSV adapter is not great. It is difficult to parse multiple columns as datetimes, it essentially presupposes a "symbol column", and it doesn't allow returning something simple, e.g. a dict of the values in the row. Here is a naive alternative built for the hackathon to read Citibike historical CSV data:

```python
import csv as pycsv
from datetime import datetime

from csp import ts
from csp.impl.pulladapter import PullInputAdapter
from csp.impl.wiring import py_pull_adapter_def

# Timestamp format used in the Citibike CSV files
DATETIME_FORMAT = "%Y-%m-%d %H:%M:%S"


class CSVAdapterImpl(PullInputAdapter):
    def __init__(self, filename: str, datetime_columns: list = None):
        if not datetime_columns:
            raise ValueError("Must provide at least one datetime column")
        self._filename = filename
        # The first datetime column supplies the tick time for each row
        self._datetime_columns = datetime_columns
        self._file = None
        self._csv_reader = None
        self._first_row = None
        super().__init__()

    def _parse_row(self, row):
        # Convert every configured datetime column from str to datetime
        for dtc in self._datetime_columns:
            row[dtc] = datetime.strptime(row[dtc], DATETIME_FORMAT)
        return row

    def start(self, starttime, endtime):
        super().start(starttime, endtime)
        self._file = open(self._filename, "r", newline="")
        self._csv_reader = pycsv.DictReader(self._file)

        # Fast-forward to the first record at or after starttime
        for row in self._csv_reader:
            row = self._parse_row(row)
            if row[self._datetime_columns[0]] >= starttime:
                self._first_row = row
                break

    def stop(self):
        # Release the file handle opened in start()
        if self._file is not None:
            self._file.close()
        self._file = None
        self._csv_reader = None

    def next(self):
        # Emit the row buffered by start() before reading any further
        if self._first_row is not None:
            row, self._first_row = self._first_row, None
            return row[self._datetime_columns[0]], row
        try:
            row = self._parse_row(next(self._csv_reader))
        except StopIteration:
            # Returning None tells csp there is no more data
            return None
        return row[self._datetime_columns[0]], row


CSVAdapter = py_pull_adapter_def(
    "CSVAdapter", CSVAdapterImpl, ts[dict], filename=str, datetime_columns=list
)
```
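Since two of the complaints are about datetime handling and the shape of each row, it may help to keep the CSV parsing independent of csp so it can be tested on its own. The following is a minimal sketch and not part of csp: `iter_csv_rows` is a hypothetical helper that takes a per-column strptime format mapping (so columns with different timestamp formats can each be parsed), yields plain `(time, dict)` pairs with no symbol column, and assumes the file is sorted by `time_column`. The column names in the usage example are made up.

```python
import csv
import io
from datetime import datetime
from typing import Dict, Iterable, Iterator, Optional, Tuple


def iter_csv_rows(
    lines: Iterable[str],
    datetime_columns: Dict[str, str],
    time_column: str,
    starttime: Optional[datetime] = None,
    endtime: Optional[datetime] = None,
) -> Iterator[Tuple[datetime, dict]]:
    """Hypothetical helper: yield (time, row) pairs from CSV text.

    Each entry in datetime_columns maps a column name to its own strptime
    format; time_column picks which parsed column is used as the tick time.
    """
    if time_column not in datetime_columns:
        raise ValueError("time_column must be one of the datetime columns")
    for row in csv.DictReader(lines):
        # Parse every configured datetime column with its own format
        for col, fmt in datetime_columns.items():
            row[col] = datetime.strptime(row[col], fmt)
        time = row[time_column]
        if starttime is not None and time < starttime:
            continue  # fast-forward past rows before the window
        if endtime is not None and time > endtime:
            break  # assumes rows are sorted by time_column
        yield time, row


# Usage with made-up column names and data
sample = io.StringIO(
    "started_at,ended_at,station\n"
    "2024-05-01 08:00:00,2024-05-01 08:15:00,A\n"
    "2024-05-01 09:00:00,2024-05-01 09:20:00,B\n"
)
fmt = "%Y-%m-%d %H:%M:%S"
rows = list(
    iter_csv_rows(
        sample,
        {"started_at": fmt, "ended_at": fmt},
        time_column="started_at",
        starttime=datetime(2024, 5, 1, 8, 30),
    )
)
# rows holds a single (time, row) pair, for station "B"
```

Under this design, a pull adapter's `start()` could create the generator and its `next()` could return `next(gen, None)`, which would keep the csp-specific code to a few lines.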
timkpaine added the labels type: enhancement, good first issue, and adapter: general on May 29, 2024
@robambalu
Collaborator

CSVReader should be completely re-implemented with an efficient C++ implementation.
