Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[I/O] Enable using CrateDB-specific DDL statements on ctk load table interface #149

Open
amotl opened this issue May 20, 2024 · 6 comments

Comments

@amotl
Copy link
Member

amotl commented May 20, 2024

About

@wierdvanderhaar suggested to improve the ctk load table interface such that the target table can be created with CrateDB-specific features like partitioned table, and friends. Thanks.

@amotl
Copy link
Member Author

amotl commented Jun 11, 2024

Currently, it is not even possible to create the table manually using specific parameters, because the process is dropping the table and recreating it with standard options, see GH-148. That's probably the worst approach ;].

@amotl
Copy link
Member Author

amotl commented Jun 11, 2024

It may make sense to have a dedicated subcommand aside to ctk load table, in order to conceive a little subsystem specialized for forwarding DDL to the database. On the other hand, maybe let's just use plain SQL DDL and slap it in front of a ctk load table call, using ctk shell or just crash.

@amotl
Copy link
Member Author

amotl commented Jun 23, 2024

On the other hand, maybe let's just use plain SQL DDL and slap it in front of a ctk load table call, using ctk shell or just crash.

While that will give more powers to the end user, in order to better use CrateDB's special features, it would totally deter the conveniency of automatic schema propagation. Maybe we should go the extra mile of being able to supply proprietary SQL DDL parameters to the application?

However, this approach would still need some kind of special patches to pandas' to_sql(), because otherwise, well, it would still generate a standard SQL DDL statement. Another approach would be to capture the standard SQL DDL statement, and relay it to the user to be extended manually. Then, she would feed it back to the ETL job by running it just once.

@amotl
Copy link
Member Author

amotl commented Jun 23, 2024

[...] would still need some kind of special patches to pandas' to_sql(), because otherwise, well, it would still generate a standard SQL DDL statement.

The question is how to bring that together and make it work on behalf of SQLAlchemy Core (not ORM), because in this mode, there just isn't any ORM model where __table_args__ can be slapped onto.

@amotl
Copy link
Member Author

amotl commented Jun 23, 2024

At this spot, in pandas.io.sql.SQLTable._create_table_setup, the call to Table(self.name, meta, *columns, schema=schema) would need to convey additional table arguments to SQLAlchemy's sqlalchemy.sql.schema.Table implementation through **kw, as dialect-specific keyword args.

:param \**kw: Additional keyword arguments not mentioned above are
    dialect specific, and passed in the form ``<dialectname>_<argname>``.
    See the documentation regarding an individual dialect at
    :ref:`dialect_toplevel` for detail on documented arguments.

For example, defining a PARTITIONED BY ("time") clause would work like that:

Table(self.name, meta, *columns, schema=schema, crate_partitioned_by="time")

@amotl
Copy link
Member Author

amotl commented Jun 24, 2024

There are patches now, which aim to improve the situation.

SQLAlchemy Dialect

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant