[FEATURE] Expanding Fugue API #384

Closed
goodwanghan opened this issue Nov 13, 2022 · 0 comments · Fixed by #396

Currently, transform and out_transform are the only two utility functions in Fugue. They are used extensively by users. However, if people want to do other data operations such as load, join and union, they have to use FugueWorkflow (or not use Fugue at all). So we should expand the collection of functions to make more operations scale agnostic and framework agnostic.

Here are the design goals of these functions:

  1. Each function can be used independently and can operate directly on different dataframes with consistent behavior. For example, inner_join can take Spark DataFrames as input and output a Spark DataFrame.
  2. Each function can choose its own ExecutionEngine and, by default, should use the engine in the current context (the concept of a "current context" is yet to be implemented).
  3. These functions should not prevent using framework specific methods between them.
  4. Using only the utility functions for representing a data workflow should make it framework agnostic.

For example:

import fugue.utils as fu

def my_logic(input1, input2):
    df1 = fu.load(input1)
    df2 = fu.load(input2)
    df3 = fu.inner_join(df1, df2)
    return fu.transform(df3, my_func)

# unit test
res = my_logic(pandas_df1, pandas_df2)
assert_pd_df_eq(res, ...)

# using different engines
with make_spark_engine():
    spark_res_df = my_logic(spark_df1, "s3://..parquet")
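
The "current context" in design goal 2 is not implemented yet, so the following is only a minimal sketch of one way it could behave, using Python's standard contextvars module. All names here (engine_context, resolve_engine, the string engine labels) are hypothetical placeholders for illustration and are not part of Fugue's API. The idea shown is that an explicitly passed engine wins, and otherwise a function falls back to whatever engine the enclosing with block has set.

```python
from contextlib import contextmanager
from contextvars import ContextVar
from typing import Iterator, Optional

# Hypothetical sketch only: engines are plain strings here, and none of
# these names exist in Fugue. "native" stands in for a local default engine.
_CURRENT_ENGINE: ContextVar[str] = ContextVar("current_engine", default="native")


@contextmanager
def engine_context(name: str) -> Iterator[str]:
    """Make `name` the default engine inside the `with` block."""
    token = _CURRENT_ENGINE.set(name)
    try:
        yield name
    finally:
        # Restore whatever engine was current before entering the block.
        _CURRENT_ENGINE.reset(token)


def resolve_engine(engine: Optional[str] = None) -> str:
    """An explicit engine wins; otherwise use the engine in the current context."""
    return engine if engine is not None else _CURRENT_ENGINE.get()


print(resolve_engine())            # native
with engine_context("spark"):
    print(resolve_engine())        # spark
    print(resolve_engine("dask"))  # dask
print(resolve_engine())            # native
```

Under this assumption, each utility function would call something like resolve_engine at its start, so a function such as my_logic above stays free of any engine-specific code while the caller decides where it runs.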
@goodwanghan goodwanghan linked a pull request Dec 18, 2022 that will close this issue
@goodwanghan goodwanghan changed the title [FEATURE] Expanding utility functions [FEATURE] Expanding Fugue API Dec 28, 2022