Currently, `transform` and `out_transform` are the only two utility functions in Fugue, and they are used extensively. However, if users want to do other data operations such as load, join, and union, they have to use `FugueWorkflow` (or not use Fugue at all). So we should expand the collection of utility functions to make more operations scale agnostic and framework agnostic.
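For context, this is roughly how the existing `transform` utility is used today. A minimal sketch based on Fugue's documented API; the function body and schema string here are illustrative:

```python
import pandas as pd
from fugue import transform

def add_one(df: pd.DataFrame) -> pd.DataFrame:
    # plain pandas logic; Fugue runs it on whichever engine is chosen
    return df.assign(b=df["a"] + 1)

pdf = pd.DataFrame({"a": [1, 2, 3]})

# runs locally on pandas and returns a pandas DataFrame
res = transform(pdf, add_one, schema="*,b:long")

# the same call becomes distributed by passing an engine, e.g. engine="spark"
```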
Here are the design goals of these functions:
- Each function can be used independently and can operate directly on different dataframes with consistent behavior. For example, `inner_join` can take Spark DataFrames as input and output a Spark DataFrame.
- Each function can choose its own `ExecutionEngine`; by default, it should use the engine in the current context (the concept of "current context" is to be implemented; see the sketch after the example below).
- These functions should not prevent using framework-specific methods between them.
- Using only the utility functions to represent a data workflow should make it framework agnostic.
For example:
```python
import fugue.utils as fu

def my_logic(input1, input2):
    df1 = fu.load(input1)
    df2 = fu.load(input2)
    df3 = fu.inner_join(df1, df2)
    return fu.transform(df3, my_func)

# unit test
res = my_logic(pandas_df1, pandas_df2)
assert_pd_df_eq(res, ...)

# using different engines
with make_spark_engine():
    spark_res_df = my_logic(spark_df1, "s3://..parquet")
```
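To make the engine-selection and interoperability goals concrete, here is a hypothetical sketch. The `fugue.utils` module, the `engine` keyword, and the function signatures are proposed names from this issue, not an existing API:

```python
import fugue.utils as fu  # proposed module, not yet implemented

# design goal 2: hypothetical per-call engine selection
df1 = fu.load("s3://..parquet", engine="spark")
df2 = fu.load("s3://..parquet", engine="spark")

# design goal 3: framework-specific methods can be mixed in between,
# because each utility returns the native dataframe of the engine in use
df1 = df1.filter(df1["a"] > 0)  # plain PySpark on the returned DataFrame

# stays a Spark DataFrame, with behavior consistent across engines
df3 = fu.inner_join(df1, df2)
```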