[Dependencies] List fugue as optional dependency #631

sankarsiva123 · 2023-09-07T07:16:11Z

Description

I'm an actively using this package, I've noticed that the current list of dependencies includes the libraries Plotly and Fugue. While these libraries are undoubtedly powerful and useful, I believe it would be beneficial to make them optional dependencies rather than mandatory ones.

Use case

No response

jmoralez · 2023-09-07T17:20:56Z

Hey @sankarsiva123, thanks for using statsforecast. plotly was moved to be an optional dependency in #600, so I'll rename this to reference fugue instead.

I believe this issue is similar to #596 and we're working on reducing the total dependencies' size, hopefully it will be done for the next release.

sankara-nps · 2023-09-08T13:27:35Z

@jmoralez Thanks for the reply,
Even matplotlib made as optional dependency would nice .

jmoralez · 2023-09-08T15:37:17Z

Yes, we're going to make the package as lightweight as possible. We've moved the plotting to utilsforecast, which will be a dependency for this library but that only depends on numpy and pandas (matplotlib, polars, etc will be extras).

vspinu · 2023-09-14T12:34:01Z

We are no longer able to use statsforecast v1.5 on our spark cluster due to the dependency conflicts of the fugue dependencies. They require antlr4 higher than that what spark on java side is using. And that interferes with our cluster setup.

Given that fugue is still a pretty young project and doesn't seem to be required for the core functionality of statsforecast, adding it as a hard dependency seems a bit premature. Could we hope for a relatively quick "fix" on this one? Many thanks!!!

jmoralez · 2023-09-14T23:51:12Z

We're working to keep only the minimum required dependencies in the next release. @kvnkho do you have any suggestions for @vspinu's problem?

kvnkho · 2023-09-15T01:19:56Z

Hi @vspinu , it would require quite a bit of refactoring and would actually be a breaking change for many users as it breaks the core interface (will explain). The best we can do for you is lower the dependency on fugue-sql-antlr, which we'll do sometime. I talked to @goodwanghan . The rest of this comment is just more detail.

The reason Fugue is a core dependency is because it allows this syntax:

sf = StatsForecast(
    models=[AutoETS(season_length=7)],
    freq='D',
)
sf.forecast(df=df, h=horizon)

to work off the shelf with any DataFrame (Spark, Dask, Ray, Pandas). To support each of those without Fugue, there would have to be significant forking in the codebase. Think if-else and try statements and imports in random places. Fugue consolidates that. Note that removing Fugue would be the breaking change as a lot of people are already using this syntax on top of Spark specifically.

A bit of history, Fugue was made into an optional dependency at first. In order to parallelize Statsforecast, users would use code like this:

from statsforecast.distributed.utils import forecast
from statsforecast.distributed.fugue import FugueBackend

spark = SparkSession.builder.getOrCreate()
backend = FugueBackend(spark, {"fugue.spark.use_pandas_udf":True})

forecast(df, 
         [AutoARIMA()], 
         freq="D", 
         h=7, 
         parallel=backend).toPandas()

This introduced a lot more code for end users to write, and then other users did not want to learn about Fugue. They also could not use the same base StatsForecast class, instead having to call this utility function. To consolidate public-facing classes and keep the same interface for distributed execution and local execution, then Fugue needed to become a core dependency to make the experience invisible and seamless.

On size, Fugue is around 10MB so it's pretty trivial for installation.

Fugue utility functions are also used in the core code right now for things like identifying the input DataFrame type, conditional importing, and configuring the parallel backends. While not impossible to fork and create Fugue-specific code paths, it decreases code maintainability and quality and I am not sure it's worth it here.

In the Fugue issue, you mentioned you're aware it's a corner case because you guys actually use ANTLR on Python, which is quite uncommon. Given there are already a lot of statsforecast users using Spark with the base StatsForecast class, it's hard to do what would be a breaking change to support an edge case.

Thus, the best resolution for this is to lower the dependency on ANTLR.

goodwanghan · 2023-09-15T01:56:59Z

@vspinu @jmoralez

I just want to reiterate some of the facts:

Fugue is the core part of statsforecast to make the lib run seamlessly on different distributed environment
Antler dependency is planned to be removed from Fugue's core dependency on 0.9.0
This is an edge case only when you want consistency between Spark's Java legacy antlr requirement and Fugue's python requirement. Fugue + Spark is a very common combination, no other similar complaints so far.

Here is the resolution: fugue-project/fugue-sql-antlr#19

We will temporarily remove the lower constraint of antlr, and make this requirement into runtime. So as long as you don't use Fugue SQL, you have more flexibility on antlr version.

In fugue>=0.9.0, the core will only depend on pandas and arrow.

goodwanghan · 2023-09-15T17:17:02Z

@sankarsiva123 @jmoralez In a few weeks, we will release fugue 0.9.0 whose core dependency (out of the fugue-project) is only pandas and pyarrow. Fugue integration is supposed to handle all dataframes other than pandas in a seamless and non-invasive way, with Fugue, statsforecast will expand the list of supported dataframes without any code change, that is why it is in the core dependency. In addition, the size of fugue project packages are very small and totally pythonic.

mergenthaler · 2023-09-15T17:23:49Z

Thanks, @goodwanghan and @kvnkho. I also believe that Fugue is an essential aspect of this library and the Nixtlaverse. I think the 0.9.0 Fugue release will fully address the issues raised by the other contributors and make the dependencies of statsforecast even easier.

vspinu · 2023-09-15T20:18:35Z

Thanks @goodwanghan and @kvnkho for your impressive efforts and extensive explanation! Looking forward to 0.9.0.

goodwanghan · 2023-09-15T21:08:44Z

Thanks @goodwanghan and @kvnkho for your impressive efforts and extensive explanation! Looking forward to 0.9.0.

@vspinu we have released the new version of fugue-sql-antlr yesterday, if you upgrade it to 0.1.7 the conflict should have been resolved, can you try?

vspinu · 2023-09-16T09:50:39Z

@goodwanghan It takes a few days for packages to propagate to our internal mirrors for security reasons. I will report back to the original issue once I verify it. I don't see why it shouldn't work though.

goodwanghan · 2023-10-05T07:27:10Z

@vspinu have you verified whether the new package resolves the conflicts?

vspinu · 2023-10-18T10:10:22Z

Sorry for the late reply. It takes 2 weeks for us to propagate the package and then I got derailed to another project. The issue with antlr dependency still persists. I have provided more detail in the original.

sankarsiva123 added enhancement feature labels Sep 7, 2023

jmoralez changed the title ~~Would be nice if we were able to install Satsforecast pakage with plotly as optional dependency~~ [Dependencies] List fugue as optional dependency Sep 7, 2023

jmoralez added the dependencies label Sep 8, 2023

goodwanghan mentioned this issue Sep 15, 2023

Remove lower constraint of antlr from package requirement fugue-project/fugue-sql-antlr#19

Merged

jmoralez mentioned this issue Sep 20, 2023

[Models] Return and store models parameters during forecast and CV #639

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[Dependencies] List fugue as optional dependency #631

[Dependencies] List fugue as optional dependency #631

sankarsiva123 commented Sep 7, 2023

jmoralez commented Sep 7, 2023

sankara-nps commented Sep 8, 2023

jmoralez commented Sep 8, 2023

vspinu commented Sep 14, 2023 •

edited

Loading

jmoralez commented Sep 14, 2023

kvnkho commented Sep 15, 2023 •

edited

Loading

goodwanghan commented Sep 15, 2023 •

edited

Loading

goodwanghan commented Sep 15, 2023

mergenthaler commented Sep 15, 2023

vspinu commented Sep 15, 2023

goodwanghan commented Sep 15, 2023

vspinu commented Sep 16, 2023

goodwanghan commented Oct 5, 2023

vspinu commented Oct 18, 2023

[Dependencies] List fugue as optional dependency #631

[Dependencies] List fugue as optional dependency #631

Comments

sankarsiva123 commented Sep 7, 2023

Description

Use case

jmoralez commented Sep 7, 2023

sankara-nps commented Sep 8, 2023

jmoralez commented Sep 8, 2023

vspinu commented Sep 14, 2023 • edited Loading

jmoralez commented Sep 14, 2023

kvnkho commented Sep 15, 2023 • edited Loading

goodwanghan commented Sep 15, 2023 • edited Loading

goodwanghan commented Sep 15, 2023

mergenthaler commented Sep 15, 2023

vspinu commented Sep 15, 2023

goodwanghan commented Sep 15, 2023

vspinu commented Sep 16, 2023

goodwanghan commented Oct 5, 2023

vspinu commented Oct 18, 2023

vspinu commented Sep 14, 2023 •

edited

Loading

kvnkho commented Sep 15, 2023 •

edited

Loading

goodwanghan commented Sep 15, 2023 •

edited

Loading