-
Notifications
You must be signed in to change notification settings - Fork 293
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Dependencies] List fugue as optional dependency #631
Comments
Hey @sankarsiva123, thanks for using statsforecast. plotly was moved to be an optional dependency in #600, so I'll rename this to reference fugue instead. I believe this issue is similar to #596 and we're working on reducing the total dependencies' size, hopefully it will be done for the next release. |
@jmoralez Thanks for the reply, |
Yes, we're going to make the package as lightweight as possible. We've moved the plotting to utilsforecast, which will be a dependency for this library but that only depends on numpy and pandas (matplotlib, polars, etc will be extras). |
We are no longer able to use statsforecast v1.5 on our spark cluster due to the dependency conflicts of the fugue dependencies. They require antlr4 higher than that what spark on java side is using. And that interferes with our cluster setup. Given that fugue is still a pretty young project and doesn't seem to be required for the core functionality of statsforecast, adding it as a hard dependency seems a bit premature. Could we hope for a relatively quick "fix" on this one? Many thanks!!! |
Hi @vspinu , it would require quite a bit of refactoring and would actually be a breaking change for many users as it breaks the core interface (will explain). The best we can do for you is lower the dependency on The reason Fugue is a core dependency is because it allows this syntax: sf = StatsForecast(
models=[AutoETS(season_length=7)],
freq='D',
)
sf.forecast(df=df, h=horizon) to work off the shelf with any DataFrame (Spark, Dask, Ray, Pandas). To support each of those without Fugue, there would have to be significant forking in the codebase. Think if-else and try statements and imports in random places. Fugue consolidates that. Note that removing Fugue would be the breaking change as a lot of people are already using this syntax on top of Spark specifically. A bit of history, Fugue was made into an optional dependency at first. In order to parallelize Statsforecast, users would use code like this: from statsforecast.distributed.utils import forecast
from statsforecast.distributed.fugue import FugueBackend
spark = SparkSession.builder.getOrCreate()
backend = FugueBackend(spark, {"fugue.spark.use_pandas_udf":True})
forecast(df,
[AutoARIMA()],
freq="D",
h=7,
parallel=backend).toPandas() This introduced a lot more code for end users to write, and then other users did not want to learn about Fugue. They also could not use the same base StatsForecast class, instead having to call this utility function. To consolidate public-facing classes and keep the same interface for distributed execution and local execution, then Fugue needed to become a core dependency to make the experience invisible and seamless. On size, Fugue is around 10MB so it's pretty trivial for installation. Fugue utility functions are also used in the core code right now for things like identifying the input DataFrame type, conditional importing, and configuring the parallel backends. While not impossible to fork and create Fugue-specific code paths, it decreases code maintainability and quality and I am not sure it's worth it here. In the Fugue issue, you mentioned you're aware it's a corner case because you guys actually use ANTLR on Python, which is quite uncommon. Given there are already a lot of Thus, the best resolution for this is to lower the dependency on ANTLR. |
I just want to reiterate some of the facts:
Here is the resolution: fugue-project/fugue-sql-antlr#19 We will temporarily remove the lower constraint of antlr, and make this requirement into runtime. So as long as you don't use Fugue SQL, you have more flexibility on antlr version. In fugue>=0.9.0, the core will only depend on pandas and arrow. |
@sankarsiva123 @jmoralez In a few weeks, we will release fugue 0.9.0 whose core dependency (out of the fugue-project) is only pandas and pyarrow. Fugue integration is supposed to handle all dataframes other than pandas in a seamless and non-invasive way, with Fugue, statsforecast will expand the list of supported dataframes without any code change, that is why it is in the core dependency. In addition, the size of fugue project packages are very small and totally pythonic. |
Thanks, @goodwanghan and @kvnkho. I also believe that Fugue is an essential aspect of this library and the Nixtlaverse. I think the 0.9.0 Fugue release will fully address the issues raised by the other contributors and make the dependencies of statsforecast even easier. |
Thanks @goodwanghan and @kvnkho for your impressive efforts and extensive explanation! Looking forward to 0.9.0. |
@vspinu we have released the new version of fugue-sql-antlr yesterday, if you upgrade it to 0.1.7 the conflict should have been resolved, can you try? |
@goodwanghan It takes a few days for packages to propagate to our internal mirrors for security reasons. I will report back to the original issue once I verify it. I don't see why it shouldn't work though. |
@vspinu have you verified whether the new package resolves the conflicts? |
Sorry for the late reply. It takes 2 weeks for us to propagate the package and then I got derailed to another project. The issue with antlr dependency still persists. I have provided more detail in the original. |
Description
I'm an actively using this package, I've noticed that the current list of dependencies includes the libraries Plotly and Fugue. While these libraries are undoubtedly powerful and useful, I believe it would be beneficial to make them optional dependencies rather than mandatory ones.
Use case
No response
The text was updated successfully, but these errors were encountered: