
Do you notice the phenomenon of memory leak in the code of FSNet #3

Open
LiuYasuo opened this issue Mar 25, 2024 · 3 comments

@LiuYasuo

LiuYasuo commented Mar 25, 2024

I have investigated the CPU memory usage of their code thoroughly. It is evident that available CPU memory decreases gradually while the program is running. Fortunately, their process does not get killed, because the datasets in their experiments are small.
However, I tested their code on a dataset with 100,000 entries, and the process was killed about one tenth of the way through the test because of the memory leak.
Long data series are not uncommon in real-world online learning, and online learning is geared towards real-world applications, so we should take this issue seriously.
I have asked the author of FSNet but got no reply. Considering the in-depth research you have conducted on online time series learning, I hope you could discuss this problem with me.
Looking forward to your reply.
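In case it helps the discussion: one common cause of this kind of gradual CPU-memory growth in online-learning loops (my own hypothesis, not confirmed against the FSNet code) is retaining a reference to every per-step object, e.g. appending the full PyTorch `loss` tensor, which also keeps its autograd graph alive, instead of `loss.item()`. A minimal framework-free sketch of the pattern, using only the standard library's `tracemalloc`:

```python
import tracemalloc

def leaky_step(history, x):
    # Anti-pattern: keep a large per-step object alive forever.
    # (The PyTorch analogue is appending the full `loss` tensor,
    # which also retains its autograd graph.)
    history.append([x] * 1000)

def fixed_step(history, x):
    # Fix: store only the scalar summary you actually need
    # (analogous to `loss.item()` in PyTorch).
    history.append(float(x))

def bytes_held_after(step_fn, n_steps=500):
    """Run an online-learning-style loop and report live bytes at the end."""
    tracemalloc.start()
    history = []
    for i in range(n_steps):
        step_fn(history, i)
    current, _peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return current

leaky = bytes_held_after(leaky_step)
fixed = bytes_held_after(fixed_step)
print(f"leaky loop holds {leaky:,} B; fixed loop holds {fixed:,} B")
```

If FSNet's memory footprint grows linearly with the number of processed steps rather than with the batch size, a pattern like the "leaky" loop above would be a plausible suspect.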

@yfzhang114
Owner

Regarding your concern about the dataset size and memory management, we acknowledge that larger datasets can pose significant challenges in terms of memory utilization. While we have successfully tested our code on datasets with high numbers of time steps and channels, such as the ECL dataset with over 100k time steps and 300+ channels, we recognize that each dataset may have unique characteristics that could affect memory usage differently.

In response to your query about the specific details of the dataset you used, such as the number of channels and the meaning of "100,000 entries," we would appreciate more information to better understand the context of your testing. This will allow us to provide more targeted suggestions for optimizing memory usage and addressing potential memory leaks.

Here is some advice for processing large datasets and avoiding CPU memory issues:

  1. Batch Processing: Instead of loading the entire dataset into memory at once, process the data in smaller batches. This helps in reducing memory usage and prevents overwhelming the CPU.
  2. Data Compression: If applicable, consider compressing the dataset before loading it into memory. This can significantly reduce memory usage while still allowing for efficient processing.
  3. Data Preprocessing: Prior to loading the data into memory, perform preprocessing steps such as feature selection, normalization, or downsampling to reduce the size of the dataset without losing important information.
  4. Resource Monitoring: Continuously monitor CPU and memory usage during data processing to identify any abnormal spikes or patterns. This can help in detecting memory issues early on and taking appropriate measures to address them.

Note that point 4 is important: we need to know where the error occurs.
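To make point 4 (resource monitoring) concrete, here is a minimal sketch of tracking the process's resident set size from inside Python. This is my own illustration, not code from the FSNet repo; it uses only the standard library's `resource` module, which is Unix-only:

```python
import resource
import sys

def peak_rss_kib():
    """Peak resident set size of this process, normalised to KiB.

    `ru_maxrss` is reported in KiB on Linux but in bytes on macOS.
    (`resource` is Unix-only; on Windows use e.g. psutil instead.)
    """
    rss = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    return rss // 1024 if sys.platform == "darwin" else rss

baseline = peak_rss_kib()
buffers = []  # stands in for whatever the training loop accidentally retains
for step in range(50):
    buffers.append(bytearray(1 << 20))  # simulate leaking ~1 MiB per step
    if step % 10 == 0:
        print(f"step {step:2d}: peak RSS +{peak_rss_kib() - baseline} KiB over baseline")
```

Logging a line like this every N training steps makes it easy to see whether memory grows with the number of steps (a leak) or stays flat after warm-up (normal behaviour).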

@LiuYasuo
Author

Thanks for your reply. I realize I hadn't made my point clear: "100,000 entries" means our dataset has over 100k time steps. What's more, in industrial time series forecasting scenarios we need to make multi-step iterative predictions, which greatly magnifies the problem described above, and the process is eventually killed. If you have the time, I hope you could run the original FSNet code again and use the `top` command to check the memory usage of the process while it is running. You will then see that the memory occupied by the process keeps increasing.
Looking forward to your early reply.
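For what it's worth, the `top` check can also be scripted so the growth is logged over time rather than eyeballed. A small sketch (the script and its defaults are my own illustration, not part of FSNet):

```shell
#!/bin/sh
# Log the resident set size (RSS, in KiB) of a process at a fixed interval.
# The PID defaults to the current shell only so the script runs standalone;
# in practice, pass the PID of the FSNet training process.
pid="${1:-$$}"
interval="${2:-1}"

for sample in 1 2 3; do
    rss=$(ps -o rss= -p "$pid" | tr -d ' ')
    echo "pid ${pid} sample ${sample}: rss=${rss} KiB"
    sleep "$interval"
done
```

Redirecting the output to a file gives a time series of RSS values that shows directly whether the process's memory grows without bound.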

@yfzhang114
Owner

That seems unlikely: the ECL dataset also contains more than 100,000 time steps and a large number of channels, and the model works well on it.
