Performance difference between write_async and write_dataframe_async #953

Open
echangcl opened this issue Aug 4, 2023 · 5 comments

@echangcl

echangcl commented Aug 4, 2023

Describe what you tried to do with TM1py
I compared the performance of TI (TurboIntegrator) process execution, write_dataframe, write_async, and write_dataframe_async in TM1py, using a large source file with more than 25 million records.
write_async turned out to be much faster than write_dataframe_async. Is this expected?
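A comparison like this is easiest to repeat when every method is timed by the same harness. The sketch below is one way to do that with the standard library only; `benchmark` is a hypothetical helper name, not part of TM1py, and it reports the best wall-clock time over `repeats` runs of each zero-argument callable.

```python
import time
from typing import Any, Callable, Dict


def benchmark(candidates: Dict[str, Callable[[], Any]], repeats: int = 1) -> Dict[str, float]:
    """Time each zero-argument callable and return its best wall-clock time in seconds."""
    if repeats < 1:
        raise ValueError("repeats must be at least 1")
    best: Dict[str, float] = {}
    for label, run in candidates.items():
        timings = []
        for _ in range(repeats):
            start = time.perf_counter()
            run()  # the callable performs all of the work being measured
            timings.append(time.perf_counter() - start)
        best[label] = min(timings)
    return best
```

With a connected `TM1Service` (here assumed to be named `tm1`) and hypothetical `cube_name` and `df` variables, the candidates could be lambdas wrapping `tm1.cells.write_async(...)` and `tm1.cells.write_dataframe_async(cube_name, df)`. For a like-for-like comparison, the `write_async` lambda should include the DataFrame-to-dict conversion, and both calls should use the same worker-count and slice-size arguments (check the exact parameter names in the installed TM1py version).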

Describe what's not working the way you expect
The only difference between the write_async and write_dataframe_async tests is:

  • Before calling write_async, the DataFrame is converted to a dictionary, which takes about 1 minute.
    So I expected write_dataframe_async to be faster than write_async, but it is not.
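For reference, the DataFrame-to-dictionary step could look like the sketch below. It assumes the dict format TM1py's `write_async` accepts, with keys being tuples of element names (one per cube dimension) and values being cell values, and it assumes a row layout where all columns except the last are coordinates and the last is the value. `rows_to_cellset` is a hypothetical helper name.

```python
from typing import Any, Dict, Iterable, Sequence, Tuple


def rows_to_cellset(rows: Iterable[Sequence[Any]]) -> Dict[Tuple[Any, ...], Any]:
    """Build a {coordinates: value} dict from rows shaped (element, ..., element, value)."""
    cells: Dict[Tuple[Any, ...], Any] = {}
    for row in rows:
        *coordinates, value = row
        # A later row with the same coordinates overwrites the earlier one (no summing).
        cells[tuple(coordinates)] = value
    return cells
```

With pandas, the rows can come from `df.itertuples(index=False, name=None)`, which yields plain tuples. The dict-based path pays this conversion up front; whether `write_dataframe_async` avoids that cost or performs a similar conversion per slice is worth confirming in the TM1py source for the version in use.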

Version

  • TM1py : 1.11.3
  • TM1 Server Version: 11.5.00000.23
@MariusWirtz
Collaborator

MariusWirtz commented Aug 4, 2023

I can't reproduce this behaviour with a dataset of 4 million or 8 million records.
Can you please check whether the system is running low on memory while the write_dataframe_async function runs?
It may be less memory-efficient than the write_async function.
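One way to put a number on the Python side of that memory check is the standard library's `tracemalloc`. This is a sketch, not a TM1py feature: `peak_traced_memory` is a hypothetical helper, it only counts allocations routed through Python's allocator, it measures the client process rather than the TM1 server (server memory still has to be checked on the server), and tracing slows execution, so timings should come from a separate untraced run.

```python
import tracemalloc
from typing import Any, Callable, Tuple


def peak_traced_memory(fn: Callable[..., Any], *args: Any, **kwargs: Any) -> Tuple[Any, int]:
    """Call fn(*args, **kwargs) and return (result, peak bytes traced by tracemalloc).

    Assumes tracemalloc is not already running; it is stopped again afterwards.
    """
    tracemalloc.start()
    try:
        result = fn(*args, **kwargs)
        # get_traced_memory() returns (current, peak) since tracing started.
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return result, peak
```

Used with hypothetical names, a call such as `peak_traced_memory(tm1.cells.write_dataframe_async, cube_name, df)` would report the peak for the dataframe variant, and the same call with `write_async` and the prepared dict would give the comparison figure. OS tools such as Task Manager or `top` remain the reference for total process memory.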

@echangcl
Author

echangcl commented Aug 7, 2023

I'm running TM1 on a VM, with PyCharm on the host.
I tested again on another VM that had sufficient resources (CPU / memory / disk space) while TM1py was running.
Peak memory usage stayed below 45%, and more than 10 GB (25%) of disk space remained free.
The result is still the same: write_dataframe_async is slower than write_async.

@MariusWirtz
Collaborator

Hi @echangcl,
Thanks for the analysis. I will try to reproduce it.

@MariusWirtz
Collaborator

@echangcl
Can you please rerun the analysis once with pandas 2 and once with pandas < 2?

pip install pandas==1.5.3

pip install pandas --upgrade
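When switching pandas versions between runs, recording the installed versions next to each timing helps keep the results apart. A minimal sketch using `importlib.metadata` (Python 3.8+); `installed_versions` is a hypothetical helper name:

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, Iterable, Optional


def installed_versions(distributions: Iterable[str]) -> Dict[str, Optional[str]]:
    """Return the installed version of each distribution, or None if it is not installed."""
    found: Dict[str, Optional[str]] = {}
    for name in distributions:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found
```

Printing `installed_versions(["pandas", "TM1py"])` at the start of each benchmark run shows which pandas build produced which numbers.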

@echangcl
Author

I tried both pandas 1.5.3 and 2.0.3 and got the same result: write_dataframe_async is slower than write_async.
