Documentation For Data Format #25

Open
zacharycbrown opened this issue Mar 21, 2022 · 2 comments

zacharycbrown commented Mar 21, 2022

Dear Dr. Patrick Gelß,

Thank you for making this repository available!

How might one apply the tgEDMD algorithm to datasets with multiple simulations?

More specifically, I have a dataset composed of S simulations of multi-dimensional time series, each of shape [d,m] (if I understand this repository's naming convention correctly); that is, my dataset has shape (S, d, m). From what I've been able to find in this repository, the amuset_hosvd method requires input data of shape (d,m). Does this mean I need to either flatten my dataset, folding the S dimension into one of the others, or run amuset_hosvd once for each simulation?

I am hoping to use the tgEDMD algorithm to simultaneously model the information gleaned across all simulations if at all possible, so any guidance towards that end would be greatly appreciated!

Thank you!

zacharycbrown (Author) commented

  • Just to clarify: I looked at how the ala10 data is processed and noticed that the example script concatenates about 6 different simulation files. Were those independent simulations, or sequential windows of the same simulation recording?

PGelss (Owner) commented Mar 22, 2022

Dear Zachary,

Thanks for your interest in the Scikit-TT toolbox.

You are absolutely correct. If you want to apply tgEDMD to all snapshots at once, you first have to reshape your data tensor. Suppose X is your dataset of shape (S,d,m), where S is the number of simulations, d the dimension of the state space, and m the number of snapshots per simulation. Then you can use X.transpose([1, 0, 2]).reshape([d, S*m]) as input for amuset_hosvd.
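To make the reshape concrete, here is a minimal NumPy sketch of the expression above. The helper name stack_simulations and the sizes used in the demo are assumptions chosen for illustration and are not part of Scikit-TT; the call to amuset_hosvd is left out because its remaining arguments are not discussed in this thread.

```python
import numpy as np


def stack_simulations(X):
    """Fold the simulation axis of an (S, d, m) array into the snapshot axis.

    Hypothetical helper (not part of Scikit-TT): returns a (d, S*m) matrix in
    which the snapshots of simulation s occupy columns s*m to (s+1)*m - 1.
    """
    S, d, m = X.shape
    # Move the state dimension to the front, then merge (S, m) into one axis.
    return X.transpose([1, 0, 2]).reshape([d, S * m])


# Small demo with made-up sizes: 3 simulations, 4 state dimensions, 5 snapshots.
S, d, m = 3, 4, 5
rng = np.random.default_rng(0)
X = rng.standard_normal((S, d, m))

data_matrix = stack_simulations(X)

# Each simulation ends up as one contiguous block of m columns.
for s in range(S):
    assert np.array_equal(data_matrix[:, s * m:(s + 1) * m], X[s])

# The result is the same as concatenating the S matrices side by side.
assert np.array_equal(data_matrix, np.hstack(list(X)))
```

The resulting data_matrix has shape (d, S*m) and would be what is passed to amuset_hosvd in place of a single-simulation data matrix.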

The ala10 dataset consists of 6 independent simulations, which can simply be concatenated because we do not consider any correlations between different time steps when applying tgEDMD.
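Since the snapshots are treated independently of their time ordering, the same idea should carry over to simulations stored as separate arrays. The sketch below is an illustration only: the helper name concatenate_trajectories is hypothetical, and allowing a different number of snapshots per simulation is an assumption that goes beyond what is stated in this thread.

```python
import numpy as np


def concatenate_trajectories(trajectories):
    """Concatenate a list of (d, m_s) arrays along the snapshot axis.

    Hypothetical helper (not part of Scikit-TT). All trajectories must share
    the same state dimension d; the snapshot counts m_s may differ.
    """
    d = trajectories[0].shape[0]
    for traj in trajectories:
        if traj.ndim != 2 or traj.shape[0] != d:
            raise ValueError("every trajectory must have shape (d, m_s)")
    # Columns of the result are the snapshots of all simulations, in order.
    return np.concatenate(trajectories, axis=1)


# Made-up example: two simulations with 2 state dimensions each.
traj_a = np.zeros((2, 3))
traj_b = np.ones((2, 5))
combined = concatenate_trajectories([traj_a, traj_b])
```

Here combined has 3 + 5 = 8 columns, with the snapshots of the first simulation followed by those of the second.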

I hope this helps. Let me know if you have further questions.

Best,
Patrick
