
Performance Benchmarking and need for optimization for large input dataset. #15

Open
evermanisha opened this issue Aug 25, 2015 · 4 comments


@evermanisha

Have been using https://github.com/decebals/wicket-pivot for a year now.

We ran into a scenario where processing 27,000+ records consumes close to 90% of the JVM's memory during execution and takes about 3 to 4 minutes for the entire computation, in the methods below:

  1. calculate() in DefaultPivotModel.java
  2. create() in PivotTableRenderModel.java

Is there any way to optimize the code to enhance the overall performance?
A link to the pivot table produced from the large dataset is below for reference:
https://drive.google.com/open?id=0BxpTzw5qlCqbVGFOUmZmNkNta1E

Also, please share any benchmarking that has been done on the number of records (and combinations of row/column fields) supported with respect to the available system resources.

Looking forward to any helpful feedback.

@rototor
Collaborator

rototor commented Aug 25, 2015

@evermanisha You are using the ResultSetPivotDataSource to load your data, aren't you? It loads all data into memory.

I limit the number of records processed to 10000 in my application, because wicket-pivot currently aggregates all data in memory. For my application this is no problem, because the user can choose to pre-aggregate the data: he can choose to aggregate it at a timestamp granularity (i.e. days, weeks, months, years). If he hits the 10000-row limit, he gets a warning and just needs to choose a coarser granularity.

If you implement PivotDataSource yourself, you could fetch the needed records on demand, e.g. using a scrollable database cursor. This will be slow, but it will not eat up your memory. It would also be important to set up a sensible default configuration for the pivot table, i.e. one that aggregates everything down to very few rows. Otherwise your heap will explode because of the thousands of Wicket row/cell elements that are created for the browser.
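The "fetch on demand" idea can be sketched roughly as below. Note this is a simplified illustration, not the real wicket-pivot PivotDataSource interface (the class name and method names here are assumptions); in production the page fetcher would wrap a scrollable JDBC ResultSet or a LIMIT/OFFSET query, whereas here it is just a function so the sketch stays self-contained.

```java
import java.util.Iterator;
import java.util.List;
import java.util.function.IntFunction;

/*
 * Sketch of an on-demand data source: instead of materializing every
 * record up front like ResultSetPivotDataSource, rows are fetched in
 * small pages, so only one page is held in memory at a time.
 */
class PagedPivotDataSource {

    private final IntFunction<List<Object[]>> pageFetcher; // offset -> one page of rows
    private final int pageSize;

    PagedPivotDataSource(IntFunction<List<Object[]>> pageFetcher, int pageSize) {
        this.pageFetcher = pageFetcher;
        this.pageSize = pageSize;
    }

    /** Iterate rows lazily, fetching the next page only when needed. */
    Iterator<Object[]> rows() {
        return new Iterator<Object[]>() {
            private int offset = 0;
            private List<Object[]> page = pageFetcher.apply(0);
            private int cursor = 0;

            @Override
            public boolean hasNext() {
                if (cursor < page.size()) return true;
                if (page.size() < pageSize) return false; // short page == last page
                offset += pageSize;
                page = pageFetcher.apply(offset);
                cursor = 0;
                return !page.isEmpty();
            }

            @Override
            public Object[] next() {
                return page.get(cursor++);
            }
        };
    }
}
```

With a JDBC backend, the trade-off is exactly as described above: each page costs a round trip to the database, so iteration is slower, but the heap footprint stays bounded by the page size instead of the total record count.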

But to be honest, if you really need to process that many rows, you shouldn't use wicket-pivot. You need to aggregate everything in the database, because the database can handle that amount of data well; the way wicket-pivot works at the moment, it cannot. It may even be too much data to display in the browser at once (I am looking at you, Internet Explorer...). Depending on which browsers you need to support, you may need to stream the data as JSON to the browser and render it on a canvas using JavaScript. This scales very well and works without performance problems for thousands of rows, even in Internet Explorer 9.

@decebals
Owner

ResultSetPivotDataSource keeps all data in memory. In another project I wrote an implementation of PivotDataSource that keeps the data in OrientDB, so I can say that this aspect can be resolved.
The problem with the template engine (Wicket) is another story. I will look to see if we can make some improvements.
I downloaded the test.html you posted on Google Drive and saw that the file is around 6 MB; I think that is too much for an HTML page. As @rototor says, you must aggregate/filter your pivot output data more aggressively. Six MB is too much data to display and, in my opinion, makes the page a little unusable (you cannot read a document of this size in its entirety; you must do some filtering to get something useful).

@evermanisha
Author

Thanks for the input, Decebal and Emmeran. I will consider the "on demand fetch" option, which would enable processing in chunks.

Another question: regarding the use of MultiKeyMap to map row/column keys to the data — is it memory efficient?

Could Guava's Table also be an option?
http://docs.guava-libraries.googlecode.com/git/javadoc/com/google/common/collect/Table.html

It is compared with multiple hash maps in this discussion:
http://stackoverflow.com/questions/15165293/efficiency-of-guava-table-vs-multiple-hash-maps
Please share your views.

@decebals
Owner

@evermanisha The sensitive part (from the memory point of view) of ResultSetPivotDataSource is the data field (https://github.com/decebals/wicket-pivot/blob/master/wicket-pivot/src/main/java/ro/fortsoft/wicket/pivot/ResultSetPivotDataSource.java#L28). The current implementation of ResultSetPivotDataSource acts like an offline cache (all records in memory). As @rototor says, you can extend this class or implement another PivotDataSource (using a scrollable database cursor, OrientDB, or MapDB).

We are using MultiKeyMap from Apache Commons Collections in DefaultPivotModel. I think the DefaultPivotModel class can be a big memory eater (your test tells us this).
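On the MultiKeyMap vs. Guava Table question: both store one entry per populated (rowKey, columnKey) cell, so switching between them mostly changes the API, not the asymptotic memory cost. The sketch below illustrates the underlying pattern with only the JDK (a map keyed by an immutable key list), so it stays dependency-free; the class and method names are made up for illustration, not taken from wicket-pivot.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/*
 * Dependency-free illustration of the (rowKey, columnKey) -> value lookup
 * that commons-collections MultiKeyMap (and Guava's Table) provide.
 * Memory grows with the number of distinct (row, column) combinations,
 * i.e. populated cells, not with the raw record count.
 */
class PivotCellStore {

    // List.of(...) has value-based equals/hashCode, so it works as a composite key.
    private final Map<List<Object>, Double> cells = new HashMap<>();

    void put(Object rowKey, Object columnKey, double value) {
        cells.put(List.of(rowKey, columnKey), value);
    }

    Double get(Object rowKey, Object columnKey) {
        return cells.get(List.of(rowKey, columnKey));
    }

    int cellCount() {
        return cells.size();
    }
}
```

Guava's Table adds convenient row/column views on top of this mapping, which MultiKeyMap lacks, but whichever structure is used, the real lever on memory is reducing the number of distinct cells through pre-aggregation, as discussed above.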

In conclusion, the current default implementation of WicketPivot keeps all data in memory, but we can supply extension points so that other implementations of PivotDataSource and PivotModel can be plugged in.

We are open to contributions in this direction and we are happy to help.
