
Performance Benchmarking and need for optimization for large input dataset. #15

Open
evermanisha opened this issue Aug 25, 2015 · 4 comments


@evermanisha

Have been using https://github.com/decebals/wicket-pivot for a year now.

We ran into a scenario where processing 27,000+ records consumes close to 90% of the JVM's memory during execution and takes about 3 to 4 minutes for the entire computation, in the methods below:

  1. calculate() in DefaultPivotModel.java
  2. create() in PivotTableRenderModel.java

Is there any way to optimize the code to enhance the overall performance?
A link to the pivot table produced from the large dataset is below for reference:
https://drive.google.com/open?id=0BxpTzw5qlCqbVGFOUmZmNkNta1E

Also, please share any benchmarking that has been done on the number of records (and combinations of row/column fields) supported with respect to the available system resources.

Looking forward to any helpful feedback.

@rototor
Collaborator

rototor commented Aug 25, 2015

@evermanisha You are using the ResultSetPivotDataSource to load your data, aren't you? It loads all data into memory.

I limit the number of records processed to 10000 in my application, because wicket-pivot currently aggregates all data in memory. For my application this is no problem, because the user can choose to pre-aggregate the data: he can choose to aggregate it at a timestamp granularity (i.e. days, weeks, months, years). If he hits the 10000-row limit, he gets a warning and just needs to choose a coarser granularity.

If you implement PivotDataSource yourself, you could fetch the needed records on demand, e.g. using a scrollable database cursor. This will be slow, but it will not eat up your memory. It would also be important to set up a sensible default configuration for the pivot table, i.e. one that aggregates everything down to very few rows. Otherwise your heap will explode because of the thousands of Wicket row/cell elements that are created for the browser.
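The "fetch on demand" idea can be sketched roughly as below. Note this is a simplified illustration, not the real wicket-pivot PivotDataSource interface (the class name and method names here are assumptions); in production the page fetcher would wrap a scrollable JDBC ResultSet or a LIMIT/OFFSET query, whereas here it is just a function so the sketch stays self-contained.

```java
import java.util.Iterator;
import java.util.List;
import java.util.function.IntFunction;

/*
 * Sketch of an on-demand data source: instead of materializing every
 * record up front like ResultSetPivotDataSource, rows are fetched in
 * small pages, so only one page is held in memory at a time.
 */
class PagedPivotDataSource {

    private final IntFunction<List<Object[]>> pageFetcher; // offset -> one page of rows
    private final int pageSize;

    PagedPivotDataSource(IntFunction<List<Object[]>> pageFetcher, int pageSize) {
        this.pageFetcher = pageFetcher;
        this.pageSize = pageSize;
    }

    /** Iterate rows lazily, fetching the next page only when needed. */
    Iterator<Object[]> rows() {
        return new Iterator<Object[]>() {
            private int offset = 0;
            private List<Object[]> page = pageFetcher.apply(0);
            private int cursor = 0;

            @Override
            public boolean hasNext() {
                if (cursor < page.size()) return true;
                if (page.size() < pageSize) return false; // short page == last page
                offset += pageSize;
                page = pageFetcher.apply(offset);
                cursor = 0;
                return !page.isEmpty();
            }

            @Override
            public Object[] next() {
                return page.get(cursor++);
            }
        };
    }
}
```

With a JDBC backend, the trade-off is exactly as described above: each page costs a round trip to the database, so iteration is slower, but the heap footprint stays bounded by the page size instead of the total record count.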

But to be honest, if you really need to process that many rows, you shouldn't use wicket-pivot. You need to aggregate everything in the database, because the database can handle that amount of data well; the way wicket-pivot works at the moment, it cannot. It may even be too much data to display in the browser at once (I am looking at you, Internet Explorer...). Depending on which browsers you need to support, you may need to stream the data as JSON to the browser and render it on a canvas using JavaScript. This scales very well and works without performance problems for thousands of rows, even in Internet Explorer 9.

@decebals
Owner

ResultSetPivotDataSource keeps all data in memory. In another project I wrote an implementation of PivotDataSource that keeps the data in OrientDB, so I can say that this aspect can be resolved.
The problem with the template engine (Wicket) is another story. I will look to see if we can make some improvements.
I downloaded the test.html you posted on Google Drive and saw that the file is around 6 MB; I think that is too much for an HTML page. As @rototor says, you must aggregate/filter your pivot output data more aggressively. Six MB is too much data to display and, in my opinion, makes the page a little unusable (you cannot read a document of this size in its entirety; you must do some filtering to get something useful).

@evermanisha
Author

Thanks for the input, Decebal and Emmeran. I will consider the "on demand fetch" option, which would enable processing in chunks.

Another question: regarding the use of MultiKeyMap to map row/column keys to the data — is it memory efficient?

Could Guava's Table also be an option?
http://docs.guava-libraries.googlecode.com/git/javadoc/com/google/common/collect/Table.html

It is compared with multiple hash maps in this discussion:
http://stackoverflow.com/questions/15165293/efficiency-of-guava-table-vs-multiple-hash-maps
Please share your views.

@decebals
Owner

@evermanisha The sensitive part (from the memory point of view) of ResultSetPivotDataSource is the data field (https://github.com/decebals/wicket-pivot/blob/master/wicket-pivot/src/main/java/ro/fortsoft/wicket/pivot/ResultSetPivotDataSource.java#L28). The current implementation of ResultSetPivotDataSource acts like an offline cache (all records in memory). As @rototor says, you can extend this class or implement another PivotDataSource (using a scrollable database cursor, OrientDB, or MapDB).

We are using MultiKeyMap from Apache Commons Collections in DefaultPivotModel. I think the DefaultPivotModel class can be a big memory eater (your test tells us this).
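On the MultiKeyMap vs. Guava Table question: both store one entry per populated (rowKey, columnKey) cell, so switching between them mostly changes the API, not the asymptotic memory cost. The sketch below illustrates the underlying pattern with only the JDK (a map keyed by an immutable key list), so it stays dependency-free; the class and method names are made up for illustration, not taken from wicket-pivot.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/*
 * Dependency-free illustration of the (rowKey, columnKey) -> value lookup
 * that commons-collections MultiKeyMap (and Guava's Table) provide.
 * Memory grows with the number of distinct (row, column) combinations,
 * i.e. populated cells, not with the raw record count.
 */
class PivotCellStore {

    // List.of(...) has value-based equals/hashCode, so it works as a composite key.
    private final Map<List<Object>, Double> cells = new HashMap<>();

    void put(Object rowKey, Object columnKey, double value) {
        cells.put(List.of(rowKey, columnKey), value);
    }

    Double get(Object rowKey, Object columnKey) {
        return cells.get(List.of(rowKey, columnKey));
    }

    int cellCount() {
        return cells.size();
    }
}
```

Guava's Table adds convenient row/column views on top of this mapping, which MultiKeyMap lacks, but whichever structure is used, the real lever on memory is reducing the number of distinct cells through pre-aggregation, as discussed above.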

In conclusion, the current default implementation of WicketPivot keeps all data in memory, but we can supply extension points so that other implementations of PivotDataSource and PivotModel can be plugged in.

We are open to contributions in this direction and we are happy to help.
