
Make option to use different or limited or parts of data sets #4

Open

tmaiaroto opened this issue Nov 20, 2014 · 5 comments

Comments

@tmaiaroto

The data files were put into a slice so they could easily be configured and processed. Unfortunately, they aren't consistent in format, so keeping them in a slice arguably doesn't make sense, but it also kind of does, especially if more sets are added in the future.

This will need to be re-addressed, and the more pressing issue is that there's quite a bit of memory usage with both sets loaded as is. It would be nice to choose which sets are used, since an application could sacrifice some accuracy for lower memory use and speed.

The Geonames set is far smaller and great for larger cities. The MaxMind set contains a LOT of data, but it may not be required for certain apps. It would be nice to let the application decide.

It might also be nice to allow only certain cities to be included from the MaxMind set, for example, only cities with a recorded population, or cities from particular countries. So an option to limit the amount of data held in memory would be great.
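
A minimal sketch of what such an option could look like. The `LoadOptions` struct and `keepCity` helper here are hypothetical, not part of the current geobed API:

```go
package main

import (
	"fmt"
	"strings"
)

// LoadOptions is a hypothetical way to tell the loader which data to keep.
type LoadOptions struct {
	UseGeonames   bool     // load the smaller Geonames set
	UseMaxMind    bool     // load the larger MaxMind set
	MinPopulation int      // skip cities below this population
	Countries     []string // if non-empty, keep only these country codes
}

// keepCity decides whether a parsed record should be held in memory.
func keepCity(opts LoadOptions, country string, population int) bool {
	if population < opts.MinPopulation {
		return false
	}
	if len(opts.Countries) == 0 {
		return true
	}
	for _, c := range opts.Countries {
		if strings.EqualFold(c, country) {
			return true
		}
	}
	return false
}

func main() {
	opts := LoadOptions{UseMaxMind: true, MinPopulation: 1000, Countries: []string{"US"}}
	fmt.Println(keepCity(opts, "US", 50000)) // true
	fmt.Println(keepCity(opts, "FR", 50000)) // false: not a requested country
}
```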

@tmaiaroto

About 655MB of memory is allocated for the initial load, and each lookup then allocates about 1.4MB. I'd like to reduce the memory needs so that this package works on smaller virtual servers.
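
A minimal sketch of one way to measure figures like these, using `runtime.MemStats`; the commented-out constructor call just marks where the data set load would go:

```go
package main

import (
	"fmt"
	"runtime"
)

// allocatedMB reports the bytes currently allocated on the Go heap, in MB.
func allocatedMB() float64 {
	var m runtime.MemStats
	runtime.ReadMemStats(&m)
	return float64(m.Alloc) / 1024 / 1024
}

func main() {
	before := allocatedMB()
	// g := geobed.NewGeobed() // load the data sets here
	after := allocatedMB()
	fmt.Printf("initial load allocated roughly %.1f MB\n", after-before)
}
```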

@tmaiaroto

Apparently some of the records from MaxMind don't have lat/lng values set (they come out as 0 when parsed). Removing those reduced the data set from 2,771,454 to 1,968,549 records.
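
A sketch of that kind of filter: drop rows whose latitude/longitude parse to 0. The field positions are assumptions about the CSV layout, not the package's actual parsing code:

```go
package main

import (
	"fmt"
	"strconv"
)

type cityRecord struct {
	Name     string
	Lat, Lng float64
}

// parseRow returns false when the coordinates are missing (parse to 0).
func parseRow(fields []string) (cityRecord, bool) {
	lat, _ := strconv.ParseFloat(fields[1], 64)
	lng, _ := strconv.ParseFloat(fields[2], 64)
	if lat == 0 && lng == 0 {
		return cityRecord{}, false
	}
	return cityRecord{Name: fields[0], Lat: lat, Lng: lng}, true
}

func main() {
	rec, ok := parseRow([]string{"Portland", "45.52", "-122.68"})
	fmt.Println(rec, ok) // kept: real coordinates

	_, ok = parseRow([]string{"Nowhere", "", ""})
	fmt.Println(ok) // false: empty lat/lng parses to 0 and gets dropped
}
```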

That brought the memory allocation down to about 496MB to load the set and 0.58MB per lookup (lookups are also now faster). Getting closer to running on a 512MB RAM VPS!

Removing the index on the first two characters saved a little more: 451MB to load the set now, and 0.56MB per lookup (not expected to change in this case). That index wasn't being used, but it could be brought back to further increase lookup speed.
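
For reference, a rough sketch of what that unused index looks like in principle: a map keyed on the first two characters of the lowercased city name, pointing at record positions. Names here are illustrative only:

```go
package main

import (
	"fmt"
	"strings"
)

// buildPrefixIndex maps the first two lowercased characters of each name
// to the positions of matching records in the slice.
func buildPrefixIndex(names []string) map[string][]int {
	idx := make(map[string][]int)
	for i, name := range names {
		key := strings.ToLower(name)
		if len(key) > 2 {
			key = key[:2]
		}
		idx[key] = append(idx[key], i)
	}
	return idx
}

func main() {
	idx := buildPrefixIndex([]string{"Austin", "Auckland", "Boston"})
	fmt.Println(idx["au"]) // [0 1] — a lookup only scans these positions
}
```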

@tmaiaroto

2,008,788 records now; the previous filter was too aggressive. Still, that's 478MB to load into memory and 0.56MB per lookup (0.005s).

@tooolbox

Is this memory growth bounded? I noticed that the first time I loaded geobed, it was using 2.5GB of memory (!!!!!), but on successive loads it was ~500MB and went up to ~650MB as I ran lookups. Will it eventually hit 2.5GB?

@tmaiaroto

It does require a good bit of memory, unfortunately. I thought it was a bit less than 2.5GB though... Hmm. I wanted to look into memory-mapped files to reduce this. I was also thinking about BoltDB at some point.
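
A rough sketch of the BoltDB idea: keep the records on disk in a key/value bucket and read them per lookup, so the whole set never has to live on the Go heap. Bucket and key names below are made up for illustration:

```go
package main

import (
	"log"

	bolt "github.com/boltdb/bolt"
)

func main() {
	db, err := bolt.Open("geobed.db", 0600, nil)
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	// Store one city record (the value would normally be an encoded struct).
	err = db.Update(func(tx *bolt.Tx) error {
		b, err := tx.CreateBucketIfNotExists([]byte("cities"))
		if err != nil {
			return err
		}
		return b.Put([]byte("new york"), []byte(`{"lat":40.71,"lng":-74.0}`))
	})
	if err != nil {
		log.Fatal(err)
	}

	// Read it back without holding the full data set in memory.
	_ = db.View(func(tx *bolt.Tx) error {
		v := tx.Bucket([]byte("cities")).Get([]byte("new york"))
		log.Printf("found: %s", v)
		return nil
	})
}
```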
