
Make option to use different or limited or parts of data sets #4

Open

tmaiaroto opened this issue Nov 20, 2014 · 5 comments

Comments

@tmaiaroto

The data files were put into a slice so they could easily be configured and processed. Unfortunately, they aren't consistent in format, so keeping them in a slice arguably doesn't make sense, but it also kind of does, especially if more sets are added in the future.

This will need to be re-addressed, and the more pressing issue is that there's quite a bit of memory usage with both sets loaded as is. It would be nice to choose which sets are used, since an application could sacrifice some accuracy for lower memory use and speed.

The Geonames set is far smaller and great for larger cities. The MaxMind set contains a LOT of data, but it may not be required for certain apps. It would be nice to let the application decide.

It might also be nice to allow only certain cities to be included from the MaxMind set, for example, only cities with a recorded population, or cities from particular countries. So an option to limit the amount of data held in memory would be great.
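
A minimal sketch of what such an option could look like. The `LoadOptions` struct and `keepCity` helper here are hypothetical, not part of the current geobed API:

```go
package main

import (
	"fmt"
	"strings"
)

// LoadOptions is a hypothetical way to tell the loader which data to keep.
type LoadOptions struct {
	UseGeonames   bool     // load the smaller Geonames set
	UseMaxMind    bool     // load the larger MaxMind set
	MinPopulation int      // skip cities below this population
	Countries     []string // if non-empty, keep only these country codes
}

// keepCity decides whether a parsed record should be held in memory.
func keepCity(opts LoadOptions, country string, population int) bool {
	if population < opts.MinPopulation {
		return false
	}
	if len(opts.Countries) == 0 {
		return true
	}
	for _, c := range opts.Countries {
		if strings.EqualFold(c, country) {
			return true
		}
	}
	return false
}

func main() {
	opts := LoadOptions{UseMaxMind: true, MinPopulation: 1000, Countries: []string{"US"}}
	fmt.Println(keepCity(opts, "US", 50000)) // true
	fmt.Println(keepCity(opts, "FR", 50000)) // false: not a requested country
}
```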

@tmaiaroto

About 655MB of memory is allocated for the initial load, and each lookup then allocates about 1.4MB. I'd like to reduce the memory needs so that this package works on smaller virtual servers.
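
A minimal sketch of one way to measure figures like these, using `runtime.MemStats`; the commented-out constructor call just marks where the data set load would go:

```go
package main

import (
	"fmt"
	"runtime"
)

// allocatedMB reports the bytes currently allocated on the Go heap, in MB.
func allocatedMB() float64 {
	var m runtime.MemStats
	runtime.ReadMemStats(&m)
	return float64(m.Alloc) / 1024 / 1024
}

func main() {
	before := allocatedMB()
	// g := geobed.NewGeobed() // load the data sets here
	after := allocatedMB()
	fmt.Printf("initial load allocated roughly %.1f MB\n", after-before)
}
```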

@tmaiaroto

Apparently some of the records from MaxMind don't have lat/lng values set (they come out as 0 when parsed). Removing those reduced the data set from 2,771,454 to 1,968,549 records.
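
A sketch of that kind of filter: drop rows whose latitude/longitude parse to 0. The field positions are assumptions about the CSV layout, not the package's actual parsing code:

```go
package main

import (
	"fmt"
	"strconv"
)

type cityRecord struct {
	Name     string
	Lat, Lng float64
}

// parseRow returns false when the coordinates are missing (parse to 0).
func parseRow(fields []string) (cityRecord, bool) {
	lat, _ := strconv.ParseFloat(fields[1], 64)
	lng, _ := strconv.ParseFloat(fields[2], 64)
	if lat == 0 && lng == 0 {
		return cityRecord{}, false
	}
	return cityRecord{Name: fields[0], Lat: lat, Lng: lng}, true
}

func main() {
	rec, ok := parseRow([]string{"Portland", "45.52", "-122.68"})
	fmt.Println(rec, ok) // kept: real coordinates

	_, ok = parseRow([]string{"Nowhere", "", ""})
	fmt.Println(ok) // false: empty lat/lng parses to 0 and gets dropped
}
```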

That brought the memory allocation down to about 496MB to load the set and 0.58MB per lookup (lookups are also now faster). Getting closer to running on a 512MB RAM VPS!

Removing the index on the first two characters saved a little more: 451MB to load the set now, and 0.56MB per lookup (not expected to change in this case). That index wasn't being used, but it could be brought back to further increase lookup speed.
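
For reference, a rough sketch of what that unused index looks like in principle: a map keyed on the first two characters of the lowercased city name, pointing at record positions. Names here are illustrative only:

```go
package main

import (
	"fmt"
	"strings"
)

// buildPrefixIndex maps the first two lowercased characters of each name
// to the positions of matching records in the slice.
func buildPrefixIndex(names []string) map[string][]int {
	idx := make(map[string][]int)
	for i, name := range names {
		key := strings.ToLower(name)
		if len(key) > 2 {
			key = key[:2]
		}
		idx[key] = append(idx[key], i)
	}
	return idx
}

func main() {
	idx := buildPrefixIndex([]string{"Austin", "Auckland", "Boston"})
	fmt.Println(idx["au"]) // [0 1] — a lookup only scans these positions
}
```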

@tmaiaroto

2,008,788 records now; the previous filter was too aggressive. Still, that's 478MB to load into memory and 0.56MB per lookup (0.005s).

@tooolbox

Is this memory growth bounded? I noticed that the first time I loaded geobed, it was using 2.5GB of memory (!!!!!), but on successive loads it was ~500MB and went up to ~650MB as I ran lookups. Will it eventually hit 2.5GB?

@tmaiaroto

It does require a good bit of memory, unfortunately. I thought it was a bit less than 2.5GB though... Hmm. I wanted to look into memory-mapped files to reduce this. I was also thinking about BoltDB at some point.
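
A rough sketch of the BoltDB idea: keep the records on disk in a key/value bucket and read them per lookup, so the whole set never has to live on the Go heap. Bucket and key names below are made up for illustration:

```go
package main

import (
	"log"

	bolt "github.com/boltdb/bolt"
)

func main() {
	db, err := bolt.Open("geobed.db", 0600, nil)
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	// Store one city record (the value would normally be an encoded struct).
	err = db.Update(func(tx *bolt.Tx) error {
		b, err := tx.CreateBucketIfNotExists([]byte("cities"))
		if err != nil {
			return err
		}
		return b.Put([]byte("new york"), []byte(`{"lat":40.71,"lng":-74.0}`))
	})
	if err != nil {
		log.Fatal(err)
	}

	// Read it back without holding the full data set in memory.
	_ = db.View(func(tx *bolt.Tx) error {
		v := tx.Bucket([]byte("cities")).Get([]byte("new york"))
		log.Printf("found: %s", v)
		return nil
	})
}
```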
