Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

reduce memory consumption? #509

Open
missinglink opened this issue May 8, 2020 · 2 comments
Open

reduce memory consumption? #509

missinglink opened this issue May 8, 2020 · 2 comments

Comments

@missinglink
Copy link
Member

I was looking over the code today and noticed this block introduced in c500aaf which stores admin records in memory so they can later be used for computing hierarchies.

It seems as though the entire geojson record is being stored in RAM when we probably only need to store the wofRecord.hierarchies path which is used by the hierarchyFinder.js code.

@orangejulius I'm not that familiar, does that sound like a reasonable change to reduce RAM usage?

@missinglink missinglink added the bug label May 8, 2020
@orangejulius
Copy link
Member

orangejulius commented May 8, 2020

I don't think there are any major wins to be had here. The extractFields pipeline step runs a bit earlier, so it prunes out anything really not needed:

https://github.com/pelias/whosonfirst/blob/master/src/readStream.js#L56

Looking at one of the records as its stored in memory:

{
  id: 85633729,
  name: 'Puerto Rico',
  name_aliases: [
    'Puerto Rico',
    'Commonwealth of Puerto Rico',
    'Puerto Rican',
    'Puertorico'
  ],
  name_langs: {
    am: [ 'ፕዌርቶ ሪኮ' ],
    ar: [ 'بورتوريكو' ],
    ay: [ 'Burinkin' ],
    az: [ 'Puerto Riko' ],
    ba: [ 'Пуэрто-Рико' ],
    be: [ 'Пуэрта-Рыка' ],
    bn: [ 'পুয়ের্তো রিকো' ],
    bs: [ 'Porto Riko' ],
    bg: [ 'Пуерто Рико' ],
    cs: [ 'Portoriko' ],
    ce: [ 'Пуэрто-Рико' ],
    cv: [ 'Пуэрто-Рико' ],
    dv: [ 'ޕުއެރްތޮ ރީކޯ' ],
    el: [ 'Πουέρτο Ρίκο' ],
    en: [ 'Commonwealth of Puerto Rico', 'Puerto Rican', 'Puertorico' ],
    eo: [ 'Porto-Riko' ],
    fo: [ 'Puerto Riko' ],
    fa: [ 'پورتوریکو' ],
    fr: [ 'Porto Rico' ],
    fy: [ 'Puerto Riko' ],
    ga: [ 'Pórtó Ríce' ],
    gl: [ 'Porto Rico' ],
    gv: [ 'Yn Phurt Verçhagh' ],
    gu: [ 'પ્યુએર્ટો રિકો' ],
    ht: [ 'Pòtoriko' ],
    he: [ 'פוארטו ריקו' ],
    hi: [ 'प्युर्तो रिको', 'पोर्टो रीको' ],
    hr: [ 'Portoriko' ],
    hy: [ 'Պուերտո Ռիկո' ],
    io: [ 'Portuo Riko' ],
    ia: [ 'Porto Rico' ],
    id: [ 'Puerto Riko' ],
    is: [ 'Púertó Ríkó' ],
    it: [ 'Porto Rico', 'Portorico' ],
    jv: [ 'Puerto Riko' ],
    ja: [ 'プエルトリコ' ],
    kn: [ 'ಪೋರ್ಟೊ ರಿಕೊ' ],
    ka: [ 'პუერტო-რიკო' ],
    kk: [ 'Пуэрто-Рико' ],
    rw: [ 'Puwerito Riko' ],
    ky: [ 'Пуэрто-Рико' ],
    ko: [ '푸에르토리코' ],
    ku: [ 'Porto Rîko' ],
    la: [ 'Portus Dives' ],
    lv: [ 'Puertoriko' ],
    lt: [ 'Puerto Rikas' ],
    ml: [ 'പോർട്ടോ റിക്കോ' ],
    mr: [ 'पोर्तो रिको' ],
    mk: [ 'Порторико' ],
    mt: [ 'Porto Riku' ],
    ne: [ 'प्युर्तो रिको' ],
    os: [ 'Пуэрто-Рико' ],
    pa: [ 'ਪੁਏਰਤੋ ਰੀਕੋ' ],
    pl: [ 'Portoryko' ],
    pt: [ 'Porto Rico' ],
    qu: [ 'Burinkin' ],
    ru: [ 'Пуэрто-Рико' ],
    si: [ 'පුවටෝ රිකෝ' ],
    sk: [ 'Portoriko' ],
    sl: [ 'Portoriko' ],
    so: [ 'Buerto Riko' ],
    sq: [ 'Portorikoja' ],
    sr: [ 'Порторико' ],
    su: [ 'Puérto Riko' ],
    ta: [ 'புவேர்ட்டோ ரிக்கோ' ],
    tt: [ 'Пуэрто-Рико' ],
    te: [ 'ఫ్యూర్టో రికో' ],
    tg: [ 'Пуэрто Рико' ],
    th: [ 'ปวยร์โตรีโก' ],
    tk: [ 'Puerto-riko' ],
    tr: [ 'Porto Riko' ],
    ug: [ 'پوئېرتو رىكو' ],
    uk: [ 'Пуерто-Рико' ],
    ur: [ 'پورٹو ریکو' ],
    uz: [ 'Puerto-Riko' ],
    vo: [ 'Puertorikeäns' ],
    wa: [ 'Porto Rico' ],
    wo: [ 'Poortorikoo' ],
    yo: [ 'Púẹ́rtò Ríkò' ],
    zh: [ '波多黎各自由邦', '波多黎各' ]
  },
  abbreviation: 'PR',
  place_type: 'dependency',
  lat: 18.234668,
  lon: -66.481065,
  bounding_box: '-67.937815,17.922919,-65.244618,18.522773',
  population: 3971020,
  popularity: undefined,
  hierarchies: [
    {
      continent_id: 102191575,
      dependency_id: 85633729,
      empire_id: 136253057
    }
  ]
}

There's a lot, but its mostly names. While we could prune out some unneeded languages, we still need the names to set the parent fields later on. I think the only field we could truly remove is the bounding_box. Even popularity could possibly be useful to do things in the future like increase the popularity of neighbourhoods in popular cities.

Overall, I don't believe memory usage of this importer is much of an issue right now, and I don't see any super compelling benefits to trying to prune out stuff we don't need.

@missinglink
Copy link
Member Author

Agh ok, that makes sense, I didn't realize the name fields were also required but makes sense once you mention it.

At some point we can probably migrate this code to use the ancestors and names tables of the SQLite db.

I'm going to mark this as help wanted, we would be happy to merge a PR to that effect 🚀

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants