I am trying to parse a heap (relation, table) file of the maximum size (1 GB for PostgreSQL).
fq consumes 90 to 100 times more memory than the file size: for an 80 MB file, fq requires 7.5 GB of RAM.
```
$ time fq -d pgheap -o flavour=postgres14 ".Pages[0].PageHeaderData.pd_linp[0, 1, 2, -1] | tovalue" 16397
Killed

real    0m50.794s
user    1m11.962s
sys     0m8.994s
```
Kernel messages:

```
$ sudo dmesg | tail -2
[193541.830725] oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=user.slice,mems_allowed=0,global_oom,task_memcg=/user.slice/user-1000.slice/session-1.scope,task=fq,pid=454783,uid=1000
[193541.830748] Out of memory: Killed process 454783 (fq) total-vm:31508780kB, anon-rss:26629332kB, file-rss:272kB, shmem-rss:0kB, UID:1000 pgtables:58860kB oom_score_adj:0
```
1 GB file:

```
$ ls -alh 16397
-rw-r----- 1 pavel pavel 1.0G Aug 31 08:38 16397
```
Hello! At the moment fq has not been optimized to use less memory; the focus has been on making it work at all :) The main reason it uses a lot of memory is that it does very little "lazy" decoding. Instead it decodes the full file, and each field added has to keep track of a lot of metadata: its name, parent, children, bit range, decoded value, optional symbolic value, optional description string, etc.
I'm not familiar with the pgheap format, but for example the mp4 decoder in fq has an option to skip decoding of individual samples if they are not of interest (you can still decode individual samples manually), which speeds up decoding a lot.
There are some options to improve the situation that I've thought about; sadly, most of them are quite complicated and not an easy task:
- Try to compact down `decode.Value` and `scalar.S` somehow; I think they are the structures using the most memory.
  - Add some kind of slice to keep track of optional things like symbolic value, display format, etc.
- Introduce interfaces for `decode.Value` and/or `scalar.S`:
  - Could have type-specific implementations that store less data.
  - Could have an implementation that stores only the actual value, etc.
  - Arrays could possibly know their element type and length.
- Do lazy decoding somehow:
  - Might behave strangely with errors or broken files, since problems are not noticed until you run a query.
  - Not sure how it would interact with probing and possibly some other things.
  - Would probably make some queries slower, since decoding and possibly I/O would happen at query time.
- Do a full decode but only store ranges, then decode again lazily:
  - Would detect errors up front.
  - Would probably make some queries slower, since decoding and possibly I/O would happen at query time.
What version are you using (`fq -v`)? How was fq installed?
My branch:
https://github.com/pnsafonov/fq/tree/postgres
Can you reproduce the problem using the latest release or master branch?
I can reproduce the problem on my branch, where the PostgreSQL parsers are implemented.
1 GB file:
https://github.com/pnsafonov/fq_testdata_postgres14
https://github.com/pnsafonov/fq_testdata_postgres14/raw/master/16397
What did you do?
I tried to parse a heap (relation, table) file of the maximum size (1 GB for PostgreSQL). fq consumes 90 to 100 times more memory than the file size: an 80 MB file requires 7.5 GB of RAM. The kernel OOM messages and file listing are shown above.
Memory profiler results:
I checked with a disassembler: `TryFieldScalarFn.func1` is the unnamed closure inside `TryFieldScalarFn`.