high memory usage under heavy load #56
cleanup is that low since I was playing with the variables to find out what they do. I was hoping the cleanup kicks out old nodes faster, and therefore the pastQueries with them. Do we need all the data in
Isn't RateLimit also dropping reply packets to my get_peers? The anacrolix/dht has a “Passive bool” to disable responding to incoming queries completely… If RateLimit only applies to incoming queries, setting it to 1 or 0 probably has the same effect? I'm running another profiling run right now with RateLimit 10000 and CleanupPeriod back to the 15-minute default. |
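For context, the knobs being discussed here (RateLimit, CleanupPeriod, MaxNodes) are set on the DHT's config before starting the node. A rough sketch of the kind of run described above, assuming nictuku/dht's NewConfig/New constructors and these field names (check your copy of the library; the exact API may differ):

```go
package main

import (
	"log"
	"time"

	"github.com/nictuku/dht"
)

func main() {
	// NewConfig and these field names are assumptions based on this thread;
	// verify them against the version of the library you are running.
	cfg := dht.NewConfig()
	cfg.RateLimit = 10000                // incoming-query throttle being experimented with here
	cfg.CleanupPeriod = 15 * time.Minute // back to the default cleanup interval
	cfg.MaxNodes = 500                   // routing-table cap discussed later in the thread

	d, err := dht.New(cfg)
	if err != nil {
		log.Fatalf("creating DHT node: %v", err)
	}
	_ = d // start the node and issue PeersRequest calls as in the library's README
}
```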
Like we discussed, some of the memory usage you're seeing is normal given that GOGC=100, which means that by default Go uses up to twice the amount of heap memory it truly needs. There are a few alternatives to reduce that.
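For reference, lowering the GC target is one way to trade extra GC CPU time for a smaller heap; a minimal sketch using runtime/debug (the value 50 is purely illustrative):

```go
package main

import (
	"fmt"
	"runtime"
	"runtime/debug"
)

func main() {
	// Equivalent to starting the process with GOGC=50: the heap may grow only
	// 50% past the live set before the next collection, instead of 100%.
	old := debug.SetGCPercent(50)
	fmt.Printf("GC target lowered from %d%% to 50%%\n", old)

	// Heap statistics to watch while the DHT is under load.
	var m runtime.MemStats
	runtime.ReadMemStats(&m)
	fmt.Printf("HeapAlloc=%d HeapSys=%d NextGC=%d\n", m.HeapAlloc, m.HeapSys, m.NextGC)
}
```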
It's been a while since I looked at this code, so I could be wrong, but I think the best approach here is to use a simple ring buffer that recycles the oldest event. See https://golang.org/pkg/container/ring/. Want to submit a PR for doing the ring buffer for pastQueries? 😄 |
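A minimal sketch of that idea: a fixed-size container/ring per node that recycles the oldest entry, so pastQueries stops growing without bound. The queryRecord type and helper names are illustrative, not the library's actual types:

```go
package main

import (
	"container/ring"
	"fmt"
)

// queryRecord is a stand-in for whatever a pastQueries entry holds.
type queryRecord struct {
	transactionID string
	infoHash      string
}

// queryRing keeps the N most recent queries for one remote node.
// Writing into a full ring silently overwrites the oldest entry,
// so memory per node is bounded regardless of traffic.
type queryRing struct {
	r *ring.Ring
}

func newQueryRing(n int) *queryRing {
	return &queryRing{r: ring.New(n)}
}

func (q *queryRing) add(rec queryRecord) {
	q.r.Value = rec
	q.r = q.r.Next()
}

// find scans the ring for a transaction ID; N is small (e.g. 10), so a
// linear scan is cheap compared with an unbounded map.
func (q *queryRing) find(txID string) (queryRecord, bool) {
	var found queryRecord
	ok := false
	q.r.Do(func(v interface{}) {
		if rec, isRec := v.(queryRecord); isRec && rec.transactionID == txID {
			found, ok = rec, true
		}
	})
	return found, ok
}

func main() {
	qr := newQueryRing(3)
	for i := 0; i < 5; i++ { // the two oldest of these five get recycled
		qr.add(queryRecord{transactionID: fmt.Sprintf("t%d", i)})
	}
	_, ok := qr.find("t0")
	fmt.Println("t0 still present:", ok) // false: recycled
	_, ok = qr.find("t4")
	fmt.Println("t4 still present:", ok) // true
}
```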
I've already replaced pastQueries with the LRU from peerStore for a test at the beginning of last week, but discarded it for some reason. Running it again with profiling enabled right now. |
had it running with max 1000 pastQueries and pendingQueries but went up to 20GB used memory in the profiler... had to run it again because it killed my profiler, so the screenshot/data below is after about a 20-minute run..
[image: heap3]
<https://user-images.githubusercontent.com/9356285/43695684-59d131c0-9964-11e8-99eb-ecf048339bf1.jpg>
if I subtract totalNodes vs totalKilledNodes there are still 1,434,393 active nodes, which is quite a lot ... maybe limit the number of nodes, too?
"totalDroppedPackets": 0,
"totalFindNodeDupes": 491,
"totalGetPeersDupes": 59304708,
"totalKilledNodes": 291915,
"totalNodes": 3941162,
"totalNodesReached": 2587206,
"totalPacketsFromBlockedHosts": 4341,
"totalPeers": 13924652,
"totalReadBytes": 2739891552,
"totalRecv": 10306303,
"totalRecvFindNode": 106502,
"totalRecvFindNodeReply": 658,
"totalRecvGetPeers": 119602,
"totalRecvGetPeersReply": 8042017,
"totalRecvPingReply": 1903051,
"totalSelfPromotions": 1943,
"totalSent": 16217619,
"totalSentFindNode": 1031,
"totalSentGetPeers": 13092528,
"totalSentPing": 2834225,
"totalWrittenBytes": 1458361886
|
totalNodes - totalKilledNodes isn't a reliable measure of your active nodes. In theory at least, routingTable.numNodes is a better measure. Note that supposedly the number of nodes in the routing table is already limited by MaxNodes.
In the code you shared with me, the LRU has 1000 elements for `pastQueries`. That's 1000 for each node, and you have 500 MaxNodes. Assuming each `pastQueries` entry consumes around 100 bytes, that still wouldn't explain 10GB of memory. I think you should make those LRUs significantly smaller (like 10 elements), or make it a global LRU with a merged key of infohash+node ID.
But there's something else missing. Are you running multiple DHT nodes in a single process, maybe?
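A rough sketch of the "global LRU with a merged key" alternative, assuming the hashicorp/golang-lru package; the key layout and sizes are illustrative only:

```go
package main

import (
	"fmt"

	lru "github.com/hashicorp/golang-lru"
)

// queryKey merges the infohash and the remote node ID so a single bounded
// cache serves every node, instead of a 1000-entry LRU per node.
func queryKey(infoHash, nodeID string) string {
	return infoHash + "|" + nodeID
}

func main() {
	// One process-wide cap on remembered queries, e.g. 10k entries in total,
	// rather than 1000 entries x 500 MaxNodes.
	pastQueries, err := lru.New(10000)
	if err != nil {
		panic(err)
	}

	// Remember a query we sent (the value could be the transaction/query record).
	pastQueries.Add(queryKey("aabb...infohash", "node-1234"), "get_peers tx 42")

	// Look it up when the reply comes back.
	if v, ok := pastQueries.Get(queryKey("aabb...infohash", "node-1234")); ok {
		fmt.Println("matched pending query:", v)
	}
}
```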
|
for testing I've implemented an output of routingTable.numNodes, and the number grows way higher than MaxNodes. It's at 500,000 after 5 minutes. MaxNodes only seems to take effect at bootstrap and when we receive a ping from an external node, to decide whether we "learn" the new node. But when sending out get_peers, every node learned from a reply is always stored. If I search 100,000 different hashes within one hour and each search learns 1000 new nodes (because get_peers always returns new nearest nodes), that's already 100 million nodes plus their pastQueries. I've now added a createTime to each new node and expire them after CleanupPeriod*2 if unused.
With that change numNodes now stays around 1 million and the heap around 4 gigabytes. I will try another version next where I keep your original code without all the LRU modifications and only the code above. |
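The actual patch isn't shown in the thread; below is only a rough sketch of what such a createTime-based expiry could look like, with made-up field and function names:

```go
package main

import (
	"fmt"
	"time"
)

// remoteNode is a stand-in for the library's node record; only the fields
// needed for this sketch are shown. createTime and lastActivity are the
// additions discussed in the thread.
type remoteNode struct {
	id           string
	createTime   time.Time
	lastActivity time.Time
}

const cleanupPeriod = 15 * time.Minute

// expireIdleNodes drops nodes that were learned more than cleanupPeriod*2 ago
// and have never been heard from since, bounding routing-table growth caused
// by nodes learned from get_peers replies.
func expireIdleNodes(nodes map[string]*remoteNode, now time.Time) {
	maxAge := 2 * cleanupPeriod
	for id, n := range nodes {
		unused := n.lastActivity.IsZero() || n.lastActivity.Equal(n.createTime)
		if unused && now.Sub(n.createTime) > maxAge {
			delete(nodes, id)
		}
	}
}

func main() {
	nodes := map[string]*remoteNode{
		"stale": {id: "stale", createTime: time.Now().Add(-time.Hour)},
		"fresh": {id: "fresh", createTime: time.Now()},
	}
	expireIdleNodes(nodes, time.Now())
	fmt.Println("remaining nodes:", len(nodes)) // only "fresh" survives
}
```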
was running about 5 hours with those settings ... filling in 1000 searches/hashes in parallel and immediately refilling...
I've used a slightly modified dht.go because I had disabled the clientThrottle feature for debugging reasons, so use that one when looking at line numbers.
In more detail: it seems 2.5 GB of the 8 GB gets eaten by dht.go:610, which is
node.pastQueries[r.T] = query
Also, in krpc.go:189 the jackpal bencode unmarshal eats another 1.123 GB.
CPU looks fine to me, just the unmarshal eats plenty.. after I got (and still see) many of
dht.go:568] DHT: readResponse Error: unexpected EOF
I was trying to replace it with https://github.com/IncSW/go-bencode (which is much less CPU intensive according to the benchmarks there) to see if it solves the issue, but due to my very limited golang skills I was unable to find the right syntax, parameters, and value type conversions to make it work.
expvar output:
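On that go-bencode attempt: a rough, untested sketch of how IncSW/go-bencode might be wired in, assuming an Unmarshal(data []byte) (interface{}, error) signature and []byte string values as suggested by that repository's README (both are assumptions here, not verified against this codebase):

```go
package main

import (
	"fmt"

	bencode "github.com/IncSW/go-bencode"
)

// decodeKRPC shows the general shape of decoding a KRPC packet with a
// reflection-free bencode decoder. Assumption: Unmarshal returns a generic
// interface{} tree (dicts as map[string]interface{}, strings as []byte),
// which then has to be type-asserted field by field instead of being
// unmarshalled directly into the library's response struct.
func decodeKRPC(packet []byte) (txID string, msgType string, err error) {
	v, err := bencode.Unmarshal(packet)
	if err != nil {
		return "", "", err
	}
	dict, ok := v.(map[string]interface{})
	if !ok {
		return "", "", fmt.Errorf("not a bencoded dict")
	}
	// "t" is the transaction ID, "y" the message type ("q", "r" or "e").
	if b, ok := dict["t"].([]byte); ok {
		txID = string(b)
	}
	if b, ok := dict["y"].([]byte); ok {
		msgType = string(b)
	}
	return txID, msgType, nil
}

func main() {
	// A tiny hand-written KRPC-style reply: a dict with t="aa" and y="r".
	pkt := []byte("d1:t2:aa1:y1:re")
	tx, typ, err := decodeKRPC(pkt)
	fmt.Println(tx, typ, err) // expected under the above assumptions: "aa" "r" <nil>
}
```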