This repository has been archived by the owner on Mar 3, 2023. It is now read-only.

Increased stream manager memory usage #1567

Closed
ajorgensen opened this issue Nov 16, 2016 · 13 comments · Fixed by #2073 · May be fixed by #1785

@ajorgensen
Contributor

We've observed what appears to be the stream manager not releasing memory properly, or holding onto it indefinitely, on Heron 0.14.4. The image below shows the RSS of the Heron stream manager for the same topology; the red vertical line separates the deploy on 0.14.4 (left of the line) from the deploy on 0.14.1 (right of the line). On 0.14.4 the stream manager's RSS grows until it pushes the container above its allocated memory and the container is forcibly killed, while the same topology running on 0.14.1 shows fairly consistent memory usage.

[image: stream manager RSS over time; 0.14.4 left of the red line, 0.14.1 right]

I do not currently have a test case set up that demonstrates this behavior, but I will work on getting one. It appears to happen only on topologies that use acking; we have not seen this behavior on those that do not.

@objmagic objmagic added the bug label Nov 16, 2016
@objmagic objmagic added this to the 0.15.0 milestone Nov 16, 2016
@congwang
Contributor

@ajorgensen How much does the following one-line fix help?


diff --git a/heron/common/src/cpp/network/mempool.h b/heron/common/src/cpp/network/mempool.h
index a8467df..875e0ac 100644
--- a/heron/common/src/cpp/network/mempool.h
+++ b/heron/common/src/cpp/network/mempool.h
@@ -75,6 +75,7 @@ class MemPool {
     }
     B* t = pool.back();
     pool.pop_back();
+    pool.shrink_to_fit();
     return static_cast<M*>(t);
   }


I still can't reproduce it locally with AckingTopology.
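For context on why shrink_to_fit could matter here: std::vector::pop_back reduces size() but never capacity(), so a pool that grew during a traffic burst keeps its full backing allocation and the stream manager's RSS never drops. A minimal standalone illustration (shrink_to_fit is a non-binding request to release the unused capacity):

```cpp
#include <iostream>
#include <vector>

int main() {
  std::vector<int> pool(1000000, 0);  // simulate a pool that grew large
  while (!pool.empty()) pool.pop_back();
  // size() is now 0, but capacity() is unchanged: the vector still owns
  // the whole backing allocation, so the process's RSS does not shrink.
  std::cout << "capacity after pop_back loop: " << pool.capacity() << "\n";
  pool.shrink_to_fit();  // non-binding request to free unused capacity
  std::cout << "capacity after shrink_to_fit: " << pool.capacity() << "\n";
}
```

Note that shrinking on every Get would also defeat the pool's reuse of capacity, so the diff reads as a diagnostic rather than a final fix.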

@ajorgensen
Contributor Author

I can try tomorrow. I'll also try to work out a reproducible test case for you if I can.

@congwang
Contributor

Just in case, make sure you didn't change the default config for the tuple cache:

# The frequency in ms to drain the tuple cache in stream manager
heron.streammgr.cache.drain.frequency.ms: 10

# The sized based threshold in MB for draining the tuple cache
heron.streammgr.cache.drain.size.mb: 100
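For readers unfamiliar with these settings, here is a rough sketch of the two drain triggers they describe, with hypothetical names (this is not Heron's actual implementation): the cache flushes either when a timer fires every drain.frequency.ms, or immediately once buffered tuples exceed drain.size.mb.

```cpp
#include <cstddef>
#include <iostream>

// Hypothetical tuple cache with the two drain triggers from the config:
// a periodic timer, and a size threshold that forces an immediate drain.
class TupleCache {
 public:
  explicit TupleCache(std::size_t drain_size_mb)
      : drain_size_bytes_(drain_size_mb * 1024 * 1024) {}

  void add(std::size_t tuple_bytes) {
    cached_bytes_ += tuple_bytes;
    if (cached_bytes_ >= drain_size_bytes_) drain();  // size-based trigger
  }

  // Called by an event loop every heron.streammgr.cache.drain.frequency.ms.
  void on_drain_timer() { drain(); }

 private:
  void drain() {
    // Flush buffered tuples downstream, then reset the counter.
    std::cout << "draining " << cached_bytes_ << " bytes\n";
    cached_bytes_ = 0;
  }

  std::size_t drain_size_bytes_;
  std::size_t cached_bytes_ = 0;
};

int main() {
  TupleCache cache(100);        // 100 MB threshold, as in the default
  cache.add(50 * 1024 * 1024);  // below threshold: buffered
  cache.add(60 * 1024 * 1024);  // crosses 100 MB: drains immediately
  cache.on_drain_timer();       // timer drain flushes whatever is left
}
```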

@ajorgensen
Contributor Author

ajorgensen commented Nov 18, 2016

No, we have both of those still at the default.

@congwang what would happen if the spout could produce tuples faster than the bolt could consume them and the maxSpoutPending value was high? Would the spout know to stop calling nextTuple, or would the buffer in the stream manager continue to grow until it ran out of memory?

@congwang
Contributor

congwang commented Nov 18, 2016

The stream manager will send back pressure to the spout in this case.
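A minimal sketch of that behavior with assumed names (not Heron's actual code): the spout driver stops calling nextTuple() while back pressure is active or while the number of unacked tuples has reached maxSpoutPending, so buffering is bounded by both signals.

```cpp
#include <atomic>
#include <iostream>

std::atomic<bool> under_back_pressure{false};  // set by the stream manager
long pending_tuples = 0;                       // emitted but not yet acked
constexpr long kMaxSpoutPending = 3;           // illustrative small value

bool may_emit() {
  return !under_back_pressure.load() && pending_tuples < kMaxSpoutPending;
}

int main() {
  for (int step = 0; step < 6; ++step) {
    if (may_emit()) {
      ++pending_tuples;  // nextTuple() would emit here
      std::cout << "emit, pending=" << pending_tuples << "\n";
    } else {
      std::cout << "throttled, pending=" << pending_tuples << "\n";
    }
  }
  // Acks decrement pending_tuples; back pressure relief clears the flag.
}
```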

@ajorgensen
Contributor Author

@congwang were you able to reproduce the issue with the topology I sent you?

@congwang
Contributor

congwang commented Jan 3, 2017

@ajorgensen I am still trying to figure out how to build your topology.

@ajorgensen
Contributor Author

Oh, OK. You should be able to create a simple pom file and build it with Maven. Let me see if I can put one together for you.

@ajorgensen
Contributor Author

@congwang Sorry about that. I've emailed you the same project, but with a working pom.xml file now. Let me know if you have any trouble building the topology.

@congwang
Contributor

congwang commented Jan 4, 2017

I guess we need the following fix:

diff --git a/heron/common/src/cpp/network/connection.cpp b/heron/common/src/cpp/network/connection.cpp
index c03ea8d..90cfbf3 100644
--- a/heron/common/src/cpp/network/connection.cpp
+++ b/heron/common/src/cpp/network/connection.cpp
@@ -240,6 +240,8 @@ void Connection::handleDataWritten() {
 
 sp_int32 Connection::readFromEndPoint(sp_int32 fd) {
   sp_int32 bytesRead = 0;
+  if (mUnderBackPressure)
+    return 0;
   while (1) {
     sp_int32 read_status = mIncomingPacket->Read(fd);
     if (read_status == 0) {
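The idea behind this patch: while a connection is under back pressure, stop draining its socket, so unread bytes accumulate in the kernel receive buffer, the TCP window closes, and the sender is throttled, rather than the stream manager buffering unbounded data in user space. A self-contained sketch of the same pattern (assumed names; not Heron code):

```cpp
#include <unistd.h>
#include <atomic>
#include <cstddef>

// A reader that skips the socket entirely while back pressure is set.
// Unread data stays in the kernel's receive buffer, TCP flow control
// kicks in, and the peer slows down.
std::atomic<bool> g_under_back_pressure{false};

ssize_t read_from_endpoint(int fd, char* buf, size_t len) {
  if (g_under_back_pressure.load()) {
    return 0;  // deliberately idle: let TCP flow control push back
  }
  return read(fd, buf, len);  // normal path: drain available bytes
}
```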

@ajorgensen
Contributor Author

ajorgensen commented Jan 4, 2017 via email

@congwang
Contributor

congwang commented Jan 5, 2017

@ajorgensen I am trying it, but I'm definitely not sure yet: when back pressure happens, the spout should already have stopped sending data, and only the pending data is still being transmitted. At least it could help us rule out that case.

@objmagic
Contributor

@congwang has informed @kramasamy that this is not an issue that can be resolved soon. Delaying this to 0.15.0.
