Retry "org.hbase.async.RemoteException: Call queue is full on" RPCs #135

Open
manolama opened this issue Apr 25, 2016 · 12 comments

@manolama
Member

HBase 1.x and later return an exception when the call queue is full. The native client retries these calls as if it were a recoverable exception. AsyncHBase should do the same.
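
Until something like this lands in the library, a minimal application-side sketch of the requested behavior might look like the following. It is not part of asynchbase; the helper name, the retry policy, and the assumption that the server-side message still contains "Call queue is full" are all illustrative. It simply re-issues the RPC with a small backoff instead of treating the exception as fatal:

```java
import java.util.ArrayList;
import java.util.concurrent.Callable;

import com.stumbleupon.async.Deferred;
import org.hbase.async.GetRequest;
import org.hbase.async.HBaseClient;
import org.hbase.async.KeyValue;
import org.hbase.async.RemoteException;

public class CallQueueRetry {

  /** Re-runs the RPC while the region server reports a full call queue. */
  static <T> T withRetry(final Callable<Deferred<T>> rpc,
                         final int maxAttempts,
                         final long backoffMs) throws Exception {
    for (int attempt = 1; ; attempt++) {
      try {
        return rpc.call().join();  // blocking only to keep the example short
      } catch (RemoteException e) {
        final String msg = e.getMessage();
        // Assumption: the server-side message still says "Call queue is full".
        final boolean queueFull = msg != null && msg.contains("Call queue is full");
        if (!queueFull || attempt >= maxAttempts) {
          throw e;  // some other failure, or we ran out of attempts
        }
        Thread.sleep(backoffMs * attempt);  // crude linear backoff
      }
    }
  }

  public static void main(final String[] args) throws Exception {
    final HBaseClient client = new HBaseClient("localhost");  // ZK quorum is made up
    // Build a fresh GetRequest per attempt rather than re-sending a failed RPC object.
    final ArrayList<KeyValue> row = withRetry(
        () -> client.get(new GetRequest("mytable", "myrow")), 5, 200L);
    System.out.println("cells: " + row.size());
    client.shutdown().join();
  }
}
```

Blocking with join() is only to keep the example short; a fully asynchronous caller would re-issue the RPC from an errback and complete its own Deferred instead.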

@vitaliyf

vitaliyf commented May 9, 2016

Hi,

Do you know what the plan is for fixing this?

@devaudio

Is there a workaround or anything?

@vitaliyf

@manolama, can you glance at this PR - does it match your plan for this issue? These changes don't solve the issue for us yet, though they do catch these exceptions, so presumably we need further handling to retry them.

@manolama
Member Author

manolama commented Sep 7, 2016

@vitaliyf @stlava That's a good start and should help, but it invalidates the region cache, which is something we likely don't want. If you want to issue a PR for this part we can get it in. Thanks.

@manolama manolama added the bug label Sep 17, 2016
@dsimmie

dsimmie commented Sep 23, 2016

I am getting this error using Cloudera CDH5.7.2 which comes with HBase 1.2. I am using v1.7.0 of the asynchbase client library in Scala.

I have been able to work around it for some Get requests by increasing hbase.regionserver.handler.count and limiting the number of outstanding requests before collecting results. However, I have some large Get requests that hit memory limits with so many concurrent threads. At least that is what I think is happening; they are timing out and I'm not sure why. I had increased hbase.regionserver.handler.count to 128, which worked fine until I started working with larger Get requests.
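
As a rough illustration of the throttling part of that workaround (not code from this thread), here is a Java sketch that caps the number of in-flight Gets with a semaphore so a large batch doesn't flood the region servers' call queues. The limit of 64 and the table/row names are made up:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Semaphore;

import com.stumbleupon.async.Callback;
import com.stumbleupon.async.Deferred;
import org.hbase.async.GetRequest;
import org.hbase.async.HBaseClient;
import org.hbase.async.KeyValue;

public class ThrottledGets {
  public static void main(final String[] args) throws Exception {
    final HBaseClient client = new HBaseClient("localhost");  // ZK quorum is made up
    final Semaphore inFlight = new Semaphore(64);             // at most 64 outstanding Gets

    final List<Deferred<ArrayList<KeyValue>>> pending = new ArrayList<>();
    for (int i = 0; i < 500_000; i++) {
      inFlight.acquire();  // block until a slot frees up
      final Deferred<ArrayList<KeyValue>> d =
          client.get(new GetRequest("mytable", "row-" + i));
      d.addBoth(new Callback<Object, Object>() {
        public Object call(final Object result) {
          inFlight.release();  // free the slot whether the Get succeeded or failed
          return result;
        }
      });
      pending.add(d);
    }
    Deferred.group(pending).join();  // throws if any Get ultimately failed
    client.shutdown().join();
  }
}
```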

My typical use case is querying for 500k-1m random Gets (out of a total of 25m rows) stored on 9 region servers, hosting 100 pre-split regions. 1m row keys roughly equate to 8GB of data.

Are there any other workarounds or advice for dealing with the call queue full issue?

@devaudio

All I keep doing is resizing/splitting regions until they are smaller and smaller. I have 40 region servers and currently 258 regions.

@vitaliyf

Our workaround (on same CDH5.7.x) was to set tsd.core.meta.enable_realtime_ts = false.

@dsimmie

dsimmie commented Sep 26, 2016

@vitaliyf from my reading, the setting tsd.core.meta.enable_realtime_ts seems to be related to OpenTSDB (see the entry on metadata here), and I'm not using OpenTSDB. I cannot see why it would apply to the asynchbase client itself. Is it used in asynchbase, and if so, where do I set it? I have added that entry to my config file and nothing has changed.

@mikhail-antonov

I seem to have lost track here; what is the current status? Apache HBase 1.3 was released this January, so I would think this issue is resolved?

The proper behavior should be not to bail out but to retry on that kind of exception, while avoiding clearing the location cache, since it's likely a temporary overload and not a permanent failure.

@manolama
Member Author

manolama commented Feb 5, 2018

@mikhail-antonov This can still happen in 1.3; it simply has to do with a region server being unable to handle the request load. We can add code to AsyncHBase that would buffer and retry requests with a delay, but that only makes sense for buffered writes. For reads, it makes more sense to fail the RPC and let the application figure out what to do, I think.

@dsimmie You're correct, that has no effect on AsyncHBase.

@stannie42

Thanks @manolama for the update. How does the native client behave with GetRequests? Doesn't it retry as it does for PutRequests? Is anyone working on this bug? We are using asynchbase outside OpenTSDB and are highly affected by this.

@manolama
Member Author

manolama commented Apr 4, 2018

@stannie42 Not too sure yet regarding the native client, but we just upgraded internally to 1.3 and hit the issue when the HBase config changed and merged the read and write queues. We're separating them again, and if that solves it I'd suggest you try it as well.
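
For reference, the region-server settings that control whether reads and writes share call queues are hbase.ipc.server.callqueue.read.ratio (0 keeps a single shared set of queues; anything above 0 splits them into separate write and read queues) and hbase.ipc.server.callqueue.handler.factor. A sketch of an hbase-site.xml fragment, with illustrative values only (tune them to the cluster):

```xml
<!-- Illustrative only; set on the region servers and restart them to apply. -->
<property>
  <name>hbase.ipc.server.callqueue.handler.factor</name>
  <value>0.1</value> <!-- number of call queues as a fraction of the handler count -->
</property>
<property>
  <name>hbase.ipc.server.callqueue.read.ratio</name>
  <value>0.5</value> <!-- split the queues roughly evenly between writes and reads -->
</property>
```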
