Detect and throw throttling and other Firehose exceptions. #198

Closed
rverma-jm opened this issue Apr 5, 2020 · 2 comments

Comments

@rverma-jm

When putting JSON-format records to Kinesis, the plugin runs for some time and then starts producing retry warnings.
I have no idea what is going wrong behind the scenes.

2020-04-05 05:01:24 +0000 [info]: #0 fluentd worker is now running worker=0
2020-04-05 05:01:33 +0000 [debug]: #0 [firehose_ok] Write chunk 5a2840b4f6b1ff93f5691ad484bde6e5 / 500 records /  221 KB
2020-04-05 05:01:33 +0000 [debug]: #0 [firehose_ok] Finish writing chunk
2020-04-05 05:01:33 +0000 [debug]: #0 [firehose_ok] Write chunk 5a2840b4f6b1ff93f5691ad484bde6e5 / 500 records /  221 KB
2020-04-05 05:01:33 +0000 [debug]: #0 [firehose_ok] Finish writing chunk
2020-04-05 05:01:33 +0000 [debug]: #0 [firehose_ok] Write chunk 5a2840b4f6b1ff93f5691ad484bde6e5 / 500 records /  221 KB
2020-04-05 05:01:33 +0000 [warn]: #0 [firehose_ok] Retrying to request batch. Retry count:   1, Retry records: 250, Wait seconds 0.26
2020-04-05 05:01:33 +0000 [debug]: #0 [firehose_ok] 70233003238780 sleep start
2020-04-05 05:01:34 +0000 [debug]: #0 [firehose_ok] 70233003238780 sleep finish
2020-04-05 05:01:34 +0000 [debug]: #0 [firehose_ok] Finish writing chunk
2020-04-05 05:01:34 +0000 [debug]: #0 [firehose_ok] Write chunk 5a2840b4f6b1ff93f5691ad484bde6e5 / 500 records /  221 KB
2020-04-05 05:01:34 +0000 [debug]: #0 [firehose_ok] Finish writing chunk
2020-04-05 05:01:34 +0000 [debug]: #0 [firehose_ok] Write chunk 5a2840b4f6b1ff93f5691ad484bde6e5 / 500 records /  221 KB
2020-04-05 05:01:34 +0000 [debug]: #0 [firehose_ok] Finish writing chunk
2020-04-05 05:01:34 +0000 [debug]: #0 [firehose_ok] Write chunk 5a2840b4f6b1ff93f5691ad484bde6e5 / 500 records /  221 KB
2020-04-05 05:01:34 +0000 [debug]: #0 [firehose_ok] Finish writing chunk
2020-04-05 05:01:34 +0000 [debug]: #0 [firehose_ok] Write chunk 5a2840b4f6b1ff93f5691ad484bde6e5 / 500 records /  221 KB
2020-04-05 05:01:34 +0000 [warn]: #0 [firehose_ok] Retrying to request batch. Retry count:   1, Retry records: 438, Wait seconds 0.32
2020-04-05 05:01:34 +0000 [debug]: #0 [firehose_ok] 70233003238780 sleep start
2020-04-05 05:01:34 +0000 [debug]: #0 [firehose_ok] 70233003238780 sleep finish
2020-04-05 05:01:34 +0000 [warn]: #0 [firehose_ok] Retrying to request batch. Retry count:   2, Retry records: 329, Wait seconds 0.33
2020-04-05 05:01:34 +0000 [debug]: #0 [firehose_ok] 70233003238780 sleep start
2020-04-05 05:01:35 +0000 [warn]: #0 no patterns matched tag="fluentd.pod.healthcheck"
2020-04-05 05:01:35 +0000 [debug]: #0 [firehose_ok] 70233003238780 sleep finish
2020-04-05 05:01:35 +0000 [warn]: #0 [firehose_ok] Retrying to request batch. Retry count:   3, Retry records: 329, Wait seconds 0.53
2020-04-05 05:01:35 +0000 [debug]: #0 [firehose_ok] 70233003238780 sleep start
2020-04-05 05:01:35 +0000 [debug]: #0 [firehose_ok] 70233003238780 sleep finish
2020-04-05 05:01:35 +0000 [warn]: #0 [firehose_ok] Retrying to request batch. Retry count:   4, Retry records: 329, Wait seconds 1.02
2020-04-05 05:01:35 +0000 [debug]: #0 [firehose_ok] 70233003238780 sleep start
2020-04-05 05:01:36 +0000 [debug]: #0 [firehose_ok] 70233003238780 sleep finish
2020-04-05 05:01:37 +0000 [debug]: #0 [firehose_ok] Finish writing chunk

I am also wondering whether we can put aggregated records directly to Firehose.

@simukappu
Contributor

> When putting JSON-format records to Kinesis, the plugin runs for some time and then starts producing retry warnings.
> I have no idea what is going wrong behind the scenes.

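This warning means that the PutRecords API of Kinesis Data Streams or the PutRecordBatch API of Kinesis Data Firehose returned failed records in its response because of an error such as ProvisionedThroughputExceeded. The plugin then retries only those failed records, which is why the "Retry records" count in the log is smaller than the chunk size.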

Gem code:

def batch_request_with_retry(batch, retry_count=0, backoff: nil, &block)
  backoff ||= Backoff.new
  res = yield(batch)
  if failed_count(res) > 0
    failed_records = collect_failed_records(batch, res)
    if retry_count < @retries_on_batch_request
      backoff.reset if @reset_backoff_if_success and any_records_shipped?(res)
      wait_second = backoff.next
      msg = 'Retrying to request batch. Retry count: %3d, Retry records: %3d, Wait seconds %3.2f' % [retry_count+1, failed_records.size, wait_second]
      log.warn(truncate msg)
      # TODO: sleep() doesn't wait the given seconds sometime.
      # The root cause is unknown so far, so I'd like to add debug print only. It should be fixed in the future.
      log.debug("#{Thread.current.object_id} sleep start")
      sleep(wait_second)
      log.debug("#{Thread.current.object_id} sleep finish")
      batch_request_with_retry(retry_records(failed_records), retry_count+1, backoff: backoff, &block)
    else
      give_up_retries(failed_records)
    end
  end
end
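
The control flow above can be tried out in isolation. What follows is a simplified, self-contained sketch and not the gem's implementation: `put_with_retry`, `BatchGiveUpError`, `max_retries`, and `base_wait` are hypothetical names, the plain doubling backoff stands in for the gem's `Backoff` class, and raising once retries are exhausted is this sketch's own choice (the thread does not show what `give_up_retries` does).

```ruby
# Hypothetical error raised once retries are exhausted (not part of the gem).
class BatchGiveUpError < StandardError; end

# Retry only the records that the block reports as failed.
# The block receives a batch and returns the subset of records that failed.
# Returns the number of retries that were needed.
def put_with_retry(batch, max_retries: 3, base_wait: 0.01)
  attempt = 0
  loop do
    failed = yield(batch)
    return attempt if failed.empty?

    if attempt >= max_retries
      raise BatchGiveUpError,
            "#{failed.size} records still failing after #{attempt} retries"
    end

    # Plain exponential backoff; an assumption standing in for Backoff#next.
    sleep(base_wait * (2**attempt))
    batch = failed
    attempt += 1
  end
end

# Fake sender: rejects odd-numbered records on the first call, accepts all afterwards.
calls = 0
retries = put_with_retry((1..6).to_a) do |records|
  calls += 1
  calls == 1 ? records.select(&:odd?) : []
end
puts "finished after #{retries} retry, #{calls} calls"
```

Raising instead of only logging makes a persistent failure visible to the caller, which is the behaviour this issue asks about.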

Kinesis API reference:
https://docs.aws.amazon.com/kinesis/latest/APIReference/API_PutRecords.html
https://docs.aws.amazon.com/firehose/latest/APIReference/API_PutRecordBatch.html
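
To see which error is behind the retries, the per-record error codes in a batch response can be tallied. The sketch below uses hypothetical `Struct` stand-ins instead of real SDK objects. The field names `failed_put_count`, `request_responses`, and `error_code` follow the PutRecordBatch reference linked above, and `ServiceUnavailableException` is one of the error codes that reference lists. For PutRecords the corresponding fields would be `failed_record_count` and `records`. `tally_error_codes` is an illustrative helper, not part of the plugin.

```ruby
# Hypothetical stand-ins for the SDK response objects (assumption: shape
# mirrors the PutRecordBatch response described in the API reference).
Entry = Struct.new(:record_id, :error_code, :error_message)
BatchResponse = Struct.new(:failed_put_count, :request_responses)

# Count failed entries per error code. Successful entries have no error_code.
def tally_error_codes(response)
  response.request_responses.each_with_object(Hash.new(0)) do |entry, counts|
    counts[entry.error_code] += 1 if entry.error_code
  end
end

response = BatchResponse.new(2, [
  Entry.new('id-1', nil, nil),
  Entry.new(nil, 'ServiceUnavailableException', 'Slow down.'),
  Entry.new(nil, 'ServiceUnavailableException', 'Slow down.')
])

tally_error_codes(response).each do |code, count|
  puts "#{code}: #{count} failed records"
end
```

Logging this tally next to the retry warning would show whether the retries come from throttling or from another failure.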

Can you check the service-side metrics for your stream in CloudWatch?
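
One way to do that check from Ruby is sketched below, under some assumptions: the delivery stream name is hypothetical, and the metric `ThrottledRecords` in the `AWS/Firehose` namespace with the `DeliveryStreamName` dimension should be confirmed against the CloudWatch console for your account. The runnable part only builds the request parameters; the actual call needs the `aws-sdk-cloudwatch` gem and credentials, so it is shown as a comment.

```ruby
# Build parameters for a CloudWatch GetMetricStatistics request covering the
# last `window_seconds` seconds, summed per minute.
# Assumption: metric and dimension names as described in the lead-in.
def throttle_metric_query(stream_name, now: Time.now, window_seconds: 3600)
  {
    namespace: 'AWS/Firehose',
    metric_name: 'ThrottledRecords',
    dimensions: [{ name: 'DeliveryStreamName', value: stream_name }],
    start_time: now - window_seconds,
    end_time: now,
    period: 60,
    statistics: ['Sum']
  }
end

query = throttle_metric_query('my-delivery-stream') # hypothetical stream name
puts query[:metric_name]

# With the aws-sdk-cloudwatch gem installed and credentials configured:
#   require 'aws-sdk-cloudwatch'
#   resp = Aws::CloudWatch::Client.new.get_metric_statistics(query)
#   resp.datapoints.sort_by(&:timestamp).each { |d| puts "#{d.timestamp} #{d.sum}" }
```

Non-zero sums in the minutes where the retry warnings appear would point to throttling on the service side.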

> I am also wondering whether we can put aggregated records directly to Firehose.

Currently, the plugin cannot put aggregated records directly into Firehose. I appreciate your feedback. See also issue #193.

@simukappu
Contributor

Closing this issue for now. Please reopen if required.
