
Max number of sidekiq threads for ts_delta indexer #1251

Open
atomical opened this issue Aug 15, 2023 · 2 comments

@atomical

Hi Pat,

We have a lot of record updates coming in at the same time, and the delta indexer fails with:

	using config file '/var/www/shared/config/qa.sphinx.conf'...
	indexing index 'schedule_delta'...
	FATAL: failed to lock /var/www/shared/db/sphinx/schedule_delta.tmp.spl: Resource temporarily unavailable, will not index. Try --rotate option.

It's followed by the worker exiting:

2023-08-15T20:59:11.182Z 3184402 TID-20bse WARN: SystemExit: exit
2023-08-15T20:59:11.193Z 3184402 TID-20bse WARN: /var/www/shared/bundle/ruby/3.1.0/gems/thinking-sphinx-5.4.0/lib/thinking_sphinx/commands/base.rb:41:in `exit'
/var/www/shared/bundle/ruby/3.1.0/gems/thinking-sphinx-5.4.0/lib/thinking_sphinx/commands/base.rb:41:in `handle_failure'

We would like to avoid setting the number of Sidekiq threads to 1 (it is currently 5). Have you seen this before?

@nsennickov
nsennickov commented Sep 8, 2023

Hello @atomical 👋
I'm facing the same issue. I use Sidekiq as the worker for delta indexing, and my Sidekiq config uses the relatively new Sidekiq capsules feature, which lets me give delta indexing its own capsule with a concurrency of 1 so that only one delta job runs at a time (a rough config sketch follows the comparison below). The problem, though, is that having a separate capsule for delta indexing makes every delta indexing job run far slower.
Here are the numbers for comparison:

  • Capsule shared with other queues, concurrency 3 -> 0.2s average job execution time
  • Separate capsule with only the ts_delta queue, concurrency 1 -> 42s per job execution
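
For reference, here is a minimal sketch of the capsule setup I'm describing, assuming Sidekiq 7+ and that the delta jobs go to a queue named ts_delta; the other queue names and concurrency values are placeholders rather than my exact config:

    # config/initializers/sidekiq.rb — rough sketch, not my exact config.
    require "sidekiq"

    Sidekiq.configure_server do |config|
      # Default capsule keeps normal concurrency for everything else.
      config.queues = %w[default]
      config.concurrency = 3

      # Dedicated capsule so at most one delta indexing job runs at a time,
      # preventing concurrent `indexer` runs from fighting over the index lock.
      config.capsule("ts_delta") do |cap|
        cap.queues = %w[ts_delta]
        cap.concurrency = 1
      end
    end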

I've run out of ideas for how to handle this properly. In my case the job fails with:

tid=igxb class=ThinkingSphinx::Deltas::SidekiqDelta::DeltaJob jid=700414f64806a88d7d48fc0e WARN:   Sphinx  Guard file for index user_delta exists, not indexing: /bla/bla/blabla/shared/db/sphinx/production/ts-user_delta.tmp.

I'd appreciate any help figuring this out.

@pat
Owner

pat commented Jul 7, 2024

Hey folks, very slow response here, but I just wanted to provide something in reply to your messages.

Unfortunately, the short answer is that Sphinx only allows one process to update a given index at a time. So I think the options are either:

  • set concurrency to 1 for the ts_delta queue,
  • or: catch the errors and ignore them (a rough sketch follows below). Sphinx includes all delta records in each indexing run, so provided updates to the delta index continue to be frequent, the next successful indexing will capture anything missed by previous failed runs.

Beyond that, I'm not sure how else to work around this (well, short of using real-time indices instead of SQL-backed indices, and thus avoiding the need for deltas at all).
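
For the second option, here is a rough sketch of what catching and ignoring the failure could look like. The wrapped class name comes from the log output above, but the perform signature and queue name are assumptions to verify against the gem versions in use:

    # Rough sketch of the "catch and ignore" approach — not a built-in API.
    require "sidekiq"

    class ForgivingDeltaJob
      include Sidekiq::Job
      sidekiq_options queue: :ts_delta

      def perform(*args)
        # Delegate to the stock delta job (class name taken from the log above);
        # the arguments are passed through untouched since their exact shape
        # depends on the ts-sidekiq-delta version.
        ThinkingSphinx::Deltas::SidekiqDelta::DeltaJob.new.perform(*args)
      rescue SystemExit
        # Thinking Sphinx calls `exit` when the indexer refuses to run (lock or
        # guard file held by another process). A later successful delta run
        # will pick up these records, so just log and move on.
        Sidekiq.logger.info "Delta indexing skipped: indexer already running"
      end
    end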
