-
Notifications
You must be signed in to change notification settings - Fork 470
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
allow callbacks as background jobs #1215
Comments
FYI this is how I implemented it in the project: class SphinxIndexationWorker < ApplicationJob
def perform(model, id)
indices_for_model(model).each do |index|
instance = index.scope.find_by(model.primary_key => id)
if instance
reindex(index, instance)
else
delete_from_index(index, id)
end
end
end
private
def indices_for_model(model)
ThinkingSphinx::Configuration.instance.index_set_class.new(classes: [model])
end
def reindex(index, instance)
# some indices are defined on model#base_class (*Plan) some on model itself (CMS::Page)
callback = ThinkingSphinx::RealTime.callback_for(index.model.name.underscore)
callback&.after_commit instance
end
def delete_from_index(index, *ids)
# This is how deletion is performed by ThinkingSphinx::ActiveRecord::Callbacks::DeleteCallbacks#delete_from_sphinx
ThinkingSphinx::Deletion.perform index, ids
end
end But it is good if it allows customization for different models. Because for accounts I have a little optimization to also soft delete dependent accounts: indices_for_model(model).each do |index|
if account && index.scope.find_by(pk => id)
reindex(index, account)
elsif account && !account.master?
delete_from_index(index, id, *account.buyers.pluck(:id))
else
delete_from_index(index, id)
end
end |
I have considered allowing real-time updates/deletions to occur via background jobs… but I've held off because I'm not sure if it's much of a performance benefit? If it happens immediately:
Compared to it happening via a background job:
I guess it's very much an app-specific consideration. There's also the challenge of providing appropriate integration with ActiveJob vs background job gems directly - for example, my understanding is that Sidekiq's performance is better when used directly rather than via ActiveJob.
This does sound painful - and I'm a little surprised, because I would hope that using the |
And just to clarify: I'm not against background jobs for running callbacks in theory, but right now it feels like it involves adding too much code to Thinking Sphinx for what feels like an edge-case. I wonder if it could be done as a separate gem instead of adding to TS directly? |
@pat, thank you for looking into it. I understand that it is hard to implement this in a way to fit everybody. I don't know about ActiveJob vs direct Sidekiq performance penalties. In my current project we use both and handle a huge amount of jobs. I hope that later this year we will have better monitoring and data to see if there is any practical difference. In general I'm always in favor of using a standard instead of some implementation. I think it is practical though, to first make it easy to implement a custom worker and then a common solution may more naturally occur. Getting indices: ThinkingSphinx::Configuration.instance.index_set_class.new(classes: [model]) (Re)index: callback = ThinkingSphinx::RealTime.callback_for(index.model.name.underscore)
callback&.after_commit instance Delete from index: ThinkingSphinx::Deletion.perform index, ids The above is messing up with library internals. If one can do ThinkingSphinx::RealTime.index(instance)
ThinkingSphinx::Deletion.perform(model, id) Or some equivalent easy way, that would be a huge improvement IMHO and allow indexing operations anywhere in object's lifecycle. My issue with STI is most likely because I'm using library internal details to get the callbacks on the fly (as written above) and that's different from calling the actually registered callbacks on a model. While on it, another benefit from doing indexing in the background is that this not only reduces request latency (Redis is supposedly fast) but also decouples requests from indexing so Sphinx errors would not fail it. Although Sphinx seems to be very stable. |
Prompted by discussion in #1215, this provides a direct interface to perform index-related operations on an ActiveRecord index. processor = ThinkingSphinx::Processor.new(instance) processor.delete processor.upsert # real-time indices only
This is a great suggestion. I've just added the following to the processor = ThinkingSphinx::Processor.new(instance)
# to delete from all relevant indices:
processor.delete
# to add/update in all relevant real-time indices:
processor.upsert
That's a good point about decoupling Sphinx errors - though yes, hopefully Sphinx is pretty stable with receiving updates! 😅 But perhaps this new processor class helps with your background job approach? |
In the background job I already don't have an |
An evolution of f6657c9, as per feedback from @akostadinov in #1215. If you’re dealing with a deleted record, then you may not have an instance (and cannot retrieve it from the database, as it doesn’t exist). So, the Processor class now allows for either an instance, or a model and id. The model should be the actual class, not the string name of the class.
Belatedly coming back to this, sorry for the delay! I've pushed a commit today to the
Hopefully this covers what's needed for you with the delete case? |
I looked at the commit and it looks great. Thank you! I have some very high priority things to do at the moment but I hope to be able to try out this within 2 weeks. |
Half a year later, but finally pushed a new gem release that includes these changes (v5.5.0). Do let me know if you give it a shot and whether there's further evolutions that would be helpful! In the meantime I'm going to close this issue, but happy to re-open later if required. |
is there a chance something similar to be implemented for plain indices also ? so they can be reindexed in parallel? |
This fell through the cracks, sorry about that. I just now checked this better. It still not fully useful to upsert or delete depending on index scope 😬 Will you be open if I later submit a new method Presently I see that |
No worries about the delay @akostadinov - clearly I'm not being super prompt at responding to open-source things at the moment (life is extra busy this year). I'm not entirely against a PR, I guess my concern is: is the behaviour of deleting if outside of a scope tied to your application, rather than a more general approach? (i.e. where deletions would only happen if the record is no longer in the database) |
@softbrada very sorry for the slow reply - perhaps you've found a different approach, but I just wanted to note that plain (SQL-backed) indices don't really handle parallelisation, because SQL-backed indices require rewriting files. If one index/file is being rewritten by one process, no other processes can/should touch it (and both Sphinx and Thinking Sphinx have some safeguards around this). |
@pat , please consider this pull request #1258 I think it is not application specific. If one specified an index scope, then most expected would be for that scope to filter which objects should be indexed and which - not. I found another issue that sounds similar or the same request - #1253 . This obviously comes at an additional database roundtrip cost but since the new method is optional to use, then I think it should be fair. The existing use cases should remain the same. |
Current project I'm working on tries to do index work as background jobs. Where index worker method is like this:
This mostly works but not for models using STI (single-table inheritance). For models where index is defined on the
base_class
, then one has to usemodel.class.base_class
and for models where index is on descendant, then#base_class
shouldn't be used. So it is an error prone and ugly business.Would be useful to be able to add callbacks like:
Or some other way to allow that easily.
It also applies to
ts:index
. It would be useful to queue indexing in background job batches. Presently we use the above indexation worker and can't benefit from the new clean-up functionality.== thinking-sphinx 5.4 on rails 5 (in process of upgrading).
/related to #1038/
The text was updated successfully, but these errors were encountered: