allow callbacks as background jobs #1215

akostadinov · 2022-01-25T15:17:11Z

Current project I'm working on tries to do index work as background jobs. Where index worker method is like this:

  def perform(model)
    callback = ThinkingSphinx::RealTime.callback_for(model.class.name.underscore)
    callback&.after_commit model
  end

This mostly works but not for models using STI (single-table inheritance). For models where index is defined on the base_class, then one has to use model.class.base_class and for models where index is on descendant, then #base_class shouldn't be used. So it is an error prone and ugly business.

Would be useful to be able to add callbacks like:

ThinkingSphinx::Callbacks.append(self, {behaviours: [:real_time], background: true})

Or some other way to allow that easily.

It also applies to ts:index. It would be useful to queue indexing in background job batches. Presently we use the above indexation worker and can't benefit from the new clean-up functionality.

== thinking-sphinx 5.4 on rails 5 (in process of upgrading).

/related to #1038/

The text was updated successfully, but these errors were encountered:

akostadinov · 2022-02-03T18:35:18Z

FYI this is how I implemented it in the project:

class SphinxIndexationWorker < ApplicationJob
  def perform(model, id)
    indices_for_model(model).each do |index|
      instance = index.scope.find_by(model.primary_key => id)

      if instance
        reindex(index, instance)
      else
        delete_from_index(index, id)
      end
    end
  end

  private

  def indices_for_model(model)
    ThinkingSphinx::Configuration.instance.index_set_class.new(classes: [model])
  end

  def reindex(index, instance)
    # some indices are defined on model#base_class (*Plan) some on model itself (CMS::Page)
    callback = ThinkingSphinx::RealTime.callback_for(index.model.name.underscore)
    callback&.after_commit instance
  end

  def delete_from_index(index, *ids)
    # This is how deletion is performed by ThinkingSphinx::ActiveRecord::Callbacks::DeleteCallbacks#delete_from_sphinx
    ThinkingSphinx::Deletion.perform index, ids
  end
end

But it is good if it allows customization for different models. Because for accounts I have a little optimization to also soft delete dependent accounts:

    indices_for_model(model).each do |index|
      if account && index.scope.find_by(pk => id)
        reindex(index, account)
      elsif account && !account.master?
        delete_from_index(index, id, *account.buyers.pluck(:id))
      else
        delete_from_index(index, id)
      end
    end

pat · 2022-02-17T08:00:06Z

I have considered allowing real-time updates/deletions to occur via background jobs… but I've held off because I'm not sure if it's much of a performance benefit?

If it happens immediately:

Potentially run SQL to pull in associated data for the Sphinx index schema.
Send the update statement to Sphinx.

Compared to it happening via a background job:

Store the job details in a data-store (e.g. Redis)
Load the instance from the database again.
Potentially run SQL to pull in associated data for the Sphinx index schema.
Send the update statement to Sphinx.

I guess it's very much an app-specific consideration. There's also the challenge of providing appropriate integration with ActiveJob vs background job gems directly - for example, my understanding is that Sidekiq's performance is better when used directly rather than via ActiveJob.

This mostly works but not for models using STI (single-table inheritance). For models where index is defined on the base_class, then one has to use model.class.base_class and for models where index is on descendant, then #base_class shouldn't be used. So it is an error prone and ugly business.

This does sound painful - and I'm a little surprised, because I would hope that using the index_set_class with the classes option should navigate the hierarchy of inherited classes correctly. If you're finding that's not the case, then that sounds like a bug to me. Can you provide an example of where it's not working?

pat · 2022-02-17T08:01:40Z

And just to clarify: I'm not against background jobs for running callbacks in theory, but right now it feels like it involves adding too much code to Thinking Sphinx for what feels like an edge-case. I wonder if it could be done as a separate gem instead of adding to TS directly?

akostadinov · 2022-02-17T19:11:27Z

@pat, thank you for looking into it. I understand that it is hard to implement this in a way to fit everybody.

I don't know about ActiveJob vs direct Sidekiq performance penalties. In my current project we use both and handle a huge amount of jobs. I hope that later this year we will have better monitoring and data to see if there is any practical difference. In general I'm always in favor of using a standard instead of some implementation.

I think it is practical though, to first make it easy to implement a custom worker and then a common solution may more naturally occur.

Getting indices:

ThinkingSphinx::Configuration.instance.index_set_class.new(classes: [model])

(Re)index:

    callback = ThinkingSphinx::RealTime.callback_for(index.model.name.underscore)
    callback&.after_commit instance

Delete from index:

ThinkingSphinx::Deletion.perform index, ids

The above is messing up with library internals. If one can do

ThinkingSphinx::RealTime.index(instance)
ThinkingSphinx::Deletion.perform(model, id)

Or some equivalent easy way, that would be a huge improvement IMHO and allow indexing operations anywhere in object's lifecycle.

My issue with STI is most likely because I'm using library internal details to get the callbacks on the fly (as written above) and that's different from calling the actually registered callbacks on a model.

While on it, another benefit from doing indexing in the background is that this not only reduces request latency (Redis is supposedly fast) but also decouples requests from indexing so Sphinx errors would not fail it. Although Sphinx seems to be very stable.

Prompted by discussion in #1215, this provides a direct interface to perform index-related operations on an ActiveRecord index. processor = ThinkingSphinx::Processor.new(instance) processor.delete processor.upsert # real-time indices only

pat · 2022-02-27T08:51:55Z

This is a great suggestion. I've just added the following to the develop branch if you want to give it a shot:

processor = ThinkingSphinx::Processor.new(instance)

# to delete from all relevant indices:
processor.delete

# to add/update in all relevant real-time indices:
processor.upsert

While on it, another benefit from doing indexing in the background is that this not only reduces request latency (Redis is supposedly fast) but also decouples requests from indexing so Sphinx errors would not fail it. Although Sphinx seems to be very stable.

That's a good point about decoupling Sphinx errors - though yes, hopefully Sphinx is pretty stable with receiving updates! 😅 But perhaps this new processor class helps with your background job approach?

akostadinov · 2022-02-27T21:16:12Z

In the background job I already don't have an instance. I only have an id. So it will not help for the delete case. Any suggestions?

@akostadinov

An evolution of f6657c9, as per feedback from @akostadinov in #1215. If you’re dealing with a deleted record, then you may not have an instance (and cannot retrieve it from the database, as it doesn’t exist). So, the Processor class now allows for either an instance, or a model and id. The model should be the actual class, not the string name of the class.

pat · 2022-06-11T04:33:06Z

Belatedly coming back to this, sorry for the delay! I've pushed a commit today to the develop branch that changes the possible arguments for ThinkingSphinx::Processor:

processor = ThinkingSphinx::Processor.new(instance: instance)
# or
processor = ThinkingSphinx::Processor.new(model: Model, id: 123)

Hopefully this covers what's needed for you with the delete case?

akostadinov · 2022-06-11T18:27:28Z

I looked at the commit and it looks great. Thank you! I have some very high priority things to do at the moment but I hope to be able to try out this within 2 weeks.

pat · 2022-12-30T02:18:21Z

Half a year later, but finally pushed a new gem release that includes these changes (v5.5.0). Do let me know if you give it a shot and whether there's further evolutions that would be helpful!

In the meantime I'm going to close this issue, but happy to re-open later if required.

softbrada · 2023-03-28T07:08:43Z

is there a chance something similar to be implemented for plain indices also ? so they can be reindexed in parallel?

akostadinov · 2023-06-30T16:12:57Z

This fell through the cracks, sorry about that. I just now checked this better. It still not fully useful to upsert or delete depending on index scope 😬

Will you be open if I later submit a new method Processor#update that will check whether instance is part of the index scope and then decide to upsert or delete it from that index?

Presently I see that model.find is used which doesn't account for the scope.

pat · 2023-07-22T09:32:12Z

No worries about the delay @akostadinov - clearly I'm not being super prompt at responding to open-source things at the moment (life is extra busy this year).

I'm not entirely against a PR, I guess my concern is: is the behaviour of deleting if outside of a scope tied to your application, rather than a more general approach? (i.e. where deletions would only happen if the record is no longer in the database)

pat · 2023-07-22T09:34:45Z

@softbrada very sorry for the slow reply - perhaps you've found a different approach, but I just wanted to note that plain (SQL-backed) indices don't really handle parallelisation, because SQL-backed indices require rewriting files. If one index/file is being rewritten by one process, no other processes can/should touch it (and both Sphinx and Thinking Sphinx have some safeguards around this).

akostadinov · 2024-02-28T23:39:50Z

@pat , please consider this pull request #1258

I think it is not application specific. If one specified an index scope, then most expected would be for that scope to filter which objects should be indexed and which - not. I found another issue that sounds similar or the same request - #1253 .

This obviously comes at an additional database roundtrip cost but since the new method is optional to use, then I think it should be fair.

The existing use cases should remain the same.

This was referenced Jan 25, 2022

Inheritance and indexing #1038

Closed

Update index after soft delete #1057

Closed

akostadinov mentioned this issue Feb 17, 2022

after_commit should delete instances out of scope (soft-deleting) #1216

Closed

pat closed this as completed Dec 30, 2022

This was referenced Feb 28, 2024

scope based upsert or delete #1258

Closed

How to remove records on update with real-time indexes #1253

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

allow callbacks as background jobs #1215

allow callbacks as background jobs #1215

akostadinov commented Jan 25, 2022 •

edited

Loading

akostadinov commented Feb 3, 2022 •

edited

Loading

pat commented Feb 17, 2022

pat commented Feb 17, 2022

akostadinov commented Feb 17, 2022 •

edited

Loading

pat commented Feb 27, 2022

akostadinov commented Feb 27, 2022

pat commented Jun 11, 2022

akostadinov commented Jun 11, 2022

pat commented Dec 30, 2022

softbrada commented Mar 28, 2023

akostadinov commented Jun 30, 2023

pat commented Jul 22, 2023

pat commented Jul 22, 2023

akostadinov commented Feb 28, 2024

allow callbacks as background jobs #1215

allow callbacks as background jobs #1215

Comments

akostadinov commented Jan 25, 2022 • edited Loading

akostadinov commented Feb 3, 2022 • edited Loading

pat commented Feb 17, 2022

pat commented Feb 17, 2022

akostadinov commented Feb 17, 2022 • edited Loading

pat commented Feb 27, 2022

akostadinov commented Feb 27, 2022

pat commented Jun 11, 2022

akostadinov commented Jun 11, 2022

pat commented Dec 30, 2022

softbrada commented Mar 28, 2023

akostadinov commented Jun 30, 2023

pat commented Jul 22, 2023

pat commented Jul 22, 2023

akostadinov commented Feb 28, 2024

akostadinov commented Jan 25, 2022 •

edited

Loading

akostadinov commented Feb 3, 2022 •

edited

Loading

akostadinov commented Feb 17, 2022 •

edited

Loading