Production readiness / desired contributions #25

Open
magro opened this issue Nov 6, 2020 · 8 comments

Comments

@magro

magro commented Nov 6, 2020

Great initiative to provide an OSS stack for e-commerce search!

I haven't taken a closer look at the components and their integration yet. But I would be interested in which ones you yourself would describe as ready for production and which you would rather not use in production. This could also be added to the README, if possible.

I would also be interested in which components you would like to get contributions for, and which parts you think should be improved (thinking about it, maybe this is just a different way to ask the same thing as before ;-)). Maybe you could create issues for things where you'd like to receive contributions...?

@magro magro changed the title Production readiness Production readiness / desired contributions Nov 6, 2020
@epugh
Collaborator

epugh commented Nov 8, 2020

Thanks for starting this discussion! We've talked about the idea of establishing an explicit roadmap, but since this is open source software, it's tough to have one, since we all contribute to scratch our own itches. Having said that, I can share what I am really interested in seeing in the immediate future:

  1. Path to Production Deploys. Today, Chorus works great as a local development environment; however, there is no guidance/path to Production. I'm interested in seeing the work done to make Chorus (i.e., all of its components) deploy on Kubernetes. To get there, we need the https://github.com/bloomberg/solr-operator to be added to the Solr project, and then we need to take advantage of it. Likewise, we need all of the Kubernetes work for RRE, Quepid, SMUI, Blacklight, Prometheus, and Grafana to be done as well.

  2. Online Analytics being gathered. While every user will build their own front end webshop, I think that we can have a common shared pipeline for online analytics. We want to capture all searches, click throughs, what products people looked at, what they ignored, all of that. We need to provide the hooks/paths for people to drop in that analytics to their front end, and then have the pipelines bring that back to an analytics environment.

  3. Ongoing work to build out support for typical ecommerce data models. For example, I'm working on adding inventory status to our ecommerce setup. We don't really have a schema that supports variants or any other common data modeling challenges that ecommerce has.
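
To make the variants and inventory-status point concrete, here is a minimal sketch of one common modeling option: one search document per variant, tied together by a shared product key. This is not the Chorus schema; the class names and the fields `product_id`, `in_stock`, and the `attr_` prefix are assumptions made up for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Variant:
    sku: str
    attributes: Dict[str, str]  # e.g. {"color": "red", "size": "M"}
    stock: int = 0


@dataclass
class Product:
    product_id: str
    title: str
    variants: List[Variant] = field(default_factory=list)


def to_search_docs(product: Product) -> List[dict]:
    """Flatten a product into one search document per variant."""
    docs = []
    for variant in product.variants:
        doc = {
            "id": variant.sku,
            # Hypothetical grouping key shared by all variants of a product.
            "product_id": product.product_id,
            "title": product.title,
            # Inventory status reduced to a boolean for filtering.
            "in_stock": variant.stock > 0,
        }
        # Prefix variant attributes to keep them apart from product-level fields.
        doc.update({f"attr_{name}": value for name, value in variant.attributes.items()})
        docs.append(doc)
    return docs
```

With documents shaped like this, results could be collapsed or grouped on the shared key at query time so that each product appears once. Whether that fits depends on the catalog, so treat it as one option among several.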

I'd be interested @magro on what you think would be needed to "go to production"? Especially if the three things I highlighted above were done?

Other areas that might be interesting are to think about what it would take to use Chorus with some of the existing ecommerce packages out there. What would the data integration look like to make Chorus and SAP Hybris work together, for example? What about autosuggest? What should an out-of-the-box autosuggest for Chorus look like?

@renekrie
Contributor

renekrie commented Nov 9, 2020

Hey @magro and @epugh,

Thank you for getting the discussion started.

I think the current status is that Chorus is more like a template application from which you can 'copy/paste' when you build your own application. I would say that the Solr - Querqy - SMUI integration probably won't need too many adjustments (if any?) in a real-life application. I'd also say that the Solr schema is a reasonable starting point, but I fully agree with @epugh's third point: we don't cover all aspects of a typical e-commerce schema yet. I also agree that deployment via Kubernetes should be on our list.

I agree that it is difficult for us to follow a roadmap, but I also think we need to agree on which solutions we pursue: Do we accept any solution that solves the problem, or do we only accept what we have seen work well in e-commerce search practice? If we follow the latter approach, I wonder how many of us have working experience with https://github.com/bloomberg/solr-operator - on the other hand, we could kindly ask @tboeghk and @JohannesDaniel and try to explore their hands-on experience in this field (=Solr via Kubernetes) ;-).

Re analytics: That would be a great step forward. I think the problem is not in providing software for event collection. Most search teams just plug themselves into what their overall application uses anyway. It also seems to me that tracking is moving to the server side to a considerable degree (at least in Europe). What we could provide is a common schema for the events - as guidance on what to collect and maybe to establish a common format - and then provide the tools to extract search relevance information from the data, maybe via BigQuery etc. In my opinion, our largest gap is calculating search relevance judgments from those events, for example to use them with RRE or Quaerite.
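
A rough sketch of what a shared event schema and a judgment calculation could look like. Everything here is an assumption for illustration: the `SearchEvent` fields, the click-through-rate thresholds, and the 0-3 grading scale are invented, and a naive CTR like this ignores position bias, which a real click model would have to handle.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, Tuple


@dataclass(frozen=True)
class SearchEvent:
    query: str
    doc_id: str
    event_type: str  # "impression" or "click" (hypothetical vocabulary)


def judgments_from_events(
    events: Iterable[SearchEvent], min_impressions: int = 2
) -> Dict[Tuple[str, str], int]:
    """Map (query, doc_id) to a 0-3 grade based on naive click-through rate."""
    impressions = defaultdict(int)
    clicks = defaultdict(int)
    for event in events:
        key = (event.query, event.doc_id)
        if event.event_type == "impression":
            impressions[key] += 1
        elif event.event_type == "click":
            clicks[key] += 1

    judgments = {}
    for key, shown in impressions.items():
        if shown < min_impressions:
            continue  # too little evidence to grade this pair
        ctr = clicks[key] / shown
        # Illustrative thresholds only; these would need tuning on real data.
        if ctr >= 0.5:
            grade = 3
        elif ctr >= 0.2:
            grade = 2
        elif ctr > 0:
            grade = 1
        else:
            grade = 0
        judgments[key] = grade
    return judgments
```

The resulting (query, document, grade) triples are the kind of input an offline evaluation tool expects, although the exact import format differs per tool and is not shown here.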

Another big area that we do not cover is indexing. What I see in practice is that many teams are struggling with indexing times. Often it's less about the indexing speed of the search engine but about collecting data from several sources and about transforming the data into the target format. A common solution is to add another data storage that holds the (partially) transformed data outside the search engine (DB, Kafka) so that most data can be loaded from there during indexing without having to go back to the original data source. While it cannot be in the scope of Chorus to provide a complete ETL solution, we could look into adding this additional storage as a good practice.
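
The intermediate-storage pattern described above could be sketched roughly as follows. The in-memory `StagingStore` is only a stand-in for a real database or Kafka topic, and `send_batch` is a hypothetical callback that would post documents to the search engine; neither is part of Chorus.

```python
from typing import Callable, Dict, Iterable, List


class StagingStore:
    """In-memory stand-in for the intermediate storage (DB, Kafka, ...)."""

    def __init__(self) -> None:
        self._records: Dict[str, dict] = {}

    def upsert(self, record_id: str, fields: dict) -> None:
        # Merge a partial record from one source into what is already staged.
        self._records.setdefault(record_id, {}).update(fields)

    def all_records(self) -> Iterable[dict]:
        for record_id, fields in self._records.items():
            yield {"id": record_id, **fields}


def reindex(
    store: StagingStore,
    send_batch: Callable[[List[dict]], None],
    batch_size: int = 500,
) -> int:
    """Index everything from the staging store without touching the source systems."""
    batch: List[dict] = []
    total = 0
    for doc in store.all_records():
        batch.append(doc)
        if len(batch) == batch_size:
            send_batch(batch)
            total += len(batch)
            batch = []
    if batch:  # flush the final, possibly smaller batch
        send_batch(batch)
        total += len(batch)
    return total
```

Each source system (catalog, prices, stock) would call `upsert` whenever its data changes, so a full reindex only reads from the staging store.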

@epugh
Collaborator

epugh commented Nov 9, 2020

Here is the JIRA tracking the solr-operator: https://issues.apache.org/jira/browse/SOLR-14994

@JohannesDaniel

Hi, I like the idea of exploring the Bloomberg Solr-K8 solution. I have no experience with it so far as we have our own deployment definitions for Solr. We have been running Solr stably on K8 for almost two years now.

In general, there are only two special things to consider in the context of Solr K8 deployments:

  1. The ZooKeeper client needs to include this fix: https://issues.apache.org/jira/browse/ZOOKEEPER-2184?jql=project%20%3D%20ZOOKEEPER%20AND%20text%20~%20kubernetes. Otherwise, Solr will not be able to reconnect to ZK after ZK pods are recreated (e.g. due to automated system updates). So the Solr version needs to be at least 7.7.0 (as far as I remember) in order to be resilient.
  2. Readiness and liveness probes need to be configured in a way that K8 will not interrupt Solr-internal recovery processes by killing pods.
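
To illustrate the second point, here is a small sketch that builds Kubernetes-style probe definitions as plain Python dicts and estimates how long a pod may keep failing before Kubernetes acts. The keys (`httpGet`, `periodSeconds`, `failureThreshold`, and so on) are standard Kubernetes probe fields, but all the numbers, the choice of `/solr/admin/info/system` as probe path, and the helper names are assumptions, not values taken from a real deployment.

```python
def http_probe(path, port, period_seconds, failure_threshold,
               initial_delay_seconds=0, timeout_seconds=1):
    """Build a probe spec using the Kubernetes probe field names."""
    return {
        "httpGet": {"path": path, "port": port},
        "initialDelaySeconds": initial_delay_seconds,
        "periodSeconds": period_seconds,
        "timeoutSeconds": timeout_seconds,
        "failureThreshold": failure_threshold,
    }


def seconds_until_action(probe):
    """Rough time of continuous failure before Kubernetes reacts to the probe."""
    return probe["periodSeconds"] * probe["failureThreshold"]


# Lenient liveness probe: a failing liveness probe restarts the container,
# so it should tolerate a long-running Solr-internal recovery (illustrative values).
liveness = http_probe("/solr/admin/info/system", 8983,
                      period_seconds=30, failure_threshold=10,
                      initial_delay_seconds=60, timeout_seconds=5)

# Stricter readiness probe: failing it only removes the pod from service
# endpoints, it does not kill the pod.
readiness = http_probe("/solr/admin/info/system", 8983,
                       period_seconds=10, failure_threshold=3,
                       timeout_seconds=5)

print(seconds_until_action(liveness), seconds_until_action(readiness))
```

The idea is to keep the liveness tolerance above the longest recovery time you expect, while readiness can react much faster.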

@magro
Author

magro commented Nov 11, 2020

Many thanks for all the insights!

I think that having a template for deploying these components to k8s in production would be really awesome! A path towards this could be to migrate the docker compose based setup to e.g. minikube or kind.
If no comparable deployment setup is running in production yet, and it should therefore be considered experimental or alpha, then I'd like to see it labeled accordingly, so that it's clear what one can expect.

As you wrote there are some parts not covered, like tracking/analytics, or an indexing/preprocessing stage. Maybe there could be a diagram showing a typical setup, to show what's covered and what's not. These potential additions could also be represented by tickets of course.

I have the feeling that the even more interesting or valuable things might be what you mentioned around typical ecommerce data models, like variants etc - not sure.

@epugh
Collaborator

epugh commented Nov 12, 2020

You are touching on the fact that the docs need a LOT of work! Not sure if that is something you'd be interested in leading? Likewise the minikube idea.

Here is a link to my "framework" for relevancy, not specific to Chorus, but how I think about what a Relevance Framework would look like, and what some FOSS tools are: https://docs.google.com/presentation/d/1EspLVQa9d2qZB55rBouSGSPtqzdsQt_waaiG3aKfxzE/edit#slide=id.g93b06afd5d_0_0

@magro
Author

magro commented Nov 13, 2020

Ha, I shouldn't have asked ;-)
Regarding docs the easiest thing probably would be some text instead of nice diagrams... Let me see if I can submit a PR with things mentioned here.

A local k8s setup would be more effort, maybe @JohannesDaniel could support here? I could also see what I can come up with, but I'm not sure when I'll find time for this. And we should agree on the tooling, i.e. minikube or kind (I'd prefer kind because I've had good experiences with it, but I've also never used minikube...).

Thanks for the relevancy framework slides! Could this be a basis for visualizing what Chorus covers and what it doesn't?

magro added a commit to inoio/chorus that referenced this issue Nov 13, 2020
@magro
Author

magro commented Nov 13, 2020

I just submitted a PR as a starting point. Some statements are probably wrong, but I'm sure you'll point them out :-)


4 participants