Using Spark-Solr at Scale: Productionizing Spark for Search with Apache


This talk is a case study of how Apache Spark and the Spark-Solr library are used at Flipp to drive search relevance. Flipp is a Toronto-based digital flyer and e-commerce business that helps consumers save money on their weekly shopping. Our consumers can check out our 5+ million products from brick-and-mortar sellers across North America. This makes Browse an extremely difficult function in our app: how do we show the most relevant and personalized search results to users for a given query?

The talk will focus on using user signals such as click-through rate (CTR) and impressions to improve search relevance. I will also discuss how PySpark is used to build the Flipp Browse ETL platform, which gathers user signals and reads item information from Solr. I will explain the problem scenario in which keyword search and basic relevance algorithms become inefficient when dealing with a large product database. The solutions will cover the following implementations used at Flipp to drive relevancy:

— Using user clicks and popularity data to derive and index normalized item weights that implement the Browse Crowd Curation designs in Apache Solr.
— How 5+ million items are classified into Google Categories in real time using Keras and Apache Spark to power product classification curation in Solr.
— How to build a crowd-sourced query intent classifier in Solr using the Spark-Solr library.
— Using offline and online metrics at Flipp to evaluate changes in search relevance.
— Future plans for integrating Kafka Connect into Apache Solr with Structured Streaming to perform real-time product indexing with the Spark-Solr library.
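
To make the first item above more concrete, here is a minimal sketch of how click and impression signals could be turned into normalized item weights. It is not Flipp's actual pipeline: the function names, the additive smoothing prior (`prior_ctr`, `prior_strength`), and the min-max normalization are all assumptions chosen for illustration, and it is written in plain Python rather than PySpark so it runs without a cluster.

```python
from collections import defaultdict


def smoothed_ctr(clicks, impressions, prior_ctr=0.05, prior_strength=20.0):
    """Click-through rate with additive smoothing.

    The prior acts like `prior_strength` pseudo-impressions at `prior_ctr`,
    so an item with 1 click on 1 impression is not ranked above an item
    with a long, solid history. Both defaults are illustrative assumptions.
    """
    return (clicks + prior_ctr * prior_strength) / (impressions + prior_strength)


def normalized_item_weights(signals, prior_ctr=0.05, prior_strength=20.0):
    """Aggregate (item_id, clicks, impressions) rows into weights in [0, 1]."""
    # Sum clicks and impressions per item across all signal rows.
    totals = defaultdict(lambda: [0, 0])
    for item_id, clicks, impressions in signals:
        totals[item_id][0] += clicks
        totals[item_id][1] += impressions

    ctrs = {
        item_id: smoothed_ctr(c, i, prior_ctr, prior_strength)
        for item_id, (c, i) in totals.items()
    }
    if not ctrs:
        return {}

    # Min-max normalization so weights are comparable across index runs.
    lo, hi = min(ctrs.values()), max(ctrs.values())
    if hi == lo:
        return {item_id: 1.0 for item_id in ctrs}
    return {item_id: (v - lo) / (hi - lo) for item_id, v in ctrs.items()}


if __name__ == "__main__":
    rows = [("a", 50, 100), ("b", 1, 1), ("c", 0, 200), ("a", 10, 100)]
    for item_id, weight in sorted(normalized_item_weights(rows).items()):
        print(item_id, round(weight, 3))
```

In a setup like the one described, weights of this kind would typically be written to a numeric field on each Solr document and used as a boost at query time; how Flipp actually applies them is not stated here.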
