Friday, September 15 • 11:20am - 12:00pm
Vector Embeddings for Solr
In this session, we will go over various ways nonlinear vector embeddings can be used for search:

* Word2Vec and GloVe learn a "conceptual" vector space of lower dimension than the input vocabulary, and as such can be used in IR in similar ways to other dimensionality reduction techniques such as LSI, pLSI, and LDA.
* Doc2Vec can learn better vectors for documents larger than a single sentence, and can also learn supervised class labels at the same time to create a classifier for both documents and queries.
* Shallow neural net-based embeddings trained on click data as a supervisory signal can learn a joint model over queries and document snippets to perform learning to rank.

We will demonstrate how to train all three of these models with open source tools and integrate them with a Solr-based search engine.
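
As a rough illustration of the first approach above, the sketch below averages word vectors to embed a query and each document, then ranks documents by cosine similarity. It is only a minimal sketch: the three-dimensional vectors in `TOY_VECTORS` and the helper names `embed`, `cosine`, and `rank` are invented for this example, and real vectors would come from a trained Word2Vec or GloVe model rather than a hand-written table.

```python
import math

# Hypothetical 3-dimensional word vectors, invented for illustration only.
# Real vectors would come from a trained Word2Vec or GloVe model.
TOY_VECTORS = {
    "solr":   [0.9, 0.1, 0.0],
    "search": [0.8, 0.2, 0.1],
    "engine": [0.7, 0.3, 0.1],
    "banana": [0.0, 0.1, 0.9],
    "fruit":  [0.1, 0.0, 0.8],
}


def embed(text, vectors=TOY_VECTORS):
    """Average the vectors of the known words in text; None if none are known."""
    known = [vectors[w] for w in text.lower().split() if w in vectors]
    if not known:
        return None
    dim = len(known[0])
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank(query, docs):
    """Return (score, doc) pairs sorted by similarity to the query, best first."""
    q = embed(query)
    scored = []
    for doc in docs:
        d = embed(doc)
        # Documents or queries with no known words get a score of 0.0.
        score = cosine(q, d) if q is not None and d is not None else 0.0
        scored.append((score, doc))
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


if __name__ == "__main__":
    for score, doc in rank("search engine", ["banana fruit", "solr search"]):
        print(f"{score:.3f}  {doc}")
```

Averaging is the simplest way to turn word vectors into a text vector; it ignores word order and term weighting, which is part of why document-level models such as Doc2Vec are of interest for longer texts.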

Jacob Mannix

Lead Data Engineer, Lucidworks
Living at the intersection of search, recommender systems, and applied machine learning, with an eye for horizontal scalability and distributed systems. Currently Lead Data Engineer in the Office of the CTO at Lucidworks, doing research and development of data-driven applications on Lucene/Solr and Spark. Previously built out...

Banyan AB