Tuesday, September 12
 

9:00am

Pre-Conference Training
Pre-conference Training may only be attended by those registered for the "Training + Conference" registration option.

Tuesday September 12, 2017 9:00am - 5:00pm
TBA
 
Wednesday, September 13
 

9:00am

Pre-Conference Training
Pre-conference Training may only be attended by those registered for the "Training + Conference" registration option.

Wednesday September 13, 2017 9:00am - 5:00pm
TBA

5:00pm

Welcome Reception
Join us as we kick off Lucene/Solr Revolution 2017 in the Palm Foyer at Mandalay Bay with drinks, appetizers, and networking with Solr committers, sponsors, and conference attendees.

Wednesday September 13, 2017 5:00pm - 7:00pm
Palm Foyer
 
Thursday, September 14
 

7:30am

Breakfast
Thursday September 14, 2017 7:30am - 8:30am
TBA

8:30am

Opening Remarks
Thursday September 14, 2017 8:30am - 9:00am
TBA

9:05am

Keynote: #WeAreNotWaiting: Using open source to change healthcare
What if you could use open source technology to make a difference in ways previously thought impossible? And if you could make a difference in someone's life (like your own), would you wait for permission, or just do it? Learn how a community developed around the #WeAreNotWaiting movement to improve life with diabetes using open source technology, and how these efforts are paving the way for all of us to contribute to changing healthcare.

Speakers

Dana Lewis

Founder, OpenAPS
After building her own DIY “artificial pancreas”, Dana Lewis founded the open source artificial pancreas movement (known as “OpenAPS”), leading efforts in making safe and effective artificial pancreas technology available (sooner) for people with diabetes around the world. She is part of the #WeAreNotWaiting movement...


Thursday September 14, 2017 9:05am - 9:45am
South Seas Ballroom

9:45am

Break
Thursday September 14, 2017 9:45am - 10:00am
TBA

10:00am

Analytics at Scale with the Analytics Component 2.0
In this talk we will discuss the next iteration of the Solr Analytics Component. The analytics component gives users the ability to compute complex analytics, in real time, over result sets. The focus of the new version of the analytics component was support for analytics on distributed/sharded collections while maintaining the capabilities and performance of the original. Several additional features have been added, including faceting over field/function expressions, support for multi-valued field expressions, and significant performance improvements for high-cardinality facets.

We will briefly review the capabilities of the original analytics component, followed by a demonstration of the features and use cases this new release enables. We will discuss the use cases that the analytics component is better suited for than Solr streaming. Finally, we will examine the internals of the component and how to best structure analytics requests in order to maximize performance.

Speakers

Houston Putman

Software Developer, Bloomberg LP
Houston works at Bloomberg as a member of the Search Infrastructure team, providing search as a service to 300+ engineering teams. He has a BS in Computer Science & Mathematics from The University of Texas at Austin. While on the Search Infrastructure team at Bloomberg, Houston c...


Thursday September 14, 2017 10:00am - 10:40am
South Seas D

10:00am

SolrCloud Consistency and Recovery Internals
SolrCloud is a distributed search system, but how does it ensure that replicated data remains consistent? How does Solr avoid data loss when hardware inevitably fails? And how do temporary outages affect the availability of services?

In this talk, we will begin with a foundation of the relevant replication and leader election details before moving on to cover how Solr addresses failures and what recovery steps the cluster can automatically perform. We will examine the different classes of failure to better understand which ones are recoverable automatically, and which ones will require manual intervention by an operator. And in doing so, we will highlight which manual steps are safe and which ones carry more risk, as well as potential future improvements to the automated processes.
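SolrCloud builds its leader election on ZooKeeper. As a rough illustration of the underlying recipe, the sketch below simulates "lowest live sequence number wins": each replica registers an ephemeral sequential node, and when the leader's session expires, the next-lowest candidate takes over. The class and replica names are hypothetical, and real SolrCloud adds steps this sketch omits, such as syncing with other replicas before a new leader accepts updates.

```python
from dataclasses import dataclass, field


@dataclass
class ElectionQueue:
    """Simulates candidates registering ephemeral sequential znodes."""
    next_seq: int = 0
    # Maps sequence number -> replica name for live (not yet failed) candidates.
    candidates: dict = field(default_factory=dict)

    def join(self, replica: str) -> int:
        """Register a candidate; ZooKeeper would assign the sequence number."""
        seq = self.next_seq
        self.candidates[seq] = replica
        self.next_seq += 1
        return seq

    def leader(self):
        """The live candidate with the lowest sequence number is the leader."""
        if not self.candidates:
            return None
        return self.candidates[min(self.candidates)]

    def fail(self, replica: str) -> None:
        """An ephemeral node disappears when its owner's session expires."""
        self.candidates = {s: r for s, r in self.candidates.items() if r != replica}


queue = ElectionQueue()
for name in ["replica_a", "replica_b", "replica_c"]:
    queue.join(name)
print(queue.leader())  # replica_a
queue.fail("replica_a")
print(queue.leader())  # replica_b
```

Because failover only ever promotes the next candidate in line, a single expired session triggers one handover instead of a re-election among all replicas.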

Speakers

Mano Kovacs

Software Engineer, Cloudera
Mano has been an application developer for more than 15 years. In recent years he has focused on distributed, large-scale services. He worked on an IoT platform before he joined the Search/Solr team at Cloudera in 2016.


Thursday September 14, 2017 10:00am - 10:40am
Banyan AB

10:00am

Data to Application in Minutes
Good user experience is critical to the success of any search application. All too often, however, the design of the user interface is left until the last phase (and last dollar) of a project, even though this is what ultimately determines the value of a solution to its users. Part of the problem is that it's hard to capture user experience in wireframes from the outset. People prefer to interact directly with live data rather than mock-ups, and as a project progresses, the data tends to evolve, which impacts the UI requirements.

In this session we will talk about the importance of UX in the design of search applications, ranging from simple keyword search to more complex solutions in analytics and discovery. We will discuss common UX patterns and paradigms in search, and show how anyone, irrespective of design ability, can be guided to choose the right patterns for their data.

We will also demo how we can apply these principles in practice, by quickly building a mobile-ready search application in a matter of minutes, using the Fusion App Studio.

Speakers

Bjarki Holm

VP of Solutions, Lucidworks
Specialising in search applications, data analytics and data visualisation, Bjarki has long experience in bringing search solutions to market. He is currently the VP of Solutions at Lucidworks, which is making it easier than ever before to build enterprise-grade search applicat...


Thursday September 14, 2017 10:00am - 10:40am
South Seas C

10:00am

Move from MySQL to Solr with NRT Data at Marketo
This talk will cover:

- Marketo's Solr architecture.
- Queries that used to take 1 hour against MySQL now return results within 5 seconds.
- How near-real-time data is made available in Solr using Spark Streaming and Kafka.
- How Marketo classifies their customers and gives them their own core in Solr.

Speakers

Rohit Kanchan

Sr Engineering Manager, Marketo
Rohit is a technologist and a leader with over 11 years of experience in the software industry. He has been working on Lucene/Solr for the last 3 years. Rohit has worked on JVM-based technologies such as Java, Scala, Spark and Spark Streaming. He has a passion to build scalable applications and improve search pla...


Thursday September 14, 2017 10:00am - 10:40am
South Seas B

10:00am

Search Evolution at Kohl’s - Machine Learning and Personalization
In this talk, we will discuss the evolution of Search at Kohl’s, from the migration to Solr to the various capabilities we were able to integrate into our search platform.

We will also discuss how we integrated personalization into the ranking models, and how we measure the effectiveness of personalized search ranking. Finally, we will discuss the search results page itself - the machine learning models we use to show related searches and dynamic facets based on the query and the result set.

Speakers

Praveen Settipalli

Senior Product Manager, Search Platform, Kohl's Digital Center
Praveen Settipalli is a Senior Product Manager at Kohl’s Department Stores, where he leads a cross-functional team responsible for Omni-channel Search and Discovery. Prior to that, he ran the product management and customer success functions at Attune Inc., a Google Ventures-back...

Reddy Yakkanti

Sr. Staff Engineer, Kohl's Digital Center
Reddy is a Sr. Staff Engineer at Kohl's Department Stores and is responsible for the Search Platform, which supports Omni-channel Search & Browse features. Prior to Kohl's, Reddy managed Search teams at Walmart.com and Apple. He has been working on Search for more than 12 years.


Thursday September 14, 2017 10:00am - 10:40am
South Seas A

10:50am

Why Human Annotated Data Matters for Search
Search is a critical component of any effective website or application, connecting users to the data they need to make decisions, whether it’s to find documents, do research or complete an online purchase. Modern search engines have evolved significantly in the past 5 years, incorporating machine learning and artificial intelligence techniques in all aspects of document and query processing, as well as in user profiling and personalization. These techniques require large volumes of high quality training and evaluation data in order to be effective.

In this session, you’ll hear Kevin Vondekamp, Vice President – Web, Social & eCommerce at Appen, and Grant Ingersoll, CTO of Lucidworks, discuss how Lucidworks leverages Appen’s crowdsourcing capabilities to enhance its machine learning technology and improve all aspects of the search experience.

Speakers

Grant Ingersoll

CTO, Lucidworks
Grant Ingersoll is a Solr/Lucene committer and CTO of Lucidworks. Grant is co-founder of the Mahout machine learning project, and a longstanding member of the Apache Software Foundation. Grant is also the co-author of Taming Text from Manning Publications.

Kevin Vondekamp

Vice President – Web, Social & eCommerce, Appen
Kevin Vondekamp is a senior executive with extensive domestic & international experience in Sales, Marketing & Business Development for both start-ups and major corporations. In his current role at Appen, he works with leading global technology companies to improve their machine...


Thursday September 14, 2017 10:50am - 11:30am
South Seas B

10:50am

Art and Science Come Together When Mastering Relevance Ranking
For most search-based products, more relevant results mean a more valuable product. Getting this right is rarely obvious. Within the world of Search, engineering relevance has become a profession in itself.
Solr comes out of the box with a scoring algorithm which is sophisticated but rarely good enough. Fortunately, you can break it down and build it up again with different blocks.

This is a session where we demonstrate how to get an absolute grip on the relevance calculation of search results. We'll show the science of analyzing and manipulating the scoring algorithm and we'll show the art of shaping the score to your needs.
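To make "analyzing the scoring algorithm" concrete: since Lucene 6.0, the default similarity has been BM25. The sketch below computes the textbook per-term BM25 score with Lucene's always-positive idf variant. It is an illustration under those assumptions, not Lucene's code; Lucene also encodes document length lossily, so real scores (visible via `debugQuery=true`) will differ slightly.

```python
import math


def bm25_term_score(tf, doc_len, avg_doc_len, doc_freq, num_docs, k1=1.2, b=0.75):
    """Score one query term against one document with BM25."""
    # Lucene-style idf: stays positive even for very common terms.
    idf = math.log(1 + (num_docs - doc_freq + 0.5) / (doc_freq + 0.5))
    # Term-frequency saturation, normalized by document length.
    length_norm = k1 * (1 - b + b * doc_len / avg_doc_len)
    tf_part = tf * (k1 + 1) / (tf + length_norm)
    return idf * tf_part


# A term appearing once in an average-length document, found in 2 of 10 docs.
print(round(bm25_term_score(tf=1, doc_len=100, avg_doc_len=100, doc_freq=2, num_docs=10), 4))  # 1.4816
```

The two knobs map onto the "art" side of shaping the score: `k1` controls how quickly repeated occurrences of a term stop adding score, and `b` controls how strongly long documents are penalized.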

Speakers

Tom Burgmans

Search Engineer, Wolters Kluwer
Tom Burgmans currently works as a Search Specialist for Wolters Kluwer, a publisher of professional content. He is responsible for maintaining and improving the quality of the search experience in the online publishing platforms. Tom has over 11 years of experience in search...


Thursday September 14, 2017 10:50am - 11:30am
South Seas D

10:50am

Monitoring Metrics in Solr 6 and 7
This session will present in detail how the metrics subsystem in Solr 6.5 and 7 is designed and implemented, what kind of insights it provides into Solr state and performance, and how it integrates with external monitoring platforms, with examples of standard and custom integrations. It will also explain how metrics are reported and aggregated in SolrCloud, and how this information is used for cluster auto-scaling.
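For orientation, the metrics subsystem can be queried over HTTP through the Metrics API at `/admin/metrics`, filtered by parameters such as `group`, `type` and `prefix`. The sketch below only builds such a request URL. The host is a placeholder, and the `QUERY./select` prefix is an assumed example of a request-handler metric name, so check the metric names your Solr version actually reports.

```python
from urllib.parse import urlencode


def metrics_url(base="http://localhost:8983/solr", group="core", prefix=None, metric_type=None):
    """Build a Solr Metrics API request URL (base URL is a placeholder)."""
    params = {"group": group, "wt": "json"}
    if prefix:
        params["prefix"] = prefix          # only metrics whose names start with this
    if metric_type:
        params["type"] = metric_type       # e.g. counter, gauge, meter, timer
    return f"{base}/admin/metrics?{urlencode(params)}"


print(metrics_url(prefix="QUERY./select", metric_type="timer"))
```

An external monitoring agent would poll a URL like this periodically and forward the JSON it returns; the built-in reporters discussed in the talk push metrics instead.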

Speakers

Andrzej Białecki

Engineer, Lucidworks
Andrzej Białecki has over 20 years of experience in software engineering, ranging from system integration to OS development, information retrieval, and standardization of e-commerce models. He’s been actively involved in Open Source since 1997. Currently he’s an Apache Lucene/Solr PMC Member...


Thursday September 14, 2017 10:50am - 11:30am
South Seas A

10:50am

Passage Search: the Answer to Your User Questions!
Searching for top-ranking documents with Solr is dead simple: submit your query and Solr will spin its wheels to match your query against documents, score each one and return a ranked list of them. But what if your application requires searching for, say, a paragraph or a sentence? Indexing those as documents is not always possible, nor is it very flexible, especially if different users need to search over different-sized portions of the documents.

In this talk I will describe the problem of passage search, what use cases it addresses, and how it differs from traditional document search with highlighting. I will also review multiple implementation approaches, all of which can be implemented on the client side, without modifying out-of-the-box Solr code. I will conclude with evaluation results for each approach.
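As a baseline for what "client side" can mean, here is one naive approach, assumed for illustration rather than taken from the talk: fetch matching documents from Solr as usual, split each into sentences, and rank the sentences by how many distinct query terms they contain. Real approaches would use better sentence splitting and term weighting.

```python
import re


def split_passages(text):
    """Split a document into sentence-sized passages (naive regex splitter)."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def best_passages(text, query, k=1):
    """Rank passages by how many distinct query terms they contain."""
    terms = set(query.lower().split())
    scored = []
    for idx, passage in enumerate(split_passages(text)):
        words = set(re.findall(r"\w+", passage.lower()))
        scored.append((len(terms & words), -idx, passage))
    # Highest overlap first; ties go to the earlier passage.
    scored.sort(reverse=True)
    return [p for score, _, p in scored[:k] if score > 0]


doc = ("Solr is a search server. Passages can be ranked on the client. "
       "Each passage is scored against the query terms.")
print(best_passages(doc, "passage query terms"))
# ['Each passage is scored against the query terms.']
```

Note that "passages" does not match "passage" here because no stemming is applied; running the same analysis chain as the Solr field would fix that.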

Speakers

Shai Erera

STSM, Social Analytics & Technologies, IBM
Shai Erera is a Researcher at IBM Research, Haifa, Israel. Shai earned his M.Sc in Computer Science from the University of Haifa in 2007. Shai’s work experience includes the development of search-based systems over Lucene and Solr and he is also a Lucene/Solr committer.


Thursday September 14, 2017 10:50am - 11:30am
Banyan AB

10:50am

Search as a Force Multiplier: Measuring Search Success for Key Stakeholders
Customers, managers and content creators are all invested in the quality of the search results. The challenge is that each group has different expectations from search and different priorities for measuring search success. Learn how to quantify each type of success metric to provide clear value to each user group. These data-driven success metric models can quantify success and provide information to make better informed search and content decisions.
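As a small illustration of turning search logs into per-stakeholder numbers, the sketch below computes three common metrics from a hypothetical log schema. The mapping of metric to audience is an assumption: zero-result rate points content creators at gaps, click-through rate reflects customer success, and mean click rank tells search owners how well ranking is doing.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SearchEvent:
    """One logged search (hypothetical log schema)."""
    query: str
    num_results: int
    clicked_rank: Optional[int]  # 1-based rank of the first click, None if no click


def success_metrics(events: List[SearchEvent]) -> dict:
    total = len(events)
    zero = sum(1 for e in events if e.num_results == 0)
    clicks = [e.clicked_rank for e in events if e.clicked_rank is not None]
    return {
        "zero_result_rate": zero / total,           # content-gap signal
        "click_through_rate": len(clicks) / total,  # customer-facing signal
        "mean_click_rank": sum(clicks) / len(clicks) if clicks else None,  # ranking signal
    }


log = [
    SearchEvent("rhel install", 120, 1),
    SearchEvent("kernel panic", 40, 3),
    SearchEvent("obsolete product", 0, None),
    SearchEvent("subscription", 15, None),
]
print(success_metrics(log))
# {'zero_result_rate': 0.25, 'click_through_rate': 0.5, 'mean_click_rank': 2.0}
```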

Speakers

JP Sherman

Manager of Search & Findability, Red Hat
JP Sherman is a fifteen-year veteran of Search, Findability and Competitive Intelligence. As the Search & Findability Manager for the Red Hat Customer Portal, he bridges the intention gap between tens of thousands of technical and support documents and the customers looking for them in Google and Red...


Thursday September 14, 2017 10:50am - 11:30am
South Seas C

11:40am

Designing A Search Platform? Ask These Questions First!
Search is a rather complex problem to solve, and it becomes even trickier when you are building a platform to support multiple use cases. An essential part of designing such platforms is ‘asking the right questions’.

This talk will cover a wide range of questions that should be addressed while designing a search platform, depending on the platform's audience. These range from scalability questions, such as being able to support more data and more users, to usability questions, such as whether auto-detection of fields is a good thing to have. Anshum will also talk about why security might be critical, and what to think about in terms of both simple security and multi-tenancy, depending on your use case.

At a high level, Anshum will share his learnings around scalability, usability, and security that he has accumulated over the years at multiple large and small organizations.

Speakers

Anshum Gupta

Lucene/Solr Committer
Anshum Gupta is an Apache Lucene/Solr committer and PMC member with over 10 years of experience with search and related technologies. He started dabbling with Lucene 10 years ago, and since then has worked at various organizations, including but not limited to IBM Watson, and Am...


Thursday September 14, 2017 11:40am - 12:20pm
Banyan AB

11:40am

The Apache Solr Semantic Knowledge Graph
What if instead of a query returning documents, you could alternatively return other keywords most related to the query: i.e. given a search for "data science", return me back results like "machine learning", "predictive modeling", "artificial neural networks", etc.? Solr’s Semantic Knowledge Graph does just that. It leverages the inverted index to automatically model the significance of relationships between every term in the inverted index (even across multiple fields) allowing real-time traversal and ranking of any relationship within your documents. Use cases for the Semantic Knowledge Graph include disambiguation of multiple meanings of terms (does "driver" mean truck driver, printer driver, a type of golf club, etc.), searching on vectors of related keywords to form a conceptual search (versus just a text match), powering recommendation algorithms, ranking lists of keywords based upon conceptual cohesion to reduce noise, summarizing documents by extracting their most significant terms, and numerous other applications involving anomaly detection, significance/relationship discovery, and semantic search. In this talk, we'll do a deep dive into the internals of how the Semantic Knowledge Graph works and will walk you through how to get up and running with an example dataset to explore the meaningful relationships hidden within your data.
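The core idea behind scoring a relationship is comparing how often a term occurs in a foreground set (documents matching "data science") against a background set (the whole index). The sketch below uses a plain binomial z-score for that comparison. Treat it as an illustration of the idea, not Solr's exact formula; the published Semantic Knowledge Graph scoring normalizes differently, and the counts below are made up.

```python
import math


def relatedness_z(fg_count, fg_size, bg_count, bg_size):
    """z-score of a term's frequency in a foreground set vs. a background set."""
    p = bg_count / bg_size                  # background probability of the term
    expected = fg_size * p                  # expected foreground count if unrelated
    std = math.sqrt(fg_size * p * (1 - p))  # binomial standard deviation
    return (fg_count - expected) / std if std > 0 else 0.0


# Foreground: 100 docs matching "data science"; background: 10,000 docs.
print(round(relatedness_z(10, 100, 100, 10000), 2))  # 9.05 -> strongly related
print(round(relatedness_z(1, 100, 100, 10000), 2))   # 0.0  -> no relationship
```

A strongly positive score means the term is over-represented among the query's documents (e.g. "machine learning"), while a score near zero means it occurs no more often than chance would predict.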

Speakers

Trey Grainger

SVP of Engineering, Lucidworks
Trey is the SVP of Engineering at Lucidworks, where he leads their engineering efforts around both Apache Lucene/Solr and Lucidworks’ commercial product offerings. Trey is also the co-author of the book Solr in Action, as well as a published researcher and frequent publ...


Thursday September 14, 2017 11:40am - 12:20pm
South Seas A

11:40am

Doing Synonyms Right
This session will start with a presentation of the deprecated SynonymFilter, with examples of its implementation and a discussion of its limitations in handling multi-word synonyms. We will then introduce the new SynonymGraphFilter and present examples of its configuration and use, including its handling of multi-word synonyms. Finally, we will introduce a Synonym-URI replacement strategy for multi-word synonyms and compare its index, query and linguistic performance to that of the SynonymGraphFilter.

After this session, participants will be able to:

* Understand the differences between index-time and query-time synonym replacement
* Understand the complexities of multi-word synonyms
* Implement SynonymFilter-based synonyms
* Implement SynonymGraphFilter-based synonyms
* Implement a Synonym-URI replacement strategy
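Why multi-word synonyms are hard becomes clearer when you enumerate the alternative token paths a query can take. The sketch below assumes the comma-separated (equivalence) and `=>` (explicit mapping) rule syntax of Solr's synonyms.txt and applies a greedy longest-match strategy. It is a simplification of what SynonymGraphFilter does, not its implementation: it lists every path instead of building a token graph.

```python
def parse_synonyms(lines):
    """Parse Solr-style synonym rules into a phrase -> alternatives map."""
    rules = {}
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if "=>" in line:
            # Explicit mapping: left-hand phrases are replaced by right-hand ones.
            lhs, rhs = line.split("=>", 1)
            targets = [tuple(t.split()) for t in rhs.split(",") if t.strip()]
            for src in lhs.split(","):
                if src.strip():
                    rules[tuple(src.split())] = targets
        else:
            # Equivalence group: every phrase expands to the whole group.
            group = [tuple(t.split()) for t in line.split(",") if t.strip()]
            for phrase in group:
                rules[phrase] = group
    return rules


def expand(tokens, rules):
    """Return every token path through the query, one per synonym choice."""
    paths = [[]]
    i = 0
    while i < len(tokens):
        # Greedy longest match of a (possibly multi-word) rule starting at i.
        match = None
        for length in range(len(tokens) - i, 0, -1):
            key = tuple(tokens[i:i + length])
            if key in rules:
                match = (key, rules[key])
                break
        if match:
            key, alternatives = match
            paths = [p + list(alt) for p in paths for alt in alternatives]
            i += len(key)
        else:
            paths = [p + [tokens[i]] for p in paths]
            i += 1
    return paths


rules = parse_synonyms(["# sample rules", "ny, new york", "tv => television"])
for path in expand("new york tv deals".split(), rules):
    print(" ".join(path))
# ny television deals
# new york television deals
```

The two paths have different lengths (three tokens vs. four), which a flat token stream with fixed positions cannot represent faithfully; that mismatch is the multi-word problem a token graph addresses.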

Speakers

John Marquiss

Applications Architect, Wolters Kluwer
John is an applications architect at Wolters Kluwer, a leading provider of health, legal, tax and accounting information. Wolters Kluwer has been heavily invested in Solr for over 5 years, using it to deliver search and search-related feature improvements to its products. John brin...


Thursday September 14, 2017 11:40am - 12:20pm
South Seas B

11:40am

Solr’s Missing Plugin Ecosystem
Until now, Solr plugins were mainly a developer-level concept of Java classes implementing certain interfaces. They were hard to discover, being spread around the internet, and hard to install, with version conflicts and dependencies, and often you’d have to build them yourself.

The improved plugin system proposed in this talk uses PF4J to add bundle packaging (zip/jar), plugin discovery (repositories), one-line install/upgrade and automatic version compatibility checks. Think of it as Homebrew or apt-get for Solr :) The hope is that this will encourage the creation of hundreds of new plugins, giving Solr developers a sense of community and a new “stage” to perform on.
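An automatic version compatibility check boils down to comparing the running Solr version against a plugin's declared requirement. The sketch below assumes a simple, hypothetical space-separated requirement syntax such as `>=7.0.0 <8.0.0`; it is not PF4J's API or its expression syntax, just the shape of the check.

```python
def parse_version(v):
    """'7.0.1' -> (7, 0, 1); missing parts count as zero."""
    parts = [int(x) for x in v.split(".")]
    return tuple(parts + [0] * (3 - len(parts)))


def is_compatible(solr_version, requires):
    """Check a version against a (hypothetical) requirement like '>=7.0.0 <8.0.0'."""
    ops = {
        ">=": lambda a, b: a >= b,
        "<=": lambda a, b: a <= b,
        ">": lambda a, b: a > b,
        "<": lambda a, b: a < b,
        "=": lambda a, b: a == b,
    }
    current = parse_version(solr_version)
    for clause in requires.split():
        # Try two-character operators before one-character ones.
        for op in (">=", "<=", ">", "<", "="):
            if clause.startswith(op):
                if not ops[op](current, parse_version(clause[len(op):])):
                    return False
                break
        else:
            raise ValueError(f"unsupported clause: {clause}")
    return True


print(is_compatible("7.0.1", ">=7.0.0 <8.0.0"))  # True
print(is_compatible("6.6.0", ">=7.0.0 <8.0.0"))  # False
```

Comparing versions as integer tuples, rather than strings, avoids the classic bug where "10.0" sorts before "9.0".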

I’ll demo the current state of things, searching, installing, upgrading and uninstalling plugins both from the bin/solr command line and from the Admin UI. You should attend this talk if you just want a “WOW, Give It To Me” experience or if you want to help mature the feature!

Speakers

Jan Høydahl

Owner, Comnivent
Jan is a Lucene/Solr committer, freelance consultant, instructor, speaker and father of two. Having worked within IT since 1995 and with search tech since 2000, Jan is a seasoned full-stack developer and search/discovery solutions architect.


Thursday September 14, 2017 11:40am - 12:20pm
South Seas D

11:40am

Context Driven Search Ranking and Faceting
In e-commerce, when searching over a wide range of products, sorting your search results is crucial to your business. You most likely want to sort your search results depending on the search context, such as the dominant category or the user's search or order history. We built a Solr plugin to determine the current category "in flight" and change the sorting and faceting of the current query without adding any overhead to the query. In this talk I will guide you through the implementation and give examples of how to add search context to your Solr query.
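The talk's plugin works inside Solr; as a rough client-side approximation (an assumption, and unlike the plugin it costs a second request), you can detect the dominant category among the top results of a first pass and re-issue the query with an eDisMax boost query (`bq`). The field name `category`, the 50% threshold and the boost value are hypothetical.

```python
from collections import Counter


def dominant_category(result_categories, min_share=0.5):
    """Return the category covering at least min_share of the top results."""
    if not result_categories:
        return None
    category, count = Counter(result_categories).most_common(1)[0]
    return category if count / len(result_categories) >= min_share else None


def context_params(query, result_categories, boost=2.0):
    """Build request parameters that boost the dominant category, if any."""
    params = {"q": query, "defType": "edismax"}
    category = dominant_category(result_categories)
    if category is not None:
        # 'category' is a hypothetical field name; bq adds an extra boost clause.
        params["bq"] = f'category:"{category}"^{boost}'
    return params


top = ["shoes", "shoes", "shoes", "socks"]
print(context_params("running", top))
# {'q': 'running', 'defType': 'edismax', 'bq': 'category:"shoes"^2.0'}
```

When no category dominates, the query goes out unchanged, so ambiguous searches are not pushed toward an arbitrary category.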

Speakers

Torsten Bøgh Köster

CTO, Shopping24 Internet Group
Torsten Bøgh Köster is the CTO at the shopping24 internet group. Together with his small team he develops the search appliance and algorithms behind the company's product search portals. Each of them leverages more than 60M products out of the Apache Solr based software stac...


Thursday September 14, 2017 11:40am - 12:20pm
South Seas C

12:20pm

Lunch
Thursday September 14, 2017 12:20pm - 1:20pm
TBA

1:30pm

Personalized Search Results and Job Recommendations at Dice.com

One drawback of machine learned ranking models is the lack of personalization; the algorithm does not typically make use of a user’s behavior to tailor search results to the user’s own preferences, instead relying on relevancy judgements usually collected from a small subset of users. Relevance feedback takes a different approach – use either implicit or explicit feedback from the user to improve relevancy. By monitoring a user’s individual search behavior, their prior search and browse behavior, or by asking them for explicit feedback on the relevancy of results, the search engine can better adapt search results to the individual user, allowing for a more personalized search experience. Another form of relevance feedback, so-called “Blind Feedback”, uses the initial set of search results to expand and re-execute the original query to improve recall, without the need for explicit synonym files. A user’s profile can also be used directly to improve search relevancy, or to provide content-based personalized recommendations.
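Blind feedback can be sketched in a few lines, assuming the text of the top results is already in hand: count terms across the top documents (once per document), skip query terms and a tiny hypothetical stopword list, and append the most frequent remaining terms before re-executing the query. This is a generic pseudo-relevance-feedback sketch, not the plugin code from the talk; production versions usually weight the added terms lower than the original ones.

```python
import re
from collections import Counter

# Tiny illustrative stopword list; a real one would be much larger.
STOPWORDS = {"a", "an", "and", "the", "of", "to", "in", "for", "with", "on", "using"}


def expand_query(query, top_docs, num_terms=2):
    """Pseudo-relevance ("blind") feedback: add frequent terms from top results."""
    query_terms = set(query.lower().split())
    counts = Counter()
    for doc in top_docs:
        # Count each term once per document so one long doc can't dominate.
        terms = set(re.findall(r"[a-z0-9]+", doc.lower()))
        counts.update(t for t in terms if t not in query_terms and t not in STOPWORDS)
    expansion = [t for t, _ in counts.most_common(num_terms)]
    return query + " " + " ".join(expansion) if expansion else query


docs = [
    "Java developer with Spring Boot",
    "Java developer, Spring, microservices",
    "Backend Java engineer using Spring Boot",
]
print(expand_query("java developer", docs))  # java developer spring boot
```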

In this talk, we will discuss these various approaches to improving relevancy, and how they can be incorporated into Solr via simple plugins. All code used in the presentation will be made available on GitHub following the presentation.


Speakers

Simon Hughes

Chief Data Scientist, DHI / Dice.com
Simon is currently the Chief Data Scientist at Dice.com, the technology professional recruiting site. He is also a PhD candidate at DePaul University, studying machine learning and natural language processing. At Dice, he has developed multiple recommender engines using...


Thursday September 14, 2017 1:30pm - 2:10pm
South Seas A

1:30pm

Analytics and Graph Traversal with Solr
Analytics in Apache Solr continue to advance at a rapid pace, intersecting with other features such as faceted search and graph traversal. This talk will cover new advances in these areas as well as streaming expressions, parallel SQL, and the trade-offs and scalability characteristics of different approaches to real-time analytics.

Speakers

Yonik Seeley

Search Engineer, Cloudera
Yonik Seeley is the creator of Solr. He works at Cloudera integrating and leveraging "Big Search" technologies into the many components comprising the Cloudera enterprise data hub (EDH). Yonik was previously a co-founder of LucidWorks, and he holds a master's degree in computer s...


Thursday September 14, 2017 1:30pm - 2:10pm
Banyan AB

1:30pm

Integrating Clickstream Data in Solr for Ranking and Dynamic Facet Optimization
Ranking with clickstream data: clickstream data provides implicit guest feedback on search results. In this talk, I will cover extracting clickstream events to derive a ranking score for a given search term and item based on previous click history. We will discuss:

Offline data computation and indexing:
- Defining clickstream events: clicks, conversion
- Spark implementation for clickstream batch processing
- Reverse-indexing search terms and rank scores into Solr

Query time ranking:
- Apply click scoring using Solr function and boost queries
- Merge the click score with the default Lucene TF-IDF similarity score
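The offline and query-time steps above can be illustrated with plain Python standing in for the Spark job and the Solr function query. The weights, the smoothing prior and the multiplicative merge are all hypothetical choices made for this sketch, not the talk's formula: conversions count more than clicks, and a prior on impressions keeps items with very little traffic from getting inflated scores.

```python
def click_score(impressions, clicks, conversions, conv_weight=3.0, prior=100):
    """Smoothed engagement score for one (search term, item) pair."""
    # Conversions weigh more than clicks; 'prior' damps pairs with few impressions.
    engagement = clicks + conv_weight * conversions
    return engagement / (impressions + prior)


def merged_score(text_score, c_score, weight=1.0):
    """Multiplicative boost, so text relevance still matters for unclicked items."""
    return text_score * (1 + weight * c_score)


popular = click_score(impressions=1000, clicks=100, conversions=10)
rare = click_score(impressions=5, clicks=3, conversions=0)
print(round(popular, 3), round(rare, 3))  # 0.118 0.029
print(round(merged_score(1.0, popular), 3))  # 1.118
```

A multiplicative merge keeps the text score as the base signal, so an item nobody has clicked yet (click score 0) still ranks on relevance alone.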

Facet Optimization:
Faceted search increases conversion. Showing the right facets is key to improved user engagement and a better search experience. In this talk, I will discuss how clickstream data can be used to derive a function that reorders facets to optimize user engagement. I will also go over a learning model for discovering the right facets for a query and filtering out irrelevant ones. The model learns incrementally from previous iterations to dynamically adjust the ranking of facets.
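As a much simpler stand-in for the learned model described above (a heuristic, not that model), facets can be ordered by a smoothed per-query click-through rate, dropping those that fall below a threshold. The statistics, prior and threshold below are hypothetical.

```python
def reorder_facets(facet_stats, min_engagement=0.01, prior_clicks=1, prior_views=50):
    """Order facets by smoothed click-through rate and drop rarely used ones.

    facet_stats maps facet name -> (times shown, times clicked) for one query.
    """
    ranked = []
    for name, (shown, clicked) in facet_stats.items():
        # Additive smoothing keeps rarely shown facets from extreme rates.
        rate = (clicked + prior_clicks) / (shown + prior_views)
        if rate >= min_engagement:
            ranked.append((rate, name))
    ranked.sort(reverse=True)
    return [name for _, name in ranked]


stats = {"brand": (500, 60), "color": (500, 20), "size": (500, 90), "warranty": (500, 0)}
print(reorder_facets(stats))  # ['size', 'brand', 'color']
```

Recomputing these statistics on each batch of clickstream data gives a crude version of the incremental adjustment the talk describes.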

Speakers

Ilayaraja Prabakaran

Lead Engineer, Target
Ilay is Lead Engineer for Search relevance and ranking at Target. His prior experience includes Indian web and e-commerce startups and Yahoo!. He has been a Solr user for several years and is a big data enthusiast. Ilay has a master's in computer science from IIIT, Hyderabad, where he spec...


Thursday September 14, 2017 1:30pm - 2:10pm
South Seas C

1:30pm

Running a Highly Available and Scalable Solr Platform in the Cloud at The Home Depot
Home Depot's online search platform, which powers our e-commerce platform, requires an engine that can handle high-volume reads, real-time indexing and batch updates on nodes taking live traffic. Leveraging SolrCloud's distributed architecture and Google Cloud infrastructure, we were able to provision a highly available (HA) and scalable Solr cluster across multiple regions to support the high-volume reads and indexing.

This session will focus on: the provisioning and automation of building a Solr cluster in the cloud; the challenges involved in automating Solr cluster creation on cloud platforms like Google Cloud Platform, and how our team overcame them; replica management in ephemeral cloud infrastructure; how to plug Solr into Google Cloud instance group infrastructure; and learnings from using Go language plugins for ZooKeeper and an in-house Go package for Solr collection management.

We'll also cover patterns for routing traffic to a newly provisioned cluster without losing real-time indexing feeds, to support version upgrades or other changes. We'll talk about the performance learnings and optimizations that helped us meet our aggressive SLOs. Finally, we'll cover the metrics, alerting and automated recovery patterns we implemented for the Solr, ZooKeeper and Fusion stack to ensure reliability.

Speakers

Navin Anandaraj

Principal Engineer, The Home Depot
Navin is a Principal Engineer at Home Depot specializing in cloud technologies and cloud platform architecture. Over the last 6 months he has been solely involved in leading the design and engineering work to host a highly available and scalable Solr based search engine in the cl...

Ilamgumaran Velayuthan Karunanithi

Staff Software Engineer, The Home Depot
Ilamgumaran is a Staff Software Engineer at Home Depot leading the Search Platform implementation for the Solr migration. He is primarily involved in the automation required to create Solr clusters on Google Cloud Platform and in the design of Solr collections.


Thursday September 14, 2017 1:30pm - 2:10pm
South Seas D

2:20pm

Learning to Rank from Clicks
This talk introduces the mathematics behind the effort at Salesforce to improve search relevance using machine learning.

First, we discuss a new approach to search ranking. Much of the active research in this field has focused on the use of large amounts of labeled data. Specifically, human beings need to sift through tens of thousands of queries, along with the returned results, and mark each result on a scale from 'not relevant' to 'extremely relevant'. At Salesforce, we have developed a novel machine learning approach for learning a ranking function that does not require labeled data.
Second, we discuss the mathematical foundations of our A/B testing program. Since user clicks do not follow any standard distribution, we rely on extensive simulations to estimate the power of A/B tests and evaluate the significance of the results.
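Estimating power by simulation can be sketched as a Monte Carlo loop: simulate many experiments with a known effect, run the significance test on each, and count how often it fires. For simplicity this sketch assumes Bernoulli clicks and a two-proportion z-test; because real click data does not follow such a clean distribution, in practice you would replace the `rng.random()` draws with resampling from logged sessions.

```python
import math
import random


def two_proportion_p_value(c1, n1, c2, n2):
    """Two-sided p-value of a two-proportion z-test."""
    p_pool = (c1 + c2) / (n1 + n2)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
    if se == 0:
        return 1.0
    z = (c2 / n2 - c1 / n1) / se
    # Two-sided tail of the standard normal: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2))


def simulated_power(ctr_a, ctr_b, n_per_arm, sims=500, alpha=0.05, seed=0):
    """Fraction of simulated experiments that reach significance."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(sims):
        clicks_a = sum(rng.random() < ctr_a for _ in range(n_per_arm))
        clicks_b = sum(rng.random() < ctr_b for _ in range(n_per_arm))
        if two_proportion_p_value(clicks_a, n_per_arm, clicks_b, n_per_arm) < alpha:
            hits += 1
    return hits / sims


print(simulated_power(0.10, 0.12, n_per_arm=1000))
```

Running the same loop with `ctr_a == ctr_b` estimates the false-positive rate, which should come out near `alpha` if the test is calibrated for your data.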

Speakers

Zach Alexander

Principal Data Scientist, Salesforce.com
Zach has extensive experience as a data scientist. He has worked in a variety of fields, including hard drive reliability (Seagate Technology), voice-over-IP (Skype) and enterprise search (Salesforce.com). He holds a Ph.D. in Applied Math and has been heavily involved in data sc...

Tracy Backes

Data Scientist, Salesforce.com
Tracy Backes is a Data Scientist with the Data Science for Communities, Service and Search team at Salesforce. She has been at Salesforce for one and a half years, where she works to build data-driven solutions for projects related to search relevance, sales forecasting and custo...


Thursday September 14, 2017 2:20pm - 3:00pm
South Seas A

2:20pm

Payloads in Solr
Solr now integrates smoothly with Lucene-level payloads. Payloads provide optional per-term metadata, numeric or otherwise. Payloads help solve challenging use cases such as per-store product pricing and per-term confidence/weighting.

This session will present the payload feature from the Lucene layer up to the Solr integration, including per-store pricing and per-term weighting.
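A common way to attach payloads at index time is a delimiter syntax such as `term|payload` (as handled by a delimited-payload token filter), with the value read back at query time by a payload-aware function such as `payload()` in recent Solr versions. The plain-Python sketch below only mimics that idea for per-store pricing; it does not call Solr, and the store IDs and prices are hypothetical.

```python
def parse_payloads(field_value, delimiter="|"):
    """Parse 'term|payload' tokens the way a delimited-payload filter splits them."""
    payloads = {}
    for token in field_value.split():
        term, _, payload = token.partition(delimiter)
        # Tokens without a delimiter carry no payload.
        payloads[term] = float(payload) if payload else None
    return payloads


def price_for_store(field_value, store, default=None):
    """Look up one term's payload, similar in spirit to payload(field, term, default)."""
    return parse_payloads(field_value).get(store, default)


prices = "store_12|19.99 store_47|17.49 store_88|21.00"
print(price_for_store(prices, "store_47"))               # 17.49
print(price_for_store(prices, "store_99", default=0.0))  # 0.0
```

Storing the price on the term itself is what lets one product document carry a different price per store without a separate document per store.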

Speakers

Erik Hatcher

Senior Solutions Architect, Lucidworks
Erik co-founded and works as a Senior Solutions Architect at Lucidworks where he ponders and solves challenging search and discovery problems. He co-authored "Lucene in Action" and is a committer on the Lucene and Solr projects.


Thursday September 14, 2017 2:20pm - 3:00pm
Banyan AB

2:20pm

The New Replica Types of Solr 7.0
Learn about the new replica types in Solr 7.0, which are well suited to high-load clusters (heavily indexed and heavily searched), with the tradeoff that they do not support soft commits.
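Solr 7.0's replica types are NRT (the classic behavior), TLOG and PULL, and the mix is chosen per collection through the Collections API. The sketch below only builds a CREATE request URL; the host, collection name and replica counts are hypothetical, and nothing is sent to a server.

```python
from urllib.parse import urlencode


def create_collection_url(name, shards=2, nrt=1, tlog=0, pull=0,
                          base="http://localhost:8983/solr"):
    """Build a Collections API CREATE request that mixes replica types."""
    params = {
        "action": "CREATE",
        "name": name,
        "numShards": shards,
        "nrtReplicas": nrt,    # index locally; the only type supporting soft commits
        "tlogReplicas": tlog,  # keep a transaction log, copy index segments from the leader
        "pullReplicas": pull,  # only copy index segments from the leader; never lead
    }
    return f"{base}/admin/collections?{urlencode(params)}"


print(create_collection_url("products", nrt=0, tlog=2, pull=3))
# http://localhost:8983/solr/admin/collections?action=CREATE&name=products&numShards=2&nrtReplicas=0&tlogReplicas=2&pullReplicas=3
```

A layout like this, with TLOG replicas eligible to lead and PULL replicas serving queries, trades near-real-time visibility for less indexing work on the query-serving nodes.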

Speakers

Shalin Shekhar Mangar

Senior Solr Consultant, Lucidworks
Shalin Shekhar Mangar has been an Apache Lucene/Solr committer since 2008 and is a member of the Lucene/Solr project management committee. He worked at AOL for five years on vertical search, content management systems, social/community platforms and anti-spam systems as well as AOL...

Dat Cao Manh

Solr Engineer, Lucidworks
Dat is a Lucene/Solr committer working at Lucidworks since September 2016.


Thursday September 14, 2017 2:20pm - 3:00pm
South Seas D

2:20pm

Introduction to Lucidworks Fusion
Build better search apps with Lucidworks Fusion, a scalable development platform leveraging the power of Apache Solr and Spark. Fusion provides everything you need to build and deploy intelligent search applications, from data acquisition tools and built-in security to advanced signal processing and ML models for relevance tuning. Fusion’s paradigm of data pipelines on both the indexing and query sides allows for rapid application development and configuration. Fusion ships with search analytics tools, custom user interface components, data visualization tools, cloud deployment support, and many other mission-critical features, meaning faster time to market for your search and data applications.

Speakers

Alexander Kanarsky

Senior Software Engineer, Fusion Team, Lucidworks
Alexander Kanarsky is a Senior Software Engineer at Lucidworks, where he develops Lucidworks Fusion. Prior to joining Lucidworks, Alexander led the backend search team at Trulia (now part of Zillow Group), scaling up Trulia's Solr-based search infrastructure. He also was one of c...


Thursday September 14, 2017 2:20pm - 3:00pm
South Seas B

2:20pm

Our Tale from the Trail of Shadows at REI Co-op
Is it possible to build something fast and cheap while putting quality first?
Join us as we share the story of our successful upgrade of the REI Co-op digital e-commerce search experience.

You will learn how our team was structured for success, and how we earned the trust of the business team, allowing us to work efficiently with few impediments.

We'll take a technical dive into some of the tools and methods that we used to validate our new Solr search platform, which ultimately allowed us to achieve the holy grail of software upgrades: an uneventful one!

Speakers

Chris Phillips

QA Architect, REI Co-op
Chris is a QA Architect at REI Co-op. Chris has spent the last 20 years challenging his teams to put quality first. Chris has worked on a wide variety of projects ranging from games and medical software to the launch of Kinect for Xbox. He has a passion for making software do really bad things, so the users...

Dale Smith

Lead Engineer, REI Co-op
Dale is a Lead Engineer at REI Co-op and is responsible for the digital search platform. A (nearly) 40-year veteran of the software industry, he has spent time in an array of roles, from developer and architect to engineering director, for a number of Seattle-area tech compa...


Thursday September 14, 2017 2:20pm - 3:00pm
South Seas C

3:10pm

Apache Zeppelin Solr Interpreter
Apache Zeppelin is a web-based notebook that enables data-driven, interactive data analytics and collaborative documents with SQL, Scala and more. With the Solr interpreter, Zeppelin can now use Solr as a backend and allow users to issue Solr queries and visualize the results in the Zeppelin UI. Currently, the Solr interpreter supports basic Solr search, SQL, and streaming expression queries. More capabilities are being developed and will be added soon. In this talk, we will walk you through getting up and running with an example dataset and show various commands that can be used with the Solr interpreter.

Speakers

Chao Han

Sr. Data Scientist, Lucidworks
Chao is a data scientist with over 10 years of analytical experience in both academia and industry. She received a PhD in Statistics from Virginia Tech in 2012 (dissertation: Bayesian visual analytics for high-dimensional data, with 8 publications). After graduation, she worked at JPMorgan Chase...

Andrew Thanalertvisuti

Solutions Architect, Lucidworks
Andrew is a Solutions Architect at Lucidworks, where he has developed the Banana project (a fork of Kibana) to visualize data in Solr. He has been working on visualization projects and analytics solutions across different teams at Lucidworks.


Thursday September 14, 2017 3:10pm - 3:50pm
Banyan AB

3:10pm

User Behavior Driven Intelligent Query Re-ranking and Suggestion

With the advancements in machine learning, Solr is becoming more intelligent. In this talk, two user-behavior-driven features are discussed. Search results coming straight out of the index may not be optimally ordered, and when those results are re-sorted by price, review score, arrival time, etc., their relevance degrades further. A learning-to-rank (LTR) method is developed to learn from user behavior, and the learned model re-ranks search results to improve their order. Both text features and image features are converted into vector form and combined to train ensemble models. The method can be applied to queries with or without user behavior data, and can also be used to intelligently sort search results by criteria other than search relevancy.

The second user-behavior-driven feature is type-ahead, also known as auto-complete, autofill, or auto-suggest. Traditional methods are based on statistics from query logs and the product catalog. The new method leverages users’ selection of the predicted queries, along with the location and time of the interaction, to provide more relevant and accurate type-ahead suggestions. Both features showcase how machine learning meets Solr to render a more powerful overall search experience on an e-commerce website.


Speakers

Rajdeep Mondal

Senior Software Developer, The Home Depot
Rajdeep is a Senior Software Engineer at The Home Depot, where he researches and implements new statistical and machine learning algorithms in the Search Relevance and Personalization domains. Rajdeep is keenly interested in the application of deep learning techniques in text and images t...

Rongkai (Alfred) Zhao

Software Engineering Manager & Architect, The Home Depot
Rongkai Zhao (Alfred) is a Software Engineering Manager and Architect at The Home Depot, where he oversees the research and development of system components in search, personalization, and call center intelligence. He has worked on e-commerce search engines since 2010 and has a wid...


Thursday September 14, 2017 3:10pm - 3:50pm
South Seas A

3:10pm

Lifecycle of a Solr Search Request
This intermediate session for existing Solr users will provide a deep-dive look into the lifecycle of a Solr search request. We will drill down through each layer of code, discussing what happens at each stage, including when and how inter-node communication takes place in a multi-node SolrCloud cluster. Along the way, we will also review the various places where users can configure existing (or custom-written) plugins to override or amend the default behavior.
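A simplified sketch of one step in that lifecycle, assuming the classic two-phase distributed search: each shard first returns only its top `(score, doc_id)` pairs, the coordinating node merges them into a global top list, and a second request fetches stored fields for just the winning documents. The function names and data shapes are hypothetical; real Solr shard requests also carry sort values, facets and more.

```python
import heapq
from collections import defaultdict


def merge_top_hits(shard_hits, rows):
    """Phase one: merge each shard's (score, doc_id) list into a global top `rows`.

    Each shard already returned its own top `rows` hits, so the global winners
    must be among these candidates.
    """
    candidates = (
        (score, doc_id, shard)
        for shard, hits in shard_hits.items()
        for score, doc_id in hits
    )
    return heapq.nlargest(rows, candidates, key=lambda hit: hit[0])


def fields_requests(top_hits):
    """Phase two: group the winning doc ids by shard so only they are fetched."""
    by_shard = defaultdict(list)
    for _, doc_id, shard in top_hits:
        by_shard[shard].append(doc_id)
    return dict(by_shard)


shard_hits = {
    "shard1": [(9.1, "a"), (4.0, "b")],
    "shard2": [(7.5, "c"), (6.2, "d")],
}
top = merge_top_hits(shard_hits, rows=2)
print(top)                   # [(9.1, 'a', 'shard1'), (7.5, 'c', 'shard2')]
print(fields_requests(top))  # {'shard1': ['a'], 'shard2': ['c']}
```

Splitting the work this way keeps the first round of inter-node traffic small: full documents travel only for the hits that actually make the final page.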

Speakers

Chris Hostetter

Software Engineer, Lucidworks
Chris 'Hoss' Hostetter is a Member of the Apache Software Foundation, and a committer on the Lucene/Solr Project. Prior to joining Lucidworks in 2010 to work full time on Solr development, he spent 11 years as a Principal Software Engineer for CNET Networks thinking about searchi...


Thursday September 14, 2017 3:10pm - 3:50pm
South Seas D

3:10pm

Search Relevance Organizational Maturity Model
In this talk, Eric Pugh, a long-time Solr practitioner, discusses his broad experience across hundreds of organizations delivering smart search. He introduces a maturity model to help you think through where your team is on its road to smarter search.

Smarter search drives value to your business. Delivering search that matches users to the right content (jobs, products, articles, whatever) is what you care about. But organizations often get stuck getting there. Why? It turns out you need quite a number of ingredients to deliver tremendous search. You need the intelligence to understand what users are searching for and whether they're satisfied. You need the domain expertise, infrastructure, and data science to extract meaningful features from your content, user personas, and user queries. And, more mundanely, you need to install, scale, and operate a search engine!

All of this can send your head spinning! This talk will give you the tools to see the search forest for the trees. Come learn from Eric about your next steps on the road to delivering great search!

Speakers

Eric Pugh

Solr Guru, OpenSource Connections
Fascinated by the “craft” of software development, Eric Pugh has been heavily involved in the open source world as a developer, committer, and user for the past 5 years. He is an emeritus member of the Apache Software Foundation and lately has been mulling over how we move from t...


Thursday September 14, 2017 3:10pm - 3:50pm
South Seas C

3:10pm

Solr for Enterprise Channels Business

What matters in today's world?

Time
Speed
Performance
Framework

With Solr, our approach is to automate the mechanisms of the channels business so that it handles huge data sets with a scalable framework, shortens the development lifecycle, and automatically exposes data elements across different technologies (Hadoop, Java, Solr, Oracle).

We are going to discuss this in detail and show how to thread multiple technologies together.


Speakers

Khalid Imam

IT Engineer, Cisco Systems
Khalid Imam has been a technical lead and a Partner Security Architect at Cisco, with 5 years of experience in the Cisco Channels business. He has been involved in different Cisco technologies like Spark and Cisco-specific authentication mechanisms, and has a keen interest in new technol...

Srini Samudrala

Senior Architect, Architecture IT, Cisco Systems
Srini Samudrala is a Cisco veteran with 18 years of experience in the Cisco Channels business as a Senior Architect (Architecture IT). He has been a key member of the organization in ensuring a high standard, with strong architecture components, and in tying different technologies and processes together to ensure...


Thursday September 14, 2017 3:10pm - 3:50pm
South Seas B

3:50pm

Break / Happy Hour
Thursday September 14, 2017 3:50pm - 4:10pm
TBA

4:10pm

Lightning Talks
Thursday September 14, 2017 4:10pm - 5:40pm
South Seas Ballroom

6:00pm

Conference Party
Join us for the Lucene/Solr Revolution 2017 Conference Party at Skyfall Lounge at the top of the Delano (connected to Mandalay Bay) for an evening of networking, music, appetizers, drinks and fun, paired with amazing views of the Las Vegas Strip.

Thursday September 14, 2017 6:00pm - 9:00pm
Skyfall Lounge
 
Friday, September 15
 

8:00am

Breakfast
Friday September 15, 2017 8:00am - 9:00am
TBA

9:00am

Day 2 Opening Remarks
Friday September 15, 2017 9:00am - 9:10am
South Seas Ballroom

9:10am

Fusion and Vegas.com
Find out how Vegas.com, the top destination travel site in the world, uses Fusion-powered full-text search results as a major e-commerce driver.

Speakers

Paul Mello

Director of Product Management, Special Projects, Vegas.com
Paul Mello is the Director of Product Management, Special Projects for Vegas.com. He is now in his 18th year at the online travel site. Over his years there, he has managed teams responsible for mobile, marketing automation, online product merchandising, e-mail marketing, customer experience, third-party vendor selection and e-commerce...


Friday September 15, 2017 9:10am - 9:30am
South Seas Ballroom

9:30am

The Search for Better Search at Reddit
Speakers

Luis Bitencourt-Emilio

Senior Director of Engineering, Reddit Intelligence Group, Reddit
Luis Bitencourt-Emilio is the Senior Director of Engineering for the Reddit Intelligence Group (RIG). He leads this new team in building an industry-leading data and AI discipline at Reddit, encompassing our search, relevance, data engineering and anti-evil efforts. Luis was prev...

Nick Caldwell

VP of Engineering, Reddit
Nick Caldwell is the VP of Engineering at Reddit, where he is responsible for building and operating the 4th most visited site in the US. Prior to joining Reddit, he held various positions in engineering leadership at Microsoft across a 15-year career, including work on natural language processing, enterprise search, machine learning, in-memory databases, and business intelligence. Nick's most significant role at Microsoft was as General Manager for Power BI, where he rapidly transformed the company's business intelligence suite...

Chris Slowe

CTO, Reddit
Chris is CTO and Founding Engineer of Reddit. Though a software engineer by vocation, his first attempt at a career started with his finishing a PhD in experimental physics at Harvard where he learned about the importance of modeling, critical thinking, statistics, and (honestly...


Friday September 15, 2017 9:30am - 10:15am
South Seas Ballroom

10:15am

Break
Friday September 15, 2017 10:15am - 10:30am
TBA

10:30am

R to forecast Solr activity
This session is a deep review of Solr performance management and of how to set up a scalable Solr infrastructure that matches future user activity.

Taking advantage of Solr logs and time series functions in R, we can build custom models to analyze Solr activity and highlight periodicity. Using other kinds of R functions, such as exception management, we can highlight activity peaks and decide whether to keep them in the predictive model. Using association functions in R, we can analyze user search behavior (such as cascading search, where one search criterion leads to another facet of search).

Once a user's activity is statistically validated (with exploration and discovery techniques using dashboards, OLAP, etc.), we can develop custom predictive models in R to forecast user activity and adapt the Solr infrastructure to this expected workload.

The last piece of the framework is a comparison between forecast and actual activity, used to adjust the custom model and address machine learning points of interest.
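The talk uses R, where the built-in `acf()` function computes autocorrelation. To keep all examples here in one language, the sketch below shows the same periodicity idea in Python: it picks the lag with the strongest autocorrelation in hypothetical hourly query counts. The data and helper names are invented for this example and are not taken from the session.

```python
def autocorrelation(series, lag):
    """Sample autocorrelation of `series` at the given lag."""
    n = len(series)
    mean = sum(series) / n
    variance = sum((x - mean) ** 2 for x in series)
    if variance == 0:
        return 0.0
    covariance = sum(
        (series[i] - mean) * (series[i + lag] - mean) for i in range(n - lag)
    )
    return covariance / variance


def dominant_period(series, max_lag):
    """Lag (in samples) with the strongest positive autocorrelation."""
    return max(range(1, max_lag + 1), key=lambda lag: autocorrelation(series, lag))


# Hypothetical hourly query counts for two weeks: busy 08:00-19:59, quiet otherwise.
hourly_queries = [100 if 8 <= hour % 24 < 20 else 10 for hour in range(24 * 14)]
print(dominant_period(hourly_queries, max_lag=36))  # 24
```

A detected 24-hour period like this is the kind of seasonality a forecasting model would then extrapolate when sizing the cluster.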

Speakers

Patrick Beaucamp

CEO, Bpm-Conseil
Patrick Beaucamp is founder of the Vanilla project, the only true Open Source Business Intelligence Platform, and Chairman of Bpm-Conseil, the company behind the Vanilla project. Patrick is a regular speaker at the Open Source Conference to talk about data visualisation, b...


Friday September 15, 2017 10:30am - 11:10am
South Seas B

10:30am

Learning-to-Rank with Apache Solr and Bees
Is "machine learning + open source search + social insects" a bit of a gamble for a talk topic? Absolutely, this conference is in Las Vegas after all!

Join Lucene/Solr committer and beekeeper Christine for an easy step-by-step walkthrough of Apache Solr's Learning-to-Rank plug-in. No prior Solr or machine learning experience needed. We will index some bees, err, I mean honey- and bee-related tweets, and then use those documents to come up with different machine-learned re-ranking models to improve the relevance of search results.
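To make the re-ranking step concrete, here is a minimal Python sketch of a linear model that re-scores only the top N results from a per-document feature map, which is roughly what a linear learning-to-rank model does. The feature names, weights and documents are hypothetical, and the code does not use the Solr plug-in's API.

```python
def rerank(hits, weights, top_n):
    """Re-score the first `top_n` hits with a linear model; keep the rest as-is.

    hits: list of dicts with an "id" and a "features" dict (name -> value).
    weights: hypothetical learned weights (feature name -> weight).
    """
    def model_score(hit):
        return sum(weights.get(name, 0.0) * value
                   for name, value in hit["features"].items())

    head = sorted(hits[:top_n], key=model_score, reverse=True)
    return head + hits[top_n:]


hits = [
    {"id": "t1", "features": {"original_score": 2.0, "mentions_honey": 0.0}},
    {"id": "t2", "features": {"original_score": 1.5, "mentions_honey": 1.0}},
    {"id": "t3", "features": {"original_score": 1.0, "mentions_honey": 1.0}},
]
weights = {"original_score": 1.0, "mentions_honey": 1.0}
print([h["id"] for h in rerank(hits, weights, top_n=2)])  # ['t2', 't1', 't3']
```

Note that `t3` stays last even though the model would score it as highly as `t1`: only documents inside the re-rank window are reordered, which keeps the (more expensive) model off the long tail.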

Speakers

Christine Poerschke

Software Developer, Bloomberg
Christine is a software developer in the UK. Originally from Germany, she joined Bloomberg in 2004 directly after BSc and PhD time at university and is currently part of the News Search Infrastructure team. Christine led the effort to integrate Bloomberg's multi-author Learning-t...


Friday September 15, 2017 10:30am - 11:10am
Banyan AB

10:30am

Autoscaling Solr

A large body of work is underway to build autoscaling into Solr, with the goal of improving cluster stability and performance as well as making cluster management simpler and easier for Solr users. The first autoscaling features are expected to ship starting with Solr 7. We will go through the various features that help you set up autoscaling for your Solr clusters, whether in the cloud or in-house, so that:

1. The cluster can show notifications or raise alarms on important cluster events such as a node joining the cluster or leaving the cluster

2. Newly added nodes automatically start sharing the traffic and reduce overall system load across the cluster

3. The indexes hosted on nodes that either die or are decommissioned are automatically moved to other nodes in the cluster

4. The cluster automatically attempts to reduce average system load or tries to optimize around administrator defined criteria

We will also discuss the internal design and relevant implementation details, as well as the pluggable components around these features, so that interested users can customize and extend the autoscaling features to suit their own needs.

Speakers

Shalin Shekhar Mangar

Senior Solr Consultant, Lucidworks
Shalin Shekhar Mangar has been an Apache Lucene/Solr committer since 2008 and is a member of the Lucene/Solr project management committee. He worked at AOL for five years on vertical search, content management systems, social/community platforms and anti-spam systems as well as AOL...


Friday September 15, 2017 10:30am - 11:10am
South Seas D

10:30am

New Replica Types: SolrCloud meets Master/Slave Replication
In most cases, SolrCloud's current distributed indexing works great. There is a subset of use cases for which the legacy Master/Slave replication may be a better fit, such as cases where NRT is not required and where read availability is more important than consistency. For such cases we are adding to Solr a way to choose among different types of replicas that treat updates in different ways. With a combination of replica types, one can create a SolrCloud cluster that behaves like the Master/Slave architecture of Solr < 4.0 and provides separation of responsibilities (search vs. index) while still getting most of the SolrCloud benefits, like high availability of writes, replica discovery, the Collections API, etc.

This talk will be a deep dive into the Replica Type feature, reasons for implementing it, differences between the existing types and how/why one would choose to use them, and implementation details of the feature.

Speakers

Tomás Fernández Löbbe

Software Engineer, Apple
Tomás Fernández Löbbe has more than 10 years of experience as a software engineer. He is a committer and PMC member of Apache Lucene/Solr and a Software Engineer at Apple. Previously, Tomás worked as a Senior Software Engineer at AWS, on the Amazon CloudSearch and Amazon Elasticsearch services, a...


Friday September 15, 2017 10:30am - 11:10am
South Seas A

10:30am

Evaluation of Lucidworks Fusion for Enterprise Search at Sandia National Laboratories
In the past, Sandia National Labs has used several COTS products for enterprise search. A few years ago the decision was made to abandon all COTS Enterprise Search products and use Solr and custom in-house developed applications for our enterprise search needs. The enterprise search marketplace has changed since that decision was made and the expectations of our search users have grown dramatically, making it hard for us to keep up with expectations. Lucidworks Fusion looked to us like a viable product that could be used to help us achieve our enterprise search goals at a reasonable cost. We conducted a three-month "Proof of Concept" study to evaluate Lucidworks Fusion against our current and projected future requirements. This talk will discuss how we evaluated Lucidworks Fusion, what we looked at in terms of our requirements, and the results of our evaluation.

Speakers

Clay Pryor

Principal Member of Technical Staff, Sandia National Laboratories
Clay is a principal member of technical staff at Sandia National Labs where he has spent over 30 years contributing to many software development efforts ranging from stand-alone disconnected systems to enterprise web applications. For the past two years he has been given the oppo...


Friday September 15, 2017 10:30am - 11:10am
South Seas C

11:20am

Optimize Is (Not) Bad For You - Deep Dive Into The Segment Merge Abyss
They say optimize is bad for you, they say you shouldn't do it, they say it will invalidate operating system caches and make your system suffer. This is all true, but is it true in all cases? In this talk we will take a closer look at what optimize (more accurately called force merge) does to your Solr search engine. You will learn what segments are, how they are built and how they are used by Lucene and Solr for searching. We will discuss real-life performance implications for Solr collections that have many segments on a single node and compare them to collections where the number of segments is moderate or low. We will see what we can do to tune the merging process to trade indexing performance for better query performance, and what pitfalls await us. Finally, at the end of the talk we will discuss ways of running force merge that avoid system disruption while still benefiting from the query performance boost that a single-segment index provides.
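A toy simulation can make the merge tradeoff tangible. The sketch below assumes a simple log-structured policy (closer in spirit to Lucene's older LogMergePolicy than to the default TieredMergePolicy): every flush writes a small segment, and whenever `merge_factor` segments pile up on one level they are merged into a single segment one level up. All numbers are illustrative, not measurements.

```python
def simulate_log_merges(num_flushes, docs_per_flush, merge_factor):
    """Toy log-structured merge policy.

    Every flush writes one new segment at level 0. Whenever `merge_factor`
    segments accumulate on a level, they are merged into one segment on the
    next level up. Returns (segment sizes, total documents rewritten by merges).
    """
    levels = {}      # level -> sizes of the segments currently on that level
    rewritten = 0    # documents copied into new segments by merging
    for _ in range(num_flushes):
        level, size = 0, docs_per_flush
        while True:
            bucket = levels.setdefault(level, [])
            bucket.append(size)
            if len(bucket) < merge_factor:
                break
            size = sum(bucket)       # merge the whole level into one segment
            rewritten += size
            levels[level] = []
            level += 1
    segments = [s for lvl in sorted(levels) for s in levels[lvl]]
    return segments, rewritten


for factor in (10, 2):
    segments, cost = simulate_log_merges(99, docs_per_flush=10, merge_factor=factor)
    print(factor, len(segments), cost)
# 10 18 900
# 2 4 5460
```

The lower merge factor leaves a query only 4 segments to visit instead of 18, but indexing rewrites about six times as many documents. A force merge sits at the extreme end of that curve: one segment, paid for with one large rewrite.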

Speakers

Rafał Kuć

Engineer, Sematext Group, Inc
Rafał Kuć is a search consultant, trainer and software engineer at Sematext Group, Inc., mainly focused on Lucene, Solr, Elasticsearch and the tools around them in the ecosystem. Rafał is the author of the Apache Solr Cookbook series and Elasticsearch Server. He is a father, a consu...


Friday September 15, 2017 11:20am - 12:00pm
South Seas B

11:20am

Faster Data Analytics with Apache Spark using Apache Solr
Apache Spark is a fast and general engine for big data processing, with built-in modules for streaming, SQL, machine learning and graph processing. Spark SQL allows users to execute relational queries in Spark with distributed in-memory computations. Though Spark gives us fast in-memory computations, Solr is blazing fast for some analytic queries. In this talk, we will take a deep dive into how to optimize the SQL queries from Spark to Solr by plugging into the Spark LogicalPlanner using pushdown strategies. The key takeaways from the talk will be:

How to perform Spark SQL queries with Apache Solr?
What happens inside a Spark SQL query?
How to plug into Spark Logical Planner?
What type of push-down strategies are optimal with Solr?
Examples of push-down strategies

Speakers

Kiran Chitturi

Data Engineer, Lucidworks
Kiran Chitturi is a software developer at Lucidworks. He works on Fusion, Lucidworks' enterprise product, and currently leads the development of spark-solr (https://github.com/LucidWorks/spark-solr). He is part of the Smart Data team at Lucidworks working on Analytics features for...


Friday September 15, 2017 11:20am - 12:00pm
South Seas A

11:20am

Vector Embeddings for Solr
In this session, we will go over various ways nonlinear vector embeddings can be used for search:

* Word2Vec and GloVe learn a "conceptual" vector space of lower dimension than the input vocabulary, and as such can be used in IR in similar ways to other dimensionality reduction techniques such as LSI, pLSI, and LDA.
* Doc2Vec can learn better vectors for documents larger than a single sentence, and can also learn supervised class labels at the same time, to create a classifier for both documents and queries.
* Shallow neural net-based embeddings trained on click data as a supervisory signal can learn a joint model over queries and document snippets to perform learning to rank.

We will demonstrate how to train all three of these models with open source tools and integrate them with a Solr-based search engine.
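As a minimal illustration of the retrieval side only (not of training Word2Vec, GloVe or Doc2Vec), the sketch below averages hypothetical word vectors into document vectors and ranks documents by cosine similarity to the query vector. Averaging word vectors is a simple baseline chosen for brevity, and the two-dimensional vectors are made up.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def embed(tokens, vectors):
    """Average the vectors of known tokens; None if no token has a vector."""
    known = [vectors[t] for t in tokens if t in vectors]
    if not known:
        return None
    return [sum(column) / len(known) for column in zip(*known)]


def rank(query, docs, vectors):
    """Return doc ids ordered by cosine similarity to the query embedding."""
    q = embed(query.split(), vectors)
    if q is None:
        return []
    scored = []
    for doc_id, text in docs.items():
        d = embed(text.split(), vectors)
        if d is not None:
            scored.append((cosine(q, d), doc_id))
    return [doc_id for _, doc_id in sorted(scored, reverse=True)]


# Hypothetical 2-D vectors: dimension 0 ~ "bees", dimension 1 ~ "search".
vectors = {
    "honey": [0.9, 0.1], "bee": [0.8, 0.2], "hive": [0.85, 0.15],
    "solr": [0.1, 0.9], "index": [0.2, 0.8],
}
docs = {"d1": "solr index", "d2": "bee hive"}
print(rank("honey", docs, vectors))  # ['d2', 'd1'] although d2 never says "honey"
```

The point of the toy data is the last line: a keyword match on "honey" finds nothing, while the embedding space still ranks the bee-related document first.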

Speakers

Jacob Mannix

Lead Data Engineer, Lucidworks
Living at the intersection of search, recommender systems, and applied machine learning, with an eye for horizontal scalability and distributed systems. Currently Lead Data Engineer in the Office of the CTO at Lucidworks, doing research and development of data-driven applications on Lucene/Solr and Spark. Previously built out...


Friday September 15, 2017 11:20am - 12:00pm
Banyan AB

11:20am

Search LIKE %SQL%
Sometimes customers ask to search for substring occurrences (LIKE %SQL%), completely ignoring the idea of keyword search.

We also face this challenge with chemical corpora and in bioinformatics.

Searching LIKE %SQL% is surprisingly hard in search engines. During the session, we'll look at the data structures behind the Lucene index and discuss what makes such searches so heavy. Then we'll describe common but inefficient techniques like edge n-gramming and reversing. Finally, we'll look at how to address it with built-in algorithms, keeping customization to a minimum.
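The following Python sketch, using a made-up list of terms, shows why the reversing trick works and why infix search stays expensive: a sorted term dictionary answers prefix queries (`sql*`) with a binary search, a second dictionary of reversed terms turns suffix queries (`*sql`) into prefix queries, but `%sql%` still falls back to scanning every term. This is a simplified model, not Lucene's actual term dictionary.

```python
import bisect


def prefix_lookup(sorted_terms, prefix):
    """Binary-search to the first candidate, then walk while terms share the prefix."""
    start = bisect.bisect_left(sorted_terms, prefix)
    matches = []
    for term in sorted_terms[start:]:
        if not term.startswith(prefix):
            break
        matches.append(term)
    return matches


terms = ["mysql", "nosql", "postgresql", "solr", "sqlite", "sqlserver"]
forward = sorted(terms)
reversed_terms = sorted(t[::-1] for t in terms)  # extra "reversed" dictionary


def trailing_wildcard(prefix):   # sql*  -> cheap: one binary search
    return prefix_lookup(forward, prefix)


def leading_wildcard(suffix):    # *sql  -> cheap only thanks to reversed terms
    return sorted(t[::-1] for t in prefix_lookup(reversed_terms, suffix[::-1]))


def infix_scan(fragment):        # %sql% -> sorted order does not help: full scan
    return [t for t in forward if fragment in t]


print(trailing_wildcard("sql"))  # ['sqlite', 'sqlserver']
print(leading_wildcard("sql"))   # ['mysql', 'nosql', 'postgresql']
print(infix_scan("sql"))         # ['mysql', 'nosql', 'postgresql', 'sqlite', 'sqlserver']
```

Reversing doubles the term dictionary to make one kind of wildcard cheap; the infix case is where the talk's alternative techniques come in.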

Note: this talk is not about introducing suffix arrays, and has nothing to do with the recent Solr SQL functionality.

Speakers

Mikhail Khludnev

Chief Engineer in Search, EPAM
Mikhail is a chief software engineer in the EPAM Search Competency Center, where he helps customers with indices of a terabyte and above. He worked in e-commerce search for many years, mostly focusing on handling relations in indices with joins and aiming for great relevancy with concept...


Friday September 15, 2017 11:20am - 12:00pm
South Seas D

11:20am

Indexing Videos in Solr
FindLectures.com is a discovery engine for tech talks, historic speeches, and academic lectures. The site rates audio and video content for quality, showing different recommended talks each day on a variety of topics.

FindLectures.com crawls conference sites to get talk metadata, such as speaker names and bios, descriptions, and the date a video was recorded. Often these attributes are sparsely populated, or scattered across multiple websites. Additional attributes are inferred from audio and video content, but require more sophisticated data extraction to be useful in a text-oriented search engine like Solr.

This talk will discuss interesting lessons learned from crawling historical videos, demonstrate information extraction with machine learning, and show how to map real world problems to search engine functionality.

Speakers

Gary Sieling

Software Architect, Wingspan Technology
Gary Sieling is a Software Architect at Wingspan Technology in Blue Bell, PA, with an interest in database technologies and software engineering practices. He is involved in curating talks for a company lunch-and-learn program and serves on the organizing committee for a Philadelphia...


Friday September 15, 2017 11:20am - 12:00pm
South Seas C

12:00pm

Lunch
Friday September 15, 2017 12:00pm - 1:00pm
TBA

1:10pm

Relevance in the Wild
Let's talk relevance: the one aspect of search that users perceive directly, and the one that determines their satisfaction with search.

Solr allows us to steer every aspect of the score calculation to decide the order in which results are returned. Many aspects can be taken into account when determining which document is most relevant to a user's specific need. Above all, the goal is to keep moving forward, steadily improving relevance and verifying every step with the proper toolbox for evaluating results.

In this talk we are going to describe a long list of examples from dozens of real projects that use many types of signals from users and documents, applying them with the wide range of features available in Solr. We will define relevance testing, a way of working that guarantees control over the state of relevance, along with quantitative evaluation of its results. By the end of the presentation you will have the tools and methods to know what information to use and how to use it to get the relevant information to the top.

Speakers

Daniel Gomez Villanueva

IT Consultant / Findability Expert, Findwise
Daniel graduated in Computer Science at UPM, earned a master's in security at KTH, and took many information retrieval courses. He has been using Solr and many search technologies for the last 5 years to improve the access, visualization and navigation of information in companies in Scandinavia...


Friday September 15, 2017 1:10pm - 1:50pm
South Seas B

1:10pm

Exploring Direct Concept Search
This session will present experiments in extending Lucene's Dimensional Points to directly store word embedding vectors as document terms, enabling direct concept search by mapping terms and phrases at index and query time. The effects of varying distance in the word embedding space at query time will be explored. Impacts on retrieval effectiveness and speed will be presented.

Speakers

Steve Rowe

Senior Software Engineer, Lucidworks
Steve Rowe is a Member of the Apache Software Foundation, and a committer and PMC member on the Lucene/Solr Project. Prior to joining Lucidworks in 2012, he spent 10 years working on NLP as a Research Software Engineer at the Center for Natural Language Processing at Syracuse Uni...


Friday September 15, 2017 1:10pm - 1:50pm
Banyan AB

1:10pm

A Multifaceted Look At Faceting - Using Facets 'Under the Hood' to Facilitate Relevant Search
The talk will discuss some novel techniques that reveal additional ways in which Solr facets can enrich search experiences. Facets are a tried-and-true UI navigation and visualization tool. They are also used to build slick dashboards with bar and pie charts, using pivot facets, range and function queries, facet statistics and so forth. That Solr facets are determined at query time is crucial in this respect! Facet metadata can also be used to drive query intent detection that borders on NLP, and to create contextual sidecar indexes for semantically based, multi-field query typeaheads with real-time suggestion boosting. Facets can also be used to explore term relatedness and to develop subject classifiers based on "keyword clusters" that provide excellent unsupervised machine learning capabilities. So facets not only have a role at query time but can also help drive the analytic indexing processes that fuel content enrichment, and thus user experience enrichment, by feeding semantic contexts revealed by facets back into sidecar collections or the source collections they came from. One result is, you guessed it, BETTER FACETS: a "virtuous" cycle to be sure!

Speakers

Ted Sullivan

Senior Solutions Architect, Lucidworks
Ted is a Senior Solutions Architect at Lucidworks with almost 20 years of experience in search, including the "vendor engines" like Verity and Fast. He has contributed a number of blogs on Solr and Fusion over the past few years on topics such as Autophrasing, Query Autofiltering, Te...


Friday September 15, 2017 1:10pm - 1:50pm
South Seas D

1:10pm

Fusion Ecommerce Case Study: Bluestem Brands Inc.

Over the last 18 months, in partnership with Lucidworks, Bluestem Brands has been treated to a whirlwind tour of search engine modernization. Join us as we explore the highs and the lows of a rapidly exploding index size, strict performance goals, well-meaning workarounds, a Fusion implementation, the holiday sales season, and the business transformations that Fusion-backed functionality has since motivated. From problems to pipelines, sales to servers, and couches to sharks, we’re going to cover it all.


Speakers

Jacob Wagner

Director IT - Content, Bluestem Brands Inc.
Jacob is a proud father of three, a maker, a musician, and an IT Director at Bluestem Brands Inc – a multi-branded retail company specializing in e-commerce and catalog-driven direct-to-consumer sales.


Friday September 15, 2017 1:10pm - 1:50pm
South Seas C

1:10pm

Fusion Use Cases at Morgan Stanley Wealth Management
Two years ago, Morgan Stanley Technology decided to use Fusion/Solr as the Analytics Datastore. In his talk, Gyan will cover the currently implemented and future use cases for Fusion, the feedback and lessons learned so far, and why they chose Fusion.

Speakers

Gyanendra Singh

Executive Director, Morgan Stanley Technology
Gyan is Executive Director, Channel Analytics Solutions of Morgan Stanley Technology Division, responsible for Next-Best Offer/Action, Insights, Book Analysis and Search platform. For several years now, Gyan has pioneered the use of Hadoop and Fusion/Solr to solve Analytics and S...


Friday September 15, 2017 1:10pm - 1:50pm
South Seas A

2:00pm

Apache Solr: Upgrading Your Upgrade Experience
Despite widespread enterprise adoption, Solr lacks automated upgrade tooling. It has long been a challenge for users to understand the implications of a Solr upgrade. Users must manually review the Solr release notes to identify configuration changes, either to fix backward incompatibilities or to use the latest features in the new version. Additionally, users must find a way to migrate existing index data to the new version (either via an index upgrade or by re-indexing the raw data). Clearly, the Solr upgrade process can be cumbersome and error-prone.

In this talk, we will provide an overview of the typical challenges users face during a Solr upgrade. We will discuss a strategy that uses a set of config migration tools, as well as backup and disaster recovery capabilities, to help users navigate the Solr upgrade process reliably and with peace of mind. Finally, we will share common tips and tricks to remember while planning a Solr upgrade.

Speakers
Hrishikesh Gadre

Software Engineer, Cloudera Inc.
Hrishikesh Gadre is a software engineer at Cloudera working on Cloudera Search and a contributor to the Apache Lucene/Solr, Apache Hadoop and Apache Sentry projects. Prior to Cloudera, Hrishikesh worked for virtualization giant VMware, building a next-generation network/security virtualization platform. He has a...


Friday September 15, 2017 2:00pm - 2:40pm
South Seas B

2:00pm

Running Solr at Memory Speed with Alluxio
In this talk, I introduce Alluxio, the fastest-growing open source project in the big data ecosystem, and show how to leverage it to optimize Solr performance. I'll begin with a brief introduction to how Alluxio works and why it's interesting for the Solr community. Next, I describe how to run Solr on Alluxio and cover basic integration scenarios. Lastly, I provide some performance comparisons between running Solr on Alluxio vs. a local FS and HDFS. Attendees will come away with a new toolset to help them use Solr to tackle a wide array of big data problems.

Speakers
Timothy Potter

Solr committer & PMC member, Lucidworks
Timothy is a Lucene/Solr committer, PMC member, and senior engineer at Lucidworks where he leads the analytics framework team for Fusion. His current focus is on integrating Solr and Spark for large-scale analytics use cases. Prior to joining Lucidworks, Timothy was a big data ar...


Friday September 15, 2017 2:00pm - 2:40pm
South Seas A

2:00pm

Solr and Machine Vision
Facial recognition in production is difficult because neural networks are slow and expensive to train, and must be retrained to recognize new faces added to the set. Older approaches that address these issues, such as eigenfaces, exist but don’t scale, because they require a matrix decomposition. Apache Mahout offers a distributed singular value decomposition method, which scales to matrices of arbitrary size on Apache Spark, making it possible to use the older yet still powerful eigenfaces approach to recognize and add new faces in near real time (with the help of Solr).

In this talk we present a full-stack, lambda-style facial recognition system. The offline component uses Apache Mahout to compute the eigenfaces. The online component identifies faces in an image with an interchangeable module, decomposes each face into a linear combination of the eigenfaces, searches for a matching face using Solr, and, if no match is found, adds the face as a “new face”.
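
To make the online step concrete, here is a minimal Python sketch of the matching logic, assuming the mean face and orthonormal eigenfaces have already been computed offline (in the talk, by Mahout's distributed SVD). It projects a flattened face image onto the eigenfaces and compares the resulting weight vector with known faces. The Solr search is replaced by an in-memory linear scan, and the function names, Euclidean distance metric, threshold, and `face-N` labeling are hypothetical choices for illustration, not the presenters' implementation.

```python
import math


def project(face, mean, eigenfaces):
    """Weights of (face - mean) on each eigenface; assumes the eigenfaces are orthonormal."""
    centered = [f - m for f, m in zip(face, mean)]
    return [sum(e * c for e, c in zip(eigenface, centered)) for eigenface in eigenfaces]


def identify_or_add(face, mean, eigenfaces, gallery, threshold):
    """Return (label, is_new). A linear scan stands in for the Solr lookup."""
    weights = project(face, mean, eigenfaces)
    best_label, best_dist = None, math.inf
    for label, stored in gallery.items():
        dist = math.dist(weights, stored)  # Euclidean distance in eigenface space
        if dist < best_dist:
            best_label, best_dist = label, dist
    if best_dist <= threshold:
        return best_label, False
    # No match close enough: register these weights as a "new face".
    new_label = f"face-{len(gallery)}"
    gallery[new_label] = weights
    return new_label, True


if __name__ == "__main__":
    mean = [0.0, 0.0, 0.0, 0.0]  # toy 4-pixel "images"
    eigenfaces = [[1, 0, 0, 0], [0, 1, 0, 0]]
    gallery = {"alice": [1.0, 0.0]}
    print(identify_or_add([0.9, 0.1, 0.5, 0.5], mean, eigenfaces, gallery, 0.5))  # ('alice', False)
    print(identify_or_add([0.0, 3.0, 0.0, 0.0], mean, eigenfaces, gallery, 0.5))  # ('face-1', True)
```

In a real deployment each weight vector would presumably be indexed in Solr, so the nearest-match lookup happens in the search engine rather than in application code.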

Speakers
Scott Cote

Senior Software Engineer, Lucidworks
Trevor Grant

Open Source Technical Evangelist, IBM
Trevor Grant is a PMC member of the Apache Mahout project and a PPMC member of Apache Streams (incubating). By day he is an Open Source Technical Evangelist at IBM. In former roles he called himself a data scientist, but the term is so overused these days. He holds an MS in Applied Math an...


Friday September 15, 2017 2:00pm - 2:40pm
Banyan AB

2:00pm

Taxonomical Semantical Magical Search
Search practitioners often overlook the user's entire journey to seek and find. Users set out with broad searches, unsure what they'll find. They begin with broader concepts (such as 'laptop bag'), and the results give them an overview of what's possible. They then refine with finer-grained distinctions, such as 'Childs Laptop bag' or 'satchel bag', and continue refining toward narrower or adjacent concepts until they purchase or give up.

In this talk, I walk through how we build semantic search with Solr based on how users mentally structure your information. The key is taxonomies! I walk through how we generate taxonomies from search logs to build hierarchical synonyms, hypernyms, and hyponyms. I then discuss how to manipulate relevance scoring to get the effect users expect: high recall and a broad overview on broad queries, and high precision on narrower queries. Finally, I discuss a practice for refining managed taxonomies based on evolving user behavior, ever evolving toward findability nirvana!
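
The abstract does not spell out how the taxonomy feeds into scoring. One simple way to get the broad-query/narrow-query behavior it describes is query-time hyponym expansion, sketched here in Python under stated assumptions: the taxonomy is a small hand-written dict rather than one mined from search logs, each level further down gets a lower boost (a hypothetical `decay` factor), and the output is a Lucene standard query parser string using phrase boosts (`"..."^0.5`). The names and weights are illustrative, not the speaker's method.

```python
def expand_query(query, taxonomy, decay=0.5, max_depth=2):
    """Expand a query with its hyponyms, boosting each level lower than the one above.

    Broad concepts (with children) pull in narrower terms for recall;
    leaf concepts have no children, so they stay a single precise phrase.
    """
    clauses = [f'"{query}"']
    seen = {query}
    frontier = [query]
    weight = 1.0
    for _ in range(max_depth):
        weight *= decay
        next_frontier = []
        for term in frontier:
            for child in taxonomy.get(term, []):
                if child not in seen:  # guard against cycles in a hand-edited taxonomy
                    seen.add(child)
                    clauses.append(f'"{child}"^{weight}')
                    next_frontier.append(child)
        frontier = next_frontier
    return " OR ".join(clauses)


# Hypothetical taxonomy: concept -> narrower concepts (hyponyms).
TAXONOMY = {
    "bag": ["laptop bag", "tote bag"],
    "laptop bag": ["childs laptop bag", "satchel bag"],
}

if __name__ == "__main__":
    print(expand_query("laptop bag", TAXONOMY))
    # "laptop bag" OR "childs laptop bag"^0.5 OR "satchel bag"^0.5
    print(expand_query("satchel bag", TAXONOMY))
    # "satchel bag"
```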

Speakers
Doug Turnbull

Search Relevance Consultant, OpenSource Connections
Doug Turnbull is the author of [Relevant Search](http://manning.com/books/relevant-search) and the Lead Relevance Consultant at OpenSource Connections. Doug builds smarter (and more profitable!) search for organizations like O'Reilly Media, Careerbuilder, and the US Patent and Tradem...


Friday September 15, 2017 2:00pm - 2:40pm
South Seas D

2:00pm

Real Time Indexing Pipeline
In this session, we will explain the architecture of Trulia's (Zillow Group) search infrastructure, which supports real-time property updates with all changes visible to end users (renters or sellers) in less than 5 minutes. We will cover the technical architecture of the "Real Time Indexing Pipeline", which is built using SolrCloud, Lucene, Storm, Kafka, Redis, and microservices, all hosted on AWS. This talk will explore how we built a scalable search infrastructure and pipeline, and how we eliminated caching at search time to avoid delays in data visibility for users. We will also discuss how we made deployment fast and easy so the system can expand and scale.

Speakers
Girish Gudla

Senior Software Engineer, Zillow Group
Girish is a Senior Developer who graduated from ASU before joining the Trulia Search team. He works on building cloud search infrastructure, data pipelines, and scalable services powering different products.
Ashwani Kapoor

Senior Software Engineer, Zillow Group
Ashwani is passionate about playing with data and creating distributed, scalable search and big data solutions. Currently a Senior Search Engineer at Zillow Group, Ashwani works on and manages the search infrastructure that lists properties for sale and rent, as well as tools and information needed...


Friday September 15, 2017 2:00pm - 2:40pm
South Seas C

2:50pm

Securing Solr: Tips and Tricks and Things You Really Ought to Know
As of release 5.2, Solr comes out of the box with both authentication and authorization APIs, allowing you to define users, roles and permissions using the RuleBasedAuthorizationPlugin and the BasicAuthPlugin. That's the good news. The not-as-good news is that these plugins, while powerful, are a bit counterintuitive to configure. So we took it upon ourselves to spend some quality time with the Solr security architecture to a) understand just how this framework operates and b) identify its various idiosyncrasies. To that end, we've compiled a handy list of things to keep in mind when setting up and/or managing your Solr security.
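
Much of that configuration lives in a single `security.json` file. The following is a hedged Python sketch that assembles a minimal one using the BasicAuthPlugin and RuleBasedAuthorizationPlugin named above. The field names (`blockUnknown`, `credentials`, `permissions`, `user-role`) and the predefined `security-edit` permission follow the Solr Reference Guide as we understand it; the credential hashing (SHA-256 over salt plus password, hashed a second time, stored as base64 hash and base64 salt separated by a space) is our reading of Solr's default SHA-256 provider and should be checked against your Solr version. The user name and password are placeholders.

```python
import base64
import hashlib
import json
import os


def solr_credential(password, salt=None):
    """Return "<base64 hash> <base64 salt>" for a BasicAuthPlugin credentials entry.

    Assumption: SHA-256 over (salt + password), then SHA-256 again; verify for your version.
    """
    if salt is None:
        salt = os.urandom(32)
    digest = hashlib.sha256(salt + password.encode("utf-8")).digest()
    digest = hashlib.sha256(digest).digest()  # second hashing round
    return f"{base64.b64encode(digest).decode()} {base64.b64encode(salt).decode()}"


def build_security_json(passwords, user_roles):
    """Assemble a minimal security.json dict: basic auth plus rule-based authorization."""
    return {
        "authentication": {
            "class": "solr.BasicAuthPlugin",
            # Reject unauthenticated requests instead of letting them through.
            "blockUnknown": True,
            "credentials": {user: solr_credential(pw) for user, pw in passwords.items()},
        },
        "authorization": {
            "class": "solr.RuleBasedAuthorizationPlugin",
            # Only users holding the "admin" role may edit security settings.
            "permissions": [{"name": "security-edit", "role": "admin"}],
            "user-role": user_roles,
        },
    }


if __name__ == "__main__":
    # Placeholder user and password, for illustration only.
    config = build_security_json({"solr": "change-me"}, {"solr": "admin"})
    print(json.dumps(config, indent=2))
```

Hand-computing hashes is optional: once security is enabled, the Authentication API's `set-user` command can add users and hash their passwords for you.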

Speakers
Kevin Cowan

Search Engineer, Lucidworks
Despite being an English/Sociology major in college, I have actually worked in technology most of my life. I began writing programs in BASIC in the 1970s and 1980s, and then started writing in HTML and JavaScript in the 1990s. I started writing enterprise-class software in 1998, and...
Steve Harris

Lucidworks


Friday September 15, 2017 2:50pm - 3:30pm
South Seas B

2:50pm

Solr on Docker: the Good, the Bad and the Ugly
This session has two goals. First, we'll discuss the tradeoffs of running Solr on Docker. For example, you get dynamic allocation of operating system caches, but you also get some CPU overhead. We'll keep in mind that Solr nodes tend to be different from your average container: Solr is usually long-running, uses a fair amount of resident memory (RSS) and a lot of virtual memory. This implies, for example, that it makes more sense to use Docker on big physical boxes than on configurable-size VMs (like Amazon EC2).

The second goal is to discuss issues with deploying Solr on Docker and how to work around them. For example, many older (and some newer) combinations of Docker, Linux kernel and JVM have memory leaks. We'll go over Docker operations best practices, such as using container limits to cap memory usage and prevent the host OOM killer from terminating a memory-consuming process (usually a Solr node), or running Docker in Swarm mode over multiple smaller boxes to limit the spread of a single issue.

Speakers
Radu Gheorghe

Software Engineer, Sematext Group, Inc.
Radu Gheorghe is a search consultant, software engineer and trainer at Sematext, working mainly with Elasticsearch, Solr and logging-related projects.


Friday September 15, 2017 2:50pm - 3:30pm
South Seas A

2:50pm

An Intelligent, Personalized Information Retrieval Environment
Current enterprise search engines return prioritized results to users based on the engines’ internal ranking algorithms, without taking the unique attributes of users and documents into account. To improve users’ information retrieval experience, we are creating an environment that allows users to retrieve the information that best matches their personal, time-sensitive needs.

To achieve this level of personalization, we analyze documents published by members of the workforce using clustering and classification algorithms. By combining users' past information retrieval behavior with document metadata generated by our analytic techniques, we can build predictive models. These models are able to predict the information needed by specific groups of users and recommend appropriate content. We use state-of-the-art technologies, including Spark machine learning in a Hadoop environment and Convolutional Neural Networks, a deep learning architecture, to extract useful features from a large corpus of unstructured data. In addition, we developed and improved several machine learning algorithms for clustering, classification and auto-labeling.

Building profiles of users' information retrieval activity required the development of an extensive query and click tracking facility. To achieve a highly integrated information retrieval environment, we are replacing our homegrown query and click tracking database with the signals capabilities of Lucidworks Fusion. We will discuss how we approached the task of migrating from an internally developed logging system to the Fusion platform.

Speakers
John Herzer

Enterprise Search Project Manager, Sandia National Laboratories
John Herzer leads the enterprise search analytics effort at Sandia National Laboratories. He was instrumental in achieving the migration to open source search technology at Sandia and is guiding the effort to incorporate machine learning techniques to improve search results. Be...
Pengchu Zhang

Computer Science Researcher & Developer, Sandia National Laboratories
Pengchu Zhang has more than ten years of experience developing methods to improve enterprise information findability, mostly for unstructured data, using various machine learning technologies. Recently, he's focused on creating an information retrieval environment in organization...


Friday September 15, 2017 2:50pm - 3:30pm
Banyan AB

2:50pm

The Evolution of Solr Streaming Expressions: from Stream Processing to Distributed Functional Programming
Streaming Expressions started from a few simple concepts:

* Stream sources that originate streams
* Stream decorators that transform streams
* A simple functional syntax to tie it all together

Streaming Expressions now have over 80 expressions and evaluators, along with conditional logic, variables and data structures. These functions form the basis of a sophisticated functional programming language that supports a large number of parallel computing use cases, including Parallel SQL, MapReduce, Machine Learning, Anomaly Detection, Streaming NLP, Graph Traversal and Time Series Analysis.

This talk will cover the evolution of the language to date and where Streaming Expressions will likely be headed in the future.
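
The source/decorator idea is essentially function composition over lazy streams. As a rough analogy only (this is not how Solr implements Streaming Expressions), the Python sketch below models a source as a generator and each decorator as a function that wraps another stream. The function names are hypothetical stand-ins loosely modeled on Solr's `search`, `select`, and `top` expressions, and an in-memory list replaces a Solr collection.

```python
import heapq


def list_source(docs):
    """Stream source: originates a stream of tuples (here from an in-memory list)."""
    yield from docs


def select_fields(stream, fields):
    """Stream decorator: keeps only the named fields of each tuple, lazily."""
    for record in stream:
        yield {f: record[f] for f in fields if f in record}


def top_n(stream, n, key):
    """Stream decorator: emits the n tuples with the largest key, largest first."""
    return iter(heapq.nlargest(n, stream, key=key))


if __name__ == "__main__":
    docs = [{"id": "1", "a_i": 5, "b_s": "x"}, {"id": "2", "a_i": 9}, {"id": "3", "a_i": 7}]
    # Nesting mirrors the functional syntax: top(select(search(...)))
    result = top_n(select_fields(list_source(docs), ["id", "a_i"]), 2, key=lambda r: r["a_i"])
    print(list(result))  # [{'id': '2', 'a_i': 9}, {'id': '3', 'a_i': 7}]
```

Because every decorator takes a stream and returns a stream, decorators can wrap sources or other decorators in any order, which is what lets a small set of primitives grow into a full functional language.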

Speakers
Joel Bernstein

Search Engineer, Alfresco
Joel Bernstein is a Lucene/Solr committer and PMC member and search engineer for Alfresco.
Dennis Gove

Senior Software Engineer, Bloomberg LP
Dennis Gove is a member of the Search Infrastructure team at Bloomberg LP in New York. He is a Lucene/Solr Committer and lives in Massachusetts with his wife and two kids.


Friday September 15, 2017 2:50pm - 3:30pm
South Seas D

2:50pm

Event Search at StubHub
This session presents event search at StubHub, which has evolved significantly over the last few years. In this talk, we will go over how we use machine learning to help users find and select the right event. Specifically, we will cover some of the NLP, relevancy and ranking algorithms we have built on top of SolrCloud, and how we use SolrCloud to power our chatbot and near-real-time indexing platform. Using a combination of Solr and machine learning, we are able to connect users with inspiring live events and thereby improve conversion.

Speakers
Mayank Gupta

Lead Product Manager, StubHub
Mayank is a Product Manager and leads the Search and Browse experience teams at StubHub. Previously, Mayank worked at Williams-Sonoma, where he led the development of a new search engine that improved customer experience and conversion. Prior to that Mayank worked for a Ga...
Gopal Patwa

Engineering Manager, Search & Discovery, StubHub
Gopal Patwa is an engineering manager for the Search and Discovery team at StubHub. He has 16 years of experience developing Java applications using open-source technology and 7 years in information retrieval. At StubHub, he has designed and architected distributed systems for process...


Friday September 15, 2017 2:50pm - 3:30pm
South Seas C

3:30pm

Break
Friday September 15, 2017 3:30pm - 3:50pm
TBA

3:50pm

Closing Remarks with Grant Ingersoll

10 years of Lucidworks and Solr: Lessons Learned and a Look Forward

10 years. It’s a long time in the business world and an even longer time, it seems, in the search world. During that time, Lucidworks, Solr and search in general have seen a massive shift from keywords and ten blue links to advanced search analytics, machine learning and the ever-promised, never-quite-there hype of AI. As we wrap up another Lucene/Solr Revolution, Grant will look back at some of the highlights and key lessons learned using Solr during the past 10 years of Lucidworks and, more importantly, examine some of the key technical challenges facing the Solr community as we move forward.

**The App Game prize winner will be announced during this session. The winner must be present to claim the prize!**


Speakers
Grant Ingersoll

CTO, Lucidworks
Grant Ingersoll is a Solr/Lucene committer and CTO of Lucidworks. Grant is co-founder of the Mahout machine learning project, and a longstanding member of the Apache Software Foundation. Grant is also the co-author of Taming Text from Manning Publications.


Friday September 15, 2017 3:50pm - 4:30pm
South Seas Ballroom