EBay Shopbot Conversational AI Recommendation System - SYMPOSIUM - Big Data & Business Analytics
←
→
Page content transcription
If your browser does not render page correctly, please read the page content below
eBay Shopbot Conversational AI Recommendation System Getting Your Transformation Right SYMPOSIUM March 22-23, 2018 5th Annual Big Data & Business Analytics Symposium – March 22-23, 2018
eBay Shopbot Presentation Origin Story Presentation based on “eBay ShopBot: Graph- powered Conversational Commerce” Ajinkya Kale, Senior Applied Researcher, AI, New Product Development, eBay, Anju Vasta, New Product Development, eBay eBay ShopBot: Graph-powered Conversational Commerce, GraphConnect (2017, Oct. 23-24) [Video]. 5th Annual Big Data & Business Analytics Symposium – March 22-23, 2018 2
eBay Marketplace at a Glance One of the world’s largest and most vibrant marketplaces Takeaway: Scale, $20B 1.1B 80% Items sold as Volume, Variety, GMV Live listings new and Velocity Matters 67% 12M 60% Transactions New listings Platform that ship for added via GMV free (US, UK, DE) mobile per touched by week mobile Data as of Q1 2017 5th Annual Big Data & Business Analytics Symposium – March 22-23, 2018 3
eBay Velocity and Variety 2017 Fun Facts: Velocity Frequency of product purchases via desktop and mobile Stats by Region Takeaway: Behavior is complex US UK A tool is purchased every 11 sec A makeup product is purchased every 3 sec A smartphone is purchased every 5 sec A car part is purchased every 1 sec A watch is purchased every 4 sec An appliance is purchased ever 8 sec DE AU A tire is purchased every 16 sec A wedding item is purchased every 26 sec A tablet is purchased every 3 sec A home decor item is purchased every 14 sec A Lego is purchased every 18 sec A car or truck part is purchased every 4 sec 5th Annual Big Data & Business Analytics Symposium – March 22-23, 2018 4
eBay Shopbot – Natural Language Understanding NLU is not a rule based search engine – Personalization is K EY! Personal Powered Developed Shopping By AI at eBay’s Assistant NPD Group Conversational eBay’s goal is Shopbot Commerce to be as close launched in bridges the gap as possible 2016 on between using AI tech facebook stateless such as Natural messenger search engine Language platform. Now and a Understanding, also available shopper’s Knowledge thru Google actual intent Graphs and Home Computer Vision 5th Annual Big Data & Business Analytics Symposium – March 22-23, 2018 5
eBay Shopbot Why – did eBay build their own Shopbot? Why - is it hard to build a Shopbot • 3rd Party Bot Frameworks Most production chatbots have curated dialog flows Tuned for general purpose bots • It’s easy for restricted domains like flight booking bots o Intent detection (e.g. weather, flight schedules,…) • With 20k+ categories of products with 150K+ attributes, o Entity extraction (e.g. number, temperature,…) its hard to get right and scale! Limited non-linear conversation support • Need a solution which scale for dialog conversations on Coarse grained bot memory thousands of products No inherent gain with experience • Different interaction / input formats such as voice and Existing offerings in various states of maturity o Scalability (data size, complexity, and speed) image o Capabilities, API’s, tools, extensibility… • Speed and effectiness at (any) scale is required by users What – is eBay trying to solve with Shopbot? • Conversational Commerce not search Why - is Shopbot important to eBay • Dialog systems • Unreasonable Effectiveness of Data – Halevy,Norvig,Pereria • What’s the next best question to ask • eBay has seen “almost” all queries that are important • How do we build the learning system that humans • No production system runs dialogs just based on Deep gain inherently with experience Learning models • Can collaborative inferencing be used In other terms… To sell more stuff! 5th Annual Big Data & Business Analytics Symposium – March 22-23, 2018 6
Natural Language Understanding 5th Annual Big Data & Business Analytics Symposium – March 22-23, 2018 7
eBay Shopbot How eBay addressed the need for continuous NLU Shopbot functionality ● Build a probabilistic inference graph o Data comes from eBay.com user behavioral patterns and other sources o When a user is looking for a “running shoes” what’s the most important aspect of the shoe that they care about (maybe NLU style query flow here with next questions) ● Enter GraphDB (Neo4j) for eBay Shopbot knowledge cache o Tried RDBMS o Needed query / speed / schema flexibility / production system capabilities provided by the Neo4j property graph database technology and tools 5th Annual Big Data & Business Analytics Symposium – March 22-23, 2018 8
eBay ShopKnowledge Workflow at a Glance Product Knowledge Query Understanding Trends Apache Airflow Price Knowledge Prediction Graph Aspect Reco Entity Extraction Data Sources Microservices 9 5th Annual Big Data & Business Analytics Symposium – March 22-23, 2018 9
eBay Data Not just com m erce data Data Sources Graph DB Characteristics and High Level Functionality ● Latest wikidata stats ● > 0.5B nodes ● Items / txn items ● > 16B relationships ● Buyer behavioral data… ● Functionality ● WiW – What’s It Worth o Probabilistic graphical models ● And… o Supervised and semi-supervised models 10 5th Annual Big Data & Business Analytics Symposium – March 22-23, 2018 10
Under the Shopbot hood ... ● Probabilistic inferencing on behavioral patterns from past users ● Almost like Expert system navigation o Parse, categorize, group, contextualize natural language ● Easy to combine World Knowledge to augment user behavior data with external datasets: o Curation to augment data o Machine learning models to augment data ● ML model as a cache in Neo4j 5th Annual Big Data & Business Analytics Symposium – March 22-23, 2018 11
Inferencing From the Graph How to ask the “nex t best question” Color Red Women’s Ask for shoes by color – Shoes likely to be a woman or man? Blue Basis to “ask the next shoes best question” Coach e.g. F: Brand question / Men’s suggestion Shoes M: Color question / ALDO suggestion Brand 5th Annual Big Data & Business Analytics Symposium – March 22-23, 2018 13
An expanded look at part of a Knowledge Graph K now ledge Graph Encapsulates Shopping Behavior 5th Annual Big Data & Business Analytics Symposium – March 22-23, 2018 14
An expanded look at part of a Knowledge Graph Look ed sim ple. W hy need queries and algorithm s? Bags 5th Annual Big Data & Business Analytics Symposium – March 22-23, 2018 15
An expanded look at part of a Knowledge Graph W hat’s the ”nex t best question” (a.k .a. query the graph) Handbag Men’s Backpack 5th Annual Big Data & Business Analytics Symposium – March 22-23, 2018 16
Transfer Learning From Interaction Ex am ple Product Conversation Category: Athletic Shoes Product: Eggplant Foamposite Sneakers Brand: Nike Style: Basketball Shoes Release Date: 2009 I am looking for the eggplant foamposites. Color: Material: Foamposite Purple Image Source: https://www.sneakershouts.com/sneaker-deals/2017/8/4/nike-air-foamposite-one-eggplant-under-retail 5th Annual Big Data & Business Analytics Symposium – March 22-23, 2018 17
Which one would you pick? ”K now ledge” added to graph I am looking for the eggplant iphone case. Color: Purple Product: Phone Case Data mine from the graph that eggplant == purple Derived meaning from data (unsupervised) 5th Annual Big Data & Business Analytics Symposium – March 22-23, 2018 18
Why Neo4j ?! ● Graph database is the right choice to store Knowledge Graphs ● Neo4j is battle tested and fast! ● New data sources are easy to integrate to extend Graph inferencing ability ● Great set of support tools, to name a few – ○ Interactive browser * ○ In-Database Procedures ○ Bulk imports ○ Graph-algorithms *Walter 1st tried to build with RDBMS 5th Annual Big Data & Business Analytics Symposium – March 22-23, 2018 19
Graphs can do some amazing things W alter: “Say data joins one m ore tim e” eBay “initial tried RDBMS for knowledge graph”. What happened? Find all direct reports and how many people they manage, up to 3 levels down SQL Query Graph DB Query (using Cypher Query Language) MATCH (sub)-[:REPORTS_TO*0..3]->(boss), (report)-[:REPORTS_TO*1..3]->(sub) WHERE boss.name = “John Doe” RETURN sub.name AS Subordinate, count(report) AS Total Graph database queries can do amazing things fast, but graph algorithms add a whole new level of capability… 5th Annual Big Data & Business Analytics Symposium – March 22-23, 2018 20
eBay – Graphs are the future for AI… … and recommendation engines, and … ● Business logic and interference, and “creative juices” in graph. ● Interactive search ● Search science pieces as graph ● Everything except backend source systems (origin indexes, etc.) ● Semi-supervised techniques like label propagation 5th Annual Big Data & Business Analytics Symposium – March 22-23, 2018 21
Neo4j – Some Graph Algorithms Path Finding and Traversal Algorithm s Algorithm Type What It Does Example Uses Parallel Breadth - Traverses a tree data structure by fanning out to explore the nearest BFS can be used to locate neighbor nodes in peer-to-peer networks First Search (BFS) neighbors and then their sub- level neighbors. It’s used to locate like BitTorrent, GPS systems to pinpoint nearby locations and social connections and is a precursor network services to find people within a specific distance. to many other algorithms. BFS is preferred when the tree is less balanced or the Find the shortest target is closer to the starting point. It can also be used to find the shortest path path or evaluate the between nodes or avoid recursive processes of DFS. availability and quality Parallel Depth- First Traverses a tree data structure by exploring as far as possible down each DFS is often used in gaming simulations where each choice or of routes. Search (DFS) branch before backtracking. It’s used on deeply hierarchical data and is a action leads to another, expanding into a tree-shaped graph of precursor to many other algorithms. DFS is preferred when the tree is more possibilities. It will traverse the choice tree until it discovers an balanced or the target is closer to an endpoint. optimal solution path (e.g., win). Single-Source Calculates a path between a node and all other nodes whose summed value Single-Source Shortest Path is often applied to automatically Shortest Path (weight of relationships such as cost, distance, time or capacity) to all other obtain directions between physical locations, such as driving nodes are minimal. directions via Google Maps. It’s also essential in logical routing such as telephone call routing (least cost routing). All-Pairs Shortest Calculates a shortest path forest (group) containing all shortest paths between All-Pairs Shortest Path can be used to evaluate alternate routes for Path the nodes in the graph. situations such as a freeway backup or network capacity. Commonly used for understanding alternate routing when the shortest route is It’s also key in logical routing to offer multiple paths, for example, blocked or becomes suboptimal. call routing alternatives. Minimum Weight Calculates the paths along a connected tree structure with the smallest value MWST is widely used for network designs: least cost logical or Spanning Tree (weight of the relationship such as cost, time or physical routing such as laying cable, fastest garbage collection (MWST) capacity) associated with visiting all nodes in the tree. It’s also employed to routes, capacity for water systems, efficient circuit designs and approximate some NP-hard problems such as the traveling salesman problem much more. It also has real-time applications with rolling and randomized or iterative rounding. optimizations such as processes in a chemical refinery or driving route corrections. 5th Annual Big Data & Business Analytics Symposium – March 22-23, 2018 22
Neo4j – Some Graph Algorithms Centrality Algorithm s Algorithm Type What It Does Example Uses PageRank Estimates a current node’s importance from PageRank is used in quite a few ways to estimate importance and influence. It’s its linked neighbors and then again from their used to suggest Twitter accounts to follow and for general sentiment analysis. neighbors. A node’s rank is derived from the PageRank is also used in machine learning to identify the most influential number and quality of features for extraction. Determine the its transitive links to estimate influence. importance of In biology, it’s been used to identify which species extinctions within a food web Although popularized by Google, it’s widely distinct nodes in would lead to biggest chain- reaction of species death. recognized as a way of detecting influential a network of nodes in any network. connected data. Degree Measures the number of relationships a Degree Centrality looks at immediate connectedness for uses such as evaluating Centrality node (or an entire graph) has. It’s broken the near-term risk of a person catching a virus or hearing information. In social into indegree (flowing in) and outdegree studies, indegree of friendship can be used to estimate popularity and (flowing out) where relationships are directed. outdegree as gregariousness. Closeness Measures how central a node is to all its Closeness centrality is applicable in a number of resources, communication and Centrality neighbors within its cluster. Nodes with the behavioral analysis, especially when interaction speed is significant. shortest paths to all other nodes are It has been used to identifying the best location of new public services for assumed to be able to reach the entire maximum accessibility. In social analysis, it can be used to find people with the group the fastest. ideal social network location for faster dissemination of information. Betweenness Measures the number of shortest paths Betweenness Centrality applies to a wide range of problems in network science Centrality (first found with BFS) that pass through a and can be used to pinpoint bottlenecks or likely attack targets in communication node. Nodes that most frequently lie on and transportation networks. shortest paths have higher betweenness In genomics, it has been used to understand the control certain genes have in centrality scores and are the bridges protein networks for improvements such as better drug- disease targeting. between different clusters. It is often Betweenness Centrality has also be used to evaluate information flows between associated with the control over the flow of multiplayer online gamers and expertise sharing communities of physicians. resources and information. 5th Annual Big Data & Business Analytics Symposium – March 22-23, 2018 23
Neo4j – Some Graph Algorithms Community Detection Algorithms (a.k.a. clustering and partitioning) Algorithm Type What It Does Example Uses Label Propagation Spreads labels based on neighborhood majorities as a Label Propagation has diverse applications from understanding consensus means of inferring clusters. This extremely fast graph formation in social communities to identifying sets of proteins that are partitioning requires involved together in a process (functional modules) for biochemical little prior information and is widely used in large-scale networks. Evaluate how a networks for community detection. It’s a key method for It’s also used in semi- and unsupervised machine learning as an initial group is clustered understanding the organization of a graph and is often a preprocessing step. or partitioned, as primary step in other analysis. well as its tendency Strongly Connected Locates groups of nodes where each node is reachable Strongly Connected is often used to enable running other algorithms to strengthen or from every other node in the same group following the independently on an identified cluster. As a preprocessing step for directed break apart. direction of relationships. It’s often applied from a depth- graphs, it can help quickly identify disconnected groups. first search. In retail recommendations, it can help identify groups with strong affinities that then can be used for suggesting commonly preferred items to those within that group who have not yet purchased the item. Union-Find / Finds groups of nodes where each node is reachable Union-Find / Connected Components is often used in conjunction with Connected from any other node in the same group, regardless of other algorithms, especially for high-performance grouping. As a preposing Components / the direction of relationships. It provides near step Weakly Connected constant-time (independent of input size) operations for undirected graphs, it can help quickly identify disconnected groups. to add new groups, merge existing groups and determine whether two nodes are in the same group. Louvain Modularity Measures the quality (i.e., presumed accuracy) of a Louvain is used to evaluate social structures in Twitter, LinkedIn and community grouping by comparing its relationship YouTube. It’s used in fraud analytics to evaluate whether a group has just a density to a suitably defined random network. It’s often few bad behaviors or is acting as a fraud ring that would be indicated by a used to evaluate the organization of complex networks, higher relationship density than average. in particular, community hierarchies. It’s also useful for Louvain revealed a six-level customer hierarchy in a Belgian telecom initial data preprocessing in unsupervised machine network. learning. 5th Annual Big Data & Business Analytics Symposium – March 22-23, 2018 24
Neo4j – Some Graph Algorithms Community Detection Algorithms (a.k.a. clustering and partitioning) Algorithm Type What It Does Example Uses Local Clustering For a particular node, it quantifies how close its neighbors are to Local Cluster Coefficient is important to estimating Coefficient / being a clique (every node is directly connected to every other resilience by understanding the likelihood of group Node Clustering node). For example, if all your friends knew each other directly, coherence or fragmentation. Evaluate how a Coefficient your local clustering coefficient would be 1. Small values for a Analysis of a European power grid using this method group is clustered cluster would indicate that although a grouping exists, the found that clusters with sparsely connected nodes were or partitioned, as nodes are not tightly connected. more resilient against widespread failures. well as its tendency Triangle-Count and Measures how many nodes have triangles and the degree to The Average Clustering Coefficient is often used to to strengthen or Average Clustering which nodes tend to cluster together. The average clustering estimate whether a network might exhibit “small-world” break apart. Coefficient coefficient is 1 when there is a clique, and 0 when there a no behaviors which are based on tightly knit clusters. It’s also a connections. For the clustering coefficient to be meaningful it factor for cluster stability and resiliency. Epidemiologists should be significantly higher than a version of the network have used the average clustering coefficient to help where all of the relationships have been shuffled randomly. predict various infection rates for different communities. Local Clustering For a particular node, it quantifies how close its neighbors are to Local Cluster Coefficient is important to estimating Coefficient / being a clique (every node is directly connected to every other resilience by understanding the likelihood of group Node Clustering node). For example, if all your friends knew each other directly, coherence or fragmentation. Coefficient your local clustering coefficient would be 1. Small values for a Analysis of a European power grid using this method cluster would indicate that although a grouping exists, the found that clusters with sparsely connected nodes were nodes are not tightly connected. more resilient against widespread failures. Triangle-Count and Measures how many nodes have triangles and the degree to The Average Clustering Coefficient is often used to Average Clustering which nodes tend to cluster together. The average clustering estimate whether a network might exhibit “small-world” Coefficient coefficient is 1 when there is a clique, and 0 when there a no behaviors which are based on tightly knit clusters. It’s also a connections. For the clustering coefficient to be meaningful it factor for cluster stability and resiliency. Epidemiologists should be significantly higher than a version of the network have used the average clustering coefficient to help where all of the relationships have been shuffled randomly. predict various infection rates for different communities. 5th Annual Big Data & Business Analytics Symposium – March 22-23, 2018 25
Some related and interesting material eBay Presentation / Article (one of many) Ajinkya Kale and Anju Vasta, New Product Development, eBay https://www.youtube.com/watch?v=hRpmIeJjx-Y https://www.youtube.com/playlist?list=PL9Hl4pk2FsvVdYIoyksOyAMyDEDYv6O4K https://www.forbes.com/sites/rachelarthur/2017/07/19/conversational-commerce-ebay-ai- chatbot/#a38d7361efb9 Why Knowledge Graphs Are Foundational to Artificial Intelligence Jim Webber, Chief Data Scientist, Neo4j, March 20, 2018 https://www.datanami.com/2018/03/20/why-knowledge-graphs-are-foundational-to-artificial-intelligence Graph Analytics: Graph Algorithms inside Neo4j Amy Hodler and Michael Hunger, Neo4j, January 26, 2018 https://www.youtube.com/watch?v=y10Bt7OkCRM 5th Annual Big Data & Business Analytics Symposium – March 22-23, 2018 26
Some related and interesting material Thank you! Neo4j Remember – “The world is connected. Value comes from being able to make and understand the connections”* * A wise quote from my wonderful wife spouse before saying: “I know you did it, and don’t do it again.” 5th Annual Big Data & Business Analytics Symposium – March 22-23, 2018 27
You can also read