Machine Learning @ Amazon - Ralf Herbrich (Amazon) - WSDM
Background • 1992 – 1997 (Berlin, Diploma) • 1997 – 2000 (Berlin, PhD) • 2000 – 2009 (Microsoft Research) • 2009 – 2011 (Microsoft) • 2011 – 2012 (Facebook) • 2012 – Present (Amazon)
Overview • What is Amazon? • Machine Learning in Practice • Probabilities • Finite Resources • Machine Learning @ Amazon • Forecasting • Machine Translation • Visual Systems • Conclusions and Challenges
Amazon's Virtuous Cycles (flywheel diagram): selection & convenience improve the customer experience, which drives traffic, which attracts sellers, which further broadens selection; growth in turn enables a lower cost structure and lower prices, which again improve the customer experience.
Overview • What is Amazon? • Machine Learning in Practice • Probabilities • Finite Resources • Machine Learning @ Amazon • Forecasting • Machine Translation • Visual Systems • Conclusions and Challenges
Artificial Intelligence and Machine Learning
Science: • Computer Science • Statistics • Neuroscience • Operations Research
Artificial Intelligence: • Knowledge Representation • Knowledge Extraction • Reasoning • Planning
Machine Learning: • Rule Extraction from Past (Training) • Forecast of Future (Prediction) • Taking Actions Now (Decision Making)
Overview • What is Amazon? • Machine Learning in Practice • Probabilities • Finite Resources • Machine Learning @ Amazon • Forecasting • Machine Translation • Visual Systems • Conclusions and Challenges
Machine Learning: Formal Definition • Labelled Data • Unlabelled Data • Probability is a central concept in Machine Learning!
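The formal definitions on this slide were shown as formulas that did not survive transcription; in standard notation (my notation, an assumption rather than the slide's exact formulas) the two settings are:

\[
\text{Labelled data: } \mathcal{D} = \{(x_1, y_1), \ldots, (x_N, y_N)\} \quad \rightarrow \quad \text{learn } P(y \mid x)
\]
\[
\text{Unlabelled data: } \mathcal{D} = \{x_1, \ldots, x_N\} \quad \rightarrow \quad \text{learn } P(x)
\]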
Why Probability? 1. Mathematics of Uncertainty (Cox's axioms)
Cox Axioms: Probabilities and Beliefs
• Design: the system must assign a degree of plausibility to each logical statement A.
• Axioms: the plausibility of A is a real number; it is independent of Boolean rewrites of A (only the logical content matters, not how the statement is written).
• Consequence: P must be a probability measure!
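Spelled out in the standard sum and product rules (my notation, not the slide's): any plausibility assignment satisfying the axioms can be rescaled to a probability P with

\[
P(A) + P(\neg A) = 1, \qquad P(A \wedge B) = P(A \mid B)\, P(B).
\]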
Why Probability? 1. Mathematics of Uncertainty (Cox's axioms) 2. Variables and Factors map to Memory & CPU
Factor Graphs
• Definition: graphical representation of the product structure of a function (Wiberg, 1996)
• Nodes: factors and variables
• Edges: dependencies of factors on variables
• Semantics (illustrated in the figure with variables a, b, c): local variable dependency of factors; the overall function is the product of local factors, each depending only on the variables it is connected to.
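As a minimal illustration of this product semantics (the factors f1, f2 and their values below are made up for the example, not taken from the talk), a brute-force evaluation in Python:

```python
import itertools

# Tiny factor graph over binary variables a, b, c: the (unnormalized) joint
# is the product of local factors, each touching only a subset of variables.
factors = [
    (("a", "b"), lambda a, b: 0.9 if a == b else 0.1),   # f1(a, b)
    (("b", "c"), lambda b, c: 0.7 if b == c else 0.3),   # f2(b, c)
]

def joint(assignment):
    """Evaluate the product of all factors at a full variable assignment."""
    value = 1.0
    for scope, factor in factors:
        value *= factor(*(assignment[v] for v in scope))
    return value

# Brute-force marginal P(c = 1): sum the factor product over all assignments.
assignments = [dict(zip("abc", vals)) for vals in itertools.product([0, 1], repeat=3)]
normalizer = sum(joint(s) for s in assignments)
p_c1 = sum(joint(s) for s in assignments if s["c"] == 1) / normalizer
print(p_c1)   # message passing (sum-product) computes the same marginal efficiently
```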
Inference in a Factor Graph (example graph with skill variables s1–s4, performance variables t1–t3, and pairwise outcome variables y12, y23)
Factor Graphs and Cloud Computing (figure): parameter beliefs ϑ1–ϑ5 live in a belief store ("memory"), data Y1–Y7 generate data messages ("compute"), and message passing between the two is the communication.
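A schematic sketch of this mapping (my own toy construction, not the slide's system): beliefs about a single Gaussian parameter are kept in a key-value "belief store", each data shard computes a message locally, and messages combine by adding Gaussian natural parameters.

```python
# Toy version of the memory/compute/communicate split for one parameter theta.
belief_store = {"theta": (1.0, 0.0)}   # prior in natural parameters: (precision, precision * mean)

def data_message(shard, noise_precision=1.0):
    """Message from one shard of observations to theta ('compute')."""
    return noise_precision * len(shard), noise_precision * sum(shard)

shards = [[1.2, 0.8, 1.1], [0.9, 1.3], [1.0, 1.1, 0.7, 1.2]]   # data partitions ('memory')

for shard in shards:                     # 'communicate': send message, update stored belief
    msg_prec, msg_mean_prec = data_message(shard)
    prec, mean_prec = belief_store["theta"]
    belief_store["theta"] = (prec + msg_prec, mean_prec + msg_mean_prec)

prec, mean_prec = belief_store["theta"]
print("posterior mean of theta:", mean_prec / prec)   # = sum(x) / (n + 1) with these priors
```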
Factor Graphs and MXNet
Why Probability? 1. Mathematics of Uncertainty (Cox's axioms) 2. Variables and Factors map to Memory & CPU 3. Decouple Data Modeling and Decision Making
Infer-Predict-Decide Cycle
• Inference: P(Parameters) + Data → P(Parameters | Data). Requires a (structural) model P(Data | Parameters); allows prior information P(Parameters) to be incorporated.
• Prediction: P(Parameters | Data) + Data → P(Data). Requires integration/summation over parameter uncertainty; does not change state!
• Decision Making: Loss(Action, Data) + P(Data) → Action. Uses the business loss, not the learning loss; often involves optimization!
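Written out in standard Bayesian notation (mine, not the slide's), the three stages of the cycle are:

\[
\underbrace{P(\theta \mid \mathcal{D}) = \frac{P(\mathcal{D} \mid \theta)\, P(\theta)}{P(\mathcal{D})}}_{\text{inference}}, \qquad
\underbrace{P(y^{\ast} \mid \mathcal{D}) = \int P(y^{\ast} \mid \theta)\, P(\theta \mid \mathcal{D})\, d\theta}_{\text{prediction}}, \qquad
\underbrace{a^{\ast} = \arg\min_{a} \; \mathbb{E}_{P(y \mid \mathcal{D})}\!\left[\mathrm{Loss}(a, y)\right]}_{\text{decision making}}
\]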
Overview • What is Amazon? • Machine Learning in Practice • Probabilities • Finite Resources • Machine Learning @ Amazon • Forecasting • Machine Translation • Visual Systems • Conclusions and Challenges
Finite Resources: Time
• Real-Time Prediction Service: 5 ms budget. 10,000 search & ads candidates to rank → 500 ns per candidate; at ~10 ns per main-memory read → about 50 variables per candidate. L1/L2/L3 ranking pipelines make it possible to use more complex predictors.
• Real-Time Learning Service: 1B examples/day. 86,400 seconds per day → 86,400 ns per example; at ~100 ns per RAM write → only ~864 writes per example (excluding disk access!).
• Sparse models are the result of time constraints!
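A quick arithmetic check of these budgets (the 10 ns read and 100 ns write latencies are the slide's round numbers, not measurements):

```python
# Back-of-the-envelope check of the latency budgets quoted above.
PREDICTION_BUDGET_S = 5e-3            # real-time prediction service: 5 ms
CANDIDATES = 10_000                   # search & ads candidates to rank
per_candidate_ns = PREDICTION_BUDGET_S / CANDIDATES * 1e9
memory_reads = per_candidate_ns / 10          # ~10 ns per main-memory read
print(per_candidate_ns, memory_reads)         # 500.0 ns, 50 reads

EXAMPLES_PER_DAY = 1_000_000_000      # real-time learning service: 1B examples/day
per_example_ns = 86_400 / EXAMPLES_PER_DAY * 1e9
ram_writes = per_example_ns / 100             # ~100 ns per RAM write
print(per_example_ns, ram_writes)             # 86400.0 ns, 864 writes
```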
Finite Resource: Cost
• Economics 101: Profit = Revenue − Cost. In the long run, a business that generates negative profits is not viable!
• It's power, stupid! Some constraints might not be obvious: building new datacenters and powering them is non-trivial. Example: 1 GPU box ≈ 20 CPU boxes in terms of power consumption.
• Facebook 2015: annual revenue $17,928,000,000* → daily revenue $49,117,808; 1,038,000,000 daily active users** × 1,500 story candidates*** → 1.557 × 10^12 daily stories → maximum cost per story candidate of $0.0000315.
*http://www.statista.com/statistics/277229/facebooks-annual-revenue-and-net-income/
**http://www.statista.com/statistics/346167/facebook-global-dau/
***https://www.facebook.com/business/news/News-Feed-FYI-A-Window-Into-News-Feed
Overview • What is Amazon? • Machine Learning in Practice • Probabilities • Finite Resources • Machine Learning @ Amazon • Forecasting • Machine Translation • Visual Systems • Conclusions and Challenges
Locations (world map): ML Berlin, ML Seattle, ML Cambridge, ML Los Angeles, ML Bangalore, A9, A2Z, S9, Ivona, Evi
Machine Learning Opportunities @ Amazon
• Retail: Demand Forecasting, Vendor Lead Time Prediction, Pricing, Packaging, Substitute Prediction
• Customers: Product Recommendation, Product Search, Visual Search, Product Ads, Shopping Advice, Customer Problem Detection
• Seller: Fraud Detection, Predictive Help, Seller Search & Crawling, Review Analysis
• Catalog: Browse-Node Classification, Meta-data Validation, Hazmat Prediction, Knowledge Acquisition
• Digital: Named-Entity Extraction, XRay, Plagiarism Detection, Echo Speech Recognition
Overview • What is Amazon? • Machine Learning in Practice • Probabilities • Finite Resources • Machine Learning @ Amazon • Forecasting • Machine Translation • Visual Systems • Conclusions and Challenges
Fashion Forecasting (2014 – 2016)
Setting: given past sales of a fashion product, predict market demand up to 18 months into the future.
Challenges:
• Sparsity: huge skew – many products sell very few items
• Size: a variable for Amazon retail when buying, but not a variable for end customers
• Seasonal: most fashion products only run for one season – often 12 months in advance
• Distributions: the future is uncertain → predictions must be distributions
• Scale: 5M+ fashion products in each market → 10^9 training examples
• Censored: past sales ≠ past demand (inventory constraint)
Demand Forecasting: an example fashion product to illustrate the challenges of forecasting.
• Training Range: non-fashion items have longer training ranges that we can leverage. Need to share information across new and old products.
• Missing Features or Input: unexplained spikes in demand are likely caused by missing features or incomplete input data.
• Seasonality: this item has Christmas seasonality with higher growth over time. This is where we need growth features in addition to date features.
Learning and Prediction (figure): past sales/demand plotted over time; model parameters $\theta$ are learned on the historical range, and future demand is forecast by sampling $z_{i,t} \sim P(z_{i,t} \mid \theta)$.
Slow Moving Inventory Typical midsize dataset: • About 5M items • About 4.5B item-days • About 98% zero demand
Sampling Predictions, $z_{i,t} \sim P(z_{i,t} \mid \theta)$: • 0 or ≥1? Binary classification #1 • 1 or ≥2? Binary classification #2 • If ≥2: count regression on z − 2
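A minimal sketch of such a staged sampler (the stage probabilities and the Poisson count model below are illustrative placeholders, not the production parametrization):

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_demand(p_ge1, p_ge2, rate):
    """One draw from the staged demand model sketched above."""
    if rng.random() >= p_ge1:        # stage 1: zero vs. at-least-one unit of demand
        return 0
    if rng.random() >= p_ge2:        # stage 2: exactly one vs. at least two
        return 1
    return 2 + rng.poisson(rate)     # stage 3: count regression on z - 2 (Poisson here)

samples = [sample_demand(p_ge1=0.02, p_ge2=0.3, rate=1.5) for _ in range(100_000)]
print(np.mean(np.array(samples) == 0))   # roughly the 98% zero-demand share of the data
```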
Factor graph of the forecasting model (figure): inputs x1–x5 feed latent states l_{t,2}, l_{t,1}, l_{t,0}, which connect through a multi-stage likelihood y_{t,2}, y_{t,1}, y_{t,0} to the observed demand z1–z5.
In Practice (figure): simplified variants of the factor graph above, using fewer latent-state levels while keeping the multi-stage likelihood y_{t,2}, y_{t,1}, y_{t,0} and the observed demand z1–z5.
Modelling Out of Stock (figure with two components labelled GLM and Bridge)
Overview • What is Amazon? • Machine Learning in Practice • Probabilities • Finite Resources • Machine Learning @ Amazon • Forecasting • Machine Translation • Visual Systems • Conclusions and Challenges
Product Machine Translation (2013 – 2015). Figure: lifetime profit vs. products for human translation and machine translation; the area between the two curves is the selection gap.
Machine Translation Pipeline
Input Request → Input Normalization → Tokenization → Detection & Escaping of Non-translatables → Lowercasing → Sentence Segmentation → Translation/Decoding → Recasing → Re-insertion of (converted) Non-translatables → Post-processing → De-Tokenization → Translated Request
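A skeleton of that pipeline with trivial stand-in steps so it runs end to end (the regular expression, toy lexicon, and step implementations are illustrative, not Amazon's components):

```python
import re

NON_TRANSLATABLE = re.compile(r"^[\d$.,%-]+$")          # prices, sizes, model numbers
TOY_LEXICON = {"rote": "red", "schuhe": "shoes"}        # stand-in for the actual decoder

def translate_request(text: str) -> str:
    text = re.sub(r"\s+", " ", text).strip()            # input normalization
    tokens = text.split(" ")                             # tokenization
    escaped = {}                                         # detection & escaping of non-translatables
    for i, tok in enumerate(tokens):
        if NON_TRANSLATABLE.match(tok):
            escaped[f"__nt{i}__"] = tok
            tokens[i] = f"__nt{i}__"
    tokens = [t.lower() for t in tokens]                 # lowercasing
    sentences = [tokens]                                 # sentence segmentation (toy: one sentence)
    translated = [[TOY_LEXICON.get(t, t) for t in s] for s in sentences]   # translation/decoding
    translated = [[t.capitalize() if i == 0 else t for i, t in enumerate(s)]
                  for s in translated]                   # recasing
    translated = [[escaped.get(t, t) for t in s] for s in translated]      # re-insertion of non-translatables
    return " ".join(" ".join(s) for s in translated)     # post-processing / de-tokenization

print(translate_request("Rote  Schuhe  $19.99"))         # -> "Red shoes $19.99"
```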
Machine Translation: Deep Dive
\[
P(\text{English} \mid \text{German}) = \frac{P(\text{English}) \cdot P(\text{German} \mid \text{English})}{P(\text{German})} \propto \underbrace{P(\text{English})}_{\text{Language Model}} \cdot \underbrace{P(\text{German} \mid \text{English})}_{\text{Translation Model}}
\]
• Language Model: What are fluent English sentences?
• Translation Model: What English sentences account well for a given German sentence?
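In the usual noisy-channel formulation (standard notation, not spelled out on the slide), decoding then searches for the English sentence e that maximizes the product of the two models given the German input g:

\[
\hat{e} = \arg\max_{e} \; P(e)\, P(g \mid e)
\]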
Overview • What is Amazon? • Machine Learning in Practice • Probabilities • Finite Resources • Machine Learning @ Amazon • Forecasting • Machine Translation • Visual Systems • Conclusions and Challenges
Automated Produce Inspection: The Goal (figure: current inspection vs. new automated inspection using computer vision)
Challenges • Illumination • Clutter/Occlusions • Viewpoint • Scale • Intra-class variability
Computer Vision Pipeline • Segmentation • Defect Mining • Produce Classification
Predicting Longevity (figure: strawberry images arranged by Strawberry ID vs. Age)
Age Aligned Strawberries (Test Set)
Overview • What is Amazon? • Machine Learning in Practice • Probabilities • Finite Resources • Machine Learning @ Amazon • Forecasting • Machine Translation • Visual Systems • Conclusions and Challenges
Conclusions • Machine Learning "translates" data from the past into accurate predictions about the future! • In practice, probabilistic models and finite resources matter. • Machine Learning helps to improve customer experience at Amazon!
Thanks!