Machine Learning @ Amazon - Ralf Herbrich Amazon - WSDM

Page created by Clinton Dawson
CONTINUE READING
Machine Learning @ Amazon - Ralf Herbrich Amazon - WSDM
Machine Learning @ Amazon
                    Ralf Herbrich
                      Amazon

2/21/17                               1
Machine Learning @ Amazon - Ralf Herbrich Amazon - WSDM
Background
              1992 – 1997 (Berlin, Diploma)

                1997 – 2000 (Berlin, PhD)

             2000 – 2009 (Microsoft Research)

                 2009 – 2011 (Microsoft)

                 2011 – 2012 (Facebook)

                 2012 – Present (Amazon)
2/21/17                                         2
Machine Learning @ Amazon - Ralf Herbrich Amazon - WSDM
Overview
• What is Amazon?
• Machine Learning in Practise
      • Probabilities
      • Finite Resource
• Machine Learning @ Amazon
      • Forecasting
      • Machine Translation
      • Visual Systems
• Conclusions and Challenges

2/21/17                          3
Machine Learning @ Amazon - Ralf Herbrich Amazon - WSDM
Our Customers

2/21/17                   4
Machine Learning @ Amazon - Ralf Herbrich Amazon - WSDM
Devices

2/21/17   5
Machine Learning @ Amazon - Ralf Herbrich Amazon - WSDM
Amazon’s Virtuous Cycles
               Lower Cost                 Lower
               Structure                  Prices

                      Selection &
                      Convenience

                                               Customer
          Sellers           Growth             Experience

                            Traffic

2/21/17                     2016 (c) Amazon                 3
Machine Learning @ Amazon - Ralf Herbrich Amazon - WSDM
Overview
• What is Amazon?
• Machine Learning in Practise
      • Probabilities
      • Finite Resource
• Machine Learning @ Amazon
      • Forecasting
      • Machine Translation
      • Visual Systems
• Conclusions and Challenges

2/21/17                          7
Machine Learning @ Amazon - Ralf Herbrich Amazon - WSDM
Artificial Intelligence and Machine Learning

          Science                   Artificial Intelligence        Machine Learning
          •   Computer Science      •   Knowledge Representation   • Rule Extraction from Past (Training)
          •   Statistics            •   Knowledge Extraction       • Forecast of Future (Prediction)
          •   Neuroscience          •   Reasoning                  • Taking Actions Now (Decision
          •   Operations Research   •   Planning                     Making)

2/21/17                                     2016 (c) Amazon                                                 8
Machine Learning @ Amazon - Ralf Herbrich Amazon - WSDM
Overview
• What is Amazon?
• Machine Learning in Practise
      • Probabilities
      • Finite Resource
• Machine Learning @ Amazon
      • Forecasting
      • Machine Translation
      • Visual Systems
• Conclusions and Challenges

2/21/17                          9
Machine Learning @ Amazon - Ralf Herbrich Amazon - WSDM
Machine Learning: Formal Definition
• Labelled Data

• Unlabelled Data

• Probability is a central concept in Machine Learning!

2/21/17                                                   10
Why Probability?
1. Mathematics of Uncertainty (Cox’ axioms)

2/21/17                                       11
Cox Axioms: Probabilities and Beliefs
• Design: System must assign degree of plausability   to each logical
  statement A.
• Axiom:
      •   is a real number
      •   is independent of Boolean rewrite
      •

            P must be a probability measure!
2/21/17                                                                 12
Why Probability?
1. Mathematics of Uncertainty (Cox’ axioms)

2. Variables and Factors map to Memory & CPU

2/21/17                                        13
Factor Graphs
• Definition: Graphical representation of product structure of a
  function (Wiberg, 1996)
      • Nodes:    = Factors       = Variables
      • Edges: Dependencies of factors on variables.

• Semantic:                                            a       b

      • Local variable dependency of factors               c

2/21/17                                                            14
Inference in a Factor Graph

           s1         s2        s3         s4

           t1              t2              t3

                y12                  y23

2/21/17                                         15
Factor Graphs and Cloud Computing

                                                          Belief Store
                                                          (“Memory”)
                     ϑ1   ϑ2   ϑϑ3   ϑ4    ϑ5
                                                          Message Passing
                                                          (“Communicate”)

                                                          Data Messages
                                                          (“Compute”)

           Y1   Y2        Y3   Y4     Y5        Y6   Y7

2/21/17                                                                     16
Factor Graphs and MXNet

2/21/17           2016 (c) Amazon   17
Why Probability?
1. Mathematics of Uncertainty (Cox’ axioms)

2. Variables and Factors map to Memory & CPU

3. Decouple Data Modeling and Decision Making

2/21/17                                         18
Infer-Predict-Decide Cycle
                                                          Inference:
             Decision Making:                             P(Parameters) + Data à
             Loss(Action,Data) + P(Data)                  P(Parameters|Data)
             à Action                                     • Requires a (structural) model
                                                            P(Data|Parameters)
             • Business-loss not learning-loss!           • Allows to incorporate prior
             • Often involves optimization!                 information
                                                            P(Parameters|Data)

                                           Prediction:
                                           P(Parameters) +
                                           Data à P(Data)
                                           • Requires
                                             integration/summ
                                             ation of parameter
                                             uncertainty
                                           • Does not change
                                             state!
2/21/17                                                                                     19
Overview
• What is Amazon?
• Machine Learning in Practise
      • Probabilities
      • Finite Resource
• Machine Learning @ Amazon
      • Forecasting
      • Machine Translation
      • Visual Systems
• Conclusions and Challenges

2/21/17                          20
Finite Resources: Time
• Real-Time Prediction Service: 5ms
   • Number of search & ads candidates to rank: 10,000 è 500 ns/candidate
   • Time to read from main memory: 10ns è 50 variables
   • L1/L2/L3 ranking pipelines to use more complex predictors

• Real-Time Learning Service: 1B examples/day
   • Number of second per day: 86,400 è 86,400 ns/example
   • Time to write to RAM: 100ns è 864 writes only (excluding disc access!)

• Sparse models are the result of time constraints!
Finite Resource: Cost
                                                           Economics 101
       • Profit = Revenue – Cost
       • In the long run, a business that generates negative profits is not viable!

Facebook                                                                  2015              It’s power, stupid!
Annual Revenue                                                $17,928,000,000.00*
Daily Revenue                                                          $49,117,808.22       Some constraints might not be obvious:
                                                                                            building new datacenters and powering
Number of DAU                                                        1,038,000,000**        them is non-trivial.
Number of Story Candidates                                                       1,500***
                                                                                            Example: 1 GPU box = 20 CPU boxes
Number of Daily Stories                                                         1.557E+12            (in terms of power consumption)
Maximum Cost per Story Candidate                                               $0.0000315
*http://www.statista.com/statistics/277229/facebooks-annual-revenue-and-net-
income/
**http://www.statista.com/statistics/346167/facebook-global-dau/
***https://www.facebook.com/business/news/News-Feed-FYI-A-Window-Into-News-
Feed
Overview
• What is Amazon?
• Machine Learning in Practise
      • Probabilities
      • Finite Resource
• Machine Learning @ Amazon
      • Forecasting
      • Machine Translation
      • Visual Systems
• Conclusions and Challenges

2/21/17                          23
Locations
                                                S9
                                                           ML Berlin
          ML Seattle             ML Cambridge

                           A2Z
            A9                                                 Ivona
                                                     Evi

          ML Los Angeles
                                                                       ML Bangalore

2/21/17                                                                               24
Machine Learning Opportunities @ Amazon

      Retail               Customers            Seller              Catalog               Digital
      • Demand             • Product            • Fraud Detection   • Browse-Node         • Named-Entity
        Forecasting          Recommendation     • Predictive Help     Classification        Extraction
      • Vendor Lead Time   • Product Search     • Seller Search &   • Meta-data           • XRay
        Prediction         • Visual Search        Crawling            validation          • Plagiarism
      • Pricing            • Product Ads                            • Review Analysis       Detection
      • Packaging          • Shopping Advice                        • Hazmat Prediction   • Echo Speech
      • Substitute         • Customer Problem                                               Recognition
        Prediction           Detection                                                    • Knowledge
                                                                                            Acquisiion

2/21/17                                                                                                    25
Overview
• What is Amazon?
• Machine Learning in Practise
      • Probabilities
      • Finite Resource
• Machine Learning @ Amazon
      • Forecasting
      • Machine Translation
      • Visual Systems
• Conclusions and Challenges

2/21/17                          26
Fashion Forecasting (2014 – 2016)

          Setting

          • Given past sales of a fashion product, predict market demand up to 18 months into the future

          Challenges

          • Sparsity: Huge skew – many products sell very few items
          • Size: Variable for Amazon retail in buying but not a variable for end customers
          • Seasonal: Most fashion products only run for one season – often 12 months in advance
          • Distributions: Future is uncertain è predictions must be distributions
          • Scale: 5M+ fashion products in each market è 109 training examples
          • Censored: Past sales ≠ past demand (inventory constraint)

2/21/17                                            2016 (c) Amazon                                         27
Demand Forecasting
Example fashion product to illustrate the challenges of forecasting.

            Training Range: Non-fashion items                               Missing Features or Input:
            have longer training ranges that we                             Unexplained spikes in demand are
            can leverage. Need to information                               likely caused by missing features or
            share across new and old products.                              incomplete input data.
                                      Seasonality: This item has Christmas
                                      seasonality with higher growth over time.
                                      This is where we need growth features in
                                      addition to date features.
 2/21/17                                        2016 (c) Amazon                                       28
Learning and Prediction
                                   sales/demand
                                                    P (zi t | ✓) ⇠

                                                                     time

            Learning
                                                  Forecasting

                       Model Parameters
Slow Moving Inventory

      Typical midsize dataset:
      • About 5M items
      • About 4.5B item-days
      • About 98% zero demand
Sampling Predictions

       P (zi t | ✓) ⇠
       • 0 or ≥1 ?
         Binary classification #1
       • 1 or ≥2 ?
         Binary classification #2
       • If ≥2:
         Count regression z-2
x1     x2     x3     x4     x5

l1,2   l2,2   l3,2   l4,2   l5,2

                                       Latent State
l1,1   l2,1   l3,1   l4,1   l5,1

l1,0   l2,0   l3,0   l4,0   l5,0

y1,2   y2,2   y3,2   y4,2   y5,2

                                   Multistage
                                   Likelihood
y1,1   y2,1   y3,1   y4,1   y5,1

y1,0   y2,0   y3,0   y4,0   y5,0

z1     z2     z3     z4     z5
In Practice

              x1      x2      x3      x4      x5      x1      x2      x3      x4      x5      x1      x2      x3      x4      x5

                                                      l1,2    l2,2    l3,2    l4,2    l5,2

                                                      l1,1    l2,1    l3,1    l4,1    l5,1

              l1,0    l2,0    l3,0    l4,0    l5,0    l1,0    l2,0    l3,0    l4,0    l5,0

                                                      y 1,2   y 2,2   y 3,2   y 4,2   y 5,2   y 1,2   y 2,2   y 3,2   y 4,2   y 5,2

                                                      y 1,1   y 2,1   y 3,1   y 4,1   y 5,1   y 1,1   y 2,1   y 3,1   y 4,1   y 5,1

              y 1,0   y 2,0   y 3,0   y 4,0   y 5,0   y 1,0   y 2,0   y 3,0   y 4,0   y 5,0   y 1,0   y 2,0   y 3,0   y 4,0   y 5,0

              z1      z2      z3      z4      z5      z1      z2      z3      z4      z5      z1      z2      z3      z4      z5
Modelling Out of Stock

      GLM

      Bridge
Overview
• What is Amazon?
• Machine Learning in Practise
      • Probabilities
      • Finite Resource
• Machine Learning @ Amazon
      • Forecasting
      • Machine Translation
      • Visual Systems
• Conclusions and Challenges

2/21/17                          35
Product Machine Translation (2013 – 2015)

                                     Lifetime Profit
                                                       Human Translation

                                                                  Machine Translation

                                                       Selection Gap       Products

2/21/17            2016 (c) Amazon                                                      36
Machine Translation Pipeline

           Input Request             Input Normalization                 Tokenization

          Detection & Escaping of
                                        Lowercasing               Sentence Segmentation
            Non-translatables

                                    Re-insertion of (converted)
          Translation/Decoding                                            Recasing
                                         Nontranslatables

          Translated Request             Post-processing               De-Tokenization

2/21/17                                         2016 (c) Amazon                           37
Machine Translation: Deep Dive
                                   p ( English) ´ p (German | English)
           p ( English | German) =
                                                p (German)

                                 µ p ( English) ´ p (German | English)

                             Language                         Translation
                             Model                            Model

          • Language Model: What are fluent English sentences?

          • Translation Model: What English sentences account
            well for a given German sentence?
2/21/17                               2016 (c) Amazon                       38
Overview
• What is Amazon?
• Machine Learning in Practise
      • Probabilities
      • Finite Resource
• Machine Learning @ Amazon
      • Forecasting
      • Machine Translation
      • Visual Systems
• Conclusions and Challenges

2/21/17                          39
Automated Produce Inspection: The Goal
          Current Inspection                     New Automated Inspection

                                                        Computer Vision

2/21/17                        2016 (c) Amazon                              40
Challenges

• Illumination

• Clutter/Occlusions

• Viewpoint

• Scale

• Intra-class variability
Computer Vision Pipeline
• Segmentation

• Defect Mining

• Produce Classification

                           Amazon Confidential
Predicting Longevity
Strawberry ID

2/21/17
                       Age à
                       2016 (c) Amazon   43
Age Aligned Strawberries (Test Set)
Overview
• What is Amazon?
• Machine Learning in Practise
      • Probabilities
      • Finite Resource
• Machine Learning @ Amazon
      • Forecasting
      • Machine Translation
      • Visual Systems
• Conclusions and Challenges

2/21/17                          45
Conclusions

• Machine Learning “translates” data from the past into accurate
  predictions about the future!

• In practice, probabilistic models and finite resources matter.

• Machine Learning helps to improve customer experience at Amazon!

2/21/17                          2016 (c) Amazon                     46
Thanks!

2/21/17    2016 (c) Amazon   47
You can also read