Universal Search & Recommendation - Paul Bennett - Microsoft
Why search is not a "solved problem"
Search engines perform best where search logs are available to learn from implicit feedback. Many key scenarios lack large search logs:
• Low-resource languages
• Emerging verticals
• Enterprise search
• Extreme verticals
• Email search
• Personalized search
• Search as a Service
We need to be able to truly understand content, queries, and people.
Universal Search and Recommendation
Office, Azure, Health, Dynamics, LinkedIn, Xbox, Bing, GitHub
High-performing search should not be limited to the open web. Provide push-button customized search and recommendation … for any person, any vertical, any data.
Trends: Toward Universal Search and Recommendation
• Dense retrieval based on representation learning as the foundation: efficiency and accuracy advances enable dense retrieval to outperform the inverted index.
• Universal search and recommendation for scenarios without search logs: robust weakly and self-supervised learning generalizes to the tail.
• Universal search and recommendation for all data: representation learning enables dense representations to plug into the dense retrieval stack.
• Universal hyper-personalization: learning from feedback across all scenarios provides the best search and recommendation experience for each user, everywhere.
Current Search Landscape
Web search and limited scenarios:
• Bag-of-Words, 1970s-1990s
• Classic IR, 1990s-2000s
• Learning to Rank (Features), 2000s-2010s
• Neural IR, 2010s-2020s
• Dense Retrieval, 2020s-Future
Search most everywhere else:
• Bag-of-Words, 1970s-1990s: "BM25" [1970s/1980s]
Decades of gap!
Current Search Landscape and Recent Breakthroughs
Recent breakthroughs in dense retrieval provide a promising path to close the decades-long gap in a scalable way:
1. Efficient serving with ANN retrieval.
2. Short-text matching with dense retrieval: title/query matching in ads and common documents.
3. Full document body vectorization with dense retrieval.
Breakthrough #1: Efficient ANN Retrieval
Large-scale, efficient serving of dense retrieval with an ANN index:
• Approximates KNN with sublinear efficiency, often by enforcing sparsity in the search graph.
• Bing has industry-leading ANN infrastructure with SPTAG [1] and DiskANN [2], serving multiple scenarios in Bing and expanding.
[1] SPTAG: A library for fast approximate nearest neighbor search. Chen et al. 2019.
[2] DiskANN: Fast accurate billion-point nearest neighbor search on a single node. Subramanya et al. NeurIPS 2019.
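To make the sublinear idea concrete, here is a minimal, self-contained sketch of an ANN index in Python. It uses an IVF-style k-means partition (probing only a few clusters per query) rather than SPTAG's tree-plus-graph or DiskANN's graph structure; none of the names below are those libraries' actual APIs, and all parameters are illustrative.

```python
# Minimal ANN sketch (NOT the SPTAG/DiskANN API): an IVF-style index that
# trades exactness for sublinear search by probing only a few clusters.
import numpy as np

class IVFIndex:
    def __init__(self, vectors: np.ndarray, n_clusters: int = 64, iters: int = 10):
        self.vectors = vectors
        rng = np.random.default_rng(0)
        self.centroids = vectors[rng.choice(len(vectors), n_clusters, replace=False)]
        for _ in range(iters):  # plain k-means to partition the corpus
            assign = np.argmax(vectors @ self.centroids.T, axis=1)
            for c in range(n_clusters):
                members = vectors[assign == c]
                if len(members):
                    self.centroids[c] = members.mean(axis=0)
        self.assign = np.argmax(vectors @ self.centroids.T, axis=1)

    def search(self, query: np.ndarray, k: int = 10, n_probe: int = 4):
        # Probe only the n_probe closest clusters instead of the full corpus.
        clusters = np.argsort(-(query @ self.centroids.T))[:n_probe]
        cand = np.where(np.isin(self.assign, clusters))[0]
        scores = self.vectors[cand] @ query
        order = np.argsort(-scores)[:k]
        return cand[order], scores[order]

# Toy usage: 100k 128-d unit vectors, one query.
docs = np.random.randn(100_000, 128).astype(np.float32)
docs /= np.linalg.norm(docs, axis=1, keepdims=True)
index = IVFIndex(docs)
ids, scores = index.search(docs[42], k=5)  # doc 42 should rank itself first
print(ids, scores)
```

Production systems differ mainly in the sparsification strategy (graphs, trees, quantization) and in serving from SSD at billion scale, but the accuracy/cost trade-off is the same.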
Breakthrough #2: Short Text Dense Retrieval
Effectively capturing user interests in short texts with dense retrieval in Bing Ads:
• The ads scenario encourages "softer matches"; the bid keyword ensures a lower bound on relevance.
• Short text fields are easier for neural network-based dense retrieval: DSSM [1], RNNs, and now Transformers.
• Extreme Classification (XC) as a form of dense retrieval [2].
[1] Learning Deep Structured Semantic Models for Web Search using Clickthrough Data. Huang et al. CIKM 2013.
[2] SiameseXML: Siamese Networks meet Extreme Classifiers with 100M Labels. Dahiya et al. ICML 2021.
Breakthrough #2: Short Text Dense Retrieval
Effective relevance matching with a document's short text fields (AUTC: Anchor, URL, Title, and Click stream) for the web:
• Short text fields allow more paraphrase-like matching with the query.
• Helpful mainly for common documents with AUTC.
• A mature dense retrieval stack at scale: Transformer query and AUTC encoders feeding ANN search, accelerated hardware inference, efficient indexes for dense vectors, and much more.
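As a rough illustration of the dual-encoder pattern behind DSSM-style and Transformer-based short-text matching, here is a minimal two-tower sketch in PyTorch. The tiny bag-of-embeddings towers, the in-batch softmax loss, and every name here are stand-ins for illustration, not the production Bing models.

```python
# Minimal two-tower (dual-encoder) sketch of short-text dense retrieval.
# Illustrative only: a tiny bag-of-embeddings encoder stands in for the
# production Transformer towers; trained with in-batch softmax over clicks.
import torch
import torch.nn as nn
import torch.nn.functional as F

class Tower(nn.Module):
    def __init__(self, vocab_size: int = 30_000, dim: int = 128):
        super().__init__()
        self.emb = nn.EmbeddingBag(vocab_size, dim, mode="mean")
        self.proj = nn.Linear(dim, dim)

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        # L2-normalize so the dot product equals cosine similarity.
        return F.normalize(self.proj(self.emb(token_ids)), dim=-1)

query_tower, doc_tower = Tower(), Tower()
opt = torch.optim.Adam(
    list(query_tower.parameters()) + list(doc_tower.parameters()), lr=1e-3)

# Fake batch of (query, clicked short-text field) token ids; row i of docs is
# the positive for query i, and all other rows serve as in-batch negatives.
queries = torch.randint(0, 30_000, (32, 8))
docs = torch.randint(0, 30_000, (32, 16))

q, d = query_tower(queries), doc_tower(docs)
logits = q @ d.T / 0.05                           # temperature-scaled similarities
loss = F.cross_entropy(logits, torch.arange(32))  # match query i to doc i
opt.zero_grad(); loss.backward(); opt.step()
print(float(loss))
```

At serving time only the document tower's outputs are indexed (e.g., in an ANN index as above), so the expensive encoding happens offline and queries need one forward pass.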
Breakthrough #3: Full Body Vectorization Dense Retrieval
Effective Full Body Vectorization (FBV) dense retrieval with ANCE training [1]:
• ANCE enables the dense retrieval model to capture relevance matches against the full body text of documents.
• No hard dependency on AUTC: generalizes to the entire web in Bing, and beyond the web.
• 10+ point recall gain.
• Better information compression, with a 2x improvement in serving cost.
[1] Approximate Nearest Neighbor Contrastive Learning for Dense Text Retrieval. Xiong et al. ICLR 2021.
The Research Path
2010 and before:
• Many attempts from the community to leverage learned representations, e.g., LSI and LDA.
• Pessimistic view on embedding-based search in the literature.
• Many switched to reranking with term-level soft matches.
2014-2016:
• Attempts to leverage word2vec and other embedding-based models for search.
2018:
• BERT reranking improves search.
• Attempts to use BERT in a dense retrieval setting failed to outperform BM25.
2019:
• Dense retrieval with BERT performs on par with BM25 on Open-QA: more "paraphrase" questions, short passages.
• Questions on whether one embedding is sufficient for document retrieval.
2020:
• Identified that the limitation is in the selection of training negatives.
• ANCE shows SOTA on various search tasks: web search, conversational search, Open QA, multi-hop QA.
• How to better select training negatives is part of the new research frontier in dense retrieval.
Why is a dense representation for text retrieval hard, intuitively?
Dense retrieval with standard learning to rank:
• Training phase: (query, candidate docs, label) triples, where the candidate docs come from the existing sparse retrieval pipeline.
• Testing phase: the query runs against the full corpus to retrieve the top docs.
• The candidate docs seen in training are very different from those to retrieve in testing: different training and testing distributions!
Why is a dense representation for text retrieval hard, intuitively?
Dense retrieval with random negatives:
• Training phase: (query, random docs, label) triples.
• Testing phase: the query runs against the full corpus to retrieve the top docs.
• Random documents are often too easy; retrieval is more about distinguishing the relevant from hard negatives.
ANCE: Global ANN Negatives for Dense Retrieval Training
[Diagram: dense retrieval inference with ANN retrieval over the full corpus, feeding globally retrieved negatives back into training.]
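A hedged sketch of the ANCE training loop as described in the ICLR 2021 paper: periodically re-encode the corpus with the current model, rebuild the index, and draw hard negatives from the model's own top-ranked documents. Brute-force scoring stands in here for the asynchronously refreshed ANN index, and the toy encoder and data are illustrative.

```python
# Sketch of the ANCE training loop (after Xiong et al., ICLR 2021): negatives
# are retrieved globally from the full corpus with the *current* model and
# refreshed periodically. Brute-force search stands in for the ANN index.
import torch
import torch.nn.functional as F

dim, corpus_size = 64, 5_000
encoder = torch.nn.Linear(32, dim)                 # stand-in for the Transformer
corpus_feats = torch.randn(corpus_size, 32)        # stand-in document features
queries = torch.randn(256, 32)                     # training queries
positives = torch.randint(0, corpus_size, (256,))  # gold doc id per query
opt = torch.optim.Adam(encoder.parameters(), lr=1e-3)

for refresh in range(3):                           # periodic index refresh
    with torch.no_grad():                          # re-encode corpus, "rebuild index"
        doc_emb = F.normalize(encoder(corpus_feats), dim=-1)
    for step in range(100):
        batch = torch.randint(0, len(queries), (32,))
        q = F.normalize(encoder(queries[batch]), dim=-1)
        # Global hard negatives: top-ranked docs under the current index,
        # excluding the gold positive for each query.
        top = (q.detach() @ doc_emb.T).topk(2).indices
        neg = torch.where(top[:, 0] == positives[batch], top[:, 1], top[:, 0])
        pos_emb = F.normalize(encoder(corpus_feats[positives[batch]]), dim=-1)
        neg_emb = F.normalize(encoder(corpus_feats[neg]), dim=-1)
        scores = torch.stack([(q * pos_emb).sum(-1), (q * neg_emb).sum(-1)], dim=1)
        loss = F.cross_entropy(scores / 0.05, torch.zeros(32, dtype=torch.long))
        opt.zero_grad(); loss.backward(); opt.step()
    print(f"refresh {refresh}: loss {float(loss):.3f}")
```

The key contrast with the previous two slides: negatives come from the same distribution the model will face at test time (the full corpus under its own current embeddings), not from a sparse pipeline or a random sample.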
Generalization to Unseen
Directly apply trained ANCE models as a fixed, drop-in API: no change at all, plug-and-play.
[Chart: NDCG@10 on MSDOC (Enterprise/Developer), Minecraft (Gaming), and TREC-COVID (Biomedical), comparing BM25 (Azure Search), ANCE, and ANCE+BM25 (RR Fusion).]
BM25 + ANCE leads to stable gains on developer, gaming, and biomedical workloads.
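The "RR Fusion" in the chart presumably refers to reciprocal rank fusion (Cormack et al., 2009), which combines ranked lists using only ranks. A minimal implementation, assuming the standard k=60 constant (the deck's exact fusion settings may differ):

```python
# Minimal reciprocal rank fusion of two ranked lists, matching the
# "ANCE+BM25 (RR Fusion)" setup in spirit; constants/weighting may differ.
def rr_fuse(rankings: list[list[str]], k: int = 60) -> list[str]:
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

bm25_top = ["d3", "d1", "d7", "d2"]   # hypothetical BM25 ranking
ance_top = ["d1", "d9", "d3", "d5"]   # hypothetical ANCE ranking
print(rr_fuse([bm25_top, ance_top]))  # d1 and d3 rise to the top
```

Rank-based fusion is attractive here precisely because BM25 and ANCE scores live on incomparable scales; only the orderings are combined.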
Toward Universal Search and Recommendation
• Accurate: the best search and recommendation experience, with semantic understanding and user modeling.
• Generalizable: one solution for all search and recommendation scenarios.
• Scalable: indexing and serving the entire world.
[Diagram: queries and user data (query, speech, browsing history, dialog, email, ...) and heterogeneous target corpora (web corpus, video, papers, databases, music, molecular structures, protocols, procedures, tables, cases, evidence) flow through encoders into one learned representation space, served with vector match & ANN.]
Representation Learning in Search
Goals:
• Accurately capture the semantics of content.
• Strong generalization ability to all domains.
Technical directions:
• Efficient relevance-oriented pretraining that provides a better starting point for universal search and recommendation.
• Optimization techniques that better capture user feedback signals in the representation space.
• Theoretical understanding of the properties of the representation space in generalization.
• Document representation models that focus on capturing semantics in long documents.
• Multi-modal representations that can jointly model the semantics of image-text-video.
Better representation learning improves the overall performance and generalization ability of universal search and recommendation.
Unicoder-VL: A Universal Encoder for Vision and Language by Cross-modal Pre-training. Li et al., AAAI 2020.
M3P: Learning Universal Representations via Multitask Multilingual Multimodal Pre-training. Ni et al., CVPR 2021.
Graph/Structure Learning
Goals: improve universal embedding estimation by leveraging relationships among (1) users, (2) data elements, and (3) interactions between people and data.
Technical directions:
• Content+graph learning to leverage implicit relational signals in universal embeddings.
• Composite embedding methods to dynamically compute implicit representations that reflect temporal context.
• Pretrained graph models to generalize across different settings.
• Public-private learning techniques to preserve privacy in recommendation settings.
User-as-Graph: User Modeling with Heterogeneous Graph Pooling for News Recommendation. Wu et al., IJCAI 2021.
Uni-FedRec: A Unified Privacy-Preserving News Recommendation Framework for Model Training and Online Serving. Qi et al., EMNLP Findings 2021.
Attentive Knowledge-aware Graph Convolutional Networks with Collaborative Guidance for Recommendation. Chen et al.
Causality and Robustness
Goals: improve robustness and generalizability in dynamic environments (e.g., changing traffic, changing corpus), across domains, and into the tail. Target rewards beyond clicks, such as long-term rewards, user outcomes, and empowerment.
Technical directions:
• Algorithmic advances: inverse reinforcement learning, causal representation learning, causal extreme learning.
• Incorporation of domain knowledge: simulations + causal modeling for decision making; leveraging knowledge of experiments and environments.
• Causal ML under real-world constraints: causal modeling and online ML pipelines; federated causal ML.
Relying on causal rather than correlational relationships provides a more stable and generalizable foundation for machine learning.
Causal Transfer Random Forest: Combining Logged Data and Randomized Experiments for Robust Prediction. Zeng et al., WSDM 2021.
Domain Generalization using Causal Matching. Mahajan et al., ICML 2021.
Universal Search and Recommendation
Office, Azure, Health, Dynamics, LinkedIn, Xbox, Bing, GitHub
Provide push-button customized search and recommendation … for any person, any vertical, any data.
© Copyright Microsoft Corporation. All rights reserved.