Unbiased Learning-to-Rank with Biased Feedback - IJCAI
Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence (IJCAI-18)

Unbiased Learning-to-Rank with Biased Feedback∗

Thorsten Joachims¹, Adith Swaminathan², Tobias Schnabel¹
¹Cornell University, Ithaca, NY    ²Microsoft Research, Redmond, WA
tj@cs.cornell.edu, adswamin@microsoft.com, tbs49@cornell.edu

∗ Invited IJCAI18 summary based on the WSDM17 Best-Paper Award.

Abstract

Implicit feedback (e.g., clicks, dwell times, etc.) is an abundant source of data in human-interactive systems. While implicit feedback has many advantages (e.g., it is inexpensive to collect, user-centric, and timely), its inherent biases are a key obstacle to its effective use. For example, position bias in search rankings strongly influences how many clicks a result receives, so that directly using click data as a training signal in Learning-to-Rank (LTR) methods yields sub-optimal results. To overcome this bias problem, we present a counterfactual inference framework that provides the theoretical basis for unbiased LTR via Empirical Risk Minimization despite biased data. Using this framework, we derive a propensity-weighted ranking SVM for discriminative learning from implicit feedback, where click models take the role of the propensity estimator. Beyond the theoretical support, we show empirically that the proposed learning method is highly effective in dealing with biases, that it is robust to noise and propensity model mis-specification, and that it scales efficiently. We also demonstrate the real-world applicability of our approach on an operational search engine, where it substantially improves retrieval performance.

1 Introduction

Batch training of retrieval systems requires annotated test collections that take substantial effort and cost to amass. While economically feasible for web search, eliciting relevance annotations from experts is infeasible or impossible for most other ranking applications (e.g., personal collection search, intranet search). For these applications, implicit feedback from user behavior is an attractive source of data. Unfortunately, existing approaches for Learning-to-Rank (LTR) from implicit feedback – and clicks on search results in particular – have several limitations or drawbacks.

First, the naïve approach of treating a click/no-click as a positive/negative relevance judgment is severely biased. In particular, the order of presentation has a strong influence on where users click [Joachims et al., 2007]. This presentation bias leads to an incomplete and skewed sample of relevance judgments that is far from uniform, thus leading to biased learning-to-rank.

Second, treating clicks as preferences between clicked and skipped documents has been found to be accurate [Joachims, 2002; Joachims et al., 2007], but it can only infer preferences that oppose the presented order. This again leads to severely biased data, and learning algorithms trained with these preferences tend to reverse the presented order unless additional heuristics are used [Joachims, 2002].

Third, probabilistic click models (see Chuklin et al. [2015]) have been used to model how users produce clicks. By estimating latent parameters of these generative click models, one can infer the relevance of a given document for a given query. However, inferring reliable relevance judgments typically requires that the same query is seen multiple times, which is unrealistic in many retrieval settings (e.g., personal collection search) and for tail queries.

Fourth, allowing the LTR algorithm to randomize what is presented to the user, like in online learning algorithms [Raman et al., 2013; Hofmann et al., 2013] and batch learning from bandit feedback (BLBF) [Swaminathan and Joachims, 2015], can overcome the problem of bias in click data in a principled manner. However, requiring that rankings be actively perturbed during system operation whenever we collect training data decreases ranking quality and, therefore, incurs a cost compared to observational data collection.

In this paper we present a theoretically principled and empirically effective approach for learning from observational implicit feedback that can overcome the limitations outlined above. By drawing on counterfactual estimation techniques from causal inference [Imbens and Rubin, 2015] and work on correcting sampling bias at the query level [Wang et al., 2016], we first develop a provably unbiased estimator for evaluating ranking performance using biased feedback data. Based on this estimator, we propose a Propensity-Weighted Empirical Risk Minimization (ERM) approach to LTR, which we implement efficiently in a new learning method we call Propensity SVM-Rank. While our approach uses a click model, the click model is merely used to assign propensities to clicked results in hindsight, not to extract aggregate relevance judgments. This means that our Propensity SVM-Rank
does not require queries to repeat, making it applicable to a large range of ranking scenarios. Finally, our methods can use observational data and we do not require that the system randomizes rankings during data collection, except for a small pilot experiment to estimate the propensity model.

2 Full-Info Learning to Rank

Before we derive our approach for LTR from biased implicit feedback, we first review the conventional problem of LTR from editorial judgments. In conventional LTR, we are given a sample X of i.i.d. queries x_i ~ P(x) for which we assume the relevances rel(x, y) of all documents y are known. Since all relevances are assumed to be known, we call this the Full-Information Setting. The relevances can be used to compute the loss Δ(𝐲|x) (e.g., negative DCG) of any ranking 𝐲 for query x. Aggregating the losses of individual rankings by taking the expectation over the query distribution, we can define the overall risk of a ranking system S that returns rankings S(x) as

    R(S) = ∫ Δ(S(x)|x) dP(x).                                        (1)

The goal of learning is to find a ranking function S ∈ 𝒮 that minimizes R(S) for the query distribution P(x). Since R(S) cannot be computed directly, it is typically estimated via the empirical risk

    R̂(S) = (1/|X|) Σ_{x_i ∈ X} Δ(S(x_i)|x_i).

A common learning strategy is Empirical Risk Minimization (ERM) [Vapnik, 1998], which corresponds to picking the system Ŝ ∈ 𝒮 that optimizes the empirical risk,

    Ŝ = argmin_{S ∈ 𝒮} { R̂(S) },

possibly subject to some regularization in order to control overfitting. There are several LTR algorithms that follow this approach (see Liu [2009]), and we use SVM-Rank [Joachims, 2002] as a representative algorithm in this paper.

The relevances rel(x, y) are typically elicited via expert judgments. Apart from being expensive and often infeasible (e.g., in personal collection search), expert judgments come with at least two other limitations. First, it is clearly impossible to get explicit judgments for all documents, and pooling techniques [Sparck-Jones and van Rijsbergen, 1975] often introduce bias. The second limitation is that expert judgments rel(x, y) have to be aggregated over all intents that underlie the same query string, and it can be challenging for a judge to properly conjecture the distribution of query intents to assign an appropriate rel(x, y).

3 Partial-Info Learning to Rank

Learning from implicit feedback has the potential to overcome the above-mentioned limitations of full-information LTR. By drawing the training signal directly from the user, it naturally reflects the user's intent, since each user acts upon their own relevance judgement subject to their specific context and information need. It is therefore more appropriate to talk about query instances x_i that include contextual information about the user, instead of query strings x. For a given query instance x_i, we denote with r_i(y) the user-specific relevance of result y for query instance x_i. One may argue that what expert assessors try to capture with rel(x, y) is the mean of the relevances r_i(y) over all query instances that share the query string. Relying on implicit feedback instead for learning allows us to remove a lot of guesswork about what the distribution of users meant by a query.

However, when using implicit feedback as a relevance signal, unobserved feedback is an even greater problem than missing judgments in the pooling setting. In particular, implicit feedback is distorted by presentation bias, and it is not missing completely at random [Little and Rubin, 2002]. To nevertheless derive well-founded learning algorithms, we adopt the following counterfactual model. It closely follows [Schnabel et al., 2016], which unifies several prior works on evaluating information retrieval systems.

For concreteness and simplicity, assume that relevances are binary, r_i(y) ∈ {0, 1}, and our performance measure of interest is the sum of the ranks of the relevant results

    Δ(𝐲|x_i, r_i) = Σ_{y ∈ 𝐲} rank(y|𝐲) · r_i(y).                    (2)

Analogous to (1), we can define the risk of a system as

    R(S) = ∫ Δ(S(x)|x, r) dP(x, r).                                  (3)

In our counterfactual model, there exists a true vector of relevances r_i for each incoming query instance, (x_i, r_i) ~ P(x, r). However, only a part of these relevances is observed for each query instance, while typically most remain unobserved. In particular, given a presented ranking ȳ_i we are more likely to observe the relevance signals (e.g., clicks) for the top-ranked results than for results ranked lower in the list. Let o_i denote the 0/1 vector indicating which relevance values were revealed, o_i ~ P(o|x_i, ȳ_i, r_i). For each element of o_i, denote with Q(o_i(y) = 1|x_i, ȳ_i, r_i) the marginal probability of observing the relevance r_i(y) of result y for query x_i, if the user was presented the ranking ȳ_i. We refer to this probability value as the propensity of the observation. We discuss in Section 4 how o_i and Q can be obtained.

Using this counterfactual modeling setup, we can get an unbiased estimate of Δ(𝐲|x_i, r_i) for any new ranking 𝐲 (typically different from the presented ranking ȳ_i) via the inverse propensity scoring (IPS) estimator [Horvitz and Thompson, 1952; Rosenbaum and Rubin, 1983; Imbens and Rubin, 2015]

    Δ̂_IPS(𝐲|x_i, ȳ_i, o_i) = Σ_{y: o_i(y)=1} rank(y|𝐲) · r_i(y) / Q(o_i(y)=1|x_i, ȳ_i, r_i)
                            = Σ_{y: o_i(y)=1 ∧ r_i(y)=1} rank(y|𝐲) / Q(o_i(y)=1|x_i, ȳ_i, r_i).

This is an unbiased estimate of Δ(𝐲|x_i, r_i) for any 𝐲, if Q(o_i(y) = 1|x_i, ȳ_i, r_i) > 0 for all y that are relevant, i.e., r_i(y) = 1 (but not necessarily for the irrelevant y). The proof for this is quite straightforward and can be found in the original paper [Joachims et al., 2017].
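In code form, the estimator needs only the clicked (i.e., observed-and-relevant) results and their propensities; the following is our own minimal sketch, not code from the paper, with an illustrative data layout:

```python
def delta_ips(new_ranking, clicked, propensity):
    """IPS estimate of the sum-of-relevant-ranks loss for `new_ranking`.

    new_ranking: list of result ids, best first (the ranking y to evaluate)
    clicked:     set of result ids that were clicked under the presented
                 ranking (these are exactly the o_i(y)=1 and r_i(y)=1 results)
    propensity:  dict mapping each clicked result id to its observation
                 propensity Q(o_i(y)=1 | x_i, ybar_i, r_i), assumed > 0
    """
    total = 0.0
    for rank, y in enumerate(new_ranking, start=1):  # rank(y | new_ranking)
        if y in clicked:
            total += rank / propensity[y]
    return total
```

For example, if result c was clicked while presented at a low-propensity position (say propensity 0.3), then placing c at rank 1 in a new ranking contributes 1/0.3 ≈ 3.33 to the estimate; the low propensity up-weights the click to compensate for how rarely such positions are examined.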
An interesting property of Δ̂_IPS(𝐲|x_i, ȳ_i, o_i) is that only those results y with [o_i(y) = 1 ∧ r_i(y) = 1] (i.e., clicked results, as we will see later) contribute to the estimate. We therefore only need the propensities Q(o_i(y) = 1|x_i, ȳ_i, r_i) for relevant results. Since we will eventually need to estimate these propensities, an additional requirement for making Δ̂_IPS(𝐲|x_i, ȳ_i, o_i) computable while remaining unbiased is that the propensities only depend on observable information (i.e., unconfoundedness, see Imbens and Rubin [2015]).

Having a sample of N query instances x_i, recording the partially-revealed relevances r_i as indicated by o_i, and the propensities Q(o_i(y) = 1|x_i, ȳ_i, r_i), the empirical risk of a system is simply the IPS estimates averaged over query instances:

    R̂_IPS(S) = (1/N) Σ_{i=1}^{N} Σ_{y: o_i(y)=1 ∧ r_i(y)=1} rank(y|S(x_i)) / Q(o_i(y)=1|x_i, ȳ_i, r_i).    (4)

Since Δ̂_IPS(𝐲|x_i, ȳ_i, o_i) is unbiased for each query instance, the aggregate R̂_IPS(S) is also unbiased for R(S) from (3),

    E[R̂_IPS(S)] = R(S).

Furthermore, it is easy to verify that R̂_IPS(S) converges to the true R(S) under mild additional conditions (i.e., propensities bounded away from 0) as we increase the sample size N of query instances. So, we can perform ERM using this propensity-weighted empirical risk,

    Ŝ = argmin_{S ∈ 𝒮} { R̂_IPS(S) }.

Finally, using standard results from statistical learning theory [Vapnik, 1998], consistency of the empirical risk paired with capacity control implies consistency also for ERM. In intuitive terms, this means that given enough training data, the learning algorithm is guaranteed to find the best system in 𝒮.

3.1 Position-Based Propensity Model

The previous section showed that the propensities of the observations Q(o_i(y) = 1|x_i, ȳ_i, r_i) are the key component for unbiased LTR from biased observational feedback. To derive propensities of observed clicks, we consider a straightforward examination model analogous to Richardson et al. [2007], where a click on a search result depends on the probability that a user examines a result (i.e., e_i(y)) and then decides to click on it (i.e., c_i(y)) in the following way:

    P(e_i(y) = 1 | rank(y|ȳ)) · P(c_i(y) = 1 | r_i(y), e_i(y) = 1).

In this model, examination depends only on the rank of y in ȳ. So, P(e_i(y) = 1 | rank(y|ȳ_i)) can be represented by a vector of examination probabilities p_r, one for each rank r, which are precisely the propensities Q(o_i(y) = 1|x_i, ȳ_i, r_i). These examination probabilities can model the presentation bias found in eye-tracking studies [Joachims et al., 2007], where users are more likely to see results at the top of the ranking than those further down.

Under this propensity model, we can simplify the IPS estimator from (4) by substituting p_r as the propensities and by using c_i(y) = 1 ↔ [o_i(y) = 1 ∧ r_i(y) = 1]:

    R̂_IPS(S) = (1/n) Σ_{i=1}^{n} Σ_{y: c_i(y)=1} rank(y|S(x_i)) / p_{rank(y|ȳ_i)}.                        (5)

R̂_IPS(S) is an unbiased estimate of R(S) under the position-based propensity model if p_r > 0 for all ranks. While absence of a click does not imply that the result is not relevant (i.e., c_i(y) = 0 does not imply r_i(y) = 0), the IPS estimator has the nice property that such explicit negative judgments are not needed to compute an unbiased estimate of R(S) for the loss in (2). Similarly, while absence of a click leaves us unsure about whether the result was examined (i.e., e_i(y) = ?), the IPS estimator only needs to know the indicators o_i(y) = 1 for results that are also relevant (i.e., clicked results).

3.2 Propensity SVM-Rank

We now derive a concrete learning method that implements propensity-weighted LTR. It is based on SVM-Rank [Joachims, 2002; Joachims, 2006], but we conjecture that propensity-weighted versions of other LTR methods can be derived as well.

Consider a dataset of n examples of the following form. For each query-result pair (x_j, y_j) that is clicked, we compute the propensity q_j = Q(o_j(y_j) = 1|x_j, ȳ_j, r_j) of the click according to our click propensity model. We also record the candidate set Y_j of all results for query x_j. Typically, Y_j contains a few hundred documents – selected by a stage-one ranker [Wang et al., 2011] – that we aim to rerank. Note that each click generates a separate training example, even if multiple clicks occur for the same query.

Given this propensity-scored click data, we define Propensity SVM-Rank as a generalization of conventional SVM-Rank. Propensity SVM-Rank learns a linear scoring function f(x, y) = w · φ(x, y) that can be used for ranking results, where w is a weight vector and φ(x, y) is a feature vector that describes the match between query x and result y.

Propensity SVM-Rank optimizes the following objective:

    ŵ = argmin_{w, ξ}  (1/2) w·w + (C/n) Σ_{j=1}^{n} (1/q_j) Σ_{y ∈ Y_j} ξ_{jy}

    s.t.  ∀y ∈ Y_1 \ {y_1}:  w·[φ(x_1, y_1) − φ(x_1, y)] ≥ 1 − ξ_{1y}
          ...
          ∀y ∈ Y_n \ {y_n}:  w·[φ(x_n, y_n) − φ(x_n, y)] ≥ 1 − ξ_{ny}
          ∀j ∀y:  ξ_{jy} ≥ 0.

C is a regularization parameter that is typically selected via cross-validation. The training objective optimizes an upper bound on the regularized IPS estimated empirical risk of (5), since each line of constraints corresponds to the rank of a relevant document (minus 1).
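To illustrate how the 1/q_j weights enter the optimization, here is a plain subgradient-descent sketch of the corresponding propensity-weighted pairwise hinge objective. This is our own simplification for exposition, not the one-slack QP solver used in the paper, and all names and the data layout are ours:

```python
import numpy as np

def propensity_svmrank_epoch(w, examples, C, lr=0.01):
    """One subgradient-descent pass over propensity-scored click data for
    (1/2)||w||^2 + (C/n) * sum_j (1/q_j) * sum_y hinge-slack(j, y).

    examples: list of (phi_click, phi_others, q) tuples, where
      phi_click  - feature vector of the clicked result, shape (d,)
      phi_others - feature matrix of the remaining candidates, shape (m, d)
      q          - propensity of the click (must be > 0)
    """
    n = len(examples)
    for phi_click, phi_others, q in examples:
        grad = w.copy()                               # d/dw of (1/2)||w||^2
        margins = 1.0 - (phi_click - phi_others) @ w  # pairwise hinge margins
        for k in np.flatnonzero(margins > 0):         # violated pairs only
            grad += (C / (n * q)) * (phi_others[k] - phi_click)
        w = w - lr * grad
    return w
```

Note the effect of the weight 1/q_j: a click observed at a low-propensity (deep) position contributes a larger gradient, which is exactly the correction for position bias that distinguishes Propensity SVM-Rank from its naive counterpart.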
In particular, for any feasible (w, ξ),

    rank(y_i|𝐲) − 1 = Σ_{y ≠ y_i} 𝟙[w·[φ(x_i, y) − φ(x_i, y_i)] > 0]
                    ≤ Σ_{y ≠ y_i} max(1 − w·[φ(x_i, y_i) − φ(x_i, y)], 0)
                    ≤ Σ_{y ≠ y_i} ξ_{iy}.

We can solve this type of Quadratic Program efficiently via a one-slack formulation [Joachims, 2006], and we are using SVM-Rank with appropriate modifications to include the IPS weights 1/q_j. The code is available online¹.

In the empirical evaluation, we compare against the naive application of SVM-Rank, which minimizes the rank of the clicked documents while ignoring presentation bias. In particular, Naive SVM-Rank sets all the q_j uniformly to the same constant (e.g., 1).

4 Empirical Evaluation

The original paper [Joachims et al., 2017] takes a two-pronged approach to evaluation. First, it uses synthetically generated click data to explore the behavior of our methods over the whole spectrum of presentation bias severity, click noise, and propensity mis-specification. Due to space constraints, we do not include these experiments here, but focus on a real-world experiment that evaluates our approach on an operational search engine using real click-logs from live traffic. In particular, we examine the performance of Propensity SVM-Rank when learning a new ranking function for the Arxiv Full-Text Search² based on real-world click logs from this system. The search engine uses a linear scoring function f(x, y) = w · φ(x, y). The query-document features φ(x, y) are represented by a 1000-dimensional vector, and the production ranker used for collecting training clicks employs a hand-crafted weight vector w (denoted Prod). Observed clicks on rankings served by this ranker over a period of 21 days provide implicit feedback data for LTR as outlined in Section 3.2.

We partition the click-logs into a train-validation split: the first 16 days are the train set and provide 5437 click-events, while the remaining 5 days are the validation set with 1755 click-events. The hyper-parameter C is picked via cross-validation. We use the IPS estimator for Propensity SVM-Rank, and the naive estimator with Q(o(y) = 1|x, ȳ, r) = 1 for Naive SVM-Rank. With the best hyper-parameter settings, we re-train on all 21 days worth of data to derive the final weight vectors for either method.

We fielded these learnt weight vectors in two online interleaving experiments [Chapelle et al., 2012], the first comparing Propensity SVM-Rank against Prod and the second comparing Propensity SVM-Rank against Naive SVM-Rank. The results are summarized in Table 1. We find that Propensity SVM-Rank significantly outperforms the hand-crafted production ranker that was used to collect the click data for training (two-tailed binomial sign test p = 0.001 with relative risk 0.71 compared to the null hypothesis). Furthermore, Propensity SVM-Rank similarly outperforms Naive SVM-Rank, demonstrating that even a simple propensity model provides benefits on real-world data (two-tailed binomial sign test p = 0.006 with relative risk 0.77 compared to the null hypothesis). Note that Propensity SVM-Rank not only significantly, but also substantially outperforms both other rankers in terms of effect size – and the synthetic data experiments suggest that additional training data will further increase its advantage.

    Interleaving Experiment      wins   loses   ties
    against Prod                   87      48     83
    against Naive SVM-Rank         95      60    102

Table 1: Per-query balanced interleaving results for detecting relative performance between the hand-crafted production ranker used for click data collection (Prod), Naive SVM-Rank, and Propensity SVM-Rank.
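The significance claims above can be checked directly from the win/loss counts in Table 1 with an exact two-sided binomial sign test against p = 0.5, with ties ignored. A small sketch (the function name is ours):

```python
from math import comb

def sign_test_two_sided(wins, losses):
    """Exact two-sided binomial sign test against p = 0.5, ignoring ties."""
    n = wins + losses
    k = max(wins, losses)
    # probability of a split at least as extreme as the one observed
    tail = sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n
    return min(1.0, 2 * tail)

p_prod = sign_test_two_sided(87, 48)    # Propensity SVM-Rank vs. Prod
p_naive = sign_test_two_sided(95, 60)   # Propensity SVM-Rank vs. Naive SVM-Rank
```

For the counts in Table 1 this yields values consistent with the reported p = 0.001 and p = 0.006.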
To estimate the propensity model, we consider the simple position-based model of Section 3.1, and we collect new click data via randomized interventions for 7 days as detailed in [Joachims et al., 2017] with landmark rank k = 1. In short, before presenting the ranking, we take the top-ranked document and swap it with the document at a uniformly-at-random chosen rank j ∈ {1, ..., 21}. The ratio of observed click-through rates (CTR) on the formerly top-ranked document now at position j vs. its CTR at position 1 gives a noisy estimate of p_j/p_1 in the position-based click model. We additionally smooth these estimates by interpolating with the overall observed CTR at position j (normalized so that CTR@1 = 1). This yields p_r that approximately decay with rank r, with the smallest p_r ≈ 0.12. For r > 21, we impute p_r = p_21. Since the original paper appeared, another technique for propensity estimation from observational data has been proposed that could also be used [Wang et al., 2018].

¹ http://www.joachims.org/svm_light/svm_proprank.html
² http://search.arxiv.org:8081/

5 Conclusions and Future

This paper introduced a principled approach for learning-to-rank under biased feedback data. Drawing on counterfactual modeling techniques from causal inference, we present a theoretically sound Empirical Risk Minimization framework for LTR. We instantiate this framework with a propensity-weighted ranking SVM. Real-world experiments on a live search engine show that the approach leads to substantial retrieval improvements.

Beyond the specific learning methods and propensity models we propose, this paper may have even bigger impact for its theoretical contribution of developing the general counterfactual model for LTR, thus articulating the key components necessary for LTR under biased feedback. First, the insight that propensity estimates are crucial for ERM learning opens a wide area of research on designing better propensity models. Second, the theory demonstrates that LTR methods should optimize propensity-weighted ERM objectives, raising the question of which other learning methods can be adapted. Third, we conjecture that propensity-weighted ERM
approaches can be developed also for pointwise and listwise LTR methods using techniques from Schnabel et al. [2016].

Beyond learning from implicit feedback, propensity-weighted ERM techniques may prove useful even for optimizing offline IR metrics on manually annotated test collections. First, they can eliminate pooling bias, since the use of sampling during judgment elicitation puts us in a controlled setting where propensities are known (and can be optimized [Schnabel et al., 2016]) by design. Second, propensities estimated via click models can enable click-based IR metrics like click-DCG to better correlate with test set DCG.

This work was supported in part through NSF Awards IIS-1247637, IIS-1513692, IIS-1615706, and a gift from Bloomberg. We thank Maarten de Rijke, Alexey Borisov, Artem Grotov, and Yuning Mao for valuable feedback and discussions.

References

[Chapelle et al., 2012] Olivier Chapelle, Thorsten Joachims, Filip Radlinski, and Yisong Yue. Large-scale validation and analysis of interleaved search evaluation. ACM Transactions on Information Systems (TOIS), 30(1):6:1–6:41, 2012.

[Chuklin et al., 2015] Aleksandr Chuklin, Ilya Markov, and Maarten de Rijke. Click Models for Web Search. Synthesis Lectures on Information Concepts, Retrieval, and Services. Morgan & Claypool Publishers, 2015.

[Hofmann et al., 2013] Katja Hofmann, Anne Schuth, Shimon Whiteson, and Maarten de Rijke. Reusing historical interaction data for faster online learning to rank for IR. In International Conference on Web Search and Data Mining (WSDM), pages 183–192, 2013.

[Horvitz and Thompson, 1952] Daniel Horvitz and Donovan Thompson. A generalization of sampling without replacement from a finite universe. Journal of the American Statistical Association, 47(260):663–685, 1952.

[Imbens and Rubin, 2015] Guido Imbens and Donald Rubin. Causal Inference for Statistics, Social, and Biomedical Sciences. Cambridge University Press, 2015.

[Joachims et al., 2007] Thorsten Joachims, Laura Granka, Bing Pan, Helene Hembrooke, Filip Radlinski, and Geri Gay. Evaluating the accuracy of implicit feedback from clicks and query reformulations in web search. ACM Transactions on Information Systems (TOIS), 25(2), April 2007.

[Joachims et al., 2017] Thorsten Joachims, Adith Swaminathan, and Tobias Schnabel. Unbiased learning-to-rank with biased feedback. In ACM Conference on Web Search and Data Mining (WSDM), pages 781–789, 2017.

[Joachims, 2002] Thorsten Joachims. Optimizing search engines using clickthrough data. In ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD), pages 133–142, 2002.

[Joachims, 2006] Thorsten Joachims. Training linear SVMs in linear time. In ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), pages 217–226, 2006.

[Little and Rubin, 2002] Roderick J. A. Little and Donald B. Rubin. Statistical Analysis with Missing Data. John Wiley, 2002.

[Liu, 2009] Tie-Yan Liu. Learning to rank for information retrieval. Foundations and Trends in Information Retrieval, 3(3):225–331, March 2009.

[Raman et al., 2013] Karthik Raman, Thorsten Joachims, P. Shivaswamy, and Tobias Schnabel. Stable coactive learning via perturbation. In International Conference on Machine Learning (ICML), pages 837–845, 2013.

[Richardson et al., 2007] Matthew Richardson, Ewa Dominowska, and Robert Ragno. Predicting clicks: Estimating the click-through rate for new ads. In International Conference on World Wide Web (WWW), pages 521–530. ACM, 2007.

[Rosenbaum and Rubin, 1983] Paul R. Rosenbaum and Donald B. Rubin. The central role of the propensity score in observational studies for causal effects. Biometrika, 70(1):41–55, 1983.

[Schnabel et al., 2016] Tobias Schnabel, Adith Swaminathan, Peter Frazier, and Thorsten Joachims. Unbiased comparative evaluation of ranking functions. In ACM International Conference on the Theory of Information Retrieval (ICTIR), pages 109–118, 2016.

[Sparck-Jones and van Rijsbergen, 1975] Karen Sparck-Jones and Cornelis J. van Rijsbergen. Report on the need for and provision of an "ideal" information retrieval test collection. Technical report, University of Cambridge, 1975.

[Swaminathan and Joachims, 2015] Adith Swaminathan and Thorsten Joachims. Batch learning from logged bandit feedback through counterfactual risk minimization. Journal of Machine Learning Research (JMLR), 16:1731–1755, Sep 2015. Special Issue in Memory of Alexey Chervonenkis.

[Vapnik, 1998] Vladimir Vapnik. Statistical Learning Theory. Wiley, Chichester, GB, 1998.

[Wang et al., 2011] Lidan Wang, Jimmy J. Lin, and Donald Metzler. A cascade ranking model for efficient ranked retrieval. In ACM Conference on Research and Development in Information Retrieval (SIGIR), pages 105–114, 2011.

[Wang et al., 2016] Xuanhui Wang, Michael Bendersky, Donald Metzler, and Marc Najork. Learning to rank with selection bias in personal search. In ACM Conference on Research and Development in Information Retrieval (SIGIR). ACM, 2016.

[Wang et al., 2018] Xuanhui Wang, Nadav Golbandi, Michael Bendersky, Donald Metzler, and Marc Najork. Position bias estimation for unbiased learning to rank in personal search. In Conference on Web Search and Data Mining (WSDM), 2018.