Safe Weakly Supervised Learning - IJCAI
Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence (IJCAI-21) Early Career Track

Safe Weakly Supervised Learning

Yu-Feng Li
National Key Laboratory for Novel Software Technology, Nanjing University, Nanjing 210023, China
liyf@nju.edu.cn

Abstract

Weakly supervised learning (WSL) refers to learning from a large amount of weak supervision data. This includes i) incomplete supervision (e.g., semi-supervised learning); ii) inexact supervision (e.g., multi-instance learning); and iii) inaccurate supervision (e.g., label noise learning). Unlike supervised learning, which typically achieves performance improvements with more labeled data, WSL may sometimes even degrade performance with more weak supervision data. It is thus desirable to study safe WSL, which can robustly improve performance with weak supervision data. In this article, we share our understanding of the problem, from in-distribution data to out-of-distribution data, and discuss possible ways to alleviate it, from the aspects of worst-case analysis, ensemble learning, and bi-level optimization. We also share some open problems to inspire future research.

1 Introduction

Machine learning has achieved great success in numerous tasks, particularly in supervised learning such as classification and regression. But most successful techniques, such as deep learning [LeCun et al., 2015], require ground-truth labels to be given for a big training data set. In many tasks, however, it can be difficult to attain strong supervision, because hand-labeled data sets are time-consuming and expensive to collect. Thus, it is desirable for machine learning techniques to be able to work well with weakly supervised data [Zhou, 2017].

Compared to the data in traditional supervised learning, weakly supervised data does not carry a large amount of precise label information. Specifically, three types of weakly supervised data commonly exist [Zhou, 2017].

• Incomplete supervised data, i.e., only a small subset of the training data is given with labels whereas the other data remain unlabeled. For example, in image categorization it is easy to get a huge number of images from the Internet, whereas only a small subset of images can be annotated due to the annotation cost. A representative technique for this situation is semi-supervised learning [Chapelle et al., 2006], which aims to learn a prediction model by leveraging a number of unlabeled data.

• Inexact supervised data, i.e., only coarse-grained labels are given. Reconsider the image categorization task: it is desirable to have every object in the images annotated; however, usually we only have image-level labels rather than object-level labels. One representative technique for this scenario is multi-instance learning [Carbonneau et al., 2018], which aims to improve the performance by considering the coarse-grained label information.

• Inaccurate supervised data, i.e., the given labels are not always ground-truth. Such a situation occurs in various tasks when the annotator is careless or weary, or when the annotator is not an expert. For this type of label information, label noise learning techniques are one main paradigm to learn a promising prediction from noisy labels [Frénay and Verleysen, 2014].

In traditional machine learning, it is often expected that machine learning techniques, such as supervised learning, will improve learning performance with the usage of more data. Such an observation, however, no longer holds for weakly supervised learning. There are many studies [Li and Zhou, 2015; Li et al., 2017; Guo and Li, 2018; Oliver et al., 2018] reporting that the usage of weakly supervised data may sometimes lead to performance degradation, that is, the learning performance is even worse than that of baseline methods that do not use weakly supervised data. More specifically, semi-supervised learning using unlabeled data may be worse than vanilla supervised learning with only limited labeled data [Li and Zhou, 2015; Li et al., 2016]. Multi-instance learning may be outperformed by naive learning methods which simply assign the coarse-grained label to a bag of instances [Carbonneau et al., 2018]. Label noise learning may be worse than learning from a small amount of high-quality labeled data [Frénay and Verleysen, 2014]. Such phenomena undoubtedly go against the expectation of WSL and limit its effectiveness in a large number of practical tasks.

Building a safe WSL, that is to say, a WSL whose use of extra weakly supervised data is never inferior to a simple supervised learning model, is the Holy Grail of WSL [Chapelle et al., 2006; Li and Zhou, 2015; Zhou, 2017]. Since the problem was pointed out in [Cozman et al., 2003], there have been many attempts to solve it.
In this article, we review the recent developments on safe WSL for this important and challenging problem, and share our contributions on two aspects of safe WSL:

• For WSL with in-distribution data, we proposed a general ensemble scheme that maximizes the performance gain in the worst case to improve the safeness of WSL.

• For WSL with out-of-distribution (OOD) data, we give a particular focus to SSL with unseen-class unlabeled data and propose a bi-level optimization based framework to alleviate the potential performance hurt caused by OOD unlabeled examples.

We will also discuss some open challenges in real-world applications that may have been less noticed and deserve more attention.

2 Safe WSL with In-Distribution Data

WSL with in-distribution data, i.e., all supervised data and weakly supervised data are drawn from the same distribution, is the most natural situation. [Cozman et al., 2003] pointed out that WSL could suffer from the performance degradation problem even with in-distribution data. There are multiple reasons; for example, the assumption adopted by the WSL algorithm may not be suitable for the data distribution [Chapelle et al., 2006]; there may be many candidate large-margin decision boundaries in a semi-supervised support vector machine (SVM) while prior knowledge is insufficient to help choose the best one [Li and Zhou, 2015]; and so on.

Some attempts have been devoted to this problem [Li and Zhou, 2015; Loog, 2015; Li et al., 2017; Krijthe and Loog, 2017]. For example, [Li and Zhou, 2015] builds safe semi-supervised SVMs by optimizing the worst-case performance gain given a set of candidate low-density separators. [Loog, 2015] proposes to maximize the likelihood gain over a supervised model in the worst case for generative models. [Balsubramani and Freund, 2015] proposes to learn a robust prediction given that the ground-truth label assignment is restricted to a specific candidate set. [Wei et al., 2018] study safe multi-label learning of weakly labeled data. They optimize multi-label evaluation metrics (F1 score and Top-k precision) given that the ground-truth label assignment is realized by a convex combination of base multi-label learners. More introductions can be found in our recent summary [Li and Liang, 2019].

To address this problem, we propose a general ensemble learning scheme, SafeW (SAFE Weakly supervised learning) [Li et al., 2021], which learns a prediction by integrating multiple weakly supervised learners. Specifically, we propose a maximin framework, which maximizes the performance gain in the worst case. Suppose we have obtained b predictions {f_1, ..., f_b} generated by base weakly supervised learners, and let f_0 denote the prediction of the baseline approach, i.e., direct supervised learning with only the limited labeled data. Our ultimate goal here is to derive a safe prediction f̄ = g({f_1, ..., f_b}, f_0) which often outperforms the baseline f_0 and meanwhile would not be worse than f_0. In other words, we would like to maximize the performance gain between our prediction and the baseline prediction.

By assuming that the ground-truth label assignment f* can be realized as a convex combination of the base learners, specifically, f* = Σ_{i=1}^b α_i f_i, where α = [α_1; α_2; ...; α_b] ≥ 0 is the weight vector of the base learners and Σ_{i=1}^b α_i = 1, we have the following objective function:

    max_f  ℓ(f_0, Σ_{i=1}^b α_i f_i) − ℓ(f, Σ_{i=1}^b α_i f_i)        (1)

This is in line with our goal, which is to find a prediction f that maximizes the performance gain against the baseline f_0.

In practice, however, one can hardly know the precise weights of the base learners. To make the proposal more practical, we further assume that α comes from a convex set M, where M captures the prior knowledge about the importance of the base learners. Without any further information to locate the weights of the base learners, to guarantee safeness we aim to optimize the worst-case performance gain, since, intuitively, the algorithm will be robust as long as good performance is guaranteed in the worst case. We then obtain a general formulation for weakly supervised data:

    max_f min_{α∈M}  ℓ(f_0, Σ_{i=1}^b α_i f_i) − ℓ(f, Σ_{i=1}^b α_i f_i)        (2)

We have the following theorem to guarantee the safeness of our proposal for commonly used convex loss functions in both classification and regression tasks, e.g., hinge loss, cross-entropy loss, mean-square loss, etc.

Theorem 1. Suppose the ground-truth f* can be constructed by the base learners, i.e., f* ∈ {f | f = Σ_{i=1}^b α_i f_i, α ∈ M}. Let f̂ and α̂ be the optimal solution to Eq.(2). Then ℓ(f̂, f*) ≤ ℓ(f_0, f*), and f̂ has already achieved the maximal performance gain against f_0.

Theorem 1 shows that Eq.(2) is a reasonable formulation for our purpose, that is, the derived optimal solution f̂ from Eq.(2) often outperforms f_0 and would not be any worse than f_0.

The objective formulation can be globally and efficiently addressed via a simple convex quadratic program or linear program. For example, with the mean-square loss, the objective can be equivalently written as

    min_{α∈M}  α⊤Fα − v⊤α        (3)

where F ∈ R^{b×b} is the linear kernel matrix of the f_i's, i.e., F_ij = f_i⊤f_j for all 1 ≤ i, j ≤ b, and v = [2f_1⊤f_0; ...; 2f_b⊤f_0]. Since F is positive semi-definite, Eq.(3) is convex and can be efficiently solved. After obtaining the optimal solution α*, the optimal prediction f̄ = Σ_{i=1}^b α_i* f_i can be constructed.

Moreover, the optimization can be written as a geometric projection problem. Specifically, let Ω = {f | f = Σ_{i=1}^b α_i f_i, α ∈ M}; then f̄ can be rewritten as

    f̄ = argmin_{f∈Ω} ‖f − f_0‖²        (4)

which learns a projection of f_0 onto the convex set Ω. Figure 1 illustrates the intuition of our proposed method from the viewpoint of geometric projection.
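To make Eq.(3) concrete, the following is a minimal sketch in Python (NumPy/SciPy), not the authors' released implementation: it instantiates the mean-square-loss case and takes M to be the probability simplex, the natural choice when no further prior knowledge is available. The function name `safew_mean_square` and all variable names are our own illustrative assumptions.

```python
import numpy as np
from scipy.optimize import minimize

def safew_mean_square(base_preds, f0):
    """Safely aggregate base weakly supervised predictions (mean-square loss).

    base_preds: (b, n) array whose rows are the base predictions f_i.
    f0: (n,) baseline prediction from direct supervised learning.
    Solves Eq.(3),  min_{alpha in simplex}  alpha^T F alpha - v^T alpha,
    and returns the aggregated prediction f_bar together with alpha.
    """
    F = base_preds @ base_preds.T          # F_ij = f_i^T f_j (PSD linear kernel)
    v = 2.0 * base_preds @ f0              # v_i = 2 f_i^T f0
    b = base_preds.shape[0]

    obj = lambda a: a @ F @ a - v @ a      # convex quadratic objective
    grad = lambda a: 2.0 * F @ a - v
    cons = ({'type': 'eq', 'fun': lambda a: a.sum() - 1.0},)  # sum alpha = 1
    bounds = [(0.0, 1.0)] * b                                  # alpha >= 0
    a0 = np.full(b, 1.0 / b)               # start from uniform weights
    res = minimize(obj, a0, jac=grad, bounds=bounds,
                   constraints=cons, method='SLSQP')
    alpha = res.x
    return alpha @ base_preds, alpha
```

By the projection view of Eq.(4), the returned f̄ is at least as close to f_0 as any single base learner, since each f_i is itself a vertex of the feasible set Ω.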
Figure 1: Illustration of the intuition of our proposal from the projection viewpoint. Intuitively, the proposal learns a projection of f_0 onto a convex feasible set Ω.

It is noteworthy that, compared with previous studies in [Li and Zhou, 2015; Balsubramani and Freund, 2015; Li et al., 2017], the SafeW framework brings multiple advantages to safe WSL. i) It can be shown that the proposal is provably safe as long as the ground-truth label assignment can be expressed as a convex combination of the base learners. In contrast to [Li and Zhou, 2015], which requires that the ground-truth be one of the base learners, the condition in Theorem 1 is looser and more practical. ii) Prior knowledge related to the weights of the base learners can be easily embedded in this framework. iii) The framework is readily applicable to many loss functions in both classification and regression, which is more general in contrast to [Li et al., 2017], which focuses on regression. iv) The proposed formulation can be globally and efficiently addressed and has an intuitive geometric interpretation.

3 Safe WSL with OOD Data

Previous WSL studies are based on a basic assumption that labeled data and weakly supervised data come from the same distribution. Such an assumption is difficult to uphold in many practical applications, among which one common case for SSL is OOD unlabeled data that contains classes not seen in the labeled set. For example, in medical diagnosis, unlabeled medical images often contain foci different from the diseases to be diagnosed. Faced with OOD weakly supervised data, WSL no longer works well and may even be accompanied by severe performance degradation [Oliver et al., 2018].

Efforts on safe WSL with OOD data remain limited. We have made particular efforts on the safe SSL problem and proposed a simple and effective safe deep SSL framework, DS3L (Deep Safe Semi-Supervised Learning) [Guo et al., 2020].

Specifically, in SSL scenarios, we are given a set of training data from an unknown distribution, which includes n labeled instances D_l = {(x_1, y_1), ..., (x_n, y_n)} and m unlabeled instances D_u = {x_{n+1}, ..., x_{n+m}}, where x ∈ X ⊆ R^D, y ∈ Y = {1, ..., C}, D is the number of input dimensions, and C is the number of output classes in the labeled data.

The goal of SSL is to learn a model h(x; θ) : {X; Θ} → Y parameterized by θ ∈ Θ from the training data, so as to minimize the generalization risk R(h) = E_{(X,Y)}[ℓ(h(X; θ), Y)], where ℓ : Y × Y → R refers to a certain loss function, e.g., mean squared error or cross-entropy loss. SSL usually exploits the structure of unlabeled data through the introduction of regularization, so its objective is typically formulated as follows:

    min_{θ∈Θ}  Σ_{i=1}^n ℓ(h(x_i; θ), y_i) + Ω(x; θ)   s.t. x ∈ D_l ∪ D_u        (5)

where Ω(x; θ) refers to the regularization term, e.g., entropy-minimization regularization [Grandvalet and Bengio, 2005] or consistency regularization [Sohn et al., 2020].

Different from existing SSL techniques, which use all the unlabeled data, DS3L uses it selectively and keeps tracking the effect on the supervised learning model to prevent performance hazards. Meanwhile, DS3L uses beneficial unlabeled data as much as possible to improve performance, preventing the performance gains from being too conservative.

On one hand, DS3L uses the unlabeled data selectively. The main methodology is to design a weighting function w : R^D → R, parameterized by α ∈ B^d, that maps an instance to a weight. Then, DS3L tries to find the optimal θ̂(α) that minimizes the corresponding weighted empirical risk:

    θ̂(α) = argmin_{θ∈Θ}  Σ_{i=1}^n ℓ(h(x_i; θ), y_i) + Σ_{i=n+1}^{n+m} w(x_i; α) Ω(x_i; θ)        (6)

where θ̂(α) denotes the model trained with the weighting function parameterized by α.

On the other hand, DS3L keeps tracking the supervised performance to prevent performance degradation. Specifically, DS3L requires that the model returned by the weighted empirical risk process should maximize the generalization performance, i.e.,

    α* = argmin_{α∈B^d}  E_{(X,Y)}[ℓ(h(X; θ̂(α)), Y)]        (7)

In real practice the distribution is unknown. Similar to empirical risk minimization, DS3L therefore tries to find the optimal parameters α̂ such that the model returned by optimizing the weighted instance loss also performs well on the labeled data, which acts as an unbiased and reliable estimation of the underlying distribution, i.e.,

    α̂ = argmin_{α∈B^d}  Σ_{i=1}^n ℓ(h(x_i; θ̂(α)), y_i)        (8)

To simplify the notation, we denote θ̂(α) as θ̂. Taking both Eq.(6) and Eq.(8) into consideration, the objective of our framework can be formulated as the following bi-level optimization problem:

    min_{α∈B^d}  Σ_{i=1}^n ℓ(h(x_i; θ̂), y_i)        (9)
    s.t.  θ̂ = argmin_{θ∈Θ}  Σ_{i=1}^n ℓ(h(x_i; θ), y_i) + Σ_{i=n+1}^{n+m} w(x_i; α) Ω(x_i; θ)
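As a rough illustration of how a bi-level problem of the form of Eq.(9) can be optimized, the sketch below unrolls a single inner gradient step and estimates the hyper-gradient with respect to α by central finite differences. This is only in the spirit of, and far simpler than, the stochastic gradient methods of [Ren et al., 2018] used by DS3L; the linear logistic model, the entropy-minimization choice of Ω, the weighting form w(x; α) = σ(α⊤x), the function name `ds3l_sketch`, and all hyper-parameters are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def ds3l_sketch(Xl, yl, Xu, inner_lr=0.1, outer_lr=0.5, steps=200, eps=1e-4):
    """Toy bi-level reweighting in the spirit of Eq.(9).

    Inner level (cf. Eq.(6)): one gradient step on the labeled cross-entropy
    plus an entropy-minimization regularizer on unlabeled points, each
    unlabeled point weighted by w(x; alpha) = sigmoid(alpha^T x).
    Outer level (cf. Eq.(8)): update alpha so the resulting model has low
    loss on the labeled data; the hyper-gradient is estimated by central
    finite differences through the unrolled inner step.
    """
    d = Xl.shape[1]
    theta = np.zeros(d)
    alpha = np.zeros(d)

    def labeled_loss(th):
        p = sigmoid(Xl @ th)
        return -np.mean(yl * np.log(p + 1e-12) + (1 - yl) * np.log(1 - p + 1e-12))

    def inner_step(th, al):
        # gradient of the labeled cross-entropy term
        g = Xl.T @ (sigmoid(Xl @ th) - yl) / len(yl)
        # gradient of the weighted entropy regularizer on unlabeled data
        pu = sigmoid(Xu @ th)
        w = sigmoid(Xu @ al)                     # per-instance weights w(x; alpha)
        dH_dz = -np.log((pu + 1e-12) / (1 - pu + 1e-12)) * pu * (1 - pu)
        g += Xu.T @ (w * dH_dz) / len(Xu)
        return th - inner_lr * g                 # one unrolled inner step

    for _ in range(steps):
        # outer update: finite-difference hyper-gradient w.r.t. alpha
        hg = np.zeros(d)
        for j in range(d):
            e = np.zeros(d)
            e[j] = eps
            hg[j] = (labeled_loss(inner_step(theta, alpha + e)) -
                     labeled_loss(inner_step(theta, alpha - e))) / (2 * eps)
        alpha -= outer_lr * hg
        theta = inner_step(theta, alpha)         # inner update with current alpha
    return theta, alpha
```

The single-step unrolling is a deliberate simplification: in practice the inner problem is solved (approximately) by alternating mini-batch stochastic updates of θ and α rather than by exhaustive finite differences.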
Eq.(9) can be understood in two stages: first, DS3L seeks the optimal model parameter θ̂ via weighted empirical risk minimization; then it evaluates θ̂ on the n labeled instances and optimizes the weighting function parameter α so that the learned θ̂ achieves more reliable performance. Moreover, the bi-level optimization can be efficiently solved via stochastic gradient descent methods [Ren et al., 2018].

In order to show the safeness of DS3L, we analyze the empirical risk of DS3L compared with the simple supervised method and obtain the following theorem.

Theorem 2. Let θ_SL be the supervised model, i.e., θ_SL = argmin_{θ∈Θ} Σ_{i=1}^n ℓ(h(x_i; θ), y_i). Define the empirical risk as

    R̂(θ) = (1/n) Σ_{i=1}^n ℓ(h(x_i; θ), y_i)

Then the empirical risk of the θ̂ returned by DS3L is never worse than that of the θ_SL learned from merely labeled data, i.e., R̂(θ̂) ≤ R̂(θ_SL).

Theorem 2 reveals that, compared with previous SSL methods, DS3L can achieve safeness in terms of empirical risk, i.e., with the learned α the performance is not worse than its supervised counterpart.

We further analyze the generalization risk of DS3L, to better understand the effect of the parameter dimension and the size of the labeled data on α, and derive the following theorem.

Theorem 3. Assume that the loss function is λ-Lipschitz continuous w.r.t. α. Let α ∈ B^d be the parameter of the example weighting function w in a d-dimensional unit ball, and let n be the labeled data size. Define the generalization risk as

    R(θ) = E_{(X,Y)}[ℓ(h(X; θ), Y)]

Let α* = argmax_{α∈B^d} R(θ̂(α)) be the optimal parameter in the unit ball, and α̂ = argmax_{α∈A} R̂(θ̂(α)) be the empirical optimum among a candidate set A. With probability at least 1 − δ we have

    R(θ̂(α*)) ≤ R(θ̂(α̂)) + (3λ + √(4d ln(n) + 8 ln(2/δ))) / √n

Theorem 3 establishes that DS3L approaches the optimal weight at the rate O(√(d ln(n)/n)). Based on Theorem 2 and Theorem 3, in terms of both safeness and generalization, it is reasonable to expect that DS3L can achieve better generalization performance compared with baseline supervised learning methods.

4 Open Problems

Although significant progress has been made in safe WSL with in-distribution data and out-of-distribution data, there still remain many open problems in this area.

• Safe WSL in dynamic environments. Learning in dynamic environments is far more difficult than in static ones. The challenges come from distribution drift, new classes emerging, feature space changes, and so on. There are some studies trying to tackle these problems [Da et al., 2014]; however, the issue of safeness remains an open problem for weakly supervised learning in dynamic environments. For example, an interesting question is when the unlabeled data are useful in online learning.

• Automated safe WSL [Feurer et al., 2015]. AutoML, which seeks to build an appropriate machine learning model for an unseen dataset in an automatic manner (without human intervention), has received increasing attention recently. However, existing AutoML systems focus on supervised learning, and existing AutoML techniques cannot be directly used for the automated WSL problem; efforts on automated WSL remain limited right now. Automated WSL introduces some new challenges: e.g., the various meta-features extracted from a limited number of supervised data are no longer available and suitable, and the use of auxiliary weakly supervised examples may sometimes even be outperformed by direct supervised learning. Therefore, safeness is one of the crucial aspects of AutoWSL, since it is not desirable to have an automated yet performance-degenerated WSL system. [Li et al., 2019] first presented an automated learning system for SSL. They incorporate meta-learning with enhanced meta-features to help search for well-performing instantiations, and a large-margin separation method to fine-tune the hyper-parameters as well as alleviate performance deterioration. More efforts are expected to be devoted to this direction.

• Safe deep WSL. We have introduced SafeW and other related methods that aim to solve the safe WSL problem with in-distribution data. However, current safe WSL studies typically work on shallow models such as support vector machines, logistic regression, linear regression, etc. Applying WSL techniques to deep neural networks has attracted much attention in recent years, owing to the promising results achieved by deep models, yet studies of safe WSL with deep neural networks remain limited. It is expected that efficient schemes for safe deep WSL will be designed.

• Safe imbalanced WSL. Previous WSL studies typically assume a balanced class distribution in both the labeled set and the unlabeled set. However, it is well known that real-world datasets are often imbalanced or long-tailed. The performance of previous WSL methods seriously decreases when the class distribution is imbalanced, since their predictions are biased toward the majority classes, resulting in low recall on the minority classes [Kim et al., 2020]. Some efforts have begun to address the imbalanced SSL problem [Kim et al., 2020], but how to achieve safe performance for imbalanced WSL is still under study and remains an open problem.

Acknowledgments

This work was partially supported by NSFC (61772262) and the Collaborative Innovation Center of Novel Software Technology and Industrialization. The author would like to thank Lan-Zhe Guo for improving the paper.
References

[Balsubramani and Freund, 2015] Akshay Balsubramani and Yoav Freund. Optimally combining classifiers using unlabeled data. In Proceedings of the 28th Conference on Learning Theory, pages 211–225, 2015.

[Carbonneau et al., 2018] Marc-André Carbonneau, Veronika Cheplygina, Eric Granger, and Ghyslain Gagnon. Multiple instance learning: A survey of problem characteristics and applications. Pattern Recognition, 77:329–353, 2018.

[Chapelle et al., 2006] Olivier Chapelle, Bernhard Schölkopf, and Alexander Zien, editors. Semi-Supervised Learning. The MIT Press, 2006.

[Cozman et al., 2003] Fábio Gagliardi Cozman, Ira Cohen, and Marcelo Cesar Cirelo. Semi-supervised learning of mixture models. In Proceedings of the 20th International Conference on Machine Learning, pages 99–106, Washington, DC, 2003.

[Da et al., 2014] Qing Da, Yang Yu, and Zhi-Hua Zhou. Learning with augmented class by exploiting unlabeled data. In Proceedings of the 28th AAAI Conference on Artificial Intelligence, pages 1760–1766, 2014.

[Feurer et al., 2015] Matthias Feurer, Aaron Klein, Katharina Eggensperger, Jost Tobias Springenberg, Manuel Blum, and Frank Hutter. Efficient and robust automated machine learning. In Advances in Neural Information Processing Systems, pages 2962–2970, 2015.

[Frénay and Verleysen, 2014] Benoît Frénay and Michel Verleysen. Classification in the presence of label noise: a survey. IEEE Transactions on Neural Networks and Learning Systems, 25(5):845–869, 2014.

[Grandvalet and Bengio, 2005] Yves Grandvalet and Yoshua Bengio. Semi-supervised learning by entropy minimization. In Advances in Neural Information Processing Systems, pages 529–536, 2005.

[Guo and Li, 2018] Lan-Zhe Guo and Yu-Feng Li. A general formulation for safely exploiting weakly supervised data. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 3126–3133, New Orleans, LA, 2018.

[Guo et al., 2020] Lan-Zhe Guo, Zhen-Yu Zhang, Yuan Jiang, Yu-Feng Li, and Zhi-Hua Zhou. Safe deep semi-supervised learning for unseen-class unlabeled data. In Proceedings of the 37th International Conference on Machine Learning, pages 3897–3906, 2020.

[Kim et al., 2020] Jaehyung Kim, Youngbum Hur, Sejun Park, Eunho Yang, Sung Ju Hwang, and Jinwoo Shin. Distribution aligning refinery of pseudo-label for imbalanced semi-supervised learning. In Advances in Neural Information Processing Systems, pages 14567–14579, 2020.

[Krijthe and Loog, 2017] Jesse H. Krijthe and Marco Loog. Projected estimators for robust semi-supervised classification. Machine Learning, 106(7):993–1008, 2017.

[LeCun et al., 2015] Yann LeCun, Yoshua Bengio, and Geoffrey Hinton. Deep learning. Nature, 521(7553):436–444, 2015.

[Li and Liang, 2019] Yu-Feng Li and De-Ming Liang. Safe semi-supervised learning: a brief introduction. Frontiers of Computer Science, 13(4):669–676, 2019.

[Li and Zhou, 2015] Yu-Feng Li and Zhi-Hua Zhou. Towards making unlabeled data never hurt. IEEE Transactions on Pattern Analysis and Machine Intelligence, 37(1):175–188, 2015.

[Li et al., 2016] Yu-Feng Li, James T. Kwok, and Zhi-Hua Zhou. Towards safe semi-supervised learning for multivariate performance measures. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 1816–1822, Phoenix, AZ, 2016.

[Li et al., 2017] Yu-Feng Li, Han-Wen Zha, and Zhi-Hua Zhou. Learning safe prediction for semi-supervised regression. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 2217–2223, San Francisco, CA, 2017.

[Li et al., 2019] Yu-Feng Li, Hai Wang, Tong Wei, and Wei-Wei Tu. Towards automated semi-supervised learning. In Proceedings of the Thirty-Third AAAI Conference on Artificial Intelligence, pages 4237–4244, 2019.

[Li et al., 2021] Yu-Feng Li, Lan-Zhe Guo, and Zhi-Hua Zhou. Towards safe weakly supervised learning. IEEE Transactions on Pattern Analysis and Machine Intelligence, 43(1):334–346, 2021.

[Loog, 2015] Marco Loog. Contrastive pessimistic likelihood estimation for semi-supervised classification. IEEE Transactions on Pattern Analysis and Machine Intelligence, 38(3):462–475, 2015.

[Oliver et al., 2018] Avital Oliver, Augustus Odena, Colin Raffel, Ekin Dogus Cubuk, and Ian J. Goodfellow. Realistic evaluation of deep semi-supervised learning algorithms. In Advances in Neural Information Processing Systems, pages 3239–3250, 2018.

[Ren et al., 2018] Mengye Ren, Wenyuan Zeng, Bin Yang, and Raquel Urtasun. Learning to reweight examples for robust deep learning. In Proceedings of the 35th International Conference on Machine Learning, pages 4331–4340, 2018.

[Sohn et al., 2020] Kihyuk Sohn, David Berthelot, Nicholas Carlini, Zizhao Zhang, Han Zhang, Colin Raffel, Ekin Dogus Cubuk, Alexey Kurakin, and Chun-Liang Li. FixMatch: Simplifying semi-supervised learning with consistency and confidence. In Advances in Neural Information Processing Systems, pages 596–608, 2020.

[Wei et al., 2018] Tong Wei, Lan-Zhe Guo, Yu-Feng Li, and Wei Gao. Learning safe multi-label prediction for weakly labeled data. Machine Learning, 107(4):703–725, 2018.

[Zhou, 2017] Zhi-Hua Zhou. A brief introduction to weakly supervised learning. National Science Review, 5(1):44–53, 2017.