Context-Conditional Adaptation for Recognizing Unseen Classes in Unseen Domains
Puneet Mangla*¹, Shivam Chandhok*², Vineeth N Balasubramanian¹ and Fahad Shahbaz Khan²
¹Department of Computer Science, Indian Institute of Technology, Hyderabad
²Mohamed bin Zayed University of Artificial Intelligence
pmangla261@gmail.com, shivam.chandhok@mbzuai.ac.ae, vineethnb@cse.iith.ac.in, fahad.khan@mbzuai.ac.ae
*Equal contribution
arXiv:2107.07497v1 [cs.CV] 15 Jul 2021

Abstract

Recent progress towards designing models that can generalize to unseen domains (i.e., domain generalization) or unseen classes (i.e., zero-shot learning) has sparked interest in building models that can tackle both domain shift and semantic shift simultaneously (i.e., zero-shot domain generalization). For models to generalize to unseen classes in unseen domains, it is crucial to learn feature representations that preserve class-level (domain-invariant) as well as domain-specific information. Motivated by the success of generative zero-shot approaches, we propose a feature-generative framework integrated with a COntext COnditional Adaptive (COCOA) Batch-Normalization to seamlessly integrate class-level semantic and domain-specific information. The generated visual features better capture the underlying data distribution, enabling us to generalize to unseen classes and domains at test time. We thoroughly evaluate and analyse our approach on the established large-scale benchmark DomainNet and demonstrate promising performance over baselines and state-of-the-art methods.

1 Introduction

The dependence of deep learning models on large amounts of data and supervision creates a bottleneck and hinders their utilization in practical scenarios. There is thus a need to equip deep learning models with the ability to generalize to unseen domains or classes at test time using data from other related domains or classes (where data is abundant).

There has been great interest and corresponding effort in recent years towards tackling domain shift (a.k.a. domain generalization) [Yang and Gao, 2013; Muandet et al., 2013; Xu et al., 2014; Li et al., 2017; Ghifary et al., 2015; Li et al., 2018b; Carlucci et al., 2019; Li et al., 2018a; Li et al., 2019a] and semantic shift (a.k.a. zero-shot learning) [Reed et al., 2016; Frome et al., 2013; Wan et al., 2019; Kodirov et al., 2015; Zhang et al., 2017] separately. However, applications in the real world do not guarantee that only one of them will occur; there is a need to make systems robust to domain and semantic shifts simultaneously. The problem of tackling domain shift and semantic shift together, as considered herein, can be grouped as the Zero-Shot Domain Generalization (which we call ZSLDG) problem setting [Mancini et al., 2020; Maniyar et al., 2020]. In ZSLDG, the model has access to a set of seen classes from multiple source domains and has to generalize to unseen classes in unseen target domains at test time. Note that this problem of recognizing unseen classes in unseen domains (ZSLDG) is much more challenging than tackling zero-shot learning (ZSL) or domain generalization (DG) separately [Mancini et al., 2020], and it is growing in relevance as deep learning models get reused across domains. A recent approach, CuMix [Mancini et al., 2020], proposed a methodology that mixes up the multiple source domains and categories available during training to simulate semantic and domain shift at test time. That work also established a benchmark dataset, DomainNet, for the ZSLDG problem setting with an evaluation protocol, which we follow in this work for a fair comparison.

Feature-generation methods [Xian et al., 2018; Ni et al., 2019; Schonfeld et al., 2019; Mishra et al., 2018; Felix et al., 2018; Huang et al., 2019; Keshari et al., 2020; Narayan et al., 2020; Chandhok and Balasubramanian, 2021; Shen et al., 2020] have recently shown significant promise in traditional zero-shot or few-shot learning settings and are among the state of the art today for such problems. However, these approaches assume that the source domains (during training) and the target domain (during testing) are the same, and thus aim to generate features that only address semantic shift. They rely on the assumption that the visual-semantic relationship learned during training will generalize to unseen classes at test time. However, there is no guarantee that this holds for images in novel domains unseen during training [Mancini et al., 2020]. Thus, when used to address ZSLDG, the aforementioned methods lead to suboptimal performance.

To tackle domain shift at test time, it is important to generate feature representations that encode both class-level (domain-invariant) and domain-specific information [Chattopadhyay et al., 2020; Seo et al., 2020]. We thus conjecture that generating features that are consistent with only class-level semantics is not sufficient for addressing ZSLDG. However, encoding domain-specific information along with class information in generated features is not straightforward [Mancini et al., 2020].
Drawing inspiration from these observations, we propose a unified generative framework that uses COntext COnditional Adaptive (COCOA) Batch-Normalization to seamlessly integrate semantic and domain information and generate visual features of unseen classes in unseen domains. Depending on the setting, "context" can qualify as domain (for DG), semantics (for ZSL), or both (in the case of ZSLDG). Since this work primarily deals with the ZSLDG setting, "context" here refers to both domain and semantic information. We perform experiments and analysis on the standard ZSLDG benchmark, DomainNet, and demonstrate that our proposed methodology provides state-of-the-art performance for the ZSLDG setting and is able to encode both semantic and domain-specific information into generated features, thus capturing the given context better. To the best of our knowledge, this is the first generative approach to address the ZSLDG problem setting.

2 COCOA: Proposed Methodology

Our goal is to train a classifier C which can tackle domain shift as well as semantic shift simultaneously and recognize unseen classes in unseen domains at test time. Let S^{Tr} = {(x, y, a^s_y, d) | x ∈ X, y ∈ Y^s, a^s_y ∈ A, d ∈ D^s} denote the training set, where x is a seen-class image in the visual space X with corresponding label y from the set of seen-class labels Y^s, and a^s_y denotes the class-specific semantic representation for seen classes. We assume access to domain labels d for a set of source domains D^s with cardinality K. The test set is denoted by S^{Ts} = {(x, y, a^u_y, d) | x ∈ X, y ∈ Y^u, a^u_y ∈ A, d ∈ D^u}, where Y^u is the set of labels for unseen classes and D^u represents the set of unseen target domains. Note that the standard zero-shot setting (which tackles only semantic shift) complies with the conditions Y^s ∩ Y^u ≡ ∅ and D^s ≡ D^u. The standard DG setting (which tackles only domain shift), on the other hand, works under the conditions Y^s ≡ Y^u and D^s ∩ D^u ≡ ∅. In this work, our goal is to address the challenging ZSLDG setting where Y^s ∩ Y^u ≡ ∅ and D^s ∩ D^u ≡ ∅.

Our overall framework to address ZSLDG is summarized in Fig. 1. We employ a three-stage pipeline, which we describe below.

2.1 Generalizable Feature Extraction

We extract visual features f that encode discriminative class-level cues (domain-invariant) as well as domain-specific information by training a visual encoder f(.), which is then used to train a semantic projector p(.). Both these modules are trained with images from all available source domains. The visual encoder and the semantic projector are trained to minimize the following loss:

L_AGG = E_{(x,y) ∼ S^{Tr}} [ L_CE(a^T p(f(x)), y) ] + R(f)    (1)

where L_CE is the standard cross-entropy loss and a = [a^s_1, ..., a^s_{|Y^s|}]. R(f) denotes a regularization loss term, which is used to further improve the visual features and enhance their generalizability by regulating the amount of class-level semantic and domain-specific information in the features f. The regularizer block primarily uses rotation prediction (viz. providing a rotated image as input and predicting the rotation angle).

2.2 Generative Module

We learn a generative model which comprises a generator G : Z × A × D → F and a projection discriminator [Miyato and Koyama, 2018] (which uses a projection layer to incorporate conditional information into the discriminator) D : F × A × D → R, as shown in Fig. 1. We represent this discriminator as D = D_l ∘ D_f (∘ denotes composition), where D_f is the discriminator's last feature layer and D_l is the final linear layer. Both the generator and the projection discriminator are conditioned through a Context Conditional Adaptive (COCOA) Batch-Normalization module, which provides class-specific and domain-specific modulation at various stages during the generation process. Modulating the feature information by such semantic and domain-specific embeddings helps to integrate the respective information into the features, thereby enabling better generalization. We use the available (seen) semantic attributes a^s_y, which capture class-level characteristics, for encoding semantic information. However, no analogous representation encoding domain information exists for the individual source domains. We hence define learnable (randomly initialized and optimized during training) domain embedding matrices E_gen = {e^gen_1, e^gen_2, ..., e^gen_K} and E_disc = {e^disc_1, e^disc_2, ..., e^disc_K} for the generator and the discriminator respectively.

The generator G takes in noise z ∈ Z and a context vector c, and outputs visual features f̂ ∈ F. The context vector c, which is the concatenation of the class-level semantic attribute a^s_y and the domain-specific embedding e^gen_i, is provided as input to a BatchNorm estimator network B^l_gen : A × D → R^{2×h} (h is the dimension of the layer-l activation vectors), which outputs the batch-norm parameters γ^l_gen and β^l_gen for the l-th layer. Similarly, the discriminator's feature extractor D_f(.) has a separate BatchNorm estimator network B^l_disc : D → R^{2×h} to enable domain-specific context modulation of its batch-norm parameters γ^l_disc, β^l_disc, as shown in Fig. 1.

Formally, let f^l_gen and f^l_disc denote the feature activations belonging to domain d and semantic attribute a^s_y at the l-th layer of the generator G and the discriminator D respectively. We modulate f^l_gen and f^l_disc individually using Context Conditional Adaptive Batch-Normalization as follows:

γ^l_gen, β^l_gen ← B^l_gen(c),    f^{l+1}_gen ← γ^l_gen · (f^l_gen − μ^l_gen) / sqrt((σ^l_gen)^2 + ε) + β^l_gen,    where c = [a^s_y, e^gen_d]

γ^l_disc, β^l_disc ← B^l_disc(e^disc_d),    f^{l+1}_disc ← γ^l_disc · (f^l_disc − μ^l_disc) / sqrt((σ^l_disc)^2 + ε) + β^l_disc    (2)

Here, (μ^l_gen, (σ^l_gen)^2) and (μ^l_disc, (σ^l_disc)^2) are the mean and variance of the mini-batch activations (also used to update running statistics) containing f^l_gen and f^l_disc respectively, c denotes the context vector composed of the semantic and domain embeddings, and [·, ·] denotes the concatenation operation.
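For concreteness, the modulation in Eq. (2) can be sketched as a small estimator MLP that maps the context vector to per-layer (γ, β), which then scale and shift activations normalized by a parameter-free batch norm. The following PyTorch-style sketch is illustrative only: the module names, layer sizes and the toy generator architecture are our assumptions, not the authors' released implementation.

```python
import torch
import torch.nn as nn

class ContextConditionalBatchNorm(nn.Module):
    """Sketch of COCOA batch-norm (cf. Eq. 2): an estimator MLP maps the
    context vector (semantic attribute [+ domain embedding]) to (gamma, beta),
    which modulate activations normalized by a parameter-free BatchNorm."""
    def __init__(self, num_features, context_dim, hidden_dim=256):
        super().__init__()
        # Running statistics are kept by the BN layer; affine params come from context.
        self.bn = nn.BatchNorm1d(num_features, affine=False)
        self.estimator = nn.Sequential(            # B^l : context -> (gamma, beta)
            nn.Linear(context_dim, hidden_dim),
            nn.ReLU(inplace=True),
            nn.Linear(hidden_dim, 2 * num_features),
        )

    def forward(self, feats, context):
        gamma, beta = self.estimator(context).chunk(2, dim=1)
        return gamma * self.bn(feats) + beta

class ToyGenerator(nn.Module):
    """Toy feature generator conditioned via COCOA layers; c = [a_y, e_d]."""
    def __init__(self, noise_dim, attr_dim, dom_dim, feat_dim, hidden=1024):
        super().__init__()
        ctx_dim = attr_dim + dom_dim
        self.fc1 = nn.Linear(noise_dim + ctx_dim, hidden)
        self.cbn1 = ContextConditionalBatchNorm(hidden, ctx_dim)
        self.fc2 = nn.Linear(hidden, feat_dim)

    def forward(self, z, attr, dom_emb):
        c = torch.cat([attr, dom_emb], dim=1)                       # context vector
        h = torch.relu(self.cbn1(self.fc1(torch.cat([z, c], dim=1)), c))
        return self.fc2(h)                                          # synthesized feature f_hat
```

The discriminator's feature extractor would use the same layer type, with the context reduced to the domain embedding e^disc_d alone.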
Finally, the generator and discriminator are trained to optimize the adversarial (hinge) losses given by:

L_D = E_{(x, a^s_y, d) ∼ S^{Tr}} [ max(0, 1 − D(f(x), a^s_y, e^disc_d)) ] + E_{y ∼ Y^s, d ∼ D^s, z ∼ Z} [ max(0, 1 + D(f̂, a^s_y, e^disc_d)) ]    (3)

L_G = E_{y ∼ Y^s, d ∼ D^s, z ∼ Z} [ −D(f̂, a^s_y, e^disc_d) + λ_G · L_CE(a^T p(f̂), y) ]    (4)

where D(f, a, e) = a^T D_f(f, e) + D_l(D_f(f, e)) is the projection term and f̂ = G(z, c) denotes a generated feature. The second term in L_G, L_CE(a^T p(f̂), y), ensures that the generated features have discriminative properties (λ_G is a hyperparameter). p(.) is the semantic projector that was trained in the previous stage.

Figure 1: The proposed pipeline consists of three stages: (1) Generalizable Feature Extraction; (2) Generative Module; and (3) Recognition and Inference. The first stage trains a visual backbone f(.) to extract a visual feature f that encodes discriminative class-level (domain-invariant) and domain-specific information. The second stage learns a generative model G which uses COntext COnditional Adaptive (COCOA) batch-normalization layers to fuse and integrate semantic and domain-specific information into the generated features f̂. Lastly, the third stage generates visual features for unseen classes across domains and trains a softmax classifier.

2.3 Recognition and Inference

We freeze the generator and aim to synthesize visual features for unseen classes across different domains, which are then used to train the classifier C. To this end, we concatenate the domain embeddings e^gen_i (i ∈ D^s) of the source domains from the matrix E_gen (learned end-to-end with the generative model) and the semantic attributes/representations of unseen classes a^u_y to get the context vector c, which in turn is input to the trained batch-norm predictor network B^l_gen. The output batch-norm parameters (γ^l_gen, β^l_gen) are used in the batch-norm layers of the pre-trained generator to generate features f̂. This enables us to generate features that are consistent with unseen-class semantics and also encode domain information from the individual source domains in the training set. After obtaining the batch-norm parameters, the new set of unseen-class features is generated as follows:

f̂_n = G(z_n, c)    where c = [a^u_{y_n}, e^gen_n], z_n ∼ Z, y_n ∼ Y^u, e^gen_n ∼ E_gen    (5)

To improve generalization to new domains at test time and make the classifier domain-agnostic, we also synthesize embeddings of new domains by interpolating the learned embeddings of the source domains via a mix-up operation:

E^gen_interp = λ · e^gen_i + (1 − λ) · e^gen_j    where i, j ∼ D^s, λ ∼ U[0, 1]    (6)

where e^gen_i and e^gen_j refer to the domain embeddings of the i-th and j-th source domains. The unseen-class features generated using interpolated domain embeddings, F^u_int = {(f̂^n_int, y_n)}^N_{n=1}, are obtained as follows:

f̂^n_int = G(z_n, c_interp)    where c_interp = [a^u_{y_n}, e^gen_interp], z_n ∼ Z, y_n ∼ Y^u, e^gen_interp ∼ E^gen_interp    (7)

Next, we train an MLP softmax classifier C(.) on the generated multi-domain unseen-class feature dataset F^u_int by minimizing: L_CLS = E_{(f̂_int, y) ∼ F^u_int} [ L_CE(C(f̂_int), y) ].

Classifying an image at test time. At test time, given a test image x_test, we first pass it through the visual encoder f(.) to get the discriminative feature representation f_test = f(x_test). Next, we pass this feature representation to the classifier to get the final prediction ŷ = arg max_{y ∈ Y^u} C(f_test)[y].
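The recognition-and-inference stage (Eqs. (5)-(7) and L_CLS) can be sketched as below. This reuses the toy generator interface from the earlier sketch; quantities such as n_per_class, the optimizer and the number of epochs are illustrative assumptions rather than values reported in the paper, and dom_embs is assumed to be the (K × d) tensor of learned source-domain embeddings.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def synthesize_unseen_features(gen, unseen_attrs, dom_embs, n_per_class=200, noise_dim=128):
    """Sample f_hat = G(z, [a_y^u, e_interp]) where e_interp mixes two learned
    source-domain embeddings with lambda ~ U[0, 1] (cf. Eqs. 5-7)."""
    feats, labels = [], []
    num_domains = dom_embs.size(0)
    for y, attr in enumerate(unseen_attrs):                  # one row per unseen class
        z = torch.randn(n_per_class, noise_dim)
        i, j = torch.randint(num_domains, (2, n_per_class))  # pick two source domains
        lam = torch.rand(n_per_class, 1)
        e_interp = lam * dom_embs[i] + (1 - lam) * dom_embs[j]
        a = attr.unsqueeze(0).expand(n_per_class, -1)
        feats.append(gen(z, a, e_interp))
        labels.append(torch.full((n_per_class,), y, dtype=torch.long))
    return torch.cat(feats), torch.cat(labels)

def train_unseen_classifier(clf, feats, labels, epochs=20, lr=1e-3):
    """Fit the softmax classifier C on generated unseen-class features (L_CLS)."""
    opt = torch.optim.Adam(clf.parameters(), lr=lr)
    for _ in range(epochs):
        loss = F.cross_entropy(clf(feats), labels)
        opt.zero_grad()
        loss.backward()
        opt.step()
    return clf
```

At test time, prediction then reduces to taking the arg max of C(f(x_test)), with f(.) being the stage-1 visual encoder.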
3 Experiments and Results

DomainNet Dataset: DomainNet is the only established, diverse, large-scale, coarse-grained dataset for ZSLDG [Mancini et al., 2020], with images belonging to 345 categories divided into 6 domains: painting, real, clipart, infograph, quickdraw and sketch. We follow the seen-unseen class training-testing splits and protocol established in [Mancini et al., 2020], where we train on 5 domains and test on the left-out domain. Also, following [Mancini et al., 2020], we use word2vec representations [Mikolov et al., 2013] as the semantic representations for the class labels. For the feature extractor f(.), we choose a ResNet-50 architecture, similar to [Mancini et al., 2020].

Table 1: Performance comparison with established baselines and state-of-the-art methods for the ZSLDG problem setting on the benchmark DomainNet dataset. For a fair comparison, all reported results follow the backbones, protocol and splits established in CuMix.

Method            | clipart | infograph | painting | quickdraw | sketch | Avg.
DEVISE            | 20.1    | 11.7      | 17.6     | 6.1       | 16.7   | 14.4
ALE               | 22.7    | 12.7      | 20.2     | 6.8       | 18.5   | 16.2
SPNet             | 26.0    | 16.9      | 23.8     | 8.2       | 21.8   | 19.4
DANN + DEVISE     | 20.5    | 10.4      | 16.4     | 7.1       | 15.1   | 13.9
DANN + ALE        | 21.2    | 12.5      | 19.7     | 7.4       | 17.9   | 15.7
DANN + SPNet      | 25.9    | 15.8      | 24.1     | 8.4       | 21.3   | 19.1
EpiFCR + DEVISE   | 21.6    | 13.9      | 19.3     | 7.3       | 17.2   | 15.9
EpiFCR + ALE      | 23.2    | 14.1      | 21.4     | 7.8       | 20.9   | 17.5
EpiFCR + SPNet    | 26.4    | 16.7      | 24.6     | 9.2       | 23.2   | 20.0
Mixup-img-only    | 25.2    | 16.3      | 24.4     | 8.7       | 21.7   | 19.2
Mixup-two-level   | 26.6    | 17.0      | 25.3     | 8.8       | 21.9   | 19.9
CuMix             | 27.6    | 17.8      | 25.5     | 9.9       | 22.6   | 20.7
f-clsWGAN         | 20.0    | 13.3      | 20.5     | 6.6       | 14.9   | 15.1
CuMix + f-clsWGAN | 27.3    | 17.9      | 26.5     | 11.2      | 24.8   | 21.5
ROT + f-clsWGAN   | 27.5    | 17.4      | 26.4     | 11.4      | 24.6   | 21.4
COCOA-AGG         | 27.6    | 17.1      | 25.7     | 11.8      | 23.7   | 21.2
COCOA-ROT         | 28.9    | 18.2      | 27.1     | 13.1      | 25.7   | 22.6

Baselines. We compare our approach with the simpler baselines established in [Mancini et al., 2020], which include ZSL methods such as SPNet [Xian et al., 2019], DeViSE [Frome et al., 2013] and ALE [Akata et al., 2013], and their coupling with well-known DG methods (DANN [Ganin et al., 2016], EpiFCR [Li et al., 2019b]). We also compare with the state-of-the-art ZSLDG method CuMix [Mancini et al., 2020] as well as its variants: (1) Mixup-img-only, where mixup is done only at the image level without curriculum; and (2) Mixup-two-level, where mixup is applied at both the feature and image level without curriculum. We also establish feature-generation baselines, which include f-clsWGAN [Xian et al., 2018] (a standard ZSL-only approach) and its combination with a visual backbone trained using the CuMix [Mancini et al., 2020] methodology (CuMix + f-clsWGAN) or with self-supervised rotation-based [Gidaris et al., 2018] feature extraction (ROT + f-clsWGAN).

Results on DomainNet Benchmark. Table 1 shows the performance comparison of our method with the aforementioned baselines. We observe that simple classification-based feature extraction (using only L_AGG) without any regularization, when coupled with our generative module (COCOA-AGG), is able to outperform the current state of the art, CuMix [Mancini et al., 2020]. In addition, when we use rotation prediction as a regularizing auxiliary task [Gidaris et al., 2018] (referred to as COCOA-ROT), it achieves the best performance on all domains individually as well as on average (with significant improvement especially on hard domains like quickdraw, where a large domain shift is encountered at test time), thus outperforming [Mancini et al., 2020] by a margin of about 2% on average across domains (which corresponds to a 10% relative increase). We believe this is because the auxiliary task enables better representation learning. We also observe that combining generative ZSL baselines like f-clsWGAN [Xian et al., 2018] with different visual backbones, including CuMix, results in inferior average performance compared with our approach.

Component-wise Ablation Study. Table 2 shows the component-wise performance of our method COCOA-ROT on the DomainNet dataset, following [Mancini et al., 2020].
• S1 corresponds to the performance of the feature extractor f(.) trained using rotation prediction as a regularizer (without the generative stage 2).
• S2 corresponds to the performance achieved by learning a generator G without using domain embeddings as an input. Specifically, this implies that the BatchNorm estimator networks B^l_gen are given only the class-level semantic attribute as input, i.e., context vector c = a^s_y.
• S3 denotes S2 with domain embeddings as an additional input in the context vector provided to B^l_gen, i.e., c = [a^s_y, e^gen_d], without the use of the interpolated domain embeddings E^gen_interp (Eq. 6). S4 denotes our complete model with the use of the interpolated domain embeddings E^gen_interp to generate features. Note that S2, S3 and S4 follow the inference mechanism described in Sec. 2.3.

Table 2: Ablation study for different components of our framework on the DomainNet dataset.

Variant | Clipart | Infograph | Painting | Quickdraw | Sketch | Avg
S1      | 27.5    | 17.8      | 25.4     | 9.5       | 22.5   | 20.54
S2      | 27.6    | 17.36     | 27.08    | 11.57     | 24.97  | 21.716
S3      | 28.5    | 17.6      | 26.8     | 12.7      | 25.48  | 22.2
S4      | 28.9    | 18.2      | 27.1     | 13.1      | 25.7   | 22.6

From Table 2, we observe that S4 (the proposed approach), which exploits the generative pipeline, the semantic and domain embeddings, as well as their mixing, achieves the best performance. This corroborates our hypothesis that conditioning on both domain and semantic embeddings enables the classifier to better discriminate between the distributions of features f, by encoding both domain-specific and class-level information in the generated features f̂. Furthermore, training the final classifier on features generated by interpolating source-domain embeddings (i.e., S4) further improves performance by alleviating the bias towards the source domains.

Visualization of Generated Features. We sample semantic attributes a^u_y for randomly selected unseen classes and use the domain embeddings (learned end-to-end) of the five source domains (Real, Infograph, Quickdraw, Clipart, Sketch) used during training. We individually visualize the features generated for each unseen class using only the semantic embedding as context, i.e., c = a^u_y (Fig. 2, Row 1), and using the concatenation of both semantic and domain embeddings as context, i.e., c = [a^u_y, e^gen_d] (Fig. 2, Row 2), when estimating the batch-norm parameters of the generator. We notice that conditioning the batch-norm parameters on both semantic and domain context (Fig. 2, Row 2) enables the model to better capture the modes of the original data distribution. It can be seen that the model better retains the domain-specific variance (associated with the five source domains) within a specific class cluster when both semantic and domain embeddings are used (Fig. 2, Row 2), compared to the case where only semantic context is used (Fig. 2, Row 1).

Figure 2: Individual t-SNE visualizations of image features synthesized by COCOA for randomly selected unseen classes (Giraffe, Watermelon, Helicopter, Rainbow, Skyscraper) using only semantic context (Row 1) and both domain and semantic context (Row 2).

4 Conclusion

In this work, we propose a unified generative framework for the ZSLDG setting that uses context-conditional batch-normalization to integrate class-level and domain-specific information into generated visual features, thereby enabling better generalization at test time. Through experiments, we demonstrate superior performance over established baselines and the state of the art. Our proposed approach can be seamlessly integrated into other generative frameworks such as VAEs and normalizing flows, which is left for future work.

References
[Akata et al., 2013] Zeynep Akata, Florent Perronnin, Zaid Harchaoui, and Cordelia Schmid. Label-embedding for attribute-based classification. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 819–826, 2013.
[Carlucci et al., 2019] Fabio Maria Carlucci, Antonio D'Innocente, Silvia Bucci, B. Caputo, and T. Tommasi. Domain generalization by solving jigsaw puzzles. In 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 2224–2233, 2019.
[Chandhok and Balasubramanian, 2021] Shivam Chandhok and V. Balasubramanian. Two-level adversarial visual-semantic coupling for generalized zero-shot learning. In WACV, 2021.
[Chattopadhyay et al., 2020] Prithvijit Chattopadhyay, Y. Balaji, and Judy Hoffman. Learning to balance specificity and invariance for in and out of domain generalization. ArXiv, abs/2008.12839, 2020.
[Felix et al., 2018] Rafael Felix, Vijay BG Kumar, Ian Reid, and Gustavo Carneiro. Multi-modal cycle-consistent generalized zero-shot learning. In Proceedings of the European Conference on Computer Vision, 2018.
[Frome et al., 2013] Andrea Frome, Greg S Corrado, Jon Shlens, Samy Bengio, Jeff Dean, Marc'Aurelio Ranzato, and Tomas Mikolov. DeViSE: A deep visual-semantic embedding model. In Advances in Neural Information Processing Systems, pages 2121–2129, 2013.
[Ganin et al., 2016] Yaroslav Ganin, Evgeniya Ustinova, Hana Ajakan, Pascal Germain, Hugo Larochelle, François Laviolette, Mario Marchand, and Victor Lempitsky. Domain-adversarial training of neural networks. The Journal of Machine Learning Research, 17(1):2096–2030, 2016.
[Ghifary et al., 2015] Muhammad Ghifary, W. Kleijn, M. Zhang, and D. Balduzzi. Domain generalization for object recognition with multi-task autoencoders. In 2015 IEEE International Conference on Computer Vision (ICCV), pages 2551–2559, 2015.
[Gidaris et al., 2018] Spyros Gidaris, Praveer Singh, and Nikos Komodakis. Unsupervised representation learning by predicting image rotations. ArXiv, abs/1803.07728, 2018.
[Huang et al., 2019] He Huang, Changhu Wang, Philip S Yu, and Chang-Dong Wang. Generative dual adversarial network for generalized zero-shot learning. In CVPR, 2019.
[Keshari et al., 2020] Rohit Keshari, Richa Singh, and Mayank Vatsa. Generalized zero-shot learning via over-complete distribution. In CVPR, 2020.
[Kodirov et al., 2015] Elyor Kodirov, Tao Xiang, Zhenyong Fu, and S. Gong. Unsupervised domain adaptation for zero-shot learning. In 2015 IEEE International Conference on Computer Vision (ICCV), pages 2452–2460, 2015.
[Li et al., 2017] Da Li, Yongxin Yang, Yi-Zhe Song, and Timothy M. Hospedales. Deeper, broader and artier domain generalization. In 2017 IEEE International Conference on Computer Vision (ICCV), pages 5543–5551, 2017.
[Li et al., 2018a] Da Li, Yongxin Yang, Yi-Zhe Song, and Timothy M. Hospedales. Learning to generalize: Meta-learning for domain generalization. In AAAI, 2018.
[Li et al., 2018b] Haoliang Li, Sinno Jialin Pan, S. Wang, and A. Kot. Domain generalization with adversarial feature learning. In 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 5400–5409, 2018.
[Li et al., 2019a] Da Li, J. Zhang, Yongxin Yang, Cong Liu, Yi-Zhe Song, and Timothy M. Hospedales. Episodic training for domain generalization. In 2019 IEEE/CVF International Conference on Computer Vision (ICCV), pages 1446–1455, 2019.
[Li et al., 2019b] Da Li, Jianshu Zhang, Yongxin Yang, Cong Liu, Yi-Zhe Song, and Timothy M Hospedales. Episodic training for domain generalization. In Proceedings of the IEEE International Conference on Computer Vision, pages 1446–1455, 2019.
[Mancini et al., 2020] Massimiliano Mancini, Zeynep Akata, E. Ricci, and Barbara Caputo. Towards recognizing unseen categories in unseen domains. In ECCV, 2020.
[Maniyar et al., 2020] Udit Maniyar, K. J. Joseph, A. Deshmukh, Ü. Dogan, and V. Balasubramanian. Zero-shot domain generalization. ArXiv, abs/2008.07443, 2020.
[Mikolov et al., 2013] Tomas Mikolov, Kai Chen, Greg S. Corrado, and Jeffrey Dean. Efficient estimation of word representations in vector space, 2013.
[Mishra et al., 2018] Ashish Mishra, Shiva Krishna Reddy, Anurag Mittal, and Hema A Murthy. A generative model for zero shot learning using conditional variational autoencoders. In CVPRW, 2018.
[Miyato and Koyama, 2018] Takeru Miyato and Masanori Koyama. cGANs with projection discriminator. In International Conference on Learning Representations, 2018.
[Muandet et al., 2013] Krikamol Muandet, D. Balduzzi, and B. Schölkopf. Domain generalization via invariant feature representation. ArXiv, abs/1301.2115, 2013.
[Narayan et al., 2020] Sanath Narayan, A. Gupta, F. Khan, Cees G. M. Snoek, and L. Shao. Latent embedding feedback and discriminative features for zero-shot classification. ArXiv, abs/2003.07833, 2020.
[Ni et al., 2019] Jian Ni, Shanghang Zhang, and Haiyong Xie. Dual adversarial semantics-consistent network for generalized zero-shot learning. In NeurIPS, 2019.
[Reed et al., 2016] Scott E. Reed, Zeynep Akata, H. Lee, and B. Schiele. Learning deep representations of fine-grained visual descriptions. In 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 49–58, 2016.
[Schonfeld et al., 2019] Edgar Schonfeld, Sayna Ebrahimi, Samarth Sinha, Trevor Darrell, and Zeynep Akata. Generalized zero- and few-shot learning via aligned variational autoencoders. In CVPR, 2019.
[Seo et al., 2020] Seonguk Seo, Yumin Suh, D. Kim, Jongwoo Han, and B. Han. Learning to optimize domain specific normalization for domain generalization. 2020.
[Shen et al., 2020] Yuming Shen, J. Qin, and L. Huang. Invertible zero-shot recognition flows. In ECCV, 2020.
[Wan et al., 2019] Ziyu Wan, Dongdong Chen, Y. Li, Xingguang Yan, Junge Zhang, Y. Yu, and Jing Liao. Transductive zero-shot learning with visual structure constraint. In NeurIPS, 2019.
[Xian et al., 2018] Yongqin Xian, Tobias Lorenz, Bernt Schiele, and Zeynep Akata. Feature generating networks for zero-shot learning. In CVPR, 2018.
[Xian et al., 2019] Yongqin Xian, Subhabrata Choudhury, Yang He, Bernt Schiele, and Zeynep Akata. Semantic projection network for zero- and few-label semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 8256–8265, 2019.
[Xu et al., 2014] Zheng Xu, W. Li, Li Niu, and Dong Xu. Exploiting low-rank structure from latent domains for domain generalization. In ECCV, 2014.
[Yang and Gao, 2013] P. Yang and Wei Gao. Multi-view discriminant transfer learning. In IJCAI, 2013.
[Zhang et al., 2017] L. Zhang, Tao Xiang, and S. Gong. Learning a deep embedding model for zero-shot learning. In 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 3010–3019, 2017.