Single Image Rain Removal with Unpaired Information: A Differentiable Programming Perspective

Hongyuan Zhu¹, Vijay Chandrasekhar¹, Liyuan Li¹, Joo-Hwee Lim¹
¹Institute for Infocomm Research, A*STAR, Singapore
{zhuh, vijay, lyli, joo-hwee}@i2r.a-star.edu.sg

Copyright © 2019, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.

Abstract
Single image rain-streak removal is an extremely challenging problem due to the presence of non-uniform rain densities in images. Previous works solve this problem using various hand-designed priors or by explicitly mapping synthetic rain to paired clean images in a supervised way. In practice, however, the pre-defined priors are easily violated and paired training data are hard to collect. To overcome these limitations, we propose RainRemoval-GAN (RR-GAN), the first end-to-end adversarial model that generates realistic rain-free images using only unpaired supervision. Our approach alleviates the paired-training constraint by introducing a physical model which explicitly learns the recovered image and the corresponding rain streaks from a differentiable programming perspective. The proposed network consists of a novel multiscale attention memory generator and a novel multiscale deeply supervised discriminator. The multiscale attention memory generator uses a memory with an attention mechanism to capture the latent rain-streak context at different stages to recover the clean image. The deeply supervised multiscale discriminator imposes constraints on the recovered output, in terms of local details and global appearance, relative to the clean image set. Together with the learned rain streaks, a reconstruction constraint is employed to ensure the appearance is consistent with the input image. Experimental results on a public benchmark demonstrate our promising performance compared with nine state-of-the-art methods in terms of PSNR, SSIM, visual quality and running time.

Figure 1: A visual illustration of single image deraining: (a) Input; (b) Ground truth; (c) Our result. The target is to recover a clean image from the input rain image. Our method produces a recovered image with faithful color and structure without using any paired training data.

Introduction
Rain, snow, and fog are common visual artifacts which affect many vision-based applications (Zhu et al. 2017a; 2016a; 2018a; 2015; 2016b), such as drone-based video surveillance and self-driving cars. The performance of many computer vision systems often drops significantly when they are presented with images that contain these artifacts. Hence, it is highly desirable to develop automatic artifact-removal methods (Li et al. 2017; Zhang et al. 2017c). In this paper, we mainly focus on the problem of rain-streak removal from a single image.

A rain image I can be modeled as an image composition problem (Luo et al. 2015):

    I = X + R    (1)

where X and R denote the desired clean image and the rain streaks, respectively. Single image rain-streak removal aims to estimate the hidden X and R from a given I, which is an unconstrained problem: we need to estimate two unknown variables, and there are infinitely many solutions if no further regularization is imposed.

To make the rain-streak removal problem tractable, existing methods can be roughly grouped into two categories: prior-based and data-driven. The prior-based methods estimate the rain streaks based on various priors or assumptions. Typical priors include, but are not limited to, a sparse-coding prior (Luo et al. 2015), a low-rank prior (Chang et al. 2017) and a Gaussian prior (Li et al. 2016). Despite the remarkable progress achieved by adopting these priors, they are easily violated in practice, since rain streaks do not strictly follow Gaussian or sparse distributions, and background scenes are cluttered and contain complex illumination.

Recently, interest has shifted to data-driven approaches which utilize labeled data with paired rain images and rain streaks to learn a deep neural network (Fu et al. 2017; Yang et al. 2017b; Zhang and Patel 2018; Li et al. 2018). Given an input rain image, the rain streaks or the clean image can be regressed directly. More recently, inspired by the huge success of generative adversarial networks (GANs) in image-to-image translation tasks (Goodfellow et al. 2014; Isola et al. 2017), (Zhang et al. 2017b) propose a GAN-based image deraining method which employs a generator to map the input image to the ground-truth clean image. One major limitation of the method is the necessity of paired training data, which is extremely challenging to collect given the changing environment.
However, collecting paired training data is extremely challenging given the changing environment. One popular solution is simulating rain in controlled environments; however, most existing simulations are too simplified to depict the complexity of the real world. Another common solution is using photo-editing tools to add streaks of various rain-density levels, orientations and scales to clean natural images, but this approach limits the scale of the training data.

In fact, one can observe that it is easy to collect a large number of rain/rain-free images from the Internet, even though these images are not paired. Thus, it is highly desirable to develop a deraining model which can utilize unpaired supervision. Note that CycleGAN (Zhu et al. 2017b) has recently become popular for learning cross-style image translation by using bidirectional constraints with adversarial learning. Although CycleGAN has achieved impressive performance in style translation, it is designed for the style-translation problem and may not preserve appearance consistency in the translated result.

Based on the above observations, we propose a novel single image deraining method called RainRemoval-GAN (RR-GAN), which is specifically designed around the rain image composition model in Eq. 1. The proposed RR-GAN is significantly different from CycleGAN and its variants in structure and application. More specifically, we propose a novel generator which uses an attention memory to capture the latent rain-streak contexts in a recurrent fashion to recover the clean image X. Moreover, we propose a novel deeply-supervised multiscale discriminator to regularize the recovered image to look as realistic as possible relative to the rain-free images. Furthermore, these latent factors are composited together as in Eq. 1 to reconstruct the original rain image, preserving faithful color and structure and thus avoiding CycleGAN's inefficient bidirectional consistency training paradigm.

The major contributions of this work are summarized as follows:

• To the best of our knowledge, this is one of the first works to marry CycleGAN-style unpaired adversarial learning with single image deraining, making rain-removal training with unpaired data possible. The novel GAN consists of a generator and a discriminator specifically designed by incorporating the rain image generation model with unpaired training information.
• A novel multiscale attention memory generator is proposed, with an attention memory that fuses the contexts from coarse-scale and fine-scale densely connected networks to recurrently learn the rain streaks and recover the clean image using unpaired training data.
• A novel multiscale deeply-supervised discriminator is proposed to regularize the generated recovered image to be as realistic as possible relative to the target images, in terms of both low-level details and high-level structures.
• Extensive experiments on a public benchmark demonstrate our method's efficiency and effectiveness in terms of quantitative and qualitative performance.

Related Work
In this section, we briefly review recent related works on single image de-raining and generative adversarial networks.

Single Image De-raining
Image deraining has been a classic image-restoration problem for years. Some early methods (Zhang et al. 2006; Garg and Nayar 2007; Santhaseelan and Asari 2015; Tripathi and Mukhopadhyay 2014) exploit photometric consistency and temporal dynamics for rain removal. Although these methods achieve promising performance, they are not applicable to the single-image setting.

Unlike video-based methods, prior-based methods handle the single image deraining problem by assuming that the rain streaks follow certain assumptions. Many hand-crafted priors have been proposed, e.g., a sparse-coding prior (Luo et al. 2015), a low-rank prior (Chang et al. 2017) and a Gaussian prior (Li et al. 2016), to name a few. One major limitation of these methods is that the priors are easily violated, which results in over- or under-estimation of the rain streaks (Zhang and Patel 2018).

Recently, deep learning has become popular in both high-level and low-level vision tasks (Cai et al. 2016; Ren et al. 2016; Yang et al. 2017a), and several CNN-based methods have been proposed for image de-raining (Fu et al. 2017; Yang et al. 2017b; Zhang and Patel 2018; Li et al. 2018). The idea in these methods is to learn a mapping between input rainy images and their corresponding rain streaks using a CNN. (Yang et al. 2017b) design a deep dilated network to jointly detect and remove rain streaks. (Li et al. 2018) propose to incorporate a recurrent neural network into the dilated network of Yang et al. to preserve multi-stage contexts. (Zhang and Patel 2018) propose a multi-scale densely connected network trained with an additional rain-density classifier to predict the rain streaks. In summary, these methods aim to learn a mapping using synthesized rain/clean image pairs, which are hard to collect at large scale.

Our method's generator is similar to (Zhang and Patel 2018) in that we combine coarse- and fine-scale networks for feature extraction; however, we introduce a novel attention memory network to learn the rain streaks automatically from data in a recurrent fashion. The attention memory network is inspired by recent work in machine translation (Gehring et al. 2017), which shows that translation performance can be significantly improved by using an attention model alone instead of additional gates in recurrent networks as in (Li et al. 2018), which significantly improves inference speed. Moreover, our generator explicitly learns two disentangled latent parameters, corresponding to the clean image and the rain streaks, from data without paired information, significantly reducing the labeling effort.
Generative Adversarial Networks
Recently, generative adversarial networks (Goodfellow et al. 2014; Arjovsky et al. 2017; Zhao et al. 2017) have achieved significant progress. The basic idea of GANs is to transform white noise (or another specified prior) through a parametric model to generate candidate samples, with the help of a discriminator and a generator. By optimizing a minimax two-player game, the generator aims to learn the training data distribution, and the discriminator aims to judge whether a sample comes from the training data or the generator. An image can be translated into an output by a conditional generator, which has various applications, such as image super-resolution (Ledig et al. 2017), text-to-image synthesis (Zhang et al. 2017a) and image-to-image translation (Yi et al. 2017). However, conditional GANs require paired training images, whose ground truth can be very difficult to obtain.

This paper is inspired by the recently popular CycleGAN (Zhu et al. 2017b), which can translate an image from one domain to another without paired information. However, our work is remarkably different from CycleGAN. In brief, our architecture is specifically designed for image deraining so that the original image's color and structure can be well preserved. To the best of our knowledge, this could be one of the first works to develop a single image deraining model that incorporates unpaired adversarial learning. Our network is distinct from CycleGAN in the following aspects. First, we propose a novel attention memory generator which uses multiscale densely connected networks and an attention memory, in a recurrent fashion, to learn discriminative rain features and latent rain streaks to recover the clean image. Second, our network contains a novel deeply-supervised multiscale discriminator training regime, so that everything from local details to global image appearance is enforced to look realistic relative to the clean image set. Furthermore, the latent factors are composited together as in Eq. 1 to reconstruct the original rain image, preserving faithful color and structure and thus avoiding CycleGAN's inefficient bidirectional consistency training paradigm.

Differentiable Programming
Our work belongs to the family of differentiable programming, which treats a program as a neural network so that the program can be parametrized, automatically differentiated and optimized. The first well-known work of differentiable programming is Learned ISTA (LISTA) (Gregor and LeCun 2010), which unfolds the popular l1 solver ISTA as a simple RNN, such that the number of layers corresponds to the iteration number and the weights correspond to the dictionary. The LISTA RNN paradigm has been applied to a wide range of tasks, e.g., hashing (Wang et al. 2016a), classification (Wang et al. 2016b), sparse coding (Zhou et al. 2018) and image dehazing (Zhu et al. 2018b). Different from conventional differentiable programming using RNNs with difficult-to-interpret variables, our method reformulates the image degradation model as a feed-forward convolutional neural network with prior knowledge. Our formulation is therefore more interpretable and efficient than conventional RNN-based differentiable-programming solvers.
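To make the unrolling idea concrete, the following is a minimal LISTA-style sketch in PyTorch (our illustration, not code from the paper): ISTA's iteration is unfolded into a fixed number of layers whose matrices and soft-thresholds are learned end to end; the dimensions and depth below are arbitrary assumptions.

```python
import torch
import torch.nn as nn

class LISTA(nn.Module):
    """Minimal LISTA sketch: the ISTA iteration unrolled into T learned layers."""
    def __init__(self, input_dim, code_dim, num_layers=3):
        super().__init__()
        self.We = nn.Linear(input_dim, code_dim, bias=False)  # plays the role of (1/L) * D^T
        self.S = nn.Linear(code_dim, code_dim, bias=False)    # plays the role of I - (1/L) * D^T D
        self.theta = nn.Parameter(torch.full((code_dim,), 0.1))  # learned soft-threshold
        self.num_layers = num_layers

    def soft_threshold(self, x):
        return torch.sign(x) * torch.clamp(x.abs() - self.theta, min=0.0)

    def forward(self, y):
        b = self.We(y)              # constant term, computed once per input
        z = self.soft_threshold(b)  # first unrolled layer
        for _ in range(self.num_layers - 1):
            z = self.soft_threshold(b + self.S(z))  # each pass is one unrolled ISTA step
        return z
```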
Figure 2: The pipeline of our method. Our RR-GAN consists of a multiscale attention memory generator and a multiscale deeply-supervised discriminator. The generator uses unpaired rain and clean images to train an attention memory that aggregates the latent rain-streak context from different stages. The rain image, together with the latent rain-streak masks, is fed into the U-Net to generate the recovered image. This recovered image is regularized by the deeply-supervised multiscale discriminator so that its appearance is as close as possible to the clean images used for training, in terms of both low-level details and global-level structures.

The Method
As shown in Fig. 2, our RR-GAN consists of two networks, namely a multiscale attention memory generator (MAMG) and a multiscale deeply-supervised discriminator (MDSD), which are specifically designed for single image deraining. In brief, MAMG uses an attention memory to recurrently attend to the rain-streak regions for the purpose of recovering the clean image. MDSD employs deep supervision on the image recovered by the generator, enforcing it to look as realistic as possible relative to the clean image set.

Multiscale Attention Memory Network
The multiscale attention memory network consists of an attentive memory network and a U-Net autoencoder with skip connections. The former uses a multiscale feature extractor Gf to extract rain-relevant region features. To achieve better performance, Gf combines complementary coarse-scale and fine-scale context using the recently proposed dense connection (Huang et al. 2017). The fine-scale dense network applies densely connected modules with small kernels to capture fine-scale structures relevant to rain regions, with a structure of C(1, 3)-C(3, 3)-C(5, 3)-C(7, 3), where C(k1, k2) denotes a convolution with a filter of size k1 × k1, an output channel number of k2 and a stride of 1, followed by a ReLU (Krizhevsky et al. 2012). Each layer is densely connected with the other layers. The coarse-scale branch applies a similar architecture but with larger kernels to capture longer-range context; its structure is C(7, 3)-C(9, 3)-C(11, 3)-C(1, 3). The feature maps Xc and Xf from the coarse-scale and fine-scale networks, respectively, are concatenated and fed through the attention memory network to learn rain regions.

Visual attention models are adopted to localize relevant regions in an image so that task-specific features are obtained. In this paper, we introduce visual attention to learn where the rain regions should be focused on. As shown in our architecture in Fig. 2, we present an attention memory network inspired by recent convolutional sequence-to-sequence learning in machine translation (Gehring et al. 2017). Our attention memory network uses attention in place of recurrent gating, and the attention map is iteratively refined given the accumulated context. At each time stamp t, the network concatenates Xc, Xf, M_{t-1} and S_{t-1}, where Xc is the input from the coarse-scale network, Xf is from the fine-scale network, M_{t-1} is the rain-streak mask at the previous stage t - 1, and S_{t-1} denotes the state memory at the previous stage. Note that the initial M_0 and S_0 are set to zero. The detailed updating rule is as follows:

    a_t = \sigma(W_1 [X_c, X_f, M_{t-1}, S_{t-1}] + b_1)
    S_t = a_t \cdot S_{t-1}                                  (2)
    M_t = \tanh(W_2 * S_t + b_2)

where W_1 and b_1 are the weight and bias of a convolutional layer with a 3 × 3 kernel and one output channel, \sigma denotes the sigmoid function \sigma(x) = 1 / (1 + \exp(-x)), and W_2 and b_2 are the weight and bias of a deconvolutional layer with a 3 × 3 kernel and one output channel, which outputs the rain-streak mask. In Fig. 2, one can observe that the rain mask is recurrently improved using the contexts from the previous stages.

The input image, together with the final attention map, is concatenated and fed into the U-Net autoencoder to generate the rain-free image. Our deep autoencoder includes 16 convolutional (ReLU) blocks and adopts skip connections to prevent blurred outputs. To show the effectiveness of our method, Fig. 3 shows the visual results of different generators; the one combining the attention memory and the skip-connected U-Net autoencoder yields the best visual result.

Figure 3: A visual comparison of the effectiveness of different generators. (a) Input; (b) the result of using a context autoencoder alone; (c) the result of our generator without the attention memory; (d) the result of our multiscale attention memory generator.
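For concreteness, a minimal PyTorch sketch of one step of Eq. 2 is given below; the channel counts, padding choices and module names are our assumptions, since the paper only specifies the 3 × 3 kernels, single-channel outputs, and the sigmoid/tanh activations:

```python
import torch
import torch.nn as nn

class AttentionMemoryCell(nn.Module):
    """One step of the attention memory update, following our reading of Eq. 2.

    All inputs are feature maps with the same spatial size; the channel
    counts below are illustrative assumptions, not taken from the paper.
    """
    def __init__(self, coarse_ch=3, fine_ch=3, mem_ch=1):
        super().__init__()
        in_ch = coarse_ch + fine_ch + 1 + mem_ch  # concat of [Xc, Xf, M_{t-1}, S_{t-1}]
        self.attend = nn.Conv2d(in_ch, 1, kernel_size=3, padding=1)            # W1, b1
        self.to_mask = nn.ConvTranspose2d(mem_ch, 1, kernel_size=3, padding=1)  # W2, b2

    def forward(self, xc, xf, m_prev, s_prev):
        a_t = torch.sigmoid(self.attend(torch.cat([xc, xf, m_prev, s_prev], dim=1)))
        # As literally written in Eq. 2; note that with S_0 = 0 this product stays
        # zero, so a non-zero state initialization is presumably intended.
        s_t = a_t * s_prev
        m_t = torch.tanh(self.to_mask(s_t))  # refined rain-streak mask
        return m_t, s_t
```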
Multiscale Deeply-Supervised Discriminator
The function of the discriminator is to regularize the output of the generator so that the generated image looks as realistic as possible relative to the clean images in the target set. Recently, shallowly-supervised discriminators, with supervision only at the last layer, have become popular in well-known GAN architectures (e.g., CycleGAN (Zhu et al. 2017b) and conditional GAN (Isola et al. 2017)) and achieve good image translation performance. However, such shallowly-supervised discriminators overlook the low-level details and global structures which are useful for image enhancement tasks.

To overcome this disadvantage, we propose a novel deeply-supervised discriminator which consists of four convolutional layers with the structure C(3, 64)-C(3, 128)-C(3, 256)-C(3, 512), where each convolutional layer has a stride of two. Each convolutional feature map is passed through an instance normalization layer with Leaky-ReLU activations and then fed into the next convolutional layer. Moreover, each side-output is followed by an additional C(1, 1) convolutional layer with a sigmoid, which outputs the probability that each patch belongs to the clean images. Hence we obtain four predictions which provide multi-level supervision, regularizing the generated image to be as realistic as the ground-truth images in terms of both low-level details and high-level structures.
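The discriminator just described can be sketched as follows; details the paper does not specify (padding, LeakyReLU slope, exact normalization placement) are assumptions:

```python
import torch.nn as nn

class DeeplySupervisedDiscriminator(nn.Module):
    """Sketch of the multiscale deeply-supervised discriminator:
    four stride-2 stages C(3,64)-C(3,128)-C(3,256)-C(3,512), each with
    InstanceNorm + LeakyReLU, plus a 1x1 sigmoid side-output per stage."""
    def __init__(self, in_ch=3):
        super().__init__()
        self.stages, self.heads = nn.ModuleList(), nn.ModuleList()
        prev = in_ch
        for c in (64, 128, 256, 512):
            self.stages.append(nn.Sequential(
                nn.Conv2d(prev, c, kernel_size=3, stride=2, padding=1),
                nn.InstanceNorm2d(c),
                nn.LeakyReLU(0.2, inplace=True)))
            self.heads.append(nn.Sequential(
                nn.Conv2d(c, 1, kernel_size=1), nn.Sigmoid()))
            prev = c

    def forward(self, x):
        outs = []
        for stage, head in zip(self.stages, self.heads):
            x = stage(x)
            outs.append(head(x))  # one real/fake probability map per scale
        return outs  # four side-outputs: deep supervision at every level
```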
Objective Function
To stabilize the training of the discriminator D, we minimize an objective function L_d that consists of two terms, L_{D_real} and L_{D_fake}, as in (Mao et al. 2017):

    L_d(G, D) = L_{D_real} + L_{D_fake}
              = \frac{1}{2K} \sum_{i=1}^{K} \left\{ \mathbb{E}_y\big[(1 - D_i(y))^2\big] + \mathbb{E}_x\big[D_i(G(x))^2\big] \right\}    (3)

which trains the discriminators D_i to differentiate the fake examples G(x) produced by the generator G from the real clean images y, where \mathbb{E}_x denotes the expectation over x and D_i, i \in \{1, 2, \ldots, K\}, is the classifier at the i-th side-output level.

On the other hand, when we train the generator G, we optimize an objective function L_g which consists of an MSE loss L_r (Ren et al. 2016) and an adversarial loss L_adv:

    L_g = L_r + \gamma L_{adv}    (4)

These two terms are designed to minimize the reconstruction error and to improve the perceptual quality, respectively. \gamma is a trade-off factor set via cross-validation.

The MSE loss L_r encourages the network to enforce appearance consistency between the recovered image X and the input image I_l, using the estimated rain mask R, by minimizing the discrepancy between the composite image I_t = X + R and the input image I_l:

    L_r = \frac{1}{C \times W \times H} \sum_{i=1}^{W} \sum_{j=1}^{H} \sum_{c=1}^{C} \| I_t^{i,j,c} - I_l^{i,j,c} \|^2    (5)

where W, H and C are the width, height and number of channels of the input image I_t.

The adversarial loss L_adv encourages the generator G to recover an image G(x) that is as realistic as the ground-truth images y, by assigning the real label 1 to the recovered example G(x):

    L_{adv}(G, D) = \frac{1}{K} \sum_{i=1}^{K} \mathbb{E}_x\big[(1 - D_i(G(x)))^2\big]    (6)

where K is the number of supervised levels in the discriminator and D_i is the i-th level classifier that judges whether an image comes from the clean image set or from the generator. This term penalizes details of the recovered image that look different from clean images.
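A compact sketch of Eqs. 3-6 over the K side-outputs, assuming the discriminator returns a list of per-scale probability maps (the function names below are ours, not the paper's):

```python
def discriminator_loss(d_real_outs, d_fake_outs):
    """Least-squares discriminator loss averaged over the K side-outputs (Eq. 3)."""
    k = len(d_real_outs)
    loss = 0.0
    for dr, df in zip(d_real_outs, d_fake_outs):
        loss = loss + ((1 - dr) ** 2).mean() + (df ** 2).mean()
    return loss / (2 * k)

def reconstruction_loss(recomposed, rain_input):
    """MSE between the recomposed image X + R and the rainy input (Eq. 5)."""
    return ((recomposed - rain_input) ** 2).mean()

def adversarial_loss(d_fake_outs):
    """Generator's least-squares adversarial loss over K side-outputs (Eq. 6)."""
    k = len(d_fake_outs)
    return sum(((1 - df) ** 2).mean() for df in d_fake_outs) / k

def generator_loss(recomposed, rain_input, d_fake_outs, gamma=1.0):
    """Eq. 4: L_g = L_r + gamma * L_adv."""
    return reconstruction_loss(recomposed, rain_input) + gamma * adversarial_loss(d_fake_outs)
```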
2018) significantly outper- It = X + R and the input image Il : form the prior-based methods (Kang et al. 2012; Luo et al. 2015; Li et al. 2016). The comparison demonstrates that the W X H X C features learned by the deep neural networks is much helpful 1 X Lr = kI i,j,c − Ili,j,c k2 (5) for the rain-removal task. C × W × H i=1 j=1 c=1 t On the one hand, the proposed RR-GAN is trained with unpaired training data, which outperform the JORDER The W , H, and C are the width, height, and channel number method on the Rain800 dataset. Note that, JORDER is of the input image It . trained using the paired supervised data. In other words, it The Adversarial Loss Ladv encourages the generator G utilizes a stronger supervisor than our method does. The re- to recover image G(x) as realistic as the ground-truth image sult demonstrates that it is feasible and promising to develop y by assigning a real label 1 to recovered example G(x): deraining model using unpaired data which could be easily K accessed in practice. 1 X In terms of running time, our method is also much faster Ladv (G, D) = Ex (1 − Di (G(x)))2 (6) K i=1 than other deep learning methods and achieve a running time of 0.03s per image, which demonstrates our method’s real- where K is the layer number of discriminator, Di is i-th time performance. level classifier to classify whether an image is from the clean image or generator G(x). The term penalizes the details of Analysis of RR-GAN To demonstrate the effectiveness of recovered image that looks different from clean images. our generator used in RR-GAN, we compare it with other
Experiments
We evaluate our method on a public benchmark. We quantitatively evaluate rain-removal performance using two commonly used metrics: peak signal-to-noise ratio (PSNR) (Huynh-Thu and Ghanbari 2008) and the structural similarity index (SSIM) (Wang et al. 2004). We also provide an ablation study of each proposed component to demonstrate its effectiveness.

We use Rain800 (Zhang and Patel 2018) for benchmarking. The Rain800 dataset contains 700 synthesized images for training and 100 images for testing, built from randomly sampled outdoor images.

Comparing Methods. We compare our proposed approach with seven state-of-the-art methods, including image decomposition (ID) (Kang et al. 2012), discriminative sparse coding (DSC) (Luo et al. 2015), layer priors (LP) (Li et al. 2016), DetailNet (Fu et al. 2017), joint rain detection and removal (JORDER) (Yang et al. 2017b) and RESCAN (Li et al. 2018).

Results on Synthetic Datasets. Table 1 shows quantitative results on Rain800. One can observe that our RR-GAN achieves promising results in terms of both PSNR and SSIM.

Figure 4: Qualitative comparison with the state of the art on two randomly sampled images from Rain800. Panels, left to right: Input, RainCNN, DetailNet, DID-MDN, Ours, Ground Truth.

Method      PSNR    SSIM     Time (s)
ID          18.88   0.5832   120
DSC         18.56   0.5996   189.3
LP          20.46   0.7297   674.8
DetailNet   21.16   0.7320   0.3
JORDER      22.24   0.7763   1.5
RESCAN      24.09   0.8410   0.4
RR-GAN      23.51   0.7566   0.03

Table 1: Quantitative experiments on Rain800. Best results are marked in bold and the second best results are underlined.

From Table 1, the data-driven methods, especially the deep-learning-based methods (Fu et al. 2017; Yang et al. 2017b; Zhang and Patel 2018; Li et al. 2018), significantly outperform the prior-based methods (Kang et al. 2012; Luo et al. 2015; Li et al. 2016). This comparison demonstrates that the features learned by deep neural networks are very helpful for the rain-removal task.

Notably, the proposed RR-GAN is trained with unpaired data, yet it outperforms the JORDER method on the Rain800 dataset. Note that JORDER is trained using paired supervised data; in other words, it utilizes stronger supervision than our method does. This result demonstrates that it is feasible and promising to develop deraining models using unpaired data, which can be easily obtained in practice.

In terms of running time, our method is also much faster than the other deep learning methods, achieving a running time of 0.03 s per image, which demonstrates its real-time performance.

Analysis of RR-GAN. To demonstrate the effectiveness of the generator used in RR-GAN, we compare it with other network architectures. The first baseline replaces the dense blocks with residual blocks (ResNet) while keeping the other blocks of our model fixed; the performance drops significantly from 23.51/0.7566 (PSNR/SSIM) to 22.82/0.6991. Besides the ablation study on the generator, we also test the effectiveness of our deeply-supervised multiscale discriminator. To this end, we adopt a discriminator which only enforces supervision at the last layer (a shallowly-supervised discriminator); the performance drops to 21.43/0.7254.

Component      Baseline              PSNR    SSIM
Generator      ResNet                22.82   0.6991
Discriminator  Shallow supervision   21.43   0.7254
Our method     RR-GAN                23.51   0.7566

Table 2: Ablation study on RR-GAN.

Analysis of Training Paradigm. We analyze how the presence of corresponding rain/rain-free scenes in the training samples affects model training. We first randomly split the Rain100H dataset into two halves (splits 1 and 2). We use different combinations of the images for model training, and then test the model on the rain images in split 2. Specifically, we train under two settings: the rain and clean images are paired and drawn from split 1 (setting 1), or the clean images are randomly drawn from split 1 (setting 2). The testing performance on split 2's rain images is shown in the table below.

Setting   Paired (P) / Unpaired (U)   PSNR    SSIM
1         P                           23.79   0.7951
2         U                           23.51   0.7566

From the table, one can observe that the best deraining results are obtained when corresponding rain and rain-free images are both used for training. However, the other setting still achieves comparable performance. This implies that it is not necessary to have paired rain and rain-free scenes during training (see setting 2).

Analysis of Loss Function. To better demonstrate the effectiveness of our objective function, we conduct an ablation study over combinations of the proposed MSE loss L_r and the adversarial loss L_adv. Figure 5 and Table 3 show qualitative and quantitative results on a sample image, respectively. One can observe that when using the MSE loss alone, without regularization from the adversarial loss, the quality of the recovered image is very poor, because there is not sufficient information for the generator to learn what a clean image looks like. Using the adversarial loss alone significantly improves the visual quality, since the information from the clean image set helps the generator learn cues for recovering a clean image; however, the tone of the recovered image is a bit yellowish, as there is no consistency constraint between the recovered image and the input image. By combining the two terms, our method produces recovered images with realistic color and structure.

Metric   L_r      L_adv    L_r + L_adv
PSNR     17.08    21.33    23.51
SSIM     0.3760   0.7328   0.7566

Table 3: Quantitative studies on different losses.

Figure 5: Qualitative studies on different losses. (a) Input; (b) L_r; (c) L_adv; (d) L_r + L_adv; (e) Ground truth.
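For reference, the PSNR/SSIM protocol used in the tables above can be reproduced with scikit-image; the data range and channel handling below are assumptions, since the paper does not specify them:

```python
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def evaluate(derained, ground_truth):
    """PSNR/SSIM for one uint8 RGB image pair, as in the Table 1-style comparisons.

    channel_axis requires a recent scikit-image; older versions use
    multichannel=True instead.
    """
    psnr = peak_signal_noise_ratio(ground_truth, derained, data_range=255)
    ssim = structural_similarity(ground_truth, derained, channel_axis=-1, data_range=255)
    return psnr, ssim
```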
Conclusion
This paper proposed a novel RR-GAN for end-to-end single image deraining without paired training data. The proposed method includes a novel multiscale attention memory network and a novel deeply supervised multiscale discriminator. The attention memory network uses a state memory to learn a latent rain-streak mask in a recurrent fashion, aggregating rain context information from different stages. Together with the input image, the generator produces the recovered image using unpaired rain and clean training images. The proposed deeply supervised multiscale discriminator effectively regularizes the output of the generator to look as realistic as possible relative to the clean image set in terms of local details and global structures. Extensive experiments have demonstrated the promising performance of our method in terms of PSNR, SSIM, running time and visual quality.

Acknowledgement
This work was funded by the Deep Learning 2.0 program at I2R, A*STAR. The work of X. Peng was supported by the Fundamental Research Funds for the Central Universities under Grant YJ201748, by NSFC under Grants 61806135, 61432012 and U1435213, and by the Fund of Sichuan University Tomorrow Advancing Life.

References
Martin Arjovsky, Soumith Chintala, and Léon Bottou. Wasserstein GAN. In ICML, 2017.
B. Cai, X. Xu, K. Jia, C. Qing, and D. Tao. DehazeNet: An end-to-end system for single image haze removal. IEEE Transactions on Image Processing, 25(11):5187-5198, 2016.
Yi Chang, Luxin Yan, and Sheng Zhong. Transformed low-rank model for line pattern noise removal. In ICCV, 2017.
Xueyang Fu, Jiabin Huang, Delu Zeng, Yue Huang, Xinghao Ding, and John Paisley. Removing rain from single images via a deep detail network. In CVPR, 2017.
Kshitiz Garg and Shree K. Nayar. Vision and rain. IJCV, 75(1):3-27, 2007.
Jonas Gehring, Michael Auli, David Grangier, Denis Yarats, and Yann N. Dauphin. Convolutional sequence to sequence learning. In ICML, 2017.
Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. Generative adversarial nets. In NIPS, pages 2672-2680, 2014.
Karol Gregor and Yann LeCun. Learning fast approximations of sparse coding. In ICML, pages 399-406, 2010.
Gao Huang, Zhuang Liu, Laurens van der Maaten, and Kilian Q. Weinberger. Densely connected convolutional networks. In CVPR, 2017.
Q. Huynh-Thu and M. Ghanbari. Scope of validity of PSNR in image/video quality assessment. Electronics Letters, 44(13):800-801, 2008.
Phillip Isola, Jun-Yan Zhu, Tinghui Zhou, and Alexei A. Efros. Image-to-image translation with conditional adversarial networks. In CVPR, pages 5967-5976, 2017.
Li-Wei Kang, Chia-Wen Lin, and Yu-Hsiang Fu. Automatic single-image-based rain streaks removal via image decomposition. IEEE Transactions on Image Processing, 21(4):1742-1755, 2012.
Alex Krizhevsky, Ilya Sutskever, and Geoffrey E. Hinton. ImageNet classification with deep convolutional neural networks. In NIPS, pages 1106-1114, 2012.
C. Ledig, L. Theis, F. Huszár, J. Caballero, A. Cunningham, A. Acosta, A. Aitken, A. Tejani, J. Totz, Z. Wang, and W. Shi. Photo-realistic single image super-resolution using a generative adversarial network. In CVPR, pages 105-114, 2017.
Boyi Li, Xiulian Peng, Zhangyang Wang, Jizheng Xu, and Dan Feng. An all-in-one network for dehazing and beyond. CoRR, abs/1707.06543, 2017.
Xia Li, Jianlong Wu, Zhouchen Lin, Hong Liu, and Hongbin Zha. Recurrent squeeze-and-excitation context aggregation net for single image deraining. In ECCV, 2018.
Yu Li, Robby T. Tan, Xiaojie Guo, Jiangbo Lu, and Michael S. Brown. Rain streak removal using layer priors. In CVPR, 2016.
Yu Luo, Yong Xu, and Hui Ji. Removing rain from a single image via discriminative sparse coding. In ICCV, 2015.
Xudong Mao, Qing Li, Haoran Xie, Raymond Y. K. Lau, Zhen Wang, and Stephen Paul Smolley. Least squares generative adversarial networks. In ICCV, 2017.
Wenqi Ren, Si Liu, Hua Zhang, Jinshan Pan, Xiaochun Cao, and Ming-Hsuan Yang. Single image dehazing via multi-scale convolutional neural networks. In ECCV, 2016.
Varun Santhaseelan and Vijayan K. Asari. Utilizing local phase information to remove rain from video. IJCV, 112(1):71-89, 2015.
Abhishek Kumar Tripathi and Sudipta Mukhopadhyay. Removal of rain from videos: a review. Signal, Image and Video Processing, 8(8):1421-1430, 2014.
Zhou Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli. Image quality assessment: from error visibility to structural similarity. IEEE Transactions on Image Processing, 13(4):600-612, 2004.
Zhangyang Wang, Qing Ling, and Thomas S. Huang. Learning deep l0 encoders. In AAAI, pages 2194-2200, 2016.
Zhangyang Wang, Yingzhen Yang, Shiyu Chang, Qing Ling, and Thomas S. Huang. Learning a deep l∞ encoder for hashing. In IJCAI, pages 2174-2180, 2016.
Hui Yang, Jinshan Pan, Qiong Yan, Wenxiu Sun, Jimmy Ren, and Yu-Wing Tai. Image dehazing using bilinear composition loss function. arXiv preprint arXiv:1710.00279, 2017.
Wenhan Yang, Robby T. Tan, Jiashi Feng, Jiaying Liu, Zongming Guo, and Shuicheng Yan. Deep joint rain detection and removal from a single image. In CVPR, 2017.
Zili Yi, Hao Zhang, Ping Tan, and Minglun Gong. DualGAN: Unsupervised dual learning for image-to-image translation. In ICCV, pages 502-510, 2017.
He Zhang and Vishal M. Patel. Density-aware single image de-raining using a multi-stream dense network. In CVPR, 2018.
Han Zhang, Tao Xu, Hongsheng Li, Shaoting Zhang, Xiaolei Huang, Xiaogang Wang, and Dimitris N. Metaxas. StackGAN: Text to photo-realistic image synthesis with stacked generative adversarial networks. In ICCV, 2017.
He Zhang, Vishwanath Sindagi, and Vishal M. Patel. Image de-raining using a conditional generative adversarial network. CoRR, abs/1701.05957, 2017.
Kai Zhang, Wangmeng Zuo, Yunjin Chen, Deyu Meng, and Lei Zhang. Beyond a Gaussian denoiser: Residual learning of deep CNN for image denoising. IEEE Transactions on Image Processing, 26(7):3142-3155, 2017.
Xiaopeng Zhang, Hao Li, Yingyi Qi, Wee Kheng Leow, and Teck Khim Ng. Rain removal in video by combining temporal and chromatic properties. In ICME, 2006.
Junbo Zhao, Michael Mathieu, and Yann LeCun. Energy-based generative adversarial network. In ICLR, 2017.
Joey Tianyi Zhou, Kai Di, Jiawei Du, Xi Peng, Hao Yang, Sinno Jialin Pan, Ivor Tsang, Yong Liu, Zheng Qin, and Rick Siow Mong Goh. SC2Net: Sparse LSTMs for sparse coding. In AAAI, 2018.
Hongyuan Zhu, Shijian Lu, Jianfei Cai, and Guangqing Lee. Diagnosing state-of-the-art object proposal methods. In BMVC, pages 11.1-11.12, 2015.
Hongyuan Zhu, Jiangbo Lu, Jianfei Cai, Jianmin Zheng, Shijian Lu, and Nadia Magnenat-Thalmann. Multiple human identification and cosegmentation: A human-oriented CRF approach with poselets. IEEE Transactions on Multimedia, 18(8):1516-1530, 2016.
Hongyuan Zhu, Jean-Baptiste Weibel, and Shijian Lu. Discriminative multi-modal feature fusion for RGBD indoor scene recognition. In CVPR, pages 2969-2976, 2016.
Hongyuan Zhu, Romain Vial, and Shijian Lu. TORNADO: A spatio-temporal convolutional regression network for video action proposal. In ICCV, pages 5814-5822, 2017.
Jun-Yan Zhu, Taesung Park, Phillip Isola, and Alexei A. Efros. Unpaired image-to-image translation using cycle-consistent adversarial networks. In ICCV, 2017.
H. Zhu, R. Vial, S. Lu, X. Peng, H. Fu, Y. Tian, and X. Cao. YoTube: Searching action proposal via recurrent and static regression networks. IEEE Transactions on Image Processing, 27(6):2609-2622, 2018.
Hongyuan Zhu, Xi Peng, Vijay Chandrasekhar, Liyuan Li, and Joo-Hwee Lim. DehazeGAN: When image dehazing meets differential programming. In IJCAI, pages 1234-1240, 2018.