Real-Time Highlight Removal From a Single Image - Vítor Saraiva Ramos - Natal 2021 - UFRN
UNIVERSIDADE FEDERAL DO RIO GRANDE DO NORTE CENTRO DE TECNOLOGIA PROGRAMA DE PÓS-GRADUAÇÃO EM ENGENHARIA ELÉTRICA E DE COMPUTAÇÃO Vítor Saraiva Ramos Real-Time Highlight Removal From a Single Image Natal 2021
UNIVERSIDADE FEDERAL DO RIO GRANDE DO NORTE CENTRO DE TECNOLOGIA PROGRAMA DE PÓS-GRADUAÇÃO EM ENGENHARIA ELÉTRICA E DE COMPUTAÇÃO Vítor Saraiva Ramos Real-Time Highlight Removal From a Single Image Master’s Dissertation submitted to the Electrical and Computer Engineering Graduate Program of the Federal University of Rio Grande do Norte in partial fulfillment of the requirements for the degree of Master of Science. Adviser: Luiz Felipe de Queiroz Silveira Natal 2021
Universidade Federal do Rio Grande do Norte - UFRN Sistema de Bibliotecas - SISBI Catalogação de Publicação na Fonte. UFRN - Biblioteca Central Zila Mamede Ramos, Vítor Saraiva. Real-time highlight removal from a single image / Vítor Saraiva Ramos. - 2021. 66f.: il. Dissertação (Mestrado) - Universidade Federal do Rio Grande do Norte, Centro de Tecnologia, Programa de Pós-Graduação em Engenharia Elétrica e de Computação, Natal, 2021. Orientador: Dr. Luiz Felipe de Queiroz Silveira. 1. Image color analysis - Dissertação. 2. Image enhancement - Dissertação. 3. Image processing - Dissertação. 4. Image texture analysis - Dissertação. I. Silveira, Luiz Felipe de Queiroz. II. Título. RN/UF/BCZM CDU 621.3 Elaborado por Raimundo Muniz de Oliveira - CRB-15/429
Acknowledgements First, I would like to acknowledge my advisers, professor Luiz Felipe de Q. Silveira and professor Luiz Gonzaga de Q. Silveira Júnior, for helping me navigate uncharted waters in academia. They have followed this work since its inception and have equally contributed several improvements. In addition, fellow professors in the defense committee, professor Rafael B. Gomes and professor Francisco M. Bernardino Júnior, also provided valuable discussions that contributed towards the conclusion of this work. I also wish to express my thanks to the anonymous referees of our first paper. Their comments have objectively directed the development towards a better work. Thanks also go to Lawrence Medeiros and Ozias Filho for valuable insight when I was drafting the patent application for the method we have developed, and to the institutional innovation agency team for filing the application. I am also grateful to professor Daniel Pontes for helping us pursue how to bring our work to real-world applications. I would also like to recognize the electrical and computer engineering graduate program staff and coordinators, and the institutional office of graduate studies staff for diligently assisting me through many administrative processes. Acknowledgment is also due to Atif Anwer for introducing and discussing domain-specific scientific literature with me via correspondence, and for independently reviewing my implementations of works from the scientific literature. Acknowledgment is likewise due to João Lucas C. B. de Farias for sending me a draft of his multidisciplinary dissertation in mechatronics engineering, which inspired the overall structure of this work, and to all peers who contributed to this work, including friends and colleagues. Last, but not least, special and warm thanks go to my parents, Anatália S. M. Ramos and Rubens E. B. Ramos, to my brothers, Eugênio S. Ramos and Pedro S. Ramos, and to my better half, Helena T. A. da Silva.
I would not be able to achieve new heights if not for them. The support that our families provide is immeasurable and cannot be overstated. Thank you. This study was financed in part by the Coordenação de Aperfeiçoamento de Pessoal de Nível Superior – Brasil (CAPES) – Finance Code 001.
Abstract The problem of highlight removal from image data is an open problem in computer vision concerning the estimation of specular reflection components and their removal. In recent applications, highlight removal methods have been employed for the reproduction of specular highlights on high dynamic range (HDR) displays; to increase the glossiness of images in specular reflection control technologies; to improve image quality in display systems such as TVs; and to enhance the dynamic range of low dynamic range (LDR) images. However, the underlying processing required by state-of-the-art methods is computationally expensive and does not meet the real-time operational requirements of image processing pipelines found in consumer electronics applications. In addition, these applications may require that methods work with a single frame of an image or video stream. Thus, this work proposes a novel method for the real-time removal of specular highlights from a single image. The essence of the proposed method consists in matching the histogram of the luminance component of a pseudo-specular-free representation using as reference the luminance component of the input image. The operations performed by the proposed method have, at most, linear time complexity. In experimental evaluations, the proposed method matches or improves upon state-of-the-art results on the task of diffuse reflection component estimation from a single image, while being 5× faster than the method with the best computational time and 1500× faster than the method with the best results. The proposed method has high industrial applicability, and targeted use cases can take advantage of the contributions of this work by incorporating the proposed method as a building block in image processing pipelines. Keywords: image color analysis. image enhancement. image processing. image texture analysis.
Resumo O problema da remoção de realces especulares em dados de imagem refere-se a um problema em aberto em visão computacional relativo à estimativa dos componentes de reflexão especular e à remoção dos mesmos. Em aplicações recentes, métodos de remoção de realces especulares têm sido empregados para a reprodução de realces especulares em monitores de alta faixa dinâmica (HDR); para aumentar o brilho das imagens em tecnologias de controle de reflexão especular; para melhorar a qualidade da imagem em dispositivos de visualização como TVs; e para melhorar a faixa dinâmica de imagens de baixa faixa dinâmica (LDR). No entanto, o processamento subjacente exigido pelos métodos do estado da arte é computacionalmente dispendioso e não atende aos requisitos operacionais de processamento em tempo real de pipelines de processamento de imagem encontrados em aplicações em eletrônica de consumo. Além disso, essas aplicações podem exigir que os métodos trabalhem com um único quadro em circuitos de processamento de imagens ou de vídeos. Assim, este trabalho propõe um novo método para a remoção em tempo real de realces especulares em uma única imagem. A essência do método proposto consiste em casar o histograma do componente de luminância de uma representação pseudolivre de especularidades, usando como referência o componente de luminância da imagem de entrada. As operações realizadas pelo método proposto têm, no máximo, complexidade de tempo linear. Nas avaliações experimentais, o método proposto é capaz de alcançar ou superar os resultados do estado da arte na tarefa de estimativa do componente de reflexão difusa a partir de uma única imagem, sendo 5× mais rápido do que o método com o melhor tempo computacional e 1500× mais rápido do que o método com os melhores resultados.
O método proposto tem alta aplicabilidade industrial, e as aplicações visadas podem usufruir das contribuições deste trabalho, incorporando o método proposto como um componente básico em pipelines de processamento de imagem. Palavras-chave: análise de cor de imagem. melhoria de imagem. processamento de imagem. análise de textura de imagem.
List of Figures
Figure 1 – Illustration of the decomposition of an input image into diffuse and specular reflection components 13
Figure 2 – Illustration of the decomposition of an input image into diffuse and specular weight maps and chromaticities 15
Figure 3 – Illustration of the generation of a pseudo-specular-free representation 17
Figure 4 – [A] diagram showing one exemplary configuration of an image processing apparatus 21
Figure 5 – [A] diagram showing a configuration of another image processing apparatus 22
Figure 6 – [A] diagram showing a configuration of still another image processing apparatus 22
Figure 7 – Dichromatic editing examples. © 2006, Springer 23
Figure 8 – Video processing outline to improve apparent gloss. © 2012, IEEE 24
Figure 9 – Two examples of histogram matching 35
Figure 10 – One example of a masking operation by thresholding 39
Figure 11 – Diagram of the proposed method 42
Figure 12 – Montage of each block comprising the diagram of the proposed method 43
Figure 13 – Results for the Shen and Zheng [1] dataset 49
Figure 14 – Results for Tan and Ikeuchi [2] and Shen and Zheng [1] test images 50
Figure 15 – Results for public domain photography 51
Figure 16 – Dataset artifact in the fruit image from the Shen and Zheng [1] dataset 53
Figure 17 – Results for the lady image 56
List of Tables
Table 1 – PSNR evaluation of the recovered diffuse component 46
Table 2 – SSIM evaluation of the recovered diffuse component 46
Table 3 – CIE76 color difference evaluation of the recovered diffuse component 47
Table 4 – CIE94 color difference evaluation of the recovered diffuse component 47
Table 5 – CIEDE2000 color difference evaluation of the recovered diffuse component 47
Table 6 – PSNR evaluation of the recovered diffuse component in presence of AWGN 48
Table 7 – Runtime evaluation 49
List of Symbols
Input image
Diffuse reflection component
Specular reflection component
D Diffuse weighting factor
S Specular weighting factor
Diffuse chromaticity
Specular chromaticity
min Dark (minimum) channel
Pseudo-specular-free representation
Histogram matching reference
Histogram matching output
( ) Two-dimensional image coordinates
⋅R Red channel
⋅G Green channel
⋅B Blue channel
⋅Y Luminance component
⋅Cb Blue difference chroma component
⋅Cr Red difference chroma component
Contents
1 INTRODUCTION 11
1.1 Aims and Objectives 12
1.2 Concepts 13
1.2.1 Dichromatic Reflection Model 14
1.2.2 Dark Channel Prior 15
1.2.3 Pseudo-Specular-Free Representation 16
1.2.4 YCbCr Color Space 17
1.2.5 Histogram Matching 18
1.3 Applications 19
1.3.1 Early Computer Vision 20
1.3.2 Image Processing Pipelines 20
1.3.3 Image Enhancement 23
1.4 Related Work 25
1.4.1 Single-Image Methods 25
1.4.2 Real-Time Methods 26
1.4.3 Contributions 27
2 METHODOLOGICAL PRELIMINARIES 28
2.1 Notation 28
2.2 Dichromatic Reflection Model 28
2.3 Dark Channel Prior 29
2.4 Pseudo-Specular-Free Representation 30
2.4.1 Decomposition With Respect to the Diffuse Reflection Component 31
2.5 YCbCr Color Space 32
2.5.1 Dichromatic Reflection Model YCbCr Components 32
2.5.2 Dark Channel YCbCr Components 33
2.5.3 Pseudo-Specular-Free Representation YCbCr Components 34
2.6 Histogram Matching 34
2.6.1 CDF-Matching Algorithm 35
2.6.2 Sort-Matching Algorithm 36
2.6.3 Exact Histogram Specification 37
2.6.4 Reference Histogram 37
2.6.4.1 Energy-Based Reference 38
2.6.4.1.1 Large Images 39
2.6.4.2 Inequality-Based Reference 39
3 PROPOSED METHOD 41
3.1 Diagram 41
3.2 Montage 43
3.3 Sample Implementation 44
4 EXPERIMENTAL RESULTS 45
4.1 Quantitative Results 45
4.2 Qualitative Results 49
4.3 Discussion 50
4.3.1 Quantitative Results Analysis 51
4.3.2 Qualitative Results Analysis 53
5 CONCLUSION 55
5.1 Limitations 55
5.1.1 Chromatic Pixel Assumption 55
5.1.2 Linear Light Assumption 56
5.1.3 Normalized Illumination Assumption 57
5.2 Future Work 58
5.3 Final Remarks 60
BIBLIOGRAPHY 61
1 Introduction This work deals with the problem of specular highlight removal, an open problem in computer vision [3]. This master’s dissertation details and substantially extends findings that we initially published in a research article [4]. In the article, we disclosed our initial findings in a succinct manner, whereas, in this dissertation, we describe our contributions in greater detail and in a didactic manner; extend the methodological developments; include complementary aspects of the proposed method; present new results; and include topics not previously discussed. Accordingly, we have structured our presentation so as to best introduce the core concepts involved in describing the problem being solved, the specific methodological development, the proposed solution for the problem at hand, the experimental results thereof, and the conclusions. The chapters that comprise this dissertation are enumerated as follows. In chapter 1 (Introduction), we begin by defining the objective of this work and how it may be categorized with regard to the current scientific literature. We then present the definition and historical context of key concepts related to the method proposed in this work. We provide a high-level account of each concept that will subsequently be extended in the following chapter. We punctuate legacy applications and contemporary use cases that require qualities pertinent to the proposed method (that of being single-image and real-time). Keep in mind that these qualities greatly narrow the number of works presenting fast single-image solutions. We conclude this chapter by surveying the state of the art of single-image specular highlight removal methods.
In chapter 2 (Methodological Preliminaries), we look further into the physical reflection model adopted, the dichromatic reflection model [5], particularly its normalized diffuse and specular chromaticities extension [2], from which we will present useful results that simplify the treatment of this problem. The main result is that, based on an intermediate pseudo-specular-free representation that is fully contained in the diffuse component [6], we can show that it is possible to propose an effective diffuse estimation mechanism based on an intensity transformation of the pseudo-specular-free representation. In chapter 3 (Proposed Method), based on the results obtained in the previous chapter, we propose a method to estimate the diffuse reflection component from a single input image. The main idea is that of matching the histogram of the luminance component of the pseudo-specular-free representation to the histogram of the luminance component of the input image, thereby transforming the intensities of the pseudo-specular-free luminance. Because the pseudo-specular-free representation is demonstrably free of the specular reflection component, the output of histogram matching will also be so. Furthermore, histogram matching presents linear time complexity [7], and thus our strategy may be implemented in real-time (constant time per pixel).
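The pipeline just outlined (dark channel, pseudo-specular-free representation, luminance histogram matching) can be sketched in a few lines of NumPy. This is a minimal illustration under assumed conventions, not the reference implementation of chapter 3: the BT.601 luma coefficients, the sort-based matching, and the recombination via per-channel color differences are all illustrative choices.

```python
import numpy as np

def luminance(rgb):
    # BT.601 luma weights (an assumption here; the dissertation works in
    # YCbCr, where any of the standard coefficient sets would apply).
    return 0.299 * rgb[..., 0] + 0.587 * rgb[..., 1] + 0.114 * rgb[..., 2]

def match_histogram(source, reference):
    # Sort-matching: impose the sorted values of `reference` onto
    # `source` in rank order. Sorting is O(N log N); counting-based
    # variants on integer data reach the linear time cited in the text.
    order = np.argsort(source, axis=None)
    matched = np.empty(source.size, dtype=np.asarray(reference).dtype)
    matched[order] = np.sort(reference, axis=None)
    return matched.reshape(source.shape)

def remove_highlights(img):
    # img: linear-light RGB array with values in [0, 1].
    dark = img.min(axis=-1, keepdims=True)   # dark channel
    psf = img - dark                         # pseudo-specular-free image
    y_in = luminance(img)
    y_psf = luminance(psf)
    y_out = match_histogram(y_psf, y_in)     # matched, specular-free luminance
    # Recombine: keep the input's per-channel color differences (which are
    # specular-free under the model's assumptions) and replace the
    # luminance with the matched one.
    return np.clip(img - y_in[..., None] + y_out[..., None], 0.0, 1.0)
```

As a design note, keeping the color differences fixed while replacing the luminance leaves the Cb and Cr components untouched, since both are linear combinations of (B − Y) and (R − Y).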
We present a diagram overviewing the proposed approach, accompanied by a descriptive account of each block. In chapter 4 (Experimental Results), we present extensive quantitative and qualitative results for the proposed method. Regarding quantitative results, we analyze the task of diffuse reflection component estimation from a single image by state-of-the-art methods [1, 2, 8, 9, 10, 11, 12, 13, 14] and by the proposed method with several metrics. First with peak signal-to-noise ratio (PSNR), a traditional estimation error metric based on the mean-squared error (MSE); next with structural similarity (SSIM) [15], a robust metric that additionally considers perceived change in structural information; then with color difference based on the ISO 11664-4 (CIE76) [16], CIE 116 (CIE94) [17], and ISO 11664-6 (CIEDE2000) [18] standards; after that with PSNR in presence of additive white Gaussian noise (AWGN); and finally with computational runtime. Regarding qualitative results, we present images processed by state-of-the-art real-time methods alongside images processed by the proposed method. We include two sets of standard test images and one set of photographic images. In chapter 5 (Conclusion), we conclude this work by providing an account of the limitations of the proposed method alongside possible remedies for known limitations. Some limitations may be treated in future works, for which we propose research leads that may provide novel solutions or enhance existing solutions for the problem of specular highlight removal. In the last section of this chapter, we present our final remarks, summarizing this work.
1.1 Aims and Objectives The problem of separating diffuse and specular reflection components from images (both analog and digital) dates as far back as 1985, when, in a seminal work, Shafer introduced a practical reflection model to describe how light is formed with respect to body (diffuse) and illumination (specular) reflection components [5]. In his work, he summarized the physics regarding the reflection of observed light by a linear model of these two reflection components, the diffuse and the specular reflection components. (Precisely what constitutes a diffuse or a specular reflection component is deferred to the following section.) This theoretical advancement greatly simplified the analysis of images with respect to these two components. In brief, he presented a dichromatic reflection model where these two components are linearly additive. For instance, we may decompose (analyze) an observed image into these two linear components, and those two components separately may also reconstruct (synthesize) an observed reflection. In computer graphics, it is not uncommon to rely on reflection models to render views. There, we often have access to the necessary intrinsic information with respect to given objects in a scene, and thus synthesizing a view is as straightforward as evaluating a reflection model given known parameters [19].
In image analysis, the configuration is exactly the opposite. That is, we assume no information whatsoever, and we deal with already formed, generated, or synthesized images. To better illustrate, we may think of synthesis as the process of rendering an image given a scene, an object, and an illumination, whereas we may think of analysis as the process of (often blindly) decomposing an image that has already been formed, digitally or not. In this context, the work herein is best aligned with image analysis. We will deal with a single frame (image), and the objective of our analysis is to decompose a single input frame into its diffuse reflection component. This operation is also known in practice under the name of specular highlight removal, since we may obtain the diffuse component by simply subtracting the specular component. Figure 1 – Illustration of the decomposition of an input image into diffuse and specular reflection components Source: author We also wish to perform this decomposition at as low a computational cost as possible in order to accommodate real-time application use cases. That is to say, in order to provide applications with an image processing building block that does not impact the overall computational efficiency of a given system. In summary, the general objective of this work is to contribute to the most current scientific literature on single-image specular highlight removal, and the specific objective of this work is to propose a real-time method to achieve this effect. 1.2 Concepts Throughout this work, we will refer to concepts well established in the literature of single-image specular highlight removal methods. This section seeks to provide a concise reference for each concept at a high level. The concepts presented are the dichromatic reflection model; the dark channel prior; pseudo-specular-free representations; the YCbCr color space; and histogram matching.
We have put great effort into illustrating these concepts in order to offer a complementary visual summary.
1.2.1 Dichromatic Reflection Model The first in a series of concepts that we will define is the dichromatic reflection model [5]. The dichromatic reflection model is a general reflection model of light reflected from inhomogeneous materials based on a physical description of the reflection process. The dichromatic reflection model is widely used for image color analysis, intrinsic image decomposition in computer vision, and image rendering in computer graphics [20]. The dichromatic reflection model represents the total radiance of reflected light from an inhomogeneous object as the sum of two independent parts: the radiance of light reflected at the interface between the air and the surface medium, and the radiance of light reflected from the surface body of the material. Put differently, the dichromatic reflection model distinguishes between two types of reflection, namely the body (diffuse) and the interface (specular) reflection. The model effectively defines that, upon formation, an observed image is composed of a linear weighted sum of functions corresponding to each of these two types of reflection. Furthermore, according to the dichromatic reflection model, an observed lightness can be decomposed into a linear composition of a body color that is independent of imaging geometry, scaled by a magnitude factor that depends only on geometry and is independent of body color, and an interface color that is independent of imaging geometry, scaled by a magnitude factor that depends only on geometry and is independent of interface color. The specific quantities that each of these components contribute to the composition of the final lightness of a picture element in an observed image are unknown, not identical, and typically differ from one element to another in the image.
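A toy synthesis example may help make this linear, additive composition concrete. All array names and values below are illustrative, not taken from the text: a flat body color, a white normalized illuminant, and geometry-dependent weight maps with a single highlight.

```python
import numpy as np

h, w = 4, 4
# Body (diffuse) chromaticity: a flat reddish surface color.
diffuse_chroma = np.tile(np.array([0.6, 0.3, 0.1]), (h, w, 1))
# Interface (specular) chromaticity: a white illuminant, normalized so
# that its three channels sum to one.
specular_chroma = np.full((h, w, 3), 1.0 / 3.0)
# Geometry-dependent magnitude scale factors (weight maps); these are
# exactly the quantities that are unknown during image analysis.
wd = np.ones((h, w))        # diffuse shading everywhere
ws = np.zeros((h, w))
ws[1, 1] = 0.9              # a single specular highlight
# Dichromatic model: linear, additive composition of the two parts.
img = wd[..., None] * diffuse_chroma + ws[..., None] * specular_chroma
```

Analysis, as discussed next, is the inverse and ill-posed direction: recovering the weight maps and chromaticities from `img` alone.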
It is now worth making the point that, for the purpose of image analysis, the dichromatic reflection model by Shafer [5] introduces computational tractability that is otherwise unavailable. For instance, given an already formed digital image, i.e., a multidimensional array, a great amount of detail is largely unknown a priori, such as reflection coefficients (magnitude scale factors, i.e., weights), chromaticities (i.e., colors), and photometric angles (i.e., imaging geometry). Thus, the separation of diffuse and specular reflection components is an ill-posed problem of intrinsic image decomposition [21] where adopting the dichromatic model yields better tractability. To help define the contemporary concepts of diffuse and specular reflection components with respect to the dichromatic reflection model, particularly in the context of computer vision, we may directly consult reference work entries corresponding to each concept, namely, “Diffuse reflectance” [22, p. 209], and “Specularity, specular reflectance” [3, p. 750]. In light of the definitions in the reference work entries, we remark that these concepts have largely acquired loose definitions depending on the particular field of scientific literature in which they are used. For instance, Shafer himself, in choosing the correct terminology for the development of his work, noted that the meaning surrounding the terms diffuse and specular
reflection changes according to the field of scientific literature in which it is used [5]. Further, we remark that, even in contemporary scientific works, the meaning surrounding the dichromatic reflection model changes according to usage. In the latter case, the dichromatic reflection model is commonly used interchangeably in reference to the decomposition of an image into diffuse and specular components. In this work, the terminology employed will be aligned with the contemporary usage of the specular and diffuse terms, particularly with the literature of specular highlight removal methods. To be specific, the meaning of the employed terminology draws from the Tan and Ikeuchi [2] interpretation of the dichromatic reflection model by Shafer in the context of digital color image formation. In Figure 1, we include an illustration of the decomposition of an input image into its diffuse reflection component and its specular reflection component. (In fact, we include this illustration inspired by Tan [3, Fig. 1].) In Figure 2, we include an illustration of the decomposition of an input image into its diffuse weight map D and diffuse chromaticity, and its specular weight map S and specular chromaticity, where “⊙” denotes the element-wise product, according to the Tan and Ikeuchi [2] normalized chromaticities interpretation of the dichromatic reflection model. Figure 2 – Illustration of the decomposition of an input image into diffuse and specular weight maps and chromaticities Source: author 1.2.2 Dark Channel Prior In our work, we rely on the concept of a dark channel, named after an image prior called the dark channel prior. There are two distinct image priors called dark channel prior. The first occurrence of this image prior (precisely under this denomination) comes from an impactful work in single image haze removal. In the context of haze removal, He et al.
[23, Section 3] define the concept of a dark channel, the outcome of two (commutative) minimum operators. The first minimum operator is the minimum along color channels, performed on each pixel. The second operator is a minimum filter. He et al. [23] employ the dark channel to improve atmospheric light estimation.
The second occurrence of this image prior comes from industry work in single-image specular reflection separation. In the context of specular reflection separation, Kim et al. [24, Section 4], motivated by He et al. [23], define the dark channel as the lowest intensity value among the RGB channels at each pixel. Kim et al. [24] utilize this definition to obtain a pseudo specular-free image, obtained by subtracting the dark channel from all color channels. The definition of Kim et al. [24] differs from that of He et al. [23] in that it does not employ an order-statistic minimum filter. Historically, however, the same definition appears in earlier work, also on the separation of reflection components. Yoon et al. [6, Section 3] define a specular-free two-band image obtained by subtracting the minimum along color channels from all color channels. Although Yoon et al. [6] did not explicitly assign a denomination to the minimum along color channels, the definition is the same in both works. Nevertheless, we adopt the nomenclature by Kim et al. [24] because it has been recently employed in other relevant industry work [14]. Particular to this work, the dark channel is simply the minimum operator along the three linear RGB color channels, performed for each pixel in an image, per Yoon et al. [6] or Kim et al. [24]. The usefulness of the dark channel has a straightforward derivation based on how it is decomposed with respect to the dichromatic reflection model (see, e.g., Yoon et al. [6]). In the next chapter, we show this derivation and extend it to the YCbCr color space. 1.2.3 Pseudo-Specular-Free Representation The concept of a pseudo-specular-free representation refers to an easily obtainable specularity-invariant color image representation, typically employed in the process of obtaining the diffuse component. It is an intermediate representation used in specular highlight removal methods.
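The per-pixel definition adopted here (that of Yoon et al. and Kim et al.) reduces to two NumPy expressions; a minimal sketch, with function names of our own choosing:

```python
import numpy as np

def dark_channel(img):
    # Per-pixel minimum over the RGB channels, following Yoon et al.
    # and Kim et al.; He et al. additionally apply a spatial minimum
    # filter, which this per-pixel variant deliberately omits.
    return img.min(axis=-1, keepdims=True)

def pseudo_specular_free(img):
    # Subtract the dark channel from every color channel, yielding the
    # specular-free two-band image of Yoon et al.
    return img - dark_channel(img)
```

Note that under the dichromatic model with a normalized (white) illuminant, the specular term adds the same amount to R, G, and B at each pixel, so the subtraction cancels it: adding any achromatic offset to the input leaves the result unchanged.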
Particular to methods that work with a single input image, this representation is broadly employed as a geometrical profile reference for the diffuse reflection component, or as an initial estimate of the diffuse reflection. The pseudo-specular-free representation is likewise called the specular-free image [2], the specular-free two-band image [6], the modified specular-free image [8, 25], or the pseudo specular-free image [24]. Each denomination is associated with a proposed definition. Tan and Ikeuchi [2] generate a specular-free image by setting the diffuse maximum chromaticity equal to an arbitrary scalar value. This definition provides a specular-free representation that preserves hue but distorts the saturation of the image. Yoon et al. [6] generate a specular-free two-band image by subtracting, for each pixel, the minimum along color channels from all color channels. (It will be shown in this work that
this definition provides a specularity-invariant representation that preserves chroma.) Kim et al. [24] generate a pseudo specular-free image using the same definition. Shen and Cai [8] and Shen et al. [25] generate a modified specular-free image by extending the specular-free two-band image [6]. In [8], the authors propose adding a scalar value to offset the specular-free two-band image. This increases the robustness of the specular-free chromaticity with respect to imaging noise. In [25], the authors further extend the proposal of Shen and Cai [8] by making the offset pixel dependent. In this work, the definition of the employed pseudo-specular-free representation is by Yoon et al. [6], the same as Kim et al. [24]. We illustrate in Figure 3 how to obtain this representation, wherein the panels show an input image, its dark (minimum) channel (min), and the resulting pseudo-specular-free representation. We will leverage the demonstrably specular-free geometric profile of the pseudo-specular-free representation by transforming the intensity values of the luminance component of the two-band specular-free image through histogram matching to the luminance component of the input image. Figure 3 – Illustration of the generation of a pseudo-specular-free representation Source: author 1.2.4 YCbCr Color Space YCbCr refers to a family of color spaces that is especially common in digital image and video signal processing pipelines [26]. The Y component refers to luminance, the Cb component refers to the blue color difference, and the Cr component refers to the red color difference. The Cb and Cr components are likewise called chroma components. The direct transformation from RGB primaries to YCbCr components is a linear transformation. That is, the transformation consists of a series of scalar multiplications of the RGB channels by coefficients, followed by offset additions.
(Together, the coefficients and offsets parametrize a specific YCbCr color space definition.) In addition, much like the direct transformation, the inverse transformation is also linear. We can find definitions of these parameters for the direct and inverse transformations between YCbCr and RGB in the ITU-R BT.601 [27] for the digital coding of standard
definition television (SDTV) video signals; in the ITU-R BT.709 [28] for the production and international exchange of high definition television (HDTV) programmes; and in the ITU-R BT.2020 [29] for the production and international exchange of ultra-high definition television (UHDTV) programmes. In practice, the YCbCr color space is constructed for encoding perceptual uniformity while maintaining desirable qualities such as being a computationally cheap linear transformation to and from RGB. In the context of image and video signal coding, it is useful, e.g., for data compression. Most commonly, chroma components may be heavily compressed without loss of perceptual quality, leveraging our decreased visual acuity for color in contrast to lightness. Thus, chroma is often more heavily subsampled (spatially), quantized, or bandwidth-reduced (temporally) than luminance is. Therefore, practical applications often process luminance and chroma components independently (see, e.g., [26, pp. 528-529]). In this work, we will do precisely that. In fact, one of the main results of this work is that we show that the chroma components are specular-free under a few assumptions. To summarize the reasons for adopting the YCbCr color space in this work: besides encoding color separately from lightness and being widely standardized in image and video signal processing pipelines, it is first and foremost a linear coordinate transformation to and from RGB. Therefore, as will be clear in the next chapter, we will be able to naturally extend the dichromatic reflection model by simply evaluating the underlying coordinate transformation (i.e., the matrix multiplication plus offset term) and analyzing diffuse and specular weight maps and chromaticities in the resulting luminance and chroma components. 
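As a concrete illustration of the direct and inverse transformations, the sketch below uses one possible parametrization: the full-range BT.601 coefficients for values normalized to [0, 1]. (The standards cited above also define limited-range variants with different scalings and offsets; the specific numbers are illustrative, not the ones adopted later in this work.)

```python
import numpy as np

# Full-range BT.601 luma/chroma coefficients for [0, 1]-normalized RGB.
A = np.array([
    [ 0.299,     0.587,     0.114   ],  # Y
    [-0.168736, -0.331264,  0.5     ],  # Cb
    [ 0.5,      -0.418688, -0.081312],  # Cr
])
OFFSET = np.array([0.0, 0.5, 0.5])  # offsets center the chroma components

def rgb_to_ycbcr(rgb):
    """Direct transformation: matrix multiplication plus offset addition."""
    return rgb @ A.T + OFFSET

def ycbcr_to_rgb(ycc):
    """Inverse transformation: likewise linear (affine)."""
    return (ycc - OFFSET) @ np.linalg.inv(A).T

pixel = np.array([0.5, 0.25, 0.75])  # an arbitrary RGB pixel in [0, 1]
assert np.allclose(ycbcr_to_rgb(rgb_to_ycbcr(pixel)), pixel)  # round trip
```

It is precisely this affine structure (a matrix plus an offset term) that will allow the dichromatic reflection model to be evaluated directly in the luminance and chroma components.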
To further complement our reasoning, remark that other color spaces that aspire to encode for perceptual uniformity, most notably CIELAB [16], do so through a bijective nonlinear mapping. Therefore, a pixel-wise transformation of perceptually encoded lightness values in CIELAB will not be linear in the RGB domain. Thus, the linear-light assumption of the dichromatic reflection model would be violated. 1.2.5 Histogram Matching In order to better introduce the concept of histogram matching, we begin by introducing the concept of histogram equalization. The technique of histogram equalization first appeared in the context of real-time image enhancement for cockpit display systems [30]. Histogram equalization refers to the task of adjusting the intensity values of an input image such that their empirical distribution (i.e., the histogram) best approximates a uniform distribution. Now, what if we wanted to specify another distribution? Such a technique would generalize histogram equalization, since we would be able to specify any distribution arbitrarily, including the uniform distribution. Such a technique is called histogram matching.
Histogram matching refers to the task of adjusting the histogram of an input image such that its empirical distribution of intensity values best approximates (i.e., matches) a reference distribution. The technique of histogram matching is likewise known as histogram specification, modeling, or transfer [31]. In a widely adopted digital image processing textbook, Gonzalez and Woods [32] mention that histogram manipulation is a fundamental tool in image processing; that histogram manipulation is amenable to fast hardware implementations; and that histogram-based techniques are common in real-time image processing. Indeed, later practical research papers such as Rolland et al. [7] remark on the high computational speed of histogram matching. Rolland et al. [7] show that histogram matching approaches based on look-up tables (LUTs) have linear time complexity in the number of pixels and of discrete intensity values, and that approaches based on sorting have linearithmic (N log N) time complexity in the number of pixels (irrespective of the number of discrete intensity values). In one remarkably interesting example, we can find a histogram matching program in ISIS (Integrated System for Imagers and Spectrometers), a digital image processing software package developed by the USGS (United States Geological Survey) for NASA (National Aeronautics and Space Administration).1 It is used in equalization and tone matching applications for radiometric and photometric correction (e.g., to generate tone-matched mosaics). In other interesting examples, histogram matching has been applied to compensate for light attenuation in microscopy [33]; to match colors in twin cameras in stereoscopic cinema [34]; and to extend signal strength in tomography images in ophthalmic imaging [35]. 
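To make the LUT-based approach concrete, the sketch below matches an 8-bit single-channel image to a reference by pairing intensity levels of equal cumulative probability. This is a minimal illustration of the classical algorithm, not the exact implementation used in this work.

```python
import numpy as np

def match_histograms(source, reference, levels=256):
    """Classical LUT-based histogram matching for 8-bit single-channel images.

    Runs in linear time in the number of pixels and of discrete levels,
    as remarked by Rolland et al. for LUT-based approaches.
    """
    src_hist = np.bincount(source.ravel(), minlength=levels)
    ref_hist = np.bincount(reference.ravel(), minlength=levels)
    # Normalized cumulative distribution functions of both images.
    src_cdf = np.cumsum(src_hist) / source.size
    ref_cdf = np.cumsum(ref_hist) / reference.size
    # LUT: send each source level to the reference level of nearest CDF value.
    lut = np.searchsorted(ref_cdf, src_cdf).clip(0, levels - 1).astype(np.uint8)
    return lut[source]

rng = np.random.default_rng(0)
src = rng.integers(0, 128, (64, 64), dtype=np.uint8)   # darker image
ref = rng.integers(64, 256, (64, 64), dtype=np.uint8)  # brighter reference
out = match_histograms(src, ref)  # out's histogram approximates ref's
```

In the pipeline outlined earlier, the source would be the luminance component of the pseudo-specular-free representation and the reference would be the luminance component of the input image.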
In this work, we will employ a histogram matching block to match the intensity of the luminance component of a pseudo-specular-free representation (input) to the intensity of the luminance component of the input image (reference). 1.3 Applications It is important to precisely locate which applications stand to benefit the most from the contributions of this work. For that, first we briefly overview how diffuse and specular reflections relate to legacy applications in computer vision, next we introduce the applied concept of an image processing pipeline, and after that we highlight the most current applications of diffuse and specular reflection components in image enhancement. 1 USGS: ISIS histmatch Application Documentation. Accessed: Feb. 05, 2021. [Online]. Available: https://web.archive.org/web/20210205042108if_/https://isis.astrogeology.usgs.gov/ Application/presentation/PrinterFriendly/histmatch/histmatch.html
1.3.1 Early Computer Vision Here, early computer vision is the denomination we employ for classic low-level vision works that, in general, predate the advent of learning-based practices (for which they paved the way). In early computer vision works, to ensure computational tractability or complexity reduction of ill-posed problems, algorithms typically considered that the reflection of observed objects could be described by a low-parametric model of perfectly diffuse reflection such as the Lambertian model [36]. In the Lambertian model, reflectance does not depend on viewing direction. A surface presenting Lambertian reflectance reflects incident light equally in all directions (i.e., appears equally bright from all directions). In a seminal work in low-level vision, Woodham [37] employed a perfectly diffuse model of surface reflectance to propose a technique for inferring scene geometry, which he called photometric stereo, wherein a reflectance map determines surface orientation at each point by varying the direction of incident illumination between successive views. However, the Lambertian model does not account for specular reflections [36]. Indeed, in a reference work entry, Tan [3, p. 752] remarks that many existing algorithms in computer vision assume perfectly diffuse surfaces, and that such algorithms regard specular reflections as outliers. Hence, it is not unusual to find computer vision algorithms that employ methods for the separation of reflection components in preprocessing steps. In contrast, information conveyed by specular reflections is usually employed directly in other computer vision algorithms. In [38], Tan et al. have proposed estimating illumination chromaticity by analyzing specular reflections. In [39], Adato et al. have investigated at great length the problem of specular shape reconstruction from specular flow. 
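The view independence of the Lambertian model can be made concrete in a few lines: diffuse intensity is the albedo scaled by the cosine of the incidence angle, and the viewing direction appears nowhere. The albedo and direction values below are illustrative.

```python
import numpy as np

def lambertian_shading(albedo, normal, light_dir):
    """Perfectly diffuse (Lambertian) reflection: intensity depends only on
    the angle between surface normal and light direction; the viewing
    direction does not enter the model."""
    n = normal / np.linalg.norm(normal)
    l = light_dir / np.linalg.norm(light_dir)
    return albedo * max(0.0, float(n @ l))

n = np.array([0.0, 0.0, 1.0])                               # surface facing up
head_on = lambertian_shading(0.8, n, np.array([0.0, 0.0, 1.0]))
grazing = lambertian_shading(0.8, n, np.array([1.0, 0.0, 1.0]))
# Brightness falls off with the cosine of the incidence angle.
assert head_on > grazing > 0.0
```

It is exactly this missing dependence on the viewing direction that prevents the model from expressing specular highlights, which is why classic algorithms treated them as outliers.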
1.3.2 Image Processing Pipelines Before advancing, we shall make a small detour to define the concept of an image processing pipeline. An image processing pipeline loosely refers to a sequence of image signal processing steps employed for a specific function. We highlight two relevant resources to help us grasp this concept more precisely. First, a magazine article by Ramanath et al. [40] that surveys the digital still camera processing pipeline. (On a side note, this particular article dates from before the advent of smartphones. Unsurprisingly, it is still by and large relevant! The digital still camera has merely miniaturized since.) Second, a reference work entry by Corcoran and Bigioi [26] that discusses in great depth practical considerations of image processing pipelines in the consumer digital imaging industry. In short, processing steps in consumer digital imaging systems include, and are by no means limited to, demosaicing, sensor and lens compensation, color processing, autofocus, exposure, and compression. These steps correspond to functional blocks. Arranged in a sequence, they
define an image processing pipeline, which in turn is typically implemented in an image signal processor. Admittedly, the above definition is very generic. Indeed, the set (and sequence) of blocks involved in an image processing pipeline differs from manufacturer to manufacturer and additionally from application to application [40]. In this case, the above definition refers to an image processing pipeline in consumer digital imaging. To better understand this concept in other applications, we turn to industry applications to provide concrete examples. We present in Figures 4, 5, and 6 exemplary pipelines describing the underlying image processing of a projector, a display device, and an imaging device, respectively. These figures are excerpted from the patent [41], which, in brief, discloses an image processing apparatus for correcting an image. Specifically, the inventors disclose a strategy to adaptively control the influence of illumination light in devices capable of image processing to the effect of image enhancement. We note that Figures 4, 5, and 6 and the accompanying discussion are included solely to exemplify typical image processing pipelines found in industry. We intend no judgment on the underlying invention. In Figure 4, the inventors describe a projector. The described apparatus includes an input signal processor (e.g., to allow for analog, digital, or mixed-signal input), an image corrector, a timing controller (e.g., to generate a display control signal), and an optics unit. Summarily, a projector receives an input signal, preprocesses it into an intermediate signal, corrects it, generates a display control signal, and projects it. Figure 4 – [A] diagram showing one exemplary configuration of an image processing apparatus Source: U.S. Patent 9,053,539 [41, Fig. 1] In Figure 5, the inventors describe a display device. 
The display device includes an input signal processor, an image corrector, a timing controller, and a (display) panel. In short, an electronic visual display receives an input signal, converts it to an intermediate signal, corrects
it, generates a display control signal, and visually displays it. (If you are reading this document electronically, chances are that your display device has a similar architecture.) Figure 5 – [A] diagram showing a configuration of another image processing apparatus Source: U.S. Patent 9,053,539 [41, Fig. 13] In Figure 6, the inventors describe an imaging device. The described apparatus includes an imaging optics unit, an image corrector, and an imaging display unit. A recording/reproducing unit is included to additionally provide input/output functionality to a storage medium. Summarily, an imaging device acquires (senses) an image, transforms it into digital data, corrects it, and optionally displays or reproduces it. Figure 6 – [A] diagram showing a configuration of still another image processing apparatus Source: U.S. Patent 9,053,539 [41, Fig. 14] Whereas in Figures 4, 5, and 6 the image corrector block refers to the specific innovative disclosure of [41], we may abstract it and notice that an image corrector may simply be considered one processing step in a larger image signal processing system, which in turn may be composed of multiple underlying processing steps. The qualities set forth as the specific objective of this work, i.e., that of proposing a single-image real-time method, should now be clearer given this discussion. To what was presented in this subsection, we add that apparatuses such as projectors, display devices, and imaging devices are required not only to work with a single image buffer but also to provide timely processing; otherwise, there is a risk of violating the functional requirements of the entire image processor system due to untimeliness. To conclude this subsection, we remark that the two underlying categories of devices, that of image signal reproduction apparatuses (e.g., projectors, display devices) and that of image signal recording apparatuses (e.g., imaging devices), are useful in distinguishing real-time image enhancement applications in consumer electronics. 1.3.3 Image Enhancement In general, image enhancement refers to methods and applications that seek to provide aesthetic improvements and corrections to the projection, viewing, and sensing of images. For instance, in the context of tone mapping in visual display technologies, image enhancement refers to the key image signal processing task involved with enhancing the readability and the perceived image quality of displays under the influence of the ambient light [42]. To this end, diffuse and specular reflection components can be employed by applications both directly and indirectly to achieve image enhancement effects. Examples of direct applications include dichromatic editing [43], while examples of indirect applications include tone mapping for high dynamic range (HDR) displays [44], apparent gloss improvement [45, 46], and inverse tone mapping of low dynamic range (LDR) images [47, 48, 49, 50]. Figure 7 – Dichromatic editing examples. In each case a visual effect is simulated by independent processing of the recovered specular and diffuse components. (a) Input image. (b) Wetness effect by sharpening the specular component. (c) Skin color change by varying the intensity of the diffuse component. (d) Effect of make-up by smoothing the diffuse component and removing the specular component. (e) Input image. (f) Sharpened specular lobe, as would occur if the surface was more smooth. This is achieved by eroding the specular component using a disk-shaped structuring element and amplifying it. (g) Effect of an additional light source obtained by exploiting the object symmetry and reflecting the specular component about the vertical axis. (h) Avocado-like appearance by modulating the specular component. © 2006, Springer Source: Mallick et al. [43, Fig. 6]
In an exemplary direct application, Mallick et al. [43] introduced the concept of dichromatic editing, an application stemming from the independent processing of separated diffuse and specular reflection components to produce a variety of visual effects. Mallick et al. [43] depict applications in photo editing and e-cosmetics, where examples of simulated visual effects include make-up, surface roughening, and wetness. In Figure 7, we excerpt a figure from Mallick et al. [43] to illustrate dichromatic editing examples. One indirect application is tone mapping for HDR displays. In [44], Meylan et al. proposed a piecewise linear scale function to tone map standard dynamic range (SDR) images based on first segmenting the input image into diffuse and specular components and then scaling them differently. Meylan et al. [44] propose detecting specular highlights by applying low-pass filters and morphological operators. Then, a piecewise linear function composed of two different slopes scales the diffuse and the specular segmented regions separately. Another indirect application refers to gloss enhancement. In industry-led research and development (R&D) papers, Hasegawa et al. [45] propose a video processing method to improve apparent gloss by first detecting specular highlight areas, then enlarging differences of brightness between specular highlight and surrounding areas, and finally expanding highlight areas by visual highlight enhancement (e.g., adding controlled glare). Kobiki et al. [46] propose a specular reflection control technology to increase glossiness for next-generation displays by emphasizing/suppressing the specular reflection image and recombining it with the diffuse reflection image. In Figure 8, we excerpt a figure from Hasegawa et al. [45] to illustrate their video processing method. Still another indirect application is inverse tone mapping to convert LDR content to HDR. 
In [48], Huo and Yang propose a dynamic range expansion method that consists in detecting, linearly boosting, and recombining highlight areas separately; and in [50], Saha et al. propose a method to obtain an HDR-like image from a single LDR image by combining specular highlight removal and low-light image enhancement techniques. We strongly believe that, equipped with a computationally efficient image processing block providing fast and high-quality specular highlight removal, products of the above image enhancement applications can be greatly empowered, particularly considering that these applications by and large target systems implemented in image processing pipelines. Figure 8 – Video processing outline to improve apparent gloss. © 2012, IEEE Source: Hasegawa et al. [45, Fig. 1]
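The enhancement applications surveyed above share a common pattern: separate the image into reflection components, process each independently, and recombine. Assuming a separation is already available, the recombination step reduces to a weighted sum, as in the sketch below (the gain values are arbitrary, for illustration only):

```python
import numpy as np

def recombine(diffuse, specular, diffuse_gain=1.0, specular_gain=1.0):
    """Independent processing of reflection components, as in dichromatic
    editing: e.g., specular_gain > 1 increases apparent gloss, while
    specular_gain = 0 removes highlights altogether."""
    out = diffuse_gain * diffuse + specular_gain * specular
    return np.clip(out, 0.0, 1.0)

diffuse = np.full((4, 4, 3), 0.4)                  # flat diffuse toy image
specular = np.zeros((4, 4, 3))
specular[1, 1] = 0.5                               # one highlight pixel

glossy = recombine(diffuse, specular, specular_gain=1.5)  # emphasized highlight
matte = recombine(diffuse, specular, specular_gain=0.0)   # highlight removed
```

The hard part, of course, is obtaining the separation in the first place, which is the subject of the related work surveyed next.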
1.4 Related Work Numerous methods have been proposed for the task of specular highlight removal. It is worth mentioning that these methods observe a common classification scheme with respect to key underlying characteristics. In an important survey, Artusi et al. [51] propose classifying specularity removal methods in terms of techniques used, number of images required, user interaction (e.g., segmentation), light requirement (e.g., illuminant compensation, dichromatic reflection model, flash model), and hardware [51, Table 2]. Of particular interest to our problem setting, we are concerned with methods that are single-image, automatic, and, most importantly, real-time. The requirement for multiple input images precludes a number of applications, particularly applications where the input image is already in digital form. Therefore, since the work of Tan and Ikeuchi [2], most of the proposed methods have used only a single input image. In addition, such applications naturally have real-time operational requirements. Furthermore, to the best of our knowledge, besides being automatic, all real-time methods use a single input image. In what follows, we will survey single-image methods and real-time methods. 1.4.1 Single-Image Methods Tan and Ikeuchi [2] should be greatly acknowledged with regard to their contributions to the automatic separation of reflection components from a single image. Prior to their work, all methods using a single input image required manual color segmentation. They introduced the concept of chromaticity analysis based on a normalized chromaticity extension of the dichromatic reflection model. Their proposed method is fully automatic and uses a single colored image. Their formulation proposed that diffuse pixels propagate their chromaticity to specular regions, detected by logarithmic differentiation with respect to a specular-free image. Kim et al. 
[24] incorporated the concept of the dark channel prior and were the first to approach the specular reflection separation problem from an optimization standpoint. They proposed a maximum a posteriori (MAP) approach that incorporates desirable image priors such as smoothly varying specular reflection and edge-preserving diffuse chromaticity. The MAP optimization framework consists in minimizing TV-ℓ2 and TV-ℓ1 subproblems, making it a method of high computational complexity. Akashi and Okatani [10] introduced a framework that incorporated non-negative matrix factorization (NNMF) with a sparsity constraint that limited the number of colors used to compose the image, taking advantage of the fact that natural images have a limited number and composition of colors. One of the bases of the factorization was the illuminant itself, and a cost function was formulated to penalize the use of the illuminant color. Suo et al. [11] extended the dichromatic reflection model in terms of ℓ2-normalized chromaticities and additionally formulated the highlight removal problem such that the illuminant is
orthogonal to one subspace in their ℓ2 chromaticity definition. Their approach required adaptive clustering for the estimation of region-specific purely diffuse colors. Ren et al. [12] introduced a method based on color-lines that jointly estimated the illuminant color and recovered diffuse colors by first clustering the image using a modified nearest neighbor technique and then recovering the diffuse coefficient by searching along the radius in a polar-formulated coordinate system. Guo et al. [13] introduced a sparse and low-rank formulation related to the sparse non-negative matrix factorization approach initially formulated by Akashi and Okatani [10]. They propose that diffuse weights are few in number and in composition (i.e., that diffuse weights are both sparse and low rank), and that specular weights are likewise sparse. They introduce two auxiliary variables to incorporate these formulations and iteratively solve a constrained nuclear-norm and ℓ1-norm minimization of an augmented Lagrangian function. Son et al. [14] modeled the general properties of diffuse and specular reflections in a convex optimization framework. The authors additionally attack a limitation of specular removal based on the dichromatic reflection model, specifically that of failing to remove specular reflections from achromatic regions, by explicitly including generic image priors applicable to natural images. 1.4.2 Real-Time Methods Yoon et al. [6] were the first to introduce the two-band specular-free image, obtained by subtracting the minimum among the three RGB channels from the input image. They proposed comparing neighbor intensity ratios to corresponding ratios in the two-band specular-free representation and propagating diffuse ratios. They were also the first to be concerned with the timeliness of the underlying method for separation of reflection components. Shen et al. [8] modified the two-band specular-free image by Yoon et al. 
[6] to make its chromaticity more robust to noise by adding an offset. They treated the highlight removal problem by solving the dichromatic reflection model least-squares problem, pairing mixed specular-diffuse regions with purely diffuse regions at the least distance in chromaticity coordinates. Shen and Cai [25] solved the removal problem by first segmenting the image into mixed specular-diffuse and purely diffuse regions, then correcting the values of specular regions by solving for a constant adjustment gain under the criterion of smooth color transition along the boundary between highlight and surrounding regions. Yang et al. [9] introduced a real-time method rooted in the chromaticity analysis work of Tan and Ikeuchi [2]. They proceeded by employing a joint bilateral filter to smooth out the maximum chromaticity of the observed image, using a specular-free guide that has no specular geometry features. Specular phenomena are considered noise in this filtering approach. At the