FPGA Design of Enhanced Scale-Invariant Feature Transform with Finite-Area Parallel Feature Matching for Stereo Vision

Chien-Hung Kuo, Erh-Hsu Huang, Chiang-Heng Chien and Chen-Chien Hsu *

Department of Electrical Engineering, National Taiwan Normal University, Taipei 10610, Taiwan;
chk@ntnu.edu.tw (C.-H.K.); 60675013h@ntnu.edu.tw (E.-H.H.); chiangheng.chien@gmail.com (C.-H.C.)
* Correspondence: jhsu@ntnu.edu.tw; Tel.: +886-2-7749-3551

Abstract: In this paper, we propose an FPGA-based enhanced-SIFT with feature matching for stereo vision. Gaussian blur and difference of Gaussian pyramids are realized in parallel to accelerate the processing time required for multiple convolutions. As for the feature descriptor, a simple triangular identification approach with a look-up table is proposed to efficiently determine the direction and gradient of the feature points. Thus, the dimension of the feature descriptor in this paper is reduced by half compared to conventional approaches. As far as feature detection is concerned, the condition for high-contrast detection is simplified by moderately changing a threshold value, which also benefits the reduction of the resulting hardware in realization. The proposed enhanced-SIFT not only accelerates the operational speed but also reduces the hardware cost. The experimental results show that the proposed enhanced-SIFT reaches a frame rate of 205 fps for 640 × 480 images. Integrated with two enhanced-SIFTs, a finite-area parallel checking scheme is also proposed without the aid of external memory to improve the efficiency of feature matching. The resulting frame rate of the proposed stereo vision matching can be as high as 181 fps, with good matching accuracy, as demonstrated in the experimental results.

Keywords: SIFT; FPGA; feature detection; feature descriptor; feature matching; stereo vision; epipolar geometry

1. Introduction
Recently, vision-based simultaneous localization and mapping (V-SLAM) techniques have become more and more popular due to the need for autonomous navigation of mobile robots [1,2]. Front-end feature point detection and feature matching are especially important because their accuracy significantly influences the performance of back-end visual odometry, mapping, and pose estimation [3,4]. Among the front-end schemes, although speeded-up robust features (SURF) exhibits a faster operational speed, its accuracy is worse than that of the scale-invariant feature transform (SIFT) [5,6]. Nevertheless, the high accuracy of SIFT is achieved at the cost of a time-consuming process. Although CUDA implementations on GPUs for parallel programming can be used for various feature detection and matching tasks [7,8], software-based approaches generally require a larger power consumption, which is not desirable for mobile robot vision applications. Thus, this paper aims to improve the operational speed of SIFT by hardware implementation without sacrificing its accuracy. Over the past years, many researchers have been devoted to improving the operational efficiency of SIFT [9]. Among them, a pipelined FPGA-based architecture was proposed in [10] to process images at double speed at the price of a higher hardware cost. A hardware-implemented SIFT was also proposed in [11] to reduce the number of internal registers. However, the resulting image frame rate is not high enough because of the finite bandwidth of the external memory. As a result, the efficiency of the subsequent mapping process could be limited. In [12], multiple levels of Gaussian-blurred images were simultaneously generated by adequately modifying the Gaussian kernels to shorten the processing time, resulting in a frame rate of 150 fps for 640 × 480 images. Although only a few dividers were used by the modified approach for feature detection, the resulting hardware cost is difficult to reduce because of the square function in the design. Besides, the coordinate rotation digital computer (CORDIC) algorithm adopted for finding the phases and gradients of the feature descriptor often requires considerable hardware resources and a long latency period due to the need for multiple iterations, thus significantly preventing the approach from being applied in real-world applications.
As far as feature matching is concerned, in conventional exhaustive methods, each feature point of an image needs to be compared with all feature points of the other image to find the matching pairs [13]. A large number of feature points would incur extra memory access time. Moreover, the realization of feature matching often consumes a large number of hardware resources. Although the random sample consensus (RANSAC) algorithm [14] can improve the accuracy of feature matching, the resulting frame rate is only 40 fps. Therefore, it remains a great challenge to improve the operational speed of SIFT for use with feature matching.
In this paper, we propose an enhanced-SIFT (E-SIFT) to reduce the hardware resources required and accelerate the operational speed without sacrificing accuracy. Gaussian-blurred images and difference of Gaussian (DoG) pyramids can be simultaneously obtained by a meticulous design of the realization structure. We also propose a simple triangular identification (TI) with a look-up table (LUT) to easily and quickly decide the direction and gradient of a pixel point. The resulting dimension of feature points can thus be effectively reduced by half. This is not only beneficial to the operational speed of the proposed E-SIFT and the following feature matching but also helpful in reducing the hardware cost. In feature detection, the condition of high-contrast detection is also modified by slightly adjusting a threshold value without substantial change, thus considerably reducing the required logic elements and processing time.
For stereo vision, we also propose a finite-area parallel (FAP) feature matching approach integrated with two E-SIFTs. According to the position of a feature point in the right image, a fixed number of feature points in the corresponding area of the left image are chosen and stored first. Then, feature matching proceeds in parallel to efficiently and accurately find the matching pairs without using any external memory. Based on epipolar geometry, a minimum threshold is assigned during feature comparison to avoid matching errors caused by the visual angle difference between the two cameras [15]. Besides, a valid signal is designed in the system to prevent the finite bandwidth of the external SDRAM or discontinuous image data transmission from corrupting the matching accuracy. In the experiments, we use an Altera FPGA hardware platform to evaluate the feasibility of the proposed TI with LUT scheme and to test the flexibility of the proposed E-SIFT and the double E-SIFT with FAP feature matching.
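To make the matching flow above easier to follow, the following software sketch mimics it under stated assumptions: rectified cameras, so that correct matches lie within a narrow row band (the epipolar constraint); a sum-of-absolute-differences descriptor distance; and illustrative values for the candidate count, row tolerance, and acceptance threshold. It is a minimal sketch, not the paper's hardware realization.

import numpy as np

N_CANDIDATES = 8     # fixed number of stored left-image candidates (assumed value)
ROW_TOLERANCE = 2    # half-height of the epipolar search band in pixels (assumed)
MAX_DISTANCE = 500   # acceptance threshold on the descriptor distance (assumed)

def fap_match(right_feature, left_features):
    """Features are (y, x, descriptor) tuples; returns the best left match or None."""
    y_r = right_feature[0]
    # Choose and store a fixed number of candidates from the corresponding area.
    band = [f for f in left_features if abs(f[0] - y_r) <= ROW_TOLERANCE][:N_CANDIDATES]
    best, best_d = None, float("inf")
    for candidate in band:  # these comparisons run concurrently in the FPGA
        d = np.abs(np.asarray(right_feature[2]) - np.asarray(candidate[2])).sum()
        if d < best_d:
            best, best_d = candidate, d
    return best if best_d < MAX_DISTANCE else None

In hardware, fixing the number of stored candidates is what allows all distance computations to proceed in parallel without external memory.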

2. Proposed E-SIFT and FAP Feature Matching for Stereo Vision
Figure 1 shows the proposed FPGA-based E-SIFT architecture, where an initial image is convolved by a meticulous design of Gaussian filters to simultaneously generate a Gaussian-blurred image and the DoG pyramid. Through a definite mask in the Gaussian-blurred image, the direction and gradient of each detection point can be determined by the proposed TI with LUT method. The results in each local area are then summed and collected to form a feature descriptor. Since the summations among different partition areas present large differences, values in the feature descriptors are normalized to acceptable ones by a simple scheme without considerably altering the original character of the distribution. At the same time, on the right side of Figure 1, the corresponding detection point in the DoG images passes through extrema, high-contrast, and corner detections to decide whether a feature point exists. If the result is positive, a confirmed signal and the corresponding feature descriptor are delivered to the feature matching module.

Figure 1. Building blocks of the proposed E-SIFT.

2.1. Image Pyramid

The purpose of building the image pyramid is to find feature points from the differences of consecutive scale-spaces. Multi-level Gaussian-blurred images are created one by one through a series of Gaussian functions. The speckles and interfering noises in the original image can therefore be reduced [9]. Then, the contour of objects in the original image is vaguely outlined by the differences between the consecutive Gaussian-blurred images. Thus, the image feature can be reserved as the basis for subsequent feature detection. In general, long latency is expected in constructing the multi-level Gaussian-blurred pyramids. Thus, as shown in Figure 2a, a more efficient structure has been proposed by simultaneously convolving all the Gaussian functions, where suitable Gaussian kernels can be found by training the system with different images [12]. To further accelerate the construction of the DoG pyramid, a simplified structure is proposed in this paper, where new kernels for convolution are first derived by subtraction of two adjacent ones, as shown in Figure 2b. Then, the nth level DoG image, Dn(x, y), can be obtained by directly convolving the original image with the new kernel:

Dn(x, y) = Ln+1(x, y) − Ln(x, y) = [Gn+1(x, y) − Gn(x, y)] ∗ I(x, y),    (1)

where Ln(x, y) and Gn(x, y) are the nth level Gaussian-blurred image and Gaussian kernel, respectively, and I(x, y) is the initial image. In the proposed structure, the DoG pyramid and the required Gaussian-blurred image can be directly created at the same time. Here, only the 2nd level Gaussian-blurred image is reserved as the basis of the feature descriptor.

Since a 7 × 7 mask is adopted mainly for the convolution in the image pyramid, 49 general-purpose 8-bit registers will be required to store the pixel data. For an image width of 640 pixels, a large number of 8-bit registers needs to be aligned row by row for the serial input of pixel data. To dramatically reduce the number of internal registers, we propose a buffer structure as shown in Figure 3. We utilize Altera optimized RAM-based shift registers to form a six-row buffer array, where each row includes 633 registers. Once the pixel data reaches the last register, i.e., reg. 49, the 7 × 7 convolution can be accomplished in one clock cycle.
A similar buffer structure with different sizes for the following building blocks requiring a mask can also be adopted to efficiently reduce the register number as well as shorten the buffer delay.

Figure 2. (a) Conventional structure and (b) the proposed structure in realizing image pyramids.
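As a concrete check of Equation (1) and the structures in Figure 2, the following minimal NumPy sketch verifies that convolving the input image once with the pre-subtracted kernel (Figure 2b) reproduces the conventional blur-then-subtract result (Figure 2a). The kernel size and sigma values are illustrative assumptions, not the trained kernels of [12].

import numpy as np
from scipy.signal import convolve2d

def gaussian_kernel(size, sigma):
    ax = np.arange(size) - size // 2
    xx, yy = np.meshgrid(ax, ax)
    g = np.exp(-(xx ** 2 + yy ** 2) / (2.0 * sigma ** 2))
    return g / g.sum()

rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(480, 640)).astype(float)

g1 = gaussian_kernel(7, 1.6)             # G_n (sigma assumed for illustration)
g2 = gaussian_kernel(7, 1.6 * 2 ** 0.5)  # G_{n+1}

# Conventional structure (Figure 2a): blur twice, then subtract.
dog_a = convolve2d(image, g2, mode="same") - convolve2d(image, g1, mode="same")
# Proposed structure (Figure 2b): subtract kernels first, convolve once.
dog_b = convolve2d(image, g2 - g1, mode="same")

assert np.allclose(dog_a, dog_b)  # linearity of convolution makes them equal

Because convolution is linear, both structures are mathematically identical, while the proposed one needs only a single convolution per DoG level.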

Figure 3. 8-bit input buffers for 7 × 7 mask.
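The buffer idea in Figure 3 can be modeled behaviorally as a single shift chain spanning six full rows plus seven pixels: six 633-register row buffers plus the 49 window registers give 6 × 633 + 49 = 6 × 640 + 7 taps. The Python model below is a software illustration of that dataflow, not the authors' RTL.

from collections import deque

WIDTH = 640  # image width in pixels
K = 7        # mask size

def sliding_windows(pixel_stream):
    """Yield (x, y, window) once a full 7 x 7 neighborhood is available."""
    length = (K - 1) * WIDTH + K           # six rows plus seven pixels = 3847 taps
    chain = deque([0] * length, maxlen=length)
    for i, pixel in enumerate(pixel_stream):
        chain.append(pixel)                # one pixel shifts in per clock cycle
        y, x = divmod(i, WIDTH)
        if y >= K - 1 and x >= K - 1:      # window fully inside the image
            # Tap the chain at row offsets; window[r][c] is pixel (y-6+r, x-6+c).
            yield x, y, [[chain[r * WIDTH + c] for c in range(K)] for r in range(K)]

After the initial six-row latency, a new 7 × 7 window, and hence one convolution result, becomes available on every clock cycle, which is how the convolution completes in one cycle once reg. 49 is filled.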

2.2. Feature Descriptor

To accelerate the generating speed of feature descriptors, the direction and gradient of the detection point need to be quickly determined. Thus, in this paper, the TI method is proposed to promptly determine the direction of the detection point. Combined with a look-up table, the gradient of the detection point can be easily estimated. In general, eight different directions need to be determined to reduce the complexity of the phase of the detection point [9,16]. In this paper, the eight directions with unsigned gradients for the detection point have been simplified to four with signed ones to reduce the hardware cost. Although the resulting gradient size increases by one bit, the dimension of the feature descriptor can be effectively reduced by half. It is also helpful to the FAP feature matching for implementation.

Figure 4 shows the block diagram of the feature descriptor. First, a 3 × 3 mask is applied to the 2nd level Gaussian-blurred image to find the differences between adjacent pixels in the x and y axes to the center. The direction and gradient of the center point will then be determined respectively by the proposed TI with LUT method. Finally, the feature descriptor of the detection point can be synthesized by finding and collecting gradient sums of the four directions in each 4 × 4 partition area of a 16 × 16 mask.

                                Figure 4. Function blocks of the feature descriptor.
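Collecting a signed gradient sum for each of the four directions in every 4 × 4 cell of the 16 × 16 mask yields 16 × 4 = 64 values, i.e., half of the classical 128-bin SIFT descriptor. The sketch below illustrates this layout; the direction and signed-gradient maps are assumed to be precomputed inputs, and the code is an illustration rather than the authors' RTL.

import numpy as np

def build_descriptor(direction, gradient):
    """direction: 16 x 16 ints in {0,1,2,3}; gradient: 16 x 16 signed gradient values."""
    desc = np.zeros((4, 4, 4))             # 4 x 4 grid of cells, 4 directions per cell
    for cy in range(4):
        for cx in range(4):
            cell_dir = direction[4 * cy:4 * cy + 4, 4 * cx:4 * cx + 4]
            cell_grad = gradient[4 * cy:4 * cy + 4, 4 * cx:4 * cx + 4]
            for d in range(4):
                desc[cy, cx, d] = cell_grad[cell_dir == d].sum()
    return desc.ravel()                    # 64-dimensional feature descriptor

rng = np.random.default_rng(1)
descriptor = build_descriptor(rng.integers(0, 4, (16, 16)), rng.integers(-255, 256, (16, 16)))
assert descriptor.shape == (64,)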

                                2.2.1. Triangular Identification and Look-Up Table
In conventional approaches, the center point, i.e., L2(x, y), of every 3 × 3 mask shifting in the 2nd level Gaussian-blurred image is the detection point. The variations of the detection point on the x and y axes, i.e., ∆p and ∆q, are defined by the differences between two adjacent pixel values, as shown in Figure 5.

∆p = L2(x + 1, y) − L2(x − 1, y)    (2)

∆q = L2(x, y + 1) − L2(x, y − 1)    (3)

Figure 5. Variations of the detection point on x and y axes in a 3 × 3 mask.
Since these two variations are perpendicular to each other, the resulting phase and magnitude, which define the direction and gradient of the detection point, can be easily determined by the following equations [9].

θ(x, y) = arctan(∆q/∆p)    (4)

m(x, y) = √(∆p² + ∆q²)    (5)

                                   Conventional approaches generally utilize the CORDIC algorithm to derive the re-
                             sults of Equations (4) and (5) [17]. However, time-consuming and tedious procedures
                                       Since due
                             would ensue      thesetotwo     variations
                                                        a large   numberare  ofperpendicular
                                                                                 iterations. In this to each    other,
                                                                                                           paper,   we the    resulting
                                                                                                                        propose            phaseTIand
                                                                                                                                      a simple
                                 magnitude      which    define    the  direction     and   gradient     of the   detection
                             method to determine the phase of detection points. Since only eight directions exist in            point   can  bethe
                                                                                                                                                 easily
                                 determined     by   the  following    equations      [9].
                             original definition, the direction of the detection point can be easily determined by the
                             value of Δp and Δq, which are two legs of a right-angled              
                                                                                                      ∆q triangle. For example, if Δp is
                                                                                                          
                                                                            θ ( x, y ) =
                             larger than Δq, and both are greater than zero, then the hypotenuse will be located in (4)
                                                                                           arctan
                                                                                                      ∆p
                             direction 0, as shown in Figure 6a. Thus, the direction of the detection point is classified
                             as phase 0 directly. Conversely, the direction of the detection                  point is classified as phase 1
                                                                                            q
                                                                            m( x, y) = ∆p2 + ∆q2                                                     (5)
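For reference, Equations (2)–(5) can be modeled in software as follows. This is a minimal C sketch, assuming an 8-bit grayscale image stored in row-major order; the accessor L2() is a hypothetical helper, and atan2() is used here to recover the full angular range of Equation (4).

```c
#include <math.h>
#include <stdint.h>

/* Hypothetical accessor for the 2nd-level Gaussian-blurred image L2;
 * 'img' is assumed row-major with width 'w', bounds checks omitted. */
static inline int L2(const uint8_t *img, int w, int x, int y) {
    return img[y * w + x];
}

/* Equations (2)-(5): variations, phase, and magnitude at (x, y). */
void gradient_at(const uint8_t *img, int w, int x, int y,
                 double *theta, double *m) {
    int dp = L2(img, w, x + 1, y) - L2(img, w, x - 1, y); /* Eq. (2) */
    int dq = L2(img, w, x, y + 1) - L2(img, w, x, y - 1); /* Eq. (3) */
    *theta = atan2((double)dq, (double)dp);               /* Eq. (4) */
    *m = sqrt((double)(dp * dp + dq * dq));               /* Eq. (5) */
}
```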
Conventional approaches generally utilize the CORDIC algorithm to derive the results of Equations (4) and (5) [17]. However, time-consuming and tedious procedures would ensue due to the large number of iterations. In this paper, we propose a simple TI method to determine the phase of detection points. Since only eight directions exist in the original definition, the direction of the detection point can be easily determined from the values of ∆p and ∆q, which are the two legs of a right-angled triangle. For example, if ∆p is larger than ∆q, and both are greater than zero, then the hypotenuse will be located in direction 0, as shown in Figure 6a. Thus, the direction of the detection point is classified as phase 0 directly. Conversely, the direction of the detection point is classified as phase 1 if ∆p is smaller than ∆q. In a similar way, we can easily recognize the other directions by using Table 1. The required conditions are also devised to distinguish the different directions. Absolute values are compared in practice because the lengths of the two legs cannot be negative.

For a right-angled triangle, if there is a big difference between the lengths of the two legs, the hypotenuse length can be approximated as the longer leg. If the two legs have equal lengths, i.e., ∆p = ∆q, the hypotenuse length will be √2 times the leg length, as shown in Figure 6b. Thus, the gradient of the detection point cannot be directly estimated only by a comparison of the two legs. Here, a look-up table is proposed for the estimation of the gradient of the detection point, as listed in Table 2. First, the ratio, h, of the variations on the two axes is evaluated to choose the correction factor, K, as shown in the table. The gradient of the detection point, m, is then approximated by the length of the longer leg multiplied by the chosen correction factor.

Figure 6. (a) The proposed TI algorithm for direction recognition; (b) the ratio of the hypotenuse to the leg has its maximum value when ∆p = ∆q.

Table 1. Angle range condition and direction distribution.

Direction    Conditions                              Pixel Phase θ
0            ∆p > 0, ∆q ≥ 0 & |∆p| > |∆q|            0° ≤ ∠θ < 45°
1            ∆p > 0, ∆q > 0 & |∆p| ≤ |∆q|            45° ≤ ∠θ < 90°
2            ∆p ≤ 0, ∆q > 0 & |∆p| < |∆q|            90° ≤ ∠θ < 135°
3            ∆p < 0, ∆q > 0 & |∆p| ≥ |∆q|            135° ≤ ∠θ < 180°
4            ∆p < 0, ∆q ≤ 0 & |∆p| > |∆q|            180° ≤ ∠θ < 225°
5            ∆p < 0, ∆q < 0 & |∆p| ≤ |∆q|            225° ≤ ∠θ < 270°
6            ∆p ≥ 0, ∆q < 0 & |∆p| < |∆q|            270° ≤ ∠θ < 315°
7            ∆p > 0, ∆q < 0 & |∆p| ≥ |∆q|            315° ≤ ∠θ < 360°
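Table 1 reduces the phase decision to sign tests and a single magnitude comparison, which is what makes the TI method cheap in hardware. The following C sketch mirrors the table row by row; the function name and the fallback handling of the ∆p = ∆q = 0 corner case are our own assumptions.

```c
#include <stdlib.h>

/* Phase classification per Table 1: returns the direction index 0-7.
 * Only sign tests and one magnitude comparison per row are needed,
 * so no CORDIC iterations are required. */
int ti_phase(int dp, int dq) {
    int ap = abs(dp), aq = abs(dq);
    if (dp > 0 && dq >= 0 && ap > aq)  return 0; /*   0° <= θ <  45° */
    if (dp > 0 && dq > 0  && ap <= aq) return 1; /*  45° <= θ <  90° */
    if (dp <= 0 && dq > 0 && ap < aq)  return 2; /*  90° <= θ < 135° */
    if (dp < 0 && dq > 0  && ap >= aq) return 3; /* 135° <= θ < 180° */
    if (dp < 0 && dq <= 0 && ap > aq)  return 4; /* 180° <= θ < 225° */
    if (dp < 0 && dq < 0  && ap <= aq) return 5; /* 225° <= θ < 270° */
    if (dp >= 0 && dq < 0 && ap < aq)  return 6; /* 270° <= θ < 315° */
    return 7;            /* 315° <= θ < 360°, and the dp = dq = 0 case */
}
```

In hardware, each row maps onto comparators driven by the sign bits and magnitudes of ∆p and ∆q, which is why no iterative CORDIC datapath is needed.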

Table 2. Look-up table for gradient correction.

h = |∆p/∆q|          Correction Factor (K)     Pixel Gradient (m)
h ≤ 0.25             1.00                      K × max(|∆p|, |∆q|)
0.25 < h ≤ 0.52      1.08                      K × max(|∆p|, |∆q|)
0.52 < h ≤ 0.65      1.17                      K × max(|∆p|, |∆q|)
0.65 < h ≤ 0.75      1.22                      K × max(|∆p|, |∆q|)

Note that the feature descriptor requires eight dimensions to express the different gradients of the eight directions. To reduce hardware resources, a representation having fewer dimensions is discussed in this paper. From the observation of Figure 6, phase 4 has an opposite direction to phase 0. Similarly, phase 5 has an opposite direction to phase 1, and so on. Thus, the eight-direction representation can be simplified to a four-direction one to reduce the resulting dimensions. For example, if the detection point has an unsigned gradient in phase 4, it will be changed to have a negative one in phase 0. The proposed TI with LUT approach for detecting the direction and gradient of the detection point is processed in parallel to accelerate the operational speed; the presented structure is shown in Figure 7.
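A software model of the Table 2 look-up might read as follows, assuming that h is formed as the shorter-to-longer leg ratio (so h ≤ 1) and that m = K × max(|∆p|, |∆q|), as described above. The correction factor used beyond h = 0.75 is an illustrative placeholder, since those rows of the table are not reproduced here.

```c
/* Gradient magnitude per Table 2: m is approximated by the longer leg
 * scaled by a correction factor K selected from the ratio h of the two
 * legs. K would be a small fixed-point constant in hardware; floating
 * point is used here only for readability. */
double lut_magnitude(int dp, int dq) {
    int ap = dp < 0 ? -dp : dp;
    int aq = dq < 0 ? -dq : dq;
    int longer  = ap > aq ? ap : aq;
    int shorter = ap > aq ? aq : ap;
    double h = longer ? (double)shorter / (double)longer : 0.0;
    double k;
    if (h <= 0.25)      k = 1.00;
    else if (h <= 0.52) k = 1.08;
    else if (h <= 0.65) k = 1.17;
    else if (h <= 0.75) k = 1.22;
    else                k = 1.30; /* placeholder: rows for h > 0.75 are
                                     not shown in this excerpt */
    return k * longer;
}
```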

As a result, the four-direction representation will possess 64 dimensions, which is half of that of conventional methods, as shown in Figure 8b. It not only reduces the required devices in implementation but also accelerates the production speed of feature descriptors. These benefits are also helpful to the subsequent feature matching.

Figure 8. (a) Sixteen 4 × 4 partitions in a 16 × 16 mask for the feature descriptor and (b) the corresponding gradient sum in each 4 × 4 partition.
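To make the four-direction folding concrete, the sketch below accumulates the 64-dimensional descriptor over a 16 × 16 neighborhood, assuming per-pixel phases and magnitudes such as those produced by the ti_phase() and lut_magnitude() sketches above; the partition indexing is our own illustrative choice.

```c
/* Descriptor accumulation: sixteen 4x4 partitions with four signed
 * direction bins each (Figure 8). Phases 4-7 are folded onto bins 0-3
 * with negated gradients, halving the dimensions from 128 to 64. */
void build_descriptor(const int phase[16][16], const double mag[16][16],
                      double of[64]) {
    for (int i = 0; i < 64; ++i) of[i] = 0.0;
    for (int y = 0; y < 16; ++y) {
        for (int x = 0; x < 16; ++x) {
            int part = (y / 4) * 4 + (x / 4);  /* which 4x4 partition  */
            int p = phase[y][x];
            int bin = p & 3;                   /* fold 4..7 onto 0..3  */
            double g = (p >= 4) ? -mag[y][x] : mag[y][x];
            of[part * 4 + bin] += g;           /* signed gradient sum  */
        }
    }
}
```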

2.2.3. Normalization

For general images, large differences usually exist among the gradient sums, resulting in different dimensions of the feature descriptor. Consequently, the feature descriptor, OF = (o1, o2, …, o64), needs to be normalized to decrease the scale without significantly altering the distribution among the gradient sums. The normalization of the feature descriptor is achieved in this paper by:

F = (f1, f2, …, f64) = R × OF/W    (6)

where W is the sum of all gradient sums in all 64 dimensions of the feature descriptor.

W = ∑ oi, i = 1, …, 64    (7)

A scaling factor, R = 255, is chosen here for better discrimination. That is, the resulting gradient sums of the feature descriptor lie between ±255. Thus, the required bit number of the gradient sum is reduced from 13 bits to 9 bits in the normalization of the feature descriptor. In the implementation of the presented normalization, we introduce an additional multiplier having a power of 2 approximating the sum of all gradients to achieve a simple operation. Thus, the normalized feature descriptor F can be easily obtained by only shifting the bits of the feature descriptor. The realized block diagram of the normalization is shown in Figure 9.

F = (R × 2^j/W) × OF × (1/2^j)    (8)

Figure 9. Function blocks of the normalization of the feature descriptor.
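A software model of the shift-based normalization of Equations (6)–(8) might look as follows. The choice 2^j ≤ W < 2^(j+1) for the power-of-2 approximation and the guard against non-positive sums are our own assumptions about details not spelled out above.

```c
#include <stdint.h>

/* Normalization per Equations (6)-(8): W is approximated by a power of
 * two 2^j, so R * OF / W reduces to a constant multiply by R followed
 * by a right shift of j bits (no divider needed). */
void normalize_descriptor(const int32_t of[64], int32_t f[64]) {
    const int32_t R = 255;                    /* scaling factor, Eq. (6) */
    int64_t w = 0;
    for (int i = 0; i < 64; ++i) w += of[i];  /* Eq. (7) */
    if (w < 1) w = 1;                         /* assumed: W > 0 */

    int j = 0;                                /* 2^j <= W < 2^(j+1) */
    while ((w >> (j + 1)) > 0) ++j;

    for (int i = 0; i < 64; ++i)              /* Eq. (8): F ~ (R*OF) >> j */
        f[i] = (int32_t)(((int64_t)R * of[i]) >> j);
}
```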

2.3. Feature Detection
In the beginning, we use three levels of DoG images to establish a 3D space. A 3 × 3 mask will then be applied to the three DoG images to define 27-pixel data for feature detection, as shown in Figure 10. Thus, the center pixel of the 3 × 3 mask in the 2nd level DoG image, D2, is regarded as the detection point. Three kinds of detections, including extrema detection, high contrast detection, and corner detection, are adopted and processed in parallel, as shown in Figure 11.
                                            Once the results of the three detections are all positive, the detection point becomes
                                      a feature point, and a detected signal will be sent to the feature descriptor and feature
                                      matching blocks for further processing.
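The gating can be modeled as a simple AND of the three detection results. The predicate names below are hypothetical stand-ins: is_extremum() is sketched in Section 2.3.1 below, while the remaining two checks follow later in the paper.

```c
#include <stdbool.h>

/* Hypothetical predicates for the three parallel detections; their
 * bodies are sketched in the following subsections or assumed. */
bool is_extremum(const double *dog[3], int w, int x, int y);
bool is_high_contrast(const double *dog[3], int w, int x, int y);
bool is_corner(const double *dog[3], int w, int x, int y);

/* A detection point becomes a feature point, raising the 'detected'
 * signal, only when all three detections are positive. */
static inline bool is_feature_point(const double *dog[3], int w, int x, int y) {
    return is_extremum(dog, w, x, y)
        && is_high_contrast(dog, w, x, y)
        && is_corner(dog, w, x, y);
}
```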
To evaluate the variation of the pixel values around the detection point, the 3D and 2D Hessian matrices, H3×3 and H2×2, respectively, will be used in the high contrast and corner detections.

H2×2 = [ Dxx  Dxy
         Dxy  Dyy ]    (9)

H3×3 = [ Dxx  Dxy  Dxz
         Dxy  Dyy  Dyz
         Dxz  Dyz  Dzz ],    (10)
Figure 10. The detection range includes 27 pixels in three DoG images under a 3 × 3 mask, where the orange pixel is regarded as the detection point.
where the definitions of the elements are listed as follows.

Dxx ≡ D2(x + 1, y) + D2(x − 1, y) − 2D2(x, y)    (11)

Dyy ≡ D2(x, y + 1) + D2(x, y − 1) − 2D2(x, y)    (12)

Dzz ≡ D3(x, y) + D1(x, y) − 2D2(x, y)    (13)

Dxy ≡ [D2(x + 1, y + 1) − D2(x + 1, y − 1)]/4 − [D2(x − 1, y + 1) − D2(x − 1, y − 1)]/4    (14)

Dxz ≡ [D3(x + 1, y) − D1(x + 1, y)]/4 − [D3(x − 1, y) − D1(x − 1, y)]/4    (15)

Dyz ≡ [D3(x, y + 1) − D1(x, y + 1)]/4 − [D3(x, y − 1) − D1(x, y − 1)]/4    (16)
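Equations (11)–(16) can be modeled in software as follows; the DoG accessor D() and the struct layout are hypothetical conveniences, with D1–D3 assumed stored as three row-major floating-point images.

```c
/* Hessian elements per Equations (11)-(16). 'dog' holds the three DoG
 * levels D1..D3 as row-major images of width 'w'; bounds checks omitted. */
typedef struct { double xx, yy, zz, xy, xz, yz; } Hessian;

static double D(const double *dog[3], int w, int lvl, int x, int y) {
    return dog[lvl - 1][y * w + x];
}

Hessian hessian_at(const double *dog[3], int w, int x, int y) {
    Hessian h;
    h.xx = D(dog,w,2,x+1,y) + D(dog,w,2,x-1,y) - 2*D(dog,w,2,x,y); /* (11) */
    h.yy = D(dog,w,2,x,y+1) + D(dog,w,2,x,y-1) - 2*D(dog,w,2,x,y); /* (12) */
    h.zz = D(dog,w,3,x,y)   + D(dog,w,1,x,y)   - 2*D(dog,w,2,x,y); /* (13) */
    h.xy = (D(dog,w,2,x+1,y+1) - D(dog,w,2,x+1,y-1)) / 4.0
         - (D(dog,w,2,x-1,y+1) - D(dog,w,2,x-1,y-1)) / 4.0;        /* (14) */
    h.xz = (D(dog,w,3,x+1,y) - D(dog,w,1,x+1,y)) / 4.0
         - (D(dog,w,3,x-1,y) - D(dog,w,1,x-1,y)) / 4.0;            /* (15) */
    h.yz = (D(dog,w,3,x,y+1) - D(dog,w,1,x,y+1)) / 4.0
         - (D(dog,w,3,x,y-1) - D(dog,w,1,x,y-1)) / 4.0;            /* (16) */
    return h;
}
```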

Figure 11. Building blocks of the feature detection.

Since both the matrices are symmetric, only six elements need to be determined. Figure 12 shows the realized structure, where the pixel inputs come from the corresponding positions shown in Figure 10.
                                 2.3.1. Extreme Detection
                                      In extreme detection, the detection point is compared with the other 26 pixels in the
                                 3D mask range. If the pixel value of the detection point is the maximum or the minimum
                                 one, the detection point will be a possible feature point [18].
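A direct software model of the extreme detection follows, reusing the hypothetical D() accessor from the Hessian sketch above; the nested loops stand in for what the hardware evaluates with parallel comparators.

```c
#include <stdbool.h>

/* Extreme detection: compare D2(x, y) against the other 26 pixels of the
 * 3x3x3 neighborhood spanning the three DoG levels (Figure 10). */
bool is_extremum(const double *dog[3], int w, int x, int y) {
    double c = D(dog, w, 2, x, y);
    bool is_max = true, is_min = true;
    for (int lvl = 1; lvl <= 3; ++lvl)
        for (int dy = -1; dy <= 1; ++dy)
            for (int dx = -1; dx <= 1; ++dx) {
                if (lvl == 2 && dx == 0 && dy == 0) continue; /* center */
                double v = D(dog, w, lvl, x + dx, y + dy);
                if (v >= c) is_max = false;
                if (v <= c) is_min = false;
            }
    return is_max || is_min;
}
```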
Figure 12. Hardware structure of the elements in 2D and 3D Hessian matrices.
                                    2.3.2. High Contrast Detection
                                    2.3.1.InExtreme  Detection
                                             conventional approaches, when the variation of the pixel values in three-dimension
                                    spaceInaround
                                             extremethedetection,  the detection
                                                        detection point   is greaterpoint
                                                                                     than is comparedofwith
                                                                                          a threshold      0.03, the
                                                                                                                  the other 26 pixels in the
                                                                                                                      high contrast
                                    feature
                                    3D  maskcanrange.
                                                 be assured
                                                       If the[11].
                                                              pixelHere,
                                                                    valueto of
                                                                             simplify the derivations
                                                                               the detection  point isand
                                                                                                        thesubsequent
                                                                                                             maximumhardware
                                                                                                                          or the minimum
                                    design,
                                    one, thean  approximated
                                              detection   point value of a1/32
                                                                will be         is chosen
                                                                            possible      to be
                                                                                      feature   the threshold
                                                                                              point  [18].      value. That is, the
                                    resulting pixel variation criterion is modified as:
2.3.2. High Contrast Detection

In conventional approaches, when the variation of the pixel values in the three-dimensional space around the detection point is greater than a threshold of 0.03, a high-contrast feature can be assured [11]. Here, to simplify the derivations and the subsequent hardware design, an approximated value of 1/32 is chosen as the threshold. That is, the resulting pixel variation criterion is modified as:

$$ |D(S)| > \frac{1}{32}, \tag{17} $$

where $S = [x\ y\ z]^T$ denotes the 3D coordinate vector. The pixel variation can be expressed as a Maclaurin series and approximated by:

$$ D(S) \approx D(0) + \frac{\partial D^T(0)}{\partial S} \cdot S + \frac{1}{2} S^T \cdot \frac{\partial^2 D(0)}{\partial S^2} \cdot S \tag{18} $$

where $D(0)$ represents the pixel value of the detection point. The transpose of the partial derivative of $D(0)$ with respect to $S$ is:

$$ \left( \frac{\partial D(0)}{\partial S} \right)^T = \frac{\partial D^T(0)}{\partial S} = \begin{bmatrix} D_x & D_y & D_z \end{bmatrix}, \tag{19} $$

where

$$ D_x \equiv \frac{D_2(x+1,y) - D_2(x-1,y)}{(x+1)-(x-1)} = \frac{D_{2d} - D_{2f}}{2}, \tag{20} $$

$$ D_y \equiv \frac{D_2(x,y+1) - D_2(x,y-1)}{(y+1)-(y-1)} = \frac{D_{2b} - D_{2h}}{2}, \tag{21} $$

and

$$ D_z \equiv \frac{D_3(x,y) - D_1(x,y)}{3-1} = \frac{D_{3e} - D_{1e}}{2}. \tag{22} $$

Figure 13 shows the realized hardware of the partial derivative. The second partial derivative of $D(0)$ is the same as the 3D Hessian matrix:

$$ \frac{\partial^2 D(0)}{\partial S^2} = H_{3\times 3} \tag{23} $$

Figure 13. Hardware architecture of the partial derivative of D(0).
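For readers who want to cross-check the arithmetic, the short Python sketch below mirrors Equations (19)–(23) in software: it forms the gradient vector and the 3×3 Hessian from three adjacent DoG layers by central differences. This is an illustration only, not the paper's RTL; the function name, the array layout, and the stencils for the mixed second derivatives (standard central differences, not spelled out in the text) are our assumptions, and the divisions by 2 and 4 shown here become wire shifts in fixed-point hardware.

```python
import numpy as np

def gradient_and_hessian(D1, D2, D3, x, y):
    """Central-difference gradient and 3x3 Hessian at (x, y) of the middle
    DoG layer D2, with D1/D3 the adjacent scales (cf. Equations (19)-(23)).
    Illustrative sketch only; the paper realizes this as a parallel datapath."""
    # First derivatives, Equations (20)-(22): differences of opposite neighbors
    Dx = (D2[y, x + 1] - D2[y, x - 1]) / 2
    Dy = (D2[y + 1, x] - D2[y - 1, x]) / 2
    Dz = (D3[y, x] - D1[y, x]) / 2
    grad = np.array([Dx, Dy, Dz])

    # Second derivatives for the Hessian H3x3 of Equation (23);
    # the mixed-term stencils are the usual central differences (assumed here)
    Dxx = D2[y, x + 1] - 2 * D2[y, x] + D2[y, x - 1]
    Dyy = D2[y + 1, x] - 2 * D2[y, x] + D2[y - 1, x]
    Dzz = D3[y, x] - 2 * D2[y, x] + D1[y, x]
    Dxy = (D2[y + 1, x + 1] - D2[y + 1, x - 1]
           - D2[y - 1, x + 1] + D2[y - 1, x - 1]) / 4
    Dxz = (D3[y, x + 1] - D3[y, x - 1] - D1[y, x + 1] + D1[y, x - 1]) / 4
    Dyz = (D3[y + 1, x] - D3[y - 1, x] - D1[y + 1, x] + D1[y - 1, x]) / 4
    H = np.array([[Dxx, Dxy, Dxz],
                  [Dxy, Dyy, Dyz],
                  [Dxz, Dyz, Dzz]])
    return grad, H
```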
Let the first derivative of Equation (18) with respect to $S$ equal zero:

$$ D'(S) = 0 \tag{24} $$

The largest variation of the pixel values around the detection point can then be found at:

$$ S = -(H_{3\times 3})^{-1} \cdot \frac{\partial D(0)}{\partial S} \tag{25} $$

Substituting Equation (25) into Equation (18), we obtain:

$$ \begin{aligned} D(S) &= D(0) - \frac{\partial D^T(0)}{\partial S} \cdot (H_{3\times 3})^{-1} \cdot \frac{\partial D(0)}{\partial S} + \frac{1}{2} \left( (H_{3\times 3})^{-1} \cdot \frac{\partial D(0)}{\partial S} \right)^T \cdot H_{3\times 3} \cdot (H_{3\times 3})^{-1} \cdot \frac{\partial D(0)}{\partial S} \\ &= D(0) - \frac{1}{2} \frac{\partial D^T(0)}{\partial S} \cdot (H_{3\times 3})^{-1} \cdot \frac{\partial D(0)}{\partial S}. \end{aligned} \tag{26} $$

Substituting Equation (26) into Equation (17), we have:

$$ \left| D(0) - \frac{1}{2} \frac{\partial D^T(0)}{\partial S} \cdot (H_{3\times 3})^{-1} \cdot \frac{\partial D(0)}{\partial S} \right| > \frac{1}{32} \;\Rightarrow\; 16 \times \left| 2D(0) - \frac{\partial D^T(0)}{\partial S} \cdot (H_{3\times 3})^{-1} \cdot \frac{\partial D(0)}{\partial S} \right| > 1 \tag{27} $$

Since the matrix inversion, $(H_{3\times 3})^{-1}$, requires a large number of dividers in implementation, the resulting hardware cost would be high. Thus, the adjugate matrix, $\mathrm{Adj}(H_{3\times 3})$, and the determinant, $\det(H_{3\times 3})$, are utilized here to accomplish the matrix inversion without dividers:

$$ (H_{3\times 3})^{-1} = \frac{\mathrm{Adj}(H_{3\times 3})}{\det(H_{3\times 3})}, \tag{28} $$

where

$$ \mathrm{Adj}(H_{3\times 3}) = \begin{bmatrix} +\begin{vmatrix} D_{yy} & D_{yz} \\ D_{yz} & D_{zz} \end{vmatrix} & -\begin{vmatrix} D_{xy} & D_{xz} \\ D_{yz} & D_{zz} \end{vmatrix} & +\begin{vmatrix} D_{xy} & D_{xz} \\ D_{yy} & D_{yz} \end{vmatrix} \\ -\begin{vmatrix} D_{xy} & D_{yz} \\ D_{xz} & D_{zz} \end{vmatrix} & +\begin{vmatrix} D_{xx} & D_{xz} \\ D_{xz} & D_{zz} \end{vmatrix} & -\begin{vmatrix} D_{xx} & D_{xz} \\ D_{xy} & D_{yz} \end{vmatrix} \\ +\begin{vmatrix} D_{xy} & D_{yy} \\ D_{xz} & D_{yz} \end{vmatrix} & -\begin{vmatrix} D_{xx} & D_{xy} \\ D_{xz} & D_{yz} \end{vmatrix} & +\begin{vmatrix} D_{xx} & D_{xy} \\ D_{xy} & D_{yy} \end{vmatrix} \end{bmatrix} \tag{29} $$

and

$$ \det(H_{3\times 3}) = \begin{vmatrix} D_{xx} & D_{xy} & D_{xz} \\ D_{xy} & D_{yy} & D_{yz} \\ D_{xz} & D_{yz} & D_{zz} \end{vmatrix} \tag{30} $$

Figure 14 shows the realized architecture of the determinant and adjugate matrices. Since the adjugate matrix is symmetric, only six elements need to be processed in the implementation. The determinant of $H_{3\times 3}$ can be obtained by repeatedly using the first row of the adjugate matrix combined with the elements in the first column of the determinant, which reduces the hardware cost. Thus, Equation (27) can be rewritten as:

$$ 16 \times \left| 2D(0) - \frac{\partial D^T(0)}{\partial S} \cdot \frac{\mathrm{Adj}(H_{3\times 3})}{\det(H_{3\times 3})} \cdot \frac{\partial D(0)}{\partial S} \right| > 1 \;\Rightarrow\; 16 \times \left| 2D(0) \cdot \det(H_{3\times 3}) - Mul \right| > \left| \det(H_{3\times 3}) \right|, \tag{31} $$

where

$$ Mul = \frac{\partial D^T(0)}{\partial S} \cdot \mathrm{Adj}(H_{3\times 3}) \cdot \frac{\partial D(0)}{\partial S} \tag{32} $$

That is, the detection point has a high-contrast feature and can be regarded as a possible feature point when Equation (31) is satisfied. Since this condition for high-contrast feature detection is less complicated than the conventional ones, the resulting hardware circuits are simpler and achieve a higher operating speed [12].

Figure 14. Hardware architecture of the determinant and adjugate matrices.
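The division-free test of Equations (28)–(32) can be sketched in software as follows. This is a minimal numerical illustration, not the FPGA datapath: the function name is hypothetical, numpy stands in for the multiplier/adder tree of Figure 14, and only the final comparison reproduces the paper's criterion of Equation (31).

```python
import numpy as np

def is_high_contrast(D0, grad, H):
    """Division-free evaluation of Equation (31).

    D0   : DoG value D(0) at the detection point
    grad : gradient vector of Equation (19)
    H    : 3x3 Hessian of Equation (23)
    Returns True when the point passes the high-contrast test.
    """
    # Cofactor matrix per Equation (29); H is symmetric, so six distinct
    # 2x2 determinants suffice (multipliers and adders only, no divider).
    cof = np.array([[(-1) ** (r + c)
                     * np.linalg.det(np.delete(np.delete(H, r, 0), c, 1))
                     for c in range(3)] for r in range(3)])
    adj_H = cof.T
    # det(H) from the first row of the adjugate and the first column of H,
    # echoing the hardware sharing described for Equation (30).
    det_H = adj_H[0] @ H[:, 0]
    mul = grad @ adj_H @ grad          # Mul of Equation (32)
    # Equation (31): the comparison needs no division by det(H)
    return 16 * abs(2 * D0 * det_H - mul) > abs(det_H)
```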

2.3.3. Corner Detection

In this paper, the Harris corner detection is adopted [19]:

$$ r \times \mathrm{tr}^2(H_{2\times 2}) > (r+1)^2 \cdot \det(H_{2\times 2}) \tag{33} $$

where the parameter $r$ is the ratio of the two eigenvalues of $H_{2\times 2}$ and $r > 1$, and $\mathrm{tr}(H_{2\times 2})$ and $\det(H_{2\times 2})$ respectively represent the trace and the determinant of $H_{2\times 2}$, which are given by:

$$ \mathrm{tr}(H_{2\times 2}) = D_{xx} + D_{yy} \tag{34} $$

and

$$ \det(H_{2\times 2}) = \begin{vmatrix} D_{xx} & D_{xy} \\ D_{xy} & D_{yy} \end{vmatrix} = D_{xx}D_{yy} - D_{xy}^2 \tag{35} $$

That is, the detection point has a corner feature if the inequality in Equation (33) is satisfied. An empirical value, r = 10, is chosen here for acquiring a better result. Figure 15 shows the hardware architecture of the corner detection.
Once the results of the three detections are all positive, the detection point is logically regarded as a feature point. During the serial input of pixels, the production of the feature descriptor and the feature point detection proceed until no image pixel remains.

Figure 15. Hardware structure of the corner detection.
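Like the contrast test, the corner test reduces to multiplications and a single comparison, which is what makes it cheap in hardware. A minimal sketch of Equations (33)–(35) follows (illustrative Python; the function name and the use of floating point are assumptions, the actual circuit being fixed-point):

```python
def is_corner(Dxx, Dyy, Dxy, r=10):
    """Division-free Harris-style corner test of Equations (33)-(35).
    Uses only multipliers and one comparator, matching Figure 15's intent;
    r = 10 is the empirical eigenvalue-ratio parameter from the paper."""
    trace = Dxx + Dyy                    # Equation (34)
    det = Dxx * Dyy - Dxy * Dxy          # Equation (35)
    return r * trace * trace > (r + 1) ** 2 * det   # Equation (33)
```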

3. Finite-Area Parallel Matching for Stereo Vision

A double E-SIFT with FAP feature matching system is proposed for use in stereo vision, as shown in Figure 16. The stereo vision system captures images with two parallel cameras; the left and right E-SIFTs respectively find the feature points and the corresponding feature descriptors, FL and FR. At the same moment, the corresponding coordinates are output by a counter triggered by the detected signals, dtcL and dtcR, from the two E-SIFTs. The stereo matching points, MXL, MYL, MXR, and MYR, are then found and recorded by the proposed FAP feature matching. Since the image transmission could be interrupted by the finite bandwidth of the SDRAM or by noise coupling, a data valid signal is added to alert all building blocks, preventing abnormal termination or any discontinuity from corrupting the correct output messages.

Figure 16. Building blocks of the proposed E-SIFT and FAP feature matching for stereo vision.

3.1. Finite-Area Parallel Feature Matching
Suppose two cameras, CL and CR, are set up in parallel at the same height without any tilt angle difference, and three objects, P1, P2, and P3, are placed in line with the right camera, as shown in the top view in Figure 17a [20]. In theory, the projections of the same object onto the two image planes will be on the same row. However, based on Epipolar geometry [15], the imaging of P2 and P3 in the right image could disappear due to the shading effect caused by P1. As shown in Figure 17b, three imagings, P1L, P2L, and P3L, appear in the left image, but only one projection, P1R, appears in the right one. Therefore, to prevent the feature points of the left and right images from being erroneously matched, a minimum threshold value for the comparison is chosen in the proposed FAP feature matching.

Figure 17. (a) Top view and (b) front view of stereo vision with Epipolar phenomenon.

In reality, the object projections on the two cameras cannot be on the same row due to a slight difference in the tilt angle between the two cameras. The resulting feature point matching would be inaccurate. However, too many feature points would consume a large number of resources. On the other hand, matching accuracy will decrease if there are not sufficient feature points.
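As an illustration of the matching rule described above, the sketch below confines the descriptor comparison of each left-image feature to a finite row band around its epipolar line and accepts the best candidate only when its distance is below a minimum threshold. It is an interpretation of the FAP idea for exposition, not the implemented circuit; the band half-height, the Euclidean distance metric, and the threshold value are assumed parameters.

```python
import numpy as np

def fap_match(left_feats, right_feats, row_band=2, min_thresh=0.3):
    """Finite-area matching sketch: left_feats/right_feats are lists of
    (x, y, descriptor) tuples from the two E-SIFTs. A right-image feature
    is considered only when it lies within +/- row_band rows of the left
    feature (the finite area), tolerating a small tilt-angle difference.
    A match is kept only if the best distance beats min_thresh, rejecting
    features hidden in one view by the shading effect."""
    matches = []
    for (xl, yl, dl) in left_feats:
        best, best_dist = None, np.inf
        for (xr, yr, dr) in right_feats:
            if abs(yr - yl) > row_band:
                continue                      # outside the finite area
            dist = np.linalg.norm(dl - dr)    # descriptor distance (assumed L2)
            if dist < best_dist:
                best, best_dist = (xr, yr), dist
        if best is not None and best_dist < min_thresh:
            matches.append(((xl, yl), best))
    return matches
```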

3.2. Hardware Design
3.2.1. Parallel Arrangement for Feature Data
In stereo vision, the scenery of the left camera is shifted slightly toward the left compared to that of the right camera. Some objects would not be captured in the right camera and vice versa, as shown in the pink and blue areas of Figure 18. Thus, as pixels, which enter the system beginning from the top left of the image, are streamed in, mismatches could occur due to the visual angle difference between the two cameras. Therefore, a small delay is added in the right-image input path to ensure that the feature points found in the left image can be matched with those found in the right image, as shown in the green area of the figure.
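To make the timing idea concrete, the behavioral sketch below models the right-image path as a short FIFO, so that left-image pixels (and hence left-image features) are already in the pipeline when the corresponding right-image data arrive. This is only an illustrative Python model, not the RTL; the delay length DELAY_PIXELS is a placeholder, since the exact number of cycles is design-specific.

```python
from collections import deque

# Hypothetical delay length in pixels; the exact cycle count is
# design-specific, so this value is only a placeholder.
DELAY_PIXELS = 16

def delayed_stereo_stream(left_pixels, right_pixels, delay=DELAY_PIXELS):
    """Yield (left, right) pixel pairs with the right-image stream delayed.

    Models the small delay inserted in the right-image input path so that
    left-image feature points are already available when the right-image
    candidates they will be matched against arrive.
    """
    fifo = deque([None] * delay)  # behaves like a shift register on the right path
    for l_px, r_px in zip(left_pixels, right_pixels):
        fifo.append(r_px)
        yield l_px, fifo.popleft()  # right pixel emerges 'delay' cycles later
```

For instance, list(delayed_stereo_stream([1, 2, 3], [1, 2, 3], delay=1)) pairs each left pixel with the right pixel from one cycle earlier, which is the behavior the input-path delay provides in hardware.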

Figure 18. Different scenes are captured in stereo vision due to the visual angle difference between two cameras.

Figure 19 shows the building blocks of the proposed FAP feature matching. A demultiplexer transfers the serial feature point data into parallel form according to the left detected signal, dtcL. The feature descriptors and the corresponding coordinates are stored in registers. Then, the data of sixteen feature points of the left image are compared with the feature point of the right image simultaneously when the right detected signal, dtcR, is received.

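As a rough software analogue of these building blocks, the sketch below keeps the sixteen most recent left-image feature points in a register-like buffer, latching on dtcL and releasing all entries for parallel comparison on dtcR. The FeaturePoint container, the method names, and the overwrite-oldest replacement policy are illustrative assumptions; only the capacity of sixteen and the two detected signals come from the description above.

```python
from collections import deque
from dataclasses import dataclass

@dataclass
class FeaturePoint:
    x: int
    y: int
    descriptor: list  # 64-dimensional descriptor, per Section 3.2.2

class FAPRegisterBank:
    """Behavioral model of the demultiplexer and register bank of Figure 19."""

    def __init__(self, capacity=16):
        # A full bank overwrites its oldest entry, approximating a sliding
        # window of roughly three image rows (an assumed replacement policy).
        self.regs = deque(maxlen=capacity)

    def on_dtcL(self, fp: FeaturePoint):
        """Left detected signal: latch the incoming left feature point."""
        self.regs.append(fp)

    def on_dtcR(self, fp: FeaturePoint):
        """Right detected signal: expose all buffered left feature points
        for simultaneous comparison against the right feature point."""
        return list(self.regs), fp
```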
Figure 19. Building blocks of the proposed finite-area parallel (FAP) feature matching for stereo vision.

3.2.2. Feature Matching Algorithm
For a street view image of 640 × 480 pixels, about 2500 feature points can be detected; every row of the image thus includes 5.2 feature points on average. To improve the accuracy of feature matching, 16 feature points, which are equivalent to an area of approximately three rows of pixels, are chosen for feature matching in this paper.
In general, for the same object, the projection in the left image will be located slightly more to the right than the corresponding one in the right image. An exception is that, if the object is far away from the cameras, the resulting x coordinates of the projections on the two images will be almost equal. Thus, the comparison proceeds only if the following condition is satisfied:

$x_R \leq x_L$,   (36)
where $x_R$ and $x_L$ represent the right-image and left-image x-coordinates, respectively. To quickly find the matching points, the 16 feature descriptors of the left image are simultaneously subtracted from the feature descriptor of the right one. The resulting differences in the 64 dimensions between each pair of feature descriptors are then summed for comparison:

$\eta_k = \sum_{i=1}^{64} \left| f_{k\_Li} - f_{Ri} \right|, \quad k = 0, 1, 2, \ldots, 15,$   (37)

where $f_{k\_Li}$, $i = 1, 2, \ldots, 64$, is the i-th element of the k-th feature descriptor in the left image, and $f_{Ri}$ is the i-th element of the feature descriptor of the right one. If the minimum sum is smaller than half of the second minimum one, the feature points producing the minimum sum are taken as a matching pair. The corresponding coordinates, $MX_R$, $MY_R$, $MX_L$, and $MY_L$, on the respective images are then output.
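The complete matching rule can be summarized in a short behavioral sketch: condition (36) gates each candidate, Equation (37) accumulates the absolute descriptor differences, and a match is accepted only when the minimum sum is below half of the second minimum. The function below reuses the hypothetical FeaturePoint container from the earlier sketch and computes the sixteen sums sequentially, whereas the hardware produces them concurrently; the conservative handling of fewer than two surviving candidates is likewise an assumption of this sketch.

```python
def match_right_feature(left_candidates, right_fp):
    """Match one right-image feature point against up to 16 buffered left ones.

    left_candidates: list of FeaturePoint objects from the register bank
    right_fp:        FeaturePoint detected in the right image
    Returns (MXR, MYR, MXL, MYL) for an accepted match, otherwise None.
    """
    sums = []
    for left_fp in left_candidates:
        # Condition (36): x_R <= x_L must hold for the same physical point.
        if right_fp.x > left_fp.x:
            continue
        # Equation (37): sum of absolute differences over the 64 dimensions.
        eta = sum(abs(l - r) for l, r in zip(left_fp.descriptor,
                                             right_fp.descriptor))
        sums.append((eta, left_fp))

    if len(sums) < 2:
        return None  # distinctiveness test needs a second-minimum sum
    sums.sort(key=lambda pair: pair[0])
    (best_eta, best_fp), (second_eta, _) = sums[0], sums[1]
    # Accept only if the minimum sum is below half of the second minimum.
    if best_eta < second_eta / 2:
        return right_fp.x, right_fp.y, best_fp.x, best_fp.y
    return None
```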