Deep-Learning-Based Pupil Center Detection and Tracking Technology for Visible-Light Wearable Gaze Tracking Devices
applied sciences (Article)

Deep-Learning-Based Pupil Center Detection and Tracking Technology for Visible-Light Wearable Gaze Tracking Devices

Wei-Liang Ou, Tzu-Ling Kuo, Chin-Chieh Chang and Chih-Peng Fan *

Department of Electrical Engineering, National Chung Hsing University, 145 Xingda Rd., South Dist., Taichung City 402, Taiwan; soft96123@gmail.com (W.-L.O.); claire5883claire5883@gmail.com (T.-L.K.); g107064065@mail.nchu.edu.tw (C.-C.C.)
* Correspondence: cpfan@dragon.nchu.edu.tw

Abstract: In this study, for the application of visible-light wearable eye trackers, a pupil tracking methodology based on deep-learning technology is developed. By applying deep-learning object detection technology based on the You Only Look Once (YOLO) model, the proposed pupil tracking method can effectively estimate and predict the center of the pupil in the visible-light mode. By using the developed YOLOv3-tiny-based model to test the pupil tracking performance, the detection accuracy is as high as 80%, and the recall rate is close to 83%. In addition, the average visible-light pupil tracking errors of the proposed YOLO-based deep-learning design are smaller than 2 pixels for the training mode and 5 pixels for the cross-person test, which are much smaller than those of the previous ellipse fitting design without using deep-learning technology under the same visible-light conditions. After combination with the calibration process, the average gaze tracking errors of the proposed YOLOv3-tiny-based pupil tracking models are smaller than 2.9 and 3.5 degrees in the training and testing modes, respectively, and the proposed visible-light wearable gaze tracking system runs at up to 20 frames per second (FPS) on the GPU-based software embedded platform.

Keywords: deep-learning; YOLOv3-tiny; visible-light; pupil tracking; gaze tracker; wearable eye tracker

Citation: Ou, W.-L.; Kuo, T.-L.; Chang, C.-C.; Fan, C.-P. Deep-Learning-Based Pupil Center Detection and Tracking Technology for Visible-Light Wearable Gaze Tracking Devices. Appl. Sci. 2021, 11, 851. https://doi.org/10.3390/app11020851

Received: 29 November 2020; Accepted: 12 January 2021; Published: 18 January 2021

Copyright: © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

1. Introduction

Recently, wearable gaze tracking devices (also called eye trackers) have started to become more widely used in human-computer interaction (HCI) applications. Gaze tracking devices have been well-utilized to evaluate people's attention deficits, vision, and cognitive information and processes [1–6]. In [4], eye-tracking technology was applied as a methodology to examine hidden cognitive processes. The authors developed a program source code debugging procedure that was inspected by using the eye-tracking method. The eye-tracking-based analysis was elaborated to test the debug procedure and record the major parameters of eye movement. In [5], an eye-movement tracking device was used to observe and analyze a complex cognitive process for C# programming. By evaluating the knowledge level and the eye movement parameters, the test subjects were involved to analyze the readability and intelligibility of the query and method in the Language-Integrated Query (LINQ) declarative query of the C# programming language. In [6], the forms and efficiency of the debugging parts of software development were observed by tracking eye movements with the participation of test subjects.
The routes of gaze of the test subjects were observed continuously during the procedure, and the coherences were decided by the assessment of the recorded metrics.

Many up-to-date consumer products have applied embedded gaze tracking technology in their designs. Through the use of gaze tracking devices, the technology improves the functions and applications of intuitive human-computer interaction. Gaze tracking designs that use an image sensor based on near-infrared (NIR) light were utilized in [7–16]. Some previous eye-tracking designs used high-contrast eye images, which are recorded by
active NIR image sensors. NIR image sensor-based technology can achieve high accuracy when used indoors. However, the NIR-based design is not suitable for use outdoors or in sunlight. Moreover, long-term operation may have the potential to hurt the user's eyes. When eye trackers are considered for general and long-term use, visible-light pupil tracking technology becomes a suitable solution. Due to the characteristics of the selected sensor module, the contrast of the visible-light eye image may be insufficient, and the nearfield eye image may contain some random types of noise. In general, visible-light eye images usually include insufficient contrast and large random noise. In order to overcome this inconvenience, other methods [17–19] have been developed for visible-light-based gaze tracking design. For the previous wearable eye tracking designs in [15,19] with the visible-light mode, the estimation of the pupil position was easily disturbed by light and shadow occlusion, and the prediction error of the pupil position thus becomes serious. Figure 1 illustrates some of the visible-light pupil tracking results obtained by the previous ellipse fitting-based methods in [15,19]. Thus, a large pupil tracking error will lead to inaccuracy of the gaze tracking operation. In [20,21], for indoor applications, the designers developed eye tracking devices based on visible-light image sensors. A previous study [20] performed experiments indoors, but showed that the device could not work normally outdoors. When the wearable eye-tracking device is to be used correctly in all indoor and outdoor environments, a method based on the visible-light image sensor is sufficient.
Figure 1. Visible-light pupil tracking results by the previous ellipse fitting-based method.

Related Works

In recent years, the progress of deep-learning technologies has become very important, especially in the fields of computer vision and pattern recognition. Through deep-learning technology, inference models can be trained to realize real-time object detection and recognition. With the deep-learning-based designs in [22–37], the eye tracking systems could detect the location of eyes or pupils, regardless of whether visible-light or near-infrared image sensors were used. Table 1 gives an overview of the deep-learning-based gaze tracking designs. From the viewpoint of device setup, the deep-learning-based gaze tracking designs in [22–37] can be divided into two different styles: the non-wearable and the wearable types.

For the deep-learning-based non-wearable eye tracking designs, in [22], the authors proposed a deep-learning-based gaze estimation scheme to estimate the gaze direction from a single face image. The proposed gaze estimation design was based on applying multiple convolutional neural networks (CNNs) to learn the model for gaze estimation from the eye images. The proposed design provided accurate gaze estimation for users with different head poses by including the head pose information in the gaze estimation framework. Compared with the previous methods, the developed gaze estimation design increased the accuracy of appearance-based gaze estimation under head pose variations.

In [23], the purpose of the work was to use CNNs to classify eye tracking data. Firstly, a CNN was used to classify two different web interfaces for browsing news data. Secondly, a CNN was utilized to classify the nationalities of users.
In addition, data-preprocessing and feature-engineering techniques were applied. In [24], for emerging consumer electronic products, an accurate and efficient eye gaze estimation design was important, and
the products were required to remain reliable in difficult environments with low power consumption and low hardware cost. In that paper, a new hardware-friendly CNN model with minimal computational requirements was developed and assessed for efficient appearance-based gaze estimation. The proposed CNN model was tested and compared against existing appearance-based CNN approaches, achieving better eye gaze accuracy with large reductions in computational complexity.

In [25], iRiter technology was developed, which can assist paralyzed people to write on a screen by using only their eye movements. iRiter precisely detects the movement of the eye pupil by referring to the reflections of external near-infrared (IR) signals. Then, a deep multilayer perceptron model was used for the calibration process. By the position vector of the pupil area, the coordinates of the pupil were mapped to the given gaze position on the screen.

In [26], the authors proposed a gaze zone estimation design using deep-learning technology. Compared with traditional designs, the developed method did not need a calibration procedure. In the proposed method, a Kinect was used to capture video of a computer user, a Haar cascade classifier was applied to detect the face and eye regions, and data on the eye region were used to estimate the gaze zones on the monitor by the trained CNN. The proposed calibration-free method achieved a high accuracy for human-computer interaction applications.

In [27], which focused on healthcare applications, high accuracy in the semantic segmentation of medical images was important; a key issue for training the CNNs was obtaining large-scale and precisely annotated images, and the authors sought to address the lack of annotated data by using an eye tracking method. The hypothesis was that segmentation masks generated with the help of eye tracking would be very similar to those rendered by hand annotation, and the results demonstrated that the eye tracking method created segmentation masks that were suitable for deep-learning-based semantic segmentation.

In [29], for the conditions of unconstrained gaze estimation, most of the gaze datasets were collected under laboratory conditions, and the previous methods were not evaluated across multiple datasets. In this work, the authors studied key challenges, including the target gaze range, illumination conditions, and facial appearance variation. The study showed that image resolution and the use of both eyes affected gaze estimation performance. Moreover, the authors proposed the first deep appearance-based gaze estimation method, i.e., GazeNet, which improved the state of the art by 22% for the cross-dataset evaluation.

In [30], a subconscious response influenced the estimation of human gaze through factors related to human mental activity. The previous methodologies were based only on the use of gaze statistical modeling and classifiers for assessing images, without referring to a user's interests. In this work, the authors introduced a large-scale annotated gaze dataset, which was suitable for training deep-learning models. Then, a novel deep-learning-based method was used to capture gaze patterns for assessing image objects with respect to the user's preferences. The experiments demonstrated that the proposed method performed effectively by considering key factors related to human gaze behavior.
In [31], the authors proposed a two-step training network, named the Gaze Estimator, to raise the gaze estimation accuracy on mobile devices. In the first step, an eye landmark localization network was trained on the 300W-LP dataset, and in the second step, a gaze estimation model was trained on the GazeCapture dataset. In [28], the network localized the eye precisely in the image, and the CNN was robust when facial expressions and occlusions happened; the inputs of the gaze estimation network were eye images and eye grids.

In [32], since many previous methods were based on a single camera and focused on either gaze point estimation or gaze direction estimation, the authors proposed an effective multitask method for gaze point estimation with multi-view cameras. Specifically, the authors analyzed the close relationship between gaze point estimation and gaze direction estimation, and used a partially shared convolutional neural network to estimate
the gaze direction and gaze point. Moreover, the authors introduced a new multi-view gaze tracking dataset consisting of multi-view eye images of different subjects. In [33], the U2Eyes dataset was introduced; U2Eyes is a binocular database of synthesized images regenerating real gaze tracking scripts with 2D/3D labels. The U2Eyes dataset can be used with machine- and deep-learning technologies for eye tracker designs.

In [34], for appearance-based gaze estimation techniques, using only a single camera for gaze estimation limits the application field to short distances. To overcome this issue, the authors developed a new long-distance gaze estimation design. In the training phase, the Learning-based Single Camera eye tracker (LSC eye tracker) acquired gaze data with a commercial eye tracker, the face appearance images were captured by a long-distance camera, and deep convolutional neural network models were used to learn the mapping from appearance images to gazes. In the application phase, the LSC eye tracker predicted gazes based on the appearance images acquired by the single camera and the trained CNN models with effective accuracy and performance.

In [35], the authors introduced an effective eye-tracking approach using a deep-learning-based method. The location of the face relative to the computer was obtained by detecting color from the infrared LED with OpenCV, and the user's gaze position was inferred by the YOLOv3 model. In [36], the authors developed a low-cost and accurate remote eye-tracking device that used an industrial prototype smartphone with integrated infrared illumination and camera. The proposed design used a 3D gaze-estimation model that enabled accurate point-of-gaze (PoG) estimation with free head and device motion. To accurately determine the input eye features, the design applied convolutional neural networks with a novel center-of-mass output layer. The hybrid method, which used artificial illumination, a 3D gaze-estimation model, and a CNN feature extractor, achieved better accuracy than the existing eye-tracking methods on smartphones.

On the other hand, for the deep-learning-based wearable eye tracking designs, in [28], the eye tracking method with video-oculography (VOG) images needed to provide an accurate localization of the pupil with artifacts and under naturalistic low-light conditions. The authors proposed a fully convolutional neural network (FCNN) for the segmentation of the pupil area that was trained on 3946 hand-annotated VOG images. The FCNN output simultaneously performed pupil center localization, elliptical contour estimation, and blink detection with a single network on commercial workstations with GPU acceleration. Compared with existing methods, the developed FCNN-based pupil segmentation design was accurate and robust for new VOG datasets. In [34], to conquer the obstructions and noise in nearfield eye images, the nearfield visible-light eye tracker design utilized deep-learning-based gaze tracking technologies to solve the problem.

In order to make the pupil tracking estimation more accurate, this work uses the YOLOv3-tiny [38] network-based deep-learning design to detect and track the position of the pupil. The proposed pupil detector is a well-trained deep-learning network model.
Compared with the previous designs, the proposed method reduces the pupil tracking errors caused by light reflection and shadow interference in the visible-light mode, and it also improves the accuracy of the calibration process of the entire eye tracking system. Therefore, the gaze tracking function of the wearable eye tracker becomes robust under visible-light conditions.
Table 1. Overview of the deep-learning-based gaze tracking designs.

| Deep-Learning-Based Methods | Dataset Used | Operational Mode/Setup | Eye/Pupil Tracking Method | Gaze Estimation and Calibration Scheme |
|---|---|---|---|---|
| Sun et al. [22] | UT Multiview | Visible-light mode / Non-wearable | Uses a facial landmarks-based design to locate eye regions | Multiple pose-based and VGG-like CNN models for gaze estimation |
| Lemley et al. [24] | MPII Gaze | Visible-light mode / Non-wearable | CNN-based eye detection | Joint eye-gaze CNN architecture with both eyes for gaze estimation |
| Cha et al. [26] | Self-made dataset | Visible-light mode / Non-wearable | Uses the OpenCV method to extract the face region and eye region | GoogleNetV1-based gaze estimation |
| Zhang et al. [29] | MPII Gaze | Visible-light mode / Non-wearable | Uses face alignment and 3D face model fitting to find eye zones | VGGNet-16-based GazeNet for gaze estimation |
| Lian et al. [32] | ShanghaiTechGaze and MPII Gaze | Visible-light mode / Non-wearable | Uses the ResNet-34 model for eye feature extraction | CNN-based multi-view and multi-task gaze estimation |
| Li et al. [34] | Self-made dataset | Visible-light mode / Non-wearable | Uses the OpenFace CLM framework/Face++/YOLOv3 to extract the facial ROI and landmarks | Infrared-LED-based calibration |
| Rakhmatulin et al. [35] | Self-made dataset | Visible-light mode / Non-wearable | Uses YOLOv3 to detect the eyeball and eye's corners | Infrared-LED-based calibration |
| Brousseau et al. [36] | Self-made dataset | Infrared mode / Non-wearable | Eye-feature locator CNN model | 3D gaze-estimation model-based design |
| Yiu et al. [28] | German center for vertigo and balance disorders | Near-eye infrared mode / Wearable | Uses a fully convolutional neural network for pupil segmentation and ellipse fitting to locate the eye center | 3D eyeball model, marker, and projector-assisted calibration |
| Proposed design | Self-made dataset | Near-eye visible-light mode / Wearable | Uses the YOLOv3-tiny-based lite model to detect the pupil zone | Marker and affine transform-based calibration |

The main contribution of this study is that the pupil location in nearfield visible-light eye images is detected precisely by the YOLOv3-tiny-based deep-learning technology. To our best knowledge from surveying related studies, the novelty of this work is to provide the leading reference design of deep-learning-based pupil tracking technology using nearfield visible-light eye images for the application of wearable gaze tracking devices. Moreover, the other novelties and advantages of the proposed deep-learning-based pupil tracking technology are described as follows:
(1) By using a nearfield visible-light eye image dataset for training, the used deep-learning models achieve real-time and accurate detection of the pupil position in the visible-light mode.
(2) The proposed design detects the pupil object at any eyeball movement condition, which is more effective than traditional image processing methods without deep-learning technologies in the near-eye visible-light mode.
(3) The proposed pupil tracking technology can efficiently overcome the light and shadow interferences in the near-eye visible-light mode, and the detection accuracy of the pupil's
location is higher than that of previous wearable gaze tracking designs without deep-learning technologies.

To enhance the detection accuracy, we used the YOLOv3-tiny-based deep-learning model to detect the pupil's object in the visible-light near-eye images. We also compared its detection results with those of other designs, including methods that do not use deep-learning technologies. The pupil detection performance was evaluated in terms of precision, recall, and pupil tracking errors. With the proposed YOLOv3-tiny-based deep-learning method, the trained YOLO-based deep-learning models achieve sufficient precision with small average pupil tracking errors, which are less than 2 pixels for the training mode and 5 pixels for the cross-person test under the near-eye visible-light conditions.

2. Background of Wearable Eye Tracker Design

Figure 2 depicts the general two-stage operational flow of the wearable gaze tracker with the four-point calibration scheme. The wearable gaze tracking device can use an NIR or visible-light image sensor-based gaze tracking design, which can apply appearance-, model-, or learning-based technology. First, the gaze tracking device estimates and tracks the pupil's center to prepare for the calibration process. For the NIR image sensor-based design, for example, the first-stage computations involve the processes for pupil location, extraction of the pupil's region of interest (ROI), the binarization process with the two-stage scheme in the pupil's ROI, and the fast scheme of ellipse fitting with the binarized pupil's edge points [15]. Then, the pupil's center can be detected effectively.
For the visible-light image sensor-based design, for example, by using the gradients-based iris center location technology [21] or the deep-learning-based methodology [37], the gaze tracking device realizes an effective iris center tracking capability. Next, at the second processing stage, the function and process for calibration and gaze prediction can be done. To increase the accuracy of gaze tracking, the head movement effect can be compensated by the motion measurement device [39]. In the following sections, for the proposed wearable gaze tracking design, the first-stage processing procedure uses the deep-learning-based pupil tracking technology with a visible-light nearfield eye image sensor. Next, the second-stage process achieves the function of calibration for the gaze tracking and prediction work.

Figure 2. The general processing flow of the wearable gaze tracker.

3. Proposed Methods

The proposed wearable eye tracking method is divided into several steps as follows: (1) Collecting the visible-light nearfield eye images in datasets and labeling these eye image datasets; (2) Choosing and designing the deep-learning network architecture for pupil object detection; (3) Training and testing the trained inference model; (4) Using the deep-learning model to infer and detect the pupil's object, and estimating the center coordinate of the pupil box; (5) Following the calibration process and predicting the gaze points. Figure 3 depicts the computational process of the proposed visible-light gaze tracking design based on the YOLO deep-learning model. In the proposed design, a visible-light camera module is used to capture nearfield eye images. The design uses a deep-learning network
based on YOLO to train an inference model to detect the pupil object, and then, the proposed system calculates the pupil center from the detected pupil box for the subsequent calibration and gaze tracking process. Compared with the previous image processing methods for tracking the pupil in the visible-light mode, the proposed method can detect the pupil object at any eyeball position, and it is also not affected by light and shadow interferences in real applications.

Figure 3. The operational flow of the proposed visible-light based wearable gaze tracker.

Figure 4 demonstrates the scheme of the self-made wearable eye tracking device. In the proposed wearable device, both the outward-view scene camera and the nearfield eye camera are required to provide the whole function of the wearable gaze tracker. The proposed design uses deep-learning-based technology to detect and track the pupil zone, estimates the pupil's center by using the detected pupil object, and then uses the coordinate information to perform and predict the gaze points.

Figure 4. The self-made visible-light wearable gaze tracking device.

3.1. Pre-Processing for Pupil Center Estimation and Tracking

In the pre-processing work, the visible-light nearfield eye images are recorded for datasets, and the collected eye image datasets will be labeled for training and testing the trained inference model.
Moreover, we choose several YOLO-based deep-learning models for pupil object detection. By using the YOLO-based deep-learning model to detect the pupil's object, the proposed system can track and estimate the center coordinate of the pupil box for calibration and gaze prediction.
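As a rough illustration of this per-frame flow, the following Python sketch assumes a hypothetical detect_pupil_boxes() helper that wraps the trained YOLOv3-tiny detector and returns candidate boxes with confidences; only the highest-confidence box is kept, and its midpoint is taken as the pupil center.

```python
def pupil_center_from_detections(detections):
    """Pick the most confident pupil box and return its center (cx, cy) in pixels.

    `detections` is a list of (x_min, y_min, width, height, confidence) tuples,
    e.g., the output of a hypothetical detect_pupil_boxes(frame) wrapper around
    the trained YOLOv3-tiny model. Returns None when no pupil is detected
    (for example, during a blink).
    """
    if not detections:
        return None
    x, y, w, h, _ = max(detections, key=lambda d: d[-1])
    return (x + w / 2.0, y + h / 2.0)

# Example with dummy detections for a 640x480 near-eye frame.
boxes = [(300, 220, 28, 26, 0.91), (120, 400, 30, 30, 0.12)]
print(pupil_center_from_detections(boxes))  # -> (314.0, 233.0)
```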
3.1.1. Generation of Training and Validation Datasets

In order to collect nearfield eye images from eight different users for training and testing, each user used a self-made wearable eye tracker to emulate the gaze tracking operations, after which the users rotated their gaze around the screen to capture the nearfield eye images with various eyeball positions. The collected and recorded near-eye images cover all possible eyeball positions. Figure 5 illustrates the nearfield visible-light eye images collected for training and in-person tests. In Figure 5, the numbers of near-eye images from person 1, person 2, person 3, person 4, person 5, person 6, person 7, and person 8 are 849, 987, 734, 826, 977, 896, 886, and 939, respectively. Moreover, the near-eye images captured from person 9 to person 16 are used for the cross-person test. Thus, the nearfield eye images from person 1 to person 8 without closed eyes are selected, of which approximately 900 images per person are used for training and testing. The image datasets consist of a total of 7094 eye patterns. Next, these selected eye images go through a labeling process before training the inference model for pupil object detection and tracking. To increase the number of images in the training phase, the data augmentation process, which randomly changes the angle, saturation, exposure, and hue of the selected image dataset, is enabled by using the YOLO framework. After the 160,000 iterations for training the YOLOv3-tiny-based models, the total number of images used for training is up to 10 million.

Figure 5. Collected datasets of nearfield visible-light eye images.
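The darknet/YOLO framework applies this augmentation internally during training; the snippet below is only a minimal OpenCV-based sketch of the same idea (random rotation plus random saturation, exposure, and hue jitter), with the jitter ranges chosen arbitrarily for illustration rather than taken from the paper.

```python
import cv2
import numpy as np

def augment_eye_image(img, max_angle=10, sat=1.5, exp=1.5, hue=0.05):
    """Randomly rotate and jitter saturation/exposure/hue of a BGR eye image."""
    h, w = img.shape[:2]
    # Random rotation around the image center.
    # Note: rotating the image also moves the labeled pupil box, which must be updated accordingly.
    angle = np.random.uniform(-max_angle, max_angle)
    m = cv2.getRotationMatrix2D((w / 2, h / 2), angle, 1.0)
    img = cv2.warpAffine(img, m, (w, h), borderMode=cv2.BORDER_REPLICATE)
    # Random saturation/exposure/hue jitter in HSV space.
    hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV).astype(np.float32)
    hsv[..., 1] *= np.random.uniform(1.0 / sat, sat)                       # saturation
    hsv[..., 2] *= np.random.uniform(1.0 / exp, exp)                       # exposure (value)
    hsv[..., 0] = (hsv[..., 0] + np.random.uniform(-hue, hue) * 180) % 180  # hue shift
    hsv = np.clip(hsv, 0, 255).astype(np.uint8)
    return cv2.cvtColor(hsv, cv2.COLOR_HSV2BGR)
```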
Before training the model, the designer must identify the correct image object (or answer) in the image datasets, in order to train the model what to learn and where the correct object is located in the whole image; this is commonly known as the ground truth, so the labeling process must be done before the training process. The position of the pupil center varies in the different nearfield eye images. Figure 6 demonstrates the labeling process for the location of the pupil's ROI object. For the subsequent learning process of the pupil zone detector, the size of the labeled box also affects the detection accuracy of the network architecture. In our experiments, firstly, a labeled box with a size of 10 × 10 pixels was tested, and then, a labeled box with a size ranging from 20 × 20 pixels to 30 × 30 pixels was adopted. The size of the labeled bounding box is closely related to the number of features analyzed by the deep-learning architecture. In the labeling process [40], in order to improve the accuracy of the ground truth, the ellipse fitting method was applied for the pre-positioning of the pupil's center, after which we used a manual method to further correct the coordinates of the pupil center in each selected nearfield eye image.
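For reference, darknet-style YOLO training expects one text file per image containing normalized box coordinates; the sketch below converts a corrected pupil center and a chosen box size (for example, 24 × 24 pixels) into that `class x_center y_center width height` format. The file name and box size here are illustrative assumptions, not values prescribed by the paper.

```python
def yolo_label_line(cx_px, cy_px, box_px, img_w, img_h, class_id=0):
    """Return one darknet-style label line: 'class cx cy w h', all normalized to [0, 1]."""
    return "{} {:.6f} {:.6f} {:.6f} {:.6f}".format(
        class_id, cx_px / img_w, cy_px / img_h, box_px / img_w, box_px / img_h)

# A pupil centered at (322, 239) in a 640x480 near-eye image, with a 24x24-pixel box.
line = yolo_label_line(322, 239, 24, 640, 480)
with open("eye_0001.txt", "w") as f:   # one .txt per image, same basename as the image
    f.write(line + "\n")
print(line)  # -> 0 0.503125 0.497917 0.037500 0.050000
```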
Before the training process, all labeled images (i.e., 7094 eye images) are randomly divided into a training part, a verification part, and a test part, with corresponding ratios of 8:1:1. Therefore, the number of labeled eye images used for training is 5676, and the numbers of labeled images used for verification and for testing are 709 each.

Figure 6. The labeling process for the location of the pupil's ROI.
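A minimal sketch of such an 8:1:1 random split is shown below; the list of image file names and the fixed random seed are illustrative assumptions.

```python
import random

def split_8_1_1(items, seed=0):
    """Shuffle labeled images and split them into train/verification/test parts at 8:1:1."""
    items = list(items)
    random.Random(seed).shuffle(items)
    n_val = len(items) // 10                 # 10% for verification
    n_test = len(items) // 10                # 10% for testing
    n_train = len(items) - n_val - n_test    # remaining ~80% for training
    return (items[:n_train],
            items[n_train:n_train + n_val],
            items[n_train + n_val:])

paths = ["eye_{:04d}.jpg".format(i) for i in range(7094)]  # placeholder file names
train, val, test = split_8_1_1(paths)
print(len(train), len(val), len(test))  # -> 5676 709 709
```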
3.1.2. The Used YOLO-Based Models and the Training/Validation Process

The proposed design uses the YOLOv3-tiny deep-learning model. The reasons for choosing a network model based on YOLOv3-tiny are: (1) the YOLOv3-tiny-based model achieves good detection and recognition performance at the same time; (2) the YOLOv3-tiny-based model is effective for small object detection, and the pupil in the nearfield eye image is also a very small object. In addition, because the number of convolutional layers in YOLOv3-tiny is small, the inference model based on YOLOv3-tiny can achieve real-time operation on the software-based embedded platform.

In the original YOLOv3-tiny model, as shown in Figure 7, each output layer uses three anchor boxes to predict the bounding boxes in a grid cell. However, only one pupil object category needs to be detected in each nearfield eye image, so the number of required anchor boxes can be reduced. Figure 8 illustrates the network architecture of the YOLOv3-tiny model with only one anchor. Besides the reduction of the number of required anchor boxes, since the image features of the detected pupil object are not intricate, the number of convolutional layers can also be reduced. Figure 9 shows the network architecture of the YOLOv3-tiny model with one anchor and a one-way path. In Figure 9, since the number of convolutional layers is reduced, the computational complexity can also be reduced effectively.

Before the labeled eye images are fed into the YOLOv3-tiny-based deep-learning models, the parameters used for training the deep-learning networks must be set properly, e.g., batch, decay, subdivision, momentum, etc., after which the inference models for pupil center tracking will be trained accurately. In our training design for the models, the parameters Width/Height, Channels, Batch, Max_batches, Subdivision, Momentum, Decay, and Learning_rate are set to 416, 3, 64, 500200, 2, 0.9, 0.0005, and 0.001, respectively.

In the architecture of the YOLOv3-tiny model, 13 convolution layers with different filter numbers and pooling layers with different strides are established to allow the YOLOv3-tiny model to have a high enough learning capability for training from images. In addition, the YOLOv3-tiny-based model not only uses the convolutional neural network, but also adds upsampling and concatenation calculation methods. During the learning process, the YOLO model uses the training dataset for training, uses the images in the validation dataset to test the model, observes whether the model is indeed learning correctly, uses the weight back-propagation method to gradually converge the weights, and observes whether the weights converge with the loss function.
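As a side note on the one-anchor modification: in darknet-style YOLO configurations, the convolutional layer directly before each YOLO detection layer has filters = (classes + 5) × anchors_per_scale, so reducing the anchors per scale also shrinks those layers. The check below restates this standard rule together with the training hyperparameters quoted above; it is a bookkeeping sketch, not code from the paper.

```python
# Training hyperparameters reported above, expressed with darknet-style [net] key names.
net_cfg = {"width": 416, "height": 416, "channels": 3, "batch": 64,
           "subdivisions": 2, "momentum": 0.9, "decay": 0.0005,
           "learning_rate": 0.001, "max_batches": 500200}

def yolo_layer_filters(num_classes, anchors_per_scale):
    """Filters of the conv layer feeding a YOLO head: (classes + 5) * anchors."""
    return (num_classes + 5) * anchors_per_scale

print(yolo_layer_filters(1, 3))  # original YOLOv3-tiny head (one pupil class): 18 filters per scale
print(yolo_layer_filters(1, 1))  # one-anchor pupil detector: 6 filters per scale
```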
Figure 7. The network architecture of the original YOLOv3-tiny model.

Figure 8. The network architecture of the YOLOv3-tiny model with one anchor.
Figure 9. The network architecture of the YOLOv3-tiny model with one anchor and a one-way path.

Figure 10 depicts the training and testing processes of the YOLO-based models for tracking pupils. In the detection and classification process of the YOLO model, the grid cell, anchor box, and activation function are used to determine the position of the object bounding box in the image. Therefore, the well-trained model can detect the position of the pupil in the image with the bounding box. The detected pupil box can be compared with the ground truth to generate the intersection over union (IoU), precision, and recall values; we use these values to judge the quality of the model training.

Figure 10. The training and testing processes by the YOLO-based models for tracking the pupil.
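As a reference for how the box-overlap score is computed, the following sketch evaluates the IoU between a detected pupil box and its ground-truth box; boxes are given as (x_min, y_min, width, height) in pixels, a convention chosen here purely for illustration.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x_min, y_min, width, height)."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    # Overlap rectangle between the two boxes.
    ix = max(ax, bx)
    iy = max(ay, by)
    iw = max(0.0, min(ax + aw, bx + bw) - ix)
    ih = max(0.0, min(ay + ah, by + bh) - iy)
    inter = iw * ih
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0

# Detected pupil box vs. its labeled ground-truth box.
print(iou((300, 220, 28, 26), (302, 222, 26, 26)))  # -> 0.8
```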
Table 2 presents the confusion matrix used for the performance evaluation of classification. In pattern recognition for information retrieval and classification, recall is referred to as the true positive rate or sensitivity, and precision is referred to as the positive predictive value.

Table 2. Confusion matrix for classification.

| | Actual Positives | Actual Negatives |
|---|---|---|
| Predicted Positives | True Positives (TP) | False Positives (FP) |
| Predicted Negatives | False Negatives (FN) | True Negatives (TN) |
Both precision and recall are based on an understanding and measurement of relevance. Based on Table 2, the definition of precision is expressed as follows:

Precision = TP/(TP + FP), (1)

and recall is defined as follows:

Recall = TP/(TP + FN). (2)

After using the YOLO-based deep-learning model to detect the pupil's ROI object, the midpoint coordinate of the pupil's bounding box can be found, and this midpoint coordinate value is the estimated pupil center point. The estimated pupil center coordinates are evaluated with the Euclidean distance by comparing the estimated coordinates with the ground-truth coordinates, and the tracking errors of the pupil's center are thereby obtained for the performance evaluation. Using the pupil's central coordinates for the coordinate conversion of the calibration, the proposed wearable device can correctly estimate and predict the outward gaze points.
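A small evaluation sketch following Equations (1) and (2) and the Euclidean-distance error is given below; how a detection is counted as a true positive (for example, by an IoU threshold) is left to the evaluation protocol and is not fixed here.

```python
import math

def precision_recall(tp, fp, fn):
    """Precision = TP/(TP+FP), Recall = TP/(TP+FN), as in Equations (1) and (2)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

def center_error(pred_center, gt_center):
    """Euclidean distance (in pixels) between predicted and ground-truth pupil centers."""
    return math.hypot(pred_center[0] - gt_center[0], pred_center[1] - gt_center[1])

# Illustrative counts and coordinates only.
print(precision_recall(tp=80, fp=20, fn=17))                   # -> (0.8, ~0.825)
print(round(center_error((314.0, 233.0), (315.2, 231.6)), 2))  # -> 1.84 pixels
```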
3.2. Calibration and Gaze Tracking Processes

Before using the wearable eye tracker to predict the gaze points, the gaze tracking device must undergo a calibration procedure. Because the appearance of each user's eyes and the distance to the near-eye camera are different, the calibration must be done to confirm the coordinate mapping scheme between the pupil's center and the outward scene, after which the wearable gaze tracker can correctly predict the gaze points corresponding to the outward scene. Figure 11 depicts the calibration process used by the proposed wearable gaze tracker. The steps of the applied calibration process are described as follows:

Figure 11. The used calibration process of the proposed wearable gaze tracker.

In the first step, the system collects the correspondence between the center point coordinates of the pupils captured by the nearfield eye camera and the gaze points simulated by the outward camera. Before calibration, in addition to recording the positions of the center points of the pupil movement, the trajectory of the markers captured by the outward camera must also be recorded. Figure 12 shows the trajectory diagram of tracking the pupil's center with the nearfield eye image for calibration. In addition, Figure 13 reveals the corresponding trajectory diagram of tracking the four markers' center points with the outward image for calibration.
Figure 12. Diagram of tracking the pupil's center with the nearfield eye camera for calibration.

In Figure 12, the yellow line indicates the sequence of the movement path when gazing at the markers, and the red rectangle indicates the range of the eye movement. In Figure 13, when the calibration process is executed, the user gazes at the four markers sequentially from ID 1, ID 2, ID 3, to ID 4, and the system records the coordinates of the pupil movement path while recording the gazed coordinates of the outward scene with markers. Then, the proposed system performs the coordinate conversion to achieve the calibration effect.

Figure 13. Diagram of tracking the four markers' center points with the outward camera for calibration. ① When the user gazes at the marker ID1 for calibration. ② When the user gazes at the marker ID2 for calibration. ③ When the user gazes at the marker ID3 for calibration. ④ When the user gazes at the marker ID4 for calibration.

The pseudo-affine transform is used for the coordinate conversion between the pupil and the outward scene. The equation of the pseudo-affine conversion is expressed as follows:

$$\begin{bmatrix} X \\ Y \end{bmatrix} = \begin{bmatrix} a_1 & a_2 & a_3 & a_4 \\ a_5 & a_6 & a_7 & a_8 \end{bmatrix} \begin{bmatrix} X_p \\ Y_p \\ X_p Y_p \\ 1 \end{bmatrix}, \quad (3)$$

where (X, Y) is the outward coordinate in the markers' center point coordinate system, and (X_p, Y_p) is the nearfield eye coordinate in the pupils' center point coordinate system.
In (3), the unknown transformation coefficients a_1~a_8 need to be solved from the coordinate values computed and estimated from (X, Y) and (X_p, Y_p). In order to carry out the calculation of the transformation parameters a_1~a_8 for calibration, Equation (4) describes the pseudo-affine transformation with four-point calibration, which is used to solve the transformation coefficients of the coordinate conversion. The equation of the pseudo-affine conversion with four-point calibration, obtained by stacking Equation (3) for the four marker/pupil correspondences, is expressed as follows:

$$\begin{bmatrix} X_1 \\ Y_1 \\ X_2 \\ Y_2 \\ X_3 \\ Y_3 \\ X_4 \\ Y_4 \end{bmatrix} = \begin{bmatrix} X_{p1} & Y_{p1} & X_{p1}Y_{p1} & 1 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & X_{p1} & Y_{p1} & X_{p1}Y_{p1} & 1 \\ X_{p2} & Y_{p2} & X_{p2}Y_{p2} & 1 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & X_{p2} & Y_{p2} & X_{p2}Y_{p2} & 1 \\ X_{p3} & Y_{p3} & X_{p3}Y_{p3} & 1 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & X_{p3} & Y_{p3} & X_{p3}Y_{p3} & 1 \\ X_{p4} & Y_{p4} & X_{p4}Y_{p4} & 1 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & X_{p4} & Y_{p4} & X_{p4}Y_{p4} & 1 \end{bmatrix} \begin{bmatrix} a_1 \\ a_2 \\ a_3 \\ a_4 \\ a_5 \\ a_6 \\ a_7 \\ a_8 \end{bmatrix}, \quad (4)$$

where (X_i, Y_i) and (X_{pi}, Y_{pi}), i = 1, ..., 4, are the marker center and pupil center coordinates recorded when the user gazes at marker ID i.
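A numerical sketch of this four-point calibration is given below: it builds the 8 × 8 system of Equation (4) from four (pupil center, marker center) pairs, solves for a_1~a_8, and then maps a new pupil center to an outward gaze point with Equation (3). The sample coordinates are made up for illustration.

```python
import numpy as np

def solve_pseudo_affine(pupil_pts, scene_pts):
    """Solve a1..a8 of Eq. (4) from four (Xp, Yp) pupil centers and (X, Y) marker centers."""
    a_mat = np.zeros((8, 8))
    b_vec = np.zeros(8)
    for i, ((xp, yp), (x, y)) in enumerate(zip(pupil_pts, scene_pts)):
        row = [xp, yp, xp * yp, 1.0]
        a_mat[2 * i, 0:4] = row       # X_i equation
        a_mat[2 * i + 1, 4:8] = row   # Y_i equation
        b_vec[2 * i] = x
        b_vec[2 * i + 1] = y
    return np.linalg.solve(a_mat, b_vec)

def map_gaze(coeffs, xp, yp):
    """Apply Eq. (3): map a pupil center (Xp, Yp) to an outward gaze point (X, Y)."""
    basis = np.array([xp, yp, xp * yp, 1.0])
    return float(coeffs[0:4] @ basis), float(coeffs[4:8] @ basis)

# Illustrative calibration pairs: pupil centers vs. the four marker centers (pixels).
pupil = [(250, 180), (390, 185), (395, 300), (255, 305)]
scene = [(100, 80), (540, 80), (540, 400), (100, 400)]
coeffs = solve_pseudo_affine(pupil, scene)
print(map_gaze(coeffs, 320, 240))  # predicted gaze point for a new pupil center
```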