Perceptual Font Manifold from Generative Model
Yuki Fujita, Haoran Xie, and Kazunori Miyata
Japan Advanced Institute of Science and Technology, Ishikawa, Japan

Figure 1: The perceptual manifolds of fonts. We show the manifolds of the casual, formal, and POP styles, with the separated manifolds arranged from left to right. In this research, perceptual manifolds are obtained from the results of a perceptual study of the feature space of fonts. The feature space is learned from the latent space of a generative model for font generation.

Abstract—Though various fonts are available online for public usage, it is difficult and challenging to generate, explore, and edit fonts to meet the preferences of all users. To address these issues, we propose a font manifold interface to visualize the perceptual adjustment in the latent space of a generative model of fonts. In this paper, we adopt the variational autoencoder network for font generation. Then, we conducted a perceptual study on the fonts generated from the multi-dimensional latent space of the generative model. After we obtained the distribution data of specific preferences, we utilized a manifold learning approach to visualize the font distribution. As a case study of our proposed method, we developed a user interface for exploring generated fonts according to the designated user preference using a heat map representation.

Keywords—Character Fonts; Generative Model; Manifold Learning; Perceptual Study

I. INTRODUCTION

Character fonts play a crucial role in the representation of information. A user may choose the preferred types and shapes of fonts for special purposes. However, the fonts used in festival posters and business documents may be entirely different from each other. Usually, the choice of fonts requires the skill and experience of a professional designer, as it is always challenging to choose the most suitable fonts from online sources or font lists in document files to meet the user's perceptual needs. In recent times, font libraries such as Google Fonts [1] and Font Squirrel [2] are widely used.

Exploring and editing fonts to obtain the desired ones from these font libraries is usually challenging for casual users. Figure 2 shows the user-searching interface of Google Fonts.

Figure 2: Search results from Google Fonts.

In order to search for the desired fonts, the user is asked to select different image features of a font, including thickness, slant, and width. This exploring approach is inefficient because human font perception cannot easily be separated into specific features. For example, it is difficult to clarify which font features are the most appropriate for creating an appealing festival poster. The same issues also arise in Font Squirrel, where tags are used to display the font features; moreover, understanding the meaning of these tags may be challenging.

Recent research presents the font manifold in two dimensions (2D) by learning a nonlinear mapping from existing fonts [3]. In this approach, a casual user can easily explore and edit fonts using the manifold with smooth interpolations. However, it is still challenging to explore fonts with a specific perception because all the font features are only geometrically displayed. To address this issue, in this paper, we propose a perceptual font manifold for different perception purposes. Moreover, it is worth noting that the authors of [3] adopted font matching with an energy-based optimization process, which is time-consuming and difficult to implement. Thanks to the rapid development of generative models based on deep learning approaches, we can directly apply the latent space of a generative model to construct the font manifold. In contrast to the previous work [3], the approach proposed in this paper is more straightforward and easier to implement.

Figure 3: The framework of our proposed perceptual font manifolds. In the offline process, we employ the VAE generative model to obtain the latent space of font generation. Based on the generated results, a perceptual study in the latent space is conducted for three types of perception. The perceptual manifolds are then obtained by the manifold learning approach. Finally, a casual user can explore the desired fonts on the user interface proposed in this paper.

In this paper, we propose perceptual font manifolds from a generative model based on a deep learning approach for capital letter "A" fonts. The results of the perceptual studies in the latent space of the generative model were visualized. Furthermore, we utilized the variational autoencoder (VAE) network for font generation. As an advantage of the VAE model, the latent space is continuous such that similar images are classified close to each other, enabling the exploration of the perception distribution of fonts.

Besides, based on the generation results of the VAE model, we propose a user interface for the perceptual study. When the user changes the latent variables in the multi-dimensional space from the learning results, the user interface displays the corresponding font image as an output. In order to investigate perceptions of font images, we conducted a survey using three perceptions: casual, formal, and POP types of fonts. Finally, based on the perceptual font manifolds presented in this paper, we propose a user interface. The proposed interface is shown to be efficient and easy to use in contrast to other widely used font-searching interfaces.

The main contributions of this research are as follows.
• We propose a perceptual font manifold to meet specific perceptual requirements in font exploration and editing.
• We provide a font-exploring user interface based on our proposed perceptual font manifolds, which is efficient and user-friendly.

II. RELATED WORKS

Generative Model and Character Fonts. Paul et al. [4] proposed a neural network architecture that creates a character font dataset from a single input font image. This dataset contains 62-letter fonts (from "A" to "Z", "a" to "z", and "0" to "9"). Samaneh et al. [5] proposed an end-to-end system, called Multi-Content GAN, which generates a set of font images using two networks: a "Glyph Network" and an "Ornamentation Network". Besides, this system can transfer the typographic stylization as well as the textual stylization (color gradients and effects). Mehta et al. [6] proposed a character recognition system using an artificial neural network and a nearest-neighbour approach on scanned images. We observed that studies dealing with the combination of a generative model and character fonts focus mainly on the generation of character font images; however, they do not take the perceptions of font images into consideration. Therefore, in this paper, we bridge the gap observed above by considering the perception distribution in the latent space of the generative model.

Character Fonts and Perception. Choi et al. [7] proposed a system that shows the character font corresponding to an input image. A large number of character fonts are mapped on a semantic space in advance, and the system examines which character font corresponds to the input image on that space. The research carried out in [7] dealt with a combination of character fonts and perception.
Regarding the mapping on the semantic space, the authors did not consider acquiring the image feature values. In this paper, by contrast, we utilized the generative model for mapping to the latent space and considered the feature values of character images.

Distribution of Character Fonts. Campbell et al. [3] visualized the distribution of character fonts using a manifold. The user can move on this manifold with a mouse and transform the font image continuously. Guo et al. [8] visualized a Chinese font manifold. Their system is learned using shape and skeleton features; hence, it can generate new fonts (not existing fonts). Furthermore, these systems display the distribution of the character fonts such that one can generate and search fonts using that distribution. However, they also do not consider any perception from font images. In this paper, besides displaying the distribution, we consider the perception.

III. SYSTEM OVERVIEW

Figure 3 illustrates the framework of the system proposed in this paper. Contrary to conventional generative models based on deep learning approaches, in our proposed system, human effort is involved in the font generation process. Therefore, our framework includes two main components: machine computation and human involvement.

At the machine computation stage, we adopt a generative network to produce the latent space of various fonts, while at the human involvement stage, we ask the participants to respond to font perception using the perceptual font user interface proposed in this paper. In this interface, the user can adjust the latent variables to achieve the desired perception. In the case study considered, we recorded all latent variables related to casual, formal, and POP feelings. Furthermore, the perceptual manifold of fonts is obtained from the perceptual study via manifold learning. Finally, we propose a user interface for font exploration, which is based on the approach proposed in this paper.

IV. FONT GENERATION

This paper adopts a generative model for font generation. We clarify the learning network and the details of our dataset as follows.

A. Generative Model

In this paper, we utilized the VAE, which is the most representative generative model in the deep learning field (see [9] and [10]). Figure 4 shows the network architecture of the VAE model employed in this paper.

Figure 4: The architecture of our VAE network.

The number under each layer indicates the size of each array. The encoder has four convolutional layers and a flatten layer, while the decoder has a lambda layer and two convolutional layers. The lambda layer calculates the latent variables using a mean and a deviation. For the implementation of the VAE model, we used the Keras library [11].

B. Training Dataset

To construct the training dataset for the VAE network, we downloaded 2244 TrueType font files from Google Fonts. Among all these files, the valid image data includes 2169 PNG image files, because 75 files could not be converted into image files. Then, we executed data cleansing to make the font images ready for machine learning.

Figure 5 illustrates the data cleansing process. The goal of the data cleansing process is to create 28 × 28 images for our learning network; we used the same image size as the MNIST dataset. Usually, a training image has a white margin around the font body. We erased the white margin of the font images and maintained the font body as much as possible for all image data.

In the data cleansing process, we first converted the downloaded TrueType font files into grayscale image files. At the beginning of the process, the image files were set to a resolution of 256 × 256 pixels, which is sufficiently large. Then, the rectangular bounding box of the font is calculated. To convert the rectangular image into a square, we added white rows (or columns) to the shorter side of the rectangle, alternating between the two ends.
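The alternating square-padding step described above can be sketched in a few lines of plain Python. The function name and the list-of-rows image representation are our assumptions; the paper does not publish its cleansing code:

```python
def pad_to_square(img, white=255):
    """Pad a rectangular grayscale image (a list of pixel rows) to a square
    by adding white rows or columns to the shorter side, alternating between
    the two ends so the font body stays roughly centered."""
    out = [list(row) for row in img]
    at_end = True  # alternate which end receives the next white line
    # Image is wider than tall: add white rows.
    while len(out) < len(out[0]):
        row = [white] * len(out[0])
        if at_end:
            out.append(row)
        else:
            out.insert(0, row)
        at_end = not at_end
    # Image is taller than wide: add white columns.
    while len(out[0]) < len(out):
        for row in out:
            if at_end:
                row.append(white)
            else:
                row.insert(0, white)
        at_end = not at_end
    return out

# A 2-row by 4-column crop becomes a 4 x 4 square: one white row is
# appended below and one inserted above the font body.
square = pad_to_square([[0, 0, 0, 0], [0, 0, 0, 0]])
```

In the full pipeline, the squared image would then be downscaled to 28 × 28 with bilinear interpolation.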
Figure 5: Data cleansing process.

Finally, we scaled the image to 28 × 28 pixels using bilinear interpolation. Figure 6 shows examples of font images in our font training dataset.

Figure 6: Font examples in our training dataset.

V. PERCEPTUAL FONT MANIFOLD

In this section, we propose the perceptual font manifold considering the user perception in font generation and exploration. To effectively analyze the user perception of font images, we conducted a perceptual study of font styles based on the generated results from the VAE network. In this study, the participants can choose the latent variables while observing the font image generated from the latent space. The latent space carries more information if the number of latent dimensions is high. However, it is difficult to obtain data from a perceptual study in a high-dimensional latent space. Considering the trade-off between the computation cost and the perception evaluation load, in our study, we chose a five-dimensional latent space.

A. Perceptual Study

Figure 7: The proposed user interface (a) used in our perceptual study (b).

Figure 7 (a) shows the user interface for the perceptual study. The participants were asked to select their favorite fonts using the five sliders to change the values of the latent variables, as shown in Figure 7 (b). Observe that each latent variable can be operated from 0 to 99. However, because the latent space is calculated based on a Gaussian distribution, we adopt the percent point function and divide the variable range from the 5% to the 95% quantile into one hundred equal parts. In our proposed user interface, the output font image corresponding to the selected latent variables is displayed in real time.

To classify the fonts into different user perceptions, we adopted three styles of font perception in this paper: POP, formal, and casual. Whenever the user feels the right perception from the font image, the user was asked to click the corresponding perception button on the user interface, and the system saves the selected latent variables. To achieve a good starting point, the user can also click the "Changing a Starting Font" button to change all latent variables randomly.

The purpose of this perceptual study is to classify the font images into the three designated user perceptions. At the beginning of the perceptual study, we showed examples of feature fonts for the three user perceptions, as shown in Figure 8. We invited 17 graduate students to join our perceptual study. The survey time was limited to five minutes because a user may feel exhausted repeating the same task for a long period. Finally, a total of 884 latent variable sets were collected, including 273 POP, 311 formal, and 298 casual styles.
Figure 8: Feature examples: (1) POP, (2) Formal, (3) Casual.

B. Manifold Learning

The outcome of the perceptual study of font images is distribution data in the five-dimensional latent space. To reduce the distribution data to two dimensions, we adopted a manifold learning approach. In this paper, we utilized the t-distributed Stochastic Neighbor Embedding (tSNE) method for dimensionality reduction [12]. Figure 9 shows the result of the dimensionality reduction of the distribution data from our perceptual study.

Figure 9: Results of dimensionality reduction from our perceptual study. Blue points indicate the POP style, green points denote the formal style, and yellow points the casual style of fonts. tSNE-1 and tSNE-2 are the dimensions of the reduced distribution data.

To visualize the two-dimensional distribution data from the manifold learning, we adopted the kernel density estimation method to obtain a heatmap representation of the perceptual manifold of fonts [13]. Consequently, we obtained the heatmaps of all perception styles (POP, formal, and casual) as shown in Figure 1. The high-density area is displayed in red, while the low-density area is displayed in blue.

C. User Interface

Using the perceptual manifold of fonts introduced in this paper, we propose a user interface for font exploration from the VAE generative network. Figure 10 shows the user interface in our case study, which is implemented as a Web application for easy usage.

Figure 10: Our proposed user interface using the perceptual font manifold.

The proposed user interface is simple and easy to use. The interface continuously shows the corresponding font image in the upper-left window while the user moves the control point on the heatmap. To help choose fonts in different user perceptions, the user can select the perception buttons on the right side of the user interface (All, POP, Formal, and Casual); the corresponding heatmap (Figure 1) is then presented to the user from a different perspective. When the user finds the desired font image, the user can click on it, and a modal window is displayed for further confirmation.

VI. CASE STUDY

A. Comparison Study

In our case study, we compared the user interface proposed here with the traditional user interface for font searching as used in online font libraries such as Font Squirrel. Note that we only focused on the "A" font in this research for simplicity. Figure 11 shows the traditional user interface for font exploration, which displays 1592 font images from the generation results of the VAE network. All of these font images are arranged in 10 columns and 160 rows with a scrollbar in a web application.

In our case study, we randomly selected 10 target font images from the generated results, as shown in Figure 12. In both the traditional and the proposed user interfaces, the target font images are located on the upper-left side of the interface windows. The participants were asked to look for the target font images on both user interfaces and click on the explored position, with a confirmation window. Figure 12 shows the distribution of the target images on the proposed user interface.
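The kernel density estimation step of Section V-B, which turns the 2D tSNE points into a heatmap, can be sketched in pure Python. The paper cites Silverman [13] but does not state the kernel or bandwidth, so this sketch assumes a Gaussian kernel with an illustrative bandwidth:

```python
import math

def gaussian_kde_2d(points, x, y, bandwidth=1.0):
    """Gaussian kernel density estimate at (x, y) for 2D samples.

    `points` would be the tSNE-reduced latent variables of one perception
    style; evaluating this on a grid of (x, y) cells yields the heatmap,
    where high density maps to red and low density to blue.
    """
    norm = 1.0 / (len(points) * 2.0 * math.pi * bandwidth * bandwidth)
    acc = 0.0
    for px, py in points:
        d2 = (x - px) ** 2 + (y - py) ** 2
        acc += math.exp(-d2 / (2.0 * bandwidth * bandwidth))
    return norm * acc

# Density near a tight cluster of style samples is higher than far away,
# which is exactly the red-to-blue gradient of the heatmaps in Figure 1.
cluster = [(0.0, 0.0), (0.2, -0.1), (-0.1, 0.15), (0.05, 0.05)]
near = gaussian_kde_2d(cluster, 0.0, 0.0)
far = gaussian_kde_2d(cluster, 5.0, 5.0)
```

In practice, a library routine such as scipy's gaussian_kde with an automatic bandwidth rule would replace this hand-rolled loop.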
Figure 11: Traditional user interface for font exploration in the comparison study.

Figure 12: Test dataset used in our case study and their distributions in the manifold of fonts.

We conducted the comparison study with 20 graduate students and randomly divided them into two groups: group 1 and group 2. The members of group 1 were asked to use the traditional user interface first and then the proposed interface (Figure 10). The members of group 2 were asked to use the proposed user interface first and then the traditional user interface.

B. Discussion

In the case study carried out in this paper, we compared the exploration accuracy and the time cost of the two user interfaces. Regarding the exploration accuracy, we used the structural similarity (SSIM) index to quantify the similarity between two images [14]. In our research, SSIM scores are calculated between the target font images and the images explored through user operation. The SSIM score is formulated as follows:

SSIM(x, y) = [(2 µx µy + C1)(2 σxy + C2)] / [(µx² + µy² + C1)(σx² + σy² + C2)]    (1)

where x and y are the two font images being compared, µx and µy denote their mean pixel values, σx and σy are the standard deviations of the pixel values, and σxy is the covariance. C1 = 6.5 and C2 = 58.5 are constant values. The SSIM score is expressed between 0 and 1; the closer to 1, the higher the similarity.

Figure 13: Comparison of SSIM scores between the traditional (red) and our proposed (green) user interface.

Figure 13 shows the comparison of exploration accuracy in SSIM score between the traditional and our proposed user interface. Regarding SSIM scores, there is no apparent difference between the two interfaces. In detail, the user interface proposed in this paper achieved slightly higher median and maximum SSIM values than the traditional interface.

Figure 14: Comparison of time usage between the traditional (red) and our proposed (green) user interface.

Figure 14 shows the comparison of time cost in seconds between the traditional and our proposed user interface. Regarding the average time cost, the participants operated 1.7 times slower with the traditional interface (97 seconds) than with our proposed interface (54 seconds). To confirm the difference between the two interfaces, we applied Student's t-test to the time costs. The result revealed a score of 0.71%, below the 5% significance level, which shows that there is a significant difference between the two interfaces. Hence, the user interface proposed in this paper is more efficient than the traditional interface.

VII. RESULTS

The proposed system was implemented on a desktop computer with Ubuntu 16.04 LTS, an Intel Core i7-7700 CPU @ 3.60 GHz × 8, and a GeForce GTX 1060 3GB GPU, and was programmed in Python 3.5 with CUDA 8.0, CuDNN 6.0, and Keras-gpu 2.1.6. The VAE model was trained for 50 epochs.

Figure 15: Explorations of font manifolds with three perception styles, demonstrating our matching results.

Figure 15 shows the exploration results of fonts in the three perceptual font manifolds. For the POP perception of font images, it is clear that these fonts are in bold and slanted styles, whereas the fonts are all slim without slant in the formal perception. Also, for the casual perception, there is an apparent feature of circular curves in the font styles. All these exploration results agree well with the common sense of these user perceptions, as shown in Figure 8 (example images).

Though the perceptual study conducted in this research only handles "A" fonts for simplicity, we can map the explored fonts to all character fonts. Figure 16 shows the text application of our results from "A" to "Z", obtained by matching the "A" fonts of existing fonts that have the closest SSIM scores to our generated font manifolds. The star points on the left figures of the font manifolds denote the font positions. We observed that the perception of the three font styles is apparently correct for common users.

Figure 16: Application of our results with mapping from A to Z of test words.
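Equation (1), with the paper's constants C1 = 6.5 and C2 = 58.5, can be evaluated globally over two grayscale images as a minimal sketch. The flat pixel-list input format is our assumption; the paper does not publish its code:

```python
def ssim(img_x, img_y, c1=6.5, c2=58.5):
    """Global SSIM score between two equal-sized grayscale images,
    following Eq. (1): means, variances, and covariance are computed
    over the whole images."""
    assert len(img_x) == len(img_y) and img_x
    n = len(img_x)
    mu_x = sum(img_x) / n
    mu_y = sum(img_y) / n
    var_x = sum((p - mu_x) * (p - mu_x) for p in img_x) / n
    var_y = sum((p - mu_y) * (p - mu_y) for p in img_y) / n
    cov = sum((px - mu_x) * (py - mu_y)
              for px, py in zip(img_x, img_y)) / n
    return ((2 * mu_x * mu_y + c1) * (2 * cov + c2) /
            ((mu_x * mu_x + mu_y * mu_y + c1) * (var_x + var_y + c2)))

# Identical images score 1; reversing the intensity gradient (negative
# covariance) lowers the score, so nearest-SSIM matching as used for the
# A-to-Z mapping prefers structurally similar glyphs.
a = [0, 32, 64, 96, 128, 160, 192, 224, 255]
b = list(reversed(a))
```

Note that the same score can also serve as the matching criterion between generated "A" glyphs and existing fonts, as described for Figure 16.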
VIII. CONCLUSION

In this paper, we proposed perceptual font manifolds using the latent space of a generative model and the perceptual study carried out in this research. For the generative model, we employed the VAE network architecture with five latent dimensions on the "A" dataset, whereas for the perceptual study, we obtained the distribution data of the POP, formal, and casual styles of fonts in the latent space. We reduced the data to two dimensions using manifold learning and made heat maps of each distribution using the kernel density estimation method. Finally, we proposed a user interface for font exploration using our perceptual font manifolds.

In the case study of this research, the participants were asked to explore target font images using both the traditional font exploration interface and the user interface proposed in this paper. It was verified that our proposed user interface requires less time while keeping good exploration accuracy. Furthermore, the exploration results of different perception styles revealed that our proposed system can achieve the desired style of fonts, suitable not only for the "A" font but also for all character fonts.

As a limitation of this research, the proposed system only handles "A" fonts. Though we have matched all the fonts by finding the closest position in the "A" font manifold, accuracy for other character fonts may be lost. In order to address this issue, the supervised font style transfer employed in [4] should be utilized. Besides, many fonts are defined by outline shapes, as represented by the TrueType format; an outline font expresses the font data as curve parameters and responds appropriately to scaling. We also consider the exploration of a perceptual manifold of outline fonts. As possible future research, the concept of the perceptual study of generative models may be applied to other research targets such as human faces and foods.

REFERENCES

[1] Google Fonts (online), https://fonts.google.com/
[2] Font Squirrel (online), https://www.fontsquirrel.com/
[3] Neill D. F. Campbell and Jan Kautz: Learning a Manifold of Fonts, ACM Transactions on Graphics, Vol. 33, No. 4 (2014).
[4] Paul Upchurch, Noah Snavely, and Kavita Bala: From A to Z: Supervised Transfer of Style and Content Using Deep Neural Network Generators, arXiv:1603.02003 (2016).
[5] Samaneh Azadi, Matthew Fisher, Vladimir Kim, Zhaowen Wang, Eli Shechtman, and Trevor Darrell: Multi-Content GAN for Few-Shot Font Style Transfer, arXiv:1712.00516 (2017).
[6] H. Mehta, S. Singla, and A. Mahajan: Optical Character Recognition (OCR) System for Roman Script English Language Using Artificial Neural Network (ANN) Classifier, International Conference on Research Advances in Integrated Navigation Systems (RAINS), Bangalore (2016).
[7] Saemi Choi, Kiyoharu Aizawa, and Nicu Sebe: FontMatcher: Font Image Pairing for Harmonious Digital Graphic Design, ACM, pp. 37–41 (2018).
[8] Yuan Guo, Zhouhui Lian, Yingmin Tang, and Jianguo Xiao: Creating New Chinese Fonts Based on Manifold Learning and Adversarial Networks, Eurographics (2018).
[9] Diederik P. Kingma and Max Welling: Auto-Encoding Variational Bayes, Proceedings of the International Conference on Learning Representations (ICLR) (2014).
[10] Danilo J. Rezende, Shakir Mohamed, and Daan Wierstra: Stochastic Backpropagation and Approximate Inference in Deep Generative Models, arXiv:1401.4082 (2014).
[11] Keras: The Python Deep Learning Library (online), https://keras.io/
[12] Laurens van der Maaten and Geoffrey Hinton: Visualizing Data Using t-SNE, Journal of Machine Learning Research, Vol. 9, pp. 2579–2605 (2008).
[13] B. W. Silverman: Density Estimation for Statistics and Data Analysis (1986).
[14] Zhou Wang, Alan Conrad Bovik, Hamid Rahim Sheikh, and Eero P. Simoncelli: Image Quality Assessment: From Error Visibility to Structural Similarity, IEEE Transactions on Image Processing, Vol. 13, No. 4 (2004).