HoroPCA: Hyperbolic Dimensionality Reduction via Horospherical Projections
Ines Chami*1  Albert Gu*1  Dat Nguyen*1  Christopher Ré1

Abstract

This paper studies Principal Component Analysis (PCA) for data lying in hyperbolic spaces. Given directions, PCA relies on: (1) a parameterization of subspaces spanned by these directions, (2) a method of projection onto subspaces that preserves information in these directions, and (3) an objective to optimize, namely the variance explained by projections. We generalize each of these concepts to the hyperbolic space and propose HoroPCA, a method for hyperbolic dimensionality reduction. By focusing on the core problem of extracting principal directions, HoroPCA theoretically better preserves information in the original data such as distances, compared to previous generalizations of PCA. Empirically, we validate that HoroPCA outperforms existing dimensionality reduction methods, significantly reducing error in distance preservation. As a data whitening method, it improves downstream classification by up to 3.9% compared to methods that don't use whitening. Finally, we show that HoroPCA can be used to visualize hyperbolic data in two dimensions.

1. Introduction

Learning representations of data in hyperbolic spaces has recently attracted important interest in Machine Learning (ML) (Nickel & Kiela, 2017; Sala et al., 2018) due to their ability to represent hierarchical data with high fidelity in low dimensions (Sarkar, 2011). Many real-world datasets exhibit hierarchical structures, and hyperbolic embeddings have led to state-of-the-art results in applications such as question answering (Tay et al., 2018), node classification (Chami et al., 2019), link prediction (Balazevic et al., 2019; Chami et al., 2020b) and word embeddings (Tifrea et al., 2018). These developments motivate the need for algorithms that operate in hyperbolic spaces, such as nearest neighbor search (Krauthgamer & Lee, 2006; Wu & Charikar, 2020), hierarchical clustering (Monath et al., 2019; Chami et al., 2020a), or dimensionality reduction, which is the focus of this work.

Euclidean Principal Component Analysis (PCA) is a fundamental dimensionality reduction technique which seeks directions that best explain the original data. PCA is an important primitive in data analysis and has many important uses such as (i) dimensionality reduction (e.g. for memory efficiency), (ii) data whitening and pre-processing for downstream tasks, and (iii) data visualization.

Here, we seek a generalization of PCA to hyperbolic geometry. Given a core notion of directions, PCA involves the following ingredients:
1. A nested sequence of affine subspaces (flags) spanned by a set of directions.
2. A projection method which maps points to these subspaces while preserving information (e.g. dot-product) along each direction.
3. A variance objective to help choose the best directions.

The PCA algorithm is then defined as combining these primitives: given a dataset, it chooses directions that maximize the variance of projections onto a subspace spanned by those directions, so that the resulting sequence of directions optimally explains the data. Crucially, the algorithm only depends on the directions of the affine subspaces and not their locations in space (Fig. 1a). Thus, in practice we can assume that they all go through the origin (and hence become linear subspaces), which greatly simplifies computations.

Generalizing PCA to manifolds is a challenging problem that has been studied for decades, starting with Principal Geodesic Analysis (PGA) (Fletcher et al., 2004), which parameterizes subspaces using tangent vectors at the mean of the data and maximizes distances from projections to the mean to find optimal directions. More recently, the Barycentric Subspace Analysis (BSA) method (Pennec, 2018) was introduced. It finds a more general parameterization of nested sequences of submanifolds by minimizing the unexplained variance. However, both PGA and BSA map points onto submanifolds using closest-point or geodesic projections, which do not attempt to preserve information along principal directions; for example, they cannot isometrically preserve hyperbolic distances between any points, and they shrink all path lengths by an exponential factor (Proposition 3.5).

*Equal contribution. 1Stanford University, CA, USA. Correspondence to: Ines Chami. Proceedings of the 38th International Conference on Machine Learning, PMLR 139, 2021. Copyright 2021 by the author(s).
Fundamentally, all previous methods only look for subspaces rather than directions that explain the data, and can perhaps be better understood as principal subspace analysis rather than principal component analysis. Like with PCA, they assume that all optimal subspaces go through a chosen base point, but unlike in the Euclidean setting, this assumption is now unjustified: "translating" the submanifolds does not preserve distances between projections (Fig. 1b). Furthermore, the dependence on the base point is sensitive: as noted above, the shrink factor of the projection depends exponentially on the distances between the subspaces and the data. Thus, having to choose a base point increases the number of necessary parameters and reduces stability.

Here, we propose HoroPCA, a dimensionality reduction method for data defined in hyperbolic spaces which better preserves the properties of Euclidean PCA. We show how to interpret directions using points at infinity (or ideal points), which then allows us to generalize core properties of PCA:
1. To generalize the notion of affine subspace, we propose parameterizing geodesic subspaces as the sets spanned by these ideal points. This yields multiple viable nested subspaces (flags) (Section 3.1).
2. To maximally preserve information in the original data, we propose a new projection method that uses horospheres, a generalization of complementary directions for hyperbolic space. In contrast with geodesic projections, these projections exactly preserve information – specifically, distances to ideal points – along each direction. Consequently, they preserve distances between points much better than geodesic projections (Section 3.2).
3. Finally, we introduce a simple generalization of explained variance that is a function of distances only and can be computed in hyperbolic space (Section 3.3).

Combining these notions, we propose an algorithm that seeks a sequence of principal components that best explain variations in hyperbolic data. We show that this formulation retains the location-independence property of PCA: translating target submanifolds along orthogonal directions (horospheres) preserves projected distances (Fig. 1c). In particular, the algorithm's objective depends only on the directions and not the locations of the submanifolds (Section 4).

We empirically validate HoroPCA on real datasets and for three standard PCA applications. First, (i) we show that it yields much lower distortion and higher explained variance than existing methods, reducing average distortion by up to 77%. Second, (ii) we validate that it can be used for data pre-processing, improving downstream classification by up to 3.8% in Average Precision score compared to methods that don't use whitening. Finally, (iii) we show that the low-dimensional representations learned by HoroPCA can be visualized to qualitatively interpret hyperbolic data.

Figure 1: Given datapoints (black dots), Euclidean and horospherical projections preserve distance information across different subspaces (black lines) pointing towards the same direction or point at infinity, while geodesic projections do not. (a) Euclidean projections: distances between points, and therefore the explained variance, are invariant to translations along orthogonal directions (red lines). (b) Hyperbolic geodesic projections do not preserve this property: distances between green projections are not the same as distances between blue projections. (c) Hyperbolic horospherical projections project data by sliding it along the complementary direction defined by horospheres (red circles), and there exists an isometric mapping between the blue and green projections.

2. Background

We first review some basic notions from hyperbolic geometry; a more in-depth treatment is available in standard texts (Lee, 2013). We discuss the generalization of coordinates and directions in hyperbolic space and then review geodesic projections. We finally describe generalizations of the notion of mean and variance to non-Euclidean spaces.
2.1. The Poincaré Model of Hyperbolic Space

Hyperbolic geometry is a Riemannian geometry with constant negative curvature −1, where curvature measures deviation from flat Euclidean geometry. For easier visualizations, we work with the d-dimensional Poincaré model of hyperbolic space: H^d = {x ∈ R^d : ‖x‖ < 1}, where ‖·‖ is the Euclidean norm. In this model, the Riemannian distance can be computed in cartesian coordinates by:

    d_H(x, y) = cosh⁻¹( 1 + 2 ‖x − y‖² / ((1 − ‖x‖²)(1 − ‖y‖²)) ).    (1)

Geodesics  Shortest paths in hyperbolic space are called geodesics. In the Poincaré model, they are represented by straight segments going through the origin and circular arcs perpendicular to the boundary of the unit ball (Fig. 2).

Geodesic submanifolds  A submanifold M ⊂ H^d is called (totally) geodesic if for every x, y ∈ M, the geodesic line connecting x and y belongs to M. This generalizes the notion of affine subspaces in Euclidean spaces. In the Poincaré model, geodesic submanifolds are represented by linear subspaces going through the origin and spherical caps perpendicular to the boundary of the unit ball.

2.2. Directions in Hyperbolic Space

The notions of directions, and of coordinates in a given direction, can be generalized to hyperbolic spaces as follows.

Ideal points  As with parallel rays in Euclidean spaces, geodesic rays in H^d that stay close to each other can be viewed as sharing a common endpoint at infinity, also called an ideal point. Intuitively, ideal points represent directions along which points in H^d can move toward infinity. The set of ideal points, called the boundary at infinity of H^d, is represented by the unit sphere S^{d−1}_∞ = {‖x‖ = 1} in the Poincaré model. We abuse notation and say that a geodesic submanifold M ⊂ H^d contains an ideal point p if the boundary of M in S^{d−1}_∞ contains p.

Busemann coordinates  In Euclidean spaces, each direction can be represented by a unit vector w. The coordinate of a point x in the direction of w is simply the dot product w · x. In hyperbolic geometry, directions can be represented by ideal points, but dot products are not well-defined. Instead, we take a ray-based perspective: note that in Euclidean spaces, if we shoot a ray in the direction of w from the origin, the coordinate w · x can be viewed as the normalized distance to infinity in the direction of that ray. In other words, as a point y = tw (t > 0) moves toward infinity in the direction of w:

    w · x = lim_{t→∞} ( d(0, tw) − d(x, tw) ).

This approach generalizes to other geometries: given a unit-speed geodesic ray γ(t), the Busemann function B_γ(x) of γ is defined as:¹

    B_γ(x) = lim_{t→∞} ( d(x, γ(t)) − t ).

Up to an additive constant, this function only depends on the endpoint at infinity of the geodesic ray, and not the starting point γ(0). Thus, given an ideal point p, we define the Busemann function B_p(x) of p to be the Busemann function of the geodesic ray that starts from the origin of the unit ball model and has endpoint p. Intuitively, it represents the coordinate of x in the direction of p. In the Poincaré model, there is a closed formula:

    B_p(x) = ln( ‖p − x‖² / (1 − ‖x‖²) ).

¹ Note that compared to the above formula, the sign convention is flipped due to historical reasons.

Table 1: Analogies of components and their corresponding coordinates, in both Euclidean and hyperbolic space.

| Notion     | Euclidean          | Hyperbolic             |
| Component  | Unit vector w      | Ideal point p          |
| Coordinate | Dot product x · w  | Busemann func. B_p(x)  |

Horospheres  The level sets of Busemann functions B_p(x) are called horospheres centered at p. In this sense, they resemble spheres, which are level sets of distance functions. However, intrinsically as Riemannian manifolds, horospheres have curvature zero and thus also exhibit many properties of planes in Euclidean spaces. Every geodesic with endpoint p is orthogonal to every horosphere centered at p. Given two horospheres with the same center, every orthogonal geodesic segment connecting them has the same length. In this sense, concentric horospheres resemble parallel planes in Euclidean spaces. In the Poincaré model, horospheres are Euclidean spheres that touch the boundary sphere S^{d−1}_∞ at their ideal centers (Fig. 2). Given an ideal point p and a point x in H^d, there is a unique horosphere S(p, x) passing through x and centered at p.

Figure 2: Hyperbolic geodesics (black lines) going through an ideal point (in red), and horospheres (red circles) centered at that same ideal point. The hyperbolic lengths of geodesic segments between two horospheres are equal.

2.3. Geodesic Projections

PCA uses orthogonal projections to project data onto subspaces. Orthogonal projections are usually generalized to other geometries as closest-point projections. Given a target submanifold M, each point x in the ambient space is mapped to the closest point to it in M:

    π^G_M(x) = argmin_{y ∈ M} d_H(x, y).

One could view π^G_M(·) as the map that pushes each point x along an orthogonal geodesic until it hits M. For this reason, it is also called geodesic projection. In the Poincaré model, these can be computed in closed form (see Appendix C).
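To make Eq. (1) and the Busemann formula concrete, the following is a small numerical sketch in Python/NumPy. It is our own illustration rather than the authors' released code, and the function names are ours:

```python
import numpy as np

def poincare_dist(x, y):
    """Hyperbolic distance of Eq. (1) between two points of the Poincare ball."""
    sq_diff = np.sum((x - y) ** 2)
    denom = (1 - np.sum(x ** 2)) * (1 - np.sum(y ** 2))
    return np.arccosh(1 + 2 * sq_diff / denom)

def busemann(p, x):
    """Busemann coordinate B_p(x) of x in the direction of the ideal point p (with ||p|| = 1)."""
    return np.log(np.sum((p - x) ** 2) / (1 - np.sum(x ** 2)))

# Sanity checks: the origin has Busemann coordinate 0, and moving toward p makes it decrease.
p = np.array([1.0, 0.0])
print(busemann(p, np.zeros(2)))                           # 0.0
print(busemann(p, np.array([0.5, 0.0])))                  # ln(1/3) < 0: closer to p
print(poincare_dist(np.zeros(2), np.array([0.5, 0.0])))   # 2*artanh(0.5) ~ 1.0986
```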
2.4. Manifold Statistics

PCA relies on data statistics which do not generalize easily to hyperbolic geometry. One approach to generalize the arithmetic mean is to notice that it is the minimizer of the sum of squared distances to the inputs. Motivated by this, the Fréchet mean (Fréchet, 1948) of a set of points S in a Riemannian manifold (M, d_M) is defined as:

    μ_M(S) := argmin_{y ∈ M} Σ_{x ∈ S} d_M(x, y)².

This definition only depends on the intrinsic distance of the manifold. For hyperbolic spaces, since squared distance functions are convex, μ(S) always exists and is unique.² Analogously, the Fréchet variance is defined as:

    σ_M²(S) := (1/|S|) Σ_{x ∈ S} d_M(x, μ(S))².    (2)

We refer to (Huckemann & Eltzner, 2020) for a discussion of different intrinsic statistics in non-Euclidean spaces, and a study of their asymptotic properties.

² For more general geometries, existence and uniqueness hold if the data is well-localized (Kendall, 1990).
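Because the Fréchet mean has no closed form in hyperbolic space, it is typically approximated iteratively. The sketch below is a deliberately simple illustration of ours (finite-difference gradients on the squared-distance objective, plain gradient descent on a (n, d) array of ball points); a serious implementation would use Riemannian gradients and exponential-map updates instead.

```python
import numpy as np

def poincare_dist(x, y):
    return np.arccosh(1 + 2 * np.sum((x - y) ** 2) /
                      ((1 - np.sum(x ** 2)) * (1 - np.sum(y ** 2))))

def frechet_mean(points, lr=0.02, steps=500, eps=1e-6):
    """Approximate Frechet mean of an (n, d) array of points in the Poincare ball."""
    mu = points.mean(axis=0) * 0.5            # heuristic initialization inside the ball
    for _ in range(steps):
        grad = np.zeros_like(mu)
        for i in range(len(mu)):              # finite-difference gradient of the objective
            e = np.zeros_like(mu)
            e[i] = eps
            f_plus = sum(poincare_dist(mu + e, x) ** 2 for x in points)
            f_minus = sum(poincare_dist(mu - e, x) ** 2 for x in points)
            grad[i] = (f_plus - f_minus) / (2 * eps)
        mu = mu - lr * grad / len(points)
        nrm = np.linalg.norm(mu)
        if nrm >= 0.999:                      # keep the iterate strictly inside the ball
            mu *= 0.999 / nrm
    return mu

def frechet_variance(points, mu):
    """Frechet variance of Eq. (2)."""
    return np.mean([poincare_dist(x, mu) ** 2 for x in points])
```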
3. Generalizing PCA to the Hyperbolic Space

We now describe our approach to generalize PCA to hyperbolic spaces. The starting point of HoroPCA is to pick 1 ≤ K ≤ d ideal points p_1, ..., p_K ∈ S^{d−1}_∞ to represent K directions in hyperbolic space (Section 2.2). Then, we generalize the core concepts of Euclidean PCA. In Section 3.1, we show how to generalize flags. In Section 3.2, we show how to project points onto the lower-dimensional submanifold spanned by a given set of directions, while preserving information along each direction. In Section 3.3, we introduce a variance objective to optimize and show that it is a function of the directions only.

3.1. Hyperbolic Flags

In Euclidean spaces, one can take the linear spans of more and more components to define a nested sequence of linear subspaces, called a flag. To generalize this to hyperbolic spaces, we first need to adapt the notion of linear/affine spans. Recall that geodesic submanifolds are generalizations of affine subspaces in Euclidean spaces.

Definition 3.1. Given a set of points S (that could be inside H^d or on the boundary sphere S^{d−1}_∞), the smallest geodesic submanifold of H^d that contains S is called the geodesic hull of S and denoted by GH(S).

Thus, given K ideal points p_1, p_2, ..., p_K and a base point b ∈ H^d, we can define a nested sequence of geodesic submanifolds GH(b, p_1) ⊂ GH(b, p_1, p_2) ⊂ ··· ⊂ GH(b, p_1, ..., p_K). This will be our notion of flags.

Remark 3.2. The base point b is only needed here for technical reasons, just like an origin o is needed to define linear spans in Euclidean spaces. We will see next that it does not affect the projection results or objectives (Theorem 4.1).

Remark 3.3. We assume that none of b, p_1, ..., p_K are in the geodesic hull of the other K points. This is analogous to being linearly independent in Euclidean spaces.

3.2. Projections via Horospheres

In Euclidean PCA, points are projected to the subspaces spanned by the given directions in a way that preserves coordinates in those directions. We seek a projection method in hyperbolic spaces with a similar property. Recall that coordinates are generalized by Busemann functions (Table 1), and that horospheres are level sets of Busemann functions. Thus, we propose a projection that preserves coordinates by moving points along horospheres. It turns out that this projection method also preserves distances better than the traditional geodesic projection. As a toy example, we first show how the projection is defined in the K = 1 case (i.e. projecting onto a geodesic) and why it tends to preserve distances well. We will then show how to use K ≥ 1 ideal points simultaneously.

3.2.1. Projecting onto K = 1 Directions

For K = 1, we have one ideal point p and base point b, and the geodesic hull GH(b, p) is just a geodesic γ. Our goal is to map every x ∈ H^d to a point π^H_{b,p}(x) on γ that has the same Busemann coordinate in the direction of p:

    B_p(x) = B_p(π^H_{b,p}(x)).

Since level sets of B_p(x) are horospheres centered at p, the above equation simply says that π^H_{b,p}(x) belongs to the horosphere S(p, x) centered at p and passing through x. Thus, we define:

    π^H_{b,p}(x) := γ ∩ S(p, x).    (3)
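In the special case where the base point b is the origin o (which, as Section 4 argues, can be assumed without loss of generality), Eq. (3) has a simple closed form: the geodesic GH(o, p) is the diameter of the ball along p, and one can check that the point on it whose Busemann coordinate equals B lies at hyperbolic distance −B from the origin, i.e. at tanh(−B/2)·p. The sketch below is our own illustration of this special case, not the authors' general implementation:

```python
import numpy as np

def busemann(p, x):
    return np.log(np.sum((p - x) ** 2) / (1 - np.sum(x ** 2)))

def horo_project_onto_geodesic(p, x):
    """Horospherical projection of x onto the geodesic through the origin with ideal endpoint p.

    Assumes the base point is the origin: the projection then lies on the diameter
    spanned by p, at the position whose Busemann coordinate matches B_p(x).
    """
    B = busemann(p, x)
    return np.tanh(-B / 2.0) * p

# The projection preserves the Busemann coordinate, as required by Eq. (3):
p = np.array([0.0, 1.0])
x = np.array([0.3, 0.4])
x_proj = horo_project_onto_geodesic(p, x)
print(busemann(p, x), busemann(p, x_proj))  # equal up to floating-point error
```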
Another important property that π^H_{b,p}(·) shares with orthogonal projections in Euclidean spaces is that it preserves distances along a direction – lengths of geodesic segments that point to p are preserved after projection (Fig. 3):

Proposition 3.4. For any x ∈ H^d, if y ∈ GH(x, p) then:

    d_H(π^H_{b,p}(x), π^H_{b,p}(y)) = d_H(x, y).

Proof. This follows from the remark in Section 2.2 about horospheres: every geodesic going through p is orthogonal to every horosphere centered at p, and every orthogonal geodesic segment connecting concentric horospheres has the same length (Fig. 2). In this case, the segments from x to y and from π^H_{b,p}(x) to π^H_{b,p}(y) are two such segments, connecting S(p, x) and S(p, y).

Figure 3: x′, y′ are horospherical (green) projections of x, y. Proposition 3.4 shows d_H(x′, y′) = d_H(x, y). The distance between the two geodesic (blue) projections is smaller.

3.2.2. Projecting onto K > 1 Directions

We now generalize the above construction to projections onto higher-dimensional submanifolds. We describe the main ideas here; Appendix A contains more details, including an illustration in the case K = 2 (Fig. 5).

Fix a base point b ∈ H^d and K > 1 ideal points {p_1, ..., p_K}. We want to define a map from H^d to M := GH(b, p_1, ..., p_K) that preserves the Busemann coordinates in the directions of p_1, ..., p_K, i.e.:

    B_{p_j}(x) = B_{p_j}(π^H_{b,p_1,...,p_K}(x))  for every j = 1, ..., K.

As before, the idea is to take the intersection with the horospheres centered at the p_j's and passing through x:

    π^H_{b,p_1,...,p_K} : H^d → M,   x ↦ M ∩ S(p_1, x) ∩ ··· ∩ S(p_K, x).

It turns out that this intersection generally consists of two points instead of one. When that happens, one of them will be strictly closer to the base point b, and we define π^H_{b,p_1,...,p_K}(x) to be that point.

As with Proposition 3.4, π^H_{b,p_1,...,p_K}(·) preserves distances along K-dimensional manifolds (Corollary A.10). In contrast, geodesic projections in hyperbolic spaces never preserve distances (except between points already in the target):

Proposition 3.5. Let M ⊂ H^d be a geodesic submanifold. Then every geodesic segment I of distance at least r from M gets at least cosh(r) times shorter under the geodesic projection π^G_M(·) to M:

    length(π^G_M(I)) ≤ (1 / cosh(r)) · length(I).

In particular, the shrink factor grows exponentially as the segment I moves away from M. The proof is in Appendix B.

Computation  Interestingly, horosphere projections can be computed without actually computing the horospheres. The key idea is that if we let P = GH(p_1, ..., p_K) be the geodesic hull of the horospheres' centers, then the intersection S(p_1, x) ∩ ··· ∩ S(p_K, x) is simply the orbit of x under the rotations around P. (This is true for the same reason that spheres whose centers lie on the same axis must intersect along a circle around that axis.) Thus, π^H_{b,p_1,...,p_K}(·) can be viewed as the map that rotates x around P until it hits M. It follows that it can be computed by:
1. Find the geodesic projection c = π^G_P(x) of x onto P.
2. Find the geodesic α on M that is orthogonal to P at c.
3. Among the two points on α whose distance to c equals d_H(x, c), return the one closer to b.
The detailed computations and proof that this recovers horospherical projections are provided in Appendix A.

3.3. Intrinsic Variance Objective

In Euclidean PCA, directions are chosen to maximally preserve information from the original data. In particular, PCA chooses directions that maximize the Euclidean variance of projected data. To generalize this to hyperbolic geometry, we define an analog of variance that is intrinsic, i.e. dependent only on the distances between data points. As we will see in Section 4, having an intrinsic objective helps make our algorithm location (or base point) independent.

The usual notion of Euclidean variance is the sum of squared distances to the mean of the projected datapoints. Generalizing this is challenging because non-Euclidean spaces do not have a canonical choice of mean. Previous works have generalized variance either by using the unexplained variance or the Fréchet variance. The former is the sum of squared residual distances to the projections, and thus avoids computing a mean. However, it is not intrinsic. The latter is intrinsic (Fletcher et al., 2004) but involves finding the Fréchet mean, which is not necessarily a canonical notion of mean and can only be computed by gradient descent.

Our approach uses the observation that in Euclidean space:

    σ²(S) = (1/n) Σ_{x ∈ S} ‖x − μ(S)‖² = (1/(2n²)) Σ_{x,y ∈ S} ‖x − y‖².

Thus, we propose the following generalization of variance:

    σ_H²(S) = (1/(2n²)) Σ_{x,y ∈ S} d_H(x, y)².    (4)

This function agrees with the usual variance in Euclidean space, while being a function of distances only. Thus it is well defined in non-Euclidean space, is easily computed, and, as we will show next, has the desired invariance due to the isometry properties of horospherical projections.
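Eq. (4) only requires pairwise hyperbolic distances, so it can be evaluated directly; a brief sketch of ours (not the released implementation):

```python
import numpy as np

def poincare_dist(x, y):
    return np.arccosh(1 + 2 * np.sum((x - y) ** 2) /
                      ((1 - np.sum(x ** 2)) * (1 - np.sum(y ** 2))))

def hyperbolic_variance(points):
    """Intrinsic variance of Eq. (4): normalized sum of squared pairwise hyperbolic distances."""
    n = len(points)
    total = sum(poincare_dist(points[i], points[j]) ** 2
                for i in range(n) for j in range(n))
    return total / (2 * n ** 2)
```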
4. HoroPCA

Section 3 formulated several simple primitives – including directions, flags, projections, and variance – in a way that is generalizable to hyperbolic geometry. We now revisit standard PCA, showing how it has a simple definition that combines these primitives using optimization. This directly leads to the full HoroPCA algorithm by simply using the hyperbolic analogs of these primitives.

Euclidean PCA  Given a dataset S and a target dimension K, Euclidean PCA greedily finds a sequence of principal components p_1, ..., p_K that maximizes the variance of orthogonal projections π^E_{o,p_1,...,p_k}(·) onto the linear subspaces spanned by these components:³

    p_1 = argmax_{‖p‖=1} σ²(π^E_{o,p}(S)),   and   p_{k+1} = argmax_{‖p‖=1} σ²(π^E_{o,p_1,...,p_k,p}(S)).

Thus, for each 1 ≤ k ≤ K, {p_1, ..., p_k} is the optimal set of directions of dimension k.

³ Here o denotes the origin, and π^E_{o,p_1,...,p_k}(·) denotes the projection onto the affine span of {o, p_1, ..., p_k}, which is equivalent to the linear span of {p_1, ..., p_k}.

HoroPCA  Because we have generalized the notions of flag, projection, and variance to hyperbolic geometry, the HoroPCA algorithm can be defined in the same fashion. Given a dataset S in H^d and a base point b ∈ H^d, we seek a sequence of K directions that maximizes the variance of horosphere-projected data:

    p_1 = argmax_{p ∈ S^{d−1}_∞} σ_H²(π^H_{b,p}(S)),   and   p_{k+1} = argmax_{p ∈ S^{d−1}_∞} σ_H²(π^H_{b,p_1,...,p_k,p}(S)).    (5)

Base point independence  Finally, we show that algorithm (5) always returns the same results regardless of the choice of a base point b ∈ H^d. Since our variance objective only depends on the distances between projected data points, it suffices to show that these distances do not depend on b.

Theorem 4.1. For any b, b′ and any x, y ∈ H^d, the two projected distances d_H(π^H_{b,p_1,...,p_K}(x), π^H_{b,p_1,...,p_K}(y)) and d_H(π^H_{b′,p_1,...,p_K}(x), π^H_{b′,p_1,...,p_K}(y)) are equal.

The proof is included in Appendix A. Thus, HoroPCA retains the location-independence property of Euclidean PCA: only the directions of target subspaces matter; their exact locations do not (Fig. 1). Therefore, just like in the Euclidean setting, we can assume without loss of generality that b is the origin o of the Poincaré model. This:
1. alleviates the need to use d extra parameters to search for an appropriate base point, and
2. simplifies computations, since in the Poincaré model, geodesic submanifolds that go through the origin are simply linear subspaces, which are easier to deal with.

After computing the principal directions which span the target M = GH(o, p_1, ..., p_K), the reduced-dimensionality data can be found by applying a Euclidean rotation that sends M to H^K, which also preserves hyperbolic distances.
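To illustrate how the greedy objective (5) can be optimized in the simplest setting, the sketch below searches for the first component only (K = 1), with the base point fixed at the origin. In that case the horospherical projection onto GH(o, p) preserves the Busemann coordinate B_p, and distances between projected points are exactly the absolute differences of these coordinates, so maximizing σ_H² of the projections amounts to maximizing the variance of the Busemann coordinates. This is a toy illustration of ours under those assumptions (finite-difference gradients, no careful step-size control); it is not the authors' implementation, which handles general K.

```python
import numpy as np

def busemann(p, x):
    return np.log(np.sum((p - x) ** 2) / (1 - np.sum(x ** 2)))

def first_horopca_direction(points, lr=0.1, steps=300, fd_eps=1e-5, seed=0):
    """Toy search for the first HoroPCA direction (K = 1, base point at the origin).

    Maximizes the variance of the Busemann coordinates B_p(x_i) over ideal points p
    on the unit sphere, using finite-difference gradients and projection to the sphere.
    points: (n, d) NumPy array of Poincare-ball coordinates.
    """
    rng = np.random.default_rng(seed)
    p = rng.normal(size=points.shape[1])
    p /= np.linalg.norm(p)

    def objective(v):
        v = v / np.linalg.norm(v)                  # evaluate on the ideal boundary
        coords = np.array([busemann(v, x) for x in points])
        return coords.var()

    for _ in range(steps):
        grad = np.zeros_like(p)
        for i in range(len(p)):                    # finite-difference gradient
            e = np.zeros_like(p)
            e[i] = fd_eps
            grad[i] = (objective(p + e) - objective(p - e)) / (2 * fd_eps)
        p = p + lr * grad                          # gradient ascent step
        p /= np.linalg.norm(p)                     # project back to the unit sphere
    return p
```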
5. Experiments

We now validate the empirical benefits of HoroPCA on three PCA uses. First, for dimensionality reduction, HoroPCA preserves information (distances and variance) better than previous methods, which are sensitive to base point choices and distort distances more (Section 5.2). Next, we validate that our notion of hyperbolic coordinates captures variation in the data and can be used for whitening in classification tasks (Section 5.3). Finally, we visualize the representations learned by HoroPCA in two dimensions (Section 5.4).

5.1. Experimental Setup

Baselines  We compare HoroPCA to several dimensionality reduction methods, including: (1) Euclidean PCA, which should perform poorly on hyperbolic data, (2) exact PGA, (3) Tangent PCA (tPCA), which approximates PGA by moving the data to the tangent space of the Fréchet mean and then solving Euclidean PCA, (4) BSA, (5) Hyperbolic Multi-dimensional Scaling (hMDS) (Sala et al., 2018), which takes a distance matrix as input and recovers a configuration of points that best approximates these distances, and (6) a hyperbolic autoencoder (hAE) trained with gradient descent (Ganea et al., 2018; Hinton & Salakhutdinov, 2006). To demonstrate their dependence on base points, we also include two baselines that perturb the base point in PGA and BSA. We open-source our implementation⁴ and refer to Appendix C for implementation details for all baselines and HoroPCA.

⁴ https://github.com/HazyResearch/HoroPCA
Datasets  For dimensionality reduction experiments, we consider standard hierarchical datasets previously used to evaluate the benefits of hyperbolic embeddings. More specifically, we use the datasets in (Sala et al., 2018), including a fully balanced tree, a phylogenetic tree, a biological graph of disease relationships, and a graph of Computer Science (CS) Ph.D. advisor-advisee relationships. These datasets have respectively 40, 344, 516 and 1025 nodes, and we use the code from (Gu et al., 2018) to embed them in the Poincaré ball. For data whitening experiments, we reproduce the experimental setup from (Cho et al., 2019) and use the Polbooks, Football and Polblogs datasets, which have 105, 115 and 1224 nodes each. These real-world networks are embedded in two dimensions using Chamberlain et al. (2017)'s embedding method.

Evaluation metrics  To measure distance preservation after projection, we use average distortion. If π(·) denotes a mapping from high- to low-dimensional representations, the average distortion of a dataset S is computed as:

    (2 / (|S|(|S| − 1))) Σ_{x ≠ y ∈ S} |d_H(π(x), π(y)) − d_H(x, y)| / d_H(x, y).

We also measure the Fréchet variance in Eq. (2), which is the analogue of the objective that Euclidean PCA optimizes.⁵ Note that the mean in Eq. (2) cannot be computed in closed form, and we therefore compute it with gradient descent.

⁵ All mentioned PCA methods, including HoroPCA, optimize for some form of variance, but not Fréchet variance or distortion.
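The average distortion above is straightforward to compute from the two sets of pairwise distances; a small sketch of ours:

```python
import numpy as np
from itertools import combinations

def poincare_dist(x, y):
    return np.arccosh(1 + 2 * np.sum((x - y) ** 2) /
                      ((1 - np.sum(x ** 2)) * (1 - np.sum(y ** 2))))

def average_distortion(X_high, X_low):
    """Mean relative error of pairwise hyperbolic distances after dimensionality reduction."""
    errors = []
    for i, j in combinations(range(len(X_high)), 2):
        d_orig = poincare_dist(X_high[i], X_high[j])
        d_proj = poincare_dist(X_low[i], X_low[j])
        errors.append(abs(d_proj - d_orig) / d_orig)
    return float(np.mean(errors))
```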
Table 2: Dimensionality reduction results on 10-dimensional hyperbolic embeddings reduced to two dimensions. Results are averaged over 5 runs for non-deterministic methods. Best in bold and second best underlined. For each dataset we report average distortion (↓, lower is better) and explained Fréchet variance (↑, higher is better).

| Method    | Balanced Tree dist. (↓) | Balanced Tree var. (↑) | Phylo Tree dist. (↓) | Phylo Tree var. (↑) | Diseases dist. (↓) | Diseases var. (↑) | CS Ph.D. dist. (↓) | CS Ph.D. var. (↑) |
| PCA       | 0.84 | 0.34 | 0.94 | 0.40 | 0.90 | 0.26 | 0.84 | 1.68 |
| tPCA      | 0.70 | 1.16 | 0.63 | 14.34 | 0.63 | 3.92 | 0.56 | 11.09 |
| PGA       | 0.63 ± 0.07 | 2.11 ± 0.47 | 0.64 ± 0.01 | 15.29 ± 0.51 | 0.66 ± 0.02 | 3.16 ± 0.39 | 0.73 ± 0.02 | 6.14 ± 0.60 |
| PGA-Noise | 0.87 ± 0.08 | 0.29 ± 0.30 | 0.64 ± 0.02 | 15.08 ± 0.77 | 0.88 ± 0.04 | 0.53 ± 0.19 | 0.79 ± 0.03 | 4.58 ± 0.64 |
| BSA       | 0.50 ± 0.00 | 3.02 ± 0.01 | 0.61 ± 0.03 | 18.60 ± 1.16 | 0.52 ± 0.02 | 5.95 ± 0.25 | 0.70 ± 0.01 | 8.15 ± 0.96 |
| BSA-Noise | 0.74 ± 0.12 | 1.06 ± 0.67 | 0.68 ± 0.02 | 13.71 ± 0.72 | 0.80 ± 0.11 | 1.62 ± 1.30 | 0.79 ± 0.02 | 4.41 ± 0.59 |
| hAE       | 0.26 ± 0.00 | 6.91 ± 0.00 | 0.32 ± 0.04 | 45.87 ± 3.52 | 0.18 ± 0.00 | 14.23 ± 0.06 | 0.37 ± 0.02 | 22.12 ± 2.47 |
| hMDS      | 0.22 | 7.54 | 0.74 | 40.51 | 0.21 | 15.05 | 0.83 | 19.93 |
| HoroPCA   | 0.19 ± 0.00 | 7.15 ± 0.00 | 0.13 ± 0.01 | 69.16 ± 1.96 | 0.15 ± 0.01 | 15.46 ± 0.19 | 0.16 ± 0.02 | 36.79 ± 0.70 |

5.2. Dimensionality Reduction

We report metrics for the reduction of 10-dimensional embeddings to two dimensions in Table 2, and refer to Appendix C for additional results, such as more component and dimension configurations. All results suggest that HoroPCA better preserves information contained in the high-dimensional representations.

On distance preservation, HoroPCA outperforms all methods, with significant improvements on larger datasets. This supports our theoretical result that horospherical projections better preserve distances than geodesic projections. Furthermore, HoroPCA also outperforms existing methods on the explained Fréchet variance metric on all but one dataset. This suggests that our distance-based formulation of the variance (Eq. (4)) effectively captures variations in the data. We also note that, as expected, both PGA and BSA are sensitive to base point choices: adding Gaussian noise to the base point leads to significant drops in performance. In contrast, HoroPCA is by construction base-point independent.

5.3. Hyperbolic Data Whitening

An important use of PCA is for data whitening, as it allows practitioners to remove noise and decorrelate the data, which can improve downstream tasks such as regression or classification. Recall that standard PCA data whitening consists of (i) finding principal directions that explain the data, (ii) calculating the coordinates of each data point along these directions, and (iii) normalizing the coordinates for each direction (to have zero mean and unit variance).

Because of the close analogy between HoroPCA and Euclidean PCA, these steps easily map to the hyperbolic case, where we (i) use HoroPCA to find principal directions (ideal points), (ii) calculate the Busemann coordinates along these directions, and (iii) normalize them as Euclidean coordinates. Note that this yields Euclidean representations, which allow leveraging powerful tools developed specifically for learning on Euclidean data.

We evaluate the benefit of this whitening step on a simple classification task. We compare to directly classifying the data with a Euclidean Support Vector Machine (eSVM) or its hyperbolic counterpart (hSVM), and also to whitening with tPCA. Note that most baselines in Section 5.1 are incompatible with data whitening: hMDS does not learn a transformation that can be applied to unseen test data, while methods like PGA and BSA do not naturally return Euclidean coordinates for us to normalize. To obtain another baseline, we use a logarithmic map to extract Euclidean coordinates from PGA.
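The whitening pipeline above is easy to express once the principal ideal points are available. The sketch below is our own illustration; the variable names (ideal_points, X_train, y_train, ...) are hypothetical placeholders rather than the authors' API:

```python
import numpy as np
from sklearn.svm import SVC

def busemann(p, x):
    return np.log(np.sum((p - x) ** 2) / (1 - np.sum(x ** 2)))

def busemann_coords(points, ideal_points):
    """Busemann coordinate of every point along every principal ideal point."""
    return np.array([[busemann(p, x) for p in ideal_points] for x in points])

def whiten(coords, ref_coords):
    """Standardize each direction using the mean/std of the reference (training) split."""
    mu, sd = ref_coords.mean(axis=0), ref_coords.std(axis=0)
    return (coords - mu) / sd

# Hypothetical usage (ideal_points from HoroPCA, X_*/y_* from the embedded graphs):
# C_train = busemann_coords(X_train, ideal_points)
# C_test = busemann_coords(X_test, ideal_points)
# clf = SVC().fit(whiten(C_train, C_train), y_train)   # Euclidean SVM on whitened coords
# accuracy = clf.score(whiten(C_test, C_train), y_test)
```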
We reproduce the experimental setup from (Cho et al., 2019), who split the datasets into 50% train and 50% test sets, run classification on 2-dimensional embeddings, and average results over 5 different embedding configurations as was done in the original paper (Table 3).⁶ HoroPCA whitening improves downstream classification on all datasets compared to eSVM and hSVM, or to tPCA and PGA whitening. This suggests that HoroPCA can be leveraged for hyperbolic data whitening. Further, this confirms that Busemann coordinates do capture variations in the original data.

⁶ Note that the results slightly differ from (Cho et al., 2019), which could be because of different implementations or data splits.

Table 3: Data whitening experiments. We report classification accuracy averaged over 5 embedding configurations. Best in bold and second best underlined.

| Method        | Polbooks   | Football   | Polblogs   |
| eSVM          | 69.9 ± 1.2 | 20.7 ± 3.0 | 92.3 ± 1.5 |
| hSVM          | 68.3 ± 0.6 | 20.9 ± 2.5 | 92.2 ± 1.6 |
| tPCA+eSVM     | 68.5 ± 0.9 | 21.2 ± 2.2 | 92.4 ± 1.5 |
| PGA+eSVM      | 64.4 ± 4.1 | 21.7 ± 2.2 | 82.3 ± 1.2 |
| HoroPCA+eSVM  | 72.2 ± 2.8 | 25.0 ± 1.0 | 92.8 ± 0.9 |

5.4. Visualizations

When learning embeddings for ML applications (e.g. classification), increasing the dimensionality can significantly improve the embeddings' quality. To effectively work with these higher-dimensional embeddings, it is useful to visualize their structure and organization, which often requires reducing their representations to two or three dimensions. Here, we consider embeddings of the mammals subtree of the WordNet noun hierarchy learned with the algorithm from (Nickel & Kiela, 2017). We reduce embeddings to two dimensions using PGA and HoroPCA and show the results in Fig. 4. We also include more visualizations for PCA and BSA in Fig. 8 in the Appendix. As we can see, the reduced representations obtained with HoroPCA yield better visualizations. For instance, we can see some hierarchical patterns such as "feline hypernym of cat" or "cat hypernym of burmese cat". These patterns are harder to visualize for other methods, since these do not preserve distances as well as HoroPCA, e.g. PGA has 0.534 average distortion on this dataset compared to 0.078 for HoroPCA.

Figure 4: Visualization of embeddings of the WordNet mammal subtree, computed by reducing 10-dimensional Poincaré embeddings (Nickel & Kiela, 2017). (a) PGA (average distortion: 0.534). (b) HoroPCA (average distortion: 0.078).
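Since the reduced representations live in the two-dimensional Poincaré disk, they can be plotted directly with standard tools. A minimal matplotlib sketch of ours, with hypothetical variable names:

```python
import matplotlib.pyplot as plt

def plot_poincare_disk(points_2d, labels=None):
    """Scatter-plot 2-D Poincare-disk coordinates together with the boundary circle."""
    fig, ax = plt.subplots(figsize=(5, 5))
    ax.add_patch(plt.Circle((0, 0), 1.0, fill=False, linewidth=1))  # boundary at infinity
    ax.scatter(points_2d[:, 0], points_2d[:, 1], s=10)
    if labels is not None:
        for (x, y), name in zip(points_2d, labels):
            ax.annotate(name, (x, y), fontsize=6)
    ax.set_aspect("equal")
    ax.set_xlim(-1.05, 1.05)
    ax.set_ylim(-1.05, 1.05)
    ax.axis("off")
    return ax

# Hypothetical usage: plot_poincare_disk(reduced_embeddings, wordnet_names); plt.show()
```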
6. Related Work

PCA methods in Riemannian manifolds  We first review some approaches to extending PCA to general Riemannian geometries, of which hyperbolic geometry is a special case. For a more detailed discussion, see Pennec (2018). The simplest such approach is tangent PCA (tPCA), which maps the data to the tangent space at the Fréchet mean μ using the logarithm map, then applies Euclidean PCA. A similar approach, Principal Geodesic Analysis (PGA) (Fletcher et al., 2004), seeks geodesic subspaces at μ that minimize the sum of squared Riemannian distances to the data. Compared to tPCA, PGA searches through the same subspaces but uses a more natural loss function.

Both PGA and tPCA project on submanifolds that go through the Fréchet mean. When the data is not well-centered, this may be sub-optimal, and Geodesic PCA (GPCA) was proposed to alleviate this issue (Huckemann & Ziezold, 2006; Huckemann et al., 2010). GPCA first finds a geodesic γ that best fits the data, then finds other orthogonal geodesics that go through some common point b on γ. In other words, GPCA removes the constraint of PGA that b is the Fréchet mean. Extensions of GPCA have been proposed, such as probabilistic methods (Zhang & Fletcher, 2013) and Horizontal Component Analysis (Sommer, 2013).

Pennec (2018) proposes a more symmetric approach. Instead of using the exponential map at a base point, it parameterizes K-dimensional subspaces as the barycenter loci of K + 1 points. Nested sequences of subspaces (flags) can be formed by simply adding more points. In hyperbolic geometry, this construction coincides with the one based on geodesic hulls that we use in Section 3, except that it applies to points inside H^d instead of ideal points and thus needs more parameters to parameterize a flag (see Remark A.8). By considering a more general type of submanifolds, Hauberg (2016) gives another way to avoid the sensitive dependence on the Fréchet mean. However, its bigger search space also makes the method computationally expensive, especially when the target dimension K is bigger than 1.

In contrast with all methods so far, HoroPCA relies on horospherical projections instead of geodesic projections. This yields a generalization of PCA that depends only on the directions and not the specific locations of the subspaces.

Dimension reduction in hyperbolic geometry  We now review some dimension reduction methods proposed specifically for hyperbolic geometry. Cvetkovski & Crovella (2011) and Sala et al. (2018) are examples of hyperbolic multidimensional scaling methods, which seek configurations of points in lower-dimensional hyperbolic spaces whose pairwise distances best approximate a given dissimilarity matrix. Unlike HoroPCA, they do not learn a projection that can be applied to unseen data. Tran & Vu (2008) construct a map to lower-dimensional hyperbolic spaces whose preimages of compact sets are compact. Unlike most methods, it is data-agnostic and does not optimize any objective. Benjamini & Makarychev (2009) adapt the Euclidean Johnson–Lindenstrauss transform to hyperbolic geometry and obtain a distortion bound when the dataset size is not too big compared to the target dimension. They do not seek an analog of Euclidean directions or projections, but nevertheless implicitly use a projection method based on pushing points along horocycles, which shares many properties with our horospherical projections. In fact, the latter converges to the former as the ideal points get closer to each other.

From hyperbolic to Euclidean  Liu et al. (2019) use "distances to centroids" to compute Euclidean representations of hyperbolic data. The Busemann functions we use bear resemblances to these centroid-based functions but are better analogs of coordinates along given directions, which is a central concept in PCA, and have better regularity properties (Busemann, 1955). Recent works have also used Busemann functions for hyperbolic prototype learning (Keller-Ressel, 2020; Wang, 2021). These works do not define projections to lower-dimensional hyperbolic spaces. In contrast, HoroPCA naturally returns both hyperbolic representations (via horospherical projections) and Euclidean representations (via Busemann coordinates). This allows leveraging techniques in both settings.

7. Conclusion

We proposed HoroPCA, a method to generalize PCA to hyperbolic spaces. In contrast with previous PCA generalizations, HoroPCA preserves the core location-independence property of PCA. Empirically, HoroPCA significantly outperforms previous methods on the reduction of hyperbolic data. Future extensions of this work include deriving a closed-form solution, analyzing the stability properties of HoroPCA, or using the concepts introduced in this work to derive efficient nearest neighbor search algorithms or neural network operations.

Acknowledgements

We gratefully acknowledge the support of NIH under No. U54EB020405 (Mobilize), NSF under Nos. CCF1763315 (Beyond Sparsity), CCF1563078 (Volume to Velocity), and 1937301 (RTML); ONR under No. N000141712266 (Unifying Weak Supervision); the Moore Foundation, NXP, Xilinx, LETI-CEA, Intel, IBM, Microsoft, NEC, Toshiba, TSMC, ARM, Hitachi, BASF, Accenture, Ericsson, Qualcomm, Analog Devices, the Okawa Foundation, American Family Insurance, Google Cloud, Swiss Re, Total, the HAI-AWS Cloud Credits for Research program, the Stanford Data Science Initiative (SDSI), and members of the Stanford DAWN project: Facebook, Google, and VMWare. The Mobilize Center is a Biomedical Technology Resource Center, funded by the NIH National Institute of Biomedical Imaging and Bioengineering through Grant P41EB027060. The U.S. Government is authorized to reproduce and distribute reprints for Governmental purposes notwithstanding any copyright notation thereon. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views, policies, or endorsements, either expressed or implied, of NIH, ONR, or the U.S. Government.
References

Balazevic, I., Allen, C., and Hospedales, T. Multi-relational Poincaré graph embeddings. In Advances in Neural Information Processing Systems, pp. 4463–4473, 2019.

Benjamini, I. and Makarychev, Y. Dimension reduction for hyperbolic space. Proceedings of the American Mathematical Society, pp. 695–698, 2009.

Busemann, H. The Geometry of Geodesics. Academic Press Inc., New York, N.Y., 1955.

Chamberlain, B., Clough, J., and Deisenroth, M. Neural embeddings of graphs in hyperbolic space. MLG Workshop, 2017.

Chami, I., Ying, Z., Ré, C., and Leskovec, J. Hyperbolic graph convolutional neural networks. In Advances in Neural Information Processing Systems, pp. 4868–4879, 2019.

Chami, I., Gu, A., Chatziafratis, V., and Ré, C. From trees to continuous embeddings and back: Hyperbolic hierarchical clustering. Advances in Neural Information Processing Systems, 33, 2020a.

Chami, I., Wolf, A., Juan, D.-C., Sala, F., Ravi, S., and Ré, C. Low-dimensional hyperbolic knowledge graph embeddings. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pp. 6901–6914, 2020b.

Cho, H., DeMeo, B., Peng, J., and Berger, B. Large-margin classification in hyperbolic space. In The 22nd International Conference on Artificial Intelligence and Statistics, pp. 1832–1840. PMLR, 2019.

Cvetkovski, A. and Crovella, M. Multidimensional scaling in the Poincaré disk. arXiv preprint arXiv:1105.5332, 2011.

Fletcher, P. T., Lu, C., Pizer, S. M., and Joshi, S. Principal geodesic analysis for the study of nonlinear statistics of shape. IEEE Transactions on Medical Imaging, 23(8):995–1005, 2004.

Fréchet, M. Les éléments aléatoires de nature quelconque dans un espace distancié. In Annales de l'institut Henri Poincaré, volume 10, pp. 215–310, 1948.

Ganea, O.-E., Bécigneul, G., and Hofmann, T. Hyperbolic neural networks. In Proceedings of the 32nd International Conference on Neural Information Processing Systems, pp. 5350–5360, 2018.

Gu, A., Sala, F., Gunel, B., and Ré, C. Learning mixed-curvature representations in product spaces. In International Conference on Learning Representations, 2018.

Hauberg, S. Principal curves on Riemannian manifolds. IEEE Transactions on Pattern Analysis and Machine Intelligence, 38(9):1915–1921, 2016. doi: 10.1109/TPAMI.2015.2496166.

Hinton, G. E. and Salakhutdinov, R. R. Reducing the dimensionality of data with neural networks. Science, 313(5786):504–507, 2006.

Huckemann, S. and Eltzner, B. Statistical methods generalizing principal component analysis to non-Euclidean spaces. In Handbook of Variational Methods for Nonlinear Geometric Data, pp. 317–338. Springer, 2020.

Huckemann, S. and Ziezold, H. Principal component analysis for Riemannian manifolds, with an application to triangular shape spaces. Advances in Applied Probability, 38(2):299–319, 2006.

Huckemann, S., Hotz, T., and Munk, A. Intrinsic shape analysis: Geodesic PCA for Riemannian manifolds modulo isometric Lie group actions. Statistica Sinica, pp. 1–58, 2010.

Keller-Ressel, M. A theory of hyperbolic prototype learning. arXiv preprint arXiv:2010.07744, 2020.

Kendall, W. S. Probability, convexity, and harmonic maps with small image I: uniqueness and fine existence. Proceedings of the London Mathematical Society, 3(2):371–406, 1990.

Krauthgamer, R. and Lee, J. R. Algorithms on negatively curved spaces. In 2006 47th Annual IEEE Symposium on Foundations of Computer Science (FOCS'06), pp. 119–132. IEEE, 2006.

Lee, J. M. Smooth manifolds. In Introduction to Smooth Manifolds, pp. 1–31. Springer, 2013.

Liu, Q., Nickel, M., and Kiela, D. Hyperbolic graph neural networks. In Advances in Neural Information Processing Systems, pp. 8230–8241, 2019.

Monath, N., Zaheer, M., Silva, D., McCallum, A., and Ahmed, A. Gradient-based hierarchical clustering using continuous representations of trees in hyperbolic space. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pp. 714–722, 2019.

Nickel, M. and Kiela, D. Poincaré embeddings for learning hierarchical representations. In Advances in Neural Information Processing Systems, pp. 6338–6347, 2017.

Pennec, X. Barycentric subspace analysis on manifolds. Annals of Statistics, 46(6A):2711–2746, 2018.

Sala, F., De Sa, C., Gu, A., and Ré, C. Representation tradeoffs for hyperbolic embeddings. In International Conference on Machine Learning, pp. 4460–4469, 2018.

Sarkar, R. Low distortion Delaunay embedding of trees in hyperbolic plane. In International Symposium on Graph Drawing, pp. 355–366. Springer, 2011.

Sommer, S. Horizontal dimensionality reduction and iterated frame bundle development. In International Conference on Geometric Science of Information, pp. 76–83. Springer, 2013.

Tay, Y., Tuan, L. A., and Hui, S. C. Hyperbolic representation learning for fast and efficient neural question answering. In Proceedings of the Eleventh ACM International Conference on Web Search and Data Mining, pp. 583–591, 2018.

Thurston, W. P. Geometry and topology of three-manifolds, 1978. Lecture notes. Available at http://library.msri.org/books/gt3m/.

Tifrea, A., Becigneul, G., and Ganea, O.-E. Poincaré GloVe: Hyperbolic word embeddings. In International Conference on Learning Representations, 2018.

Tran, D. A. and Vu, K. Dimensionality reduction in hyperbolic data spaces: Bounding reconstructed-information loss. In Seventh IEEE/ACIS International Conference on Computer and Information Science (ICIS 2008), pp. 133–139. IEEE, 2008.

Wang, M.-X. Laplacian eigenspaces, horocycles and neuron models on hyperbolic spaces, 2021. URL https://openreview.net/forum?id=ZglaBL5inu.

Wu, X. and Charikar, M. Nearest neighbor search for hyperbolic embeddings. arXiv preprint arXiv:2009.00836, 2020.

Zhang, M. and Fletcher, T. Probabilistic principal geodesic analysis. In Advances in Neural Information Processing Systems, pp. 1178–1186, 2013.