We introduce SG-t-SNE-Π, a high-performance software for swift embedding of a large, sparse, stochastic graph into a $d$-dimensional space ($d = 1, 2, 3$) on a shared-memory computer. The algorithm SG-t-SNE and the software SG-t-SNE-Π were first described in Reference [1]. The algorithm is built upon precursors for embedding a $k$-nearest neighbor (kNN) graph, which is distance-based and regular with constant degree $k$. In practice, the precursor algorithms are also limited to 2D embedding or suffer from overly long latency in 3D embedding. SG-t-SNE removes the algorithmic restrictions and enables $d$-dimensional embedding of arbitrary stochastic graphs, including, but not restricted to, kNN graphs. SG-t-SNE-Π expedites the computation with high-performance functions and materializes 3D embedding in less time than 2D embedding with any precursor algorithm on modern laptop/desktop computers.
The original t-SNE [2] has given rise to several variants. Two of the variants, t-SNE-BH [3] and FIt-SNE [4], are distinctive and representative in their approximation approaches to reducing algorithmic complexity. They are, however, limited to kNN graph embedding. Specifically, at the user interface, a set of data points, $\{x_i\}_{i=1}^{n}$, is provided in terms of their feature vectors in an $L$-dimensional vector space equipped with a metric/distance function. The input parameters include $d$ for the embedding dimension, $k$ for the number of near-neighbors, and $u$ for the perplexity. A t-SNE algorithm maps the $n$ data points to $n$ data points $\{y_i\}_{i=1}^{n}$ in a $d$-dimensional space.
There are two basic algorithmic stages in a conventional t-SNE algorithm. In the preprocessing stage, the kNN graph is generated from the feature vectors according to the metric function and input parameter $k$. Each data point is associated with a graph vertex. Next, the kNN graph is cast into a stochastic one, $\mathbf{P}_{c} = [\,p_{j|i}\,]$, and symmetrized to $\mathbf{P} = (\mathbf{P}_{c} + \mathbf{P}_{c}^{\mathsf{T}})/2n$,

$$ p_{j|i} = \frac{a_{ij}\,\exp\!\left(-d^{2}(x_i, x_j)/2\sigma_i^{2}\right)}{\sum_{l} a_{il}\,\exp\!\left(-d^{2}(x_i, x_l)/2\sigma_i^{2}\right)}, \qquad i, j = 1, \dots, n, \tag{1} $$

where $\mathbf{A} = [\,a_{ij}\,]$ is the binary-valued adjacency matrix of the kNN graph, with zero diagonal elements (i.e., the graph has no self-loops), and $d(x_i, x_j)$ is the distance between $x_i$ and $x_j$. The Gaussian parameters $\sigma_i$ are determined by the point-wise equations related to the same perplexity value $u$,

$$ \sum_{j} p_{j|i} \log_2 p_{j|i} = -\log_2 u, \qquad i = 1, \dots, n. \tag{2} $$
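For illustration, the perplexity condition can be solved by bisection, since the perplexity of the conditional distribution increases monotonically with the Gaussian bandwidth. The following is a minimal Python sketch of this standard procedure, not the library's implementation:

```python
import math

def conditional_probs(dists, sigma):
    """p_{j|i} for one point: Gaussian weights over its kNN distances, normalized."""
    w = [math.exp(-d * d / (2.0 * sigma * sigma)) for d in dists]
    s = sum(w)
    return [x / s for x in w]

def perplexity(p):
    """2 ** (Shannon entropy in bits) of a discrete distribution."""
    h = -sum(x * math.log2(x) for x in p if x > 0)
    return 2.0 ** h

def sigma_for_perplexity(dists, u, lo=1e-6, hi=1e6, iters=100):
    """Bisection on sigma so that the perplexity of p_{.|i} equals u (Eq. (2))."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if perplexity(conditional_probs(dists, mid)) < u:
            lo = mid   # distribution too peaked: widen the Gaussian
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

Bisection applies because as the bandwidth shrinks the distribution concentrates on the nearest neighbor (perplexity near 1) and as it grows the distribution approaches uniform over the $k$ neighbors (perplexity near $k$).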
The next stage is to determine and locate the embedding coordinates $\{y_i\}_{i=1}^{n}$ by minimizing the Kullback-Leibler divergence

$$ C(\mathbf{Y}) = \mathrm{KL}(\mathbf{P}\,\|\,\mathbf{Q}) = \sum_{i \neq j} p_{ij} \log \frac{p_{ij}}{q_{ij}}, \tag{3} $$
where matrix $\mathbf{Q} = [\,q_{ij}\,]$ is made of the ensemble $\{q_{ij}\}$ regulated by the Student t-distribution,

$$ q_{ij} = \frac{\left(1 + \lVert y_i - y_j \rVert^{2}\right)^{-1}}{\sum_{k \neq l} \left(1 + \lVert y_k - y_l \rVert^{2}\right)^{-1}}, \qquad i \neq j. \tag{4} $$
In other words, the objective of (3) is to find the optimal stochastic matching between $\mathbf{P}$ and $\mathbf{Q}$ defined, respectively, over the feature vector set $\{x_i\}$ and the embedding coordinate set $\{y_i\}$. The optimal matching is obtained numerically by applying the gradient descent method. A main difference among the precursor algorithms lies in how the gradient of the objective function is computed.
The computation per iteration step is dominated by the calculation of the gradient. Van der Maaten reformulated the gradient into two terms [3]:

$$ \frac{\partial C}{\partial y_i} = 4 \underbrace{\sum_{j \neq i} p_{ij}\, q_{ij}\, Z\, (y_i - y_j)}_{F_{\mathrm{attr}}} \;-\; 4 \underbrace{\sum_{j \neq i} q_{ij}^{2}\, Z\, (y_i - y_j)}_{F_{\mathrm{rep}}}, \qquad Z = \sum_{k \neq l} \left(1 + \lVert y_k - y_l \rVert^{2}\right)^{-1}. \tag{5} $$
The attractive interaction term can be cast as the sum of matrix-vector products with the sparse matrix $[\,p_{ij} q_{ij} Z\,]$. The vectors are composed of the embedding coordinates, one in each dimension. The repulsive interaction term can be cast as the sum of matrix-vector products with the dense matrix $[\,q_{ij}^{2} Z\,]$. For clarity, we simply refer to the two terms as the $F_{\mathrm{attr}}$ term and the $F_{\mathrm{rep}}$ term, respectively.
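To make the two-term split concrete, here is a dense NumPy sketch of the objective and its gradient. It is for illustration only: the actual software never forms the dense matrices, computing the attraction term with sparse matrix-vector products and the repulsion term by grid-based convolution.

```python
import numpy as np

def kl_objective(P, Y):
    """Kullback-Leibler objective for a symmetric stochastic P and embedding Y."""
    D = np.sum((Y[:, None, :] - Y[None, :, :]) ** 2, axis=-1)
    W = 1.0 / (1.0 + D)                  # Student t kernel values
    np.fill_diagonal(W, 0.0)
    Q = W / W.sum()                      # q_ij
    mask = P > 0
    return np.sum(P[mask] * np.log(P[mask] / Q[mask]))

def tsne_gradient(P, Y):
    """Gradient split into attraction and repulsion terms, dense for clarity."""
    D = np.sum((Y[:, None, :] - Y[None, :, :]) ** 2, axis=-1)
    W = 1.0 / (1.0 + D)                  # equals q_ij * Z off the diagonal
    np.fill_diagonal(W, 0.0)
    Q = W / W.sum()
    A = P * W                            # [p_ij q_ij Z]: sparsity pattern of P
    R = Q * W                            # [q_ij^2 Z]: dense
    F_attr = (np.diag(A.sum(axis=1)) - A) @ Y
    F_rep = (np.diag(R.sum(axis=1)) - R) @ Y
    return 4.0 * (F_attr - F_rep)
```

Each term is a sum over $j$ of a weight times $(y_i - y_j)$, hence the matrix-vector form $(\mathrm{diag}(\text{row sums}) - \mathbf{M})\,\mathbf{Y}$, with one vector per embedding dimension.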
The $F_{\mathrm{rep}}$ (repulsion) term is in fact a broad-support, dense convolution with the Student t-distribution kernel on non-equispaced, scattered data points. As the matrix $[\,q_{ij}^{2} Z\,]$ is dense, a naive method for calculating the term takes $O(n^2)$ arithmetic operations. The quadratic complexity limits the practical use of t-SNE to small graphs. Two types of existing approaches reduce the quadratic complexity to $O(n \log n)$; they are typified by t-SNE-BH and FIt-SNE. The algorithm t-SNE-BH, introduced by van der Maaten [3], is based on the Barnes-Hut algorithm. The broad-support convolution is factored into convolutions of narrow support at multiple spatial levels; each narrowly supported convolution takes $O(n)$ operations. FIt-SNE, presented by Linderman et al. [4], may be viewed as based on non-uniform fast Fourier transforms. The execution time of each approximate algorithm becomes dominated by the computation of the $F_{\mathrm{attr}}$ (attraction) term. The execution time also faces a steep rise from 2D to 3D embedding.
With the algorithm SG-t-SNE we extend the use of t-SNE to any sparse stochastic graph $G$. The key input is the stochastic matrix $\mathbf{P} = [\,p_{j|i}\,]$, with $p_{j|i} \geq 0$ and $\sum_{j} p_{j|i} = 1$, associated with the graph, where $\mathbf{P}$ is not restricted to the form of (1). We introduce a parametrized, non-linear rescaling mechanism to explore the graph sparsity. We determine rescaling parameters $\gamma_i$ by

$$ \sum_{j} f\!\left(\gamma_i\, p_{j|i}\right) = \lambda, \qquad i = 1, \dots, n, \tag{6} $$

where $\lambda$ is an input parameter and $f$ is a monotonically increasing function; a particular choice of $f$ is fixed in the present version of SG-t-SNE-Π. Unlike (2), the rescaling mechanism (6) imposes no constraint on the graph; its solution exists unconditionally. For the conventional t-SNE as a special case, we set $\lambda = 1$ by default. One may still make use of $\lambda$ and exploit the benefit of rescaling ($\lambda \neq 1$).
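Because the left-hand side of the rescaling equation is monotone in $\gamma_i$, each parameter can be found by bisection, one graph vertex at a time. A hedged Python sketch follows; the function $f(x) = 1 - e^{-x}$ below is only an illustrative monotone choice, not necessarily the one fixed in the released software:

```python
import math

def rescale_row(p_row, lam, f=lambda x: 1.0 - math.exp(-x), iters=200):
    """Solve sum_j f(gamma * p_{j|i}) = lam for gamma by bisection.

    p_row : nonzero conditional probabilities of one graph vertex.
    Assumes lam is below the supremum of the left-hand side
    (len(p_row) for the illustrative f above).
    """
    g = lambda gamma: sum(f(gamma * p) for p in p_row)
    lo, hi = 0.0, 1.0
    for _ in range(60):                  # grow bracket until it encloses lam
        if g(hi) >= lam:
            break
        hi *= 2.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if g(mid) < lam:
            lo = mid
        else:
            hi = mid
    gamma = 0.5 * (lo + hi)
    return gamma, [gamma * p for p in p_row]
```

The rescaled values `gamma * p` then replace the raw conditional probabilities before symmetrization.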
With the implementation SG-t-SNE-Π, we accelerate the entire gradient calculation of SG-t-SNE and enable practical 3D embedding of large sparse graphs on modern desktop and laptop computers. We accelerate the computation of both the $F_{\mathrm{attr}}$ and $F_{\mathrm{rep}}$ terms by utilizing the matrix structures and the memory architecture in tandem.
The matrix $[\,p_{ij} q_{ij} Z\,]$ in the attractive interaction term of (5) has the same sparsity pattern as matrix $\mathbf{P}$, regardless of iterative changes in $\mathbf{Q}$. Sparsity patterns are generally irregular. Matrix-vector products with an irregular sparse matrix invoke irregular memory accesses and incur unequal, prolonged access latencies on hierarchical memories. We moderate memory accesses by permuting the rows and columns of matrix $\mathbf{P}$ such that rows and columns with similar nonzero patterns are placed closer together. The permuted matrix becomes block-sparse with denser blocks, resulting in better data locality in memory reads and writes.
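As one deliberately simple stand-in for such a permutation, a BFS ordering in the spirit of Cuthill-McKee clusters rows with overlapping nonzero patterns; the actual permutation heuristic used by SG-t-SNE-Π is not detailed here.

```python
import numpy as np
from collections import deque

def locality_permutation(A):
    """BFS (Cuthill-McKee-like) ordering over a symmetric boolean sparsity
    pattern A, placing rows with overlapping nonzero patterns close together.
    Returns the permutation and the permuted pattern."""
    n = A.shape[0]
    visited = np.zeros(n, dtype=bool)
    order = []
    for seed in np.argsort(A.sum(axis=1)):      # start from low-degree vertices
        if visited[seed]:
            continue
        visited[seed] = True
        queue = deque([seed])
        while queue:
            v = queue.popleft()
            order.append(v)
            nbrs = np.flatnonzero(A[v] & ~visited)
            visited[nbrs] = True                 # mark when enqueued
            queue.extend(nbrs.tolist())
    p = np.array(order)
    return p, A[np.ix_(p, p)]
```

On a path graph with scrambled vertex labels, for example, this ordering recovers a permuted matrix whose nonzeros all lie on the first off-diagonals, i.e., maximal locality.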
The permuted matrix is stored in the Compressed Sparse Blocks (CSB) storage format [5]. We utilize the CSB routines for accessing the matrix and calculating the matrix-vector products with the sparse matrix $[\,p_{ij} q_{ij} Z\,]$. The needed elements of $\mathbf{Q}$ are formed on the fly during the calculation of the attractive interaction term.
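The idea behind a blocked sparse matrix-vector product can be sketched with a toy partition into dense blocks; the real CSB format [5] stores blocks compactly and schedules them across threads, which this illustration omits:

```python
import numpy as np

def to_blocks(A, b):
    """Partition matrix A into a dict of nonzero b-by-b (or edge-sized) blocks,
    a toy stand-in for the Compressed Sparse Blocks layout."""
    n = A.shape[0]
    blocks = {}
    for bi in range(0, n, b):
        for bj in range(0, n, b):
            blk = A[bi:bi + b, bj:bj + b]
            if np.any(blk):
                blocks[(bi, bj)] = blk.copy()
    return blocks

def blocked_matvec(blocks, x, n):
    """y = A x computed block by block; blocks touch contiguous slices of
    x and y, which is the source of the improved data locality."""
    y = np.zeros(n)
    for (bi, bj), blk in blocks.items():
        y[bi:bi + blk.shape[0]] += blk @ x[bj:bj + blk.shape[1]]
    return y
```

Each block multiplication reads one contiguous slice of the input vector and writes one contiguous slice of the output, so a block-sparse ordering turns scattered accesses into streaming ones.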
We factor the convolution in the repulsive interaction term of (5) into three consecutive convolutional operations. We introduce an internal equispaced grid within the spatial domain of the embedding points at each iteration, similar to the approach used in FIt-SNE [4]. The three convolutional operations are:

- S2G: local translation of the scattered (embedding) points to their neighboring grid points.
- G2G: convolution across the grid with the same t-distribution kernel function, which is symmetric, of broad support, and aperiodic.
- G2S: local translation of the gridded data to the scattered points.
The G2S operation is a gridded interpolation and S2G is its transpose; the arithmetic complexity of each is $O(w^{d} n)$, where $w$ is the interpolation window size per side. Convolution on the grid takes $O(m \log m)$ arithmetic operations, where $m$ is the number of grid points, i.e., the grid size. The grid size is determined by the range of the embedding points at each iteration, with respect to the error tolerance set by default or specified by the user. In the current implementation, the local interpolation method employed by SG-t-SNE-Π is accurate up to cubic polynomials in separable variables ($w = 4$).
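The S2G, G2G, G2S pipeline can be demonstrated in one dimension with NumPy. This sketch uses linear (window-2) interpolation instead of the cubic window, and an explicit full (aperiodic) convolution in place of an FFT-based one; it approximates the Student t-kernel sums $\phi_i = \sum_j v_j / (1 + (y_i - y_j)^2)$:

```python
import numpy as np

def tkernel_sums(y, v, m=256):
    """Approximate phi_i = sum_j v_j / (1 + (y_i - y_j)^2) via S2G -> G2G -> G2S."""
    y = np.asarray(y, dtype=float)
    v = np.asarray(v, dtype=float)
    lo, hi = y.min(), y.max()
    h = (hi - lo) / (m - 1)                # grid spacing
    t = (y - lo) / h                       # fractional grid coordinates
    i0 = np.minimum(t.astype(int), m - 2)  # left grid neighbor
    w1 = t - i0                            # weight of right grid neighbor

    # S2G: scatter each value onto its two neighboring grid points
    grid = np.zeros(m)
    np.add.at(grid, i0, (1.0 - w1) * v)
    np.add.at(grid, i0 + 1, w1 * v)

    # G2G: aperiodic convolution with the t-kernel sampled at grid offsets
    offsets = np.arange(-(m - 1), m) * h
    kernel = 1.0 / (1.0 + offsets ** 2)
    conv = np.convolve(grid, kernel)[m - 1 : 2 * m - 1]

    # G2S: interpolate the gridded result back at the scattered points
    return (1.0 - w1) * conv[i0] + w1 * conv[i0 + 1]
```

The scattered stages cost $O(w n)$ and the grid stage $O(m^2)$ here (or $O(m \log m)$ with an FFT), in place of the $O(n^2)$ direct double sum.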
Although the arithmetic complexity is substantially reduced in comparison to the quadratic complexity of the direct method, the factored operations suffer from either memory access latency or memory capacity issues, which were not recognized or resolved in existing t-SNE software. The scattered translation incurs high memory access latency. The aperiodic convolution on the grid suffers from excessive use of memory when the grid is periodically extended on all sides at once by zero padding. The exponential memory growth with $d$ limits the embedding dimension or the graph size.
We resolve these memory latency and capacity issues in SG-t-SNE-Π. Prior to S2G, we relocate the scattered data points to the grid bins. This binning process has two immediate benefits: it improves data locality in the subsequent interpolation, and it establishes a data partition for parallel, multi-threaded execution of the scattered interpolation. We omit the parallelization details. For G2G, we implement the aperiodic convolution by operator splitting, without using extra memory.
In sparse or structured matrix computations of $O(n)$ arithmetic complexity, the execution time is dominated by memory accesses. We have described in the previous sections how we use intra-term permutations to improve data locality and reduce memory access latency in computing the attraction and repulsion terms of (5). In addition, we permute and relocate in memory the embedding data points between the two terms, at every iteration step. The inter-term data relocation is carried out at multiple layers, exploiting the block-wise memory hierarchy. The data permutation overhead is well paid off by the much shorter arithmetic calculation time with the permuted data. We use Π in the software name SG-t-SNE-Π to signify the importance and the role of the permutations in accelerating t-SNE algorithms, including the conventional one, and in enabling 3D embeddings.
Supplementary material and performance plots are found at http://t-sne-pi.cs.duke.edu.
[1] N. Pitsianis, A.-S. Iliopoulos, D. Floros, and X. Sun. Spaceland embedding of sparse stochastic graphs. In Proceedings of IEEE High Performance Extreme Computing Conference, 2019. (To appear.)
[2] L. van der Maaten and G. Hinton. Visualizing data using t-SNE. Journal of Machine Learning Research 9(Nov):2579–2605, 2008.
[3] L. van der Maaten. Accelerating t-SNE using tree-based algorithms. Journal of Machine Learning Research 15(Oct):3221–3245, 2014.
[4] G. C. Linderman, M. Rachh, J. G. Hoskins, S. Steinerberger, and Y. Kluger. Fast interpolation-based t-SNE for improved visualization of single-cell RNA-seq data. Nature Methods 16(3):243–245, 2019.
[5] A. Buluç, J. T. Fineman, M. Frigo, J. R. Gilbert, and C. E. Leiserson. Parallel sparse matrix-vector and matrix-transpose-vector multiplication using compressed sparse blocks. In Proceedings of Annual Symposium on Parallelism in Algorithms and Architectures, pp. 233–244, 2009.
SG-t-SNE-Π is developed for shared-memory computers with multi-threading, running the Linux or macOS operating system. The source code must be compiled with a C++ compiler supporting Cilk. The current release is tested with the GNU g++ compiler 7.4.0 and the Intel icpc compiler 19.0.4.233.
SG-t-SNE-Π uses the following open-source software: FLANN, Intel TBB, METIS, FFTW3, and Doxygen. To install the dependencies:

On Ubuntu:

```bash
sudo apt-get install libtbb-dev libflann-dev libmetis-dev libfftw3-dev doxygen
```

On macOS:

```bash
sudo port install flann tbb metis fftw-3 doxygen
```
To generate the SG-t-SNE-Π library and the test and demo programs:

```bash
./configure
make all
```

To specify the C++ compiler:

```bash
./configure CXX=<compiler-executable>
```

To test whether the installation is successful:

```bash
bin/test_modules
```

To generate the documentation:

```bash
make documentation
```
SG-t-SNE-Π supports the conventional t-SNE algorithm through a set of preprocessing functions. Issue

```bash
make tsnepi
cp bin/tsnepi <BHTSNEPATH>/bh_tsne
```

to generate the bin/tsnepi binary, which is fully compatible with the existing wrappers provided by van der Maaten [3], and to replace the bh_tsne binary. <BHTSNEPATH> is the installation path of bhtsne.
To compile the SG-t-SNE-Π MATLAB wrappers, use the --enable-matlab option in the configure command. The default MATLAB installation path is /opt/local/matlab; otherwise, set MATLABROOT:

```bash
./configure --enable-matlab MATLABROOT=<matlab-path>
```
We provide two data sets of modest size for demonstrating stochastic graph embedding with SG-t-SNE-Π:

```bash
tar -xvzf data/mobius-graph.tar.gz
bin/demo_stochastic_matrix mobius-graph.mtx

tar -xvzf data/pbmc-graph.tar.gz
bin/demo_stochastic_matrix pbmc-graph.mtx
```
The MNIST data set can be tested using existing wrappers provided by van der Maaten [3].
The SG-t-SNE-Π library is licensed under the GNU General Public License v3.0. To contribute to SG-t-SNE-Π or to report a problem, follow our contribution guidelines and code of conduct.
Design and development:
Nikos Pitsianis (1, 2), Dimitris Floros (1), Alexandros-Stavros Iliopoulos (2), Xiaobai Sun (2)

(1) Department of Electrical and Computer Engineering, Aristotle University of Thessaloniki, Thessaloniki 54124, Greece
(2) Department of Computer Science, Duke University, Durham, NC 27708, USA