Density Separation with Tensor Factorization

8 July 2024 | SIAM AN24 | Spokane, WA

Nicholas Richardson

Department of Mathematics

Naomi Graham

Department of Computer Science

Michael P. Friedlander

Departments of Computer Science & Mathematics

Joel Saylor

Department of Earth, Ocean and Atmospheric Sciences

Acknowledgments

Registration and travel support for this presentation was provided by the Society for Industrial and Applied Mathematics.

Research was funded with the help of the University of British Columbia (UBC) and the Natural Sciences and Engineering Research Council of Canada (NSERC).

Physical Setting

Few source locations upstream a river
- Each source has their own distribution of minerals
Rocks from these sources are mixed by the river and deposited downstream
Take scoops of rocks at locations downstream, called “sinks”
- Sinks are different mixtures of rocks from the upstream sources

Modeling the Physical Problem

Grains of rock from sink \(i\) are samples (random vectors) from one of the sources \(r\) with probability \(a_{ir}\)
Goal: Estimate probabilities \(a_{ir}\) and source distributions \(\mathbf b_r\)

Why do we care?

With these rocks…

Measure amount of rare-earth elements
Use radioisotope dating (U-Pb)
Element ratios

Geology:
- Earth’s crust thickness
- Tectonic plate movement
- Chronostratigraphy (dating)
Mining:
- Locate rare-earth elements
- Minimal exploration

Mathematical Setting

Source separation / signal decomposition
- have different mixtures \(\mathbf{y}_i\) of (the same) sources \(\mathbf{b}_r\) \(\newcommand{\gray}[1]{\color{gray}{#1}}\) \(\quad\qquad \mathbf{y}_i = \sum_{r=1}^R a_{ir} \mathbf{b}_r \qquad i=1, 2, \dots, I\)
- Goal 1: estimate the sources \(\mathbf{b}_r\)
- Goal 2: estimate the contribution of each source \(a_{ir}\)
In our case, have the constraints:
- signals \(\mathbf{y}_i\) and sources \(\mathbf{b}_r\) are probability densities
- mixture coefficients \(a_{ir} \geq 0\) and \(\sum_{r=1}^R a_{ir} = 1\)

Other source separation applications

Spatial Transcriptomics:
- uncovering cell types from gene expressions
Musical Decomposition:
- separating audio into individual instruments
Other source separation problems
- need to model each source as some “rank-\(1\)” term
- find the transformation of your data where it is linearly decomposable

The Method

Density Estimation

Often cannot measure the mixture densities \(\mathbf{y}_i\) directly
Estimate from samples via kernel density estimation (KDE)¹
Unlike Gaussian mixture models, sources \(\mathbf{b}_r\) can be arbitrary

Higher Order KDE & Tensors

If independent features
- store discretized \(1\)-dim KDEs for each sink \(i\) and feature \(j\) in \(\mathbf{Y}_{i j :}\)
Doing it this way, reduces the size of the data
- \(J\)-dim KDEs¹ (\(I\) of them) is more expensive to compute than \(IJ\) cheeper 1-dim KDEs
- \(IK^J\) to \(IJK\) tensor entries

Higher Order KDE & Tensors

Contributions / advancements

Perform this source seperation on all features jointly
Model that scales well to arbitrary number of features \(J\)
Size of problem is independent of number of rocks collected

Tucker-1 Decomposition

A generalization of rank-\(R\) matrix factorization¹

\(\mathbf{Y} = \mathbf{A}\mathbf{B}\)

\(Y_{ijk} = \sum_{r=1}^R A_{ir}B_{rjk}\)

\(\mathbf{Y}_i = \sum_{r=1}^R A_{ir}\mathbf{B}_{r}\)

Source separation model

\[ \min_{\mathbf A, \mathbf B} \left\{ \ell(\mathbf A, \mathbf B):=\tfrac{1}{2}\left\lVert \mathbf Y-\mathbf A \mathbf B \right\rVert_{F}^2 \,\middle\vert\, \mathbf A \in \Delta_{\mathbf A},\ \mathbf B \in \Delta_{\mathbf B} \right\} \]

\(\left\lVert \mathbf \cdot \right\rVert_{F}^2\) squared Frobenius norm (sum-of-squares)

\(\Delta_{\mathbf A}\) non-negative entries & rows sum to 1

\(\Delta_{\mathbf B}\) non-negative entries & fibres \(\mathbf B_{ij:}\) sum to 1

Nonnegativity implies it’s NP hard¹ ☹️
Simplex implies a bounded feasible set 😊
Not convex, but block-convex and smooth

Block Coordinate Descent

Alternatly update \(\mathbf{A}\) and \(\mathbf{B}\) via projected gradient descent¹
- \(\mathbf A^{t+1} = P_{\Delta_{\mathbf A}}(\mathbf A^{t} - \frac{1}{L_{\mathbf A}} \nabla_{\mathbf A} \ell(\mathbf A^t, \mathbf B^t))\)
- \(\mathbf B^{t+1} = P_{\Delta_{\mathbf B}}(\mathbf B^{t} - \frac{1}{L_{\mathbf B}} \nabla_{\mathbf B} \ell(\mathbf A^{t+1}, \mathbf B^t))\)
Convergence to nash equilibria \((\mathbf A^*, \mathbf B^*)\)
- block-wise min: \(\ell(\mathbf A^*, \mathbf B^*) \leq \min\left(\ell(\mathbf A^*, \mathbf B), \ell(\mathbf A, \mathbf B^*)\right)\)
Also stationary: \(0 \in \partial(\ell + \delta_{\Delta_{\mathbf A} \times \Delta_{\mathbf B}})(\mathbf A^*, \mathbf B^*)\)

Results: Sediment Source Separation

Other Applications

Application 2: Spatial Transcriptomics

ID cell types based on gene expressions of cells spatially

Application 2: Spatial Transcriptomics

Application 3: Music Decomposition

Separate audio mixture into guitar and flute with STFT magnitude

flute (above), guitar (below)

guitar

Application 3: Music Decomposition

amplitude of low rank factors in time

frequency spectrum of low rank factors

Summary

Combine KDE with Tucker-\(1\) factorization into a scalable nonparametric density decomposition method
Algorithm converges to block-minimum and stationary point
- Open-source Julia code on GitHub: MatrixTensorFactor.jl
Practical model applicable to many areas
- geology, music, genetics etc.

References

Chen Y-C (2017) A tutorial on kernel density estimation and recent advances. Biostatistics & Epidemiology 1:161–187. https://doi.org/10.1080/24709360.2017.1396742

Graham N, Richardson N, Friedlander MP, Saylor J (2024) Tracing Sedimentary Origins in Multivariate Geochronology via Constrained Tensor Factorization. Preprint

Kolda TG, Bader BW (2009) Tensor Decompositions and Applications. SIAM Rev 51:455–500. https://doi.org/10.1137/07070111X

O’Brien TA, Kashinath K, Cavanaugh NR, et al (2016) A fast and objective multidimensional kernel density estimation method: fastKDE. Computational Statistics & Data Analysis 101:148–160. https://doi.org/10.1016/j.csda.2016.02.014

Richardson N, Graham N, Friedlander MP (2024a) MatrixTensorFactor.jl. GitHub

Richardson N, Graham N, Friedlander MP, Saylor J (2024b) SedimentAnalysis.jl. GitHub

Vavasis SA (2010) On the Complexity of Nonnegative Matrix Factorization. SIAM Journal on Optimization 20:1364–1377. https://doi.org/10.1137/070709967

Xu Y, Yin W (2013) A Block Coordinate Descent Method for Regularized Multiconvex Optimization with Applications to Nonnegative Tensor Factorization and Completion. SIAM J Imaging Sci 6:1758–1789. https://doi.org/10.1137/120887795

Additional Slides

Estimating the rank

Try many ranks \(R\)
Compare the objective as a function of the rank \(\left\lVert \mathbf Y-\mathbf A \mathbf B \right\rVert_{F}^2\)

Rescaling trick

Relax constraints to
- \(\mathbf A \geq 0\), \(\mathbf B \geq 0\)
- \(\frac{1}{J}\sum_{jk} B_{rjk} = 1\) for all \(r\) (vs \(\sum_{k} B_{rjk} = 1\) for all \(r,j\))
Updates now look like:
- \(\mathbf A^{t+1/2} = (\mathbf A^{t} - \frac{1}{L_{\mathbf A}} \nabla_{\mathbf A} \ell(\mathbf A^t, \mathbf B^t))_+\)
- \(\mathbf B^{t+1/2} = (\mathbf B^{t} - \frac{1}{L_{\mathbf B}} \nabla_{\mathbf B} \ell(\mathbf A^{t+1/2}, \mathbf B^t))_+\)
- \(\mathbf B^{t+1} =\mathbf C^{-1} \mathbf B^{t+1/2}\) and \(\mathbf A^{t+1} = \mathbf A^{t+1/2} \mathbf C\)
- where \(C_{rr} = \frac{1}{J}\sum_{jk} B_{rjk}\)

Rescaling vs simplex projection

Compare \(\text{dist}\left(0, \partial(\ell + \delta_{\geq 0})(\mathbf A, \mathbf B)\right)\) every iteration

What features did we look at?

age
Eu anomaly
Ti-based crystallization temperature
Th-U ratio
sum of light rare-earth elements over heavy rare-earth elements (ΣLREE/ΣHREE)
Dy-Yb ratio
normalized (Ce/ND)/Y ratio

Convergence Details

when iterates are bounded iterates
- there are limit points
- know this because feasible set is bounded
when the objective function is KL
- sequence of iterates converges to a finite limit point