Physical Setting

  • Source locations have their own distribution of minerals
  • Rocks from these sources are mixed & deposited downstream
  • Take scoops of rocks at locations downstream, called “sinks”

Modeling the Physical Problem

  • Grains of rock from sink \(i\) are samples (random vectors) from one of the sources \(r\) with probability \(a_{ir}\)
  • Goal: Estimate probabilities \(a_{ir}\) and source distributions \(\mathbf b_r\)

Why do we care?

With these rocks…

  • Measure amount of rare-earth elements
  • Use radioisotope dating (U-Pb)
  • Element ratios
  • Geology:
    • Earth’s crust thickness
    • Tectonic plate movement
    • Chronostratigraphy (dating)
  • Mining:
    • Locate rare-earth elements
    • Minimal exploration

Mathematical Setting

  • Source separation / signal decomposition

    • have different mixtures \(\mathbf{y}_i\) of (the same) sources \(\mathbf{b}_r\) \(\newcommand{\gray}[1]{\color{gray}{#1}}\) \(\quad\qquad \mathbf{y}_i = \sum_{r=1}^R a_{ir} \mathbf{b}_r \qquad i=1, 2, \dots, I\)

    • Goal 1: estimate the sources \(\mathbf{b}_r\)

    • Goal 2: estimate the contribution of each source \(a_{ir}\)

  • In our case, have the constraints:

    • signals \(\mathbf{y}_i\) and sources \(\mathbf{b}_r\) are probability densities
    • mixture coefficients \(a_{ir} \geq 0\) and \(\sum_{r=1}^R a_{ir} = 1\)

Other source separation applications

  • Spatial Transcriptomics:
    • uncovering cell types from gene expressions
  • Musical Decomposition:
    • separating audio into individual instruments
  • Other source separation problems
    • need to model each source as some “rank-\(1\)” term
    • find the transformation of your data where it is linearly decomposable

The Method

Density Estimation

  • Often cannot measure the mixture densities \(\mathbf{y}_i\) directly
  • Estimate from samples via kernel density estimation (KDE)1
  • Unlike Gaussian mixture models, sources \(\mathbf{b}_r\) can be arbitrary

Higher Order KDE & Tensors

  • If independent features
    • store discretized \(1\)-dim KDEs for each sink \(i\) and feature \(j\) in \(\mathbf{Y}_{i j :}\)
  • Doing it this way, reduces the size of the data
    • \(J\)-dim KDEs1 (\(I\) of them) is more expensive to compute than \(IJ\) cheeper 1-dim KDEs
    • \(IK^J\) to \(IJK\) tensor entries

Higher Order KDE & Tensors

Contributions / advancements

  • Perform this source seperation on all features jointly
  • Model that scales well to arbitrary number of features \(J\)
  • Size of problem is independent of number of rocks collected

Tucker-1 Decomposition

  • A generalization of rank-\(R\) matrix factorization1
  • \(\mathbf{Y} = \mathbf{A}\mathbf{B}\)
  • \(Y_{ijk} = \sum_{r=1}^R A_{ir}B_{rjk}\)
  • \(\mathbf{Y}_i = \sum_{r=1}^R A_{ir}\mathbf{B}_{r}\)

Source separation model

\[ \min_{\mathbf A, \mathbf B} \left\{ \ell(\mathbf A, \mathbf B):=\tfrac{1}{2}\left\lVert \mathbf Y-\mathbf A \mathbf B \right\rVert_{F}^2 \,\middle\vert\, \mathbf A \in \Delta_{\mathbf A},\ \mathbf B \in \Delta_{\mathbf B} \right\} \]

\(\left\lVert \mathbf \cdot \right\rVert_{F}^2\) squared Frobenius norm (sum-of-squares)

\(\Delta_{\mathbf A}\) non-negative entries & rows sum to 1

\(\Delta_{\mathbf B}\) non-negative entries & fibres \(\mathbf B_{ij:}\) sum to 1

  • Nonnegativity implies it’s NP hard1 ☹️
  • Simplex implies a bounded feasible set 😊
  • Not convex, but block-convex and smooth

Block Coordinate Descent

  • Alternatly update \(\mathbf{A}\) and \(\mathbf{B}\) via projected gradient descent1
    • \(\mathbf A^{t+1} = P_{\Delta_{\mathbf A}}(\mathbf A^{t} - \frac{1}{L_{\mathbf A}} \nabla_{\mathbf A} \ell(\mathbf A^t, \mathbf B^t))\)
    • \(\mathbf B^{t+1} = P_{\Delta_{\mathbf B}}(\mathbf B^{t} - \frac{1}{L_{\mathbf B}} \nabla_{\mathbf B} \ell(\mathbf A^{t+1}, \mathbf B^t))\)
  • Convergence to nash equilibria \((\mathbf A^*, \mathbf B^*)\)
    • block-wise min: \(\ell(\mathbf A^*, \mathbf B^*) \leq \min\left(\ell(\mathbf A^*, \mathbf B), \ell(\mathbf A, \mathbf B^*)\right)\)
  • Also stationary: \(0 \in \partial(\ell + \delta_{\Delta_{\mathbf A} \times \Delta_{\mathbf B}})(\mathbf A^*, \mathbf B^*)\)

Results: Sediment Source Separation

http://dzgrainalyzer.eoas.ubc.ca

Summary

  • Combine KDE with Tucker-\(1\) factorization into a scalable nonparametric density decomposition method
  • Algorithm converges to block-minimum and stationary point
    • Open-source Julia code on GitHub: MatrixTensorFactor.jl
  • Practical model applicable to many areas
    • geology, music, genetics etc.

Paper & Code

References

Chen Y-C (2017) A tutorial on kernel density estimation and recent advances. Biostatistics & Epidemiology 1:161–187. https://doi.org/10.1080/24709360.2017.1396742
Graham N, Richardson N, Friedlander MP, Saylor J (2024) Tracing Sedimentary Origins in Multivariate Geochronology via Constrained Tensor Factorization. Preprint
Kolda TG, Bader BW (2009) Tensor Decompositions and Applications. SIAM Rev 51:455–500. https://doi.org/10.1137/07070111X
O’Brien TA, Kashinath K, Cavanaugh NR, et al (2016) A fast and objective multidimensional kernel density estimation method: fastKDE. Computational Statistics & Data Analysis 101:148–160. https://doi.org/10.1016/j.csda.2016.02.014
Richardson N, Graham N, Friedlander MP (2024a) MatrixTensorFactor.jl. GitHub
Richardson N, Graham N, Friedlander MP, Saylor J (2024b) SedimentAnalysis.jl. GitHub
Vavasis SA (2010) On the Complexity of Nonnegative Matrix Factorization. SIAM Journal on Optimization 20:1364–1377. https://doi.org/10.1137/070709967
Xu Y, Yin W (2013) A Block Coordinate Descent Method for Regularized Multiconvex Optimization with Applications to Nonnegative Tensor Factorization and Completion. SIAM J Imaging Sci 6:1758–1789. https://doi.org/10.1137/120887795

Other Applications

Application 2: Spatial Transcriptomics

  • ID cell types based on gene expressions of cells spatially

Application 2: Spatial Transcriptomics

Application 2: Spatial Transcriptomics

Application 3: Music Decomposition

  • Separate audio mixture into guitar and flute with STFT magnitude

time frequency

Left: Input spectrogram of guitar and flute. Above: Flute part isolated.

Application 3: Music Decomposition

time musical note

amplitude of low rank factors in time

frequency musical note

frequency spectrum of low rank factors

Additional Slides

Estimating the rank

  • Try many ranks \(R\)
  • Compare the objective as a function of the rank \(\left\lVert \mathbf Y-\mathbf A \mathbf B \right\rVert_{F}^2\)

Rank robustness

Rescaling trick

  • Relax constraints to
    • \(\mathbf A \geq 0\), \(\mathbf B \geq 0\)
    • \(\frac{1}{J}\sum_{jk} B_{rjk} = 1\) for all \(r\) (vs \(\sum_{k} B_{rjk} = 1\) for all \(r,j\))
  • Updates now look like:
    • \(\mathbf A^{t+1/2} = (\mathbf A^{t} - \frac{1}{L_{\mathbf A}} \nabla_{\mathbf A} \ell(\mathbf A^t, \mathbf B^t))_+\)
    • \(\mathbf B^{t+1/2} = (\mathbf B^{t} - \frac{1}{L_{\mathbf B}} \nabla_{\mathbf B} \ell(\mathbf A^{t+1/2}, \mathbf B^t))_+\)
    • \(\mathbf B^{t+1} =\mathbf C^{-1} \mathbf B^{t+1/2}\) and \(\mathbf A^{t+1} = \mathbf A^{t+1/2} \mathbf C\)
    • where \(C_{rr} = \frac{1}{J}\sum_{jk} B_{rjk}\)

Rescaling vs simplex projection

  • Compare \(\text{dist}\left(0, \partial(\ell + \delta_{\geq 0})(\mathbf A, \mathbf B)\right)\) every iteration
PPR

What features did we look at?

  • age
  • Eu anomaly
  • Ti-based crystallization temperature
  • Th-U ratio
  • sum of light rare-earth elements over heavy rare-earth elements (ΣLREE/ΣHREE)
  • Dy-Yb ratio
  • normalized (Ce/ND)/Y ratio

Convergence Details

  • when iterates are bounded iterates
    • there are limit points
    • know this because feasible set is bounded
  • when the objective function is KL
    • sequence of iterates converges to a finite limit point