Acknowledgments

Registration and travel support for this presentation was provided by the Society for Industrial and Applied Mathematics.

Research was funded with the help of the University of British Columbia (UBC) and the Natural Sciences and Engineering Research Council of Canada (NSERC).

Physical Setting

  • Few source locations upstream a river
    • Each source has their own distribution of minerals
  • Rocks from these sources are mixed by the river and deposited downstream
  • Take scoops of rocks at locations downstream, called “sinks”
    • Sinks are different mixtures of rocks from the upstream sources

 

Modeling the Physical Problem

  • Grains of rock from sink \(i\) are samples (random vectors) from one of the sources \(r\) with probability \(a_{ir}\)
  • Goal: Estimate probabilities \(a_{ir}\) and source distributions \(\mathbf b_r\)

Why do we care?

With these rocks…

  • Measure amount of rare-earth elements
  • Use radioisotope dating (U-Pb)
  • Element ratios
  • Geology:
    • Earth’s crust thickness
    • Tectonic plate movement
    • Chronostratigraphy (dating)
  • Mining:
    • Locate rare-earth elements
    • Minimal exploration

Mathematical Setting

  • Source separation / signal decomposition

    • have different mixtures \(\mathbf{y}_i\) of (the same) sources \(\mathbf{b}_r\) \(\newcommand{\gray}[1]{\color{gray}{#1}}\) \(\quad\qquad \mathbf{y}_i = \sum_{r=1}^R a_{ir} \mathbf{b}_r \qquad i=1, 2, \dots, I\)

    • Goal 1: estimate the sources \(\mathbf{b}_r\)

    • Goal 2: estimate the contribution of each source \(a_{ir}\)

  • In our case, have the constraints:

    • signals \(\mathbf{y}_i\) and sources \(\mathbf{b}_r\) are probability densities
    • mixture coefficients \(a_{ir} \geq 0\) and \(\sum_{r=1}^R a_{ir} = 1\)

Other source separation applications

  • Spatial Transcriptomics:
    • uncovering cell types from gene expressions
  • Musical Decomposition:
    • separating audio into individual instruments
  • Other source separation problems
    • need to model each source as some “rank-\(1\)” term
    • find the transformation of your data where it is linearly decomposable

The Method

Density Estimation

  • Often cannot measure the mixture densities \(\mathbf{y}_i\) directly
  • Estimate from samples via kernel density estimation (KDE)1
  • Unlike Gaussian mixture models, sources \(\mathbf{b}_r\) can be arbitrary

Higher Order KDE & Tensors

  • If independent features
    • store discretized \(1\)-dim KDEs for each sink \(i\) and feature \(j\) in \(\mathbf{Y}_{i j :}\)
  • Doing it this way, reduces the size of the data
    • \(J\)-dim KDEs1 (\(I\) of them) is more expensive to compute than \(IJ\) cheeper 1-dim KDEs
    • \(IK^J\) to \(IJK\) tensor entries

Higher Order KDE & Tensors

Contributions / advancements

  • Perform this source seperation on all features jointly
  • Model that scales well to arbitrary number of features \(J\)
  • Size of problem is independent of number of rocks collected

Tucker-1 Decomposition

  • A generalization of rank-\(R\) matrix factorization1
  • \(\mathbf{Y} = \mathbf{A}\mathbf{B}\)
  • \(Y_{ijk} = \sum_{r=1}^R A_{ir}B_{rjk}\)
  • \(\mathbf{Y}_i = \sum_{r=1}^R A_{ir}\mathbf{B}_{r}\)

Source separation model

\[ \min_{\mathbf A, \mathbf B} \left\{ \ell(\mathbf A, \mathbf B):=\tfrac{1}{2}\left\lVert \mathbf Y-\mathbf A \mathbf B \right\rVert_{F}^2 \,\middle\vert\, \mathbf A \in \Delta_{\mathbf A},\ \mathbf B \in \Delta_{\mathbf B} \right\} \]

\(\left\lVert \mathbf \cdot \right\rVert_{F}^2\) squared Frobenius norm (sum-of-squares)

\(\Delta_{\mathbf A}\) non-negative entries & rows sum to 1

\(\Delta_{\mathbf B}\) non-negative entries & fibres \(\mathbf B_{ij:}\) sum to 1

  • Nonnegativity implies it’s NP hard1 ☹️
  • Simplex implies a bounded feasible set 😊
  • Not convex, but block-convex and smooth

Block Coordinate Descent

  • Alternatly update \(\mathbf{A}\) and \(\mathbf{B}\) via projected gradient descent1
    • \(\mathbf A^{t+1} = P_{\Delta_{\mathbf A}}(\mathbf A^{t} - \frac{1}{L_{\mathbf A}} \nabla_{\mathbf A} \ell(\mathbf A^t, \mathbf B^t))\)
    • \(\mathbf B^{t+1} = P_{\Delta_{\mathbf B}}(\mathbf B^{t} - \frac{1}{L_{\mathbf B}} \nabla_{\mathbf B} \ell(\mathbf A^{t+1}, \mathbf B^t))\)
  • Convergence to nash equilibria \((\mathbf A^*, \mathbf B^*)\)
    • block-wise min: \(\ell(\mathbf A^*, \mathbf B^*) \leq \min\left(\ell(\mathbf A^*, \mathbf B), \ell(\mathbf A, \mathbf B^*)\right)\)
  • Also stationary: \(0 \in \partial(\ell + \delta_{\Delta_{\mathbf A} \times \Delta_{\mathbf B}})(\mathbf A^*, \mathbf B^*)\)

Results: Sediment Source Separation

Other Applications

Application 2: Spatial Transcriptomics

  • ID cell types based on gene expressions of cells spatially

Application 2: Spatial Transcriptomics

Application 3: Music Decomposition

  • Separate audio mixture into guitar and flute with STFT magnitude

time frequency

flute (above), guitar (below)

guitar

Application 3: Music Decomposition

time musical note

amplitude of low rank factors in time

frequency musical note

frequency spectrum of low rank factors

Summary

  • Combine KDE with Tucker-\(1\) factorization into a scalable nonparametric density decomposition method
  • Algorithm converges to block-minimum and stationary point
    • Open-source Julia code on GitHub: MatrixTensorFactor.jl
  • Practical model applicable to many areas
    • geology, music, genetics etc.

Paper & Code

References

Chen Y-C (2017) A tutorial on kernel density estimation and recent advances. Biostatistics & Epidemiology 1:161–187. https://doi.org/10.1080/24709360.2017.1396742
Graham N, Richardson N, Friedlander MP, Saylor J (2024) Tracing Sedimentary Origins in Multivariate Geochronology via Constrained Tensor Factorization. Preprint
Kolda TG, Bader BW (2009) Tensor Decompositions and Applications. SIAM Rev 51:455–500. https://doi.org/10.1137/07070111X
O’Brien TA, Kashinath K, Cavanaugh NR, et al (2016) A fast and objective multidimensional kernel density estimation method: fastKDE. Computational Statistics & Data Analysis 101:148–160. https://doi.org/10.1016/j.csda.2016.02.014
Richardson N, Graham N, Friedlander MP (2024a) MatrixTensorFactor.jl. GitHub
Richardson N, Graham N, Friedlander MP, Saylor J (2024b) SedimentAnalysis.jl. GitHub
Vavasis SA (2010) On the Complexity of Nonnegative Matrix Factorization. SIAM Journal on Optimization 20:1364–1377. https://doi.org/10.1137/070709967
Xu Y, Yin W (2013) A Block Coordinate Descent Method for Regularized Multiconvex Optimization with Applications to Nonnegative Tensor Factorization and Completion. SIAM J Imaging Sci 6:1758–1789. https://doi.org/10.1137/120887795

Additional Slides

Estimating the rank

  • Try many ranks \(R\)
  • Compare the objective as a function of the rank \(\left\lVert \mathbf Y-\mathbf A \mathbf B \right\rVert_{F}^2\)

Rescaling trick

  • Relax constraints to
    • \(\mathbf A \geq 0\), \(\mathbf B \geq 0\)
    • \(\frac{1}{J}\sum_{jk} B_{rjk} = 1\) for all \(r\) (vs \(\sum_{k} B_{rjk} = 1\) for all \(r,j\))
  • Updates now look like:
    • \(\mathbf A^{t+1/2} = (\mathbf A^{t} - \frac{1}{L_{\mathbf A}} \nabla_{\mathbf A} \ell(\mathbf A^t, \mathbf B^t))_+\)
    • \(\mathbf B^{t+1/2} = (\mathbf B^{t} - \frac{1}{L_{\mathbf B}} \nabla_{\mathbf B} \ell(\mathbf A^{t+1/2}, \mathbf B^t))_+\)
    • \(\mathbf B^{t+1} =\mathbf C^{-1} \mathbf B^{t+1/2}\) and \(\mathbf A^{t+1} = \mathbf A^{t+1/2} \mathbf C\)
    • where \(C_{rr} = \frac{1}{J}\sum_{jk} B_{rjk}\)

Rescaling vs simplex projection

  • Compare \(\text{dist}\left(0, \partial(\ell + \delta_{\geq 0})(\mathbf A, \mathbf B)\right)\) every iteration

What features did we look at?

  • age
  • Eu anomaly
  • Ti-based crystallization temperature
  • Th-U ratio
  • sum of light rare-earth elements over heavy rare-earth elements (ΣLREE/ΣHREE)
  • Dy-Yb ratio
  • normalized (Ce/ND)/Y ratio

Convergence Details

  • when iterates are bounded iterates
    • there are limit points
    • know this because feasible set is bounded
  • when the objective function is KL
    • sequence of iterates converges to a finite limit point