Wasserstein Wormhole: Scalable Optimal Transport Distance with Transformers
Wasserstein Wormhole: Scalable Optimal Transport Distance with Transformers

Wasserstein Wormhole: Scalable Optimal Transport Distance with Transformers


Documentation for the package is available here.

Introduction

  • Authors present a transformer-based approach to approximate OT distances. This is done by embedding data (i.e. distribution or point cloud) into a latent space of a single point. A transformer model is then used to infer a probabilistic Wasserstein distance between another embedded data point. More specifically, they train the model in such a way that the Euclidean distance in the latent space, matches the OT distance (i.e. Wasserstein distance).
    •  
  • The main advantage of Wormhole is that it can compute OT distances in where is the dimensionality of the latent space. Sadly, authors do mention that Wasserstein distances cannot be equated to Euclidean distances. This implies that there will always be an error on the obtained distance (authors do derive upper/lower bounds).
 
  • The choice for a transformer-based architecture as a ML-backbone is because of their indifference to input length. This is a necessity when comparing point clouds (to create 3D spatial and 4D spatiotemporal reconstructions)
 
  • Both an encoder and decoder is trained, which allows for downstream tasks related to the (approximated) Wasserstein distance, such as interpolation (see this paper for more information). Interpolation is what is necessary for spatial/spatiotemporal reconstruction.
 

Prior Work

This section features some interesting papers in general, but nothing that contributes to the goal of the thesis. What is helpful here is the comparison of Wormhole and other OT solvers. Interestingly, it shows that Wormhole’s runtime is indeed independent of cohort size, allowing it to scale to larger datasets.
  • ⚠️ They only publish cohort sizes up to points, but ideally we would like to see what these numbers are for cohort sizes up to and even .
  • ❌ Wormhole is only suited for cohort sizes due to pre-training. In cases where the cohort size is smaller, other approaches such as the regular Sinkhorn algorithm perform best. This might be an interesting point to keep in mind when constructing a pipeline and working with mipmaps (i.e. when increasing resolution, adaptively choose the OT solver to minimize computation time).
 

Preliminaries