Spatial Landmark Detection and Tissue Registration With Deep Learning

Final Notes -

ELD is multimodal: it uses histology images to support landmark detection.

Additional Material -

Abstract - Correctly points out that existing techniques cannot handle nonlinear deformations between tissue sections and are therefore ineffective for so-called z-stacking. The authors introduce landmark detection using neural-network-guided thin-plate splines (TPS).

Introduction

Automating the process of locating spatial landmarks can boost the scalability of sizable spatial omics experiments.

Unsupervised models tend to be the better fit here because high-quality annotated datasets are scarce. This approach is not without challenges:

Limited availability of datasets

Nonlinear deformations

Multimodal data (not a concern for us)

The authors introduce ELD (effortless landmark detection), described as “creating a landmark detection network for the identification and leveraging of thin-plate splines.”

Results

Benchmarking ELD against existing methods

The authors compare ELD against other methods quite extensively, which is appreciated.

ELD can be described as follows:

“The ELD system uses an unsupervised trained spatial landmark detection network to pinpoint landmarks on the desired tissue slices”.

“ELD uses landmark-centric alignment techniques, such as TPS or homography, to align regions”.

“As a final step, ELD projects all the aligned tissue regions onto a CCF, facilitating comparative studies across slices”.
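The TPS alignment step in the pipeline above can be sketched as follows. This is a minimal NumPy illustration, not ELD's code: it assumes landmark pairs have already been detected and matched between two slices, and `fit_tps` / `apply_tps` are hypothetical names. The kernel U(r) = r^2 log r^2 is the standard 2D thin-plate radial basis. With `reg=0` the warp passes exactly through every landmark, and a purely affine deformation is reproduced exactly.

```python
import numpy as np


def _tps_kernel(r2):
    # 2D TPS radial basis U(r) = r^2 log(r^2), written in terms of r2 = r^2; U(0) := 0.
    safe = np.where(r2 > 0, r2, 1.0)  # avoid log(0); those entries are masked out below
    return np.where(r2 > 0, r2 * np.log(safe), 0.0)


def fit_tps(src, dst, reg=0.0):
    """Fit a TPS mapping src landmarks (n, 2) onto dst landmarks (n, 2).

    reg > 0 adds ridge smoothing to the kernel; reg == 0 interpolates exactly.
    """
    n = src.shape[0]
    d2 = np.sum((src[:, None, :] - src[None, :, :]) ** 2, axis=-1)  # pairwise squared distances
    K = _tps_kernel(d2) + reg * np.eye(n)
    P = np.hstack([np.ones((n, 1)), src])  # affine basis [1, x, y]
    system = np.zeros((n + 3, n + 3))
    system[:n, :n] = K
    system[:n, n:] = P
    system[n:, :n] = P.T
    rhs = np.zeros((n + 3, 2))
    rhs[:n] = dst
    params = np.linalg.solve(system, rhs)  # first n rows: kernel weights, last 3: affine coefficients
    return src.copy(), params[:n], params[n:]


def apply_tps(model, pts):
    """Warp arbitrary points (m, 2), e.g. spot coordinates, with a fitted TPS."""
    ctrl, w, a = model
    d2 = np.sum((pts[:, None, :] - ctrl[None, :, :]) ** 2, axis=-1)
    P = np.hstack([np.ones((pts.shape[0], 1)), pts])
    return _tps_kernel(d2) @ w + P @ a


# Hypothetical matched landmarks under a synthetic nonlinear deformation.
rng = np.random.default_rng(0)
src = rng.uniform(0, 10, size=(20, 2))
dst = src + 0.5 * np.sin(src)
model = fit_tps(src, dst)
warped = apply_tps(model, src)  # matches dst up to floating-point error
```

A homography, the other option the authors mention, would instead fit a single global projective transform, which cannot absorb local nonlinear deformations the way the TPS kernel terms do.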

The authors use image datasets that are not based on spatial transcriptomics (ST), e.g. the CelebA dataset, to train the landmark detection network. This can cause issues, since ST data differs substantially in its features (i.e. grayscale, luminance and spot-based).

(🙏) The authors included runtime benchmarks. Sadly, runtimes seem to be TERRIBLE.

In terms of # of genes - The runtime seems to increase linearly, though this is only an inference, since the authors only report measurements for 10, 50, 100, 200 and 500 genes. This is sloppy.
In terms of # of landmarks - The runtime seems to grow exponentially, which is worrisome. If this is indeed the case, the method would not be ideal (e.g. it takes 200 minutes to create 70 landmark features for a single image, while we are tasked with many images).
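One way to test the "linear" and "exponential" impressions above, given (count, runtime) pairs read off the paper's plots, is to compare two straight-line fits: log(runtime) against n (exponential growth) and log(runtime) against log(n) (power-law growth, where a slope near 1 means linear). A minimal sketch under that assumption; `compare_growth_models` is a hypothetical helper and the example inputs are synthetic, not the paper's measurements.

```python
import numpy as np


def compare_growth_models(n, t):
    """Fit log(t) linearly against n (exponential) and against log(n) (power law).

    Returns each model's slope and sum of squared residuals (SSE) in log space.
    """
    n = np.asarray(n, dtype=float)
    log_t = np.log(np.asarray(t, dtype=float))
    fits = {}
    for name, x in (("exponential", n), ("power_law", np.log(n))):
        slope, intercept = np.polyfit(x, log_t, 1)
        sse = float(np.sum((log_t - (slope * x + intercept)) ** 2))
        fits[name] = {"slope": float(slope), "sse": sse}
    return fits


# Synthetic example (not the paper's data): runtime growing as n^3.
n = [10, 20, 30, 40, 50, 60, 70]
t = [1e-3 * k ** 3 for k in n]
fits = compare_growth_models(n, t)
# The lower-SSE model describes the data better; here the power law wins with slope ~3.
```

With only five sparsely spaced gene counts, both models may fit passably, which is part of why so few measurements make the growth claims fragile.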

Performance evaluation on single-modality data

One “good” option is Eggplant, but sadly this requires manual annotation.

ELD performs at least as well as Eggplant.

The authors claim that ELD outperforms ST-Align.

3D Modeling

ELD can generate anchor points instead of landmarks for z-stacking. The anchor points act as fixed (x, y)-coordinates across the z-axis.

This relies on a slice-selection procedure (see template matching in GPSA: Alignment Of Spatial Genomics Data using Deep Gaussian Processes) and on artificially deforming the selected slice. This is not ideal.
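Because the anchor points share (x, y) positions across the z-axis, z-stacking can be sketched as mapping each slice's anchors onto those of a reference slice and giving each aligned slice its own z-coordinate. A minimal sketch, assuming every slice lists its anchors in the same order; for brevity a least-squares affine fit stands in for TPS, and `stack_slices` and its `spacing` parameter are hypothetical, not part of ELD.

```python
import numpy as np


def fit_affine(src, dst):
    """Least-squares 2D affine map: dst ~= [src, 1] @ M, with M of shape (3, 2)."""
    X = np.hstack([src, np.ones((src.shape[0], 1))])
    M, *_ = np.linalg.lstsq(X, dst, rcond=None)
    return M


def apply_affine(M, pts):
    return np.hstack([pts, np.ones((pts.shape[0], 1))]) @ M


def stack_slices(spots, anchors, spacing=1.0, ref=0):
    """Align every slice to slice `ref` via its anchors and stack into (x, y, z) points.

    spots[i]: (m_i, 2) spot coordinates of slice i.
    anchors[i]: (k, 2) anchor coordinates of slice i, same anchor order in every slice.
    spacing: hypothetical distance between consecutive sections along z.
    """
    layers = []
    for z, (s, a) in enumerate(zip(spots, anchors)):
        M = fit_affine(a, anchors[ref])  # map this slice's anchors onto the reference anchors
        xy = apply_affine(M, s)
        layers.append(np.column_stack([xy, np.full(len(xy), z * spacing)]))
    return np.vstack(layers)
```

Aligning every slice directly to one reference avoids accumulating errors from chained slice-to-slice transforms, at the cost of assuming all anchors are visible in every section.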

The 3D alignment uses only a handful of anchors (e.g. 20). This is probably due to the exponential cost of adding more landmarks.
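For reference, fitting a 2D TPS to n landmark pairs amounts to one dense linear solve. With landmarks x_i, their target positions stacked in Y, and U(r) = r^2 log r^2, the standard formulation is below. The system has size (n + 3) x (n + 3), so a direct solve costs O(n^3): polynomial, not exponential. If ELD's runtime really grows exponentially in the number of landmarks, the cause presumably lies elsewhere in the pipeline (e.g. the landmark detection network) rather than in the TPS fit itself.

```latex
\begin{bmatrix} K & P \\ P^\top & 0 \end{bmatrix}
\begin{bmatrix} W \\ A \end{bmatrix}
=
\begin{bmatrix} Y \\ 0 \end{bmatrix},
\qquad
K_{ij} = U\!\left(\lVert x_i - x_j \rVert\right),
\quad
P_i = \begin{bmatrix} 1 & x_i^\top \end{bmatrix},

K \in \mathbb{R}^{n \times n},\;
P \in \mathbb{R}^{n \times 3},\;
W, Y \in \mathbb{R}^{n \times 2},\;
A \in \mathbb{R}^{3 \times 2},

f(x) = \begin{bmatrix} 1 & x^\top \end{bmatrix} A
     + \sum_{i=1}^{n} U\!\left(\lVert x - x_i \rVert\right) W_i,
\qquad
\text{cost of a direct solve: } \mathcal{O}\!\left((n+3)^3\right) = \mathcal{O}(n^3).
```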

In the benchmark, it would be useful to see whether the loss decreases as the number of landmarks increases.

According to the (A)TRE, alignment appears more consistent across entire tissue volumes than across individual pairs of slices.

(A)TRE - The (accumulated) target registration error describes the error as the Euclidean distance between the actual and predicted locations of points. The target points are not used in the registration process. The ATRE is the cumulative TRE over all tissue sections.

Since the (A)TRE is normalized in comparison to manual alignment, the closer the score is to 1, the better.
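The (A)TRE as described above can be sketched as follows. A minimal NumPy version under stated assumptions: the per-section TRE is the mean distance over that section's held-out target points, the ATRE sums these over sections, and the normalized score divides a method's ATRE by that of manual alignment, so 1.0 means parity with manual alignment. The paper's exact aggregation and normalization may differ, and the function names are hypothetical.

```python
import numpy as np


def tre(pred, true):
    """Mean Euclidean distance between predicted and actual target points (m, 2) of one section."""
    return float(np.mean(np.linalg.norm(pred - true, axis=1)))


def atre(preds, trues):
    """Accumulated TRE: per-section TREs summed over all tissue sections."""
    return sum(tre(p, t) for p, t in zip(preds, trues))


def normalized_atre(method_preds, manual_preds, trues):
    """Method ATRE divided by manual-alignment ATRE (assumed normalization; 1.0 = parity)."""
    return atre(method_preds, trues) / atre(manual_preds, trues)
```

The ratio is undefined if the manual alignment happens to have zero error on every target point, which is unlikely with real annotations.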

Performance evaluation on multimodal data

Including multimodal data seems to be a trend, but it is not of interest to our research, since our data consists of only a single modality (?).