Some genes missing after SCTransform

I have been using SCTransform to normalize our single-cell RNA-seq data, mainly because it accounts for differences in sequencing depth across cells. However, a few days ago, when I tried to visualize the expression of some genes that Seurat had identified as differentially expressed, I kept running into error messages like this one:

> FeaturePlot(seurat_obj, "ENSMUSG000000225101")
Error in `Feature_PreCheck()`:
! No features were found.
• The following are not present in object:
ℹ ENSMUSG000000225101
Run `rlang::last_trace()` to see where the error occurred.

The message suggested that, for some reason, the expression data for the gene of interest could not be found in the SeuratObject. One possible explanation was that some step in the analysis pipeline had removed genes expressed in only a few cells.

A quick search¹,²,³ then gave me the answer: SCTransform does indeed drop genes detected in fewer than 5 cells, although, strangely, this is not clearly mentioned in the documentation. (The threshold comes from the min_cells argument of sctransform::vst, which defaults to 5.)
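To see in advance which genes would be affected, you can count the number of cells in which each gene is detected and compare it with the threshold. The sketch below runs in base R on a small made-up counts matrix. The helper name genes_below_min_cells() is hypothetical, and it assumes the filter keeps genes detected (count > 0) in at least min_cells cells, which is how the sctransform::vst documentation describes min_cells. With a real object, you would pass the RNA counts matrix instead (for example, LayerData(seurat_obj, assay = "RNA", layer = "counts") in Seurat v5); rowSums() also works on sparse dgCMatrix counts once the Matrix package is loaded.

```r
# Hypothetical helper: return the genes detected (count > 0) in fewer than
# `min_cells` cells, i.e. the genes a "detected in >= min_cells" filter drops.
genes_below_min_cells <- function(counts, min_cells = 5) {
  # Number of cells in which each gene (row) has a non-zero count
  cells_detected <- rowSums(counts > 0)
  names(cells_detected)[cells_detected < min_cells]
}

# Toy counts matrix: 3 genes x 6 cells (made-up values)
counts <- matrix(
  c(1, 0, 2, 3, 1, 4,   # geneA: detected in 5 cells
    0, 0, 1, 0, 0, 2,   # geneB: detected in 2 cells
    0, 0, 0, 0, 0, 0),  # geneC: detected in 0 cells
  nrow = 3, byrow = TRUE,
  dimnames = list(c("geneA", "geneB", "geneC"), paste0("cell", 1:6))
)

genes_below_min_cells(counts)                 # returns c("geneB", "geneC")
genes_below_min_cells(counts, min_cells = 0)  # returns character(0): nothing dropped
```

geneA sits exactly at the default threshold of 5 cells and is kept, which illustrates the "at least min_cells" reading of the filter.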

Here is a simple way to retain all genes in the SeuratObject after running SCTransform: set the min_cells option, which SCTransform passes on to sctransform::vst. Note that this is not about whether the Pearson residuals (the scale.data layer) are kept for only the highly variable genes or for all genes; that is controlled by the return.only.var.genes option, while variable.features.n sets how many variable genes are selected. Instead, min_cells determines which genes are kept in the "SCT" assay at all.

I’ll use our own single-cell data to demonstrate this (unpublished, but only basic statistics are shown here, so no worries). Before SCTransform normalization, the SeuratObject includes the expression data of 19,547 protein-coding genes:

> seurat_obj

An object of class Seurat 
19547 features across 5029 samples within 1 assay 
Active assay: RNA (19547 features, 0 variable features)
 1 layer present: counts

With the default parameters (without specifying min_cells), genes expressed in only a few cells are excluded, and only 17,547 protein-coding genes remain in the SCT assay:

seurat_obj <- SCTransform(
    seurat_obj, 
    vst.flavor = "v2"
)

> seurat_obj

An object of class Seurat 
37094 features across 5029 samples within 2 assays 
Active assay: SCT (17547 features, 3000 variable features)
 3 layers present: counts, data, scale.data
 1 other assay present: RNA
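The numbers add up: the object-level total counts both assays (19,547 RNA + 17,547 SCT = 37,094 features), so 2,000 genes are missing from the SCT assay. To list them, and to check whether a gene that fails in FeaturePlot() is among them, you can compare the gene names of the two assays. This is a minimal sketch with toy name vectors standing in for rownames(seurat_obj[["RNA"]]) and rownames(seurat_obj[["SCT"]]); the gene names are made up.

```r
# Toy stand-ins for the gene names of each assay (hypothetical values);
# with a real object these would be rownames(seurat_obj[["RNA"]]) and
# rownames(seurat_obj[["SCT"]]).
rna_genes <- c("geneA", "geneB", "geneC", "geneD")
sct_genes <- c("geneA", "geneD")

# Genes present in the RNA assay but missing from the SCT assay
dropped <- setdiff(rna_genes, sct_genes)
dropped              # "geneB" "geneC"

# Is a particular gene of interest among the dropped ones?
"geneB" %in% dropped # TRUE

# Arithmetic from the object summaries above
19547 + 17547        # 37094 features across both assays
19547 - 17547        # 2000 genes absent from the SCT assay
```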

If we set min_cells = 0, however, the expression data of all genes are retained in the "SCT" assay:

seurat_obj <- SCTransform(
    seurat_obj,
    vst.flavor = "v2",
    min_cells = 0
)

> seurat_obj

An object of class Seurat 
39094 features across 5029 samples within 2 assays 
Active assay: SCT (19547 features, 3000 variable features)
 3 layers present: counts, data, scale.data
 1 other assay present: RNA

Now all the genes are back!

Note, however, that such genes should be treated with caution. Although adjusting the min_cells option brings them back into the analysis, they can strongly affect the results of differential expression testing. Because these genes are not detected in most cells within a cluster, their apparent expression may simply reflect stochastic gene expression or sequencing noise rather than a genuine contribution to the transcriptomic identity of the cell type (especially when UMI counts are low). If they are not handled carefully, for example if the false discovery rate or the logFC threshold is not properly controlled, they can lead to misleading interpretations of the data.
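One way to act on this caution is to require that a gene be detected in a reasonable fraction of cells, in addition to adjusted p-value and fold-change cutoffs. Seurat's FindMarkers() already has min.pct and logfc.threshold arguments that pre-filter genes before testing; the sketch below instead shows a post-hoc filter on a made-up data frame that uses the column names of FindMarkers() output (avg_log2FC, pct.1, pct.2, p_val_adj). The helper name filter_markers() and the thresholds (0.05, 0.5, 0.1) are hypothetical choices for illustration, not recommendations.

```r
# Hypothetical post-hoc filter for differential expression results.
# Column names follow Seurat's FindMarkers() output; thresholds are
# illustrative choices, not recommended defaults.
filter_markers <- function(markers, max_padj = 0.05,
                           min_abs_log2fc = 0.5, min_pct = 0.1) {
  keep <- markers$p_val_adj < max_padj &
    abs(markers$avg_log2FC) >= min_abs_log2fc &
    # require detection in at least min_pct of cells in either group,
    # so genes seen in only a handful of cells are excluded
    pmax(markers$pct.1, markers$pct.2) >= min_pct
  markers[keep, , drop = FALSE]
}

# Made-up results for three genes
markers <- data.frame(
  avg_log2FC = c(2.1, 3.5, -1.2),
  pct.1      = c(0.60, 0.02, 0.05),
  pct.2      = c(0.10, 0.00, 0.40),
  p_val_adj  = c(1e-10, 1e-3, 1e-6),
  row.names  = c("geneA", "geneB", "geneC")
)

rownames(filter_markers(markers))  # "geneA" "geneC"
```

geneB has the largest fold change but is detected in only 2% of cells in either group, so it is dropped; checking the larger of pct.1 and pct.2 keeps genes like geneC that are enriched in the second group.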

BIOLOGIST J
