Introduction

Celda is a Bayesian hierarchical model that can perform bi-clustering of features into modules and observations into subpopulations. In this tutorial, we will apply Celda to a real-world single-cell RNA sequencing (scRNA-seq) dataset of 2,700 Peripheral Blood Mononuclear Cells (PBMCs) collected from a healthy donor. This dataset (PBMC3K) is available from 10X Genomics and can be found on the 10X website.

The celda package uses the SingleCellExperiment (SCE) object for management of expression matrices, feature/cell annotation data, and metadata. All of the functions have an SCE object as the first input parameter. The functions operate on a matrix stored in the assay slot of the SCE object. The parameter useAssay can be used to specify which matrix to use (the default is "counts"). Matrices can be of class matrix or dgCMatrix from the Matrix package. While the primary clustering is performed with functions from the celda package, the singleCellTK package is used for some other tasks such as importing data, quality control, and marker identification with differential expression.

Importing data

The PBMC3K data can be easily loaded via the Bioconductor package TENxPBMCData. TENxPBMCData is an experiment package that provides resources for various PBMC datasets generated by 10X Genomics. When using this package, the column names of returned SCE object are NULL by default. For this example, we paste together the name of the sample with the cell barcode to generate column names for the SCE object. Additionally, the count matrix within sce object is converted from a DelayedMatrix object to a sparse matrix dgCMatrix object.

library(TENxPBMCData)
sce <- TENxPBMCData("pbmc3k")
colnames(sce) <- paste0("pbmc3k_", colData(sce)$Sequence)
counts(sce) <- as(counts(sce), "dgCMatrix")

If you have the singleCellTK package installed, then this dataset can be imported and converted with a single command:

To get your own data into a SingleCellExperiment object, the singleCellTK package has several importing functions for different preprocessing tools including CellRanger, STARsolo, BUStools, Optimus, DropEST, SEQC, and Alevin/Salmon. For example, the following code can be used as a template to read in multiple samples processed with CellRanger:

library(singleCellTK)
sce <- importCellRanger(sampleDirs = c("path/to/sample1/", "path/to/sample2/"))

Note: As a reminder, you can view the assays, column annotation, and row annotation stored in the SCE with the commands assays(sce), colData(sce), and rowData(sce), respectively.

Finally, we set the rownames of the SCE to the gene symbol:

rownames(sce) <- rowData(sce)$Symbol_TENx

Quality Control

Quality control and filtering of cells is often needed before down-stream analyses such as dimensionality reduction and clustering. Typical filtering procedures include exclusion of poor quality cells with low numbers of counts/UMIs, estimation and removal of ambient RNA, and identification of potential doublet/multiplets. Many tools and packages are available to perform these operations and users are free to apply their tool(s) of choice as the celda clustering functions will work with any matrix stored in an SCE object. The celda package does contain a Bayesian method called decontX to estimate and remove transcript contamination in individual cells in a scRNA-seq dataset.

To perform QC, we suggest using the runCellQC function in singleCellTK package. This is a wrapper for several methods for calculation of QC metrics, doublet detection, and estimation of ambient RNA (including decontX). Below is a quick example of how to perform standard QC before applying celda. If you have another preferred approach or your data has already been QC’ed, you can move to Feature selection section. For this tutorial, we will only run one doublet detection algorithm and one decontamination algorithms. For a full list of algorithms that this function runs by default, see ?runCellQC. We will also quantify the percentage of mitochondrial genes in each cell as this is often used as a measure of cell viability.

library(singleCellTK)

# Get list of mitochondrial genes
mito.genes <- grep("^MT-", rownames(sce), value = TRUE)

# Run QC
sce <- runCellQC(sce, sample = NULL, algorithms = c("QCMetrics", "scDblFinder", "decontX"), geneSetList = list(mito=mito.genes), geneSetListLocation = "rownames")

Note: If you have cells from multiple samples stored in the SCE object, make sure to supply the sample parameter as the QC tools need to be applied to cells from each sample individually.

Individual sets of QC metrics can be plotted with specific functions. For example to plot distributions of total numbers of UMIs derived from runPerCellQC, doublet scores from runScDblFinder, and contamination scores from runDecontX (all of which were run by the runCellQC function), the following plotting functions can be used:

plotScDblFinderResults(sce, reducedDimName = "decontX_UMAP")

plotDecontXResults(sce, reducedDimName = "decontX_UMAP")

An comprehensive HTML report can be generated to visualize and explore the QC metrics in greater detail:

After examining the distributions of various QC metrics, poor quality cells will need to be removed. Typically, thresholds for QC metrics should exclude cells that are outliers of the distribution (i.e. long tails in the violin or density plots). Cells can be removed using the subsetSCECols function. Metrics stored in the colData of the SCE object can be filtered using the colData parameter. Here we will limit to cells with at least 600 counts and 300 genes detected:

# Filter SCE
sce <- subsetSCECols(sce, colData = c("total > 600", "detected > 300"))

# See number of cells after filtering
ncol(sce)
## [1] 2675

Other common metrics to filter on include subsets_mito_percent for removal of cells with high mitochondrial percentage, decontX_contamination for removal of cells with higher levels of contamination from ambient RNA, scDblFinder_class to remove doublets (or calls from any of the other doublet detection algorithms). See the singleCellTK documentation For more information on performing comprehensive QC and filtering.

Feature selection

In general, removing features with low numbers of counts across all cells is recommended to reduce computational run time. A simple selection can be performed by removing features with a minimum number of counts in a minimum number of cells using the selectFeatures function:

# Select features with at least 3 counts in at least 3 cells
library(celda)
useAssay <- "counts"
altExpName <- "featureSubset"
sce <- selectFeatures(sce, minCount = 3, minCell = 3, useAssay = useAssay, altExpName = altExpName)

# See number of features after filtering
nrow(altExp(sce, altExpName))
## [1] 2639

The useAssay parameter is used to denote which assay/matrix within the SCE to use for filtering. The default raw counts matrix is traditionally stored in the "counts" assay. If decontX was previously run during QC, then the decontaminated counts can be used by setting this parameter to "decontXcounts". We will save this parameter in a variable called useAssay which will be used as input in several downstream functions.

Note: The subsetted matrix is stored in the “alternative experiment” slot (altExp) within the SCE. This allows for a matrix with a different number of rows to be stored within the same SCE object (rather than creating two SCE objects). The celda functions described in the next several sections operate on a matrix stored in the altExp slot. The default name given to the alternative experiment and used in all downstream celda functions is "featureSubset". If the altExpName parameter is changed here, then it will need to be supplied to downstream plotting functions as well. The list of alternative experiments in an SCE can be view with altExpNames(sce). If you have already have an SCE with selected features or do not want to perform feature selection, then you need to set the alternative experiment directly with a command like altExp(sce, "featureSubset") <- assay(sce, "counts"). In the future, this will be updated to be more simple by utilizing the ExperimentSubset package.

If the number of features is still relatively large (e.g. >5000), an alternative approach is to select highly variable features that can be used in the downstream clustering. The advantage of this approach is that it can greatly speed up celda and can improve with module detection among highly variable features with overall lower expression. The disadvantage of this approach is that features that do not fall into the highly variable group will not be clustered into modules. The celda package does not include methods for selection of highly variable genes (HVGs). However, the singleCellTK provides wrappers for methods used in Seurat and Scran. We recommend keeping at least 2,000-5,000 HVGs for clustering. Here is some example code of how to select the top 5,000 most variable genes and store it back in the SCE as an altExp:

library(singleCellTK)
sce <- seuratFindHVG(sce, useAssay = useAssay, hvgMethod = "vst")
g <- getTopHVG(sce, method = "vst", n = 5000)
altExp(sce, altExpName) <- sce[g, ]

For the rest of the analysis with the PBMC3K data, we will use the first approach where features with at least 3 counts in 3 cells were included.

Analysis with Celda

Bi-clustering with known numbers of clusters

As mentioned earlier, celda is discrete Bayesian model that is able to simultaneously bi-cluster features into modules and cells into cell clusters. The primary bi-clustering model can be accessed with the function celda_CG. This function operates on a matrix stored as an alternative experiment in the altExp slot. If you did not perform feature selection as recommended in the previous section and your matrix of interest is not currently located in an altExp slot, the following code can be used to copy a matrix in the main assay slot to the altExp slot:

useAssay <- "counts"
altExpName <- "featureSubset"
altExp(sce, altExpName) <- assay(sce, useAssay)`. 

The two major adjustable parameters in this model are L, the number of modules, and K, the number of cell populations. The following code bi-clusters the PBMC3K dataset into 100 modules and 15 cell populations:

sce <- celda_CG(sce, L = 100, K = 15, useAssay = useAssay, altExpName = altExpName)

However, in most cases, the number of feature modules (L) and the number of cell clusters (K) are not known beforehand. In the next sections, we outline procedures that can be used suggest reasonable choices for these parameters. If the data is clustered with the code above by supplying K and L directly to the celda_CG function, then you can skip the next section and proceed to Creating 2-D embeddings.

Finding the number of modules

In order to help choose a reasonable solutions for L and K, celda provides step-wise splitting procedures along with measurements of perplexity to suggest reasonable choices for L and K. First, the function recursiveSplitModule can be used to cluster features into modules for a range of L. Within each step, the best split of an existing module into 2 new modules is chosen to create the L-th module. The module labels of the previous model with \(L-1\) modules are used as the initial starting values in the next model with \(L\) modules. Note that the initialization step may take longer with larger numbers of cells in the dataset and the splitting procedure will take longer with larger numbers features in the dataset. Celda models with a L range between initialL = 10 and maxL = 150 are tested in the example below.

moduleSplit <- recursiveSplitModule(sce, useAssay = useAssay, altExpName = altExpName, initialL = 10, maxL = 150)

Perplexity has been commonly used in the topic models to measure how well a probabilistic model predicts observed samples (Blei et al., 2003). Here, we use perplexity to evaluate the performance of individual models by calculating the probability of observing expression counts given an estimated Celda model. Rather than performing cross-validation which is computationally expensive, a series of test sets are created by sampling the counts from each cell according to a multinomial distribution defined by dividing the counts for each gene in the cell by the total number of counts for that cell. Perplexity is then calculated on each test set and can be visualized using function plotGridSearchPerplexity. A lower perplexity indicates a better model fit.

plotGridSearchPerplexity(moduleSplit, altExpName = altExpName, sep = 10)

The perplexity alone often does not show a clear elbow or “leveling off”. However, the rate of perplexity change (RPC) can be more informative to determine when adding new modules does not add much additional information Zhao et al., 2015). An RPC closer to zero indicates that the addition of new modules or cell clusters is not substantially decreasing the perplexity. The RPC of models can be visualized using function plotRPC:

plotRPC(moduleSplit, altExpName = altExpName, sep = 10, n = 30)

In this case, we will choose an L of 80 as the RPC curve tends to level off at this point:

L <- 80
Note: Perplexity and RPC are meant to be guides to give a sense of a possible starting point for L. However, they may not always give a clear “leveling off” depending of the complexity and quality of the dataset. Do not give up if the choice of L is unclear or imperfect! If the L to choose is unclear from these, then you can set a somewhat high number (e.g. 75) and move to the next step of selecting K. Later on, manual review of modules using functions such as moduleHeatmap can give a sense of whether individual modules should be further split up by selecting higher L. For example, you can start exploring the cell populations and modules with L = 75. If some modules need to be further split, you can then try L = 100, L = 125, and so on.

Finding the number of cell subpopulations

Now we extract the Celda model of L =\(L\) with function subsetCeldaList and run recursiveSplitCell to fit models with a range of K between 3 and 25:

temp <- subsetCeldaList(moduleSplit, list(L = L))
sce <- recursiveSplitCell(sce, useAssay = useAssay, altExpName = altExpName, initialK = 3, maxK = 25, yInit = celdaModules(temp))

The perplexities and RPC of models can be visualized using the same functions plotGridSearchPerplexity and plotRPC.

plotRPC(sce)

The perplexity continues to decrease with larger values of K. The RPC generally levels off between 13 and 16 and we choose the model with K = 14 for downstream analysis. The follow code selects the final celda_CG model with L = 80 and K = 14:

K <- 14
sce <- subsetCeldaList(sce, list(L = L, K = K))

Note: Similar to choosing L, you can guess an initial value of K based off of the perplexity and RPC plots and then move to the downstream exploratory analyses described in the next several sections. After reviewing the cell clusters on 2-D embeddings and module heatmaps, you may have to come back to tweak the choice of K until you have something that captures the cellular heterogeneity within the data without “over-clustering” cells into too many subpopulations. This may be an iterative procedure of going back-and-forth between choices of K and plotting the results. So do not let imperfect perplexity/PRC plots prevent you from moving on to the rest of the analysis. Often times, using an initial guess for K will allow you to move on in the analysis to get a sense of the major sources of biological heterogeneity present in the data.

Exploring cell populations

Creating 2-D embeddings

After selecting a celda model with specific values of L and K, we can then perform additional exploratory and downstream analyses to understand the biology of the transcriptional modules and cell populations. We can start by generating a dimension reduction plot with the Uniform Manifold Approximation and Projection (UMAP) method to visualize the relationships between the cells in a 2-D embedding. This can be done with function celdaUmap.

sce <- celdaUmap(sce, useAssay = useAssay, altExpName = altExpName)

Alternatively, a t-distributed stochastic neighbor embedding (t-SNE) can be generated using function celdaTsne. The UMAP and t-SNE plots generated by celdaUmap and celdaTsne are computed based on the module probabilities (analogous to using PCs from PCA). The calculated dimension reduction coordinates for the cells are stored under the reducedDim slot of the altExp slot in the original SCE object. The follow command lists the names of the dimensionality reductions that can be used in downstream plotting functions in the next few sections:

reducedDimNames(altExp(sce, altExpName))
## [1] "decontX_UMAP" "celda_UMAP"

Plotting cell population cluster labels

The function plotDimReduceCluster can be used to plot the cluster labels for cell populations identified by celda on the UMAP:

plotDimReduceCluster(sce, reducedDimName = "celda_UMAP", labelClusters = TRUE)

Plotting expression of specific features

Usually, biological features of some cell populations are known a priori and can be identified with known marker genes. The expression of selected marker genes can be plotted on the UMAP with the function plotDimReduceFeature.

markers <- c("CD3D", "IL7R", "CD4", "CD8B", "CD19", "FCGR3A", "CD14", "FCER1A", "PF4")

plotDimReduceFeature(x = sce, features = markers, reducedDimName = "celda_UMAP", useAssay = useAssay, altExpName = altExpName, normalize = TRUE)

The parameter displayName can be used to switch between IDs stored in the rownames of the SCE and columns of the rowData of the SCE. If the assay denoted by useAssay is a raw counts matrix, then setting normalize = TRUE is recommended (otherwise the z-score of the raw counts will be plotted). When set to TRUE, each count will be normalized by dividing by the total number of counts in each cell. An alternative approach is to perform normalization with another method and then point to the normalized assay with the useAssay parameter. For example, normalization can be performed with the scater package:

library(scater)
sce <- logNormCounts(sce, exprs_values = useAssay, name = "logcounts")
plotDimReduceFeature(x = sce, features = markers, reducedDimName = "celda_UMAP", useAssay = "logcounts", altExpName = altExpName, normalize = FALSE)

This second approach may be faster if plotting a lot of marker genes or if the dataset is relatively large.

Plotting cell subpopulations with labels

Once we identify of various cell subpopulations using the known marker genes, these custom labels can be added on the UMAP colored by cluster:

g <- plotDimReduceCluster(sce, reducedDimName = "celda_UMAP", altExpName = altExpName, labelClusters = TRUE)

labels <- c("1: Megakaryocytes",
    "2: CD14+ Monocytes 1",
    "3: CD14+ Monocytes 2",
    "4: FCGR3A (CD16+) Monocytes",
    "5: CD14+ Monocytes 3",
    "6: CD8+ Cytotoxic T-cells",
    "7: CD4+ T-cells",
    "8: CD8+ Cytotoxic T-cells",
    "9: B-cells",
    "10: Naive CD8+ T-cells",
    "11: Naive CD4+ T-cells",
    "12: NK-cells",
    "13: Unknown T-cells",
    "14: Dendritic cells")

library(ggplot2)
g <- g + scale_color_manual(labels = labels,
    values = distinctColors(length(labels)))
print(g)

Exploring relationship between modules and cell populations

Celda has the ability to identify modules of co-expressed features and quantify the probability of these modules in each cell population. An overview of the relationships between modules and cell subpopulations can be explored with the function celdaProbabilityMap. The “Absolute probability” heatmap on the left shows the proportion of counts in each module for each cell population. The “Absolute probability” map gives insights into the absolute abundance of a module within a given cell subpopulation. The absolute heatmap can be used to explore which modules are higher than other modules within a cell population. The “Relative expression” map shows the standardized (z-scored) module probabilities across cell subpopulations. The relative heatmap can be used to explore which modules are relatively higher than other modules across cell populations.

celdaProbabilityMap(sce, useAssay = useAssay, altExpName = altExpName)

In this plot, we can see a variety of patterns. Modules 15 - 20 are highly expressed across most cell populations indicating that they may contain housekeeping genes (e.g. ribosomal). Other modules are specific to a cell population or groups of cell populations. For example, module 35 is only on in population 1 while module 70 is expressed across populations 2, 3, and to some degree in population 5. The unknown T-cell population 13 has highly specific levels of modules 30. In the next section, we can look at the genes in these modules to gain insights into the biological properties of each of these cell populations.

Exploring feature modules

The primary advantage of celda over other tools is that it can cluster features that are co-expressed across cells into modules. These modules are often more biologically coherent than features correlated with principal components from PCA. Below are several ways in which modules can be explored and visualized.

Table of features in each module

The function featureModuleTable can be used to get the names of all features in each module into a data.frame.

# Save to a data.frame
ta <- featureModuleTable(sce, useAssay = useAssay, altExpName = altExpName)
dim(ta)
## [1] 154  80
head(ta[,"L70"])
## [1] "S100A9"   "S100A8"   "S100A12"  "RBP7"     "FOLR3"    "C19orf59"

The parameter displayName can be used to switch between IDs stored in the rownames of the SCE and columns of the rowData of the SCE. The the outputFile parameter is set, the table will be saved to a tab-delimited text file instead of to a data.frame:

# Save to file called "modules.txt"
featureModuleTable(sce, useAssay = useAssay, altExpName = altExpName, outputFile = "modules.txt")

The modules for this model are shown below:

L1 L2 L3 L4 L5 L6 L7 L8 L9 L10 L11 L12 L13 L14 L15 L16 L17 L18 L19 L20 L21 L22 L23 L24 L25 L26 L27 L28 L29 L30 L31 L32 L33 L34 L35 L36 L37 L38 L39 L40 L41 L42 L43 L44 L45 L46 L47 L48 L49 L50 L51 L52 L53 L54 L55 L56 L57 L58 L59 L60 L61 L62 L63 L64 L65 L66 L67 L68 L69 L70 L71 L72 L73 L74 L75 L76 L77 L78 L79 L80
CCL3 GNLY CTSW NKG7 RPS19 MT-CO2 MT-CO3 DDX5 RPL28 RPL18A FOS EIF1 JUNB GIMAP7 RPL13A RPS6 RPS2 RPL10 RPL13 RPS14 RPSA RPS27 LTB PTPRCAP MALAT1 LDHB IL32 CD79B CD37 TUBA1B GAPDH PPIA ACTG1 CCL5 PPBP RGS10 OAZ1 TAGLN2 MT-ND1 MT-CO1 ARPC3 SH3BGRL3 CYBA PTMA TMSB10 LAPTM5 ARHGDIB HLA-B CFL1 SRGN ACTB TMSB4X C9orf142 ANXA1 UBB B2M MYL12A HLA-A FCGR3A IFITM2 FAM26F FCER1G AIF1 FTH1 FCER1A HLA-DQA1 HLA-DPB1 CD74 HLA-DRA S100A9 LYZ CST3 VIM NEAT1 S100A4 GSTP1 LGALS1 GABARAP TYROBP FTL
IGFBP7 GZMB CD247 GZMA NACA CD52 MT-ND4 TSC22D3 RPS9 RPL12 FXYD5 H3F3B TMEM66 GIMAP4 RPS18 RPS3 RPL19 RPL11 RPL32 EEF1A1 JUN RPL21 MYC CXCR4 MYLIP IL7R CD3D CD79A SNHG7 HMGB2 EIF4A1 HNRNPA2B1 CORO1A GZMK PF4 TUBA4A FKBP1A GDI2 PFDN5 LSP1 YBX1 SERF2 CLIC1 HNRNPA1 EIF3K SNX3 UBC SRP14 PSMB9 ITGB2 PFN1 GMFG APOBEC3G HCST RAC2 HLA-C HSPA8 CALM1 RHOC CTSC NCF1 FGR LST1 COTL1 CLEC10A HLA-DQB1 HLA-DPA1 IRF8 HLA-DMA S100A8 LGALS2 CFP S100A10 ISG15 S100A6 GPX1 TYMP TSPO FCN1 CTSS
HAVCR2 FGFBP2 GZMM CST7 NAP1L1 PPDPF MT-CYB TXNIP FAU RPL8 CD48 DUSP1 ZFP36L2 FYB RPS8 RPS12 RPLP1 RPL6 RPLP2 RPS4X NPM1 RPS3A SIT1 ISG20 ATM NOSIP CD3E MS4A1 SNX2 EIF1AY SLC25A5 HMGB1 CHCHD2 LAG3 HIST1H2AC CDC42SE2 TALDO1 ATP5C1 SLC25A6 ATP6V0E1 LY6E ARPC1B SUPT4H1 RPL36AL ATP5E UQCRH MYL12B PSME1 PPP1CA CD63 MYL6 CAPZB CDC37 ID2 UCP2 HLA-E EVL CD99 CDKN1C MYO1G LYN CD86 IFITM3 SAT1 ENHO HLA-DQA2 HLA-DRB1 LAT2 LY86 S100A12 MS4A6A CPVL NFKBIA ANXA2 S100A11 AP1S2 LGALS3 RAC1 NCF2 NPC2
CCL4L1 CCL4 LYAR PRF1 SOD1 ATP6V1G1 MT-ND2 CIRBP UBA52 RPL29 APRT ITM2B TMEM123 GIMAP1 RPL10A RPL3 RPL15 RPL26 RPS16 RPL27A RPSAP58 RPS27A BIRC3 CD69 ANKRD44 GIMAP5 CD7 TCL1A PRKCB MANF TPI1 HSP90AA1 ENO1 SDPR FERMT3 H3F3A PRDX6 ATP5G2 BRK1 RHOA ALDOA IFI35 COX7C ATP5L GABARAPL2 YWHAB RBM3 PSMB8 CTSD ARPC2 TUBA1A ARGLU1 RNF181 CD53 ARL6IP5 IFITM1 SEPT7 CKB ABI3 POU2F2 CD300C CFD PSAP DNASE1L3 CD1C HLA-DRB5 EAF2 HLA-DMB RBP7 CD14 IGSF6 AMICA1 PRELID1 GRN TNFSF13B PYCARD CDA BRI3
SPON2 HOPX GZMH CYTIP PRR13 MT-ATP6 LIMD2 RPS24 GNB2L1 GSTK1 KLF6 BTG2 CITED2 RPS5 RPS15A RPS15 RPL14 RPS28 TPT1 HINT1 RPL9 RIC3 STK17A CARS C12orf57 CD2 IGLL5 ARL4A HMGA1 PKM HMGN1 SLC25A3 TSC22D1 DNAJB6 GSTO1 DNAJC8 ZFP36 C11orf31 CALM2 LAMTOR4 IRF7 CNBP PSMA7 POLD4 OST4 C19orf43 TPM3 MYO1F EMP3 PGK1 TMCO1 CKLF PLAC8 BIN2 AES PDIA3 LYPD2 ATP1B3 SCPEP1 FAM49A SERPINA1 TIMP1 SERPINF1 FCGR2B PLD4 MEF2C RNASE6 FOLR3 ALDH2 RAB32 MYADM IFI6 CEBPD RNF130 TKT SLC7A7 CTSB
CLIC3 C12orf75 KLRD1 CLNS1A SYF2 SSR2 KLF2 COX4I1 RPS11 CD44 IER2 NDFIP1 SEPW1 RPS23 RPL23A RPS7 RPL27 RPL36 RPL7 EIF4A2 BTG1 STMN3 ACAP1 DNAJB1 LCK IGJ HVCN1 STMN1 COX8A SRSF7 LDHA NRGN LIMS1 SOD2 YWHAH SERP1 COPE RNASET2 COX5B MIR142 TMA7 COX6B1 SEP15 NDUFA4 SUMO2 COX6A1 FLNA PSME2 ARF1 ADRM1 IL10RA CD164 RAP1B RARRES3 LITAF VMO1 SPN SYNGR2 ARRB1 CD68 CEBPB LILRA4 PPP1R14A CCDC50 GAPT C19orf59 IL8 CD33 PPT1 C1orf162 FCGRT STX11 AP2S1 FPR1 BLVRA
XCL2 CD8A CD160 CCT7 EVI2B TMEM14B HNRNPDL BTF3 RPL23 PPP1R15A GPSM3 RWDD1 GIMAP2 RPL18 RPL30 RPL7A EEF1D RPL22 RPL35A SNRPD2 GLTSCR2 CRIP2 RHOH PRDX2 LAT LINC00926 SYPL1 PRDX4 PSMB3 RAN PPIB TUBB1 MAX POLE4 MRPL14 GPX4 COX5A DRAP1 TCEB2 LAP3 EIF3F VAMP8 LSM7 SKP1 EIF3G GUK1 GLIPR2 ARPC5 LRRFIP1 PPP2CA CMTM3 PPP1R18 HLA-F TRAF3IP3 XBP1 ADA LYST TCF7L2 SPI1 STXBP2 PHACTR1 CSF3R CD302 CORO1B RHOG CTSH MNDA ATP6V0B MAFB RHOB
TTC38 KLRG1 FCRL6 ATP1A1 YPEL5 C19orf24 ANAPC16 PABPC1 EEF2 PLP2 EDF1 SOCS3 DGCR6L RPL5 RPS25 RPL35 RPL24 C6orf48 RPL34 SELL FOXP1 TNFRSF4 BIN1 CD27 CD3G BANK1 ADK CHTF8 GNB2 HNRNPK SRSF3 GNG11 AP3S1 RAB10 SPINT2 ATP5D WDR83OS SH3BGRL BST2 TMEM179B UBE2D3 ARL6IP4 RASGRP2 COX6C TMBIM6 TRAPPC1 EFHD2 ATP5B CALM3 PDHB FYN SCP2 DHRS7 IL2RG ANXA6 SH3BP1 LYL1 IFI30 ASAH1 PLBD1 LILRB4 TMEM14C MT2A BLVRB FGL2 RGS2 FCGR2A PLAUR
AKR1C3 ZAP70 XCL1 ARID4B DAZAP2 MPC2 VAMP2 EIF3H NBEAL1 KHDRBS1 UQCR11 TRADD TMEM173 RPS10 RPL31 RPL4 RPL38 C21orf33 RPL17 PEBP1 RP11-796E2.4 RP11-706O15.1 OCIAD2 LEPROTL1 CD8B VPREB3 SMIM14 SP140 UQCRFS1 MIF PARK7 RGS18 CTSA PTPN18 H2AFJ HIGD2A FIS1 BLOC1S1 SF3B5 PLSCR1 ERP29 NEDD8 MX1 PTPRC SAP18 CAP1 PLEK MSN FKBP8 TPST2 ETHE1 TMEM9B TMEM50A CCND3 YWHAQ GUSB STX7 APOBEC3A BID ASGR1 RAB31 DHRS4L2 CSTB NUP214 KLF4 LINC00936 TMEM176B CTSZ
PRSS23 APMAP KLRC1 THYN1 SP110 TMED4 UBE2D2 PNRC1 EIF3L SURF1 SSR4 CHURC1 GBP1 RPS13 RPS20 RPL37A RPS4Y1 C1orf228 RPL36A CMPK1 KCNQ1OT1 KRT1 FAIM3 PIK3IP1 OPTN FCER2 MYCBP2 HIST1H4C ANAPC11 SRSF2 PSMB1 CLU NCOA4 GLUL TPM4 FOSB PSMB7 RNH1 GLRX TMEM205 GTF3A NDUFA13 MAF1 PTGES3 JTB LCP1 CD300A HN1 ATRAID PPP6C PTGER2 ZNF207 SIGIRR SRP9 ALOX5AP FAM110A YBX3 MS4A7 WARS TNFAIP2 CD4 C10orf54 IL1B ODF3B CNPY3 VSTM1 TBXAS1
GPR56 SAMD3 RP11-347P5.1 CCDC12 MRPL21 UXT UBXN1 ZFAS1 STAT1 MT-ND5 STAT3 DNAJA2 RPLP0 RPS29 RPL37 TOMM7 COMMD6 PPA1 LBH RGCC ITM2A PDLIM1 CYB561A3 PCNA PGAM1 HNRNPC H2AFZ CD9 RHEB RNPEP COMMD3 NDUFA11 SRSF9 PTPN6 SERPINB1 TLN1 PCBP2 ATP5G3 RCSD1 ACTR3 EID1 HCLS1 AOAH DOK2 M6PR ATXN10 CDC42EP3 NDUFS2 SH3KBP1 FUS TMED9 ASB8 SNX10 HCK NINJ1 BST1 UBE2Q1 ANXA5 PGD SMCO4 CAPG RETN G0S2
HBA1 NCR3 TTC3 STUB1 TMEM165 PNISR RPL39 HPCAL1 PCBP1 TNFAIP3 SORL1 EEF1B2 RPS26 EIF3D SRSF5 ST13 ANKRD12 GPR183 TCF7 RORA P2RX5 MBD4 NME1 TIMM13 SPCS1 GHITM ACRBP ODC1 FAM45A ETFA MCL1 ARF5 ZNF706 ATP6V1F ANXA4 HSP90AB1 UBL5 DDT COX7A2 NDUFB8 ATP5F1 CX3CR1 PTP4A2 C1orf43 TMX2 GYG1 PSMC4 RASSF5 DUSP2 SUN2 UTRN NOTCH2NL RP11-290F20.3 CPPED1 FCGR1A H2AFY MSRB1 TGFBI LGALS9 MTMR11 C5AR1
PTGDS CHST12 RBM4 C19orf70 LYPLA1 RBMX ZFP36L1 RBMS1 ICAM3 UBAC2 PTGER4 RPS21 ZFAND1 MZT2B CCNI CCDC109B SLC2A3 RCN2 SPOCK2 SH2D1A BLK IFT57 SNRNP25 MOB1A C1QBP PRDX1 MMD RBBP6 MPP1 CIR1 LAMTOR1 PPP4C GNG5 UBE2L6 LAIR1 ATP5O COX7A2L HERPUD1 CIB1 C9orf16 VPS28 MIR4435-1HG CAPZA2 GBP2 RPL7L1 IFI44 SRP72 SYTL1 GYPC TAP1 TMEM18 BTK LILRA5 DUSP6 QPCT PGLS VCAN NUDT16 SAT2 CSTA MPEG1
PTPN7 DSCR3 TNFRSF14 PCSK7 SON APEX1 USP3 NDUFB11 HSPB1 MT1X SNHG8 FBL RPL41 AIMP1 ARHGAP15 PPM1K CCR7 ETS1 FCRLA PPAPDC1B AHCY CYC1 TRMT112 SNRPB CA2 AMD1 PLEKHO1 ADIPOR1 SCAND1 EIF5 VASP POLR2L DDAH2 HNRNPA0 CRIP1 PAPOLA TMEM59 UFC1 DBI ACTN4 RAB8A VPS29 CLEC2B SMIM12 ZFAND6 RASAL3 PPP2R5C BTN3A2 INSIG1 UNC93B1 PILRA TESC PID1 CARD16 ID1 PRAM1 IFNGR2 CYBB
TIGIT BAZ1A POLR2I TBC1D10C EIF2S3 ACTR1B CHMP4A MRPS33 RCBTB2 EEF1G EBPL CUTA TNFAIP8 ARID5B AQP3 CDC25B MZB1 NAT9 MCM5 SNF8 ERH UBE2I PTCRA GRAP2 MTHFD2 FDFT1 GNAS LAMTOR5 RBX1 SEC11A PARP14 ANP32B ATP5H RTFDC1 HNRNPF ARF6 DYNLL1 ASCL2 IDH2 MKRN1 EMG1 FLOT1 PMAIP1 MAEA DDIT4 PRMT2 CUX1 SCIMP LRRC25 SLC16A3 CXCL2 CASP1 CD1D APLP2 SLC11A1
PRR5 MRPL18 ARF4 FAM65B MED30 SSU72 RP11-51J9.5 HNRNPH3 EIF3E PCNP PPP3CC FLT3LG BCL11B CD72 RBM5 FABP5 FIBP CNN2 SEC61B SPARC RSU1 SNN GRSF1 HSD17B11 REEP5 RGS19 CASP4 LMO4 ATP5A1 NDUFB10 LSM10 C11orf58 HNRNPM PSMB10 MPST CLTB MYH9 DPM1 PSTPIP1 RAB11B RAB37 TERF2IP BUB3 UBLCP1 ALOX5 LILRB2 CTSL EREG ARRB2 NCOR2 JUND CLEC7A
KLRB1 SMIM7 N4BP2L1 PPP1R2 RP11-349A22.5 RAP1A IL27RA EPB41L4A-AS1 RSL1D1 MZT2A LY9 TRAT1 GATA3 TSPAN13 TPD52 PTTG1 TMEM208 DAD1 NDUFA1 MYL9 NT5C3A TAX1BP3 ILK CHMP1B ATP5EP2 RTN4 NOP10 SKAP2 NDUFB9 UQCR10 RPL22L1 RBM39 HNRNPA3 PRDX5 TMEM140 PSMA4 NMI VPS25 LINC00152 NT5C SEPT1 HSPA5 CDIP1 HHEX C19orf38 NAGA TNFSF10 GSN C4orf48 SULT1A1
MATK TRIM22 GMPR2 NCOR1 KIAA0040 POLR1D PRMT10 IMPDH2 CCNL1 DDX24 RP1-313I6.12 PRKCQ-AS1 SIRPG HLA-DOB ITM2C MCM7 MX2 SERBP1 ABRACL GP9 R3HDM4 MFSD1 CNDP2 ERGIC3 POLR2J CHCHD10 C20orf24 EPN1 C19orf53 ATP5J2 ZC3H15 CDC42 EIF3I CSNK2B OASL MPC1 CRELD2 DNAJC1 ATF6B ARL4C WIPF1 PPM1N SYK SLC31A2 GNAI2 RASSF4 AGTRAP GNS
IL2RB GRK6 C10orf32 SP100 IFI44L UBE2K TAF1D NSA2 TSTD1 PITPNA-AS1 LEF1 GPR171 SPIB CXXC5 FEN1 TXNDC17 TUBB MDH2 F13A1 NGFRAP1 MARCH2 HIST1H2BK IDH3G SUMO3 NDUFB5 TMEM219 U2AF1 MRPL23 LSM2 YWHAZ RABAC1 NDUFB2 UBE2B MOB2 SMARCA4 FDPS TECR ARPC5L CD83 HMOX1 DYNLT1 ENTPD1 GCA ADAP2
S100B C1orf63 MESDC2 MEAF6 IGBP1 LGALS3BP ZNF331 ILF3-AS1 SBDS NUP54 MAL AC092580.4 PKIG MAP3K8 MAD2L1 TIMM8B SUB1 PSMB6 TMEM40 CD82 PARVB THOC6 MT-ND3 DBNL OAS1 CLTA ISCU ATP5I EIF1B C4orf3 ENSA RAB7A RNF149 CMC1 CTD-2035E11.3 GNG2 SEPT6 LAMP1 IFIT2 LILRA3 NAAA OSM C20orf27
CD320 PPCS POLR3K RNPS1 METTL9 CAMK2G YME1L1 PSIP1 CHI3L2 SUSD3 BLNK IL4R TYMS MRPL28 C14orf166 ATP5J TREML1 PTTG1IP CORO1C POP7 OXA1L SNX17 ATG3 NDUFS7 MORF4L1 NDUFA2 EZR SEPT9 TMEM258 FAM49B SQRDL RAB9A RAB27A PHF14 PIM1 ARL6IP1 SIGLEC10 CSF1R EPSTI1 SULF2 SCO2
SH2D2A ADAR TLE4 KRT10 MRFAP1 TMC8 DDX18 AAK1 KIAA0125 SWAP70 EZH2 SPCS3 CYCS UQCRQ ITGA2B ACTN1 H1F0 LMNA TMEM147 EIF6 CD55 AHNAK NDUFS5 ALKBH7 DECR1 PAIP2 RBM8A OSTF1 UPP1 MRPL19 RRAGC TMEM109 ADD3 MRPS6 RELT APOBEC3B OAZ2 MGST2 NAPRT1
PLEKHF1 DNAJC15 MORF4L2 EIF4B NDUFA3 ADSL NDNL2 LDLRAP1 POU2AF1 NXT2 KIAA0101 EIF3A XRCC6 CALR CMTM5 HIST1H1C SOX4 SLC39A3 C1orf86 XAF1 MARCKSL1 ZNHIT1 NHP2L1 RNF7 MRPL43 DNAJA1 AP2M1 ARPC4 MGAT1 DHX36 FAM105A MAPK1IP1L HDAC1 LSM14A KYNU GPBAR1 HSBP1 IER3 EIF4EBP1
ZNHIT3 SF3A1 ARHGEF1 CMTM7 CCT2 TTC39C AL928768.3 HELQ RRM2 ISOC2 EIF4G2 PSMD8 CLDN5 RUFY1 DAPP1 C14orf2 WAS ENY2 VAMP5 SMDT1 MRPL54 RNASEH2B VAPA IRF1 ARHGDIA MMP24-AS1 GFER CYB561D2 ANXA2R RNF167 PIK3AP1 RXRA ATP6V0D1 GMPR NAGK
PRKAR1A SRSF11 SQSTM1 FAM107B MGAT4A RP5-887A10.1 CENPN GGCT ATPIF1 AURKAIP1 ERV3-1 RIOK3 PQBP1 PFDN2 COX17 SDCBP CCDC85B VDAC1 YPEL3 C9orf78 TMEM230 TBCB ANAPC13 CXXC1 CELF1 HMOX2 C19orf66 KIAA0930 OSCAR EIF4E2 SRA1
APBB1IP TAF7 DPP7 FGFR1OP2 OXNAD1 TNFRSF13B GMNN YIF1B SPCS2 MTDH TUBA1C ZNF263 DNAJC7 ACTR2 LYSMD2 SAMHD1 EIF3M HADHA CCT6A DYNLRB1 KRTCAP2 BAX PLIN2 STX18 DTNBP1 RPA2 SLC9A3R1 HES1 LILRB1 ZYX MIR24-2
IKZF1 KMT2E FNBP1 PIM2 NUCB2 TNFRSF17 TK1 HAUS4 PSMC5 PSMA5 BBC3 DERA BNIP3L ERP44 GRB2 SDHB CSDE1 PSMG2 DRAM2 SELK PDCD6 SELT IFIT1 PGM1 BAZ2A NDUFA5 TSC22D4 HES4 TNFRSF1B NR4A1
SNAP23 TAPSAR1 MGST3 DNAJB9 RGL4 GINS2 HDGF RANBP1 H2AFV PLA2G12A PICALM NENF TWF2 LSM6 LAMTOR2 SF1 ETFB CAPZA1 PYURF VDAC2 TMBIM4 ARRDC1 DYNLL2 HSH2D DENND2D BZW1 CAMK1 CAPNS1 CECR1
BEST1 TCEA1 NUCB1 DEGS1 CD40LG ZWINT PGP DUT TOMM22 PGRMC1 APP AKR1A1 EMC7 ZFAND5 NDUFS6 TRA2B COX14 SMAP2 S1PR4 FAM96B PSMD4 IFI27 APOBEC3C TBCC DEF6 CD300E UBE2D1 MIDN
TAOK3 UBXN4 TGOLN2 XXbac-BPG299F13.17 TRABD2A BIK REEP3 TBCA NDUFC2 FHL1 MID1IP1 NDUFS3 CHMP4B CD97 NDUFV2 PSMB4 PLEKHJ1 BCAP31 LMAN2 WDR1 RSAD2 ZBP1 PBXIP1 C5orf56 CD300LF THEMIS2 GRINA
CARD8 MED10 PTPN2 PARP1 CD6 CCNA2 LMNB1 HSP90B1 LSM4 SLC40A1 HADHB BAG1 SMS TGFB1 UQCRC2 PSMD9 TMEM256 ICAM2 TAPBP NDUFB7 ATP5SL ABHD14B RSRC2 ARHGEF40 NANS PTPRE
KLF3 SEC62 EVI2A TOB1 SH3YL1 BIRC5 ECH1 PDIA6 PNMA1 TREX1 LRPAP1 CAT GLIPR1 EIF2S2 ADI1 BLOC1S2 TMED2 SUMO1 RAB5C COMMD10 RGS1 HDDC2 CXCL16 ATOX1 CARS2
LIPA CCT4 TNIP1 SVIP CAMK4 XRCC5 TUFM TPM1 PARL SHKBP1 LTA4H RILPL2 MINOS1 NAA10 WBP2 SSBP1 GADD45B CHMP2A CD38 IFRD1 SMARCE1 TPPP3 MAPKAPK3 DNTTIP1
ZNF394 KIF5B EMC10 TMEM261 SATB1 HNRNPU P4HB CCDC69 TRAPPC2L FAM32A SSR3 MRPL20 TXN2 EIF5B ANXA11 GADD45GIP1 SRI ITGB7 DDX6 LILRA2 CBR1 MYO9B
CLK3 STK4 MTIF3 BEX4 FHIT NCL DEK HTATIP2 C7orf50 PMVK UQCRC1 TOMM20 MRPL41 MMADHC CD47 ACP1 SIVA1 DNAJC19 SDF4 EMR2 TNFRSF1A UBE2R2
SNX9 POLR3GL FRG1 CYLD USP10 NHP2 UBE2L3 ARHGAP4 BNIP2 YWHAE SERPINB6 C19orf60 NDUFA7 SLTM NDUFB4 PSMA1 RALY RNF139 DERL1 MS4A4A ADRBK1 LACTB
OCIAD1 CSRNP1 ASF1A UXS1 PA2G4 C19orf10 MRPL40 MRPS23 PARVG LSMD1 IDS RNF187 HAGH NDUFA12 RPS19BP1 BSG FKBP11 PRPF31 CTD-2006K23.1 TCIRG1 CDKN1A
MPHOSPH8 SLC38A1 CISH NOL7 MYEOV2 MTHFS FAM173A ACAA1 PHPT1 ATF4 OLA1 COQ7 PSMA2 RPN2 TCF25 SKAP1 SPSB3 C1QA NAMPT CREG1
CRBN CDKN1B PASK MDH1 PSMB2 DNAJC4 HBP1 NFKBIZ NDUFB1 DGUOK MRPL52 FBXW5 SSB BANF1 POLR2G NAP1L4 EBP ZNF703 ZNF106 FUOM
TMEM243 LSM5 TNFRSF25 CCT8 MRPL51 PRPF8 TIMMDC1 VMP1 SPG21 MRPS21 MPG PNKD CELF2 HMGN3 NDUFS8 YTHDF2 GCHFR CEBPA AP2A1 FBP1
PPIG G3BP1 CCDC104 HSPE1 TXN ZNF581 ABTB1 CYTH4 IFI27L2 CAMLG NDUFA9 TINF2 RPS27L KARS BUD31 STT3B MFSD10 ALDH3B1 C11orf21 PDXK
MED4 MPHOSPH10 INTS12 PHB SNRPD1 ACAP2 ZNF511 MTPN VIMP PPP1CC EMC6 SMARCB1 NUDC SHISA5 RNF213 REXO2 UBA2 C1QB NRROS PLIN3
UGP2 THAP7 NELL2 CCT3 SNRPC RPL26L1 TPP1 MYD88 COMT NDUFAF3 MVP SET CAPN2 IMP3 ATP6AP2 RBL2 ALDH9A1 HCAR3 MANBA ATF3
DCK ID3 AKTIP EWSR1 MRPS34 TRMT1 IFNGR1 AKIRIN2 MAP2K3 ANXA7 NDUFC1 LRCH4 IK C16orf13 MAP1LC3B NOP58 ORAI1 CXCL3 MBOAT7
WTAP SLC25A45 LINC00176 HNRNPR PSMD7 FAM195A DOCK2 COMMD9 CHCHD5 HAX1 COX7B SIAH2 EIF4H CDC42SE1 RER1 CD96 SURF4 PRKCD
ORMDL1 RP11-489E7.4 CD28 CBX3 ILF2 FAM192A IFIT3 STX10 MBNL1 MRPS16 UBE2J1 HMHA1 SPAG7 FMNL1 SH3BP5 B3GAT3 SGK1
CCDC107 GPATCH4 SCGB3A1 CACYBP CCT5 CINP ERICH1 HM13 NR4A2 ZNRD1 C12orf10 JAK1 SDHC DCTN3 EMB PLGRKT RAB34
TAGAP TNRC6C OSTC ATP5G1 ARPC1A VKORC1 MRP63 MKKS SMCHD1 TRAM1 EMC4 PCMT1 PRKCH GBP5
PDCD4 RP11-291B21.2 THOC7 SHFM1 RFXANK ECHDC1 SF3B2 STRA13 TANK COPS6 APH1A CDKN2D NUDT16L1 PRKD2
TCP1 HAPLN3 PRMT1 ANP32A PCGF5 SLA RAD23A GPI STX5 RAB2A ARL5A RBCK1 ODF2L TRPV2
CCNG1 HSPD1 NDUFAB1 SNX5 SNAPIN TRAPPC6A ANAPC15 THRAP3 DSTN SF3B14 LAPTM4A C14orf1 STAMBP
NONO MATR3 NUTF2 ERCC1 RTN3 PHB2 KXD1 FAM96A CCDC115 PIN1 CSK CYB5B MLLT11
IL16 EIF5A HSD17B10 NCKAP1L PEPD RSL24D1 MRPL16 ISCA2 RHOF SRRM2 CHMP5 PSMD5-AS1 SYNE1
LUC7L3 PDCD5 POLR2E IL10RB STK38 STK17B SLIRP AAMP EAPP CFLAR SSNA1 ORMDL3 TFDP2
FNTA UBE2N PPP1R7 PFKL C9orf89 LINC00493 UQCRB CHPT1 COPS5 CCNDBP1 WASF2 CCDC167
N4BP2L2 MAGOH SEC11C UNC119 CDV3 GGNBP2 CTNNBL1 REL SNX6 COPZ1 KDELR2 GIMAP6
TPR PPHLN1 SNRPD3 ATP2B1 HEXB NSMCE1 FAM50A ITGAE SARS TMED10 COX6A1P2 PYCR2
G3BP2 VDAC3 PSMC3 RABGAP1L ZNF524 CWC15 KDELR1 USF2 FBXO7 DDOST SELPLG RNF115
ELOVL5 NUCKS1 OTUB1 NFYC VMA21 PDCD2 IRF2BP2 ITGA4 GNAI3 SDHD WSB1 PTPN4
SCAF11 AIP MRPS7 MTSS1 PLD3 MRPS18B PSMD11 NSFL1C CCM2 MRPL34 C7orf73 SYNE2
PRPF38B RBM17 PSMA3 RALB ATP6V1B2 PRRC2C HARS GRPEL1 SNHG15 TMEM160 CMTM6 TMEM87A
SLC3A2 GTF2A2 CDK2AP2 DPEP2 CHIC2 CNPY2 SEC13 SAMD9L ABT1 ZMAT2 SUCLG1 RBM38
DPY30 STRAP SRM SNAP29 OAS3 RP11-1143G9.4 FAM89B CCS RNF5 IFI16 PRDX3 THAP11
CYTH1 SNRPE UFD1L GCH1 CEBPG NECAP2 NIT2 ACO2 FAM162A TRAPPC3 MIEN1 OBFC1
CCDC59 EPC1 POMP GINM1 CBWD1 DDX17 FLYWCH2 MRPL12 CCNH C17orf62 RNASEH2C CD59
WHSC1L1 RNF126 MRPL11 RIPK2 TXNL1 PTOV1 STARD7 KTN1 PDAP1 SH3GLB1 CCND2
SF3B1 ADH5 GTF3C6 UTP6 H1FX FH TRIM38 LCP2 RNF166 AKAP13 PHACTR4
RBM23 PPM1G SDF2L1 PACSIN2 MED28 TRA2A BCL7B TXNDC12 ELOF1 GOLT1B
BBX SNRPF PSMC2 LINC01003 SRRM1 COMMD5 GLUD1 IDH3B NDUFA6 MTFP1
NAA38 WBSCR22 NDUFB3 RBM25 CKS1B ELF1 MRPL55 ATP6V1E1 CXCR3
IRF9 MAPRE1 TUBB4B DCXR DUSP22 COMMD8 POLR2F FKBP2 GALM
MAT2B METTL23 XRN2 SFPQ PTPN1 DCTN2 ARHGAP30 CAST ACD
GPBP1 SNRPA1 CARHSP1 CHMP3 SRSF6 TMEM126B DDX46 SFT2D1 TNFRSF18
CHD2 AKR1B1 C11orf48 DAP3 TEN1 NME4 COMMD7 CISD3
EGLN2 PPP1R12A NDUFS4 MRPS15 COMMD4 CCZ1 PPP1R11 NDUFB6
ARL2BP STOML2 TCEB1 EIF2A TSSC1 RPF1 AUP1 UBE2F
RBBP7 MRPL9 IMP4 MFNG CWC25 YY1 PPP2R1A PSENEN
RPAIN MFF MRPS18C PSMF1 CEPT1 RAB11A PSMD13 FAM204A
NKTR PITHD1 SEC61G PHF5A CHD9 KPNB1 PET100 SCAMP2
PNN ANAPC5 ZDHHC12 TAF9 CD81 PDCL3 ITGB1BP1 NAPA
DARS CDC123 PTBP1 NDUFA10 RFC2 GLG1 TEX264 OS9
NCBP2 SNRPG HNRNPD SNW1 ACP5 METTL5 CSNK1A1 ASNA1
AATF LYRM4 NUDT1 FDX1 UROS LSM1 MLX
RSF1 GLRX3 PAK2 ECHS1 MAGED2 GSDMD MEA1
CHCHD7 SRP19 SEPT2 MT-ND4L ZCRB1 RTF1 UBE2A
AL592183.1 C17orf89 AK2 DHPS CCDC53 NME3 EMC3
BTF3L4 SNRNP70 RPS6KB2 PMF1 APOL3 TAF12 STK10
FLI1 PSMD6 TIMM17A SUCLG2 SHARPIN PUF60 TPGS1
NR3C1 NUDT5 DCTPP1 PRKCSH PPA2 IAH1 TMEM141
ITSN2 DDX39A HMGN2 BABAM1 ZFAND2B PDLIM2 PKN1
ZC3HAV1 HSPA9 MTCH2 PABPC4 SLFN5 MGMT UBE2E3
MTERFD2 VCP MRPL47 ATG12 CCDC90B MRPS12 CCDC124
GRAMD1A SRSF1 UQCC2 NXT1 TMBIM1 BRD2 COA3
ZNF24 EIF4A3 MRPL36 PPIE TSNAX CANX RPN1
USE1 CLPP NKAP PIH1D1 SEPHS2 NAA20 RAB4A
USP16 TTC1 DAZAP1 UBE2E1 SLC44A2 NOP56 TADA3
ARFGAP2 IDI1 MRPL15 QARS ASCC2 FUNDC2 MRPS11
FRG1B MRPS26 TIMM10 GID8 BFAR MTCH1 DNAJC2
TIMM9 HNRNPUL1 DTYMK MRPL3 CCDC25 TXNL4A FKBP5
MARCH7 HINT2 TOP1 USP15 NRBP1 PTRHD1 RRP7A
LENG1 SLBP PSMG3 MRPL32 PHF3 SLC25A11 DNAJB11
CYB5A PSMD14 DLD MAPKAPK5-AS1 SYS1 COX16 YIPF3
SMIM19 FKBP3 DCPS PTP4A1 PPM1B NDUFV1 DIAPH1
LINC-PINT SDHA TFDP1 EIF1AX YAF2 ARHGAP9 HNRNPH2
OARD1 NCBP2-AS2 RP11-139H15.1 LNPEP LSM3 PFDN6
TTC14 SNRNP40 FXR1 PEX16 GPAA1
TMEM242 RAD21 EIF2B1 SRSF4 MAPK1
DDIT3 VBP1 ESD PHF11 ACAA2
SAFB2 C14orf119 SF3A3 VTI1B PANK2
CEBPZ-AS1 FAM177A1 MCTS1 AKR7A2 IFNAR2
BRD9 HPRT1 CCNC URM1 MAD2L2
PRDM2 SLC25A39 COA6 AHSA1 CCDC88C
CD84 RPA3 PHYKPL JAGN1 RAB4B
EGR1 SNRPA CAMTA1 GRHPR G6PD
GLRX5 C19orf25 ROMO1 ARFGAP3
TIAL1 CHCHD1 PFDN1 BCCIP
EXOSC8 EIF4E STX8 RABL6
LBR C11orf83 RNPEPL1 SMC4
ILF3 SREK1 TMEM248 MVD
SAP30BP ANKRD11 GGA1 MRPS28
ACTR10 TMEM138 IFT20
NUDT21 PTGES2 AKAP9
UBE2G1 LGALS8 C18orf32
LARP7 MLEC BRMS1
PPIH C5orf15 CAPN1
EBNA1BP2 FEM1B DDRGK1
VOPP1 RAD23B
GNL3 WDR33
CISD2 WDR61
SSRP1 SUGT1
PDIA4 LAGE3
SYNCRIP RBBP4
C1orf35 ESYT1
MAP7D1 APOA1BP
DNAJC9 MIF4GD
HAUS1 CFDP1
ILKAP UBE2J2
RPUSD3 MRPS5
CDKN2AIPNL SRPK2
POLD2 FAM200B
HNRNPAB C17orf49
TMEM106C NUBP2
CBX1 PRKAG1
NAA50 SURF2
MCM3 SSSCA1
CISD1 EI24
TFPT CSNK1D
UBA5 DCTD
LMAN1 PEX2
PGRMC2 PNKP
C19orf48 TMEM70
C14orf142 JMJD6
PAICS DCAF5
RCE1

Module lookup

If you want to quickly find which module a particular feature was assigned to, the featureModuleLookup function can be used. Here will will look up a marker gene for T-cells called “CD3E”:

mod <- featureModuleLookup(sce, feature = c("CD3E", "S100A8"))
mod
##   CD3E S100A8 
##     27     70

Module heatmaps

The function moduleHeatmap can be used to view the expression of features across cells for a specific module. The featureModule parameter denotes the module(s) to be displayed. Cells are ordered from those with the lowest probability of the module on the left to the highest probability on the right. Similarly, features are ordered from those with the highest probability within the module on the top to the lowest probability on the bottom.

moduleHeatmap(sce, featureModule = 27, useAssay = useAssay, altExpName = altExpName)

The parameter topCells can be used to control the number of cells included in the heatmap. By default, only the 100 cells with the lowest probabilities and the 100 cells with the highest probabilities for each selected module are included (i.e. topCells = 100 by default). To display all cells, this parameter can be set to NULL:

moduleHeatmap(sce, featureModule = 27, topCells = NULL, useAssay = useAssay, altExpName = altExpName)

Note: Multiple modules can be displayed by giving a vector of module indices to the parameter featureModule. If featureModule is not specified, then all modules will be plotted.

Module probabilities on 2-D embeddings

The function plotDimReduceModule can be used visualize the probabilities of a particular module or sets of modules on a reduced dimensional plot such as a UMAP. This can be another quick method to see how modules are expressed across various cells in 2-D space. As an example, we can look at module 70 which contained S100A8:

plotDimReduceModule(sce, modules = 70, useAssay = useAssay, altExpName = altExpName, reducedDimName = "celda_UMAP")

Similarly, multiple modules can be plotting in a grid of UMAPs:

plotDimReduceModule(sce, modules = 70:78, useAssay = useAssay, altExpName = altExpName, reducedDimName = "celda_UMAP")

In this grid, we can see that module 70 (which has high levels of S100A8 and S100A9) is highly expressed in cell populations 2 and 3, module 71 (which contains CD14) can be used to identify all CD14+ monocytes, module 72 (which contains CST3) is expressed across both CD14 and FCGR3A (CD16) expressing monocytes, and module 73 (which contains CD4) is expressed broadly across both monocytes and dendritic cells as well as some T-cell populations. If we were interesting in defining transcriptional programs active across all monocytes, we could examine the genes found in module 72. If we were interested in defining transcriptional programs for all CD14+ monocytes, we could examine the genes in module 71. These patterns can also be observed in the Probability Map

In the celda probability map, we saw that the unknown T-cell population 13 had high levels of module 30. We can examine both module heatmaps and module probability maps to further explore this:

moduleHeatmap(sce, featureModule = 30, useAssay = useAssay, altExpName = altExpName)

plotDimReduceModule(sce, modules = 30, useAssay = useAssay, altExpName = altExpName, reducedDimName = "celda_UMAP")

Module 30 has high levels of genes associated with proliferation including HMGA1, STMN1, PCNA, HMGB2, and TUBA1B. We can therefore re-label these cells as “Proliferating T-cells”.

Identification and plotting of marker genes

In addition to examining modules, differential expression can be used to identify potential marker genes up-regulated in specific cell populations. The function findMarkerDiffExp in the singleCellTK package will find markers up-regulated in each cell population compared to all the others.

Differential expression to identify marker genes

# Normalize counts (if not performed previously)
library(scater)
sce <- logNormCounts(sce, exprs_values = useAssay, name = "logcounts")

# Run differential expression analysis
sce <- findMarkerDiffExp(sce, useAssay = "logcounts", method = "wilcox", cluster = celdaClusters(sce), minMeanExpr = 0, fdrThreshold = 0.05, log2fcThreshold = 0, minClustExprPerc = 0, maxCtrlExprPerc = 1)

The function plotMarkerDiffExp can be used to plot the results in a heatmap. The topN parameter will plot the top N ranked genes for each cluster.

# Plot differentially expressed genes that pass additional thresholds 'minClustExprPerc' and 'maxCtrlExprPerc'
plotMarkerDiffExp(sce, topN = 5, log2fcThreshold = 0, rowLabel = TRUE, fdrThreshold = 0.05, minClustExprPerc = 0.6, maxCtrlExprPerc = 0.4, minMeanExpr = 0)

Other parameters such as minClustExprPerc (the minimum number of cells expressing the marker gene in the cluster) and maxCtrlExprPerc (the maximum number of cells expression the marker gene in other clusters) can be used to control how specific each marker gene is to each cell populations. Similarly, adding a log2 fold-change cutoff (e.g. 1) can select for markers that are more strongly up-regulated in a cell population.

Violin plots for marker genes

The plotCeldaViolin function can be used to examine the distribution of expression of various features across cell population clusters derived from celda. Here we can see that the gene CD79A has high expression in the B-cell cluster and HMGB2 has high expression in the proliferating T-cell population.

# Normalize counts if not performed in previous steps
library(scater)
sce <- logNormCounts(sce, exprs_values = useAssay, name = "logcounts")

# Make violin plots for marker genes
plotCeldaViolin(sce, useAssay = "logcounts", features = c("CD79A", "HMGB2"))

Generating HTML reports

The celda package comes with two functions for generating comprehensive HTML reports that 1) capture the process of selecting K/L for a celda_CG model and 2) plot the results from the downstream analysis. The first report runs both recursiveSplitModule and recursiveSplitCell for selection of L and K, respectively. To recapitulate the complete analysis presented in this tutorial in the HTML report, the following command can be used:

sce <- reportCeldaCGRun(sce, sampleLabel = NULL, useAssay = useAssay, altExpName = altExpName, minCell = 3, minCount = 3, initialL = 10, maxL = 150, initialK = 3, maxK = 25, L = 80, K = 14)

All of the parameters in this function are the same that were used throughout this tutorial in the selectFeatures, recursiveSplitModule, and recursiveSplitCell functions. Note that this report does not do cell filtering, so that must be completed before running this function. The returned SCE object will have the celda_CG model with selected K and L which can be used in any of the downstream plotting functions as well as input into the second plotting report described next.

The second report takes in as input an SCE object with a fitted celda_CG model and systematically generates several plots that facilitate exploratory analysis including cell subpopulation cluster labels on 2-D embeddings, user-specified annotations on 2-D embeddings, module heatmaps, module probabilities, expression of marker genes on 2-D embeddings, and the celda probability map. The report can be generated with the following code:

reportCeldaCGPlotResults(sce, reducedDimName = "celda_UMAP", features = markers, useAssay = useAssay, altExpName = altExpName, cellAnnot = c("total", "detected", "decontX_contamination", "subsets_mito_percent"), cellAnnotLabel = "scDblFinder_class")

User-supplied annotations to plot on the 2-D embedding can be specified through the cellAnnot and cellAnnotLabel variables. Both parameters will allow for plotting of variables stored in the colData of the SCE on the 2-D embedding plot specified by reducedDimName parameter. For cellAnnot, integer and numeric variables will be plotted as as continuous variables while factors and characters will be plotted as categorical variables. For cellAnnotLabel, all variables will be coerced to a factor and the labels of the categories will be plotted on the scatter plot.

Other useful functions

Matrix factorization

The celda model factorizes the original matrix into three matrices:

1) module - The probability of each feature in each module (Psi)

2) cellPopulation - The probability of each module in each cell population (Phi)

3) sample - The probability of each cell population in each sample (Theta)

Additionally, we can calculate the probability of each module within each cell (cell). The cell matrix can essentially be used to replace PCs from PCA and is useful for downstream visualization (e.g. generating 2-D embeddings). All of these matrices can be retrieved with the factorizeMatrix function. The matrices are returned in three different versions: unnormalized counts, proportions (normalized by the total), or posterior estimates (where the Dirichlet concentration parameter is added in before normalization).

# Factorize the original counts matrix
fm <- factorizeMatrix(sce)

# Three different version of each matrix:
names(fm)
## [1] "counts"      "proportions" "posterior"
# Get normalized proportional matrices
dim(fm$proportions$cell) # Matrix of module probabilities for each cell
## [1]   80 2675
dim(fm$proportions$module) # Matrix of feature probabilities for each module
## [1] 2639   80
dim(fm$proportions$cellPopulation) # Matrix of module probabilities for each cell population
## [1] 80 14
dim(fm$proportions$sample) # Matrix of cell population probabilities in each sample
## [1] 14  1

Changing the feature display name

The parameter displayName can be used to change the labels of the rows from the rownames to a column in the rowData of the SCE object. The function is available in plotDimReduceFeature and moduleHeatmap. For example, if we did not change the rownames to Symbol_TENx in the beginning of the tutorial, the following code still could be run in moduleHeatmap to display the gene symbol even if the rownames were set to the original Ensembl IDs:

moduleHeatmap(sce, featureModule = 27, useAssay = useAssay, altExpName = altExpName, displayName = "Symbol_TENx")

Session information

sessionInfo()

## R version 4.0.4 (2021-02-15)
## Platform: x86_64-apple-darwin17.0 (64-bit)
## Running under: macOS Big Sur 10.16
## 
## Matrix products: default
## BLAS:   /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRblas.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRlapack.dylib
## 
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
## 
## attached base packages:
## [1] parallel  stats4    stats     graphics  grDevices utils     datasets 
## [8] methods   base     
## 
## other attached packages:
##  [1] scater_1.18.6               kableExtra_1.3.4           
##  [3] knitr_1.31                  ggplot2_3.3.5              
##  [5] celda_1.9.2                 singleCellTK_2.1.3         
##  [7] TENxPBMCData_1.8.0          HDF5Array_1.18.1           
##  [9] rhdf5_2.34.0                DelayedArray_0.16.2        
## [11] Matrix_1.3-2                SingleCellExperiment_1.12.0
## [13] SummarizedExperiment_1.20.0 Biobase_2.50.0             
## [15] GenomicRanges_1.42.0        GenomeInfoDb_1.26.4        
## [17] IRanges_2.24.1              S4Vectors_0.28.1           
## [19] BiocGenerics_0.36.0         MatrixGenerics_1.2.1       
## [21] matrixStats_0.58.0         
## 
## loaded via a namespace (and not attached):
##   [1] utf8_1.2.1                    reticulate_1.18              
##   [3] R.utils_2.10.1                tidyselect_1.1.0             
##   [5] RSQLite_2.2.4                 AnnotationDbi_1.52.0         
##   [7] grid_4.0.4                    combinat_0.0-8               
##   [9] BiocParallel_1.24.1           Rtsne_0.15                   
##  [11] scDblFinder_1.4.0             DropletUtils_1.10.3          
##  [13] munsell_0.5.0                 codetools_0.2-18             
##  [15] ragg_1.1.3                    statmod_1.4.35               
##  [17] scran_1.18.5                  xgboost_1.3.2.1              
##  [19] withr_2.4.1                   colorspace_2.0-0             
##  [21] highr_0.8                     rstudioapi_0.13              
##  [23] assertive.base_0.0-9          labeling_0.4.2               
##  [25] GenomeInfoDbData_1.2.4        GSVAdata_1.26.0              
##  [27] bit64_4.0.5                   farver_2.1.0                 
##  [29] rprojroot_2.0.2               vctrs_0.3.6                  
##  [31] generics_0.1.0                xfun_0.22                    
##  [33] BiocFileCache_1.14.0          fishpond_1.6.0               
##  [35] R6_2.5.0                      doParallel_1.0.16            
##  [37] ggbeeswarm_0.6.0              clue_0.3-58                  
##  [39] rsvd_1.0.3                    RcppEigen_0.3.3.9.1          
##  [41] locfit_1.5-9.4                bitops_1.0-6                 
##  [43] rhdf5filters_1.2.0            cachem_1.0.4                 
##  [45] gridGraphics_0.5-1            assertthat_0.2.1             
##  [47] promises_1.2.0.1              scales_1.1.1                 
##  [49] beeswarm_0.3.1                gtable_0.3.0                 
##  [51] beachmat_2.6.4                Cairo_1.5-12.2               
##  [53] rlang_0.4.10                  systemfonts_1.0.1            
##  [55] GlobalOptions_0.1.2           BiocManager_1.30.10          
##  [57] yaml_2.2.1                    reshape2_1.4.4               
##  [59] httpuv_1.5.5                  tools_4.0.4                  
##  [61] ellipsis_0.3.1                jquerylib_0.1.3              
##  [63] RColorBrewer_1.1-2            Rcpp_1.0.6                   
##  [65] plyr_1.8.6                    sparseMatrixStats_1.2.1      
##  [67] zlibbioc_1.36.0               purrr_0.3.4                  
##  [69] RCurl_1.98-1.2                dbscan_1.1-6                 
##  [71] GetoptLong_1.0.5              viridis_0.5.1                
##  [73] cowplot_1.1.1                 cluster_2.1.1                
##  [75] ggrepel_0.9.1                 fs_1.5.0                     
##  [77] magrittr_2.0.1                data.table_1.14.0            
##  [79] RSpectra_0.16-0               magick_2.7.0                 
##  [81] circlize_0.4.12               mime_0.10                    
##  [83] evaluate_0.14                 xtable_1.8-4                 
##  [85] gridExtra_2.3                 shape_1.4.5                  
##  [87] compiler_4.0.4                tibble_3.1.0                 
##  [89] crayon_1.4.1                  R.oo_1.24.0                  
##  [91] htmltools_0.5.1.1             later_1.1.0.1                
##  [93] MCMCprecision_0.4.0           DBI_1.1.1                    
##  [95] ExperimentHub_1.16.0          assertive.files_0.0-2        
##  [97] dbplyr_2.1.0                  ComplexHeatmap_2.6.2         
##  [99] rappdirs_0.3.3                assertive.numbers_0.0-2      
## [101] assertive.types_0.0-3         R.methodsS3_1.8.1            
## [103] igraph_1.2.6                  pkgconfig_2.0.3              
## [105] pkgdown_1.6.1                 scuttle_1.0.4                
## [107] xml2_1.3.2                    foreach_1.5.1                
## [109] svglite_2.0.0                 vipor_0.4.5                  
## [111] bslib_0.2.4                   dqrng_0.2.1                  
## [113] webshot_0.5.2                 XVector_0.30.0               
## [115] rvest_1.0.0                   stringr_1.4.0                
## [117] digest_0.6.27                 rmarkdown_2.7                
## [119] enrichR_3.0                   uwot_0.1.10                  
## [121] edgeR_3.32.1                  DelayedMatrixStats_1.12.3    
## [123] curl_4.3                      shiny_1.6.0                  
## [125] gtools_3.8.2                  rjson_0.2.20                 
## [127] lifecycle_1.0.0               jsonlite_1.7.2               
## [129] Rhdf5lib_1.12.1               BiocNeighbors_1.8.2          
## [131] desc_1.3.0                    viridisLite_0.3.0            
## [133] limma_3.46.0                  fansi_0.4.2                  
## [135] pillar_1.5.1                  lattice_0.20-41              
## [137] fastmap_1.1.0                 httr_1.4.2                   
## [139] interactiveDisplayBase_1.28.0 glue_1.4.2                   
## [141] FNN_1.1.3                     png_0.1-7                    
## [143] iterators_1.0.13              multipanelfigure_2.1.2       
## [145] bluster_1.0.0                 BiocVersion_3.12.0           
## [147] bit_4.0.4                     assertive.properties_0.0-4   
## [149] stringi_1.5.3                 sass_0.3.1                   
## [151] blob_1.2.1                    textshaping_0.3.5            
## [153] BiocSingular_1.6.0            AnnotationHub_2.22.0         
## [155] memoise_2.0.0                 dplyr_1.0.5                  
## [157] irlba_2.3.3