A wrapper function for decontX. Identifies potential contamination from experimental factors such as ambient RNA.

runDecontX(
  inSCE,
  sample = NULL,
  useAssay = "counts",
  background = NULL,
  bgAssayName = NULL,
  bgBatch = NULL,
  z = NULL,
  maxIter = 500,
  delta = c(10, 10),
  estimateDelta = TRUE,
  convergence = 0.001,
  iterLogLik = 10,
  varGenes = 5000,
  dbscanEps = 1,
  seed = 12345,
  logfile = NULL,
  verbose = TRUE
)

Arguments

inSCE

A SingleCellExperiment object.

sample

A single character string specifying a column name in colData(inSCE) to directly use that cell annotation, or a character vector with one element per cell indicating which sample each cell belongs to. When supplied, decontX will be run on the cells from each sample separately. Default NULL.
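Conceptually, a per-cell sample vector partitions the columns of inSCE into groups that are decontaminated independently. The base-R sketch below only illustrates that grouping with a made-up toy vector `cellSample`. It is an assumption about the idea, not singleCellTK's internal code.

```r
# Toy per-cell sample labels (hypothetical): one entry per cell/column of inSCE
cellSample <- c("A", "A", "B", "A", "B", "C")

# Group column indices by sample; decontX would be fit on each group separately
cellsBySample <- split(seq_along(cellSample), cellSample)

for (s in names(cellsBySample)) {
  idx <- cellsBySample[[s]]
  message("Sample ", s, ": cells ", paste(idx, collapse = ", "))
  # a per-sample fit would operate on inSCE[, idx] at this point
}
```

With a real object, the column-name form would look like `runDecontX(inSCE, sample = "sample")`, assuming colData(inSCE) actually contains a column named "sample".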

useAssay

A string specifying which assay in the SCE to use. Default 'counts'.

background

A SingleCellExperiment with the matrix located in the assay slot under bgAssayName. It should have the same structure as inSCE, except that it contains the matrix of empty droplets instead of cells. When supplied, the empirical distribution of transcripts from these empty droplets will be used as the contamination distribution. Default NULL.

bgAssayName

Character. Name of the assay to use if background is a SingleCellExperiment. If NULL, the function will use the same value as useAssay. Default is NULL.

bgBatch

Batch labels for background. If background is a SingleCellExperiment object, this can be a single character string specifying a column name in colData(background) to directly use that barcode annotation, or a numeric or character vector with one element per barcode indicating which sample each barcode belongs to. Its unique values should match those in sample, so that each batch of cells has a corresponding batch of empty droplets to use as background. Default NULL.
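The requirement that the unique values of bgBatch match those of sample can be checked before calling runDecontX. The helper `batchesMatch` below is hypothetical, shown as a minimal base-R sketch with made-up batch labels. Labels are coerced to character so that numeric and character batch labels compare consistently.

```r
# Hypothetical helper: does every cell batch have a matching batch of empty
# droplets (and vice versa)?
batchesMatch <- function(sample, bgBatch) {
  setequal(unique(as.character(sample)), unique(as.character(bgBatch)))
}

cellBatch <- c("run1", "run1", "run2", "run2")           # one label per cell
dropBatch <- c("run1", "run2", "run2", "run1", "run1")   # one label per empty droplet

batchesMatch(cellBatch, dropBatch)          # TRUE
batchesMatch(cellBatch, c("run1", "run3"))  # FALSE: run2 has no background, run3 has no cells
```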

z

Numeric or character vector. Cell cluster labels. If NULL, PCA will first be used to reduce the dimensionality of the dataset, 'umap' from the 'uwot' package will then reduce it further to 2 dimensions, and the 'dbscan' function from the 'dbscan' package will be used to identify clusters of broad cell types. Default NULL.
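If clusters are already available, they can be passed directly as z, with one label per cell. The sketch below produces such labels in base R on simulated counts. Note that it swaps in k-means on principal components for the UMAP + dbscan procedure described above, so it is an illustration of the input format, not a reproduction of decontX's own clustering.

```r
set.seed(1)
# Simulated genes x cells count matrix: 15 cells of one type, 15 of another
counts <- cbind(
  matrix(rpois(100 * 15, lambda = 5), nrow = 100),
  # genes 1-50 low (lambda 1) and genes 51-100 high (lambda 20) in every column
  matrix(rpois(100 * 15, lambda = rep(c(1, 20), each = 50)), nrow = 100)
)

# Library-size normalization followed by log1p
libSize <- colSums(counts)
logNorm <- log1p(sweep(counts, 2, libSize, "/") * 1e4)

# PCA on cells (rows of the transposed matrix), then k-means on the top 5 PCs
pcs <- prcomp(t(logNorm))$x[, 1:5]
z <- kmeans(pcs, centers = 2, nstart = 10)$cluster
```

The resulting vector would then be supplied as `runDecontX(inSCE, z = z)`, where z must have one entry per column of inSCE.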

maxIter

Integer. Maximum iterations of the EM algorithm. Default 500.

delta

Numeric vector of length 2. Concentration parameters for the Dirichlet prior for the contamination in each cell. The first element is the prior for the native counts, while the second element is the prior for the contamination counts. These essentially act as pseudocounts for the native and contaminating counts in each cell. If estimateDelta = TRUE, delta is only used to draw a random sample of proportions that serves as the initial value of contamination in each cell; fit_dirichlet is then used to update delta in each iteration. If estimateDelta = FALSE, delta is fixed at these values for the entire inference procedure. Fixing delta and setting a high value for the second element will force decontX to be more aggressive and estimate higher levels of contamination, at the expense of potentially removing native expression. Default c(10, 10).
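With only two components (native, contamination), a Dirichlet(delta[1], delta[2]) prior is a Beta distribution, so the prior mean contamination is delta[2] / (delta[1] + delta[2]). The arithmetic below shows why raising the second element makes the prior more aggressive. The `rbeta` draw is an illustrative assumption about how initial per-cell proportions could be sampled, not a copy of decontX's code.

```r
delta <- c(10, 10)  # c(native, contamination) pseudocounts; the documented default

# Prior mean of the contamination proportion under Dirichlet(delta[1], delta[2])
priorMeanContamination <- function(delta) delta[2] / sum(delta)
priorMeanContamination(c(10, 10))  # 0.5
priorMeanContamination(c(10, 30))  # 0.75, a more aggressive prior

# Illustrative random initial contamination proportions for 5 cells
set.seed(12345)
theta0 <- rbeta(5, shape1 = delta[2], shape2 = delta[1])
```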

estimateDelta

Boolean. Whether to update delta at each iteration. Default TRUE.

convergence

Numeric. The EM algorithm will be stopped if the maximum difference in the contamination estimates between the previous and current iterations is less than this. Default 0.001.
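The stopping rule above compares the per-cell contamination estimates of two consecutive iterations. A minimal sketch of that comparison follows. `hasConverged` is a hypothetical helper written from the description of convergence, not a function exported by decontX.

```r
# Hypothetical convergence test: stop when the largest per-cell change in the
# contamination estimate falls below the `convergence` threshold
hasConverged <- function(prevTheta, currTheta, convergence = 0.001) {
  max(abs(currTheta - prevTheta)) < convergence
}

prevTheta <- c(0.10, 0.20, 0.35)
hasConverged(prevTheta, c(0.1005, 0.2009, 0.3502))  # TRUE  (max change 0.0009)
hasConverged(prevTheta, c(0.1020, 0.2000, 0.3500))  # FALSE (max change 0.002)
```

The "converge" values printed in the example output below appear to track this quantity: the run stops at iteration 386, when the value (0.0009989) first drops below the default of 0.001.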

iterLogLik

Integer. Calculate the log likelihood every iterLogLik iterations. Default 10.

varGenes

Integer. The number of variable genes to use in dimensionality reduction before clustering. Variability is calculated using the modelGeneVar function from the 'scran' package. Used only when z is not provided. Default 5000.
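To illustrate what varGenes controls, the sketch below keeps the top varGenes genes ranked by variability. It deliberately swaps in a simpler criterion than modelGeneVar: plain variance of log-normalized expression, with no mean-variance trend fitting. `topVariableGenes` and the simulated counts are hypothetical.

```r
# Simplified stand-in for variable-gene selection (NOT scran::modelGeneVar):
# rank genes by the variance of log-normalized expression across cells
topVariableGenes <- function(counts, varGenes = 5000) {
  libSize <- colSums(counts)
  logNorm <- log1p(sweep(counts, 2, libSize, "/") * 1e4)
  geneVar <- apply(logNorm, 1, var)
  n <- min(varGenes, nrow(counts))  # cannot keep more genes than exist
  order(geneVar, decreasing = TRUE)[seq_len(n)]
}

set.seed(2)
counts <- matrix(rpois(200 * 20, lambda = 20), nrow = 200)  # 200 genes x 20 cells
counts[1:5, 1:10] <- counts[1:5, 1:10] + 200  # make genes 1-5 strongly variable
hvg <- topVariableGenes(counts, varGenes = 5)
sort(hvg)  # row indices of the selected genes
```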

dbscanEps

Numeric. The clustering resolution parameter used in 'dbscan' to estimate broad cell clusters. Used only when z is not provided. Default 1.

seed

Integer. Passed to with_seed. For reproducibility, a default value of 12345 is used. If NULL, no calls to with_seed are made.
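with_seed (from the 'withr' package) runs code under a fixed seed and then restores the previous random-number state, so setting seed does not disturb randomness elsewhere in a session. The function below is a hypothetical, simplified base-R stand-in written only to show those semantics. It is not withr's implementation and ignores RNG-kind handling.

```r
# Hypothetical base-R stand-in for withr::with_seed(): evaluate `expr` under a
# fixed seed, then restore the caller's RNG state on exit
withSeedSketch <- function(seed, expr) {
  env <- globalenv()
  hadSeed <- exists(".Random.seed", envir = env, inherits = FALSE)
  if (hadSeed) oldSeed <- get(".Random.seed", envir = env)
  on.exit({
    if (hadSeed) {
      assign(".Random.seed", oldSeed, envir = env)
    } else {
      rm(list = ".Random.seed", envir = env)
    }
  })
  set.seed(seed)
  expr  # lazy argument: evaluated here, after set.seed()
}

a <- withSeedSketch(12345, runif(3))
b <- withSeedSketch(12345, runif(3))
identical(a, b)  # TRUE: same seed, same draws
```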

logfile

Character. Messages will be redirected to a file named `logfile`. If NULL, messages will be printed to stdout. Default NULL.

verbose

Logical. Whether to print log messages. Default TRUE.

Value

A SingleCellExperiment object with 'decontX_Contamination' and 'decontX_Clusters' added to the colData slot. Additionally, the decontaminated counts will be added as an assay called 'decontXCounts'.

Examples

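library(singleCellTK)
data(scExample, package = "singleCellTK")
# Keep only droplets that contain cells
sce <- subsetSCECols(sce, colData = "type != 'EmptyDroplet'")
# Run decontX on a random subset of 20 cells to keep the example fast
sce <- runDecontX(sce[, sample(ncol(sce), 20)])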
#> Sat Mar 18 10:30:33 2023 ... Running 'DecontX'
#> --------------------------------------------------
#> Starting DecontX
#> --------------------------------------------------
#> Sat Mar 18 10:30:33 2023 .. Analyzing all cells
#> Sat Mar 18 10:30:33 2023 .... Generating UMAP and estimating cell types
#> Found more than one class "dist" in cache; using the first, from namespace 'BiocGenerics'
#> Also defined by ‘spam’
#> Found more than one class "dist" in cache; using the first, from namespace 'BiocGenerics'
#> Also defined by ‘spam’
#> Found more than one class "dist" in cache; using the first, from namespace 'BiocGenerics'
#> Also defined by ‘spam’
#> Sat Mar 18 10:30:36 2023 .... Estimating contamination
#> Sat Mar 18 10:30:36 2023 ...... Completed iteration: 10 | converge: 0.02473
#> Sat Mar 18 10:30:36 2023 ...... Completed iteration: 20 | converge: 0.00978
#> Sat Mar 18 10:30:36 2023 ...... Completed iteration: 30 | converge: 0.003904
#> Sat Mar 18 10:30:36 2023 ...... Completed iteration: 40 | converge: 0.001642
#> Sat Mar 18 10:30:36 2023 ...... Completed iteration: 50 | converge: 0.00151
#> Sat Mar 18 10:30:36 2023 ...... Completed iteration: 60 | converge: 0.001521
#> Sat Mar 18 10:30:36 2023 ...... Completed iteration: 70 | converge: 0.001474
#> Sat Mar 18 10:30:36 2023 ...... Completed iteration: 80 | converge: 0.001358
#> Sat Mar 18 10:30:36 2023 ...... Completed iteration: 90 | converge: 0.001361
#> Sat Mar 18 10:30:36 2023 ...... Completed iteration: 100 | converge: 0.001303
#> Sat Mar 18 10:30:36 2023 ...... Completed iteration: 110 | converge: 0.001346
#> Sat Mar 18 10:30:36 2023 ...... Completed iteration: 120 | converge: 0.001485
#> Sat Mar 18 10:30:36 2023 ...... Completed iteration: 130 | converge: 0.001613
#> Sat Mar 18 10:30:36 2023 ...... Completed iteration: 140 | converge: 0.00174
#> Sat Mar 18 10:30:36 2023 ...... Completed iteration: 150 | converge: 0.001866
#> Sat Mar 18 10:30:36 2023 ...... Completed iteration: 160 | converge: 0.001989
#> Sat Mar 18 10:30:36 2023 ...... Completed iteration: 170 | converge: 0.002103
#> Sat Mar 18 10:30:36 2023 ...... Completed iteration: 180 | converge: 0.002199
#> Sat Mar 18 10:30:36 2023 ...... Completed iteration: 190 | converge: 0.002262
#> Sat Mar 18 10:30:36 2023 ...... Completed iteration: 200 | converge: 0.002297
#> Sat Mar 18 10:30:36 2023 ...... Completed iteration: 210 | converge: 0.002308
#> Sat Mar 18 10:30:36 2023 ...... Completed iteration: 220 | converge: 0.002303
#> Sat Mar 18 10:30:36 2023 ...... Completed iteration: 230 | converge: 0.002288
#> Sat Mar 18 10:30:36 2023 ...... Completed iteration: 240 | converge: 0.002262
#> Sat Mar 18 10:30:36 2023 ...... Completed iteration: 250 | converge: 0.002227
#> Sat Mar 18 10:30:36 2023 ...... Completed iteration: 260 | converge: 0.002182
#> Sat Mar 18 10:30:36 2023 ...... Completed iteration: 270 | converge: 0.002128
#> Sat Mar 18 10:30:36 2023 ...... Completed iteration: 280 | converge: 0.002066
#> Sat Mar 18 10:30:36 2023 ...... Completed iteration: 290 | converge: 0.001979
#> Sat Mar 18 10:30:36 2023 ...... Completed iteration: 300 | converge: 0.001897
#> Sat Mar 18 10:30:36 2023 ...... Completed iteration: 310 | converge: 0.001827
#> Sat Mar 18 10:30:36 2023 ...... Completed iteration: 320 | converge: 0.001772
#> Sat Mar 18 10:30:36 2023 ...... Completed iteration: 330 | converge: 0.001665
#> Sat Mar 18 10:30:36 2023 ...... Completed iteration: 340 | converge: 0.001546
#> Sat Mar 18 10:30:36 2023 ...... Completed iteration: 350 | converge: 0.001421
#> Sat Mar 18 10:30:36 2023 ...... Completed iteration: 360 | converge: 0.001297
#> Sat Mar 18 10:30:36 2023 ...... Completed iteration: 370 | converge: 0.001178
#> Sat Mar 18 10:30:36 2023 ...... Completed iteration: 380 | converge: 0.001064
#> Sat Mar 18 10:30:36 2023 ...... Completed iteration: 386 | converge: 0.0009989
#> Sat Mar 18 10:30:36 2023 .. Calculating final decontaminated matrix
#> --------------------------------------------------
#> Completed DecontX. Total time: 2.920179 secs
#> --------------------------------------------------