Clusters the rows and columns of a count matrix containing
single-cell data into L modules and K subpopulations, respectively. The
useAssay
assay slot in
altExpName
altExp slot will be used if
it exists. Otherwise, the useAssay
assay slot in x
will be used if
x
is a SingleCellExperiment object.
celda_CG(
x,
useAssay = "counts",
altExpName = "featureSubset",
sampleLabel = NULL,
K,
L,
alpha = 1,
beta = 1,
delta = 1,
gamma = 1,
algorithm = c("EM", "Gibbs"),
stopIter = 10,
maxIter = 200,
splitOnIter = 10,
splitOnLast = TRUE,
seed = 12345,
nchains = 3,
zInitialize = c("split", "random", "predefined"),
yInitialize = c("split", "random", "predefined"),
countChecksum = NULL,
zInit = NULL,
yInit = NULL,
logfile = NULL,
verbose = TRUE
)
# S4 method for SingleCellExperiment
celda_CG(
x,
useAssay = "counts",
altExpName = "featureSubset",
sampleLabel = NULL,
K,
L,
alpha = 1,
beta = 1,
delta = 1,
gamma = 1,
algorithm = c("EM", "Gibbs"),
stopIter = 10,
maxIter = 200,
splitOnIter = 10,
splitOnLast = TRUE,
seed = 12345,
nchains = 3,
zInitialize = c("split", "random", "predefined"),
yInitialize = c("split", "random", "predefined"),
countChecksum = NULL,
zInit = NULL,
yInit = NULL,
logfile = NULL,
verbose = TRUE
)
# S4 method for ANY
celda_CG(
x,
useAssay = "counts",
altExpName = "featureSubset",
sampleLabel = NULL,
K,
L,
alpha = 1,
beta = 1,
delta = 1,
gamma = 1,
algorithm = c("EM", "Gibbs"),
stopIter = 10,
maxIter = 200,
splitOnIter = 10,
splitOnLast = TRUE,
seed = 12345,
nchains = 3,
zInitialize = c("split", "random", "predefined"),
yInitialize = c("split", "random", "predefined"),
countChecksum = NULL,
zInit = NULL,
yInit = NULL,
logfile = NULL,
verbose = TRUE
)
A SingleCellExperiment
with the matrix located in the assay slot under useAssay
.
Rows represent features and columns represent cells. Alternatively,
any matrix-like object that can be coerced to a sparse matrix of class
"dgCMatrix" can be directly used as input. The matrix will automatically be
converted to a SingleCellExperiment object.
A string specifying the name of the assay slot to use. Default "counts".
The name for the altExp slot to use. Default "featureSubset".
Vector or factor. Denotes the sample label for each cell (column) in the count matrix.
Integer. Number of cell populations.
Integer. Number of feature modules.
Numeric. Concentration parameter for Theta. Adds a pseudocount to each cell population in each sample. Default 1.
Numeric. Concentration parameter for Phi. Adds a pseudocount to each feature module in each cell population. Default 1.
Numeric. Concentration parameter for Psi. Adds a pseudocount to each feature in each module. Default 1.
Numeric. Concentration parameter for Eta. Adds a pseudocount to the number of features in each module. Default 1.
String. Algorithm to use for clustering cell subpopulations. One of 'EM' or 'Gibbs'. The EM algorithm for cell clustering is faster, especially for larger numbers of cells. However, more chains may be required to ensure a good solution is found. Default 'EM'.
Integer. Number of iterations without improvement in the log likelihood to stop inference. Default 10.
Integer. Maximum number of iterations of Gibbs sampling to perform. Default 200.
Integer. On every splitOnIter
iteration,
a heuristic
will be applied to determine if a cell population or feature module should
be reassigned and another cell population or feature module should be split
into two clusters. To disable splitting, set to -1. Default 10.
Integer. After stopIter
iterations have been
performed without improvement, a heuristic will be applied to determine if
a cell population or feature module should be reassigned and another cell
population or feature module should be split into two clusters. If a split
occurs, then 'stopIter' will be reset. Default TRUE.
Integer. Passed to with_seed. For reproducibility, a default value of 12345 is used. If NULL, no calls to with_seed are made.
Integer. Number of random cluster initializations. Default 3.
Chararacter. One of 'random', 'split', or 'predefined'.
With 'random', cells are randomly assigned to a populations. With 'split',
cells will be split into sqrt(K) populations and then each population will
be subsequently split into another sqrt(K) populations. With 'predefined',
values in zInit
will be used to initialize z
. Default 'split'.
Character. One of 'random', 'split', or 'predefined'.
With 'random', features are randomly assigned to a modules. With 'split',
features will be split into sqrt(L) modules and then each module will be
subsequently split into another sqrt(L) modules. With 'predefined', values
in yInit
will be used to initialize y
. Default 'split'.
Character. An MD5 checksum for the counts matrix. Default NULL.
Integer vector. Sets initial starting values of z. 'zInit' is only used when `zInitialize = 'predfined'`. Default NULL.
Integer vector. Sets initial starting values of y. 'yInit' is only be used when `yInitialize = "predefined"`. Default NULL.
Character. Messages will be redirected to a file named `logfile`. If NULL, messages will be printed to stdout. Default NULL.
Logical. Whether to print log messages. Default TRUE.
A SingleCellExperiment object. Function parameter settings are stored in metadata
"celda_parameters"
in altExp slot.
In altExp slot,
columns celda_sample_label
and celda_cell_cluster
in
colData contain sample labels and celda cell
population clusters. Column celda_feature_module
in
rowData contains feature modules.
celda_G for feature clustering and celda_C for clustering cells. celdaGridSearch can be used to run multiple values of K/L and multiple chains in parallel.
data(celdaCGSim)
sce <- celda_CG(celdaCGSim$counts,
K = celdaCGSim$K,
L = celdaCGSim$L,
sampleLabel = celdaCGSim$sampleLabel,
nchains = 1)
#> --------------------------------------------------
#> Starting Celda_CG: Clustering cells and genes.
#> --------------------------------------------------
#> Mon Nov 6 07:59:35 2023 .. Initializing 'z' in chain 1 with 'split'
#> Mon Nov 6 07:59:35 2023 .. Initializing 'y' in chain 1 with 'split'
#> Mon Nov 6 07:59:38 2023 .... Completed iteration: 1 | logLik: -1215542.98684529
#> Mon Nov 6 07:59:38 2023 .... Completed iteration: 2 | logLik: -1215541.0958389
#> Mon Nov 6 07:59:38 2023 .... Completed iteration: 3 | logLik: -1215541.0958389
#> Mon Nov 6 07:59:38 2023 .... Completed iteration: 4 | logLik: -1215542.98684529
#> Mon Nov 6 07:59:38 2023 .... Completed iteration: 5 | logLik: -1215541.0958389
#> Mon Nov 6 07:59:38 2023 .... Completed iteration: 6 | logLik: -1215541.0958389
#> Mon Nov 6 07:59:38 2023 .... Completed iteration: 7 | logLik: -1215541.0958389
#> Mon Nov 6 07:59:38 2023 .... Completed iteration: 8 | logLik: -1215541.0958389
#> Mon Nov 6 07:59:38 2023 .... Completed iteration: 9 | logLik: -1215541.0958389
#> Mon Nov 6 07:59:38 2023 .... Determining if any gene clusters should be split.
#> Mon Nov 6 07:59:38 2023 .... No additional splitting was performed.
#> Mon Nov 6 07:59:38 2023 .... Determining if any cell clusters should be split.
#> Mon Nov 6 07:59:38 2023 .... No additional splitting was performed.
#> Mon Nov 6 07:59:38 2023 .... Completed iteration: 10 | logLik: -1215541.0958389
#> Mon Nov 6 07:59:38 2023 .... Determining if any cell clusters should be split.
#> Mon Nov 6 07:59:38 2023 .... No additional splitting was performed.
#> Mon Nov 6 07:59:38 2023 .... Completed iteration: 11 | logLik: -1215541.0958389
#> Mon Nov 6 07:59:38 2023 .. Finished chain 1
#> --------------------------------------------------
#> Completed Celda_CG. Total time: 3.065807 secs
#> --------------------------------------------------