Uses the celda_G model to cluster features into modules for a range of possible values of L. The module labels from the previous model (with L-1 modules) are used as the initial values for the current model with L modules, and the best split of an existing module is found to create the L-th module. This procedure is much faster than randomly initializing a separate model for each L.
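The splitting procedure described above can be illustrated with a toy sketch in base R. This is not celda's implementation: k-means and a within-module sum of squares stand in for the celda_G model and its log-likelihood, and the helper names `withinSS` and `splitBestModule` are hypothetical. It only shows the idea of reusing the labels of the previous model and adding one module per step by choosing the best split.

```r
# Toy sketch of recursive module splitting (NOT celda's implementation).
# Within-module sum of squares stands in for the celda_G log-likelihood.

# Total within-module sum of squares for a feature-by-cell matrix.
withinSS <- function(mat, labels) {
  sum(vapply(split(seq_len(nrow(mat)), labels), function(idx) {
    sub <- mat[idx, , drop = FALSE]
    sum(sweep(sub, 2, colMeans(sub))^2)
  }, numeric(1)))
}

# Try splitting every module in two and keep the split with the lowest score.
splitBestModule <- function(mat, labels, minFeature = 3) {
  newLabel <- max(labels) + 1L
  best <- labels
  bestScore <- Inf
  for (m in unique(labels)) {
    idx <- which(labels == m)
    if (length(idx) < minFeature) next  # too small to split
    km <- kmeans(mat[idx, , drop = FALSE], centers = 2, nstart = 5)
    candidate <- labels
    candidate[idx[km$cluster == 2]] <- newLabel
    score <- withinSS(mat, candidate)
    if (score < bestScore) {
      best <- candidate
      bestScore <- score
    }
  }
  best
}

set.seed(12345)
# 30 features x 20 cells with three planted groups of features
mat <- rbind(
  matrix(rnorm(200, mean = 0), nrow = 10),
  matrix(rnorm(200, mean = 5), nrow = 10),
  matrix(rnorm(200, mean = 10), nrow = 10)
)

# Start from a single module; each step reuses the previous labels
# and creates exactly one new module.
labels <- rep(1L, nrow(mat))
models <- list(labels)
for (L in 2:4) {
  labels <- splitBestModule(mat, labels)
  models[[L]] <- labels
}

# Number of modules in each successive model
vapply(models, function(z) length(unique(z)), integer(1))
```

Because each model starts from the labels of the previous one, only the candidate splits need to be evaluated at each step, rather than a full clustering from a random start.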
recursiveSplitModule(
x,
useAssay = "counts",
altExpName = "featureSubset",
initialL = 10,
maxL = 100,
tempK = 100,
zInit = NULL,
sampleLabel = NULL,
alpha = 1,
beta = 1,
delta = 1,
gamma = 1,
minFeature = 3,
reorder = TRUE,
seed = 12345,
perplexity = TRUE,
doResampling = FALSE,
numResample = 5,
verbose = TRUE,
logfile = NULL
)
# S4 method for SingleCellExperiment
recursiveSplitModule(
x,
useAssay = "counts",
altExpName = "featureSubset",
initialL = 10,
maxL = 100,
tempK = 100,
zInit = NULL,
sampleLabel = NULL,
alpha = 1,
beta = 1,
delta = 1,
gamma = 1,
minFeature = 3,
reorder = TRUE,
seed = 12345,
perplexity = TRUE,
doResampling = FALSE,
numResample = 5,
verbose = TRUE,
logfile = NULL
)
# S4 method for matrix
recursiveSplitModule(
x,
useAssay = "counts",
altExpName = "featureSubset",
initialL = 10,
maxL = 100,
tempK = 100,
zInit = NULL,
sampleLabel = NULL,
alpha = 1,
beta = 1,
delta = 1,
gamma = 1,
minFeature = 3,
reorder = TRUE,
seed = 12345,
perplexity = TRUE,
doResampling = FALSE,
numResample = 5,
verbose = TRUE,
logfile = NULL
)
x
A numeric matrix of counts or a SingleCellExperiment with the matrix located in the assay slot under useAssay. Rows represent features and columns represent cells.
useAssay
A string specifying which assay slot to use if x is a SingleCellExperiment object. Default "counts".
altExpName
The name for the altExp slot to use. Default "featureSubset".
initialL
Integer. Initial number of modules. Default 10.
maxL
Integer. Maximum number of modules. Default 100.
tempK
Integer. Number of temporary cell populations to identify and use in module splitting. Only used if zInit = NULL. Collapsing cells to a relatively small number of cell populations will increase the speed of module clustering and tend to produce better modules. This number should be larger than the number of true cell populations expected in the dataset. Default 100.
zInit
Integer vector. Collapse cells to cell populations based on labels in zInit and then perform module splitting. If NULL, no collapsing will be performed unless tempK is specified. Default NULL.
sampleLabel
Vector or factor. Denotes the sample label for each cell (column) in the count matrix. Default NULL.
alpha
Numeric. Concentration parameter for Theta. Adds a pseudocount to each cell population in each sample. Only used if zInit is set. Default 1.
beta
Numeric. Concentration parameter for Phi. Adds a pseudocount to each feature module in each cell. Default 1.
delta
Numeric. Concentration parameter for Psi. Adds a pseudocount to each feature in each module. Default 1.
gamma
Numeric. Concentration parameter for Eta. Adds a pseudocount to the number of features in each module. Default 1.
minFeature
Integer. Only attempt to split modules with at least this many features. Default 3.
reorder
Logical. Whether to reorder modules using hierarchical clustering after each model has been created. If FALSE, module numbers will correspond to the split which created the module (i.e. 'L15' was created at split 15, 'L16' was created at split 16, etc.). Default TRUE.
seed
Integer. Passed to with_seed. For reproducibility, a default value of 12345 is used. If NULL, no calls to with_seed are made.
perplexity
Logical. Whether to calculate perplexity for each model. If FALSE, then perplexity can be calculated later with resamplePerplexity. Default TRUE.
doResampling
Logical. If TRUE, then each cell in the counts matrix will be resampled according to a multinomial distribution to introduce noise before calculating perplexity. Default FALSE.
numResample
Integer. The number of times to resample the counts matrix for evaluating perplexity if doResampling is set to TRUE. Default 5.
verbose
Logical. Whether to print log messages. Default TRUE.
logfile
Character. Messages will be redirected to the file named by logfile. If NULL, messages will be printed to stdout. Default NULL.
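The tempK and zInit arguments both refer to collapsing cells into cell populations before modules are split. A minimal base-R sketch of what such a collapse could look like is shown below, assuming that collapsing means summing the counts of all cells that share a population label. The matrix values and the label vector `z` are made up for illustration, and the code does not call celda.

```r
# Feature-by-cell count matrix: 4 features, 6 cells (made-up values).
# matrix() fills by column, so each row of numbers below is one cell.
counts <- matrix(
  c(5, 0, 2, 1,
    4, 1, 3, 0,
    0, 7, 1, 2,
    1, 6, 0, 3,
    2, 2, 8, 0,
    3, 1, 9, 1),
  nrow = 4,
  dimnames = list(paste0("gene", 1:4), paste0("cell", 1:6))
)

# Hypothetical population labels, one per cell (the role zInit plays)
z <- c(1L, 1L, 2L, 2L, 3L, 3L)

# rowsum() sums rows by group, so transpose to group cells, then transpose back
collapsed <- t(rowsum(t(counts), group = z))
collapsed
```

The collapsed matrix has one column per population instead of one per cell, which is why module splitting on it is cheaper than on the full matrix.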
A SingleCellExperiment object. Function parameter settings and celda model results are stored in the metadata "celda_grid_search" slot. The models in the list will be of class celda_G if zInit = NULL or celda_CG if zInit is set.
recursiveSplitCell for recursive splitting of cell populations.
data(sceCeldaCG)
## Create models that range from L=3 to L=20 by recursively splitting modules
## into two
moduleSplit <- recursiveSplitModule(sceCeldaCG, initialL = 3, maxL = 20)
#> ==================================================
#> Starting recursive module splitting.
#> ==================================================
#> Mon Nov 6 08:01:12 2023 .. Collapsing to 100 temporary cell populations
#> Mon Nov 6 08:01:13 2023 .. Initializing with 3 modules
#> Mon Nov 6 08:01:13 2023 .. Created module 4 | logLik: -1241379.90928455
#> Mon Nov 6 08:01:13 2023 .. Created module 5 | logLik: -1235212.7977535
#> Mon Nov 6 08:01:13 2023 .. Created module 6 | logLik: -1232789.9817561
#> Mon Nov 6 08:01:13 2023 .. Created module 7 | logLik: -1227246.66090571
#> Mon Nov 6 08:01:13 2023 .. Created module 8 | logLik: -1223898.757694
#> Mon Nov 6 08:01:13 2023 .. Created module 9 | logLik: -1221848.26936098
#> Mon Nov 6 08:01:13 2023 .. Created module 10 | logLik: -1220147.96681948
#> Mon Nov 6 08:01:13 2023 .. Created module 11 | logLik: -1220818.37022325
#> Mon Nov 6 08:01:14 2023 .. Created module 12 | logLik: -1221489.07685946
#> Mon Nov 6 08:01:14 2023 .. Created module 13 | logLik: -1222032.53497571
#> Mon Nov 6 08:01:14 2023 .. Created module 14 | logLik: -1222712.17543857
#> Mon Nov 6 08:01:14 2023 .. Created module 15 | logLik: -1223268.97596756
#> Mon Nov 6 08:01:14 2023 .. Created module 16 | logLik: -1223841.4834406
#> Mon Nov 6 08:01:14 2023 .. Created module 17 | logLik: -1224394.02513994
#> Mon Nov 6 08:01:14 2023 .. Created module 18 | logLik: -1224863.41435811
#> Mon Nov 6 08:01:14 2023 .. Created module 19 | logLik: -1225480.30453125
#> Mon Nov 6 08:01:14 2023 .. Created module 20 | logLik: -1226156.47078695
#> Mon Nov 6 08:01:14 2023 .. Calculating perplexity
#> ==================================================
#> Completed recursive module splitting. Total time: 2.64532 secs
#> ==================================================
## Example results with perplexity
plotGridSearchPerplexity(moduleSplit)
## Select model for downstream analysis
celdaMod <- subsetCeldaList(moduleSplit, list(L = 10))
data(celdaCGSim)
## Create models that range from L=3 to L=20 by recursively splitting modules
## into two
moduleSplit <- recursiveSplitModule(celdaCGSim$counts,
    initialL = 3, maxL = 20)
#> ==================================================
#> Starting recursive module splitting.
#> ==================================================
#> Mon Nov 6 08:01:15 2023 .. Collapsing to 100 temporary cell populations
#> Mon Nov 6 08:01:16 2023 .. Initializing with 3 modules
#> Mon Nov 6 08:01:16 2023 .. Created module 4 | logLik: -1243396.62348886
#> Mon Nov 6 08:01:16 2023 .. Created module 5 | logLik: -1237610.11790137
#> Mon Nov 6 08:01:16 2023 .. Created module 6 | logLik: -1232128.87013396
#> Mon Nov 6 08:01:16 2023 .. Created module 7 | logLik: -1227611.8250329
#> Mon Nov 6 08:01:16 2023 .. Created module 8 | logLik: -1225618.06184004
#> Mon Nov 6 08:01:16 2023 .. Created module 9 | logLik: -1223967.77531912
#> Mon Nov 6 08:01:17 2023 .. Created module 10 | logLik: -1222801.11395987
#> Mon Nov 6 08:01:17 2023 .. Created module 11 | logLik: -1223402.66903597
#> Mon Nov 6 08:01:17 2023 .. Created module 12 | logLik: -1224026.19892208
#> Mon Nov 6 08:01:17 2023 .. Created module 13 | logLik: -1224675.63005464
#> Mon Nov 6 08:01:17 2023 .. Created module 14 | logLik: -1225317.91966369
#> Mon Nov 6 08:01:17 2023 .. Created module 15 | logLik: -1225971.50555157
#> Mon Nov 6 08:01:17 2023 .. Created module 16 | logLik: -1226557.7881506
#> Mon Nov 6 08:01:17 2023 .. Created module 17 | logLik: -1227080.13473523
#> Mon Nov 6 08:01:17 2023 .. Created module 18 | logLik: -1227603.99622355
#> Mon Nov 6 08:01:17 2023 .. Created module 19 | logLik: -1228247.84169741
#> Mon Nov 6 08:01:17 2023 .. Created module 20 | logLik: -1228828.70617002
#> Mon Nov 6 08:01:17 2023 .. Calculating perplexity
#> ==================================================
#> Completed recursive module splitting. Total time: 2.611869 secs
#> ==================================================
## Example results with perplexity
plotGridSearchPerplexity(moduleSplit)
## Select model for downstream analysis
celdaMod <- subsetCeldaList(moduleSplit, list(L = 10))
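The doResampling and numResample arguments describe resampling each cell from a multinomial distribution before perplexity is calculated. The base-R sketch below illustrates that kind of resampling under two assumptions that the documentation above does not state: each cell keeps its total count, and the multinomial probabilities are proportional to the observed counts. The helper `resampleCounts` and the example matrix are hypothetical and not part of celda.

```r
set.seed(12345)

# Feature-by-cell counts: 5 features, 3 cells (made-up values).
# matrix() fills by column, so each row of numbers below is one cell.
counts <- matrix(
  c(10, 0, 3, 7, 0,
     2, 8, 0, 5, 5,
     0, 1, 9, 0, 10),
  nrow = 5,
  dimnames = list(paste0("gene", 1:5), paste0("cell", 1:3))
)

# Redraw each cell (column) from a multinomial with the same total count
# and probabilities proportional to the observed counts (an assumption).
resampleCounts <- function(counts) {
  resampled <- apply(counts, 2, function(cell) {
    rmultinom(n = 1, size = sum(cell), prob = cell)
  })
  dimnames(resampled) <- dimnames(counts)
  resampled
}

# Draw several noisy copies, mirroring the role of numResample
numResample <- 5
resampledList <- replicate(numResample, resampleCounts(counts), simplify = FALSE)

# Each resampled matrix keeps every cell's total count
colSums(resampledList[[1]])
colSums(counts)
```

Evaluating a model on several resampled matrices rather than on the original counts gives a sense of how much the perplexity varies under sampling noise.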