Mutational signatures and exposures are discovered using Latent Dirichlet Allocation (LDA) or Non-Negative Matrix Factorization (NMF). These algorithms deconvolute a matrix of counts for mutation types in each sample into two matrices: 1) a "signature" matrix containing the probability of each mutation type in each signature and 2) an "exposure" matrix containing the estimated counts for each signature in each sample. Before signature discovery can be performed, variants from samples first need to be stored in a musica object using the create_musica function, and mutation count tables need to be created using functions such as build_standard_table.
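To make the factorization concrete, the following base R sketch builds a toy signature matrix and a toy exposure matrix and multiplies them to obtain the expected count table. All values, dimensions, and names (`mutation_types`, `S1` to `S3`) are made up for illustration and do not come from the package; a real SBS96 table has 96 motifs rather than 4.

```r
# Toy setup: 4 mutation types, 2 signatures, 3 samples (all values are made up).
mutation_types <- c("C>A", "C>G", "C>T", "T>A")

# Signature matrix: one column per signature. Each column is a probability
# distribution over mutation types, so it sums to 1.
signatures <- matrix(
  c(0.7, 0.1, 0.1, 0.1,
    0.1, 0.1, 0.6, 0.2),
  nrow = 4,
  dimnames = list(mutation_types, c("Signature1", "Signature2"))
)

# Exposure matrix: one row per signature, one column per sample. Each entry is
# the estimated number of mutations attributed to that signature.
exposures <- matrix(
  c(90, 10,
    20, 80,
    50, 50),
  nrow = 2,
  dimnames = list(c("Signature1", "Signature2"), c("S1", "S2", "S3"))
)

# Expected counts (mutation types x samples) implied by the two matrices.
expected_counts <- signatures %*% exposures

# Each sample's expected total equals the sum of its exposures.
colSums(expected_counts)
```

Signature discovery runs in the opposite direction: it starts from an observed count table and estimates the two matrices whose product best approximates it.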

discover_signatures(
  musica,
  table_name,
  num_signatures,
  algorithm = "lda",
  seed = 1,
  nstart = 10,
  par_cores = 1
)

Arguments

musica

A musica object.

table_name

Name of the table to use for signature discovery. Needs to be the same name supplied to the table building functions such as build_standard_table.

num_signatures

Number of signatures to discover.

algorithm

Method to use for mutational signature discovery. One of "lda" or "nmf". Default "lda".

seed

Seed to be used for the random number generators in the signature discovery algorithms. Default 1.

nstart

Number of independent random starts used in the mutational signature algorithms. Default 10.

par_cores

Number of parallel cores to use. Only used if algorithm = "nmf". Default 1.

Value

Returns a musica_result object containing signatures and exposures.

Examples

data(musica)
g <- select_genome("19")
build_standard_table(musica, g, "SBS96", overwrite = TRUE)
#> Building count table from SBS with SBS96 schema
#> Warning: Overwriting counts table: SBS96
discover_signatures(musica = musica, table_name = "SBS96", num_signatures = 3, algorithm = "lda", seed = 12345, nstart = 1)
#> An object of class "musica_result"
#> Slot "signatures":
#>           Signature1   Signature2   Signature3
#> C>A_ACA 3.214412e-22 4.257622e-02 4.320778e-02
#> C>A_ACC 7.068686e-72 3.378749e-02 7.273860e-57
#> C>A_ACG 1.696120e-76 5.631249e-03 3.801190e-79
#> C>A_ACT 1.439528e-70 1.689375e-02 3.139490e-54
#> C>A_CCA 2.443956e-69 2.815624e-02 1.256298e-02
#> C>A_CCC 1.554471e-69 2.815624e-02 2.512596e-02
#> C>A_CCG 6.265378e-03 1.622837e-02 5.081727e-05
#> C>A_CCT 9.750270e-19 3.378747e-02 3.962780e-08
#> C>A_GCA 6.265689e-03 1.623538e-02 3.449016e-05
#> C>A_GCC 5.661380e-72 1.126250e-02 1.256298e-02
#> C>A_GCG 1.441197e-67 1.126250e-02 4.896510e-48
#> C>A_GCT 4.251600e-71 3.941874e-02 8.310749e-55
#> C>A_TCA 1.079587e-75 2.815624e-02 2.030294e-77
#> C>A_TCC 6.227899e-72 1.689375e-02 2.512596e-02
#> C>A_TCG 2.917076e-72 2.252499e-02 2.383283e-56
#> C>A_TCT 1.040964e-63 3.378749e-02 1.256298e-02
#> C>G_ACA 8.982685e-72 1.689375e-02 4.340776e-55
#> C>G_ACC 3.332204e-14 1.023482e-02 2.292690e-03
#> C>G_ACG 3.720076e-44 3.720076e-44 3.720076e-44
#> C>G_ACT 4.790181e-16 1.602187e-02 2.707106e-02
#> C>G_CCA 5.966718e-72 5.631249e-03 1.256298e-02
#> C>G_CCC 9.067051e-17 2.252476e-02 5.134283e-07
#> C>G_CCG 5.623647e-03 5.631249e-03 2.985272e-45
#> C>G_CCT 2.341534e-13 2.052206e-02 4.468438e-03
#> C>G_GCA 3.673864e-72 2.815624e-02 4.208045e-57
#> C>G_GCC 5.477960e-80 1.236855e-78 1.256298e-02
#> C>G_GCG 3.720076e-44 3.720076e-44 3.720076e-44
#> C>G_GCT 1.298768e-79 3.551297e-78 5.025193e-02
#> C>G_TCA 3.448376e-76 1.126250e-02 8.959530e-78
#> C>G_TCC 8.532596e-21 1.315044e-02 3.347705e-02
#> C>G_TCG 3.720076e-44 3.720076e-44 3.720076e-44
#> C>G_TCT 1.559492e-17 7.339826e-03 2.131423e-02
#> C>T_ACA 9.283284e-02 1.860131e-02 1.493970e-02
#> C>T_ACC 1.687094e-02 4.660678e-76 1.256298e-02
#> C>T_ACG 1.687094e-02 5.605620e-76 2.512596e-02
#> C>T_ACT 6.951015e-02 1.850334e-02 2.957119e-02
#> C>T_CCA 7.873106e-02 1.689375e-02 6.554712e-69
#> C>T_CCC 6.318646e-02 3.604767e-02 4.224660e-02
#> C>T_CCG 1.124729e-02 5.631814e-84 3.067592e-77
#> C>T_CCT 1.007431e-01 2.657566e-02 1.716718e-02
#> C>T_GCA 6.186012e-02 2.252499e-02 2.512596e-02
#> C>T_GCC 5.789264e-02 1.231471e-02 4.420470e-02
#> C>T_GCG 5.623647e-03 2.469049e-75 1.256298e-02
#> C>T_GCT 4.210611e-02 1.396361e-02 3.810356e-02
#> C>T_TCA 4.634952e-02 3.704481e-02 1.482014e-02
#> C>T_TCC 1.008866e-01 3.337747e-02 5.192405e-02
#> C>T_TCG 1.124729e-02 5.346482e-78 4.813211e-68
#> C>T_TCT 6.186012e-02 1.689375e-02 1.256298e-02
#> T>A_ATA 4.222375e-70 1.126250e-02 1.127285e-52
#> T>A_ATC 3.720076e-44 3.720076e-44 3.720076e-44
#> T>A_ATG 1.244484e-69 1.126250e-02 1.799990e-51
#> T>A_ATT 5.623647e-03 5.631249e-03 1.256298e-02
#> T>A_CTA 4.540720e-68 1.126250e-02 1.559780e-48
#> T>A_CTC 2.329443e-68 1.126250e-02 1.256298e-02
#> T>A_CTG 1.583880e-69 1.126250e-02 2.512596e-02
#> T>A_CTT 5.623647e-03 5.631249e-03 1.256298e-02
#> T>A_GTA 6.742502e-71 2.212938e-67 2.512596e-02
#> T>A_GTC 3.720076e-44 3.720076e-44 3.720076e-44
#> T>A_GTG 5.623647e-03 5.631249e-03 8.389862e-71
#> T>A_GTT 3.720076e-44 3.720076e-44 3.720076e-44
#> T>A_TTA 3.534605e-76 1.689375e-02 1.112954e-76
#> T>A_TTC 5.623647e-03 5.806495e-76 1.256298e-02
#> T>A_TTG 5.623647e-03 1.485272e-75 1.256298e-02
#> T>A_TTT 1.687094e-02 1.689375e-02 2.512596e-02
#> T>C_ATA 1.100756e-13 6.228549e-08 1.256284e-02
#> T>C_ATC 2.279139e-68 1.126250e-02 1.256298e-02
#> T>C_ATG 1.296000e-02 9.303956e-03 5.432740e-04
#> T>C_ATT 5.837458e-03 2.529910e-02 3.102243e-02
#> T>C_CTA 4.866067e-76 5.631249e-03 6.309342e-77
#> T>C_CTC 1.124729e-02 5.631249e-03 1.439249e-71
#> T>C_CTG 6.659905e-66 5.631249e-03 2.512596e-02
#> T>C_CTT 1.230529e-02 2.999234e-12 2.276246e-02
#> T>C_GTA 3.619894e-71 2.772732e-66 2.512596e-02
#> T>C_GTC 3.720076e-44 3.720076e-44 3.720076e-44
#> T>C_GTG 3.720076e-44 3.720076e-44 3.720076e-44
#> T>C_GTT 5.623647e-03 7.333095e-76 1.256298e-02
#> T>C_TTA 1.687094e-02 5.631249e-03 4.987590e-72
#> T>C_TTC 6.074431e-03 7.222243e-03 2.056952e-02
#> T>C_TTG 1.979686e-68 5.631249e-03 6.494058e-49
#> T>C_TTT 5.623647e-03 8.877562e-76 1.256298e-02
#> T>G_ATA 3.720076e-44 3.720076e-44 3.720076e-44
#> T>G_ATC 5.623647e-03 3.239622e-73 1.256298e-02
#> T>G_ATG 3.720076e-44 3.720076e-44 3.720076e-44
#> T>G_ATT 3.720076e-44 3.720076e-44 3.720076e-44
#> T>G_CTA 2.440947e-78 1.427290e-77 1.256298e-02
#> T>G_CTC 3.720076e-44 3.720076e-44 3.720076e-44
#> T>G_CTG 5.623647e-03 1.120463e-83 2.578236e-76
#> T>G_CTT 9.044721e-72 5.631249e-03 2.512596e-02
#> T>G_GTA 8.002118e-77 5.631249e-03 7.430059e-78
#> T>G_GTC 3.720076e-44 3.720076e-44 3.720076e-44
#> T>G_GTG 3.720076e-44 3.720076e-44 3.720076e-44
#> T>G_GTT 7.320200e-69 5.631249e-03 1.047583e-49
#> T>G_TTA 1.124729e-02 4.725050e-80 1.226595e-70
#> T>G_TTC 3.720076e-44 3.720076e-44 3.720076e-44
#> T>G_TTG 3.720076e-44 3.720076e-44 3.720076e-44
#> T>G_TTT 3.720076e-44 3.720076e-44 3.720076e-44
#>
#> Slot "exposures":
#>            TCGA-56-7582-01A-11D-2042-08 TCGA-77-7335-01A-11D-2042-08
#> Signature1                   0.05077393                   0.05080516
#> Signature2                  46.89845213                   0.05080516
#> Signature3                   0.05077393                  57.89838968
#>            TCGA-94-7557-01A-11D-2122-08 TCGA-97-7938-01A-11D-2167-08
#> Signature1                     5.061058                   0.05087258
#> Signature2                    14.057990                 116.89825485
#> Signature3                     9.880952                   0.05087258
#>            TCGA-EE-A3J5-06A-11D-A20D-08 TCGA-ER-A197-06A-32D-A197-08
#> Signature1                 121.89824941                   0.05024105
#> Signature2                   0.05087529                   0.05024105
#> Signature3                   0.05087529                  10.89951790
#>            TCGA-ER-A19O-06A-11D-A197-08
#> Signature1                  50.89842631
#> Signature2                   0.05078684
#> Signature3                   0.05078684
#>
#> Slot "table_name":
#> [1] "SBS96"
#>
#> Slot "algorithm":
#> [1] "LDA"
#>
#> Slot "musica":
#> An object of class "musica"
#> Slot "variants":
#>        chr     start       end ref alt                       sample
#>   1:  chr1  11020563  11020563   C   A TCGA-94-7557-01A-11D-2122-08
#>   2:  chr1  43430030  43430030   G   T TCGA-94-7557-01A-11D-2122-08
#>   3:  chr1  58682403  58682403   A   G TCGA-94-7557-01A-11D-2122-08
#>   4:  chr1 109508295 109508295   C   G TCGA-94-7557-01A-11D-2122-08
#>   5:  chr1 156384826 156384826   A   C TCGA-94-7557-01A-11D-2122-08
#>  ---
#> 907: chr19  54885292  54885292   G   A TCGA-ER-A19O-06A-11D-A197-08
#> 908: chr20  49374328  49374328   A   T TCGA-ER-A19O-06A-11D-A197-08
#> 909:  chrX  73213768  73213768   T   G TCGA-ER-A19O-06A-11D-A197-08
#> 910:  chrX 101292834 101292834   G   A TCGA-ER-A19O-06A-11D-A197-08
#> 911:  chrX 107526690 107526690   C   T TCGA-ER-A19O-06A-11D-A197-08
#>      Variant_Type
#>   1:          SBS
#>   2:          SBS
#>   3:          SBS
#>   4:          SBS
#>   5:          SBS
#>  ---
#> 907:          SBS
#> 908:          SBS
#> 909:          SBS
#> 910:          SBS
#> 911:          SBS
#>
#> Slot "count_tables":
#> $SBS96
#> Count_Table: SBS96
#> Motifs: 96
#> Samples: 7
#>
#> **Annotations:
#>           motif mutation context
#> C>A_ACA C>A_ACA      C>A     ACA
#> C>A_ACC C>A_ACC      C>A     ACC
#> C>A_ACG C>A_ACG      C>A     ACG
#> C>A_ACT C>A_ACT      C>A     ACT
#> C>A_CCA C>A_CCA      C>A     CCA
#> C>A_CCC C>A_CCC      C>A     CCC
#> 7           ...      ...     ...
#>
#> **Features:
#>   mutation
#> 1  C>A_CCG
#> 2  C>A_GCA
#> 3  T>C_ATG
#> 4  C>G_TGC
#> 5  T>G_AGC
#> 6  C>G_CCT
#> 7      ...
#>
#> **Types:
#> SBS
#>
#> **Color Variable:
#> mutation
#>
#> **Color Mapping:
#> #5ABCEBFF
#> #050708FF
#> #D33C32FF
#> #CBCACBFF
#> #ABCD72FF
#> #E7C9C6FF
#>
#> **Descriptions:
#> Single Base Substitution table with one base upstream and downstream
#>
#>
#> Slot "sample_annotations":
#>                         Samples Tumor_Subtypes
#> 1: TCGA-94-7557-01A-11D-2122-08           Lung
#> 2: TCGA-56-7582-01A-11D-2042-08           Lung
#> 3: TCGA-77-7335-01A-11D-2042-08           Lung
#> 4: TCGA-97-7938-01A-11D-2167-08           Lung
#> 5: TCGA-EE-A3J5-06A-11D-A20D-08         Breast
#> 6: TCGA-ER-A197-06A-32D-A197-08         Breast
#> 7: TCGA-ER-A19O-06A-11D-A197-08         Breast
#>
#>
#> Slot "umap":
#> <0 x 0 matrix>
#>
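Because exposures are estimated counts, samples with different mutation burdens are easier to compare after converting each column to proportions. The following is a minimal base R sketch, assuming the exposure values for the first three samples are copied by hand from the output above into a plain matrix; in practice they would be taken from the "exposures" slot of the returned object. The variable names `proportions` and `dominant` are illustrative only.

```r
# Exposure values copied from the example output above
# (rows: signatures, columns: the first three samples printed).
exposures <- matrix(
  c(0.05077393, 46.89845213, 0.05077393,
    0.05080516, 0.05080516, 57.89838968,
    5.061058, 14.057990, 9.880952),
  nrow = 3,
  dimnames = list(
    c("Signature1", "Signature2", "Signature3"),
    c("TCGA-56-7582-01A-11D-2042-08",
      "TCGA-77-7335-01A-11D-2042-08",
      "TCGA-94-7557-01A-11D-2122-08")
  )
)

# Divide each column by its total to get per-sample signature proportions.
proportions <- prop.table(exposures, margin = 2)
round(proportions, 3)

# Name of the signature with the largest contribution in each sample.
dominant <- rownames(proportions)[apply(proportions, 2, which.max)]
dominant
```

With `nstart = 1` and only seven samples, as in the example, these proportions are best read as an illustration of the output format rather than as stable biological estimates.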