Mutational signatures and exposures will be discovered using
methods such as Latent Dirichlet Allocation (LDA) or Non-Negative
Matrix Factorization (NMF). These algorithms will deconvolute a matrix of
counts for mutation types in each sample into two matrices: 1) a "signature"
matrix containing the probability of each mutation type in each signature and
2) an "exposure" matrix containing the estimated counts for each signature
in each sample. Before mutational signature discovery can be performed,
variants from samples first need to be stored in a musica object using the
create_musica function, and mutation count tables need to be created using
functions such as build_standard_table.
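The shape of this decomposition can be illustrated with a small base R sketch. It does not call any package function; the matrix sizes, the names `type1` to `type4` and `sample1` to `sample3`, and every number are made up for illustration only.

```r
# Toy "signature" matrix: 4 mutation types (rows) x 2 signatures (columns).
# Each column is a probability distribution over mutation types.
signatures <- matrix(
  c(0.7, 0.1,
    0.1, 0.2,
    0.1, 0.3,
    0.1, 0.4),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("type", 1:4), c("Signature1", "Signature2"))
)

# Toy "exposure" matrix: 2 signatures (rows) x 3 samples (columns).
# Each entry is the estimated number of mutations a signature contributes.
exposures <- matrix(
  c(50,  5, 20,
    10, 40, 20),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("Signature1", "Signature2"), paste0("sample", 1:3))
)

# Every signature column should sum to 1.
stopifnot(all(abs(colSums(signatures) - 1) < 1e-12))

# The product approximates the mutation count table (types x samples).
expected_counts <- signatures %*% exposures
print(expected_counts)
```

Signature discovery works in the opposite direction: it starts from the observed count table and estimates both factors.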
discover_signatures(musica, table_name, num_signatures, algorithm = "lda", seed = 1, nstart = 10, par_cores = 1)
musica | A |
---|---|
table_name | Name of the table to use for signature discovery. Needs to be the same name supplied to the table building functions such as build_standard_table. |
num_signatures | Number of signatures to discover. |
algorithm | Method to use for mutational signature discovery. One of |
seed | Seed to be used for the random number generators in the signature discovery algorithms. Default |
nstart | Number of independent random starts used in the mutational signature algorithms. Default |
par_cores | Number of parallel cores to use. Only used if |
Returns a musica_result object containing signatures and exposures.
#>
#> Warning: Overwriting counts table: SBS96
discover_signatures(musica = musica, table_name = "SBS96", num_signatures = 3, algorithm = "lda", seed = 12345, nstart = 1)
#> An object of class "musica_result"
#> Slot "signatures":
#>           Signature1   Signature2   Signature3
#> C>A_ACA 3.214412e-22 4.257622e-02 4.320778e-02
#> C>A_ACC 7.068686e-72 3.378749e-02 7.273860e-57
#> C>A_ACG 1.696120e-76 5.631249e-03 3.801190e-79
#> C>A_ACT 1.439528e-70 1.689375e-02 3.139490e-54
#> C>A_CCA 2.443956e-69 2.815624e-02 1.256298e-02
#> C>A_CCC 1.554471e-69 2.815624e-02 2.512596e-02
#> C>A_CCG 6.265378e-03 1.622837e-02 5.081727e-05
#> C>A_CCT 9.750270e-19 3.378747e-02 3.962780e-08
#> C>A_GCA 6.265689e-03 1.623538e-02 3.449016e-05
#> C>A_GCC 5.661380e-72 1.126250e-02 1.256298e-02
#> C>A_GCG 1.441197e-67 1.126250e-02 4.896510e-48
#> C>A_GCT 4.251600e-71 3.941874e-02 8.310749e-55
#> C>A_TCA 1.079587e-75 2.815624e-02 2.030294e-77
#> C>A_TCC 6.227899e-72 1.689375e-02 2.512596e-02
#> C>A_TCG 2.917076e-72 2.252499e-02 2.383283e-56
#> C>A_TCT 1.040964e-63 3.378749e-02 1.256298e-02
#> C>G_ACA 8.982685e-72 1.689375e-02 4.340776e-55
#> C>G_ACC 3.332204e-14 1.023482e-02 2.292690e-03
#> C>G_ACG 3.720076e-44 3.720076e-44 3.720076e-44
#> C>G_ACT 4.790181e-16 1.602187e-02 2.707106e-02
#> C>G_CCA 5.966718e-72 5.631249e-03 1.256298e-02
#> C>G_CCC 9.067051e-17 2.252476e-02 5.134283e-07
#> C>G_CCG 5.623647e-03 5.631249e-03 2.985272e-45
#> C>G_CCT 2.341534e-13 2.052206e-02 4.468438e-03
#> C>G_GCA 3.673864e-72 2.815624e-02 4.208045e-57
#> C>G_GCC 5.477960e-80 1.236855e-78 1.256298e-02
#> C>G_GCG 3.720076e-44 3.720076e-44 3.720076e-44
#> C>G_GCT 1.298768e-79 3.551297e-78 5.025193e-02
#> C>G_TCA 3.448376e-76 1.126250e-02 8.959530e-78
#> C>G_TCC 8.532596e-21 1.315044e-02 3.347705e-02
#> C>G_TCG 3.720076e-44 3.720076e-44 3.720076e-44
#> C>G_TCT 1.559492e-17 7.339826e-03 2.131423e-02
#> C>T_ACA 9.283284e-02 1.860131e-02 1.493970e-02
#> C>T_ACC 1.687094e-02 4.660678e-76 1.256298e-02
#> C>T_ACG 1.687094e-02 5.605620e-76 2.512596e-02
#> C>T_ACT 6.951015e-02 1.850334e-02 2.957119e-02
#> C>T_CCA 7.873106e-02 1.689375e-02 6.554712e-69
#> C>T_CCC 6.318646e-02 3.604767e-02 4.224660e-02
#> C>T_CCG 1.124729e-02 5.631814e-84 3.067592e-77
#> C>T_CCT 1.007431e-01 2.657566e-02 1.716718e-02
#> C>T_GCA 6.186012e-02 2.252499e-02 2.512596e-02
#> C>T_GCC 5.789264e-02 1.231471e-02 4.420470e-02
#> C>T_GCG 5.623647e-03 2.469049e-75 1.256298e-02
#> C>T_GCT 4.210611e-02 1.396361e-02 3.810356e-02
#> C>T_TCA 4.634952e-02 3.704481e-02 1.482014e-02
#> C>T_TCC 1.008866e-01 3.337747e-02 5.192405e-02
#> C>T_TCG 1.124729e-02 5.346482e-78 4.813211e-68
#> C>T_TCT 6.186012e-02 1.689375e-02 1.256298e-02
#> T>A_ATA 4.222375e-70 1.126250e-02 1.127285e-52
#> T>A_ATC 3.720076e-44 3.720076e-44 3.720076e-44
#> T>A_ATG 1.244484e-69 1.126250e-02 1.799990e-51
#> T>A_ATT 5.623647e-03 5.631249e-03 1.256298e-02
#> T>A_CTA 4.540720e-68 1.126250e-02 1.559780e-48
#> T>A_CTC 2.329443e-68 1.126250e-02 1.256298e-02
#> T>A_CTG 1.583880e-69 1.126250e-02 2.512596e-02
#> T>A_CTT 5.623647e-03 5.631249e-03 1.256298e-02
#> T>A_GTA 6.742502e-71 2.212938e-67 2.512596e-02
#> T>A_GTC 3.720076e-44 3.720076e-44 3.720076e-44
#> T>A_GTG 5.623647e-03 5.631249e-03 8.389862e-71
#> T>A_GTT 3.720076e-44 3.720076e-44 3.720076e-44
#> T>A_TTA 3.534605e-76 1.689375e-02 1.112954e-76
#> T>A_TTC 5.623647e-03 5.806495e-76 1.256298e-02
#> T>A_TTG 5.623647e-03 1.485272e-75 1.256298e-02
#> T>A_TTT 1.687094e-02 1.689375e-02 2.512596e-02
#> T>C_ATA 1.100756e-13 6.228549e-08 1.256284e-02
#> T>C_ATC 2.279139e-68 1.126250e-02 1.256298e-02
#> T>C_ATG 1.296000e-02 9.303956e-03 5.432740e-04
#> T>C_ATT 5.837458e-03 2.529910e-02 3.102243e-02
#> T>C_CTA 4.866067e-76 5.631249e-03 6.309342e-77
#> T>C_CTC 1.124729e-02 5.631249e-03 1.439249e-71
#> T>C_CTG 6.659905e-66 5.631249e-03 2.512596e-02
#> T>C_CTT 1.230529e-02 2.999234e-12 2.276246e-02
#> T>C_GTA 3.619894e-71 2.772732e-66 2.512596e-02
#> T>C_GTC 3.720076e-44 3.720076e-44 3.720076e-44
#> T>C_GTG 3.720076e-44 3.720076e-44 3.720076e-44
#> T>C_GTT 5.623647e-03 7.333095e-76 1.256298e-02
#> T>C_TTA 1.687094e-02 5.631249e-03 4.987590e-72
#> T>C_TTC 6.074431e-03 7.222243e-03 2.056952e-02
#> T>C_TTG 1.979686e-68 5.631249e-03 6.494058e-49
#> T>C_TTT 5.623647e-03 8.877562e-76 1.256298e-02
#> T>G_ATA 3.720076e-44 3.720076e-44 3.720076e-44
#> T>G_ATC 5.623647e-03 3.239622e-73 1.256298e-02
#> T>G_ATG 3.720076e-44 3.720076e-44 3.720076e-44
#> T>G_ATT 3.720076e-44 3.720076e-44 3.720076e-44
#> T>G_CTA 2.440947e-78 1.427290e-77 1.256298e-02
#> T>G_CTC 3.720076e-44 3.720076e-44 3.720076e-44
#> T>G_CTG 5.623647e-03 1.120463e-83 2.578236e-76
#> T>G_CTT 9.044721e-72 5.631249e-03 2.512596e-02
#> T>G_GTA 8.002118e-77 5.631249e-03 7.430059e-78
#> T>G_GTC 3.720076e-44 3.720076e-44 3.720076e-44
#> T>G_GTG 3.720076e-44 3.720076e-44 3.720076e-44
#> T>G_GTT 7.320200e-69 5.631249e-03 1.047583e-49
#> T>G_TTA 1.124729e-02 4.725050e-80 1.226595e-70
#> T>G_TTC 3.720076e-44 3.720076e-44 3.720076e-44
#> T>G_TTG 3.720076e-44 3.720076e-44 3.720076e-44
#> T>G_TTT 3.720076e-44 3.720076e-44 3.720076e-44
#>
#> Slot "exposures":
#>            TCGA-56-7582-01A-11D-2042-08 TCGA-77-7335-01A-11D-2042-08
#> Signature1                   0.05077393                   0.05080516
#> Signature2                  46.89845213                   0.05080516
#> Signature3                   0.05077393                  57.89838968
#>            TCGA-94-7557-01A-11D-2122-08 TCGA-97-7938-01A-11D-2167-08
#> Signature1                     5.061058                   0.05087258
#> Signature2                    14.057990                 116.89825485
#> Signature3                     9.880952                   0.05087258
#>            TCGA-EE-A3J5-06A-11D-A20D-08 TCGA-ER-A197-06A-32D-A197-08
#> Signature1                 121.89824941                   0.05024105
#> Signature2                   0.05087529                   0.05024105
#> Signature3                   0.05087529                  10.89951790
#>            TCGA-ER-A19O-06A-11D-A197-08
#> Signature1                  50.89842631
#> Signature2                   0.05078684
#> Signature3                   0.05078684
#>
#> Slot "table_name":
#> [1] "SBS96"
#>
#> Slot "algorithm":
#> [1] "LDA"
#>
#> Slot "musica":
#> An object of class "musica"
#> Slot "variants":
#>        chr     start       end ref alt                       sample
#>   1:  chr1  11020563  11020563   C   A TCGA-94-7557-01A-11D-2122-08
#>   2:  chr1  43430030  43430030   G   T TCGA-94-7557-01A-11D-2122-08
#>   3:  chr1  58682403  58682403   A   G TCGA-94-7557-01A-11D-2122-08
#>   4:  chr1 109508295 109508295   C   G TCGA-94-7557-01A-11D-2122-08
#>   5:  chr1 156384826 156384826   A   C TCGA-94-7557-01A-11D-2122-08
#>  ---
#> 907: chr19  54885292  54885292   G   A TCGA-ER-A19O-06A-11D-A197-08
#> 908: chr20  49374328  49374328   A   T TCGA-ER-A19O-06A-11D-A197-08
#> 909:  chrX  73213768  73213768   T   G TCGA-ER-A19O-06A-11D-A197-08
#> 910:  chrX 101292834 101292834   G   A TCGA-ER-A19O-06A-11D-A197-08
#> 911:  chrX 107526690 107526690   C   T TCGA-ER-A19O-06A-11D-A197-08
#>      Variant_Type
#>   1:          SBS
#>   2:          SBS
#>   3:          SBS
#>   4:          SBS
#>   5:          SBS
#>  ---
#> 907:          SBS
#> 908:          SBS
#> 909:          SBS
#> 910:          SBS
#> 911:          SBS
#>
#> Slot "count_tables":
#> $SBS96
#> Count_Table: SBS96
#> Motifs: 96
#> Samples: 7
#>
#> **Annotations:
#>           motif mutation context
#> C>A_ACA C>A_ACA      C>A     ACA
#> C>A_ACC C>A_ACC      C>A     ACC
#> C>A_ACG C>A_ACG      C>A     ACG
#> C>A_ACT C>A_ACT      C>A     ACT
#> C>A_CCA C>A_CCA      C>A     CCA
#> C>A_CCC C>A_CCC      C>A     CCC
#> 7           ...      ...     ...
#>
#> **Features:
#>   mutation
#> 1  C>A_CCG
#> 2  C>A_GCA
#> 3  T>C_ATG
#> 4  C>G_TGC
#> 5  T>G_AGC
#> 6  C>G_CCT
#> 7      ...
#>
#> **Types:
#> SBS
#>
#> **Color Variable:
#> mutation
#>
#> **Color Mapping:
#> #5ABCEBFF
#> #050708FF
#> #D33C32FF
#> #CBCACBFF
#> #ABCD72FF
#> #E7C9C6FF
#>
#> **Descriptions:
#> Single Base Substitution table with one base upstream and downstream
#>
#>
#> Slot "sample_annotations":
#>                         Samples Tumor_Subtypes
#> 1: TCGA-94-7557-01A-11D-2122-08           Lung
#> 2: TCGA-56-7582-01A-11D-2042-08           Lung
#> 3: TCGA-77-7335-01A-11D-2042-08           Lung
#> 4: TCGA-97-7938-01A-11D-2167-08           Lung
#> 5: TCGA-EE-A3J5-06A-11D-A20D-08         Breast
#> 6: TCGA-ER-A197-06A-32D-A197-08         Breast
#> 7: TCGA-ER-A19O-06A-11D-A197-08         Breast
#>
#>
#> Slot "umap":
#> <0 x 0 matrix>
#>
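One way to read the "exposures" slot printed above is to convert the estimated counts into per-sample proportions. The following base R sketch retypes the values for the first two samples by hand instead of extracting them from the result object; the variable names `sample_totals`, `proportions`, and `dominant` are illustrative and not part of the package.

```r
# Exposure values copied from the printed output for the first two samples
# (rows are signatures, columns are samples).
exposures <- matrix(
  c(0.05077393, 46.89845213,  0.05077393,
    0.05080516,  0.05080516, 57.89838968),
  nrow = 3,
  dimnames = list(
    paste0("Signature", 1:3),
    c("TCGA-56-7582-01A-11D-2042-08", "TCGA-77-7335-01A-11D-2042-08")
  )
)

# Total estimated counts per sample.
sample_totals <- colSums(exposures)

# Divide each column by its total to get the share of each signature.
proportions <- sweep(exposures, 2, sample_totals, "/")

# Signature with the largest share in each sample.
dominant <- rownames(proportions)[apply(proportions, 2, which.max)]

print(round(proportions, 4))
print(dominant)
```

Proportions make samples with very different mutation counts comparable, at the cost of hiding how many mutations support each estimate.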