This function creates a musica object from a variant table or matrix. The musica class stores variant information, variant-level annotations, sample-level annotations, and count tables, and is used as input to the mutational signature discovery and prediction algorithms. The input variant table or matrix must have columns for chromosome, start position, end position, reference allele, alternate allele, and sample name. If these columns are named differently from the defaults, they can be mapped using the chromosome_col, start_col, end_col, ref_col, alt_col, and sample_col parameters.

create_musica(
  x,
  genome,
  check_ref_chromosomes = TRUE,
  check_ref_bases = TRUE,
  chromosome_col = "chr",
  start_col = "start",
  end_col = "end",
  ref_col = "ref",
  alt_col = "alt",
  sample_col = "sample",
  extra_fields = NULL,
  standardize_indels = TRUE,
  convert_dbs = TRUE,
  verbose = TRUE
)

Arguments

x

A data.table, matrix, or data.frame that contains columns with the variant information.

genome

A BSgenome object indicating which genome reference the variants and their coordinates were derived from.

check_ref_chromosomes

Whether to perform a check to ensure that the chromosomes in the variant object match the reference chromosomes in the genome object. Mismatches may cause errors in downstream generation of count tables. If mismatches occur, an attempt will be made to fix them automatically with the seqlevelsStyle function. Default TRUE.

check_ref_bases

Whether to check if the reference bases in the variant object match the reference bases in the genome object. Default TRUE.

chromosome_col

The name of the column that contains the chromosome reference for each variant. Default "chr".

start_col

The name of the column that contains the start position for each variant. Default "start".

end_col

The name of the column that contains the end position for each variant. Default "end".

ref_col

The name of the column that contains the reference base(s) for each variant. Default "ref".

alt_col

The name of the column that contains the alternative base(s) for each variant. Default "alt".

sample_col

The name of the column that contains the sample id for each variant. Default "sample".

extra_fields

Which additional fields to extract and include in the musica object. Default NULL.

standardize_indels

Flag to convert indel style (e.g. `C > CAT` becomes `- > AT` and `GCACA > G` becomes `CACA > -`). Default TRUE.

convert_dbs

Flag to convert adjacent SBS into DBS (the original SBS are removed). Default TRUE.

verbose

Whether to print status messages during error checking. Default TRUE.
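
The indel conversion described for standardize_indels can be pictured with a small base R sketch. This is not the musicatk implementation: the helper name strip_anchor_base is hypothetical, and the sketch assumes an indel is any variant whose alleles differ in length and share the same first base. It also ignores the coordinate changes that a full standardization would involve.

```r
# Illustrative only: a hypothetical helper, not part of musicatk.
# Drops the shared leading (anchor) base from indel alleles and writes "-"
# for an allele that becomes empty.
strip_anchor_base <- function(ref, alt) {
  # Assumption: an indel has alleles of different length with the same
  # first base. Everything else is left unchanged.
  is_indel <- nchar(ref) != nchar(alt) &
    substr(ref, 1, 1) == substr(alt, 1, 1)

  # Remove the first base from both alleles of each indel.
  new_ref <- ifelse(is_indel, substring(ref, 2), ref)
  new_alt <- ifelse(is_indel, substring(alt, 2), alt)

  # An allele that is now empty is represented by "-".
  new_ref[new_ref == ""] <- "-"
  new_alt[new_alt == ""] <- "-"

  data.frame(ref = new_ref, alt = new_alt, stringsAsFactors = FALSE)
}

# The two indels from the argument description, plus one SBS that should
# pass through untouched.
res <- strip_anchor_base(
  ref = c("C", "GCACA", "A"),
  alt = c("CAT", "G", "T")
)
print(res)
```

The first two rows reproduce the conversions quoted in the argument description, and the third row shows that a single base substitution is not modified.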

Value

Returns a musica object.

Examples

maf_file <- system.file("extdata", "public_TCGA.LUSC.maf", package = "musicatk")
variants <- extract_variants_from_maf_file(maf_file)
g <- select_genome("38")
musica <- create_musica(x = variants, genome = g)
#> Checking that chromosomes in the 'variant' object match chromosomes in the 'genome' object.
#> Checking that the reference bases in the 'variant' object match the reference bases in the 'genome' object.
#> Standardizing INS/DEL style
#> Converting adjacent SBS into DBS
#> 4 SBS converted to DBS
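
As a further sketch, the column-mapping parameters might be used as follows when the variant table does not use the default column names. The data frame, its column names, and its coordinates are invented for illustration, so check_ref_bases is set to FALSE here. The call is guarded so that the snippet still runs when musicatk is not installed, and it assumes that the hg38 reference used by select_genome("38") is available.

```r
# Hypothetical variant table whose column names differ from the defaults.
# The positions and alleles are made up and do not reflect real variants.
variants <- data.frame(
  Chromosome = c("chr1", "chr1", "chr2"),
  Start_Position = c(1000L, 2000L, 3000L),
  End_Position = c(1000L, 2000L, 3000L),
  Reference_Allele = c("C", "G", "A"),
  Tumor_Seq_Allele2 = c("T", "A", "G"),
  Tumor_Sample_Barcode = c("sample_1", "sample_1", "sample_2"),
  stringsAsFactors = FALSE
)

# Only attempt the call when musicatk is installed.
if (requireNamespace("musicatk", quietly = TRUE)) {
  library(musicatk)
  g <- select_genome("38")
  musica <- create_musica(
    x = variants,
    genome = g,
    # The alleles above are invented, so skip the reference base check.
    check_ref_bases = FALSE,
    chromosome_col = "Chromosome",
    start_col = "Start_Position",
    end_col = "End_Position",
    ref_col = "Reference_Allele",
    alt_col = "Tumor_Seq_Allele2",
    sample_col = "Tumor_Sample_Barcode"
  )
}
```

With real data, check_ref_bases would normally be left at its default of TRUE so that mismatches against the genome are detected.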