Chooses the correct function to extract variants from each input based on the class of the object or the file extension. Different types of objects can be mixed within the list. For example, the list can include VCF files and MAF objects. Certain parameters, such as id and rename, only apply to VCF objects or files and need to be specified individually for each VCF. These parameters should therefore be supplied as a vector of the same length as the number of inputs. If other types of objects are in the input list, the values of id and rename are ignored for those items.
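As a minimal sketch of the "one entry per input" rule described above, the following base R snippet builds a rename vector that lines up with a mixed input list. The file names and the extension check are hypothetical and purely illustrative; they are not taken from the package.

```r
# Hypothetical input paths: four VCF files followed by one MAF file.
inputs <- c("sample_a.vcf", "sample_b.vcf.gz", "sample_c.vcf",
            "sample_d.vcf", "cohort.maf")

# Flag the VCF inputs by file extension (illustrative check only,
# not the package's own dispatch logic).
is_vcf <- grepl("\\.vcf(\\.gz)?$", inputs)

# Start with NA everywhere, then fill in new names at the VCF positions.
# Entries for non-VCF inputs stay NA because they are ignored anyway.
new_name <- rep(NA_character_, length(inputs))
new_name[is_vcf] <- paste0("Sample", seq_len(sum(is_vcf)))

# The vector must be exactly as long as the input list.
stopifnot(length(new_name) == length(inputs))
new_name
```

A vector built this way could then be passed as `rename = new_name` alongside `inputs = inputs`.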

extract_variants(
inputs,
id = NULL,
rename = NULL,
sample_field = NULL,
filename_as_id = FALSE,
strip_extension = c(".vcf", ".vcf.gz", ".gz"),
filter = TRUE,
multiallele = c("expand", "exclude"),
fix_vcf_errors = TRUE,
extra_fields = NULL,
chromosome_col = "chr",
start_col = "start",
end_col = "end",
ref_col = "ref",
alt_col = "alt",
sample_col = "sample",
verbose = TRUE
)

## Arguments

inputs A vector or list of objects or file names. Objects can be CollapsedVCF, ExpandedVCF, MAF, an object that inherits from matrix or data.frame, or character strings that denote the path to a vcf or maf file.

id A character vector the same length as inputs denoting the sample to extract from a vcf. See extract_variants_from_vcf for more details. Only used if the input is a vcf object or file. Default NULL.

rename A character vector the same length as inputs denoting what the sample will be renamed to. See extract_variants_from_vcf for more details. Only used if the input is a vcf object or file. Default NULL.

sample_field Some algorithms will save the name of the sample in the ##SAMPLE portion of the header in the VCF. See extract_variants_from_vcf for more details. Default NULL.

filename_as_id If set to TRUE, the file name will be used as the sample name. See extract_variants_from_vcf_file for more details. Only used if the input is a vcf file. Default FALSE.

strip_extension Only used if filename_as_id is set to TRUE. The file extensions to strip from the file name before it is used as the sample name. See extract_variants_from_vcf_file for more details. Only used if the input is a vcf file. Default c(".vcf", ".vcf.gz", ".gz").

filter Exclude variants that do not have a PASS in the FILTER column of VCF inputs. Default TRUE.

multiallele Multialleles are when multiple alternative variants are listed in the same row in the vcf. See extract_variants_from_vcf for more details. Only used if the input is a vcf object or file. Default "expand".

fix_vcf_errors Attempt to automatically fix VCF file formatting errors. See extract_variants_from_vcf_file for more details. Only used if the input is a vcf file. Default TRUE.

extra_fields Optionally extract additional fields from all input objects. Default NULL.

chromosome_col The name of the column that contains the chromosome reference for each variant. Only used if the input is a matrix or data.frame. Default "chr".

start_col The name of the column that contains the start position for each variant. Only used if the input is a matrix or data.frame. Default "start".

end_col The name of the column that contains the end position for each variant. Only used if the input is a matrix or data.frame. Default "end".

ref_col The name of the column that contains the reference base(s) for each variant. Only used if the input is a matrix or data.frame. Default "ref".

alt_col The name of the column that contains the alternative base(s) for each variant. Only used if the input is a matrix or data.frame. Default "alt".

sample_col The name of the column that contains the sample id for each variant. Only used if the input is a matrix or data.frame. Default "sample".

verbose Show progress of variant extraction. Default TRUE.
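The column-name arguments only matter for matrix or data.frame inputs. The sketch below shows how they might be used when a table does not follow the default column names. The data frame, its column names, and its values are invented for illustration, and the call is only attempted when musicatk is installed; treat it as an assumption-laden example rather than a tested recipe.

```r
# Toy variant table with non-default column names (all values are made up).
variant_df <- data.frame(
  CHROM     = c("chr1", "chr1", "chr2"),
  POS_START = c(1000L, 2500L, 40000L),
  POS_END   = c(1000L, 2500L, 40000L),
  REF       = c("C", "G", "T"),
  ALT       = c("T", "A", "C"),
  SAMPLE_ID = c("S1", "S1", "S2"),
  stringsAsFactors = FALSE
)

# Only attempt the extraction when the package is available.
if (requireNamespace("musicatk", quietly = TRUE)) {
  # Wrap the data.frame in a list so it is treated as a single input,
  # and map each *_col argument to the matching column name.
  variants <- musicatk::extract_variants(
    inputs         = list(variant_df),
    chromosome_col = "CHROM",
    start_col      = "POS_START",
    end_col        = "POS_END",
    ref_col        = "REF",
    alt_col        = "ALT",
    sample_col     = "SAMPLE_ID"
  )
  print(table(variants$sample))
}
```

If the table already uses the names chr, start, end, ref, alt, and sample, the column arguments can be left at their defaults.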

## Value

Returns a data.table of the variants extracted from all inputs.

## Examples

# Get locations of the vcf files and a maf file
luad_vcf_file <- system.file("extdata", "public_LUAD_TCGA-97-7938.vcf",
  package = "musicatk")
lusc_maf_file <- system.file("extdata", "public_TCGA.LUSC.maf",
  package = "musicatk")
melanoma_vcfs <- list.files(system.file("extdata", package = "musicatk"),
  pattern = glob2rx("*SKCM*vcf"), full.names = TRUE)

# Read all files in at once
inputs <- c(luad_vcf_file, melanoma_vcfs, lusc_maf_file)
variants <- extract_variants(inputs = inputs)
#> Extracted 1 out of 5 inputs: /private/var/folders/8g/zr_0d8wd23762jsqlwm5r_6w0000gn/T/RtmpqEjdyu/temp_libpathfbd337016ff7/musicatk/extdata/public_LUAD_TCGA-97-7938.vcf
#> Extracted 2 out of 5 inputs: /private/var/folders/8g/zr_0d8wd23762jsqlwm5r_6w0000gn/T/RtmpqEjdyu/temp_libpathfbd337016ff7/musicatk/extdata/public_SKCM_TCGA-EE-A3J5-06A-11D-A20D-08.vcf
#> Extracted 3 out of 5 inputs: /private/var/folders/8g/zr_0d8wd23762jsqlwm5r_6w0000gn/T/RtmpqEjdyu/temp_libpathfbd337016ff7/musicatk/extdata/public_SKCM_TCGA-ER-A197-06A-32D-A197-08.vcf
#> Extracted 4 out of 5 inputs: /private/var/folders/8g/zr_0d8wd23762jsqlwm5r_6w0000gn/T/RtmpqEjdyu/temp_libpathfbd337016ff7/musicatk/extdata/public_SKCM_TCGA-ER-A19O-06A-11D-A197-08.vcf
#> Extracted 5 out of 5 inputs: /private/var/folders/8g/zr_0d8wd23762jsqlwm5r_6w0000gn/T/RtmpqEjdyu/temp_libpathfbd337016ff7/musicatk/extdata/public_TCGA.LUSC.maf
table(variants$sample)
#>
#> TCGA-97-7938-01A-11D-2167-08 TCGA-EE-A3J5-06A-11D-A20D-08
#>                          121                          123
#> TCGA-ER-A197-06A-32D-A197-08 TCGA-ER-A19O-06A-11D-A197-08
#>                           13                           52
#> TCGA-56-7582-01A-11D-2042-08 TCGA-77-7335-01A-11D-2042-08
#>                          199                          283
#> TCGA-94-7557-01A-11D-2122-08
#>                          120

# Run again but renaming samples in first four vcfs
new_name <- c(paste0("Sample", 1:4), NA)
variants <- extract_variants(inputs = inputs, rename = new_name)
#> Extracted 1 out of 5 inputs: /private/var/folders/8g/zr_0d8wd23762jsqlwm5r_6w0000gn/T/RtmpqEjdyu/temp_libpathfbd337016ff7/musicatk/extdata/public_LUAD_TCGA-97-7938.vcf
#> Extracted 2 out of 5 inputs: /private/var/folders/8g/zr_0d8wd23762jsqlwm5r_6w0000gn/T/RtmpqEjdyu/temp_libpathfbd337016ff7/musicatk/extdata/public_SKCM_TCGA-EE-A3J5-06A-11D-A20D-08.vcf
#> Extracted 3 out of 5 inputs: /private/var/folders/8g/zr_0d8wd23762jsqlwm5r_6w0000gn/T/RtmpqEjdyu/temp_libpathfbd337016ff7/musicatk/extdata/public_SKCM_TCGA-ER-A197-06A-32D-A197-08.vcf
#> Extracted 4 out of 5 inputs: /private/var/folders/8g/zr_0d8wd23762jsqlwm5r_6w0000gn/T/RtmpqEjdyu/temp_libpathfbd337016ff7/musicatk/extdata/public_SKCM_TCGA-ER-A19O-06A-11D-A197-08.vcf
#> Extracted 5 out of 5 inputs: /private/var/folders/8g/zr_0d8wd23762jsqlwm5r_6w0000gn/T/RtmpqEjdyu/temp_libpathfbd337016ff7/musicatk/extdata/public_TCGA.LUSC.maf
table(variants$sample)
#>
#>                      Sample1                      Sample2
#>                          121                          123
#>                      Sample3                      Sample4
#>                           13                           52
#> TCGA-56-7582-01A-11D-2042-08 TCGA-77-7335-01A-11D-2042-08
#>                          199                          283
#> TCGA-94-7557-01A-11D-2122-08
#>                          120