Aaron: Need to describe the difference between the ID and the name in the header, and how rename
relates to them in terms of naming the sample. Need to describe the differences between the
multiallelic choices. Also need to describe the automatic error fixing.
extract_variants_from_vcf(
  vcf,
  id = NULL,
  rename = NULL,
  sample_field = NULL,
  filter = TRUE,
  multiallele = c("expand", "exclude"),
  extra_fields = NULL
)
Arguments
vcf
Location of the VCF file.
id
ID of the sample to select from the VCF. If NULL, then the
first sample will be selected. Default NULL.
rename
Rename the sample to this value when extracting variants.
If NULL, then the sample will be named according to its ID.
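The documented defaults for id and rename can be sketched in base R as below. resolve_sample is a hypothetical helper written for illustration, not the package's implementation, and it deliberately ignores sample_field because how that option interacts with rename is not documented here.

```r
# Hypothetical helper mirroring only the documented defaults of `id` and `rename`.
resolve_sample <- function(sample_ids, id = NULL, rename = NULL) {
  # Documented default: with no id, the first sample in the VCF is used
  if (is.null(id)) id <- sample_ids[1]
  if (!id %in% sample_ids) stop("Sample '", id, "' not found in the VCF")
  # Documented default: with no rename, the sample keeps its ID as its name
  name <- if (is.null(rename)) id else rename
  list(id = id, name = name)
}

resolve_sample(c("NORMAL", "TUMOR"))$name
resolve_sample(c("NORMAL", "TUMOR"), id = "TUMOR", rename = "TCGA-01-0001")$name
```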
sample_field
Some algorithms save the name of the sample in the ##SAMPLE
portion of the VCF header (e.g.
##SAMPLE=<ID=TUMOR,SampleName=TCGA-01-0001>). If the ID is specified via the
id parameter ("TUMOR" in this example), then sample_field can
be used to specify the name of the tag that holds the sample name
("SampleName" in this example). Default NULL.
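To make the ID/tag relationship concrete, the base R sketch below looks up a tag for a given sample ID in raw ##SAMPLE header lines. get_sample_tag is a hypothetical helper, not the package's implementation, and it assumes that values contain no commas or equals signs.

```r
# Hypothetical helper: return the value of `sample_field` from the ##SAMPLE
# header line whose ID matches `id`, or NA if no such line/tag exists.
get_sample_tag <- function(header_lines, id, sample_field) {
  sample_lines <- grep("^##SAMPLE=<", header_lines, value = TRUE)
  for (line in sample_lines) {
    # Strip the leading "##SAMPLE=<" and the trailing ">"
    body <- sub("^##SAMPLE=<(.*)>$", "\\1", line)
    # Split into key=value pairs (assumes no commas or "=" inside values)
    pairs <- strsplit(strsplit(body, ",", fixed = TRUE)[[1]], "=", fixed = TRUE)
    keys <- vapply(pairs, `[`, character(1), 1)
    vals <- vapply(pairs, `[`, character(1), 2)
    tags <- setNames(vals, keys)
    if (identical(unname(tags["ID"]), id) && sample_field %in% keys) {
      return(unname(tags[sample_field]))
    }
  }
  NA_character_
}

header <- c(
  "##fileformat=VCFv4.2",
  "##SAMPLE=<ID=NORMAL,SampleName=TCGA-01-0001-N>",
  "##SAMPLE=<ID=TUMOR,SampleName=TCGA-01-0001>"
)
get_sample_tag(header, id = "TUMOR", sample_field = "SampleName")
# [1] "TCGA-01-0001"
```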
filter
Exclude variants that do not have PASS in the
FILTER column of the VCF. Default TRUE.
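Read literally, filter = TRUE keeps only rows whose FILTER value is exactly PASS, so rows with "." (no filters applied) are also dropped. The toy table below is illustrative data, and treating "." this way is an assumption based on the wording above rather than on the package source.

```r
# Toy variant table (illustrative data). FILTER mimics the VCF FILTER column,
# where "." means no filters were applied.
variants <- data.frame(
  chr = c("chr1", "chr1", "chr2", "chr3"),
  pos = c(100L, 200L, 300L, 400L),
  FILTER = c("PASS", "LowQual", "PASS", "."),
  stringsAsFactors = FALSE
)
filter <- TRUE
if (filter) {
  # %in% treats NA as "not PASS" instead of producing NA rows like `==` would
  variants <- variants[variants$FILTER %in% "PASS", , drop = FALSE]
}
nrow(variants)
# [1] 2
```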
multiallele
Multiallelic variants are sites where multiple alternate alleles
are listed in the same row of the VCF. One of "expand" or
"exclude". If "expand" is selected, then each
alternate allele will be given its own row. If "exclude" is
selected, then these rows will be removed. Default "expand".
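The two multiallele choices can be illustrated on a toy table in base R. This sketch assumes, for illustration only, that ALT alleles are held as a comma-separated string as in a raw VCF line; handle_multiallele is a hypothetical helper and not the package's implementation.

```r
# Toy rows (illustrative data); the second site is multiallelic ("T,G")
vars <- data.frame(
  chr = c("chr1", "chr1"),
  pos = c(100L, 200L),
  ref = c("A", "C"),
  alt = c("G", "T,G"),
  stringsAsFactors = FALSE
)

handle_multiallele <- function(vars, multiallele = c("expand", "exclude")) {
  multiallele <- match.arg(multiallele)
  alts <- strsplit(vars$alt, ",", fixed = TRUE)
  n_alt <- lengths(alts)
  if (multiallele == "exclude") {
    # Drop every row that lists more than one ALT allele
    return(vars[n_alt == 1, , drop = FALSE])
  }
  # "expand": repeat each row once per ALT allele, then give each copy one allele
  out <- vars[rep(seq_len(nrow(vars)), n_alt), , drop = FALSE]
  out$alt <- unlist(alts, use.names = FALSE)
  rownames(out) <- NULL
  out
}

handle_multiallele(vars, "expand")$alt
# [1] "G" "T" "G"
nrow(handle_multiallele(vars, "exclude"))
# [1] 1
```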
extra_fields
Optionally extract additional fields from the INFO
column of the VCF. Default NULL.
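As a rough picture of what pulling extra INFO fields involves, the base R sketch below parses raw INFO strings of the form KEY=VALUE;FLAG. extract_info_fields is a hypothetical helper working on plain text, not the package's implementation; flags without a value are recorded as "TRUE" and missing fields as NA, which are choices made only for this illustration.

```r
# Hypothetical helper: extract named fields from raw VCF INFO strings.
extract_info_fields <- function(info, fields) {
  parsed <- lapply(strsplit(info, ";", fixed = TRUE), function(entries) {
    kv <- strsplit(entries, "=", fixed = TRUE)
    keys <- vapply(kv, `[`, character(1), 1)
    # Flags without "=" (e.g. SOMATIC) are recorded as "TRUE"
    vals <- vapply(kv, function(x) if (length(x) > 1) x[2] else "TRUE", character(1))
    setNames(vals, keys)
  })
  # One character column per requested field; NA where the field is absent
  out <- lapply(fields, function(f) {
    vapply(parsed, function(p) if (f %in% names(p)) unname(p[f]) else NA_character_,
           character(1))
  })
  names(out) <- fields
  as.data.frame(out, stringsAsFactors = FALSE)
}

info <- c("DP=35;AF=0.25;SOMATIC", "DP=12;AF=0.5")
extract_info_fields(info, c("DP", "SOMATIC"))
```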
Value
Returns a data.table of variants extracted from the VCF.
Examples