Converts a list of gene sets stored in a GMT file into a GeneSetCollection and stores it in the metadata of the SingleCellExperiment object. These gene sets can be used in downstream quality control and analysis functions in singleCellTK.

importGeneSetsFromGMT(
  inSCE,
  file,
  collectionName = "GeneSetCollection",
  by = "rownames",
  sep = "\t",
  noMatchError = TRUE
)

Arguments

inSCE

Input SingleCellExperiment object.

file

Character. Path to GMT file. See getGmt for more information on reading GMT files.
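
As a rough illustration of the layout this function expects, the sketch below writes a tiny GMT file and parses it with base R only. The set names and gene lists are made up for illustration, and real files are read by getGmt rather than by this code.

```r
# Write a tiny, hypothetical GMT file: one gene set per line, tab-delimited.
# Field 1 = set name, field 2 = description, remaining fields = gene ids.
gmt_path <- tempfile(fileext = ".gmt")
writeLines(
  c("MitoDemo\tfeature_name\tMT-CO1\tMT-ND1\tMT-CYB",
    "RiboDemo\tfeature_name\tRPL3\tRPS6"),
  gmt_path
)

# Split each line on tabs to expose the three parts of a GMT record
fields <- strsplit(readLines(gmt_path), "\t", fixed = TRUE)
set_names <- vapply(fields, function(x) x[1], character(1))
descriptions <- vapply(fields, function(x) x[2], character(1))
gene_sets <- lapply(fields, function(x) x[-(1:2)])
names(gene_sets) <- set_names
```

The description field is the one that by = NULL reads to decide where each gene set should be matched in inSCE.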

collectionName

Character. Name of the collection to add the gene sets to. If this collection already exists in inSCE, the gene sets will be added to it, and any existing gene sets in the collection with the same name will be overwritten. Default "GeneSetCollection".

by

Character, character vector, or NULL. Describes the location within inSCE where the gene identifiers in the GMT file should be mapped. If set to "rownames", the features will be searched for among rownames(inSCE). This can also be set to one of the column names of rowData(inSCE), in which case the gene identifiers will be mapped to that column in the rowData of inSCE. by can also be a vector with the same length as the number of gene sets in the GMT file, and the elements of the vector can point to different locations within inSCE. Finally, by can be NULL. In this case, the location of the gene identifiers in inSCE should be saved in the description (2nd column) of the GMT file. See featureIndex for more information. Default "rownames".
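
The vector form of by can be sketched as follows. This is a minimal example under stated assumptions: the GMT file, the set names "SetA" and "SetB", and the collection name "DemoCollection" are hypothetical, and the identifiers are taken from the example object itself so that every gene set has matches.

```r
library(singleCellTK)
data(scExample)

# Build a hypothetical two-set GMT file from identifiers already in 'sce':
# SetA holds row names, SetB holds values from rowData(sce)$feature_name.
ids_a <- head(rownames(sce), 3)
ids_b <- head(as.character(SummarizedExperiment::rowData(sce)$feature_name), 3)
demo_gmt <- tempfile(fileext = ".gmt")
writeLines(
  c(paste(c("SetA", "rownames", ids_a), collapse = "\t"),
    paste(c("SetB", "feature_name", ids_b), collapse = "\t")),
  demo_gmt
)

# One element of 'by' per gene set, in the order the sets appear in the file
sce <- importGeneSetsFromGMT(
  inSCE = sce,
  file = demo_gmt,
  collectionName = "DemoCollection",
  by = c("rownames", "feature_name")
)
```

Because the description column of this file also names the two locations, passing by = NULL should resolve the same mapping.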

sep

Character. Delimiter of the GMT file. Default "\t".

noMatchError

Boolean. Show an error if a collection does not have any matching features. Default TRUE.

Value

A SingleCellExperiment object with the gene sets from the GMT file stored under collectionName in the metadata slot.

Details

The gene identifiers in gene sets in the GMT file will be mapped to the rownames of inSCE using the by parameter and stored in a GeneSetCollection object from package GSEABase. This object is stored in metadata(inSCE)$sctk$genesets, which can be accessed in downstream analysis functions such as runCellQC.
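
A short sketch of retrieving the stored collection after import is shown below. It assumes that each collection is kept as a named element of metadata(inSCE)$sctk$genesets, keyed by collectionName. That layout is inferred from the description above and is not a documented guarantee.

```r
library(singleCellTK)
data(scExample)

gmt <- system.file("extdata/mito_subset.gmt", package = "singleCellTK")
sce <- importGeneSetsFromGMT(inSCE = sce, file = gmt, by = NULL)

# Assumption: collections are stored as a named list under sctk$genesets,
# with the default collection name "GeneSetCollection"
gsc <- S4Vectors::metadata(sce)$sctk$genesets[["GeneSetCollection"]]

# Names of the imported gene sets and their mapped feature identifiers
names(gsc)
GSEABase::geneIds(gsc)
```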

See also

importGeneSetsFromList for importing from lists, importGeneSetsFromCollection for importing from GeneSetCollection objects, and importGeneSetsFromMSigDB for importing MSigDB gene sets.

Author

Joshua D. Campbell

Examples

data(scExample)

# GMT file containing gene symbols for a subset of human mitochondrial genes
gmt <- system.file("extdata/mito_subset.gmt", package = "singleCellTK")

# The description (second column) of the GMT file is "feature_name", so
# with by = NULL the ids will be mapped using the "feature_name" column
# in the 'rowData' of 'sce'. This could also be accomplished by setting
# by = "feature_name" in the function call.
sce <- importGeneSetsFromGMT(inSCE = sce, file = gmt, by = NULL)