Author Archives: admin

Celda

“Celda” stands for “CEllular Latent Dirichlet Allocation”. It is a suite of Bayesian hierarchical models and supporting functions to perform bi-clustering of features and cells for single-cell genomic data. This algorithm is an extension of the Latent Dirichlet Allocation (LDA) topic modeling framework that has been popular in text mining applications.

DecontX

DecontX is a Bayesian hierarchical model to estimate and remove cross-contamination from ambient RNA in single-cell genomic data. DecontX takes the count matrix, estimates the contamination level within each individual cell, and delivers a decontaminated count matrix for downstream analysis.

Scruff

Scruff (Single Cell RNA-Seq UMI Filtering Facilitator) is a package for processing single-cell RNA-seq (scRNA-seq) FASTQ reads generated by the CEL-Seq and CEL-Seq2 protocols. It demultiplexes scRNA-seq FASTQ files, aligns reads to a reference genome using Rsubread, and generates a UMI-filtered count matrix.
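To make the idea of a UMI-filtered count matrix concrete, here is a minimal Python sketch of the general technique: reads that share the same cell barcode, gene, and UMI are treated as PCR duplicates and counted once. This is not Scruff's implementation (Scruff is an R package), and the function name `umi_count_matrix`, the input tuple layout, and the toy barcodes are all hypothetical, chosen only for illustration.

```python
from collections import defaultdict


def umi_count_matrix(reads):
    """Collapse aligned reads into UMI-filtered counts (illustrative only).

    reads: iterable of (cell_barcode, umi, gene) tuples, one per aligned read.
    Returns a nested dict {gene: {cell_barcode: number_of_unique_umis}}.
    """
    # Collect the set of distinct UMIs observed for each (gene, cell) pair.
    umis_seen = defaultdict(set)
    for cell_barcode, umi, gene in reads:
        umis_seen[(gene, cell_barcode)].add(umi)

    # The count for a gene in a cell is the number of distinct UMIs.
    counts = defaultdict(dict)
    for (gene, cell_barcode), umis in umis_seen.items():
        counts[gene][cell_barcode] = len(umis)
    return dict(counts)


# Hypothetical toy input: five aligned reads from two cells.
reads = [
    ("AAAC", "TTGCA", "GeneA"),
    ("AAAC", "TTGCA", "GeneA"),  # same cell, gene, and UMI: a duplicate
    ("AAAC", "CCGTA", "GeneA"),
    ("GGTT", "TTGCA", "GeneA"),  # same UMI but a different cell: counted
    ("GGTT", "AACGT", "GeneB"),
]
matrix = umi_count_matrix(reads)
```

In this toy input, the two identical reads from cell `AAAC` collapse into a single count, so `GeneA` has two unique UMIs in that cell. Real pipelines typically also handle sequencing errors in UMIs and barcodes, which this sketch ignores.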

Musicatk

Musicatk (Mutational Signature Comprehensive Analysis Toolkit) is a comprehensive toolkit for analysis of mutational signatures. It has utilities for extracting variants from a variety of file formats, contains multiple methods for discovery of novel signatures or prediction of known signatures, as well as many types of downstream visualizations for exploratory analysis. This package has the ……

ExperimentSubset

ExperimentSubset is an R package to manage subsets of Bioconductor Experiment objects. The ExperimentSubset package enables users to flexibly subset single-cell data that come from the same experiment and to store these subsets back into the same object. In general, it offers the same interface to the users as the ……

Computational Biology and Bioinformatics

High-throughput genomic technologies, including DNA and RNA sequencing, are evolving rapidly. Novel types of complex data are being generated quickly and require novel methods for quality control and analysis. We are currently focused on developing and/or applying methods for identifying genomic alterations in cancer, quantifying the mutagenic effect of carcinogens, and characterizing ……

Identifying Early Drivers of Lung Cancer

Lung adenocarcinomas and lung squamous cell carcinomas are the most common types of lung cancer and remain major causes of death worldwide despite advances in smoking cessation, early detection, and targeted and immunological therapies. Many patients have lung cancers that do not harbor a known activating mutation and therefore cannot be given targeted therapies. In ……

Therapeutic Development and Pathogenesis of COPD

Chronic Obstructive Pulmonary Disease (COPD) is the 4th leading cause of death in the world. Our understanding of the molecular mechanisms responsible for the initiation and progression of this disease is limited. By examining expression differences between individuals with and without COPD or differences within a person along a gradient of disease, we hope to ……

ExperimentSubset accepted for publication in Bioinformatics

ExperimentSubset is an R package to manage subsets of Bioconductor Experiment objects during an analysis workflow. The package offers features such as efficient memory management and provenance tracking while keeping data redundancy to a minimum. This work was led by lab member Irzam Sarfraz and was recently accepted for publication in Oxford Bioinformatics. The package is ……

Unique vulnerabilities in tumors that have undergone whole-genome doubling

A study led by Neil Ganem and his graduate student Ryan Quinton was published in Nature. Genomic characteristics of tumors that have undergone whole-genome doubling (WGD) were explored and a novel vulnerability was identified in the gene KIF18A using the Broad’s Dependency Map data. KIF18A encodes a mitotic kinesin protein and was required for viability of tumors with WGD. Yusuke Koga and Josh Campbell contributed to the study by aiding ……