This dataset focuses on Illumina-based RNA-Sequencing data from The Cancer Genome Atlas (TCGA). TCGA contains clinical and molecular data for 11,000+ tumor samples across many tumor types. This dataset contains gene-expression data for 9,264 of these samples across 24 tissue types.

These data were prepared using a computational pipeline that uses the Rsubread package to align the reads to the human reference genome. The data values are summarized at the gene level. You can read more about the process that was used to generate the data in the papers cited below. Note: These values have been normalized using the transcripts-per-million (TPM) approach. Some differential-expression tools require counts rather than TPM values, so if you want to perform a differential-expression analysis, look for a similarly named dataset with count values.
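
For readers unfamiliar with TPM, the following is a minimal sketch of the standard TPM calculation for a single sample. It is not the pipeline's actual code: the function name `counts_to_tpm`, the example counts, and the gene lengths are all hypothetical, and the exact gene-length definition used to prepare this dataset is not stated here.

```python
def counts_to_tpm(counts, lengths_bp):
    """Convert per-gene read counts for ONE sample to TPM.

    counts     : list of raw read counts, one per gene (hypothetical input)
    lengths_bp : list of gene lengths in base pairs, same order as counts
    """
    if len(counts) != len(lengths_bp):
        raise ValueError("counts and lengths_bp must have the same length")

    # Step 1: divide each count by gene length in kilobases (reads per kilobase).
    rpk = [c / (length / 1000.0) for c, length in zip(counts, lengths_bp)]

    # Step 2: rescale so that the values for the sample sum to one million.
    total_rpk = sum(rpk)
    if total_rpk == 0:
        return [0.0] * len(rpk)
    return [r / total_rpk * 1e6 for r in rpk]


# Illustrative values only; these are not taken from the dataset.
example_counts = [100, 200, 300]
example_lengths = [1000, 2000, 3000]
tpm = counts_to_tpm(example_counts, example_lengths)
```

Because each sample is rescaled to sum to one million, the sample's original library size is discarded, so raw counts cannot be recovered from TPM values alone. This is why count-based differential-expression tools need the separate count dataset.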

Data source(s):

Citation(s):