This dataset focuses on Illumina RNA-sequencing data from The Cancer Genome Atlas (TCGA). TCGA contains clinical and molecular data for more than 11,000 tumor samples across many tumor types. This dataset contains gene-expression data for 9,264 of those samples across 24 tissue types.

These data were prepared with a computational pipeline that uses the Rsubread package to align the reads to the human reference genome. The values are summarized at the gene level. The papers cited below describe the process used to generate the data in more detail. The values are semi-raw counts, which are typically suitable for differential-expression analyses. For other purposes, look for a similarly named dataset that contains TPM (transcripts per million) values.
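TPM values differ from counts in two ways. Each gene's count is first divided by the gene's length, and each sample is then rescaled so that its values sum to one million. The Python sketch below illustrates that conversion. It assumes you already have a genes-by-samples count matrix and a vector of gene lengths in base pairs. The function name `counts_to_tpm` and the gene-length input are hypothetical and are not part of this dataset. Using a single length per gene is a common simplification, so the TPM dataset mentioned above may have been computed differently.

```python
import numpy as np


def counts_to_tpm(counts, gene_lengths_bp):
    """Convert a genes x samples count matrix to TPM (illustrative sketch).

    counts: 2-D array-like, rows = genes, columns = samples.
    gene_lengths_bp: 1-D array-like of gene lengths in base pairs
        (hypothetical input; not necessarily provided with this dataset).
    """
    counts = np.asarray(counts, dtype=float)
    lengths_kb = np.asarray(gene_lengths_bp, dtype=float) / 1_000.0
    if counts.shape[0] != lengths_kb.shape[0]:
        raise ValueError("need exactly one gene length per row of counts")

    # Step 1: reads per kilobase (RPK) normalizes each gene for its length.
    rpk = counts / lengths_kb[:, np.newaxis]

    # Step 2: rescale each sample (column) so its RPK values sum to one million.
    per_sample_total = rpk.sum(axis=0)
    return rpk / per_sample_total * 1e6


if __name__ == "__main__":
    # Toy example: 3 genes x 2 samples with made-up counts and lengths.
    toy_counts = [[10, 20], [30, 40], [60, 0]]
    toy_lengths = [1000, 2000, 500]
    print(counts_to_tpm(toy_counts, toy_lengths))
```

Because each column is rescaled to the same total, TPM values are comparable in relative terms across samples. Differential-expression methods usually expect counts, which is why the count values in this dataset are the better fit for that purpose.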

Data source(s):

Citation(s):