Hi,
I'm currently using TCGA data in my project. I'm trying to set up a pipeline to analyse these data, but I have a question about a statistical issue that I'm not sure about.
Before downloading the TCGA data, I checked on the TCGA website that, for the available data (hg19 genome; I'm using level 3 data), the gene counts are estimated by RSEM:
- TCGA (1) : https://wiki.nci.nih.gov/display/tcga/rnaseq+version+2
- TCGA (2): https://wiki.nci.nih.gov/display/TCGA/Data+Levels+and+Data+Types
- RSEM: https://bmcbioinformatics.biomedcentral.com/track/pdf/10.1186/1471-2105-12-323?site=bmcbioinformatics.biomedcentral.com
I would like to know whether, given that these data were preprocessed with RSEM, I can use this count table as input to DESeq2, and whether doing so could introduce any statistical inconsistency.
* I have already checked these posts:
But I am still confused.
Thank you all.
---------------------------------------------------------------------------------------------------------------------------
To download the data, I'm using TCGAbiolinks as follows:
# Install (if missing) and load the required Bioconductor packages
if (!require("TCGAbiolinks")) {
  source("https://bioconductor.org/biocLite.R")
  biocLite("TCGAbiolinks")
  library("TCGAbiolinks")
}
if (!require("SummarizedExperiment")) {
  source("https://bioconductor.org/biocLite.R")
  biocLite("SummarizedExperiment")
  library("SummarizedExperiment")
}

# TCGA project to download
i = "TCGA-LUSC"

# Downloading data (legacy archive, hg19, RSEM gene-level results)
query.exp.proj.gene = GDCquery(project = i,
                               legacy = TRUE,
                               data.category = "Gene expression",
                               data.type = "Gene expression quantification",
                               platform = "Illumina HiSeq",
                               file.type = "results")
GDCdownload(query.exp.proj.gene, directory = '~/GDCdata/')

# Make sure the output folder exists before switching to it;
# GDCprepare() writes the .rda file into the working directory
dir.create('~/GDCdata/RDAFiles', showWarnings = FALSE, recursive = TRUE)
setwd('~/GDCdata/RDAFiles')
exp.proj.mrna = GDCprepare(query = query.exp.proj.gene,
                           save = TRUE,
                           save.filename = paste0(i, "-mRNA.rda"),
                           directory = '~/GDCdata')

# Loading RDA file (this restores the saved object under the name `data`)
load(file = paste0('~/GDCdata/RDAFiles/', i, '-mRNA.rda'))

# Count table (genes x samples)
exp.matrix = assay(data)
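
To make the question more concrete, this is roughly what I had in mind for the DESeq2 step. It is only a sketch: it assumes that rounding the RSEM expected counts to integers is acceptable, which is exactly the point I am unsure about. The toy matrix, the sample names, and the `condition` column are made up for illustration; with the real data the matrix would be `exp.matrix` and the sample table would come from the downloaded object.

```r
# Toy stand-in for the RSEM expected-count matrix (genes x samples).
# Values and names are hypothetical, for illustration only.
toy.counts = matrix(c(10.4, 0, 250.67, 3.5,
                      12.0, 1.2, 198.3, 0),
                    nrow = 4,
                    dimnames = list(paste0("gene", 1:4),
                                    c("sampleA", "sampleB")))

# RSEM expected counts are usually fractional, so check before rounding
has.fractions = any(toy.counts != round(toy.counts))

# Round to the nearest integer and store as integers
# (assumption: rounding is an acceptable way to feed these values to DESeq2)
rounded.counts = round(toy.counts)
storage.mode(rounded.counts) = "integer"

# Hypothetical sample table with one row per column of the count matrix
sample.info = data.frame(condition = factor(c("tumor", "normal")),
                         row.names = colnames(rounded.counts))

# Build the DESeq2 object only if the package is installed
if (requireNamespace("DESeq2", quietly = TRUE)) {
  dds = DESeq2::DESeqDataSetFromMatrix(countData = rounded.counts,
                                       colData = sample.info,
                                       design = ~ condition)
}
```

With the real table I would replace `toy.counts` by `exp.matrix` and then run the usual DESeq2 workflow on `dds`, if this approach is statistically sound.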