Question

TCGA data analysis


lily ▴ 20

@lily-11438

Last seen 3.9 years ago

India

I have RSEM-normalized, log2-transformed data downloaded from Firehose, and I found that a number of values are missing and filled in as NA. However, when I checked the raw counts for the same datasets, the corresponding values were 0. So, for downstream analysis, can I convert all the NAs to 0? Please guide me.

RSEM • 1.8k views

ADD COMMENT • link 4.0 years ago lily ▴ 20

score 0 · Answer 1 · 2021-04-08


Kevin Blighe ★ 4.0k

@kevin

Last seen 18 days ago

Republic of Ireland

I would check the accompanying notes to see exactly what post-processing has been performed on these by the Broad Institute. It would seem likely, based on the information that you provide, that they decided to convert values of 0 to NA to avoid producing a 'negative infinity' (log2(0) == -Inf).
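One way to test this hypothesis is to check whether every NA in the log2 data lines up with a 0 in the raw counts. The following is a minimal sketch in plain Python, using made-up values for a single gene; `log2_or_na` and the sample numbers are hypothetical and only illustrate the suspected post-processing, not what the Broad Institute actually ran.

```python
import math

def log2_or_na(value):
    """Return log2(value), or None (standing in for NA) when value is 0."""
    return None if value == 0 else math.log2(value)

# Hypothetical normalized expression values for one gene across four samples.
normalized = [8.0, 0.0, 2.0, 0.0]
logged = [log2_or_na(v) for v in normalized]
print(logged)  # [3.0, None, 1.0, None]

# Hypothetical raw counts for the same gene and the same samples.
raw_counts = [120, 0, 15, 0]

# True only if each NA corresponds to a raw count of 0, and vice versa.
na_matches_zero = all(
    (lv is None) == (rc == 0) for lv, rc in zip(logged, raw_counts)
)
print(na_matches_zero)  # True
```

If the check fails for some genes, the NAs have a different origin and should not be treated as zeros without further investigation.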

However, if you have raw counts already, then why not use those? These can easily be used with edgeR or DESeq2.

TCGA raw HTSeq counts are also held at UCSC's Xena Browser.

Kevin

ADD COMMENT • link 4.0 years ago Kevin Blighe ★ 4.0k


Thank you for the response. I took the normalised data so that I could proceed directly with feature selection and machine learning. The problem is that there are so many missing values that I am not able to discriminate the two classes with good accuracy, sensitivity and specificity. I also tried mean imputation, and with that I got very high accuracy. So please advise: should I take the raw counts data and perform the pre-processing steps myself?

ADD REPLY • link 4.0 years ago lily ▴ 20


So please advise: should I take the raw counts data and perform the pre-processing steps myself?

You could try it, if you have time, and then come back with the answer if possible. There will likely be a difference between using the RSEM values and the values produced via a standard edgeR or DESeq2 normalisation + transformation.
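To illustrate why starting from raw counts sidesteps the NA problem, here is a rough sketch of a log2 counts-per-million transform with a pseudocount. This is a simplified stand-in written in plain Python, not the edgeR or DESeq2 procedure; the function name `log2_cpm`, the pseudocount of 1, and the sample values are all assumptions made for illustration.

```python
import math

def log2_cpm(counts, pseudocount=1.0):
    """Scale one sample's raw counts to counts per million, then take
    log2(cpm + pseudocount) so that zero counts stay finite."""
    library_size = sum(counts)
    return [
        math.log2(c / library_size * 1_000_000 + pseudocount)
        for c in counts
    ]

# Hypothetical raw counts for three genes in a single sample.
sample = [0, 250_000, 750_000]
transformed = log2_cpm(sample)

# A raw count of 0 maps to log2(0 + 1), which is 0.0 rather than NA or -Inf.
print(transformed[0])  # 0.0
```

With a pseudocount, a zero count becomes a defined value on the log scale, so no imputation is needed for those entries. The real packages use more careful normalisation across samples, so their output will differ from this sketch.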