I have RSEM-normalized, log2-transformed data downloaded from Firehose, and I found that a number of values are missing and filled in as NA. However, when I checked the raw counts for the same datasets, those values were given as 0. So, for downstream analysis, can I convert all the NAs to 0? Please guide me.
I would check the accompanying notes to see exactly what post-processing has been performed on these by the Broad Institute. It would seem likely, based on the information that you provide, that they decided to convert values of 0 to NA to avoid producing a 'negative infinity' (log2(0) == -Inf).
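To illustrate why a zero might end up as missing after a log2 transform, here is a minimal Python sketch. The values are made up, and whether Firehose applied a plain log2(x) or something like log2(x + 1) is an assumption you would need to confirm from the accompanying notes.

```python
import math

def log2_or_missing(value):
    """Return log2(value), or None where the logarithm is undefined (value <= 0).

    R reports log2(0) as -Inf; Python's math.log2(0) raises ValueError instead,
    so the zero case is handled explicitly here.
    """
    return math.log2(value) if value > 0 else None

def log2_with_pseudocount(value, pseudocount=1.0):
    """Return log2(value + pseudocount), which is defined for value == 0."""
    return math.log2(value + pseudocount)

# Hypothetical normalized expression values for one gene across five samples.
normalized = [0.0, 3.0, 15.0, 0.0, 1023.0]

# Plain log2: zeros have no finite result and come out as missing.
plain = [log2_or_missing(v) for v in normalized]

# log2(x + 1): zeros map to 0.0, so no missing values are produced.
shifted = [log2_with_pseudocount(v) for v in normalized]

print(plain)
print(shifted)
```

Note that on a plain log2(x) scale, a value of 0 corresponds to x = 1, not x = 0. Filling the NAs with 0 only means "no expression" if the data are on a log2(x + 1) scale, which is why it is worth checking how the transform was applied.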
However, if you have raw counts already, then why not use those? These can easily be used with edgeR or DESeq2.
TCGA raw HTSeq counts are also held at UCSC's Xena Browser.
Thank you for the response. I took the normalised data so that I could proceed directly with feature selection and a machine learning approach. The problem is that there are so many missing values that I am not able to discriminate the two classes with good accuracy, sensitivity and specificity. I also tried mean imputation, and with that I got very high accuracy. So please advise: should I take the raw counts data and perform the pre-processing steps?
You could try it, if you have time, and then report back with the result if possible. There will likely be a difference between using the RSEM values and the values produced via a standard edgeR or DESeq2 normalisation + transformation.
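As a rough idea of what starting from raw counts looks like, here is a minimal Python sketch of a log2 counts-per-million transform. It is a simplified stand-in and not what edgeR or DESeq2 actually do (edgeR applies TMM normalisation factors, DESeq2 uses median-of-ratios size factors plus its own transformations). The sample names, counts, and the `prior` parameter are hypothetical.

```python
import math

def log2_cpm(counts, prior=1.0):
    """Convert one sample's raw counts to log2(CPM + prior).

    counts: raw read counts for one sample, one entry per gene.
    prior:  small offset (an assumption here) so zero counts stay finite.
    """
    library_size = sum(counts)
    if library_size <= 0:
        raise ValueError("sample has no reads; cannot compute CPM")
    return [math.log2(c / library_size * 1e6 + prior) for c in counts]

# Hypothetical raw counts: three genes in two samples of different depth.
raw_counts = {
    "sample_A": [0, 10, 90],
    "sample_B": [0, 250, 750],
}

log_cpm = {name: log2_cpm(counts) for name, counts in raw_counts.items()}

for name, values in log_cpm.items():
    print(name, [round(v, 3) for v in values])
```

Because the zero counts are kept and offset before the log, the resulting matrix has no NAs, so nothing needs to be imputed before feature selection.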
Let me try.