Hi guys,
I'm relatively new to analytics work and RStudio. I loaded my TCGA data and ran all of the basic steps, but the log2FoldChange values (and therefore my volcano plot) all seem to be the opposite of what I expected. I understand this could simply be what my data show, but I'm not confident about it, because every significant gene I have checked has a previously reported (though not definitively established) log fold change with the opposite sign to mine. I've included my code below; could you tell me whether I've set up the parameters, or anything else, incorrectly? Much appreciated.
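For reference, a log2 fold change only has a sign relative to the direction of the comparison: swapping numerator and denominator flips the sign but keeps the magnitude. A simplified toy illustration in base R, using made-up group means (DESeq2 estimates the LFC from a fitted model rather than a plain ratio of means, but the sign convention is the same):

```r
# Hypothetical mean normalised counts for one gene in each group
mean_mut <- 200
mean_wt  <- 50

lfc_mut_vs_wt <- log2(mean_mut / mean_wt)  # numerator MUT, denominator WT
lfc_wt_vs_mut <- log2(mean_wt / mean_mut)  # same groups, swapped

print(c(MUT_vs_WT = lfc_mut_vs_wt, WT_vs_MUT = lfc_wt_vs_mut))
```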
```r
library(readr)      # read_csv()
library(readxl)     # read_excel()
library(dplyr)      # left_join(), filter(), select(), as_tibble()
library(TCGAutils)  # UUIDtoBarcode()
library(DESeq2)
library(ggplot2)    # ggtitle()

annotables_grch38 <- read_csv("annotables_grch38.csv")

## File ID to TCGA ID ##
# file_list.csv contains the GDC file UUIDs (one per line, no header)
PatientEnsemblID <- read.csv("file_list.csv", header = FALSE)
colnames(PatientEnsemblID) <- "file_id"
# UUIDtoBarcode() takes a vector of UUIDs, so no loop is needed
barcode <- UUIDtoBarcode(PatientEnsemblID$file_id, from_type = "file_id")
colnames(barcode) <- c("file_id", "ID")

PatientIDSubtypeStatus <- read.csv("colDataTCGA.csv")
TCGA_raw_count_matrix <- read.csv("TCGA_raw_count_matrix.csv")
TCGA_CDR_SupplementalTableS1 <- read_excel("TCGA-CDR-SupplementalTableS1.xlsx")

MergedByFileIDandPatientID <- left_join(barcode, PatientIDSubtypeStatus, by = "ID")
# The first 12 characters of a TCGA barcode identify the patient
MergedByFileIDandPatientID$bcr_patient_barcode <- substr(MergedByFileIDandPatientID$ID, 1, 12)
PatientBarcodes <- MergedByFileIDandPatientID$bcr_patient_barcode

# Renames the sample columns by position: this assumes the columns of
# TCGA_raw_count_matrix.csv are in the same order as file_list.csv
colnames(TCGA_raw_count_matrix) <- c("ID", PatientBarcodes)

# Joining `barcode` a second time only duplicated file_id, so reuse the merged table
BarcodeMerged <- MergedByFileIDandPatientID

capFirst <- function(s) {
  paste0(toupper(substring(s, 1, 1)), substring(s, 2))
}
BarcodeMerged$Subtype <- capFirst(BarcodeMerged$Subtype)

# Basal samples only; keep the patient barcode and TP53 status
FilterForNeed <- dplyr::filter(BarcodeMerged, Subtype == "Basal")
MetaDataFiltered <- dplyr::select(FilterForNeed, bcr_patient_barcode, TP53)

TCGAcorrelatingPatientBarcode <- TCGA_raw_count_matrix[, c("ID", MetaDataFiltered$bcr_patient_barcode)]
dfForMetaDataFiltered <- as.data.frame(MetaDataFiltered)
colnames(dfForMetaDataFiltered) <- c("Patient", "TP53")
# Gene IDs become row names; drop the ID column so only counts remain
rownames(TCGAcorrelatingPatientBarcode) <- TCGAcorrelatingPatientBarcode[, 1]
TCGAcorrelatingPatientBarcode <- TCGAcorrelatingPatientBarcode[, -1]

ddsChanged <- DESeqDataSetFromMatrix(countData = TCGAcorrelatingPatientBarcode,
                                     colData = dfForMetaDataFiltered,
                                     design = ~ TP53)
# WT is the reference level
ddsChanged$TP53 <- factor(ddsChanged$TP53, levels = c("WT", "MUT"))

# Keep genes with at least 1 read in at least 50 samples (subset rows)
dim(ddsChanged)
keep <- rowSums(counts(ddsChanged) >= 1) >= 50
dim(ddsChanged[keep, ])
ddsChanged <- ddsChanged[keep, ]

vsdChanged <- vst(ddsChanged, blind = FALSE)
colnames(vsdChanged)
plotPCA(vsdChanged, intgroup = c("TP53")) +
  ggtitle("Basal TP53 - WT (29) vs MUT (142)")

ddsChanged <- DESeq(ddsChanged)
resultsNames(ddsChanged)
# log2FoldChange = log2(MUT / WT): positive values are higher in MUT
MUTvsWTChanged <- results(ddsChanged, tidy = TRUE, contrast = c("TP53", "MUT", "WT"),
                          independentFiltering = TRUE, alpha = 0.05)
MUTvsWTChanged <- as_tibble(MUTvsWTChanged)
```
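One thing worth ruling out before trusting the sign of the results: the sample labels are attached to the count columns by position, so any ordering slip silently swaps TP53 labels between samples. A minimal base-R sketch of an order check, using a hypothetical helper `check_sample_order()` and made-up barcodes:

```r
# Hypothetical helper: stop with an informative message unless the two
# vectors of sample IDs are identical element by element (same IDs, same order)
check_sample_order <- function(count_cols, meta_ids) {
  if (length(count_cols) != length(meta_ids)) {
    stop("different number of samples: ", length(count_cols),
         " count columns vs ", length(meta_ids), " metadata rows")
  }
  mismatch <- which(count_cols != meta_ids)
  if (length(mismatch) > 0) {
    stop("sample order differs at position(s): ",
         paste(mismatch, collapse = ", "))
  }
  invisible(TRUE)
}

# Toy barcodes, made up for illustration
counts_cols <- c("TCGA-AA-0001", "TCGA-AA-0002", "TCGA-AA-0003")
meta_ok     <- c("TCGA-AA-0001", "TCGA-AA-0002", "TCGA-AA-0003")
meta_bad    <- c("TCGA-AA-0002", "TCGA-AA-0001", "TCGA-AA-0003")

check_sample_order(counts_cols, meta_ok)  # passes silently
res <- tryCatch(check_sample_order(counts_cols, meta_bad),
                error = function(e) conditionMessage(e))
print(res)
```

In the script above it would be called as `check_sample_order(colnames(TCGAcorrelatingPatientBarcode), dfForMetaDataFiltered$Patient)` just before `DESeqDataSetFromMatrix()`; it should pass as long as no patient barcode appears twice.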
Many thanks
Your question is unclear and could be improved. I don't see how we can help you, because you ask why your results differ from what you expect, but we don't know what you expect: is it based on previous data? You also don't say which comparison you want to perform. Maybe you simply inverted the two groups being compared, but without knowing which comparison should be made we can't tell.
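To make the comparison explicit, the reference level of the grouping factor can be printed and, if needed, changed with base R's `relevel()`. A small sketch with a made-up TP53 vector:

```r
tp53 <- c("MUT", "WT", "MUT", "WT", "MUT")

# character -> factor: levels are sorted alphabetically, so "MUT" comes first
f_default <- factor(tp53)
print(levels(f_default))   # "MUT" "WT"

# make WT the reference (first) level, as the posted code intends
f_wt_ref <- relevel(f_default, ref = "WT")
print(levels(f_wt_ref))    # "WT" "MUT"
```

With DESeq2, an explicit `contrast = c("TP53", "MUT", "WT")` reports log2(MUT / WT) whatever the level order; the reference level matters mainly when results are pulled by coefficient name from `resultsNames()` instead of by contrast.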