Hi!
I am using goseq on some bovine data and I am using Ensembl IDs for bovine genome UMD3.1, which is not supported by goseq, so I had to input lenght and GO information manually. I tried obtaining length information 2 ways: 1) from featCounts files results; 2) from biomaRt.
Both gave me weirdly shaped plots.
Not sure if I'm doing something wrong. The graphs are in totally opposite direction then I expected them to be ( instead of going up and stabilizing as I expected, they are going down).
Below are the codes of how I obtained length info and plots, as well as the plots
1) Using featureCounts results:
featCountsData <- read.table("charolais_145A_TGACCA.featCounts.txt", header=TRUE, comment.char = "#", sep="\t")
featCountsLengthData <- as.numeric(featCountsData[,"Length"])
names(featCountsLengthData) <- featCountsData$Geneid
LenData <- names(geneLengthData) %in% names(genes)
LenData <- geneLengthData[LenData]
pwf <- nullp(DEgenes = genes, bias.data = LenData, plot.fit = TRUE)
2) Using Biomart:
txsData <- makeTxDbFromBiomart(biomart ="ensembl", dataset = "btaurus_gene_ensembl")
txsByGene <- transcriptsBy(txsData,"gene")
geneLengthData <- median(width(txsByGene))
BiomartLenData <- names(geneLengthData) %in% names(genes)
BiomartLenData <- geneLengthData[testLenData]
pwf <- nullp(DEgenes = genes, bias.data = testLenData, plot.fit = TRUE)
PLOT 1: Length info obtained from featureCounts:
https://drive.google.com/file/d/0B6Ho35U9KAepTGxuR0FxdXc5UUU/view?usp=sharing
PLOT 2: Length info obtained from biomaRt
https://drive.google.com/file/d/0B6Ho35U9KAepTnk2UGdvQ0VxeUU/view?usp=sharing
Any help is appreciated!
Hi Nadia
Thank you so much for your response and e-mail.
Regarding my analysis previously to go seq, I analyzed my RNA-seq data for differential expression using sva (due to the presence of unknown batch effects) and edgeR. I obtained 507 differentially expressed genes at a 0.01 FDR.
And all genes inputed into goseq were genes significant at 0.25 FDR for an anova analysis (cut-off which I previously used for module detection in WGCNA).
I am not sure if that could have anything to do with the analysis.
Any thoughts?
Thank you!!!
Hi,
What happens if you input all genes and not just those from the anova analysis? Do you get a regular looking plot. Your analysis sounds a bit different from the one in the vignette, so it seems possible to me that the biases could be different.
Cheers, Nadia.
Hi Nadia!
I tried including all genes in the analysis (without any previous filtering) and now I get a regular plot, like in the tutorial.
It seems like the problem has been solved. Thank you so much for your help!