Question

goseq pwf length bias plot: help interpreting plot

0

Entering edit mode

mrodrigues.fernanda ▴ 10

@mrodriguesfernanda-9433

Last seen 7.7 years ago

University of Illinois, Urbana-Champaign

Hi!
I am using goseq on some bovine data and I am using Ensembl IDs for bovine genome UMD3.1, which is not supported by goseq, so I had to input lenght and GO information manually. I tried obtaining length information 2 ways: 1) from featCounts files results; 2) from biomaRt.

Both gave me weirdly shaped plots.
Not sure if I'm doing something wrong. The graphs are in totally opposite direction then I expected them to be ( instead of going up and stabilizing as I expected, they are going down).

Below are the codes of how I obtained length info and plots, as well as the plots

1) Using featureCounts results:

featCountsData <- read.table("charolais_145A_TGACCA.featCounts.txt", header=TRUE, comment.char = "#", sep="\t")

featCountsLengthData <- as.numeric(featCountsData[,"Length"])
names(featCountsLengthData) <- featCountsData$Geneid

LenData <- names(geneLengthData) %in% names(genes)
LenData <- geneLengthData[LenData]

pwf <- nullp(DEgenes = genes, bias.data = LenData, plot.fit = TRUE)

2) Using Biomart:

txsData <- makeTxDbFromBiomart(biomart ="ensembl", dataset = "btaurus_gene_ensembl")
txsByGene <- transcriptsBy(txsData,"gene")
geneLengthData <- median(width(txsByGene))

BiomartLenData <- names(geneLengthData) %in% names(genes)
BiomartLenData <- geneLengthData[testLenData]

pwf <- nullp(DEgenes = genes, bias.data = testLenData, plot.fit = TRUE)

PLOT 1: Length info obtained from featureCounts:

https://drive.google.com/file/d/0B6Ho35U9KAepTGxuR0FxdXc5UUU/view?usp=sharing

PLOT 2: Length info obtained from biomaRt

https://drive.google.com/file/d/0B6Ho35U9KAepTnk2UGdvQ0VxeUU/view?usp=sharing

Any help is appreciated!

goseq • 2.5k views

ADD COMMENT • link updated 8.7 years ago by Nadia Davidson ▴ 320 • written 8.7 years ago by mrodrigues.fernanda ▴ 10

score 1 · Answer 1 · 2016-04-07

1

Entering edit mode

Nadia Davidson ▴ 320

@nadia-davidson-5739

Last seen 5.7 years ago

Australia

Hi,

I think your gene lengths and what you've done with goseq should be fine. One thing you could check, is if you get the same plots when you order the genes in "LenData" the same as in "genes". Otherwise, to work out why your plot is inverted, I think you'll need to go back to the DE analysis. Can you think of any reason why short genes would be more likely to be DE from the biology of the experiment? Perhaps a class of gene? Some details about the DE analysis might also help us to work out the source of the bias. For example what program did you use? What sort of cut-offs did you apply? How many genes do you get as DE?

Cheers,

Nadia.

ADD COMMENT • link 8.7 years ago Nadia Davidson ▴ 320

0

Entering edit mode

Hi Nadia
Thank you so much for your response and e-mail.

Regarding my analysis previously to go seq, I analyzed my RNA-seq data for differential expression using sva (due to the presence of unknown batch effects) and edgeR. I obtained 507 differentially expressed genes at a 0.01 FDR.

And all genes inputed into goseq were genes significant at 0.25 FDR for an anova analysis (cut-off which I previously used for module detection in WGCNA).
I am not sure if that could have anything to do with the analysis.

Any thoughts?

Thank you!!!

ADD REPLY • link 8.7 years ago mrodrigues.fernanda ▴ 10

1

Entering edit mode

Hi,

What happens if you input all genes and not just those from the anova analysis? Do you get a regular looking plot. Your analysis sounds a bit different from the one in the vignette, so it seems possible to me that the biases could be different.

Cheers, Nadia.

ADD REPLY • link 8.7 years ago Nadia Davidson ▴ 320

0

Entering edit mode

Hi Nadia!
I tried including all genes in the analysis (without any previous filtering) and now I get a regular plot, like in the tutorial.

It seems like the problem has been solved. Thank you so much for your help!

ADD REPLY • link 8.7 years ago mrodrigues.fernanda ▴ 10