Help With RNA-seq

0

Entering edit mode

Valerie Obenchain ★ 6.8k

@valerie-obenchain-4275

Last seen 3.2 years ago

United States

Hi Tina, I'm pasting in your message below so we can keep all communication on the mailing list. The 'samplesize' argument looks wrong in your call to compute the priors. CDP.Poi<- getPriors.Pois(CD, samplesize = 2, takemean = TRUE, cl = cl) Why have you set this to 2? It should be a much larger number. Try using the default (1e5), CDP.Poi<- getPriors.Pois(CD, cl = cl) Does this speed it up? Valerie ## ------------------------------------------------------------------- ---- Hi Valerie, Thank you once again for all your help. As you requested in the previous email for a clearer explanation to the problem I am encountering at present. head(D) R1L1Kidney R1L2Liver R1L3Kidney R1L4Liver R1L6Liver R1L7Kidney R1L8Liver 10 0 0 0 0 0 2 1 15 4 35 7 32 31 3 29 17 0 2 0 0 0 1 0 18 110 177 131 135 141 149 148 19 12685 9246 13204 9312 8746 12403 8496 22 0 1 0 0 0 0 0 R2L2Kidney R2L3Liver R2L6Kidney 10 4 0 3 15 6 34 7 17 1 0 0 18 112 145 118 19 13031 9070 13268 22 1 0 0 names(D) [1] "R1L1Kidney" "R1L2Liver" "R1L3Kidney" "R1L4Liver" "R1L6Liver" [6] "R1L7Kidney" "R1L8Liver" "R2L2Kidney" "R2L3Liver" "R2L6Kidney" dim(D) [1] 22490 10 g<- gsub("R[1-2]L[1-8]", "", colnames(D)) > g [1] "Kidney" "Liver" "Kidney" "Liver" "Liver" "Kidney" "Liver" "Kidney" [9] "Liver" "Kidney" > d<- DGEList(counts = D, group = substr(colnames(D), 5, 30)) Calculating library sizes from column totals. > d An object of class "DGEList" $samples group lib.size norm.factors R1L1Kidney Kidney 1804977 1 R1L2Liver Liver 1691734 1 R1L3Kidney Kidney 1855190 1 R1L4Liver Liver 1696308 1 R1L6Liver Liver 1630795 1 R1L7Kidney Kidney 1742426 1 R1L8Liver Liver 1575425 1 R2L2Kidney Kidney 1927517 1 R2L3Liver Liver 1767339 1 R2L6Kidney Kidney 1963420 1 $counts R1L1Kidney R1L2Liver R1L3Kidney R1L4Liver R1L6Liver R1L7Kidney R1L8Liver 10 0 0 0 0 0 2 1 15 4 35 7 32 31 3 29 17 0 2 0 0 0 1 0 18 110 177 131 135 141 149 148 19 12685 9246 13204 9312 8746 12403 8496 R2L2Kidney R2L3Liver R2L6Kidney 10 4 0 3 15 6 34 7 17 1 0 0 18 112 145 118 19 13031 9070 13268 22485 more rows ... $all.zeros 10 15 17 18 19 FALSE FALSE FALSE FALSE FALSE 22485 more elements ... d$samples group lib.size norm.factors R1L1Kidney Kidney 1804977 1 R1L2Liver Liver 1691734 1 R1L3Kidney Kidney 1855190 1 R1L4Liver Liver 1696308 1 R1L6Liver Liver 1630795 1 R1L7Kidney Kidney 1742426 1 R1L8Liver Liver 1575425 1 R2L2Kidney Kidney 1927517 1 R2L3Liver Liver 1767339 1 R2L6Kidney Kidney 1963420 1 > names(d) [1] "samples" "counts" "all.zeros" > dim(d) [1] 22490 10 Yes the plot does work . However, it is the call to CDP.Poi at priors that is taking that long. With regards to the rest of the code it will follow the same approach with as the one above with slight changes. Also the call to getPriors.NB() is also taking very long as well. These two calls are in essence the main contributions if the rest of the code is to work. Thank you so much for your help. Have a pleasant day. ## ------------------------------------------------------------------- ---- On 01/29/12 20:54, Valerie Obenchain wrote: > On 01/29/12 07:49, Tina Asante Boahene wrote: >> Hi all, >> >> I am still having problems with bayseq >> >> Having followed the pdf document associated with it and also >> tailoring it to the Marioni et al data I am using, it seems that the >> code has been running for over two days without any results. >> >> I am wondering this code be down to my code. >> >> I have therefore attached my code to this email hoping that someone >> can help me solve this problem, thank you. >> >> >> library(baySeq) >> library(edgeR) >> library(limma) >> library(snow) >> >> cl<- makeCluster(4, "SOCK") >> >> >> ##Calculating normalization factors## >> D=MA.subsetA$M >> head(D) >> names(D) >> dim(D) > Please provide the output of these (i.e., head, names, dim). >> >> g<- gsub("R[1-2]L[1-8]", "", colnames(D)) >> d<- DGEList(counts = D, group = substr(colnames(D), 5, 30)) >> d$samples >> names(d) >> dim(d) > These values would be helpful too. >> >> >> CD<- new("countData", data = as.matrix(MA.subsetA$M), libsizes = >> as.integer(d$samples$lib.size), replicates = g) >> groups(CD)<- list(rep(1, ncol(CD)), g) >> >> CD at libsizes<- getLibsizes(CD) >> >> plotMA.CD(CD, samplesA = c(1,3,6,8,10), samplesB = c(2,4,5,7,9), col >> = c(rep("red", >> 100), rep("black", 900))) > > Did this work? Your original question was about plotMA.CD not > recognizing your groups. Does the plot work for you now? >> >> ## Optionally adding annotation details to the @annotation slot of >> the countData object. ## >> CD at annotation<- data.frame(name = paste("gene", 1:1000, sep = "_")) >> >> >> >> >> ### Poisson-Gamma Approach ### >> >> CDP.Poi<- getPriors.Pois(CD, samplesize = 2, takemean = TRUE, cl = cl) >> >> CDP.Poi at priors ## This takes time ### > Is the call to getPriors.Pois() that has been running for over 2 days? > If not, please specify which function call is taking so long. Did the > rest of the code below work for you? > > Valerie >> >> CDPost.Poi<- getLikelihoods.Pois(CDP.Poi, pET = "BIC", cl = cl) >> CDPost.Poi at estProps >> >> CDPost.Poi at posteriors[1:10, ] ## A list of the posterior likelihoods >> each model for the first 10 genes ## >> CDPost.Poi at posteriors[101:110, ] ## A list of the posterior >> likelihoods each model for the genes from 101 to 110 ## >> >> >> ### Negative-Binomial Approach ### >> >> CDP.NBML<- getPriors.NB(CD, samplesize = 1000, estimation = "QL", cl >> = cl) >> >> CDPost.NBML<- getLikelihoods.NB(CDP.NBML, pET = 'BIC', cl = cl) >> >> CDPost.NBML at estProps >> >> CDPost.NBML at posteriors[1:10,] >> >> CDPost.NBML at posteriors[101:110,] >> >> >> Kind Regards >> >> Tina >> ________________________________________ >> From: Valerie Obenchain [vobencha at fhcrc.org] >> Sent: 25 January 2012 18:14 >> To: Tina Asante Boahene >> Cc: bioconductor at stat.math.ethz.ch >> Subject: Re: [BioC] Help With RNA-seq >> >> Hi Tina, >> >> It's difficult to help without knowing what your data look like or what >> error message you are seeing. Both pieces of information would be >> helpful. >> >> For starters I think you need to provide 'replicate' and 'groups' >> arguments when you create your new "countData" object. Depending on what >> order your data are in you need something like, >> >> groups<- list(NDE = c(1,1,1,1,1,1,1,1,1,1), DE = >> c(1,2,1,2,2,1,2,1,2,1)) >> replicates<- c("Kidney", "Liver", "Kidney", "Liver", "Liver', >> "Kidney", "Liver", "Kidney", "Liver", "Kidney") >> >> Then create your "countData" with these variables, >> >> CD<- new("countData", data = as.matrix(MA.subsetA$M), libsizes = >> as.integer(d$samples$lib.size), >> replicates = replicates, groups = groups) >> >> Now look at the CD object and make sure the columns are labeled as they >> should be and the other slot values make sense. The MA plot call would >> look something like, >> >> plotMA.CD(CD, samplesA = "Kidney", samplesB = "Liver") >> >> The author used the red and black colors for the vignette plot because >> there was a known structure to the data; the first 100 counts showed >> differential expression and the last 900 did not. You probably have a >> different situation in your data so using the same color scheme may not >> make sense. >> >> Valerie >> >> >> On 01/23/2012 06:13 AM, Tina Asante Boahene wrote: >>> Hi all, >>> >>> I am conducting some analysis using the Marioni et al data. >>> >>> However, I am having a bit of trouble using my data to conduct the >>> analysis based on the baySeq package. >>> >>> And I was wondering if you could stir me in the right direction. >>> >>> I have already used edgeR to find the library sizes for the ten >>> libraries I have for my data as well as for the groups (Liver and >>> Kidney) as stated below. >>> >>> >>> library(baySeq) >>> library(edgeR) >>> library(limma) >>> library(snow) >>> >>> cl<- makeCluster(4, "SOCK") >>> >>> >>> ##Calculating normalization factors## >>> D=MA.subsetA$M >>> head(D) >>> names(D) >>> dim(D) >>> >>> g<- gsub("R[1-2]L[1-8]", "", colnames(D)) >>> d<- DGEList(counts = D, group = substr(colnames(D), 5, 30)) >>> d$samples >>> names(d) >>> dim(d) >>> >>> >>> I will like to know how to model my code in order to produce the MA >>> plot for count data >>> >>> >>> This is what I have, however it runs with the wrong response. >>> >>> Can someone help me fix this please. >>> >>> CD<- new("countData", data = as.matrix(MA.subsetA$M), libsizes = >>> as.integer(d$samples$lib.size)) >>> >>> plotMA.CD(CD, samplesA = 1:5, samplesB = 6:10, col = c(rep("red", >>> 100), rep("black", 900))) >>> >>> >>> How can I get it to recognise the "groups" as "g" (Library and Kidney) >>> >>> This is the output for the groups [1] "Kidney" "Liver" "Kidney" >>> "Liver" "Liver" "Kidney" "Liver" "Kidney" "Liver" "Kidney" >>> >>> thank you. >>> >>> >>> >>> >>> >>> >>> Kind Regards >>> >>> Tina >>> _______________________________________________ >>> Bioconductor mailing list >>> Bioconductor at r-project.org >>> https://stat.ethz.ch/mailman/listinfo/bioconductor >>> Search the archives: >>> http://news.gmane.org/gmane.science.biology.informatics.conductor > > _______________________________________________ > Bioconductor mailing list > Bioconductor at r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: > http://news.gmane.org/gmane.science.biology.informatics.conductor

Annotation Normalization Kidney edgeR baySeq Annotation Normalization Kidney edgeR baySeq • 1.5k views

ADD COMMENT • link 13.2 years ago Valerie Obenchain ★ 6.8k

0

Entering edit mode

Valerie Obenchain ★ 6.8k

@valerie-obenchain-4275

Last seen 3.2 years ago

United States

Thinking about this a bit more ... Have you tried modifying the 'maxit' argument to getPriors.Pois() ? It's possible this threshold could be reduced; it would also confirm that the algorithm is converging (if you reduced it to a point where you saw an error). If I understand correctly, you are not seeing any error messages but getPirors.Pois() and getPrior.NB() are taking a long time to run. (fyi per your comment below, CDP.Poi at priors is just accessing the data in the slot of the object; it is not a function). Without having your data to test on it is difficult to know what is going on. It would be useful to know what kind of machine you are working on, how may cpus, how much memory etc. I'm cc'ing Thomas (package author) in case he has ideas. Valerie On 01/31/2012 07:41 PM, Valerie Obenchain wrote: > Hi Tina, > > I'm pasting in your message below so we can keep all communication on > the mailing list. > > The 'samplesize' argument looks wrong in your call to compute the priors. > > CDP.Poi<- getPriors.Pois(CD, samplesize = 2, takemean = TRUE, cl = > cl) > > Why have you set this to 2? It should be a much larger number. Try > using the default (1e5), > > CDP.Poi<- getPriors.Pois(CD, cl = cl) > > Does this speed it up? > > Valerie > > ## > -------------------------------------------------------------------- --- > > Hi Valerie, > > Thank you once again for all your help. > > As you requested in the previous email for a clearer explanation to > the problem I am encountering at present. > > head(D) > R1L1Kidney R1L2Liver R1L3Kidney R1L4Liver R1L6Liver R1L7Kidney > R1L8Liver > 10 0 0 0 0 0 > 2 1 > 15 4 35 7 32 31 > 3 29 > 17 0 2 0 0 0 > 1 0 > 18 110 177 131 135 141 > 149 148 > 19 12685 9246 13204 9312 8746 12403 > 8496 > 22 0 1 0 0 0 > 0 0 > R2L2Kidney R2L3Liver R2L6Kidney > 10 4 0 3 > 15 6 34 7 > 17 1 0 0 > 18 112 145 118 > 19 13031 9070 13268 > 22 1 0 0 > > names(D) > [1] "R1L1Kidney" "R1L2Liver" "R1L3Kidney" "R1L4Liver" "R1L6Liver" > [6] "R1L7Kidney" "R1L8Liver" "R2L2Kidney" "R2L3Liver" "R2L6Kidney" > > dim(D) > [1] 22490 10 > > g<- gsub("R[1-2]L[1-8]", "", colnames(D)) > >> g > [1] "Kidney" "Liver" "Kidney" "Liver" "Liver" "Kidney" "Liver" > "Kidney" > [9] "Liver" "Kidney" > > >> d<- DGEList(counts = D, group = substr(colnames(D), 5, 30)) > Calculating library sizes from column totals. > >> d > An object of class "DGEList" > $samples > group lib.size norm.factors > R1L1Kidney Kidney 1804977 1 > R1L2Liver Liver 1691734 1 > R1L3Kidney Kidney 1855190 1 > R1L4Liver Liver 1696308 1 > R1L6Liver Liver 1630795 1 > R1L7Kidney Kidney 1742426 1 > R1L8Liver Liver 1575425 1 > R2L2Kidney Kidney 1927517 1 > R2L3Liver Liver 1767339 1 > R2L6Kidney Kidney 1963420 1 > > $counts > R1L1Kidney R1L2Liver R1L3Kidney R1L4Liver R1L6Liver R1L7Kidney > R1L8Liver > 10 0 0 0 0 0 > 2 1 > 15 4 35 7 32 31 > 3 29 > 17 0 2 0 0 0 > 1 0 > 18 110 177 131 135 141 > 149 148 > 19 12685 9246 13204 9312 8746 12403 > 8496 > R2L2Kidney R2L3Liver R2L6Kidney > 10 4 0 3 > 15 6 34 7 > 17 1 0 0 > 18 112 145 118 > 19 13031 9070 13268 > 22485 more rows ... > > $all.zeros > 10 15 17 18 19 > FALSE FALSE FALSE FALSE FALSE > 22485 more elements ... > > d$samples > group lib.size norm.factors > R1L1Kidney Kidney 1804977 1 > R1L2Liver Liver 1691734 1 > R1L3Kidney Kidney 1855190 1 > R1L4Liver Liver 1696308 1 > R1L6Liver Liver 1630795 1 > R1L7Kidney Kidney 1742426 1 > R1L8Liver Liver 1575425 1 > R2L2Kidney Kidney 1927517 1 > R2L3Liver Liver 1767339 1 > R2L6Kidney Kidney 1963420 1 > >> names(d) > [1] "samples" "counts" "all.zeros" > > >> dim(d) > [1] 22490 10 > > Yes the plot does work . > > However, it is the call to CDP.Poi at priors that is taking that long. > > With regards to the rest of the code it will follow the same approach > with as the one above with slight changes. > > Also the call to getPriors.NB() is also taking very long as well. > > These two calls are in essence the main contributions if the rest of > the code is to work. > > Thank you so much for your help. > > Have a pleasant day. > > > ## > -------------------------------------------------------------------- --- > > > > > > On 01/29/12 20:54, Valerie Obenchain wrote: >> On 01/29/12 07:49, Tina Asante Boahene wrote: >>> Hi all, >>> >>> I am still having problems with bayseq >>> >>> Having followed the pdf document associated with it and also >>> tailoring it to the Marioni et al data I am using, it seems that the >>> code has been running for over two days without any results. >>> >>> I am wondering this code be down to my code. >>> >>> I have therefore attached my code to this email hoping that someone >>> can help me solve this problem, thank you. >>> >>> >>> library(baySeq) >>> library(edgeR) >>> library(limma) >>> library(snow) >>> >>> cl<- makeCluster(4, "SOCK") >>> >>> >>> ##Calculating normalization factors## >>> D=MA.subsetA$M >>> head(D) >>> names(D) >>> dim(D) >> Please provide the output of these (i.e., head, names, dim). >>> >>> g<- gsub("R[1-2]L[1-8]", "", colnames(D)) >>> d<- DGEList(counts = D, group = substr(colnames(D), 5, 30)) >>> d$samples >>> names(d) >>> dim(d) >> These values would be helpful too. >>> >>> >>> CD<- new("countData", data = as.matrix(MA.subsetA$M), libsizes = >>> as.integer(d$samples$lib.size), replicates = g) >>> groups(CD)<- list(rep(1, ncol(CD)), g) >>> >>> CD at libsizes<- getLibsizes(CD) >>> >>> plotMA.CD(CD, samplesA = c(1,3,6,8,10), samplesB = c(2,4,5,7,9), col >>> = c(rep("red", >>> 100), rep("black", 900))) >> >> Did this work? Your original question was about plotMA.CD not >> recognizing your groups. Does the plot work for you now? >>> >>> ## Optionally adding annotation details to the @annotation slot of >>> the countData object. ## >>> CD at annotation<- data.frame(name = paste("gene", 1:1000, sep = "_")) >>> >>> >>> >>> >>> ### Poisson-Gamma Approach ### >>> >>> CDP.Poi<- getPriors.Pois(CD, samplesize = 2, takemean = TRUE, cl = cl) >>> >>> CDP.Poi at priors ## This takes time ### >> Is the call to getPriors.Pois() that has been running for over 2 >> days? If not, please specify which function call is taking so long. >> Did the rest of the code below work for you? >> >> Valerie >>> >>> CDPost.Poi<- getLikelihoods.Pois(CDP.Poi, pET = "BIC", cl = cl) >>> CDPost.Poi at estProps >>> >>> CDPost.Poi at posteriors[1:10, ] ## A list of the posterior >>> likelihoods each model for the first 10 genes ## >>> CDPost.Poi at posteriors[101:110, ] ## A list of the posterior >>> likelihoods each model for the genes from 101 to 110 ## >>> >>> >>> ### Negative-Binomial Approach ### >>> >>> CDP.NBML<- getPriors.NB(CD, samplesize = 1000, estimation = "QL", cl >>> = cl) >>> >>> CDPost.NBML<- getLikelihoods.NB(CDP.NBML, pET = 'BIC', cl = cl) >>> >>> CDPost.NBML at estProps >>> >>> CDPost.NBML at posteriors[1:10,] >>> >>> CDPost.NBML at posteriors[101:110,] >>> >>> >>> Kind Regards >>> >>> Tina >>> ________________________________________ >>> From: Valerie Obenchain [vobencha at fhcrc.org] >>> Sent: 25 January 2012 18:14 >>> To: Tina Asante Boahene >>> Cc: bioconductor at stat.math.ethz.ch >>> Subject: Re: [BioC] Help With RNA-seq >>> >>> Hi Tina, >>> >>> It's difficult to help without knowing what your data look like or what >>> error message you are seeing. Both pieces of information would be >>> helpful. >>> >>> For starters I think you need to provide 'replicate' and 'groups' >>> arguments when you create your new "countData" object. Depending on >>> what >>> order your data are in you need something like, >>> >>> groups<- list(NDE = c(1,1,1,1,1,1,1,1,1,1), DE = >>> c(1,2,1,2,2,1,2,1,2,1)) >>> replicates<- c("Kidney", "Liver", "Kidney", "Liver", "Liver', >>> "Kidney", "Liver", "Kidney", "Liver", "Kidney") >>> >>> Then create your "countData" with these variables, >>> >>> CD<- new("countData", data = as.matrix(MA.subsetA$M), libsizes = >>> as.integer(d$samples$lib.size), >>> replicates = replicates, groups = groups) >>> >>> Now look at the CD object and make sure the columns are labeled as they >>> should be and the other slot values make sense. The MA plot call would >>> look something like, >>> >>> plotMA.CD(CD, samplesA = "Kidney", samplesB = "Liver") >>> >>> The author used the red and black colors for the vignette plot because >>> there was a known structure to the data; the first 100 counts showed >>> differential expression and the last 900 did not. You probably have a >>> different situation in your data so using the same color scheme may not >>> make sense. >>> >>> Valerie >>> >>> >>> On 01/23/2012 06:13 AM, Tina Asante Boahene wrote: >>>> Hi all, >>>> >>>> I am conducting some analysis using the Marioni et al data. >>>> >>>> However, I am having a bit of trouble using my data to conduct the >>>> analysis based on the baySeq package. >>>> >>>> And I was wondering if you could stir me in the right direction. >>>> >>>> I have already used edgeR to find the library sizes for the ten >>>> libraries I have for my data as well as for the groups (Liver and >>>> Kidney) as stated below. >>>> >>>> >>>> library(baySeq) >>>> library(edgeR) >>>> library(limma) >>>> library(snow) >>>> >>>> cl<- makeCluster(4, "SOCK") >>>> >>>> >>>> ##Calculating normalization factors## >>>> D=MA.subsetA$M >>>> head(D) >>>> names(D) >>>> dim(D) >>>> >>>> g<- gsub("R[1-2]L[1-8]", "", colnames(D)) >>>> d<- DGEList(counts = D, group = substr(colnames(D), 5, 30)) >>>> d$samples >>>> names(d) >>>> dim(d) >>>> >>>> >>>> I will like to know how to model my code in order to produce the MA >>>> plot for count data >>>> >>>> >>>> This is what I have, however it runs with the wrong response. >>>> >>>> Can someone help me fix this please. >>>> >>>> CD<- new("countData", data = as.matrix(MA.subsetA$M), libsizes = >>>> as.integer(d$samples$lib.size)) >>>> >>>> plotMA.CD(CD, samplesA = 1:5, samplesB = 6:10, col = c(rep("red", >>>> 100), rep("black", 900))) >>>> >>>> >>>> How can I get it to recognise the "groups" as "g" (Library and Kidney) >>>> >>>> This is the output for the groups [1] "Kidney" "Liver" "Kidney" >>>> "Liver" "Liver" "Kidney" "Liver" "Kidney" "Liver" "Kidney" >>>> >>>> thank you. >>>> >>>> >>>> >>>> >>>> >>>> >>>> Kind Regards >>>> >>>> Tina >>>> _______________________________________________ >>>> Bioconductor mailing list >>>> Bioconductor at r-project.org >>>> https://stat.ethz.ch/mailman/listinfo/bioconductor >>>> Search the archives: >>>> http://news.gmane.org/gmane.science.biology.informatics.conductor >> >> _______________________________________________ >> Bioconductor mailing list >> Bioconductor at r-project.org >> https://stat.ethz.ch/mailman/listinfo/bioconductor >> Search the archives: >> http://news.gmane.org/gmane.science.biology.informatics.conductor > > _______________________________________________ > Bioconductor mailing list > Bioconductor at r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: > http://news.gmane.org/gmane.science.biology.informatics.conductor

ADD COMMENT • link 13.2 years ago Valerie Obenchain ★ 6.8k

0

Entering edit mode

Well, this is embarrassing... A bug in the code for getPriors.Pois meant that it tripped into an infinite loop. I've posted a fix to the release track of Bioconductor; version 1.8.2 should allow getPriors.Pois to work. I suppose that nobody finding this before means that most people are using the negative binomial approach - as they should be! Tom On 01/02/12 16:50, Valerie Obenchain wrote: > Thinking about this a bit more ... > > Have you tried modifying the 'maxit' argument to getPriors.Pois() ? > It's possible this threshold could be reduced; it would also confirm > that the algorithm is converging (if you reduced it to a point where > you saw an error). > > If I understand correctly, you are not seeing any error messages but > getPirors.Pois() and getPrior.NB() are taking a long time to run. (fyi > per your comment below, CDP.Poi at priors is just accessing the data in > the slot of the object; it is not a function). Without having your > data to test on it is difficult to know what is going on. It would be > useful to know what kind of machine you are working on, how may cpus, > how much memory etc. > > I'm cc'ing Thomas (package author) in case he has ideas. > > Valerie > > > On 01/31/2012 07:41 PM, Valerie Obenchain wrote: >> Hi Tina, >> >> I'm pasting in your message below so we can keep all communication on >> the mailing list. >> >> The 'samplesize' argument looks wrong in your call to compute the >> priors. >> >> CDP.Poi<- getPriors.Pois(CD, samplesize = 2, takemean = TRUE, cl >> = cl) >> >> Why have you set this to 2? It should be a much larger number. Try >> using the default (1e5), >> >> CDP.Poi<- getPriors.Pois(CD, cl = cl) >> >> Does this speed it up? >> >> Valerie >> >> ## >> ------------------------------------------------------------------- ---- >> >> Hi Valerie, >> >> Thank you once again for all your help. >> >> As you requested in the previous email for a clearer explanation to >> the problem I am encountering at present. >> >> head(D) >> R1L1Kidney R1L2Liver R1L3Kidney R1L4Liver R1L6Liver R1L7Kidney >> R1L8Liver >> 10 0 0 0 0 0 >> 2 1 >> 15 4 35 7 32 31 >> 3 29 >> 17 0 2 0 0 0 >> 1 0 >> 18 110 177 131 135 141 >> 149 148 >> 19 12685 9246 13204 9312 8746 >> 12403 8496 >> 22 0 1 0 0 0 >> 0 0 >> R2L2Kidney R2L3Liver R2L6Kidney >> 10 4 0 3 >> 15 6 34 7 >> 17 1 0 0 >> 18 112 145 118 >> 19 13031 9070 13268 >> 22 1 0 0 >> >> names(D) >> [1] "R1L1Kidney" "R1L2Liver" "R1L3Kidney" "R1L4Liver" "R1L6Liver" >> [6] "R1L7Kidney" "R1L8Liver" "R2L2Kidney" "R2L3Liver" "R2L6Kidney" >> >> dim(D) >> [1] 22490 10 >> >> g<- gsub("R[1-2]L[1-8]", "", colnames(D)) >> >>> g >> [1] "Kidney" "Liver" "Kidney" "Liver" "Liver" "Kidney" "Liver" >> "Kidney" >> [9] "Liver" "Kidney" >> >> >>> d<- DGEList(counts = D, group = substr(colnames(D), 5, 30)) >> Calculating library sizes from column totals. >> >>> d >> An object of class "DGEList" >> $samples >> group lib.size norm.factors >> R1L1Kidney Kidney 1804977 1 >> R1L2Liver Liver 1691734 1 >> R1L3Kidney Kidney 1855190 1 >> R1L4Liver Liver 1696308 1 >> R1L6Liver Liver 1630795 1 >> R1L7Kidney Kidney 1742426 1 >> R1L8Liver Liver 1575425 1 >> R2L2Kidney Kidney 1927517 1 >> R2L3Liver Liver 1767339 1 >> R2L6Kidney Kidney 1963420 1 >> >> $counts >> R1L1Kidney R1L2Liver R1L3Kidney R1L4Liver R1L6Liver R1L7Kidney >> R1L8Liver >> 10 0 0 0 0 0 >> 2 1 >> 15 4 35 7 32 31 >> 3 29 >> 17 0 2 0 0 0 >> 1 0 >> 18 110 177 131 135 141 >> 149 148 >> 19 12685 9246 13204 9312 8746 >> 12403 8496 >> R2L2Kidney R2L3Liver R2L6Kidney >> 10 4 0 3 >> 15 6 34 7 >> 17 1 0 0 >> 18 112 145 118 >> 19 13031 9070 13268 >> 22485 more rows ... >> >> $all.zeros >> 10 15 17 18 19 >> FALSE FALSE FALSE FALSE FALSE >> 22485 more elements ... >> >> d$samples >> group lib.size norm.factors >> R1L1Kidney Kidney 1804977 1 >> R1L2Liver Liver 1691734 1 >> R1L3Kidney Kidney 1855190 1 >> R1L4Liver Liver 1696308 1 >> R1L6Liver Liver 1630795 1 >> R1L7Kidney Kidney 1742426 1 >> R1L8Liver Liver 1575425 1 >> R2L2Kidney Kidney 1927517 1 >> R2L3Liver Liver 1767339 1 >> R2L6Kidney Kidney 1963420 1 >> >>> names(d) >> [1] "samples" "counts" "all.zeros" >> >> >>> dim(d) >> [1] 22490 10 >> >> Yes the plot does work . >> >> However, it is the call to CDP.Poi at priors that is taking that long. >> >> With regards to the rest of the code it will follow the same approach >> with as the one above with slight changes. >> >> Also the call to getPriors.NB() is also taking very long as well. >> >> These two calls are in essence the main contributions if the rest of >> the code is to work. >> >> Thank you so much for your help. >> >> Have a pleasant day. >> >> >> ## >> ------------------------------------------------------------------- ---- >> >> >> >> >> >> On 01/29/12 20:54, Valerie Obenchain wrote: >>> On 01/29/12 07:49, Tina Asante Boahene wrote: >>>> Hi all, >>>> >>>> I am still having problems with bayseq >>>> >>>> Having followed the pdf document associated with it and also >>>> tailoring it to the Marioni et al data I am using, it seems that >>>> the code has been running for over two days without any results. >>>> >>>> I am wondering this code be down to my code. >>>> >>>> I have therefore attached my code to this email hoping that someone >>>> can help me solve this problem, thank you. >>>> >>>> >>>> library(baySeq) >>>> library(edgeR) >>>> library(limma) >>>> library(snow) >>>> >>>> cl<- makeCluster(4, "SOCK") >>>> >>>> >>>> ##Calculating normalization factors## >>>> D=MA.subsetA$M >>>> head(D) >>>> names(D) >>>> dim(D) >>> Please provide the output of these (i.e., head, names, dim). >>>> >>>> g<- gsub("R[1-2]L[1-8]", "", colnames(D)) >>>> d<- DGEList(counts = D, group = substr(colnames(D), 5, 30)) >>>> d$samples >>>> names(d) >>>> dim(d) >>> These values would be helpful too. >>>> >>>> >>>> CD<- new("countData", data = as.matrix(MA.subsetA$M), libsizes = >>>> as.integer(d$samples$lib.size), replicates = g) >>>> groups(CD)<- list(rep(1, ncol(CD)), g) >>>> >>>> CD at libsizes<- getLibsizes(CD) >>>> >>>> plotMA.CD(CD, samplesA = c(1,3,6,8,10), samplesB = c(2,4,5,7,9), >>>> col = c(rep("red", >>>> 100), rep("black", 900))) >>> >>> Did this work? Your original question was about plotMA.CD not >>> recognizing your groups. Does the plot work for you now? >>>> >>>> ## Optionally adding annotation details to the @annotation slot of >>>> the countData object. ## >>>> CD at annotation<- data.frame(name = paste("gene", 1:1000, sep = "_")) >>>> >>>> >>>> >>>> >>>> ### Poisson-Gamma Approach ### >>>> >>>> CDP.Poi<- getPriors.Pois(CD, samplesize = 2, takemean = TRUE, cl = cl) >>>> >>>> CDP.Poi at priors ## This takes time ### >>> Is the call to getPriors.Pois() that has been running for over 2 >>> days? If not, please specify which function call is taking so long. >>> Did the rest of the code below work for you? >>> >>> Valerie >>>> >>>> CDPost.Poi<- getLikelihoods.Pois(CDP.Poi, pET = "BIC", cl = cl) >>>> CDPost.Poi at estProps >>>> >>>> CDPost.Poi at posteriors[1:10, ] ## A list of the posterior >>>> likelihoods each model for the first 10 genes ## >>>> CDPost.Poi at posteriors[101:110, ] ## A list of the posterior >>>> likelihoods each model for the genes from 101 to 110 ## >>>> >>>> >>>> ### Negative-Binomial Approach ### >>>> >>>> CDP.NBML<- getPriors.NB(CD, samplesize = 1000, estimation = "QL", >>>> cl = cl) >>>> >>>> CDPost.NBML<- getLikelihoods.NB(CDP.NBML, pET = 'BIC', cl = cl) >>>> >>>> CDPost.NBML at estProps >>>> >>>> CDPost.NBML at posteriors[1:10,] >>>> >>>> CDPost.NBML at posteriors[101:110,] >>>> >>>> >>>> Kind Regards >>>> >>>> Tina >>>> ________________________________________ >>>> From: Valerie Obenchain [vobencha at fhcrc.org] >>>> Sent: 25 January 2012 18:14 >>>> To: Tina Asante Boahene >>>> Cc: bioconductor at stat.math.ethz.ch >>>> Subject: Re: [BioC] Help With RNA-seq >>>> >>>> Hi Tina, >>>> >>>> It's difficult to help without knowing what your data look like or >>>> what >>>> error message you are seeing. Both pieces of information would be >>>> helpful. >>>> >>>> For starters I think you need to provide 'replicate' and 'groups' >>>> arguments when you create your new "countData" object. Depending on >>>> what >>>> order your data are in you need something like, >>>> >>>> groups<- list(NDE = c(1,1,1,1,1,1,1,1,1,1), DE = >>>> c(1,2,1,2,2,1,2,1,2,1)) >>>> replicates<- c("Kidney", "Liver", "Kidney", "Liver", "Liver', >>>> "Kidney", "Liver", "Kidney", "Liver", "Kidney") >>>> >>>> Then create your "countData" with these variables, >>>> >>>> CD<- new("countData", data = as.matrix(MA.subsetA$M), libsizes = >>>> as.integer(d$samples$lib.size), >>>> replicates = replicates, groups = groups) >>>> >>>> Now look at the CD object and make sure the columns are labeled as >>>> they >>>> should be and the other slot values make sense. The MA plot call would >>>> look something like, >>>> >>>> plotMA.CD(CD, samplesA = "Kidney", samplesB = "Liver") >>>> >>>> The author used the red and black colors for the vignette plot because >>>> there was a known structure to the data; the first 100 counts showed >>>> differential expression and the last 900 did not. You probably have a >>>> different situation in your data so using the same color scheme may >>>> not >>>> make sense. >>>> >>>> Valerie >>>> >>>> >>>> On 01/23/2012 06:13 AM, Tina Asante Boahene wrote: >>>>> Hi all, >>>>> >>>>> I am conducting some analysis using the Marioni et al data. >>>>> >>>>> However, I am having a bit of trouble using my data to conduct the >>>>> analysis based on the baySeq package. >>>>> >>>>> And I was wondering if you could stir me in the right direction. >>>>> >>>>> I have already used edgeR to find the library sizes for the ten >>>>> libraries I have for my data as well as for the groups (Liver and >>>>> Kidney) as stated below. >>>>> >>>>> >>>>> library(baySeq) >>>>> library(edgeR) >>>>> library(limma) >>>>> library(snow) >>>>> >>>>> cl<- makeCluster(4, "SOCK") >>>>> >>>>> >>>>> ##Calculating normalization factors## >>>>> D=MA.subsetA$M >>>>> head(D) >>>>> names(D) >>>>> dim(D) >>>>> >>>>> g<- gsub("R[1-2]L[1-8]", "", colnames(D)) >>>>> d<- DGEList(counts = D, group = substr(colnames(D), 5, 30)) >>>>> d$samples >>>>> names(d) >>>>> dim(d) >>>>> >>>>> >>>>> I will like to know how to model my code in order to produce the >>>>> MA plot for count data >>>>> >>>>> >>>>> This is what I have, however it runs with the wrong response. >>>>> >>>>> Can someone help me fix this please. >>>>> >>>>> CD<- new("countData", data = as.matrix(MA.subsetA$M), libsizes = >>>>> as.integer(d$samples$lib.size)) >>>>> >>>>> plotMA.CD(CD, samplesA = 1:5, samplesB = 6:10, col = c(rep("red", >>>>> 100), rep("black", 900))) >>>>> >>>>> >>>>> How can I get it to recognise the "groups" as "g" (Library and >>>>> Kidney) >>>>> >>>>> This is the output for the groups [1] "Kidney" "Liver" "Kidney" >>>>> "Liver" "Liver" "Kidney" "Liver" "Kidney" "Liver" "Kidney" >>>>> >>>>> thank you. >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> Kind Regards >>>>> >>>>> Tina >>>>> _______________________________________________ >>>>> Bioconductor mailing list >>>>> Bioconductor at r-project.org >>>>> https://stat.ethz.ch/mailman/listinfo/bioconductor >>>>> Search the archives: >>>>> http://news.gmane.org/gmane.science.biology.informatics.conductor >>> >>> _______________________________________________ >>> Bioconductor mailing list >>> Bioconductor at r-project.org >>> https://stat.ethz.ch/mailman/listinfo/bioconductor >>> Search the archives: >>> http://news.gmane.org/gmane.science.biology.informatics.conductor >> >> _______________________________________________ >> Bioconductor mailing list >> Bioconductor at r-project.org >> https://stat.ethz.ch/mailman/listinfo/bioconductor >> Search the archives: >> http://news.gmane.org/gmane.science.biology.informatics.conductor > -- Dr. Thomas J. Hardcastle Department of Plant Sciences University of Cambridge Downing Street Cambridge, CB2 3EA United Kingdom

ADD REPLY • link 13.2 years ago Thomas J Hardcastle ▴ 180

Login before adding your answer.