BitSeq transcript IDs
2
0
Entering edit mode
Peter Glaus ▴ 70
@peter-glaus-5589
Last seen 10.3 years ago
Hi Maayan, if you want to work with files, there should be a file 'A08485_gr4_mini.run1.tr' which has information about transcripts, keeps the same order as .thetaMeans file and the second column contains transcript names. (all BitSeq output keeps the same order of reference sequences) Also, I think the first element of the result 'res_A08485$exp' contains mean expression and variance and should contain the transcript names as well. Try: head(res_A08485$exp) Peter. On 04/04/13 23:12, Maayan Kreitzman wrote: > Hi Peter, > I think I've successfully run BitSeq on a few mini bam files. When I > look in the resulting thetaMeans file, the transcript ID column is > just numbers. Is there a way to have the proper IDs from the > annotation? I'm copying the top of the file: > > # T => Mrows > # M 183985 > # file containing the mean value of theta - realtive abundace of > fragments and counts > # (overall mean, overall counts, mean of saved samples, and mean > from every chain are reported) > # columns: > # <transcriptid> <meanthetaoverall> <meanreadcountoverall> > <meanthetasaved> <chain1mean> <chain2mean> <chain3mean> <chain4mean> > #thetaAct: 0.000000000e+00 0 0.000000000e+00 0.000000000e+00 > 0.000000000e+00 0.000000000e+00 0.000000000e+00 > 1 2.329622950e-06 1 2.336304311e-06 2.384341203e-06 > 2.274251443e-06 2.328343218e-06 2.331555935e-06 > 2 4.591043926e-06 1 4.591127729e-06 4.730587366e-06 > 4.525049801e-06 4.538165770e-06 4.570372768e-06 > 3 2.394661546e-06 1 2.414399085e-06 2.505416202e-06 > 2.208339897e-06 2.389828455e-06 2.475061630e-06 > 4 2.328241183e-06 1 2.353625902e-06 2.319018416e-06 > 2.315240398e-06 2.373995323e-06 2.304710595e-06 > 5 2.300820499e-06 1 2.362273440e-06 2.228251927e-06 > 2.281075630e-06 2.309094754e-06 2.384859685e-06 > 6 2.389348690e-06 1 2.410694464e-06 2.348631392e-06 > 2.522953884e-06 2.443492278e-06 2.242317204e-06 > 7 2.400561999e-06 1 2.419266200e-06 2.475096269e-06 > 2.441005711e-06 2.306151611e-06 2.379994405e-06 > 8 2.307687449e-06 1 2.195070192e-06 2.316539235e-06 > 2.437629395e-06 2.250309118e-06 2.226272048e-06 > 9 2.399691624e-06 1 2.452207949e-06 2.380327529e-06 > 2.509800276e-06 2.263119465e-06 2.445519224e-06 > > > > This is the command that I ran: > >res_A08485 <- getExpression("A08485_gr4_mini.sam", > "/projects/mkreitzman_prj/expression_quantification_testing/testing/ test_data/strand_specific/transcriptome/Homo_sapiens.GRCh37.69.cdna.al l.fa", > + outPrefix="A08485_gr4_mini.run1", log=TRUE, seed=47) > > thanks for your help, > > Maayan > > On 03/26/2013 06:28 AM, Peter Glaus wrote: >> Dear Maayan, >> not sure what is the problem as I was not able to replicate this error. >> Could you please try: >> 1) running with verbose=TRUE option, so there will be a bit more output. >> 2) use option outPrefix. With this option, BitSeq does not use temporary >> directories. >> Examples: assuming your working direcotry in R is: >> /projects/mkreitzman_prj/expression_quantification_testing/testing/ BitSeq >> you could run: >> getExpression("A08485_gr4.sam_mini.sam", >> "../test_data/strand_specific/transcriptome/Homo_sapiens.GRCh37.69. cdna.all.fa", >> outPrefix="A08485_gr4.sam_mini.run1", log=TRUE, seed=47, verbose=TRUE); >> >> Which should create 3 file in the directory: >> A08485_gr4.sam_mini.run1.rpkm, A08485_gr4.sam_mini.run1.thetaMeans and >> A08485_gr4.sam_mini.run1.mean. >> >> (You can also use relative/absolute paths in the outPrefix, just make >> sure the directories exist.) >> >> If the run fails again, you can have a look into the directory and see >> what kind of files were created. >> >> Regards, >> Peter. >> >> On 25/03/13 22:00, Maayan Kreitzman wrote: >>> Hi Peter, >>> I'm still having some difficulty with the first step of BitSeq. >>> I made myself a mini dataset with just 5 million reads so that I could run everything in my R console without writing and R script and submitting to the cluster (just until I know what I'm doing) >>> >>> but, I got this error: >>> Error in getMeanVariance(c(outFile), meanFile, log = log, pretend = pretend) : >>> Conditions: file /tmp/RtmpRHbr1N/A08485_gr4.sam_mini-BS- 27c66ebf4a21.rpkm failed to open. >>> >>> since it seemed to be an issue with the temp directory, I tried to change my TMPDIR in the shell to somewhere with plenty of space: >>> [mkreitzman@xhost09 ~]$ echo $TMPDIR >>> /projects/wtss_scratch/maayan >>> >>> but, this did not make a difference to where the temp files were created. >>> I copied the whole session below. >>> >>> thanks, >>> Maayan >>> >>> >>>> res1 <- getExpression("/projects/mkreitzman_prj/expression_quanti fication_testing/testing/BitSeq/A08485_gr4.sam_mini.sam", "/projects/m kreitzman_prj/expression_quantification_testing/testing/test_data/stra nd_specific/transcriptome/Homo_sapiens.GRCh37.69.cdna.all.fa", >>> + log = TRUE, seed=47) >>> ## Computing alignment probabilities. >>> [time: +1.283333 m] >>> [time: +0.333333 m] >>> [time: +0.000000 m] >>> [time: +1.200000 m] >>> [time: +0.500000 m] >>> [time: +0.000000 m] >>> ## Estimating transcript expression levels. >>> Mappings: 1606295 >>> Ntotal: 2408007 >>> 10000 [time: +0.000000 s] >>> 100000 [time: +0.000000 s] >>> 1000000 [time: +2.000000 s] >>> Finished Reading! >>> Total hits = 3212590 >>> Isoforms: 183986 >>> Burn in: 1000 DONE. [time: +6.633333 m] >>> >>> Sampling DONE. [time: +9.600000 m] >>> rHat (for 1000 samples) >>> rHat (rHat from subset | tid | mean theta) >>> 1.0040 ( 1.0040 | 36651 | 0.0000) >>> 1.0040 ( 1.0072 | 178210 | 0.0000) >>> 1.0036 ( 1.0088 | 148680 | 0.0000) >>> Mean rHat of worst 10 transcripts: 1.003527 >>> Mean C0: (50 50 50 50 ). Nunmap: 801712 >>> >>> Producing 649 final samples. >>> >>> Sampling DONE. [time: +6.833333 m] >>> rHat (for 649 samples) >>> rHat (rHat from subset | tid | mean theta) >>> 1.0061 ( 1.0051 | 157831 | 0.0000) >>> 1.0059 ( 1.0048 | 104363 | 0.0000) >>> 1.0058 ( 1.0068 | 71659 | 0.0000) >>> Mean rHat of worst 10 transcripts: 1.005543 >>> Mean C0: (50 50 50 50 ). Nunmap: 801712 >>> >>> Total samples: 6596 >>> ## Computing means. >>> Error in getMeanVariance(c(outFile), meanFile, log = log, pretend = pretend) : >>> Conditions: file /tmp/RtmpRHbr1N/A08485_gr4.sam_mini-BS- 27c66ebf4a21.rpkm failed to open. >>> ________________________________________ >>> From: Peter Glaus [glaus@cs.man.ac.uk] >>> Sent: Friday, March 22, 2013 5:32 AM >>> To: Maayan Kreitzman >>> Subject: Re: BitSeq getExpression crash >>> >>> Hi Maayan, >>> I believe the error is caused by process running out of memory. I am not >>> 100% sure, but when I saw this kinds of errors before, it was caused by >>> lack of memory. The estimation can be quite CPU and memory intensive, so >>> I advice running it on a computing cluster instead of using regular >>> desktop/notebook machine. >>> >>> Regarding your function call, when running actual analysis (not just >>> testing/trying out), please use higher values for MCMC_burnIn, >>> MCMC_samplesN and MCMC_samplesSave (the default when leaving these blank >>> is 1000 and is usually "good enough"), the computation will take longer, >>> however the estimates will be much more accurate as well. >>> (The values 200, 200, 50 are used in the vignette because the example >>> data is very small, and the vignette has to run within time limit.) >>> >>> Also, for future reference when you have questions regarding >>> Bioconductor packages, please post to Bioconductor user mailing list >>> (and CC package author), as you might sometimes get replies from other >>> users and also your post might help some other users if they encounter >>> similar problem in the future. >>> >>> Best regards, >>> Peter. >>> >>> On 21/03/13 21:46, Maayan Kreitzman wrote: >>>> Dear Peter, >>>> I'm trying to run BitSeq, and am running into a problem after several hours of the getExpression function running. >>>> This same thing happened twice, on different servers. What weird is that not only does the function crash, it actually exits R. >>>> this is the error message: >>>> >>>> terminate called after throwing an instance of 'std::bad_alloc' >>>> what(): St9bad_alloc >>>> Aborted >>>> >>>> I have no experience whatsoever with R, so this may be a novice mistake, but your help would be greatly appreciated. >>>> I've copied the whole session below. >>>> >>>> thanks in advance, >>>> Maayan >>>> >>>> >>>>> library("BitSeq") >>>> Loading required package: Rsamtools >>>> Loading required package: IRanges >>>> Loading required package: BiocGenerics >>>> >>>> Attaching package: ‘BiocGenerics’ >>>> >>>> The following object(s) are masked from ‘package:stats’: >>>> >>>> xtabs >>>> >>>> The following object(s) are masked from ‘package:base’: >>>> >>>> anyDuplicated, cbind, colnames, duplicated, eval, Filter, Find, >>>> get, intersect, lapply, Map, mapply, mget, order, paste, pmax, >>>> pmax.int, pmin, pmin.int, Position, rbind, Reduce, rep.int, >>>> rownames, sapply, setdiff, table, tapply, union, unique >>>> >>>> Loading required package: GenomicRanges >>>> Loading required package: Biostrings >>>> Loading required package: zlibbioc >>>>> res1 <- getExpression("/projects/mkreitzman_prj/expression_quant ification_testing/testing/test_data/strand_specific/transcriptome/bowt ie2transcriptome/A08473_gr4.sam", >>>> + "/projects/mkreitzman_prj/expression_quantification_testing/tes ting/test_data/strand_specific/transcriptome/Homo_sapiens.GRCh37.69.cd na.all.fa", >>>> + log = TRUE, MCMC_burnIn=200,MCMC_samplesN=200,MCMC_samplesSave=50,seed=47) >>>> ## Computing alignment probabilities. >>>> [time: +1.866667 m] >>>> [time: +36.400000 m] >>>> [time: +0.000000 m] >>>> [time: +117.050000 m] >>>> [time: +0.500000 m] >>>> [time: +0.000000 m] >>>> ## Estimating transcript expression levels. >>>> Mappings: 71092830 >>>> Ntotal: 123098679 >>>> 10000 [time: +1.000000 s] >>>> 100000 [time: +0.000000 s] >>>> 1000000 [time: +3.000000 s] >>>> 10000000 [time: +25.000000 s] >>>> Read only 14186178 reads. >>>> Finished Reading! >>>> Total hits = 28372355 >>>> Isoforms: 183985 >>>> Burn in: 200 DONE. [time: +12.016667 m] >>>> >>>> Sampling DONE. [time: +12.850000 m] >>>> rHat (for 200 samples) >>>> rHat (rHat from subset | tid | mean theta) >>>> 1.0252 ( 1.1173 | 89080 | 0.0000) >>>> 1.0216 ( 1.1351 | 126802 | 0.0000) >>>> 1.0183 ( 1.0151 | 183201 | 0.0000) >>>> Mean rHat of worst 10 transcripts: 1.018596 >>>> Mean C0: (3516 3520 3529 3518 ). Nunmap: 52005849 >>>> >>>> Producing 33 final samples. >>>> >>>> Sampling DONE. [time: +2.166667 m] >>>> rHat (for 33 samples) >>>> rHat (rHat from subset | tid | mean theta) >>>> 1.1193 ( 1.1332 | 117458 | 0.0000) >>>> 1.1181 ( 1.1229 | 158878 | 0.0000) >>>> 1.1074 ( 1.1004 | 43840 | 0.0000) >>>> Mean rHat of worst 10 transcripts: 1.108279 >>>> Mean C0: (3528 3512 3523 3520 ). Nunmap: 52005849 >>>> >>>> Total samples: 932 >>>> terminate called after throwing an instance of 'std::bad_alloc' >>>> what(): St9bad_alloc >>>> Aborted > [[alternative HTML version deleted]]
Alignment PROcess BitSeq Alignment PROcess BitSeq • 1.4k views
ADD COMMENT
0
Entering edit mode
@maayan-kreitzman-5853
Last seen 10.3 years ago
Hi Peter, I think I've successfully run BitSeq on a few mini bam files. When I look in the resulting thetaMeans file, the transcript ID column is just numbers. Is there a way to have the proper IDs from the annotation? I'm copying the top of the file: # T => Mrows # M 183985 # file containing the mean value of theta - realtive abundace of fragments and counts # (overall mean, overall counts, mean of saved samples, and mean from every chain are reported) # columns: # <transcriptid> <meanthetaoverall> <meanreadcountoverall> <meanthetasaved> <chain1mean> <chain2mean> <chain3mean> <chain4mean> #thetaAct: 0.000000000e+00 0 0.000000000e+00 0.000000000e+00 0.000000000e+00 0.000000000e+00 0.000000000e+00 1 2.329622950e-06 1 2.336304311e-06 2.384341203e-06 2.274251443e-06 2.328343218e-06 2.331555935e-06 2 4.591043926e-06 1 4.591127729e-06 4.730587366e-06 4.525049801e-06 4.538165770e-06 4.570372768e-06 3 2.394661546e-06 1 2.414399085e-06 2.505416202e-06 2.208339897e-06 2.389828455e-06 2.475061630e-06 4 2.328241183e-06 1 2.353625902e-06 2.319018416e-06 2.315240398e-06 2.373995323e-06 2.304710595e-06 5 2.300820499e-06 1 2.362273440e-06 2.228251927e-06 2.281075630e-06 2.309094754e-06 2.384859685e-06 6 2.389348690e-06 1 2.410694464e-06 2.348631392e-06 2.522953884e-06 2.443492278e-06 2.242317204e-06 7 2.400561999e-06 1 2.419266200e-06 2.475096269e-06 2.441005711e-06 2.306151611e-06 2.379994405e-06 8 2.307687449e-06 1 2.195070192e-06 2.316539235e-06 2.437629395e-06 2.250309118e-06 2.226272048e-06 9 2.399691624e-06 1 2.452207949e-06 2.380327529e-06 2.509800276e-06 2.263119465e-06 2.445519224e-06 This is the command that I ran: >res_A08485 <- getExpression("A08485_gr4_mini.sam", "/projects/mkreitzman_prj/expression_quantification_testing/testing/te st_data/strand_specific/transcriptome/Homo_sapiens.GRCh37.69.cdna.all. fa", + outPrefix="A08485_gr4_mini.run1", log=TRUE, seed=47) thanks for your help, Maayan On 03/26/2013 06:28 AM, Peter Glaus wrote: > Dear Maayan, > not sure what is the problem as I was not able to replicate this error. > Could you please try: > 1) running with verbose=TRUE option, so there will be a bit more output. > 2) use option outPrefix. With this option, BitSeq does not use temporary > directories. > Examples: assuming your working direcotry in R is: > /projects/mkreitzman_prj/expression_quantification_testing/testing/B itSeq > you could run: > getExpression("A08485_gr4.sam_mini.sam", > "../test_data/strand_specific/transcriptome/Homo_sapiens.GRCh37.69.c dna.all.fa", > outPrefix="A08485_gr4.sam_mini.run1", log=TRUE, seed=47, verbose=TRUE); > > Which should create 3 file in the directory: > A08485_gr4.sam_mini.run1.rpkm, A08485_gr4.sam_mini.run1.thetaMeans and > A08485_gr4.sam_mini.run1.mean. > > (You can also use relative/absolute paths in the outPrefix, just make > sure the directories exist.) > > If the run fails again, you can have a look into the directory and see > what kind of files were created. > > Regards, > Peter. > > On 25/03/13 22:00, Maayan Kreitzman wrote: >> Hi Peter, >> I'm still having some difficulty with the first step of BitSeq. >> I made myself a mini dataset with just 5 million reads so that I could run everything in my R console without writing and R script and submitting to the cluster (just until I know what I'm doing) >> >> but, I got this error: >> Error in getMeanVariance(c(outFile), meanFile, log = log, pretend = pretend) : >> Conditions: file /tmp/RtmpRHbr1N/A08485_gr4.sam_mini-BS- 27c66ebf4a21.rpkm failed to open. >> >> since it seemed to be an issue with the temp directory, I tried to change my TMPDIR in the shell to somewhere with plenty of space: >> [mkreitzman@xhost09 ~]$ echo $TMPDIR >> /projects/wtss_scratch/maayan >> >> but, this did not make a difference to where the temp files were created. >> I copied the whole session below. >> >> thanks, >> Maayan >> >> >>> res1 <- getExpression("/projects/mkreitzman_prj/expression_quantif ication_testing/testing/BitSeq/A08485_gr4.sam_mini.sam", "/projects/mk reitzman_prj/expression_quantification_testing/testing/test_data/stran d_specific/transcriptome/Homo_sapiens.GRCh37.69.cdna.all.fa", >> + log = TRUE, seed=47) >> ## Computing alignment probabilities. >> [time: +1.283333 m] >> [time: +0.333333 m] >> [time: +0.000000 m] >> [time: +1.200000 m] >> [time: +0.500000 m] >> [time: +0.000000 m] >> ## Estimating transcript expression levels. >> Mappings: 1606295 >> Ntotal: 2408007 >> 10000 [time: +0.000000 s] >> 100000 [time: +0.000000 s] >> 1000000 [time: +2.000000 s] >> Finished Reading! >> Total hits = 3212590 >> Isoforms: 183986 >> Burn in: 1000 DONE. [time: +6.633333 m] >> >> Sampling DONE. [time: +9.600000 m] >> rHat (for 1000 samples) >> rHat (rHat from subset | tid | mean theta) >> 1.0040 ( 1.0040 | 36651 | 0.0000) >> 1.0040 ( 1.0072 | 178210 | 0.0000) >> 1.0036 ( 1.0088 | 148680 | 0.0000) >> Mean rHat of worst 10 transcripts: 1.003527 >> Mean C0: (50 50 50 50 ). Nunmap: 801712 >> >> Producing 649 final samples. >> >> Sampling DONE. [time: +6.833333 m] >> rHat (for 649 samples) >> rHat (rHat from subset | tid | mean theta) >> 1.0061 ( 1.0051 | 157831 | 0.0000) >> 1.0059 ( 1.0048 | 104363 | 0.0000) >> 1.0058 ( 1.0068 | 71659 | 0.0000) >> Mean rHat of worst 10 transcripts: 1.005543 >> Mean C0: (50 50 50 50 ). Nunmap: 801712 >> >> Total samples: 6596 >> ## Computing means. >> Error in getMeanVariance(c(outFile), meanFile, log = log, pretend = pretend) : >> Conditions: file /tmp/RtmpRHbr1N/A08485_gr4.sam_mini-BS- 27c66ebf4a21.rpkm failed to open. >> ________________________________________ >> From: Peter Glaus [glaus@cs.man.ac.uk] >> Sent: Friday, March 22, 2013 5:32 AM >> To: Maayan Kreitzman >> Subject: Re: BitSeq getExpression crash >> >> Hi Maayan, >> I believe the error is caused by process running out of memory. I am not >> 100% sure, but when I saw this kinds of errors before, it was caused by >> lack of memory. The estimation can be quite CPU and memory intensive, so >> I advice running it on a computing cluster instead of using regular >> desktop/notebook machine. >> >> Regarding your function call, when running actual analysis (not just >> testing/trying out), please use higher values for MCMC_burnIn, >> MCMC_samplesN and MCMC_samplesSave (the default when leaving these blank >> is 1000 and is usually "good enough"), the computation will take longer, >> however the estimates will be much more accurate as well. >> (The values 200, 200, 50 are used in the vignette because the example >> data is very small, and the vignette has to run within time limit.) >> >> Also, for future reference when you have questions regarding >> Bioconductor packages, please post to Bioconductor user mailing list >> (and CC package author), as you might sometimes get replies from other >> users and also your post might help some other users if they encounter >> similar problem in the future. >> >> Best regards, >> Peter. >> >> On 21/03/13 21:46, Maayan Kreitzman wrote: >>> Dear Peter, >>> I'm trying to run BitSeq, and am running into a problem after several hours of the getExpression function running. >>> This same thing happened twice, on different servers. What weird is that not only does the function crash, it actually exits R. >>> this is the error message: >>> >>> terminate called after throwing an instance of 'std::bad_alloc' >>> what(): St9bad_alloc >>> Aborted >>> >>> I have no experience whatsoever with R, so this may be a novice mistake, but your help would be greatly appreciated. >>> I've copied the whole session below. >>> >>> thanks in advance, >>> Maayan >>> >>> >>>> library("BitSeq") >>> Loading required package: Rsamtools >>> Loading required package: IRanges >>> Loading required package: BiocGenerics >>> >>> Attaching package: ‘BiocGenerics’ >>> >>> The following object(s) are masked from ‘package:stats’: >>> >>> xtabs >>> >>> The following object(s) are masked from ‘package:base’: >>> >>> anyDuplicated, cbind, colnames, duplicated, eval, Filter, Find, >>> get, intersect, lapply, Map, mapply, mget, order, paste, pmax, >>> pmax.int, pmin, pmin.int, Position, rbind, Reduce, rep.int, >>> rownames, sapply, setdiff, table, tapply, union, unique >>> >>> Loading required package: GenomicRanges >>> Loading required package: Biostrings >>> Loading required package: zlibbioc >>>> res1 <- getExpression("/projects/mkreitzman_prj/expression_quanti fication_testing/testing/test_data/strand_specific/transcriptome/bowti e2transcriptome/A08473_gr4.sam", >>> + "/projects/mkreitzman_prj/expression_quantification_testing/test ing/test_data/strand_specific/transcriptome/Homo_sapiens.GRCh37.69.cdn a.all.fa", >>> + log = TRUE, MCMC_burnIn=200,MCMC_samplesN=200,MCMC_samplesSave=50,seed=47) >>> ## Computing alignment probabilities. >>> [time: +1.866667 m] >>> [time: +36.400000 m] >>> [time: +0.000000 m] >>> [time: +117.050000 m] >>> [time: +0.500000 m] >>> [time: +0.000000 m] >>> ## Estimating transcript expression levels. >>> Mappings: 71092830 >>> Ntotal: 123098679 >>> 10000 [time: +1.000000 s] >>> 100000 [time: +0.000000 s] >>> 1000000 [time: +3.000000 s] >>> 10000000 [time: +25.000000 s] >>> Read only 14186178 reads. >>> Finished Reading! >>> Total hits = 28372355 >>> Isoforms: 183985 >>> Burn in: 200 DONE. [time: +12.016667 m] >>> >>> Sampling DONE. [time: +12.850000 m] >>> rHat (for 200 samples) >>> rHat (rHat from subset | tid | mean theta) >>> 1.0252 ( 1.1173 | 89080 | 0.0000) >>> 1.0216 ( 1.1351 | 126802 | 0.0000) >>> 1.0183 ( 1.0151 | 183201 | 0.0000) >>> Mean rHat of worst 10 transcripts: 1.018596 >>> Mean C0: (3516 3520 3529 3518 ). Nunmap: 52005849 >>> >>> Producing 33 final samples. >>> >>> Sampling DONE. [time: +2.166667 m] >>> rHat (for 33 samples) >>> rHat (rHat from subset | tid | mean theta) >>> 1.1193 ( 1.1332 | 117458 | 0.0000) >>> 1.1181 ( 1.1229 | 158878 | 0.0000) >>> 1.1074 ( 1.1004 | 43840 | 0.0000) >>> Mean rHat of worst 10 transcripts: 1.108279 >>> Mean C0: (3528 3512 3523 3520 ). Nunmap: 52005849 >>> >>> Total samples: 932 >>> terminate called after throwing an instance of 'std::bad_alloc' >>> what(): St9bad_alloc >>> Aborted [[alternative HTML version deleted]]
ADD COMMENT
0
Entering edit mode
@maayan-kreitzman-5853
Last seen 10.3 years ago
Hi Peter, thanks for that. head(res_A08485$exp) is great. BitSeq does produce the .tr files for each of the input libraries. Another question, regarding getDE: which tr file should I provide to getDE? all of them? how do I specify them? Also, I got the error below when running getDE. The libraries are just the first 1 million reads from name-sorted alignments from bowtie2, so they are quite small (I've noticed the beginning of name-sorted files have a lot of unaligned reads in them). Not sure if that may have caused a problem. I looked for " lowess-f" in the manual but didn't find anything. thanks again, maayan > group4Files = c(res_A08473$fn,res_A08485$fn) > group3Files = c(res_A08606$fn,res_A08616$fn) > de1 <- getDE(list(group3Files, group4Files), samples=TRUE) ## No trInfoFile provided, will try using A08606_gr3_mini.run1.tr for result's row names. Computing overall mean. Estimating hyperparameters. [time: +1.000000 s] Error in estimateHyperPar(parFile, conditions, meanFile = meanFile, norm = norm, : Main: Unable to produce smooth beta >0. Try adjusting the parameter lowess-f. On 04/05/2013 03:06 AM, Peter Glaus wrote: > Hi Maayan, > if you want to work with files, there should be a file > 'A08485_gr4_mini.run1.tr' which has information about transcripts, > keeps the same order as .thetaMeans file and the second column > contains transcript names. (all BitSeq output keeps the same order of > reference sequences) > > Also, I think the first element of the result 'res_A08485$exp' > contains mean expression and variance and should contain the > transcript names as well. Try: > head(res_A08485$exp) > > Peter. > > On 04/04/13 23:12, Maayan Kreitzman wrote: >> Hi Peter, >> I think I've successfully run BitSeq on a few mini bam files. When I >> look in the resulting thetaMeans file, the transcript ID column is >> just numbers. Is there a way to have the proper IDs from the >> annotation? I'm copying the top of the file: >> >> # T => Mrows >> # M 183985 >> # file containing the mean value of theta - realtive abundace of >> fragments and counts >> # (overall mean, overall counts, mean of saved samples, and mean >> from every chain are reported) >> # columns: >> # <transcriptid> <meanthetaoverall> <meanreadcountoverall> >> <meanthetasaved> <chain1mean> <chain2mean> <chain3mean> <chain4mean> >> #thetaAct: 0.000000000e+00 0 0.000000000e+00 0.000000000e+00 >> 0.000000000e+00 0.000000000e+00 0.000000000e+00 >> 1 2.329622950e-06 1 2.336304311e-06 2.384341203e-06 >> 2.274251443e-06 2.328343218e-06 2.331555935e-06 >> 2 4.591043926e-06 1 4.591127729e-06 4.730587366e-06 >> 4.525049801e-06 4.538165770e-06 4.570372768e-06 >> 3 2.394661546e-06 1 2.414399085e-06 2.505416202e-06 >> 2.208339897e-06 2.389828455e-06 2.475061630e-06 >> 4 2.328241183e-06 1 2.353625902e-06 2.319018416e-06 >> 2.315240398e-06 2.373995323e-06 2.304710595e-06 >> 5 2.300820499e-06 1 2.362273440e-06 2.228251927e-06 >> 2.281075630e-06 2.309094754e-06 2.384859685e-06 >> 6 2.389348690e-06 1 2.410694464e-06 2.348631392e-06 >> 2.522953884e-06 2.443492278e-06 2.242317204e-06 >> 7 2.400561999e-06 1 2.419266200e-06 2.475096269e-06 >> 2.441005711e-06 2.306151611e-06 2.379994405e-06 >> 8 2.307687449e-06 1 2.195070192e-06 2.316539235e-06 >> 2.437629395e-06 2.250309118e-06 2.226272048e-06 >> 9 2.399691624e-06 1 2.452207949e-06 2.380327529e-06 >> 2.509800276e-06 2.263119465e-06 2.445519224e-06 >> >> >> >> This is the command that I ran: >> >res_A08485 <- getExpression("A08485_gr4_mini.sam", >> "/projects/mkreitzman_prj/expression_quantification_testing/testing /test_data/strand_specific/transcriptome/Homo_sapiens.GRCh37.69.cdna.a ll.fa", >> + outPrefix="A08485_gr4_mini.run1", log=TRUE, seed=47) >> >> thanks for your help, >> >> Maayan >> >> On 03/26/2013 06:28 AM, Peter Glaus wrote: >>> Dear Maayan, >>> not sure what is the problem as I was not able to replicate this error. >>> Could you please try: >>> 1) running with verbose=TRUE option, so there will be a bit more output. >>> 2) use option outPrefix. With this option, BitSeq does not use temporary >>> directories. >>> Examples: assuming your working direcotry in R is: >>> /projects/mkreitzman_prj/expression_quantification_testing/testing /BitSeq >>> you could run: >>> getExpression("A08485_gr4.sam_mini.sam", >>> "../test_data/strand_specific/transcriptome/Homo_sapiens.GRCh37.69 .cdna.all.fa", >>> outPrefix="A08485_gr4.sam_mini.run1", log=TRUE, seed=47, verbose=TRUE); >>> >>> Which should create 3 file in the directory: >>> A08485_gr4.sam_mini.run1.rpkm, A08485_gr4.sam_mini.run1.thetaMeans and >>> A08485_gr4.sam_mini.run1.mean. >>> >>> (You can also use relative/absolute paths in the outPrefix, just make >>> sure the directories exist.) >>> >>> If the run fails again, you can have a look into the directory and see >>> what kind of files were created. >>> >>> Regards, >>> Peter. >>> >>> On 25/03/13 22:00, Maayan Kreitzman wrote: >>>> Hi Peter, >>>> I'm still having some difficulty with the first step of BitSeq. >>>> I made myself a mini dataset with just 5 million reads so that I could run everything in my R console without writing and R script and submitting to the cluster (just until I know what I'm doing) >>>> >>>> but, I got this error: >>>> Error in getMeanVariance(c(outFile), meanFile, log = log, pretend = pretend) : >>>> Conditions: file /tmp/RtmpRHbr1N/A08485_gr4.sam_mini-BS- 27c66ebf4a21.rpkm failed to open. >>>> >>>> since it seemed to be an issue with the temp directory, I tried to change my TMPDIR in the shell to somewhere with plenty of space: >>>> [mkreitzman@xhost09 ~]$ echo $TMPDIR >>>> /projects/wtss_scratch/maayan >>>> >>>> but, this did not make a difference to where the temp files were created. >>>> I copied the whole session below. >>>> >>>> thanks, >>>> Maayan >>>> >>>> >>>>> res1 <- getExpression("/projects/mkreitzman_prj/expression_quant ification_testing/testing/BitSeq/A08485_gr4.sam_mini.sam", "/projects/ mkreitzman_prj/expression_quantification_testing/testing/test_data/str and_specific/transcriptome/Homo_sapiens.GRCh37.69.cdna.all.fa", >>>> + log = TRUE, seed=47) >>>> ## Computing alignment probabilities. >>>> [time: +1.283333 m] >>>> [time: +0.333333 m] >>>> [time: +0.000000 m] >>>> [time: +1.200000 m] >>>> [time: +0.500000 m] >>>> [time: +0.000000 m] >>>> ## Estimating transcript expression levels. >>>> Mappings: 1606295 >>>> Ntotal: 2408007 >>>> 10000 [time: +0.000000 s] >>>> 100000 [time: +0.000000 s] >>>> 1000000 [time: +2.000000 s] >>>> Finished Reading! >>>> Total hits = 3212590 >>>> Isoforms: 183986 >>>> Burn in: 1000 DONE. [time: +6.633333 m] >>>> >>>> Sampling DONE. [time: +9.600000 m] >>>> rHat (for 1000 samples) >>>> rHat (rHat from subset | tid | mean theta) >>>> 1.0040 ( 1.0040 | 36651 | 0.0000) >>>> 1.0040 ( 1.0072 | 178210 | 0.0000) >>>> 1.0036 ( 1.0088 | 148680 | 0.0000) >>>> Mean rHat of worst 10 transcripts: 1.003527 >>>> Mean C0: (50 50 50 50 ). Nunmap: 801712 >>>> >>>> Producing 649 final samples. >>>> >>>> Sampling DONE. [time: +6.833333 m] >>>> rHat (for 649 samples) >>>> rHat (rHat from subset | tid | mean theta) >>>> 1.0061 ( 1.0051 | 157831 | 0.0000) >>>> 1.0059 ( 1.0048 | 104363 | 0.0000) >>>> 1.0058 ( 1.0068 | 71659 | 0.0000) >>>> Mean rHat of worst 10 transcripts: 1.005543 >>>> Mean C0: (50 50 50 50 ). Nunmap: 801712 >>>> >>>> Total samples: 6596 >>>> ## Computing means. >>>> Error in getMeanVariance(c(outFile), meanFile, log = log, pretend = pretend) : >>>> Conditions: file /tmp/RtmpRHbr1N/A08485_gr4.sam_mini-BS- 27c66ebf4a21.rpkm failed to open. >>>> ________________________________________ >>>> From: Peter Glaus [glaus@cs.man.ac.uk] >>>> Sent: Friday, March 22, 2013 5:32 AM >>>> To: Maayan Kreitzman >>>> Subject: Re: BitSeq getExpression crash >>>> >>>> Hi Maayan, >>>> I believe the error is caused by process running out of memory. I am not >>>> 100% sure, but when I saw this kinds of errors before, it was caused by >>>> lack of memory. The estimation can be quite CPU and memory intensive, so >>>> I advice running it on a computing cluster instead of using regular >>>> desktop/notebook machine. >>>> >>>> Regarding your function call, when running actual analysis (not just >>>> testing/trying out), please use higher values for MCMC_burnIn, >>>> MCMC_samplesN and MCMC_samplesSave (the default when leaving these blank >>>> is 1000 and is usually "good enough"), the computation will take longer, >>>> however the estimates will be much more accurate as well. >>>> (The values 200, 200, 50 are used in the vignette because the example >>>> data is very small, and the vignette has to run within time limit.) >>>> >>>> Also, for future reference when you have questions regarding >>>> Bioconductor packages, please post to Bioconductor user mailing list >>>> (and CC package author), as you might sometimes get replies from other >>>> users and also your post might help some other users if they encounter >>>> similar problem in the future. >>>> >>>> Best regards, >>>> Peter. >>>> >>>> On 21/03/13 21:46, Maayan Kreitzman wrote: >>>>> Dear Peter, >>>>> I'm trying to run BitSeq, and am running into a problem after several hours of the getExpression function running. >>>>> This same thing happened twice, on different servers. What weird is that not only does the function crash, it actually exits R. >>>>> this is the error message: >>>>> >>>>> terminate called after throwing an instance of 'std::bad_alloc' >>>>> what(): St9bad_alloc >>>>> Aborted >>>>> >>>>> I have no experience whatsoever with R, so this may be a novice mistake, but your help would be greatly appreciated. >>>>> I've copied the whole session below. >>>>> >>>>> thanks in advance, >>>>> Maayan >>>>> >>>>> >>>>>> library("BitSeq") >>>>> Loading required package: Rsamtools >>>>> Loading required package: IRanges >>>>> Loading required package: BiocGenerics >>>>> >>>>> Attaching package: ‘BiocGenerics’ >>>>> >>>>> The following object(s) are masked from ‘package:stats’: >>>>> >>>>> xtabs >>>>> >>>>> The following object(s) are masked from ‘package:base’: >>>>> >>>>> anyDuplicated, cbind, colnames, duplicated, eval, Filter, Find, >>>>> get, intersect, lapply, Map, mapply, mget, order, paste, pmax, >>>>> pmax.int, pmin, pmin.int, Position, rbind, Reduce, rep.int, >>>>> rownames, sapply, setdiff, table, tapply, union, unique >>>>> >>>>> Loading required package: GenomicRanges >>>>> Loading required package: Biostrings >>>>> Loading required package: zlibbioc >>>>>> res1 <- getExpression("/projects/mkreitzman_prj/expression_quan tification_testing/testing/test_data/strand_specific/transcriptome/bow tie2transcriptome/A08473_gr4.sam", >>>>> + "/projects/mkreitzman_prj/expression_quantification_testing/te sting/test_data/strand_specific/transcriptome/Homo_sapiens.GRCh37.69.c dna.all.fa", >>>>> + log = TRUE, MCMC_burnIn=200,MCMC_samplesN=200,MCMC_samplesSave=50,seed=47) >>>>> ## Computing alignment probabilities. >>>>> [time: +1.866667 m] >>>>> [time: +36.400000 m] >>>>> [time: +0.000000 m] >>>>> [time: +117.050000 m] >>>>> [time: +0.500000 m] >>>>> [time: +0.000000 m] >>>>> ## Estimating transcript expression levels. >>>>> Mappings: 71092830 >>>>> Ntotal: 123098679 >>>>> 10000 [time: +1.000000 s] >>>>> 100000 [time: +0.000000 s] >>>>> 1000000 [time: +3.000000 s] >>>>> 10000000 [time: +25.000000 s] >>>>> Read only 14186178 reads. >>>>> Finished Reading! >>>>> Total hits = 28372355 >>>>> Isoforms: 183985 >>>>> Burn in: 200 DONE. [time: +12.016667 m] >>>>> >>>>> Sampling DONE. [time: +12.850000 m] >>>>> rHat (for 200 samples) >>>>> rHat (rHat from subset | tid | mean theta) >>>>> 1.0252 ( 1.1173 | 89080 | 0.0000) >>>>> 1.0216 ( 1.1351 | 126802 | 0.0000) >>>>> 1.0183 ( 1.0151 | 183201 | 0.0000) >>>>> Mean rHat of worst 10 transcripts: 1.018596 >>>>> Mean C0: (3516 3520 3529 3518 ). Nunmap: 52005849 >>>>> >>>>> Producing 33 final samples. >>>>> >>>>> Sampling DONE. [time: +2.166667 m] >>>>> rHat (for 33 samples) >>>>> rHat (rHat from subset | tid | mean theta) >>>>> 1.1193 ( 1.1332 | 117458 | 0.0000) >>>>> 1.1181 ( 1.1229 | 158878 | 0.0000) >>>>> 1.1074 ( 1.1004 | 43840 | 0.0000) >>>>> Mean rHat of worst 10 transcripts: 1.108279 >>>>> Mean C0: (3528 3512 3523 3520 ). Nunmap: 52005849 >>>>> >>>>> Total samples: 932 >>>>> terminate called after throwing an instance of 'std::bad_alloc' >>>>> what(): St9bad_alloc >>>>> Aborted >> > [[alternative HTML version deleted]]
ADD COMMENT

Login before adding your answer.

Traffic: 765 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6