Entering edit mode
Peter Glaus
▴
70
@peter-glaus-5589
Last seen 10.3 years ago
Hi Maayan,
if you want to work with files, there should be a file
'A08485_gr4_mini.run1.tr' which has information about transcripts,
keeps
the same order as .thetaMeans file and the second column contains
transcript names. (all BitSeq output keeps the same order of reference
sequences)
Also, I think the first element of the result 'res_A08485$exp'
contains
mean expression and variance and should contain the transcript names
as
well. Try:
head(res_A08485$exp)
Peter.
On 04/04/13 23:12, Maayan Kreitzman wrote:
> Hi Peter,
> I think I've successfully run BitSeq on a few mini bam files. When I
> look in the resulting thetaMeans file, the transcript ID column is
> just numbers. Is there a way to have the proper IDs from the
> annotation? I'm copying the top of the file:
>
> # T => Mrows
> # M 183985
> # file containing the mean value of theta - realtive abundace of
> fragments and counts
> # (overall mean, overall counts, mean of saved samples, and mean
> from every chain are reported)
> # columns:
> # <transcriptid> <meanthetaoverall> <meanreadcountoverall>
> <meanthetasaved> <chain1mean> <chain2mean> <chain3mean>
<chain4mean>
> #thetaAct: 0.000000000e+00 0 0.000000000e+00 0.000000000e+00
> 0.000000000e+00 0.000000000e+00 0.000000000e+00
> 1 2.329622950e-06 1 2.336304311e-06 2.384341203e-06
> 2.274251443e-06 2.328343218e-06 2.331555935e-06
> 2 4.591043926e-06 1 4.591127729e-06 4.730587366e-06
> 4.525049801e-06 4.538165770e-06 4.570372768e-06
> 3 2.394661546e-06 1 2.414399085e-06 2.505416202e-06
> 2.208339897e-06 2.389828455e-06 2.475061630e-06
> 4 2.328241183e-06 1 2.353625902e-06 2.319018416e-06
> 2.315240398e-06 2.373995323e-06 2.304710595e-06
> 5 2.300820499e-06 1 2.362273440e-06 2.228251927e-06
> 2.281075630e-06 2.309094754e-06 2.384859685e-06
> 6 2.389348690e-06 1 2.410694464e-06 2.348631392e-06
> 2.522953884e-06 2.443492278e-06 2.242317204e-06
> 7 2.400561999e-06 1 2.419266200e-06 2.475096269e-06
> 2.441005711e-06 2.306151611e-06 2.379994405e-06
> 8 2.307687449e-06 1 2.195070192e-06 2.316539235e-06
> 2.437629395e-06 2.250309118e-06 2.226272048e-06
> 9 2.399691624e-06 1 2.452207949e-06 2.380327529e-06
> 2.509800276e-06 2.263119465e-06 2.445519224e-06
>
>
>
> This is the command that I ran:
> >res_A08485 <- getExpression("A08485_gr4_mini.sam",
> "/projects/mkreitzman_prj/expression_quantification_testing/testing/
test_data/strand_specific/transcriptome/Homo_sapiens.GRCh37.69.cdna.al
l.fa",
> + outPrefix="A08485_gr4_mini.run1", log=TRUE, seed=47)
>
> thanks for your help,
>
> Maayan
>
> On 03/26/2013 06:28 AM, Peter Glaus wrote:
>> Dear Maayan,
>> not sure what is the problem as I was not able to replicate this
error.
>> Could you please try:
>> 1) running with verbose=TRUE option, so there will be a bit more
output.
>> 2) use option outPrefix. With this option, BitSeq does not use
temporary
>> directories.
>> Examples: assuming your working direcotry in R is:
>> /projects/mkreitzman_prj/expression_quantification_testing/testing/
BitSeq
>> you could run:
>> getExpression("A08485_gr4.sam_mini.sam",
>> "../test_data/strand_specific/transcriptome/Homo_sapiens.GRCh37.69.
cdna.all.fa",
>> outPrefix="A08485_gr4.sam_mini.run1", log=TRUE, seed=47,
verbose=TRUE);
>>
>> Which should create 3 file in the directory:
>> A08485_gr4.sam_mini.run1.rpkm, A08485_gr4.sam_mini.run1.thetaMeans
and
>> A08485_gr4.sam_mini.run1.mean.
>>
>> (You can also use relative/absolute paths in the outPrefix, just
make
>> sure the directories exist.)
>>
>> If the run fails again, you can have a look into the directory and
see
>> what kind of files were created.
>>
>> Regards,
>> Peter.
>>
>> On 25/03/13 22:00, Maayan Kreitzman wrote:
>>> Hi Peter,
>>> I'm still having some difficulty with the first step of BitSeq.
>>> I made myself a mini dataset with just 5 million reads so that I
could run everything in my R console without writing and R script and
submitting to the cluster (just until I know what I'm doing)
>>>
>>> but, I got this error:
>>> Error in getMeanVariance(c(outFile), meanFile, log = log, pretend
= pretend) :
>>> Conditions: file /tmp/RtmpRHbr1N/A08485_gr4.sam_mini-BS-
27c66ebf4a21.rpkm failed to open.
>>>
>>> since it seemed to be an issue with the temp directory, I tried to
change my TMPDIR in the shell to somewhere with plenty of space:
>>> [mkreitzman@xhost09 ~]$ echo $TMPDIR
>>> /projects/wtss_scratch/maayan
>>>
>>> but, this did not make a difference to where the temp files were
created.
>>> I copied the whole session below.
>>>
>>> thanks,
>>> Maayan
>>>
>>>
>>>> res1 <- getExpression("/projects/mkreitzman_prj/expression_quanti
fication_testing/testing/BitSeq/A08485_gr4.sam_mini.sam", "/projects/m
kreitzman_prj/expression_quantification_testing/testing/test_data/stra
nd_specific/transcriptome/Homo_sapiens.GRCh37.69.cdna.all.fa",
>>> + log = TRUE, seed=47)
>>> ## Computing alignment probabilities.
>>> [time: +1.283333 m]
>>> [time: +0.333333 m]
>>> [time: +0.000000 m]
>>> [time: +1.200000 m]
>>> [time: +0.500000 m]
>>> [time: +0.000000 m]
>>> ## Estimating transcript expression levels.
>>> Mappings: 1606295
>>> Ntotal: 2408007
>>> 10000 [time: +0.000000 s]
>>> 100000 [time: +0.000000 s]
>>> 1000000 [time: +2.000000 s]
>>> Finished Reading!
>>> Total hits = 3212590
>>> Isoforms: 183986
>>> Burn in: 1000 DONE. [time: +6.633333 m]
>>>
>>> Sampling DONE. [time: +9.600000 m]
>>> rHat (for 1000 samples)
>>> rHat (rHat from subset | tid | mean theta)
>>> 1.0040 ( 1.0040 | 36651 | 0.0000)
>>> 1.0040 ( 1.0072 | 178210 | 0.0000)
>>> 1.0036 ( 1.0088 | 148680 | 0.0000)
>>> Mean rHat of worst 10 transcripts: 1.003527
>>> Mean C0: (50 50 50 50 ). Nunmap: 801712
>>>
>>> Producing 649 final samples.
>>>
>>> Sampling DONE. [time: +6.833333 m]
>>> rHat (for 649 samples)
>>> rHat (rHat from subset | tid | mean theta)
>>> 1.0061 ( 1.0051 | 157831 | 0.0000)
>>> 1.0059 ( 1.0048 | 104363 | 0.0000)
>>> 1.0058 ( 1.0068 | 71659 | 0.0000)
>>> Mean rHat of worst 10 transcripts: 1.005543
>>> Mean C0: (50 50 50 50 ). Nunmap: 801712
>>>
>>> Total samples: 6596
>>> ## Computing means.
>>> Error in getMeanVariance(c(outFile), meanFile, log = log, pretend
= pretend) :
>>> Conditions: file /tmp/RtmpRHbr1N/A08485_gr4.sam_mini-BS-
27c66ebf4a21.rpkm failed to open.
>>> ________________________________________
>>> From: Peter Glaus [glaus@cs.man.ac.uk]
>>> Sent: Friday, March 22, 2013 5:32 AM
>>> To: Maayan Kreitzman
>>> Subject: Re: BitSeq getExpression crash
>>>
>>> Hi Maayan,
>>> I believe the error is caused by process running out of memory. I
am not
>>> 100% sure, but when I saw this kinds of errors before, it was
caused by
>>> lack of memory. The estimation can be quite CPU and memory
intensive, so
>>> I advice running it on a computing cluster instead of using
regular
>>> desktop/notebook machine.
>>>
>>> Regarding your function call, when running actual analysis (not
just
>>> testing/trying out), please use higher values for MCMC_burnIn,
>>> MCMC_samplesN and MCMC_samplesSave (the default when leaving these
blank
>>> is 1000 and is usually "good enough"), the computation will take
longer,
>>> however the estimates will be much more accurate as well.
>>> (The values 200, 200, 50 are used in the vignette because the
example
>>> data is very small, and the vignette has to run within time
limit.)
>>>
>>> Also, for future reference when you have questions regarding
>>> Bioconductor packages, please post to Bioconductor user mailing
list
>>> (and CC package author), as you might sometimes get replies from
other
>>> users and also your post might help some other users if they
encounter
>>> similar problem in the future.
>>>
>>> Best regards,
>>> Peter.
>>>
>>> On 21/03/13 21:46, Maayan Kreitzman wrote:
>>>> Dear Peter,
>>>> I'm trying to run BitSeq, and am running into a problem after
several hours of the getExpression function running.
>>>> This same thing happened twice, on different servers. What weird
is that not only does the function crash, it actually exits R.
>>>> this is the error message:
>>>>
>>>> terminate called after throwing an instance of 'std::bad_alloc'
>>>> what(): St9bad_alloc
>>>> Aborted
>>>>
>>>> I have no experience whatsoever with R, so this may be a novice
mistake, but your help would be greatly appreciated.
>>>> I've copied the whole session below.
>>>>
>>>> thanks in advance,
>>>> Maayan
>>>>
>>>>
>>>>> library("BitSeq")
>>>> Loading required package: Rsamtools
>>>> Loading required package: IRanges
>>>> Loading required package: BiocGenerics
>>>>
>>>> Attaching package: BiocGenerics
>>>>
>>>> The following object(s) are masked from package:stats:
>>>>
>>>> xtabs
>>>>
>>>> The following object(s) are masked from package:base:
>>>>
>>>> anyDuplicated, cbind, colnames, duplicated, eval, Filter,
Find,
>>>> get, intersect, lapply, Map, mapply, mget, order, paste,
pmax,
>>>> pmax.int, pmin, pmin.int, Position, rbind, Reduce,
rep.int,
>>>> rownames, sapply, setdiff, table, tapply, union, unique
>>>>
>>>> Loading required package: GenomicRanges
>>>> Loading required package: Biostrings
>>>> Loading required package: zlibbioc
>>>>> res1 <- getExpression("/projects/mkreitzman_prj/expression_quant
ification_testing/testing/test_data/strand_specific/transcriptome/bowt
ie2transcriptome/A08473_gr4.sam",
>>>> + "/projects/mkreitzman_prj/expression_quantification_testing/tes
ting/test_data/strand_specific/transcriptome/Homo_sapiens.GRCh37.69.cd
na.all.fa",
>>>> + log = TRUE,
MCMC_burnIn=200,MCMC_samplesN=200,MCMC_samplesSave=50,seed=47)
>>>> ## Computing alignment probabilities.
>>>> [time: +1.866667 m]
>>>> [time: +36.400000 m]
>>>> [time: +0.000000 m]
>>>> [time: +117.050000 m]
>>>> [time: +0.500000 m]
>>>> [time: +0.000000 m]
>>>> ## Estimating transcript expression levels.
>>>> Mappings: 71092830
>>>> Ntotal: 123098679
>>>> 10000 [time: +1.000000 s]
>>>> 100000 [time: +0.000000 s]
>>>> 1000000 [time: +3.000000 s]
>>>> 10000000 [time: +25.000000 s]
>>>> Read only 14186178 reads.
>>>> Finished Reading!
>>>> Total hits = 28372355
>>>> Isoforms: 183985
>>>> Burn in: 200 DONE. [time: +12.016667 m]
>>>>
>>>> Sampling DONE. [time: +12.850000 m]
>>>> rHat (for 200 samples)
>>>> rHat (rHat from subset | tid | mean theta)
>>>> 1.0252 ( 1.1173 | 89080 | 0.0000)
>>>> 1.0216 ( 1.1351 | 126802 | 0.0000)
>>>> 1.0183 ( 1.0151 | 183201 | 0.0000)
>>>> Mean rHat of worst 10 transcripts: 1.018596
>>>> Mean C0: (3516 3520 3529 3518 ). Nunmap: 52005849
>>>>
>>>> Producing 33 final samples.
>>>>
>>>> Sampling DONE. [time: +2.166667 m]
>>>> rHat (for 33 samples)
>>>> rHat (rHat from subset | tid | mean theta)
>>>> 1.1193 ( 1.1332 | 117458 | 0.0000)
>>>> 1.1181 ( 1.1229 | 158878 | 0.0000)
>>>> 1.1074 ( 1.1004 | 43840 | 0.0000)
>>>> Mean rHat of worst 10 transcripts: 1.108279
>>>> Mean C0: (3528 3512 3523 3520 ). Nunmap: 52005849
>>>>
>>>> Total samples: 932
>>>> terminate called after throwing an instance of 'std::bad_alloc'
>>>> what(): St9bad_alloc
>>>> Aborted
>
[[alternative HTML version deleted]]