Entering edit mode
Niu, Liang NIH/NIEHS [E]
▴
50
@niu-liang-nihniehs-e-6477
Last seen 10.2 years ago
Dear Herve,
I found the reason for the mate_status error: it is because that the
read 1 in each pair were reverse complemented before the alignment.
Now I have a question: when I use readGAlignmentsList() to read in the
pairs, how can I preserve the order of the two reads, i.e., keep read
1 as the first alignment and read 2 as the second alignment in a
element (with two alignments) of the alignments list? is it the
default order of each element of the list?
Also, if I use grglist() to convert the above GAlignmentsList into a
GRangesList, do I need to set the order.as.in.query=TRUE for the above
purpose?
Best,
Liang
________________________________________
From: Hervé Pagès [hpages@fhcrc.org]
Sent: Sunday, March 30, 2014 4:45 PM
To: Niu, Liang (NIH/NIEHS) [E]; bioconductor at r-project.org
Subject: Re: A question about the function readGAlignmentPairs in
GenomicRnages package
On 03/30/2014 01:31 PM, Niu, Liang (NIH/NIEHS) [E] wrote:
> Dear Herve,
>
> Here is the code and output:
>
>> bam.path<-"test.bam"
>> bam.file<-BamFile(bam.path,asMates=TRUE)
>>
>> param<-ScanBamParam(flag=scanBamFlag(isPaired=TRUE,hasUnmappedMate=
FALSE,isUnmappedQuery=FALSE,isNotPassingQualityControls=FALSE,isDuplic
ate=FALSE,isNotPrimaryRead=FALSE))
>> alignment<-readGAlignmentsList(bam.file,param=param)
>> length(alignment)
> [1] 18041910
>> table(mcols(unlist(alignment,
use.names=FALSE))$mates[end(PartitioningByEnd(alignment))])
>
> FALSE
> 18041910
>
> What is the problem?
Whaoo. No idea :-/ I'm on week-end right now and have to turn off
the computer due to other obligations, sorry. I'll try to get to this
later. In the mean time it would be great if you could install R-3.1.0
+ BioC 2.14 (you will need the GenomicAlignments package) and try
this again.
Thanks,
H.
>
> Thanks!
>
> Liang
>
> ________________________________________
> From: Hervé Pagès [hpages at fhcrc.org]
> Sent: Sunday, March 30, 2014 4:13 PM
> To: Niu, Liang (NIH/NIEHS) [E]; bioconductor at r-project.org
> Subject: Re: A question about the function readGAlignmentPairs in
GenomicRnages package
>
> Dear Liang,
>
> Please keep the communication on the mailing list (by using
> the "Reply All" button).
>
> On 03/30/2014 12:13 PM, Niu, Liang (NIH/NIEHS) [E] wrote:
>> Dear Herve,
>>
>> I use the following command to read in the alignments:
>>
>> bam.path<-"test.bam"
>> bam.file<-BamFile(bam.path, yieldSize=1E6, asMates=TRUE)
>>
>> param<-ScanBamParam(flag=scanBamFlag(isPaired=TRUE,hasUnmappedMate=
FALSE,isUnmappedQuery=FALSE,isNotPassingQualityControls=FALSE,isDuplic
ate=FALSE,isNotPrimaryRead=FALSE))
>> alignment<-readGAlignmentsList(bam.file,param=param)
>>
>> However, each element ((a pair of alignments)) of the
GAlignmentsList "alignment" has meta field Mates as "FALSE" for both
reads. Is it a problem?
>
> I'm surprised that *each* list element in your GAlignmentsList
object
> 'alignment' looks like this. How do you know? Why not show us the
> object?
>
> My understanding is that even though your BAM file contains paired-
end
> reads, when you read it with readGAlignmentsList(), you end up with
a
> GAlignmentsList object where some fraction of the list elements have
> the "mates" field set to FALSE. The reads in such list element have
the
> same QNAME but are not mates. Note that all list elements with the
> "mates" field set to TRUE are guaranteed to contain exactly 2 reads.
> But there is no such expectation for list elements with the "mates"
> field set to FALSE: they can contain 1, 2, 3, or more reads.
>
> So if *each* list element in your GAlignmentsList object
> 'alignment' really has the "mates" field set to FALSE, that
> means none of the reads in your file could be mated. That could
> actually happen if your data was not paired-end and if you didn't
> use isPaired=TRUE but that doesn't seem to be the case here.
>
> Here is how you can summarize how many list elements have the
> "mates" field set to FALSE and how many have it set to TRUE:
>
> table(mcols(unlist(alignment,
> use.names=FALSE))$mates[end(PartitioningByEnd(alignment))])
>
> Finally note that in BioC devel, the "mates" field has been replaced
> by the "mate_status" field and that this field is now an attribute
of
> the list elements rather than of the individual reads. This means
that
> it's a top-level metadata column (i.e. directly accessible with
> mcols(alignment)$mate_status) instead of an inner metadata column
> (which are more complicated to access). So it's much easier now to
get
> the above count:
>
> table(mcols(alignment)$mate_status)
>
> This will give you something like:
>
> > table(mcols(galist1)$mate_status)
>
> mated ambiguous unmated
> 75346 0 21286
>
> As you can see, another change is that this field is now a 3-level
> factor (levels are explained in ?readGAlignmentsList).
>
> Hope this helps,
> H.
>
>
>>
>> Liang
>>
>>
>> ________________________________________
>> From: Hervé Pagès [hpages at fhcrc.org]
>> Sent: Sunday, March 30, 2014 2:54 PM
>> To: Niu, Liang (NIH/NIEHS) [E]; bioconductor at r-project.org
>> Subject: Re: A question about the function readGAlignmentPairs in
GenomicRnages package
>>
>> On 03/30/2014 11:52 AM, Niu, Liang (NIH/NIEHS) [E] wrote:
>>> Sorry for the typo in the previous email. What I mean is "I do
NEED (not NOT) the pairing information".
>>
>> I see. So yes, readGAlignmentsList() should do it.
>>
>> Cheers,
>> H.
>>
>>>
>>> Liang
>>> ________________________________________
>>> From: Hervé Pagès [hpages at fhcrc.org]
>>> Sent: Sunday, March 30, 2014 2:51 PM
>>> To: Niu, Liang (NIH/NIEHS) [E]; bioconductor at r-project.org
>>> Subject: Re: A question about the function readGAlignmentPairs in
GenomicRnages package
>>>
>>> On 03/30/2014 11:31 AM, Niu, Liang (NIH/NIEHS) [E] wrote:
>>>> Dear Herve,
>>>>
>>>> Thanks for your suggestion. I do not the paring information in
the downstream analysis, therefore, readGAlignmentsList() is good.
>>>
>>> Maybe I was not clear enough but if you don't need the pairing,
then
>>> you can just use readGAlignments(). readGAlignmentsList() will
pair
>>> everything that can be paired and this has a cost (in terms of
>>> performance and memory usage). By using readGAlignments() you
avoid
>>> that cost and you also end up with an object that is a little bit
>>> simpler to manipulate (i.e. a GAlignments instead of
GAlignmentsList
>>> object).
>>>
>>> Hope that makes sense,
>>> H.
>>>
>>>>
>>>> Liang
>>>> ________________________________________
>>>> From: Hervé Pagès [hpages at fhcrc.org]
>>>> Sent: Sunday, March 30, 2014 2:12 PM
>>>> To: Niu, Liang (NIH/NIEHS) [E]
>>>> Cc: bioconductor at r-project.org
>>>> Subject: Re: A question about the function readGAlignmentPairs in
GenomicRnages package
>>>>
>>>> Hi Liang,
>>>>
>>>> Hope you don't mind that I'm cc'ing the Bioconductor mailing
list, so
>>>> other can give suggestions.
>>>>
>>>> On 03/30/2014 09:32 AM, Niu, Liang (NIH/NIEHS) [E] wrote:
>>>>> Dear Mr. Pages,
>>>>>
>>>>> This is Liang Niu, a research fellow at the National Institute
of Environmental Health Sciences.
>>>>>
>>>>> I am using R to read .bam files for chromatin interaction data
sets. Such data sets contains alignments for paired-end reads from
ChIA-PET experiment, thus it has pairs in which two reads on different
chromosomes and/or on same strand, and those pairs are valid pairs. I
want to use your function readGAlignmentPairs in GenomicRnages package
to read the pairs, but the manual says that it will discard those
pairs. Do you have any suggestion?
>>>>
>>>> You could use readGAlignmentsList() instead of
readGAlignmentPairs().
>>>> readGAlignmentsList() will keep these "discordant pairs". It will
also
>>>> keep the reads that cannot be paired.
>>>> Note that, depending on what you will do downstream, you don't
>>>> necessarily need to pair the reads (e.g. if you're going to
compute
>>>> the read coverage). In that case you can just load the reads with
>>>> readGAlignments().
>>>>
>>>> Cheers,
>>>> H.
>>>>
>>>>>
>>>>> Best,
>>>>> Liang
>>>>>
>>>>
>>>> --
>>>> Hervé Pagès
>>>>
>>>> Program in Computational Biology
>>>> Division of Public Health Sciences
>>>> Fred Hutchinson Cancer Research Center
>>>> 1100 Fairview Ave. N, M1-B514
>>>> P.O. Box 19024
>>>> Seattle, WA 98109-1024
>>>>
>>>> E-mail: hpages at fhcrc.org
>>>> Phone: (206) 667-5791
>>>> Fax: (206) 667-1319
>>>>
>>>
>>> --
>>> Hervé Pagès
>>>
>>> Program in Computational Biology
>>> Division of Public Health Sciences
>>> Fred Hutchinson Cancer Research Center
>>> 1100 Fairview Ave. N, M1-B514
>>> P.O. Box 19024
>>> Seattle, WA 98109-1024
>>>
>>> E-mail: hpages at fhcrc.org
>>> Phone: (206) 667-5791
>>> Fax: (206) 667-1319
>>>
>>
>> --
>> Hervé Pagès
>>
>> Program in Computational Biology
>> Division of Public Health Sciences
>> Fred Hutchinson Cancer Research Center
>> 1100 Fairview Ave. N, M1-B514
>> P.O. Box 19024
>> Seattle, WA 98109-1024
>>
>> E-mail: hpages at fhcrc.org
>> Phone: (206) 667-5791
>> Fax: (206) 667-1319
>>
>
> --
> Hervé Pagès
>
> Program in Computational Biology
> Division of Public Health Sciences
> Fred Hutchinson Cancer Research Center
> 1100 Fairview Ave. N, M1-B514
> P.O. Box 19024
> Seattle, WA 98109-1024
>
> E-mail: hpages at fhcrc.org
> Phone: (206) 667-5791
> Fax: (206) 667-1319
>
--
Hervé Pagès
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024
E-mail: hpages at fhcrc.org
Phone: (206) 667-5791
Fax: (206) 667-1319