Hi. say I have a GAlignments Object of several alignments and 5 metadata columns.
GAlignments object with 3 alignments and 5 metadata columns:
seqnames strand cigar qwidth start end width
<Rle> <Rle> <character> <integer> <integer> <integer> <integer>
[1] seq1 + 35M 35 970 1004 35
[2] seq1 + 35M 35 971 1005 35
[3] seq1 + 35M 35 972 1006 35
njunc | rname strand pos qwidth
<integer> | <factor> <factor> <integer> <integer>
[1] 0 | seq1 + 970 35
[2] 0 | seq1 + 971 35
[3] 0 | seq1 + 972 35
seq
<DNAStringSet>
[1] TATTAGGAAATGCTTTACTGTCATAACTATGAAGA
[2] ATTAGGAAATGCTTTACTGTCATAACTATGAAGAG
[3] TTAGGAAATGCTTTACTGTCATAACTATGAAGAGA
-------
seqinfo: 2 sequences from an unspecified genome
|
|
|
Note that an important difference between a GAlignments and a GRanges object is that the former represents alignments (i.e. ranges + CIGARs), whereas the latter only represents ranges. More precisely a range on the reference sequence is generally not enough to describe an alignment: another important aspect of an alignment is its geometry. In the BAM/SAM world, and for GAlignments objects, the geometry of an alignment is described by its CIGAR. So roughly speaking, a GAlignments object can be seen as a GRanges object + the additional CIGAR information for each range. Now when you coerce from GAlignments to GRanges, you loose the CIGAR. In other words you just en up with a single range per alignment and you've lost its exact geometry. When you coerce from GRanges back to GAlignments, that information cannot be restored. So what this coercion does is assign a "fake" CIGAR to each alignment that describes the simplest type of alignment, that is, an alignment with no indels (I or D) or skipped regions (N) or soft or hard clipping (S or H). All this to say that there isn't really much value in going from GRanges back to GAlignments: the original CIGARs are lost anyway and the GAlignments object you end up with doesn't contain more information than the GRanges object.
H.
Any suggestions then for the original question of editing the GAlignments, which I guess is related to removing reads from coverage() calculation. ?
So yes reads can be removed from a GAlignments object by just subsetting the object, but core components of a GAlignments object like the start, end, width, and cigar cannot be modified. This is because the components describing the range of an alignment (stat, end, width) are tightly coupled to its cigar. For example modifying the start would also require to modify the cigar. Even an operation as simple as
shift()
that would preserve the width of an alignment would "break" the cigar in the sense that the original cigar becomes meaningless (not to say wrong) when the alignment is moved to the new position. The only operations we support that alter the start, end, width, and cigar of a GAlignments object arenarrow()
andqnarrow()
(see?GenomicAlignments::narrow
). These are specific intra-range transformations where we actually know what to do with the cigar, that is, we know how to alter the start/end/width and the cigar together so that the modified cigar is an accurate description of the geometry of the narrowed alignment. Maybe we could support other specific intra-range transformations like that. Maybe the OP has a use case where s/he needs a specific intra-range transformation that we could support. But IMO, we should probably avoid having start, end, width, and cigar setters in order to support generic editing of a GAlignments object.Finally note that the user can always use the
GAlignments()
constructor function to make a GAlignments object with the content of her choice. This actually provides an easy workaround for the lack of start, end, width, and cigar setters.H.