Entering edit mode
A 'txLoc' column has been added to the output of predictCoding.
Available in devel version 1.1.57.
Valerie
On 02/28/2012 08:20 AM, Valerie Obenchain wrote:
> Good suggestion. Yes, predictCoding is does this internally. I'll
post
> back here when this has been added.
>
> Valerie
>
>
>
> On 02/28/2012 01:49 AM, Alex Gutteridge wrote:
>> Hi Valerie,
>>
>> Thanks everything works great now. One small feature request -
would
>> it be hard to output the protein sequence position of the coding
>> SNPs? At the moment once I've run predictCoding I'm re-extracting
the
>> cds and working out the position of each coding SNP so I can see
>> where in the protein sequence it is, but it seems like this is
>> probably just replicating what predictCoding must be doing
internally
>> anyway?
>>
>> Alex Gutteridge
>
>
> On 02/24/2012 10:39 AM, Valerie Obenchain wrote:
>> Hi Alex,
>>
>> Thanks for the bug report. The cdsID was taken from an overlap
>> between the query and GRangesList of cds by transcripts. This gave
>> the correct transcript number but (incorrectly) took the first cds
>> number in the list by default. Now fixed in devel 1.1.55.
>>
>> I've also updated the man page.
>>
>> Valerie
>>
>>
>>
>> On 02/24/2012 02:08 AM, Alex Gutteridge wrote:
>>> On 22.02.2012 18:58, Hervé Pagès wrote:
>>>> Hi Alex,
>>>>
>>>> On 02/22/2012 03:56 AM, Alex Gutteridge wrote:
>>>
>>> [...]
>>>
>>>>> But the predictCoding call gives this error:
>>>>>
>>>>> Error in .setSeqNames(x, value) :
>>>>> The replacement value for isActiveSeq must be a logical vector,
with
>>>>> names that match the seqlevels of the object
>>>>
>>>> The error message doesn't help much but I think the pb is that
you
>>>> didn't rename chMT properly. Try to do this:
>>>>
>>>> seqlevels(snps) <- gsub("chrMT", "chrM", seqlevels(snps))
>>>>
>>>> before you start the for(eg in entrez.ids){..} loop again.
>>>>
>>>> Cheers,
>>>> H.
>>>
>>> Thanks Herv? that nailed it. I'm having some difficulty joining up
>>> the output of predictCoding() with the query SNPs though. If
someone
>>> could point out where the disconnect in my thinking is I would
>>> appreciate it!
>>>
>>> Here's my (now edited down) script:
>>>
>>> library(BSgenome.Hsapiens.UCSC.hg19)
>>> library(VariantAnnotation)
>>> library(SNPlocs.Hsapiens.dbSNP.20110815)
>>> library(TxDb.Hsapiens.UCSC.hg19.knownGene)
>>>
>>> entrez.ids = c('6335')
>>> txdb19 = TxDb.Hsapiens.UCSC.hg19.knownGene
>>>
>>> snps = getSNPlocs(c("ch1","ch2"),as.GRanges=T)
>>> seqlevels(snps) <- gsub("ch", "chr", seqlevels(snps))
>>> seqlevels(snps) <- gsub("chrMT", "chrM", seqlevels(snps))
>>>
>>> gene.list = cdsBy(txdb19, by="gene")
>>> vsd.list = gene.list[entrez.ids]
>>> cds.list = cdsBy(txdb19,by="tx")
>>>
>>> eg = entrez.ids[1]
>>>
>>> snp.idx = unique(queryHits(findOverlaps(snps, vsd.list[[eg]])))
>>> eg.snps = snps[snp.idx]
>>> iupac = values(eg.snps)[,"alleles_as_ambig"]
>>> eg.snps.exp = rep(eg.snps, nchar(IUPAC_CODE_MAP[iupac]))
>>> variant.alleles =
>>> DNAStringSet(strsplit(paste(IUPAC_CODE_MAP[iupac],collapse=""),"")
[[1]])
>>>
>>>
>>> aa =
>>> predictCoding(eg.snps.exp,txdb19,seqSource=Hsapiens,varAllele=vari
ant.alleles)
>>>
>>> #####
>>>
>>> Then if I query the predictCoding results in aa in an interactive
>>> session I get the following (see inline comments for what I think
>>> should be happening, but I must be misinterpreting what queryID
means)
>>>
>>> The docs for predictCoding() contain a small typo
>>> (s/queryHits/queryID), but otherwise seem clear?
>>>
>>> Columns include ?queryID?, ?consequence?, ?refSeq?, ?varSeq?,
>>> ?refAA?, ?varAA?, ?txID?, ?geneID?, and ?cdsID?.
>>>
>>> ?queryHits? The ?queryHits? column provides a map back to the
>>> variants in the original ?query?. If the ?query? was a
?VCF?
>>> object this index corresponds to the row in the
?GRanges? in
>>> the ?rowData? slot. If ?query? was an expanded
?GRanges?,
>>> ?RangedData? or ?RangesList? the index corresponds to
the row
>>> in the expanded object.
>>>
>>> #####
>>>
>>>> aa[1,]
>>> DataFrame with 1 row and 9 columns
>>> queryID consequence refSeq varSeq
refAA
>>> <integer> <factor> <dnastringset> <dnastringset> <aastringset>
>>> 1 1 nonsynonymous CTC ATC
L
>>> varAA txID geneID cdsID
>>> <aastringset> <character> <factor> <integer>
>>> 1 I 10921 6335 33668
>>>> #So the first SNP (queryID: 1) is nonsynonymous and maps to tx
>>>> '10921' and cds '33668'.
>>>> #If I look at the first query SNP I get this:
>>>> eg.snps.exp[aa[1,'queryID'],]
>>> GRanges with 1 range and 2 elementMetadata values:
>>> seqnames ranges strand | RefSNP_id
>>> alleles_as_ambig
>>> <rle> <iranges> <rle> | <character> <character>
>>> [1] chr2 [167055370, 167055370] * |
>>> 111558968 R
>>> ---
>>> seqlengths:
>>> chr1 chr2 chr3 chr4 chr5 chr6 ... chr20 chr21 chr22 chrX
>>> chrY chrM
>>> NA NA NA NA NA NA ... NA NA NA
>>> NA NA NA
>>>> #So SNP 1 is at 167055370 on chr2
>>>> #But if I check tx '10921' I see that the cds overlapping
167055370
>>>> is actually '33651'
>>>> #And cds '33668' is at the other end of the tx:
>>>> cds.list[[aa[1,'txID']]]
>>> GRanges with 26 ranges and 3 elementMetadata values:
>>> seqnames ranges strand | cds_id
cds_name
>>> <rle> <iranges> <rle> | <integer> <character>
>>> [1] chr2 [167168009, 167168266] - | 33668 <na>
>>> [2] chr2 [167163466, 167163584] - | 33667 <na>
>>> [3] chr2 [167163020, 167163109] - | 33666 <na>
>>> [4] chr2 [167162302, 167162430] - | 33647 <na>
>>> [5] chr2 [167160748, 167160839] - | 33646 <na>
>>> [6] chr2 [167159600, 167159812] - | 33645 <na>
>>> [7] chr2 [167151109, 167151172] - | 33644 <na>
>>> [8] chr2 [167149741, 167149882] - | 33643 <na>
>>> [9] chr2 [167144947, 167145153] - | 33642 <na>
>>> ... ... ... ... ... ...
...
>>> [18] chr2 [167099012, 167099166] - | 33659 <na>
>>> [19] chr2 [167094604, 167094777] - | 33658 <na>
>>> [20] chr2 [167089850, 167089972] - | 33657 <na>
>>> [21] chr2 [167085201, 167085482] - | 33656 <na>
>>> [22] chr2 [167084180, 167084233] - | 33655 <na>
>>> [23] chr2 [167083077, 167083214] - | 33654 <na>
>>> [24] chr2 [167060870, 167060974] - | 33653 <na>
>>> [25] chr2 [167060465, 167060735] - | 33652 <na>
>>> [26] chr2 [167055182, 167056374] - | 33651 <na>
>>> exon_rank
>>> <integer>
>>> [1] 2
>>> [2] 3
>>> [3] 4
>>> [4] 5
>>> [5] 6
>>> [6] 7
>>> [7] 8
>>> [8] 9
>>> [9] 10
>>> ... ...
>>> [18] 19
>>> [19] 20
>>> [20] 21
>>> [21] 22
>>> [22] 23
>>> [23] 24
>>> [24] 25
>>> [25] 26
>>> [26] 27
>>> ---
>>> seqlengths:
>>> chr1 chr2 ...
>>> chr18_gl000207_random
>>> 249250621 243199373 ...
>>> 4262
>>>
>>>
>>
>> _______________________________________________
>> Bioconductor mailing list
>> Bioconductor at r-project.org
>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>> Search the archives:
>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives:
> http://news.gmane.org/gmane.science.biology.informatics.conductor