xmapcore get intronic sequence


Steve Taylor ▴ 280

@steve-taylor-2838

Last seen 10.3 years ago

Hi,

Is there a method in xmapcore to get part of the sequence of the adjoining intron, given an exon id?

For example:

> exon.details("ENSE00001146308")
RangedData with 1 row and 5 value columns across 1 space
        space             ranges |       stable_id    strand     phase
  <character>          <IRanges> |     <character> <integer> <integer>
1          17 [7590695, 7590856] | ENSE00001146308        -1        -1
  end_phase
  <integer>
1        -1
  sequence
  <character>
1 GTTTTCCCCTCCCATGTGCTCAAGACTGGCGCTAAAAGTTTTGAGCTTCTCAAAAGTCTAGAGCCACCGT
  CCAGGGAGCAGGTAGCTGCTGGGCTCCGGGGACACTTTGCGTTCGGGCTGGGAGCGTGCTTTCCACGACG
  GTGACACGCTTCCCTGGATTGG

I'd like to get chr17:7590695-7590863.

Thanks,

Steve

xmapcore xmapcore • 1.9k views

ADD COMMENT • link updated 14.4 years ago by Tim Yates ▴ 250 • written 14.4 years ago by Steve Taylor ▴ 280


Vincent J. Carey, Jr. 6.7k

@vincent-j-carey-jr-4

Last seen 3 months ago

United States

I can't imagine why xmapcore would provide that, but why not use Biostrings and BSgenome.Hsapiens.UCSC.[your favorite version]?
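A minimal sketch of the Biostrings/BSgenome route suggested above, assuming the coordinates in the question are on the hg19 assembly (an assumption; swap in whichever BSgenome data package matches your annotation). The `region` list and the 7-base `flank` are illustrative names and values derived from the requested range chr17:7590695-7590863, and the sequence lookup only runs if the BSgenome data package is installed.

```r
# Exon coordinates as printed by exon.details("ENSE00001146308") in the question.
exon <- list(chrom = "chr17", start = 7590695L, end = 7590856L, strand = "-")

# Extend the exon end by 7 bases to cover the requested chr17:7590695-7590863.
flank <- 7L
region <- list(chrom = exon$chrom, start = exon$start,
               end = exon$end + flank, strand = exon$strand)
region_width <- region$end - region$start + 1L

# Assumption: hg19 coordinates. Guarded so the script still runs without the package.
if (requireNamespace("BSgenome.Hsapiens.UCSC.hg19", quietly = TRUE)) {
  suppressPackageStartupMessages(library(BSgenome.Hsapiens.UCSC.hg19))
  # strand = "-" asks getSeq for the reverse complement of the reference strand.
  flanked <- getSeq(Hsapiens, region$chrom,
                    start = region$start, end = region$end,
                    strand = region$strand)
  print(flanked)
}
```

For many exons, the same `getSeq` call can be given vectors of start and end positions, which is likely to be more convenient than a web form.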

ADD COMMENT • link 14.4 years ago Vincent J. Carey, Jr. 6.7k


Tim Yates ▴ 250

@tim-yates-4040

Last seen 10.3 years ago

Hi Steve,

No, we currently only store the sequence information for exons.

However, I have a site which should allow you to look up that information:

http://xmap.picr.man.ac.uk/sequence/

You should be able to enter your region of interest and click "View Below".

Cheers,

Tim

ADD COMMENT • link 14.4 years ago Tim Yates ▴ 250


Hi Tim,

> No, we currently only store the sequence information for exons

Fair enough...

> However, I have a site which should allow you to look up that information:
> http://xmap.picr.man.ac.uk/sequence/

Thanks. I have a lot of probeset ids, so I'll probably use Biostrings as Vincent suggested, or Bio::Ensembl.

On a related note, if I do

probeset.to.exon(probesetids)

then I get fewer rows returned than the length of the probesetids list, I presume because some of them don't map to known exons. Since I want to merge this with an existing data frame that contains the probeset ids and expression values, I really need it to return a value (FALSE or NA or something) where there is no match, so that I can do this merge. Is there a way of doing this?

Thanks,

Steve

ADD REPLY • link 14.4 years ago Steve Taylor ▴ 280


Can you try doing:

probeset.to.exon( probesetids, rm.unreliable=F )

By default, probeset.to.exon removes any "unreliable" probesets before mapping to exons, as otherwise some massively multi-targeting probesets could cause a "datastorm".

Tim

ADD REPLY • link 14.4 years ago Tim Yates ▴ 250


Hi Tim,

> probeset.to.exon( probesetids, rm.unreliable=F )

Unfortunately this is still not the same size:

> length(probeset.to.exon( probesetids, rm.unreliable=F ))
[1] 1274
> length(probesetids)
[1] 1771

Thanks,

Steve

ADD REPLY • link 14.4 years ago Steve Taylor ▴ 280


Another (non-R) site that will provide introns (among other things) between specified co-ords is GABOS (Get A Bit Of Sequence) at:

http://bioinf.wehi.edu.au/gabos/

Select genome = hg19,
Select Annotation file as ensGene,
Select chr17.fa from the drop-down list of chromosomes,
Enter a sequence range like 7590695-7590863,
Click the "Retrieve Sequence Data" button,

and you will get:

> hg19 chr17 + ensGene ENST00000431639 Gene 10 [ 1 17429 ] 7589389 7606817 [ -0 17429 +0 ] 7589389 7606817
> hg19 chr17 + ensGene ENST00000431639 Intron '1/10 [ 1153 1321 ] 7590695 7590863 [ -0 169 +0 ] 7590695 7590863
CCAATCCAGGGAAGCGTGTCACCGTCGTGGAAAGCACGCTCCCAGCCCGAACGCAAAGTGTCCCCGGAGC
CCAGCAGCTACCTGCTCCCTGGACGGTGGC
TCTAGACTTTTGAGAAGCTCAAAACTTTTAGCGCCAGTCTTGAGCACATGGGAGGGGAAAACCCCAATC

This tells you that there is a Gene with 10 introns in this region and that part (169 bases) of the first intron is in the specified sequence range.

GABOS uses copies of UCSC chromosome data and UCSC Browser annotation files.

cheers,

Keith

Keith Satterley
Bioinformatics Division
The Walter and Eliza Hall Institute of Medical Research
Parkville, Melbourne, Victoria, Australia

ADD REPLY • link 14.4 years ago Keith Satterley ▴ 450


There are a couple of options:

1) Some of your probesets don't hit exons
2) Some of your probesets hit the same exons

Not much can be done if it's the first case, but you can detect the second by passing as.vector=F to the probeset.to.exon method, i.e.:

> probesetIds = c( '3081222', '3081223' )
> probeset.to.exon( probesetIds )
[1] "ENSE00001149618"

> probeset.to.exon( probesetIds, as.vector=F )
RangedData with 2 rows and 6 value columns across 1 space
        space                 ranges |         IN1       stable_id    strand
  <character>              <IRanges> | <character>     <character> <integer>
1           7 [155592680, 155596420] |     3081222 ENSE00001149618        -1
2           7 [155592680, 155596420] |     3081223 ENSE00001149618        -1

You can see that the IN1 column is the probeset name that caused the result, and the stable_id column shows that both probesets hit the same exon.

Fingers crossed this gets to the bottom of it ;-)

Cheers,

Tim
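To spot the second case across a long list, the IN1 and stable_id columns can be grouped. This is only a sketch: `hits` is a hand-built stand-in for as.data.frame(probeset.to.exon(probesetIds, as.vector=F)), reduced to the two columns shown above, and `probesets_by_exon` and `shared` are illustrative names.

```r
# Mock of as.data.frame(probeset.to.exon(probesetIds, as.vector = FALSE)),
# reduced to the two columns that matter here.
hits <- data.frame(
  IN1 = c("3081222", "3081223"),
  stable_id = c("ENSE00001149618", "ENSE00001149618"),
  stringsAsFactors = FALSE
)

# Group probeset ids by the exon they hit.
probesets_by_exon <- split(hits$IN1, hits$stable_id)

# Keep only exons hit by more than one probeset.
shared <- probesets_by_exon[lengths(probesets_by_exon) > 1]
print(shared)
```

Each element of `shared` is named by an exon id and holds the probesets that collapse onto it, which accounts for part of the difference between the input and output lengths.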

ADD REPLY • link 14.4 years ago Tim Yates ▴ 250


Hi Tim,

Some of these are obviously not hitting exons despite having valid probeset ids (I just double-checked that they are real probesets by going to the Netaffx site). So:

> dim(as.data.frame(probeset.to.exon( probesetids, rm.unreliable=F, as.vector=F)))
[1] 1666 10

My input list of probesetids had 1771 entries, so there are still some missing :-(

It seems the only way round this for me is to write a loop and test whether probeset.to.exon returns anything. How about a rm.notfound=T or F parameter at some point in the future (he asked hopefully! :-))?

Thanks,

Steve

ADD REPLY • link 14.4 years ago Steve Taylor ▴ 280


Hi Steve,

Yeah, looks like they are missing exons. I'll have a think about the rm.notfound idea, but that would require the method to return two things: the list of exons, and a list of probesets which didn't hit. If as.vector=F is passed, it would again require two results, or a RangedData object with some rows with missing IRanges objects (which I don't think is possible, and which would break the rest of xmapcore even if it were).

I think for now, a loop is the only way.

Or it should be possible to write a method that filters your list of probeset ids, removing the ones which appear in the IN1 column of the RangedData object. Something like:

> probesetIds = c( '3081222', 'doesntexist', '3081223' )
> exons = probeset.to.exon( probesetIds, as.vector=F )

Then, to get the list of probesets that didn't match to anything:

> probesetIds[ !( probesetIds %in% exons[[ 'IN1' ]] ) ]
[1] "doesntexist"

Tim
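For the merge itself, base R's merge() with all.x=TRUE may already give the NA-filled result asked for earlier in the thread. A rough sketch, where `expr` and its values are made up for illustration and `hits` is a hand-built stand-in for as.data.frame(probeset.to.exon(probesetIds, as.vector=F)):

```r
# Hypothetical expression table keyed by probeset id (values are invented).
expr <- data.frame(
  probeset_id = c("3081222", "doesntexist", "3081223"),
  expression = c(7.1, 5.4, 6.8),
  stringsAsFactors = FALSE
)

# Mock of as.data.frame(probeset.to.exon(probesetIds, as.vector = FALSE)).
hits <- data.frame(
  IN1 = c("3081222", "3081223"),
  stable_id = c("ENSE00001149618", "ENSE00001149618"),
  stringsAsFactors = FALSE
)

# Left join: all.x = TRUE keeps every row of expr and fills the exon
# columns with NA where a probeset has no hit.
merged <- merge(expr, hits, by.x = "probeset_id", by.y = "IN1", all.x = TRUE)
print(merged)

# Probesets with no exon hit.
unmatched <- merged$probeset_id[is.na(merged$stable_id)]
```

Note that a probeset hitting several exons would contribute one row per exon to `merged`, so the result can have more rows than `expr`.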

ADD REPLY • link 14.4 years ago Tim Yates ▴ 250


Hi Tim,

That's a good solution. Thanks for your help,

Steve

ADD REPLY • link 14.4 years ago Steve Taylor ▴ 280
