Query neighboring genes
Yadav Sapkota ▴ 130
Hi,

Is there any tool or package that can query neighboring genes (let's say 100 KB upstream and downstream) of a specific chromosomal range (chr1:100-200)? I have thousands of these chromosomal ranges.

Your help will be greatly appreciated.

Regards,
Yadav Sapkota
Uni of Alberta
Julie Zhu ★ 4.3k
Yadav,

You could try annotatePeakInBatch in the ChIPpeakAnno package.

Best regards,
Julie
Or perhaps something like this?

    library(TxDb.appropriateOrganism.appropriateAssembly)
    library(org.Hs.eg.db)   # provides org.Hs.egSYMBOL, used below
    tx.ranges <- transcripts(TxDb, columns=c('tx_id','gene_id'))
    dist.to.tx <- distanceToNearest(your.ranges, tx.ranges)

    R> dist.to.tx
    ## DataFrame with 500000 rows and 3 columns
    ##     queryHits subjectHits  distance
    ##     <integer>   <integer> <integer>
    ## 1           1        3965         0
    ## 2           2        3965         0
    ## 3           3        3975         0
    ## 4           4        3975         0
    ## 5           5        3975         0
    ## 6           6           4       241
    ## 7           7           4         0
    ## 8           8           4     21542
    ## 9           9        3979      1445
    ## ...       ...         ...       ...

    ## get Entrez IDs for the nearest genes to the first 6 ranges
    ## (obviously you can do them all)
    gene_ids <- as.character(values(tx.ranges[head(dist.to.tx)$subjectHits])$gene_id)

    ## get their symbols
    unlist(mget(gene_ids, org.Hs.egSYMBOL, ifnotfound=NA))
    ##   653635   653635   653635   653635   653635    79501
    ## "WASH7P" "WASH7P" "WASH7P" "WASH7P" "WASH7P"  "OR4F5"

    ## get their distances (again just the first 6)
    head(dist.to.tx$distance)
    ## [1]   0   0   0   0   0 241

Is that what you are after?

Note: annotatePeakInBatch() might well be just as good or better; I am not very proficient with ChIPseq packages in R since I mostly use MACS and ChromHMM for the things I am interested in. So, no offense meant to any authors of the many fine ChIPseq packages! distanceToNearest() is just a handy new function in GenomicRanges which I have found to be a fast, generally useful tool.

--t
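For anyone who wants to try the distanceToNearest() idea without installing a TxDb package first, here is a minimal sketch on hand-made ranges. The coordinates and the gene IDs (geneA, geneB, geneC) are invented for illustration, and `query` and `tx` are just stand-ins for your.ranges and tx.ranges from the post. In current GenomicRanges releases the result is a Hits object with a `distance` metadata column rather than the DataFrame printed above, so the sketch reads it with queryHits(), subjectHits() and mcols().

```r
library(GenomicRanges)

# Toy query ranges (hypothetical stand-in for your.ranges)
query <- GRanges("chr1",
                 IRanges(start = c(100, 5000, 20000),
                         end   = c(200, 5100, 20100)))

# Toy transcripts with invented gene IDs (hypothetical stand-in for tx.ranges)
tx <- GRanges("chr1",
              IRanges(start = c(150, 6000, 50000),
                      end   = c(1000, 7000, 51000)),
              strand  = c("+", "-", "+"),
              gene_id = c("geneA", "geneB", "geneC"))

# Nearest transcript for each query range; a distance of 0 means overlap
hits <- distanceToNearest(query, tx, ignore.strand = TRUE)

# Collect the result in one table
nearest <- data.frame(
  query    = as.character(query)[queryHits(hits)],
  gene_id  = tx$gene_id[subjectHits(hits)],
  distance = mcols(hits)$distance
)

# Keep only nearest genes that lie within 100 kb of the range
nearest.100kb <- nearest[nearest$distance <= 100000, ]
```

Keep in mind that this reports only the single nearest transcript per range, not every gene inside a window, so it answers a slightly different question from the original one.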
Hi,

I tried this function (annotatePeakInBatch) and it works fine for me; however, I could not find any option to define the upstream and downstream size when looking for neighboring genes.

For example, I want to query only genes that are located within 100 KB upstream and 50 KB downstream of a chromosomal range. Any hint would be appreciated. Below is the code I am using for the demo:

    source("http://bioconductor.org/biocLite.R")
    biocLite("ChIPpeakAnno")
    library("ChIPpeakAnno")
    data(myPeakList)
    data(TSS.human.GRCh37)
    annotatedPeak = annotatePeakInBatch(myPeakList[1:6,], AnnotationData = TSS.human.GRCh37)

--Yadav
    upstream <- flank(myPeakList, 10000)            # 10 kb before each range; use 100000 for 100 KB
    downstream <- flank(myPeakList, 50000, FALSE)   # start=FALSE: 50 kb after each range
    upstream.genes <- subsetByOverlaps(tx.ranges, upstream)
    downstream.genes <- subsetByOverlaps(tx.ranges, downstream)
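To make the flank() approach concrete, here is a small self-contained sketch with the 100 KB upstream / 50 KB downstream windows from the question. The regions, transcript coordinates and gene IDs are made up, and `regions` and `tx` are hypothetical stand-ins for myPeakList and tx.ranges. It assumes unstranded GRanges input, in which case flank() treats "upstream" as the lower-coordinate side.

```r
library(GenomicRanges)

# Two toy regions (hypothetical stand-in for myPeakList)
regions <- GRanges("chr1",
                   IRanges(start = c(200000, 800000),
                           end   = c(200500, 800500)))

# 100 kb window before each region and 50 kb window after it
upstream   <- flank(regions, width = 100000, start = TRUE)
downstream <- flank(regions, width = 50000,  start = FALSE)

# Toy transcripts with invented gene IDs (hypothetical stand-in for tx.ranges)
tx <- GRanges("chr1",
              IRanges(start = c(150000, 230000, 400000, 760000),
                      end   = c(160000, 240000, 410000, 770000)),
              gene_id = c("geneA", "geneB", "geneC", "geneD"))

# Transcripts that overlap any upstream / downstream window
upstream.genes   <- subsetByOverlaps(tx, upstream)
downstream.genes <- subsetByOverlaps(tx, downstream)

upstream.genes$gene_id    # "geneA" "geneD"
downstream.genes$gene_id  # "geneB"
```

Note that the flanks do not include the region itself, so a gene lying entirely inside a region is not picked up by either window.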
Actually, for what you're doing, you want

    upstream.hits <- findOverlaps(tx.ranges, upstream)
    downstream.hits <- findOverlaps(tx.ranges, downstream)

The hits are interpreted as before: queryHits index tx.ranges and subjectHits index the flanking windows.
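The advantage of findOverlaps() over subsetByOverlaps() is that it keeps track of which region each gene belongs to. A rough sketch of turning the hits into a per-region gene list follows; the region names (cnv1, cnv2), coordinates and gene IDs are invented, and `regions` and `tx` are hypothetical stand-ins for the objects in the posts above.

```r
library(GenomicRanges)

# Two toy regions with names so the result can be labelled
regions <- GRanges("chr1",
                   IRanges(start = c(200000, 800000),
                           end   = c(200500, 800500)))
names(regions) <- c("cnv1", "cnv2")

# 100 kb window on the lower-coordinate side of each region
upstream <- flank(regions, width = 100000)

# Toy transcripts with invented gene IDs
tx <- GRanges("chr1",
              IRanges(start = c(150000, 120000, 400000, 760000),
                      end   = c(160000, 130000, 410000, 770000)),
              gene_id = c("geneA", "geneB", "geneC", "geneD"))

# query = transcripts, subject = upstream windows
upstream.hits <- findOverlaps(tx, upstream)

# Group the gene IDs by the region whose window they fall in
genes.by.region <- split(tx$gene_id[queryHits(upstream.hits)],
                         names(regions)[subjectHits(upstream.hits)])

genes.by.region$cnv1  # "geneA" "geneB"
genes.by.region$cnv2  # "geneD"
```

Regions with no gene in their window simply do not appear in the list, so compare against names(regions) if you need to report those as well.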
Yadav,

Please set the parameters output = "both", maxgap = 100000, select = "all", then filter the results further using a combination of insideFeature and shortestDistance. For detailed parameter settings, please type help(annotatePeakInBatch) in an R session.

For example, the following will return peaks located within 5 kb downstream of a gene:

    annotatedPeak[!is.na(annotatedPeak$insideFeature) &
                  annotatedPeak$insideFeature == "downstream" &
                  !is.na(annotatedPeak$shortestDistance) &
                  annotatedPeak$shortestDistance <= 5000, ]

Best regards,
Julie
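The same filtering idea extends to different cutoffs on each side. Below is a sketch in base R on a toy data frame: the column names insideFeature and shortestDistance follow the post, while the rows, the `peak` and `feature` columns, and the `max.up` / `max.down` names are invented for illustration. It is not output from annotatePeakInBatch, and you may need as.data.frame() on the real result before subsetting it this way.

```r
# Toy stand-in for an annotation table (values are made up)
annotated <- data.frame(
  peak             = c("p1", "p2", "p3", "p4", "p5"),
  feature          = c("g1", "g2", "g3", "g4", "g5"),
  insideFeature    = c("upstream", "upstream", "downstream", "downstream", NA),
  shortestDistance = c(80000, 150000, 30000, 70000, NA),
  stringsAsFactors = FALSE
)

max.up   <- 100000  # cutoff applied to rows labelled "upstream"
max.down <- 50000   # cutoff applied to rows labelled "downstream"

# NA-safe filter: rows with a missing label or distance are dropped
keep <- !is.na(annotated$insideFeature) &
  !is.na(annotated$shortestDistance) &
  ((annotated$insideFeature == "upstream" &
      annotated$shortestDistance <= max.up) |
   (annotated$insideFeature == "downstream" &
      annotated$shortestDistance <= max.down))

filtered <- annotated[keep, ]
filtered$peak  # "p1" "p3"
```

As written, the filter also drops rows whose insideFeature is anything other than "upstream" or "downstream" (for example peaks inside a gene), so add those labels to the condition if you want to keep them. Also check in help(annotatePeakInBatch) whether the label describes the peak relative to the gene or the other way round before deciding which cutoff goes with which label.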
Hi,

Many thanks for the pointers. Everything worked fine when I tried to perform GO analysis for the genes that are within 20 KB upstream and 10 KB downstream of a given chromosomal region (a CNV in my case), except that the GO analysis returned nothing. I used the following script:

    source("http://bioconductor.org/biocLite.R")
    biocLite("ChIPpeakAnno")
    biocLite("GenomicRanges")
    library("ChIPpeakAnno")
    library('GenomicRanges')
    library(org.Hs.eg.db)
    data = read.csv('C:/test.csv', header=T, sep=',')
    myCNVdata = RangedData(IRanges(start=data$Start, end=data$End), space=(data$Chr))
    data(TSS.human.GRCh37)
    annotatedPeak = annotatePeakInBatch(myCNVdata, AnnotationData = TSS.human.GRCh37, output='both', maxgap=20000, select='all')
    final_result = annotatedPeak[!c(annotatedPeak$insideFeature == 'upstream' & annotatedPeak$shortestDistance>10000),]
    enrichedGO <- getEnrichedGO(final_result, orgAnn="org.Hs.eg.db", maxP=0.05, multiAdj=TRUE, minGOterm=1, multiAdjMethod="BH")

But enrichedGO has nothing except the column headings. When I run it on the first 5 rows (final_result[1:5,]), it works, but when I run it on all 333 rows in final_result, it does not yield anything. I do not get any error messages either. Any idea where I am going wrong?

--Yadav
Yadav,

Try this and see if you get anything. If you do, it means you need to make the maxP filtering criterion less stringent.

    enrichedGO <- getEnrichedGO(final_result, orgAnn="org.Hs.eg.db", maxP=1, multiAdj=TRUE, minGOterm=1, multiAdjMethod="BH")

Best regards,
Julie
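One possible reason relaxing maxP can make results appear is the multiple-testing adjustment. The sketch below uses only base R's p.adjust() with made-up p-values; it does not call getEnrichedGO and makes no claim about how that function applies the adjustment internally. It only shows that with the "BH" method, raw p-values below 0.05 can all end up above 0.05 after adjustment.

```r
# Invented raw p-values for five hypothetical GO terms
raw.p <- c(0.02, 0.03, 0.04, 0.30, 0.60)

# Benjamini-Hochberg adjustment, the method named by multiAdjMethod="BH"
adj.p <- p.adjust(raw.p, method = "BH")

n.raw <- sum(raw.p < 0.05)  # 3 terms pass before adjustment
n.adj <- sum(adj.p < 0.05)  # 0 terms pass after adjustment
```

If running with maxP=1 returns terms, looking at their adjusted p-values should show how far they are from the 0.05 cutoff.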