IlluminaHumanMethylation450k.db: Missing probes in IlluminaHumanMethylation450kPROBELOCATION function?
2
0
Entering edit mode
Simone ▴ 190
@simone-5854
Last seen 6.6 years ago
Hello! A little question: I've got a table with beta values obtained by Illumina's 450K BeadChip microarray and want to know for each probe where in the gene it is located (mainly promoter region or gene body). I found the IlluminaHumanMethylation450kPROBELOCATION function in the IlluminaHumanMethylation450k.db package which seems to do what I want, but not for all the probes. In detail, I have got data for 473,029 probes, but the function only returns values for 354,770 unique probes, so 118,259 ones are missing. Is this expected behaviour? Should I use another package to get location information for all the probes contained in my file? And secondly: As the IlluminaHumanMethylation450k.db package seems to be deprecated, on which build of the genome is the IlluminaHumanMethylation450kPROBELOCATION information based? Because I see that in the package there are some separate functions for build 36 respectively 37, but in the help of the function in question I can only find the information that mappings are based on data of Illumina from January 2011 (which should then be GRCh37, the build I have to work with in this case, but however I am not completely sure as it doesn't say it clearly). Best regards, Simone > sessionInfo() R version 3.0.1 (2013-05-16) Platform: x86_64-pc-linux-gnu (64-bit) locale: [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8 [4] LC_COLLATE=en_US.UTF-8 LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 [7] LC_PAPER=C LC_NAME=C LC_ADDRESS=C [10] LC_TELEPHONE=C LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C attached base packages: [1] parallel stats graphics grDevices utils datasets methods base other attached packages: [1] sqldf_0.4-6.4 RSQLite.extfuns_0.0.1 [3] chron_2.3-43 gsubfn_0.6-5 [5] proto_0.3-10 RColorBrewer_1.0-5 [7] illuminaHumanv2.db_1.18.0 IlluminaHumanMethylation450k.db_2.0.7 [9] IlluminaHumanMethylation27k.db_1.4.7 org.Hs.eg.db_2.9.0 [11] RSQLite_0.11.4 DBI_0.2-7 [13] AnnotationDbi_1.22.6 affy_1.38.1 [15] Biobase_2.20.0 BiocGenerics_0.6.0 loaded via a namespace (and not attached): [1] affyio_1.28.0 AnnotationForge_1.2.1 BiocInstaller_1.10.2 IRanges_1.18.1 [5] preprocessCore_1.22.0 stats4_3.0.1 tcltk_3.0.1 tools_3.0.1 [9] zlibbioc_1.6.0
Microarray illuminaHumanv2 Microarray illuminaHumanv2 • 2.9k views
ADD COMMENT
0
Entering edit mode
@james-w-macdonald-5106
Last seen 2 days ago
United States
Hi Simone, On 7/1/2013 1:35 PM, Simone wrote: > Hello! > > A little question: I've got a table with beta values obtained by > Illumina's 450K BeadChip microarray and want to know for each probe > where in the gene it is located (mainly promoter region or gene body). > I found the IlluminaHumanMethylation450kPROBELOCATION function in the > IlluminaHumanMethylation450k.db package which seems to do what I want, > but not for all the probes. > In detail, I have got data for 473,029 probes, but the function only > returns values for 354,770 unique probes, so 118,259 ones are missing. > Is this expected behaviour? Should I use another package to get > location information for all the probes contained in my file? You would be better off using the FDb.InfiniumMethylation.hg19 package, which not only has all the locations for the probes, but also has them in a more useful format. > library(FDb.InfiniumMethylation.hg19) > x <- get450k() Warning message: In if is.na(genome(GR))) { : the condition has length > 1 and only the first element will be used > x GRanges with 485577 ranges and 7 metadata columns: seqnames ranges strand | addressA addressB channel <rle> <iranges> <rle> | <rle> <rle> <rle> cg13869341 chr1 [15865, 15866] * | 62703328 16661461 Red cg14008030 chr1 [18827, 18828] * | 27651330 <na> Both cg12045430 chr1 [29407, 29408] * | 25703424 34666387 Red cg20826792 chr1 [29425, 29426] * | 61731400 14693326 Red cg00381604 chr1 [29435, 29436] * | 26752380 50693408 Red You can do all kinds of cool things with a GRanges object that you cannot do with simple location data. But this comes at a cost of complexity, so you will need to do some reading. I would recommend at a minimum that you read the vignettes for GenomicFeatures, and look at the help page for this package (?FDb.InfiniumMethylation.hg19). And the current build is based on only hg19 (GRCh37), so there is no issue of which build you are getting. Best, Jim > > And secondly: As the IlluminaHumanMethylation450k.db package seems to > be deprecated, on which build of the genome is the > IlluminaHumanMethylation450kPROBELOCATION information based? Because I > see that in the package there are some separate functions for build 36 > respectively 37, but in the help of the function in question I can > only find the information that mappings are based on data of Illumina > from January 2011 (which should then be GRCh37, the build I have to > work with in this case, but however I am not completely sure as it > doesn't say it clearly). > > Best regards, > Simone > >> sessionInfo() > R version 3.0.1 (2013-05-16) > Platform: x86_64-pc-linux-gnu (64-bit) > > locale: > [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C > LC_TIME=en_US.UTF-8 > [4] LC_COLLATE=en_US.UTF-8 LC_MONETARY=en_US.UTF-8 > LC_MESSAGES=en_US.UTF-8 > [7] LC_PAPER=C LC_NAME=C > LC_ADDRESS=C > [10] LC_TELEPHONE=C LC_MEASUREMENT=en_US.UTF-8 > LC_IDENTIFICATION=C > > attached base packages: > [1] parallel stats graphics grDevices utils datasets > methods base > > other attached packages: > [1] sqldf_0.4-6.4 RSQLite.extfuns_0.0.1 > [3] chron_2.3-43 gsubfn_0.6-5 > [5] proto_0.3-10 RColorBrewer_1.0-5 > [7] illuminaHumanv2.db_1.18.0 IlluminaHumanMethylation450k.db_2.0.7 > [9] IlluminaHumanMethylation27k.db_1.4.7 org.Hs.eg.db_2.9.0 > [11] RSQLite_0.11.4 DBI_0.2-7 > [13] AnnotationDbi_1.22.6 affy_1.38.1 > [15] Biobase_2.20.0 BiocGenerics_0.6.0 > > loaded via a namespace (and not attached): > [1] affyio_1.28.0 AnnotationForge_1.2.1 BiocInstaller_1.10.2 > IRanges_1.18.1 > [5] preprocessCore_1.22.0 stats4_3.0.1 tcltk_3.0.1 > tools_3.0.1 > [9] zlibbioc_1.6.0 > > _______________________________________________ > Bioconductor mailing list > Bioconductor at r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor -- James W. MacDonald, M.S. Biostatistician University of Washington Environmental and Occupational Health Sciences 4225 Roosevelt Way NE, # 100 Seattle WA 98105-6099
ADD COMMENT
0
Entering edit mode
Dear James, > You would be better off using the FDb.InfiniumMethylation.hg19 package, > which not only has all the locations for the probes, but also has them in a > more useful format. > >> library(FDb.InfiniumMethylation.hg19) >> x <- get450k() Thanks for the suggestion. For some reason this does not work here. I get the following error: Error in `seqinfo<-`(`*tmp*`, value = NULL) : the supplied 'seqinfo' must be a Seqinfo object In addition: Warning message: In if is.na(genome(GR))) { : the condition has length > 1 and only the first element will be used > GRanges with 485577 ranges and 7 metadata columns: > seqnames ranges strand | addressA addressB > channel > <rle> <iranges> <rle> | <rle> <rle> <rle> > cg13869341 chr1 [15865, 15866] * | 62703328 16661461 > Red > cg14008030 chr1 [18827, 18828] * | 27651330 <na> Both > cg12045430 chr1 [29407, 29408] * | 25703424 34666387 > Red > cg20826792 chr1 [29425, 29426] * | 61731400 14693326 > Red > cg00381604 chr1 [29435, 29436] * | 26752380 50693408 > Red Would there also be a possibility to get directly the information where in the gene the probes are located? For example, when I use IlluminaHumanMethylation450kPROBELOCATION I get information like this: probe_id transcript_location 1 cg00035864 NR_001550:TSS1500 2 cg00050873 NM_001164471:Body 3 cg00050873 NR_001553:TSS1500 4 cg00061679 NM_004081:Body 5 cg00061679 NM_020420:Body 6 cg00061679 NM_001005375:Body >From this I can easily extract the corresponding annotations (1stExon, 3'UTR, 5'UTR, Body, TSS1500, TSS200). In your sample above I do not see anything like this, but I read something about annotateGRanges and annotateDMRInfo, probably they would help, but because of the problem above I still could not try. In the vignette you mentioned I couldn't find an example retrieving this information at first sight, but I'll have a closer look on it. Best wishes, Simone
ADD REPLY
0
Entering edit mode
Tim Triche ★ 4.2k
@tim-triche-3561
Last seen 4.4 years ago
United States
You will need to use toggleProbes() to get all probe locations (this is an undesirable side effect of the db0 package infrastructure, which is geared towards expression arrays where probes that map to multiple accessions are a "bad thing", and that is the reason I deprecated the 27k.db and 450k.db packages). The FDb.InfiniumMethylation packages have the locations of the targeted CpGs (and CpHs, and SNPs) for the corresponding build (just for the record, the genome build targeted by 450k.db is hg19/GRch37), but they are returned using get450k() or get27k() as a GRanges object. In the near future I am going to suggest that people migrate to minfi for processing, as the terminal data structure emitted by recent versions of 'minfi' is also based around a GRanges representation for the probes (it is a subclass of SummarizedExperiment, which I have also been working on scaling up to handle thousands of samples at a time). Essentially, the mistake I made in compiling the 27k.db and 450k.db packages was to use an expression-centric (or gene- and transcript- centric) infrastructure to represent probes targeting individual loci. It's more useful to think of the 450k array, in particular, as a SNP array, with functions for a given target locus inferred after discovery. It's also more useful to think of the chip as a SNP array when considering differentially methylated regions, which could be loosely regarded as the methylation equivalent of copy number variations. Last but not least, you can in fact use the array to estimate copy number along the genome, as for determining sex from chrX copy number. On Mon, Jul 1, 2013 at 10:35 AM, Simone <enomis.bioc@gmail.com> wrote: > Hello! > > A little question: I've got a table with beta values obtained by > Illumina's 450K BeadChip microarray and want to know for each probe > where in the gene it is located (mainly promoter region or gene body). > I found the IlluminaHumanMethylation450kPROBELOCATION function in the > IlluminaHumanMethylation450k.db package which seems to do what I want, > but not for all the probes. > In detail, I have got data for 473,029 probes, but the function only > returns values for 354,770 unique probes, so 118,259 ones are missing. > Is this expected behaviour? Should I use another package to get > location information for all the probes contained in my file? > > And secondly: As the IlluminaHumanMethylation450k.db package seems to > be deprecated, on which build of the genome is the > IlluminaHumanMethylation450kPROBELOCATION information based? Because I > see that in the package there are some separate functions for build 36 > respectively 37, but in the help of the function in question I can > only find the information that mappings are based on data of Illumina > from January 2011 (which should then be GRCh37, the build I have to > work with in this case, but however I am not completely sure as it > doesn't say it clearly). > > Best regards, > Simone > > > sessionInfo() > R version 3.0.1 (2013-05-16) > Platform: x86_64-pc-linux-gnu (64-bit) > > locale: > [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C > LC_TIME=en_US.UTF-8 > [4] LC_COLLATE=en_US.UTF-8 LC_MONETARY=en_US.UTF-8 > LC_MESSAGES=en_US.UTF-8 > [7] LC_PAPER=C LC_NAME=C > LC_ADDRESS=C > [10] LC_TELEPHONE=C LC_MEASUREMENT=en_US.UTF-8 > LC_IDENTIFICATION=C > > attached base packages: > [1] parallel stats graphics grDevices utils datasets > methods base > > other attached packages: > [1] sqldf_0.4-6.4 RSQLite.extfuns_0.0.1 > [3] chron_2.3-43 gsubfn_0.6-5 > [5] proto_0.3-10 RColorBrewer_1.0-5 > [7] illuminaHumanv2.db_1.18.0 > IlluminaHumanMethylation450k.db_2.0.7 > [9] IlluminaHumanMethylation27k.db_1.4.7 org.Hs.eg.db_2.9.0 > [11] RSQLite_0.11.4 DBI_0.2-7 > [13] AnnotationDbi_1.22.6 affy_1.38.1 > [15] Biobase_2.20.0 BiocGenerics_0.6.0 > > loaded via a namespace (and not attached): > [1] affyio_1.28.0 AnnotationForge_1.2.1 BiocInstaller_1.10.2 > IRanges_1.18.1 > [5] preprocessCore_1.22.0 stats4_3.0.1 tcltk_3.0.1 > tools_3.0.1 > [9] zlibbioc_1.6.0 > > _______________________________________________ > Bioconductor mailing list > Bioconductor@r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: > http://news.gmane.org/gmane.science.biology.informatics.conductor > -- *A model is a lie that helps you see the truth.* * * Howard Skipper<http: cancerres.aacrjournals.org="" content="" 31="" 9="" 1173.full.pdf=""> [[alternative HTML version deleted]]
ADD COMMENT
0
Entering edit mode
Dear All, I would like to formally introduce to you the featureCounts function included in the Rsubread package. featureCounts is R function designed for summarizing sequencing reads to genomic features such as genes, exons and promoters. It is a light-weight general-purpose read counting program (essentially written in C), and it has the following features: (1) It performs precise read assignments by taking care of indels, junctions and fusions in the reads. (2) It takes less than 4 minutes to summarize 20 million pairs of reads to 26k RefSeq genes using one thread, and only uses 40MB of memory (you can even run it on a Mac laptop). (3) It supports multi-threaded running. (4) It supports GTF format annotation and SAM/BAM read data. (5) It supports strand-specific read summarization. (6) It can perform read summarization at both feature level (eg. exons) and meta-feature level (eg. genes). (7) It allows users to specify whether reads overlapping with more than one feature should be counted or not. (8) It gives users full control on the summarization of paired-end reads, including allowing them to check if both ends are mapped and/or if the paired-end distances satisfy the distance criteria. (9) It discriminates the features, which were overlapped by both ends from the same fragment, from those which were overlapped by only one end so as to get more fragments counted. (10) It allows users to specify whether chimeric fragments should be counted. (11) It can exclude multi-mapping reads and reads with low mapping quality scores from summarization. To use this function, make sure you are using the latest version of Rsubread (1.10.5 in the release branch). A technical report for featureCounts can be found here - http://arxiv.org/abs/1305.3347. You may also refer to the Rsubread users guide for some details about this function (typing 'RsubreadUsersGuide()' in your R session). To see how featureCounts can be used in an RNA-seq analysis pipeline, you may have a look at this case study - http://bioinf.wehi.edu.au/RNAseqCaseStudy . This case study will also be used in a Workshop in the incoming Bioc2013 meeting. Hope you find it useful. Best wishes, Wei ______________________________________________________________________ The information in this email is confidential and intend...{{dropped:6}}
ADD REPLY
0
Entering edit mode
Dear Wei I recently gave a try to Rsubread/featureCounts() but found that featureCounts only supports mm9, mm10 and hg19 assemblies (using the XXX_RefSeq_exon.txt files with XXX being one of the assemblies mentioned). Would it be possible to modify this function to obtain this information from Bioconductor annotation packages? That would allow anyone to use mm8, hg18 or use featureCounts() with other species if needed and the corresponding data is present. A similar interface to generate the index files from the genome sequences (from the Bioconductor genome annotation packages) for the alignment step would be extremely useful. Best wishes, Diego On Tue, Jul 2, 2013 at 9:02 AM, Wei Shi <shi@wehi.edu.au> wrote: > Dear All, > > I would like to formally introduce to you the featureCounts function > included in the Rsubread package. featureCounts is R function designed for > summarizing sequencing reads to genomic features such as genes, exons and > promoters. It is a light-weight general-purpose read counting program > (essentially written in C), and it has the following features: > (1) It performs precise read assignments by taking care of indels, > junctions and fusions in the reads. > (2) It takes less than 4 minutes to summarize 20 million pairs of reads to > 26k RefSeq genes using one thread, and only uses 40MB of memory (you can > even run it on a Mac laptop). > (3) It supports multi-threaded running. > (4) It supports GTF format annotation and SAM/BAM read data. > (5) It supports strand-specific read summarization. > (6) It can perform read summarization at both feature level (eg. exons) > and meta-feature level (eg. genes). > (7) It allows users to specify whether reads overlapping with more than > one feature should be counted or not. > (8) It gives users full control on the summarization of paired-end reads, > including allowing them to check if both ends are mapped and/or if the > paired-end distances satisfy the distance criteria. > (9) It discriminates the features, which were overlapped by both ends from > the same fragment, from those which were overlapped by only one end so as > to get more fragments counted. > (10) It allows users to specify whether chimeric fragments should be > counted. > (11) It can exclude multi-mapping reads and reads with low mapping quality > scores from summarization. > > To use this function, make sure you are using the latest version of > Rsubread (1.10.5 in the release branch). > > A technical report for featureCounts can be found here - > http://arxiv.org/abs/1305.3347. You may also refer to the Rsubread users > guide for some details about this function (typing 'RsubreadUsersGuide()' > in your R session). > > To see how featureCounts can be used in an RNA-seq analysis pipeline, you > may have a look at this case study - > http://bioinf.wehi.edu.au/RNAseqCaseStudy . This case study will also be > used in a Workshop in the incoming Bioc2013 meeting. > > Hope you find it useful. > > Best wishes, > > Wei > > ______________________________________________________________________ > The information in this email is confidential and inte...{{dropped:13}}
ADD REPLY
0
Entering edit mode
Dear Diego, You can provide your own annotation to featureCounts via the 'annot' parameter. Have a look at the help page of featureCounts, which describes the formats of external annotations allowed by featureCounts. I imagine it wouldn't be too hard to generate a data frame from a bioc annotation and then feed it to featureCounts. Best wishes, Wei On Jul 2, 2013, at 12:31 PM, Diego Diez wrote: > Dear Wei > > I recently gave a try to Rsubread/featureCounts() but found that featureCounts only supports mm9, mm10 and hg19 assemblies (using the XXX_RefSeq_exon.txt files with XXX being one of the assemblies mentioned). Would it be possible to modify this function to obtain this information from Bioconductor annotation packages? That would allow anyone to use mm8, hg18 or use featureCounts() with other species if needed and the corresponding data is present. A similar interface to generate the index files from the genome sequences (from the Bioconductor genome annotation packages) for the alignment step would be extremely useful. > > Best wishes, > Diego > > > On Tue, Jul 2, 2013 at 9:02 AM, Wei Shi <shi@wehi.edu.au> wrote: > Dear All, > > I would like to formally introduce to you the featureCounts function included in the Rsubread package. featureCounts is R function designed for summarizing sequencing reads to genomic features such as genes, exons and promoters. It is a light-weight general-purpose read counting program (essentially written in C), and it has the following features: > (1) It performs precise read assignments by taking care of indels, junctions and fusions in the reads. > (2) It takes less than 4 minutes to summarize 20 million pairs of reads to 26k RefSeq genes using one thread, and only uses 40MB of memory (you can even run it on a Mac laptop). > (3) It supports multi-threaded running. > (4) It supports GTF format annotation and SAM/BAM read data. > (5) It supports strand-specific read summarization. > (6) It can perform read summarization at both feature level (eg. exons) and meta-feature level (eg. genes). > (7) It allows users to specify whether reads overlapping with more than one feature should be counted or not. > (8) It gives users full control on the summarization of paired-end reads, including allowing them to check if both ends are mapped and/or if the paired-end distances satisfy the distance criteria. > (9) It discriminates the features, which were overlapped by both ends from the same fragment, from those which were overlapped by only one end so as to get more fragments counted. > (10) It allows users to specify whether chimeric fragments should be counted. > (11) It can exclude multi-mapping reads and reads with low mapping quality scores from summarization. > > To use this function, make sure you are using the latest version of Rsubread (1.10.5 in the release branch). > > A technical report for featureCounts can be found here - http://arxiv.org/abs/1305.3347. You may also refer to the Rsubread users guide for some details about this function (typing 'RsubreadUsersGuide()' in your R session). > > To see how featureCounts can be used in an RNA-seq analysis pipeline, you may have a look at this case study - http://bioinf.wehi.edu.au/RNAseqCaseStudy . This case study will also be used in a Workshop in the incoming Bioc2013 meeting. > > Hope you find it useful. > > Best wishes, > > Wei > > ______________________________________________________________________ > The information in this email is confidential and inte...{{dropped:20}}
ADD REPLY
0
Entering edit mode
hey I just remembered why you (Wei) put my name on the package in the first place ;-) require(Rsubread) ?createAnnotationFile require(TxDb.Hsapiens.UCSC.hg18.knownGene) createAnnotationFile(transcripts(TxDb.Hsapiens.UCSC.hg18.knownGene)) ## ## Your GRanges elementMetadata must contain a column called "id". ## Error in createAnnotationFile(transcripts(TxDb.Hsapiens.UCSC.hg18.knownGene)) : ## Aborting. ## ## ...oops, looks like I need to write another patch in order for this to "do what you mean"... ## ...meanwhile txs <- transcripts(TxDb.Hsapiens.UCSC.hg18.knownGene) txs$id <- txs$tx_id ## doh createAnnotationFile(txs, fname='transcripts.hg18.txt') ## Annotations were sucessfully written to transcripts.hg18.txt Alright this could be a little smoother, that said, both the "generate an index from a BSgenome" and "generate new annotations from a TxDb, ideally as an installable package" requests are sensible. Me personally, my biggest request is to make QuasR work with subread, so that I can do idiotic things like realign bisulfite-converted genomes on my laptop. Not because it's a good idea, but because I can :-) On Tue, Jul 2, 2013 at 3:27 AM, Wei Shi <shi@wehi.edu.au> wrote: > Dear Diego, > > You can provide your own annotation to featureCounts via the 'annot' > parameter. Have a look at the help page of featureCounts, which describes > the formats of external annotations allowed by featureCounts. I imagine it > wouldn't be too hard to generate a data frame from a bioc annotation and > then feed it to featureCounts. > > Best wishes, > > Wei > > > > On Jul 2, 2013, at 12:31 PM, Diego Diez wrote: > > > Dear Wei > > > > I recently gave a try to Rsubread/featureCounts() but found that > featureCounts only supports mm9, mm10 and hg19 assemblies (using the > XXX_RefSeq_exon.txt files with XXX being one of the assemblies mentioned). > Would it be possible to modify this function to obtain this information > from Bioconductor annotation packages? That would allow anyone to use mm8, > hg18 or use featureCounts() with other species if needed and the > corresponding data is present. A similar interface to generate the index > files from the genome sequences (from the Bioconductor genome annotation > packages) for the alignment step would be extremely useful. > > > > Best wishes, > > Diego > > > > > > On Tue, Jul 2, 2013 at 9:02 AM, Wei Shi <shi@wehi.edu.au> wrote: > > Dear All, > > > > I would like to formally introduce to you the featureCounts function > included in the Rsubread package. featureCounts is R function designed for > summarizing sequencing reads to genomic features such as genes, exons and > promoters. It is a light-weight general-purpose read counting program > (essentially written in C), and it has the following features: > > (1) It performs precise read assignments by taking care of indels, > junctions and fusions in the reads. > > (2) It takes less than 4 minutes to summarize 20 million pairs of reads > to 26k RefSeq genes using one thread, and only uses 40MB of memory (you can > even run it on a Mac laptop). > > (3) It supports multi-threaded running. > > (4) It supports GTF format annotation and SAM/BAM read data. > > (5) It supports strand-specific read summarization. > > (6) It can perform read summarization at both feature level (eg. exons) > and meta-feature level (eg. genes). > > (7) It allows users to specify whether reads overlapping with more than > one feature should be counted or not. > > (8) It gives users full control on the summarization of paired-end > reads, including allowing them to check if both ends are mapped and/or if > the paired-end distances satisfy the distance criteria. > > (9) It discriminates the features, which were overlapped by both ends > from the same fragment, from those which were overlapped by only one end so > as to get more fragments counted. > > (10) It allows users to specify whether chimeric fragments should be > counted. > > (11) It can exclude multi-mapping reads and reads with low mapping > quality scores from summarization. > > > > To use this function, make sure you are using the latest version of > Rsubread (1.10.5 in the release branch). > > > > A technical report for featureCounts can be found here - > http://arxiv.org/abs/1305.3347. You may also refer to the Rsubread users > guide for some details about this function (typing 'RsubreadUsersGuide()' > in your R session). > > > > To see how featureCounts can be used in an RNA-seq analysis pipeline, > you may have a look at this case study - > http://bioinf.wehi.edu.au/RNAseqCaseStudy . This case study will also be > used in a Workshop in the incoming Bioc2013 meeting. > > > > Hope you find it useful. > > > > Best wishes, > > > > Wei > > > > ______________________________________________________________________ > > The information in this email is confidential and inte...{{dropped:20}} > > _______________________________________________ > Bioconductor mailing list > Bioconductor@r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: > http://news.gmane.org/gmane.science.biology.informatics.conductor > -- *A model is a lie that helps you see the truth.* * * Howard Skipper<http: cancerres.aacrjournals.org="" content="" 31="" 9="" 1173.full.pdf=""> [[alternative HTML version deleted]]
ADD REPLY
0
Entering edit mode
In fact, now that I think about it, if I just make it so that the createAnnotationFile function instead does something like createAnnotation() and returns an appropriate target for countFeatures, then things should be A-OK. Likewise for the index creation, although that's really something you (WEHI guys) should do so that I don't @$%# it up. But QuasR-Rsubread integration is still extremely high on the list of "would be awesome" features. As would "committing the patches I sent that extremely busy group," although I'm hardly one to talk about punctuality. Still -- for the record, and I think the Bowtie guys will verify this -- subread is Much Faster. On Tue, Jul 2, 2013 at 10:37 AM, Tim Triche, Jr. <tim.triche@gmail.com>wrote: > hey I just remembered why you (Wei) put my name on the package in the > first place ;-) > > require(Rsubread) > ?createAnnotationFile > > require(TxDb.Hsapiens.UCSC.hg18.knownGene) > createAnnotationFile(transcripts(TxDb.Hsapiens.UCSC.hg18.knownGene)) > ## > ## Your GRanges elementMetadata must contain a column called "id". > ## Error in > createAnnotationFile(transcripts(TxDb.Hsapiens.UCSC.hg18.knownGene)) : > ## Aborting. > ## > ## ...oops, looks like I need to write another patch in order for this to > "do what you mean"... > ## ...meanwhile > > txs <- transcripts(TxDb.Hsapiens.UCSC.hg18.knownGene) > txs$id <- txs$tx_id ## doh > createAnnotationFile(txs, fname='transcripts.hg18.txt') > ## Annotations were sucessfully written to transcripts.hg18.txt > > Alright this could be a little smoother, that said, both the "generate an > index from a BSgenome" and "generate new annotations from a TxDb, ideally > as an installable package" requests are sensible. > > Me personally, my biggest request is to make QuasR work with subread, so > that I can do idiotic things like realign bisulfite-converted genomes on my > laptop. Not because it's a good idea, but because I can :-) > > > > > > On Tue, Jul 2, 2013 at 3:27 AM, Wei Shi <shi@wehi.edu.au> wrote: > >> Dear Diego, >> >> You can provide your own annotation to featureCounts via the 'annot' >> parameter. Have a look at the help page of featureCounts, which describes >> the formats of external annotations allowed by featureCounts. I imagine it >> wouldn't be too hard to generate a data frame from a bioc annotation and >> then feed it to featureCounts. >> >> Best wishes, >> >> Wei >> >> >> >> On Jul 2, 2013, at 12:31 PM, Diego Diez wrote: >> >> > Dear Wei >> > >> > I recently gave a try to Rsubread/featureCounts() but found that >> featureCounts only supports mm9, mm10 and hg19 assemblies (using the >> XXX_RefSeq_exon.txt files with XXX being one of the assemblies mentioned). >> Would it be possible to modify this function to obtain this information >> from Bioconductor annotation packages? That would allow anyone to use mm8, >> hg18 or use featureCounts() with other species if needed and the >> corresponding data is present. A similar interface to generate the index >> files from the genome sequences (from the Bioconductor genome annotation >> packages) for the alignment step would be extremely useful. >> > >> > Best wishes, >> > Diego >> > >> > >> > On Tue, Jul 2, 2013 at 9:02 AM, Wei Shi <shi@wehi.edu.au> wrote: >> > Dear All, >> > >> > I would like to formally introduce to you the featureCounts function >> included in the Rsubread package. featureCounts is R function designed for >> summarizing sequencing reads to genomic features such as genes, exons and >> promoters. It is a light-weight general-purpose read counting program >> (essentially written in C), and it has the following features: >> > (1) It performs precise read assignments by taking care of indels, >> junctions and fusions in the reads. >> > (2) It takes less than 4 minutes to summarize 20 million pairs of reads >> to 26k RefSeq genes using one thread, and only uses 40MB of memory (you can >> even run it on a Mac laptop). >> > (3) It supports multi-threaded running. >> > (4) It supports GTF format annotation and SAM/BAM read data. >> > (5) It supports strand-specific read summarization. >> > (6) It can perform read summarization at both feature level (eg. exons) >> and meta-feature level (eg. genes). >> > (7) It allows users to specify whether reads overlapping with more than >> one feature should be counted or not. >> > (8) It gives users full control on the summarization of paired- end >> reads, including allowing them to check if both ends are mapped and/or if >> the paired-end distances satisfy the distance criteria. >> > (9) It discriminates the features, which were overlapped by both ends >> from the same fragment, from those which were overlapped by only one end so >> as to get more fragments counted. >> > (10) It allows users to specify whether chimeric fragments should be >> counted. >> > (11) It can exclude multi-mapping reads and reads with low mapping >> quality scores from summarization. >> > >> > To use this function, make sure you are using the latest version of >> Rsubread (1.10.5 in the release branch). >> > >> > A technical report for featureCounts can be found here - >> http://arxiv.org/abs/1305.3347. You may also refer to the Rsubread users >> guide for some details about this function (typing 'RsubreadUsersGuide()' >> in your R session). >> > >> > To see how featureCounts can be used in an RNA-seq analysis pipeline, >> you may have a look at this case study - >> http://bioinf.wehi.edu.au/RNAseqCaseStudy . This case study will also be >> used in a Workshop in the incoming Bioc2013 meeting. >> > >> > Hope you find it useful. >> > >> > Best wishes, >> > >> > Wei >> > >> > ______________________________________________________________________ >> > The information in this email is confidential and inte...{{dropped:20}} >> >> _______________________________________________ >> Bioconductor mailing list >> Bioconductor@r-project.org >> https://stat.ethz.ch/mailman/listinfo/bioconductor >> Search the archives: >> http://news.gmane.org/gmane.science.biology.informatics.conductor >> > > > > -- > *A model is a lie that helps you see the truth.* > * > * > Howard Skipper<http: cancerres.aacrjournals.org="" content="" 31="" 9="" 1173.full.pdf=""> > -- *A model is a lie that helps you see the truth.* * * Howard Skipper<http: cancerres.aacrjournals.org="" content="" 31="" 9="" 1173.full.pdf=""> [[alternative HTML version deleted]]
ADD REPLY
0
Entering edit mode
Tim - I have just made changes to your createAnnotationFile function to make it compatible with featureCounts since I made some changes to featureCounts recently. Diego - you may try out the latest version of createAnnotationFile function in Rsubread (it should be available to you in ~ 2 days from bioc devel branch) to see if it can help you to easily transform a bioc annotation to an annotation which can be used by feautreCounts. Below is an example for doing this: library(Rsubread) library(TxDb.Hsapiens.UCSC.hg19.lincRNAsTranscripts) hg19LincRNAs <- transcripts(TxDb.Hsapiens.UCSC.hg19.lincRNAsTranscripts) names(values(hg19LincRNAs)) <- gsub('tx_id','id',names(values(hg19LincRNAs))) annot_for_featureCounts <- createAnnotationFile(hg19LincRNAs) fcounts <- featureCounts("your_mapping_results.sam",annot=annot_for_fe atureCounts) Let me know if you run into any problems. Cheers, Wei On Jul 3, 2013, at 3:40 AM, Tim Triche, Jr. wrote: > In fact, now that I think about it, if I just make it so that the createAnnotationFile function instead does something like createAnnotation() and returns an appropriate target for countFeatures, then things should be A-OK. Likewise for the index creation, although that's really something you (WEHI guys) should do so that I don't @$%# it up. > > But QuasR-Rsubread integration is still extremely high on the list of "would be awesome" features. As would "committing the patches I sent that extremely busy group," although I'm hardly one to talk about punctuality. Still -- for the record, and I think the Bowtie guys will verify this -- subread is Much Faster. > > > > > On Tue, Jul 2, 2013 at 10:37 AM, Tim Triche, Jr. <tim.triche@gmail.com> wrote: > hey I just remembered why you (Wei) put my name on the package in the first place ;-) > > require(Rsubread) > ?createAnnotationFile > > require(TxDb.Hsapiens.UCSC.hg18.knownGene) > createAnnotationFile(transcripts(TxDb.Hsapiens.UCSC.hg18.knownGene)) > ## > ## Your GRanges elementMetadata must contain a column called "id". > ## Error in createAnnotationFile(transcripts(TxDb.Hsapiens.UCSC.hg18.knownGene)) : > ## Aborting. > ## > ## ...oops, looks like I need to write another patch in order for this to "do what you mean"... > ## ...meanwhile > > txs <- transcripts(TxDb.Hsapiens.UCSC.hg18.knownGene) > txs$id <- txs$tx_id ## doh > createAnnotationFile(txs, fname='transcripts.hg18.txt') > ## Annotations were sucessfully written to transcripts.hg18.txt > > Alright this could be a little smoother, that said, both the "generate an index from a BSgenome" and "generate new annotations from a TxDb, ideally as an installable package" requests are sensible. > > Me personally, my biggest request is to make QuasR work with subread, so that I can do idiotic things like realign bisulfite- converted genomes on my laptop. Not because it's a good idea, but because I can :-) > > > > > > On Tue, Jul 2, 2013 at 3:27 AM, Wei Shi <shi@wehi.edu.au> wrote: > Dear Diego, > > You can provide your own annotation to featureCounts via the 'annot' parameter. Have a look at the help page of featureCounts, which describes the formats of external annotations allowed by featureCounts. I imagine it wouldn't be too hard to generate a data frame from a bioc annotation and then feed it to featureCounts. > > Best wishes, > > Wei > > > > On Jul 2, 2013, at 12:31 PM, Diego Diez wrote: > > > Dear Wei > > > > I recently gave a try to Rsubread/featureCounts() but found that featureCounts only supports mm9, mm10 and hg19 assemblies (using the XXX_RefSeq_exon.txt files with XXX being one of the assemblies mentioned). Would it be possible to modify this function to obtain this information from Bioconductor annotation packages? That would allow anyone to use mm8, hg18 or use featureCounts() with other species if needed and the corresponding data is present. A similar interface to generate the index files from the genome sequences (from the Bioconductor genome annotation packages) for the alignment step would be extremely useful. > > > > Best wishes, > > Diego > > > > > > On Tue, Jul 2, 2013 at 9:02 AM, Wei Shi <shi@wehi.edu.au> wrote: > > Dear All, > > > > I would like to formally introduce to you the featureCounts function included in the Rsubread package. featureCounts is R function designed for summarizing sequencing reads to genomic features such as genes, exons and promoters. It is a light-weight general-purpose read counting program (essentially written in C), and it has the following features: > > (1) It performs precise read assignments by taking care of indels, junctions and fusions in the reads. > > (2) It takes less than 4 minutes to summarize 20 million pairs of reads to 26k RefSeq genes using one thread, and only uses 40MB of memory (you can even run it on a Mac laptop). > > (3) It supports multi-threaded running. > > (4) It supports GTF format annotation and SAM/BAM read data. > > (5) It supports strand-specific read summarization. > > (6) It can perform read summarization at both feature level (eg. exons) and meta-feature level (eg. genes). > > (7) It allows users to specify whether reads overlapping with more than one feature should be counted or not. > > (8) It gives users full control on the summarization of paired-end reads, including allowing them to check if both ends are mapped and/or if the paired-end distances satisfy the distance criteria. > > (9) It discriminates the features, which were overlapped by both ends from the same fragment, from those which were overlapped by only one end so as to get more fragments counted. > > (10) It allows users to specify whether chimeric fragments should be counted. > > (11) It can exclude multi-mapping reads and reads with low mapping quality scores from summarization. > > > > To use this function, make sure you are using the latest version of Rsubread (1.10.5 in the release branch). > > > > A technical report for featureCounts can be found here - http://arxiv.org/abs/1305.3347. You may also refer to the Rsubread users guide for some details about this function (typing 'RsubreadUsersGuide()' in your R session). > > > > To see how featureCounts can be used in an RNA-seq analysis pipeline, you may have a look at this case study - http://bioinf.wehi.edu.au/RNAseqCaseStudy . This case study will also be used in a Workshop in the incoming Bioc2013 meeting. > > > > Hope you find it useful. > > > > Best wishes, > > > > Wei > > > > ______________________________________________________________________ > > The information in this email is confidential and inte...{{dropped:20}} > > _______________________________________________ > Bioconductor mailing list > Bioconductor@r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor > > > > -- > A model is a lie that helps you see the truth. > > Howard Skipper > > > > -- > A model is a lie that helps you see the truth. > > Howard Skipper ______________________________________________________________________ The information in this email is confidential and intend...{{dropped:8}}
ADD REPLY
0
Entering edit mode
Hi Tim and Wei, Brilliant! Thank you, I will give it a try and let you know then about the experience. Cheers, Diego On Wed, Jul 3, 2013 at 10:34 AM, Wei Shi <shi@wehi.edu.au> wrote: > Tim - I have just made changes to your createAnnotationFile function to > make it compatible with featureCounts since I made some changes to > featureCounts recently. > > Diego - you may try out the latest version of createAnnotationFile > function in Rsubread (it should be available to you in ~ 2 days from bioc > devel branch) to see if it can help you to easily transform a bioc > annotation to an annotation which can be used by feautreCounts. Below is an > example for doing this: > > library(Rsubread) > library(TxDb.Hsapiens.UCSC.hg19.lincRNAsTranscripts) > hg19LincRNAs <- transcripts(TxDb.Hsapiens.UCSC.hg19.lincRNAsTranscripts) > names(values(hg19LincRNAs)) <- > gsub('tx_id','id',names(values(hg19LincRNAs))) > annot_for_featureCounts <- createAnnotationFile(hg19LincRNAs) > fcounts <- > featureCounts("your_mapping_results.sam",annot=annot_for_featureCoun ts) > > Let me know if you run into any problems. > > Cheers, > Wei > > On Jul 3, 2013, at 3:40 AM, Tim Triche, Jr. wrote: > > In fact, now that I think about it, if I just make it so that > the createAnnotationFile function instead does something > like createAnnotation() and returns an appropriate target for > countFeatures, then things should be A-OK. Likewise for the index > creation, although that's really something you (WEHI guys) should do so > that I don't @$%# it up. > > But QuasR-Rsubread integration is still extremely high on the list of > "would be awesome" features. As would "committing the patches I sent that > extremely busy group," although I'm hardly one to talk about punctuality. > Still -- for the record, and I think the Bowtie guys will verify this -- > subread is Much Faster. > > > > > On Tue, Jul 2, 2013 at 10:37 AM, Tim Triche, Jr. <tim.triche@gmail.com>wrote: > >> hey I just remembered why you (Wei) put my name on the package in the >> first place ;-) >> >> require(Rsubread) >> ?createAnnotationFile >> >> require(TxDb.Hsapiens.UCSC.hg18.knownGene) >> createAnnotationFile(transcripts(TxDb.Hsapiens.UCSC.hg18.knownGene)) >> ## >> ## Your GRanges elementMetadata must contain a column called "id". >> ## Error in >> createAnnotationFile(transcripts(TxDb.Hsapiens.UCSC.hg18.knownGene)) : >> ## Aborting. >> ## >> ## ...oops, looks like I need to write another patch in order for this to >> "do what you mean"... >> ## ...meanwhile >> >> txs <- transcripts(TxDb.Hsapiens.UCSC.hg18.knownGene) >> txs$id <- txs$tx_id ## doh >> createAnnotationFile(txs, fname='transcripts.hg18.txt') >> ## Annotations were sucessfully written to transcripts.hg18.txt >> >> Alright this could be a little smoother, that said, both the "generate an >> index from a BSgenome" and "generate new annotations from a TxDb, ideally >> as an installable package" requests are sensible. >> >> Me personally, my biggest request is to make QuasR work with subread, so >> that I can do idiotic things like realign bisulfite-converted genomes on my >> laptop. Not because it's a good idea, but because I can :-) >> >> >> >> >> >> On Tue, Jul 2, 2013 at 3:27 AM, Wei Shi <shi@wehi.edu.au> wrote: >> >>> Dear Diego, >>> >>> You can provide your own annotation to featureCounts via the 'annot' >>> parameter. Have a look at the help page of featureCounts, which describes >>> the formats of external annotations allowed by featureCounts. I imagine it >>> wouldn't be too hard to generate a data frame from a bioc annotation and >>> then feed it to featureCounts. >>> >>> Best wishes, >>> >>> Wei >>> >>> >>> >>> On Jul 2, 2013, at 12:31 PM, Diego Diez wrote: >>> >>> > Dear Wei >>> > >>> > I recently gave a try to Rsubread/featureCounts() but found that >>> featureCounts only supports mm9, mm10 and hg19 assemblies (using the >>> XXX_RefSeq_exon.txt files with XXX being one of the assemblies mentioned). >>> Would it be possible to modify this function to obtain this information >>> from Bioconductor annotation packages? That would allow anyone to use mm8, >>> hg18 or use featureCounts() with other species if needed and the >>> corresponding data is present. A similar interface to generate the index >>> files from the genome sequences (from the Bioconductor genome annotation >>> packages) for the alignment step would be extremely useful. >>> > >>> > Best wishes, >>> > Diego >>> > >>> > >>> > On Tue, Jul 2, 2013 at 9:02 AM, Wei Shi <shi@wehi.edu.au> wrote: >>> > Dear All, >>> > >>> > I would like to formally introduce to you the featureCounts function >>> included in the Rsubread package. featureCounts is R function designed for >>> summarizing sequencing reads to genomic features such as genes, exons and >>> promoters. It is a light-weight general-purpose read counting program >>> (essentially written in C), and it has the following features: >>> > (1) It performs precise read assignments by taking care of indels, >>> junctions and fusions in the reads. >>> > (2) It takes less than 4 minutes to summarize 20 million pairs of >>> reads to 26k RefSeq genes using one thread, and only uses 40MB of memory >>> (you can even run it on a Mac laptop). >>> > (3) It supports multi-threaded running. >>> > (4) It supports GTF format annotation and SAM/BAM read data. >>> > (5) It supports strand-specific read summarization. >>> > (6) It can perform read summarization at both feature level (eg. >>> exons) and meta-feature level (eg. genes). >>> > (7) It allows users to specify whether reads overlapping with more >>> than one feature should be counted or not. >>> > (8) It gives users full control on the summarization of paired- end >>> reads, including allowing them to check if both ends are mapped and/or if >>> the paired-end distances satisfy the distance criteria. >>> > (9) It discriminates the features, which were overlapped by both ends >>> from the same fragment, from those which were overlapped by only one end so >>> as to get more fragments counted. >>> > (10) It allows users to specify whether chimeric fragments should be >>> counted. >>> > (11) It can exclude multi-mapping reads and reads with low mapping >>> quality scores from summarization. >>> > >>> > To use this function, make sure you are using the latest version of >>> Rsubread (1.10.5 in the release branch). >>> > >>> > A technical report for featureCounts can be found here - >>> http://arxiv.org/abs/1305.3347. You may also refer to the Rsubread >>> users guide for some details about this function (typing >>> 'RsubreadUsersGuide()' in your R session). >>> > >>> > To see how featureCounts can be used in an RNA-seq analysis pipeline, >>> you may have a look at this case study - >>> http://bioinf.wehi.edu.au/RNAseqCaseStudy . This case study will also >>> be used in a Workshop in the incoming Bioc2013 meeting. >>> > >>> > Hope you find it useful. >>> > >>> > Best wishes, >>> > >>> > Wei >>> > >>> > ______________________________________________________________________ >>> > The information in this email is confidential and inte...{{dropped:20}} >>> >>> _______________________________________________ >>> Bioconductor mailing list >>> Bioconductor@r-project.org >>> https://stat.ethz.ch/mailman/listinfo/bioconductor >>> Search the archives: >>> http://news.gmane.org/gmane.science.biology.informatics.conductor >>> >> >> >> >> -- >> *A model is a lie that helps you see the truth.* >> * >> * >> Howard Skipper<http: cancerres.aacrjournals.org="" content="" 31="" 9="" 1173.full.pdf=""> >> > > > > -- > *A model is a lie that helps you see the truth.* > * > * > Howard Skipper<http: cancerres.aacrjournals.org="" content="" 31="" 9="" 1173.full.pdf=""> > > > > ______________________________________________________________________ > The information in this email is confidential and inte...{{dropped:10}}
ADD REPLY
0
Entering edit mode
Dear Wei, Thank you for your feedback. Indeed I missed that option from my original quick examination. I will take a closer look and try your suggestion. Best wishes, Diego On Tue, Jul 2, 2013 at 7:27 PM, Wei Shi <shi@wehi.edu.au> wrote: > Dear Diego, > > You can provide your own annotation to featureCounts via the 'annot' > parameter. Have a look at the help page of featureCounts, which describes > the formats of external annotations allowed by featureCounts. I imagine it > wouldn't be too hard to generate a data frame from a bioc annotation and > then feed it to featureCounts. > > Best wishes, > > Wei > > > > On Jul 2, 2013, at 12:31 PM, Diego Diez wrote: > > Dear Wei > > I recently gave a try to Rsubread/featureCounts() but found that > featureCounts only supports mm9, mm10 and hg19 assemblies (using the > XXX_RefSeq_exon.txt files with XXX being one of the assemblies mentioned). > Would it be possible to modify this function to obtain this information > from Bioconductor annotation packages? That would allow anyone to use mm8, > hg18 or use featureCounts() with other species if needed and the > corresponding data is present. A similar interface to generate the index > files from the genome sequences (from the Bioconductor genome annotation > packages) for the alignment step would be extremely useful. > > Best wishes, > Diego > > > On Tue, Jul 2, 2013 at 9:02 AM, Wei Shi <shi@wehi.edu.au> wrote: > >> Dear All, >> >> I would like to formally introduce to you the featureCounts function >> included in the Rsubread package. featureCounts is R function designed for >> summarizing sequencing reads to genomic features such as genes, exons and >> promoters. It is a light-weight general-purpose read counting program >> (essentially written in C), and it has the following features: >> (1) It performs precise read assignments by taking care of indels, >> junctions and fusions in the reads. >> (2) It takes less than 4 minutes to summarize 20 million pairs of reads >> to 26k RefSeq genes using one thread, and only uses 40MB of memory (you can >> even run it on a Mac laptop). >> (3) It supports multi-threaded running. >> (4) It supports GTF format annotation and SAM/BAM read data. >> (5) It supports strand-specific read summarization. >> (6) It can perform read summarization at both feature level (eg. exons) >> and meta-feature level (eg. genes). >> (7) It allows users to specify whether reads overlapping with more than >> one feature should be counted or not. >> (8) It gives users full control on the summarization of paired-end reads, >> including allowing them to check if both ends are mapped and/or if the >> paired-end distances satisfy the distance criteria. >> (9) It discriminates the features, which were overlapped by both ends >> from the same fragment, from those which were overlapped by only one end so >> as to get more fragments counted. >> (10) It allows users to specify whether chimeric fragments should be >> counted. >> (11) It can exclude multi-mapping reads and reads with low mapping >> quality scores from summarization. >> >> To use this function, make sure you are using the latest version of >> Rsubread (1.10.5 in the release branch). >> >> A technical report for featureCounts can be found here - >> http://arxiv.org/abs/1305.3347. You may also refer to the Rsubread users >> guide for some details about this function (typing 'RsubreadUsersGuide()' >> in your R session). >> >> To see how featureCounts can be used in an RNA-seq analysis pipeline, you >> may have a look at this case study - >> http://bioinf.wehi.edu.au/RNAseqCaseStudy . This case study will also be >> used in a Workshop in the incoming Bioc2013 meeting. >> >> Hope you find it useful. >> >> Best wishes, >> >> Wei >> >> ______________________________________________________________________ >> The information in this email is confidential and intend...{{dropped:6}} >> >> _______________________________________________ >> Bioconductor mailing list >> Bioconductor@r-project.org >> https://stat.ethz.ch/mailman/listinfo/bioconductor >> Search the archives: >> http://news.gmane.org/gmane.science.biology.informatics.conductor >> > > > > ______________________________________________________________________ > The information in this email is confidential and inte...{{dropped:10}}
ADD REPLY
0
Entering edit mode
Dear Tim, > You will need to use toggleProbes() to get all probe locations (this is an > undesirable side effect of the db0 package infrastructure, which is geared > towards expression arrays where probes that map to multiple accessions are a > "bad thing", and that is the reason I deprecated the 27k.db and 450k.db > packages). I understand. However, also with toggleProbes() I have got a problem. I tried for example: > x <- toggleProbes(IlluminaHumanMethylation450kPATH2PROBE, "all") Error in (function (classes, fdef, mtable) : unable to find an inherited method for function ?toggleProbes? for signature ?"AnnDbBimap"? But for example the following works > x <- toggleProbes(IlluminaHumanMethylation450kENSEMBL2PROBE, "all") However, I would need the location information, so the column "transcript_location" (e.g. 5'UTR, TSS200, Body, etc.) and it seems that I cannot get it like this? > The FDb.InfiniumMethylation packages have the locations of the targeted CpGs > (and CpHs, and SNPs) for the corresponding build (just for the record, the > genome build targeted by 450k.db is hg19/GRch37), but they are returned > using get450k() or get27k() as a GRanges object. Currently, this doesn't work for me neither (see my previous post). Probably I'm doing something wrong ... > Essentially, the mistake I made in compiling the 27k.db and 450k.db packages > was to use an expression-centric (or gene- and transcript-centric) > infrastructure to represent probes targeting individual loci. It's more > useful to think of the 450k array, in particular, as a SNP array, with > functions for a given target locus inferred after discovery. It's also more > useful to think of the chip as a SNP array when considering differentially > methylated regions, which could be loosely regarded as the methylation > equivalent of copy number variations. Last but not least, you can in fact > use the array to estimate copy number along the genome, as for determining > sex from chrX copy number. Thank you for your hints and explanations! But at the moment I am still stucked at a much simpler step, I would just need an easy way to distinguish probes located in promoters from those located in gene bodies (and to discard those which are annotated to both). Best wishes, Simone
ADD REPLY
0
Entering edit mode
First off, my apologies about the BSgenome requirement. I realized (...slowly...) that rtracklayer requires the appropriate build of BSgenome to correctly assign the chromosome lengths when requested. It's either that or assume the user has connectivity to UCSC, or store them statically (which seems like a bad idea, but perhaps it isn't). I attempted to upload an updated version of the package but due to missed communications with Marc, it never made it onto the BioC repository. Mea culpa. That said... The following is for a bunch of hits from an experiment, but you could just swap in something like if(!require(BSgenome.Hsapiens.UCSC.hg19)) biocLite('BSgenome.Hsapiens.UCSC.hg19') if(!require(FDb.InfiniumMethylation.hg19)) biocLite('FDb.InfiniumMethylation.hg19') library(FDb.InfiniumMethylation.hg19) hm450 <- get450k() hm450.CpGs <- split(hm450, hm450$probeType)$cg strand(hm450.CpGs) <- '*' ## always worth double checking and use hm450.CpGs in the following code where I am using varByAgeProbes... library(Homo.sapiens) txsByGene <- transcriptsBy(TxDb.Hsapiens.UCSC.hg19.knownGene, 'gene') names(txsByGene) <- mget(names(txsByGene), org.Hs.egSYMBOL, ifnotfound=NA) genicHits <- subsetByOverlaps(varByAgeProbes, txsByGene) genicOLs <- findOverlaps(varByAgeProbes, txsByGene) genesHit <- sort(table(names(txsByGene)[subjectHits(genicOLs)])) promotersByGene <- flank(txsByGene, 1500) ## adjust the number to suit your definitions promoterHits <- subsetByOverlaps(varByAgeProbes, promotersByGene) promoterOLs <- findOverlaps(varByAgeProbes, promotersByGene) promotersHit <- sort(table(names(promotersByGene)[subjectHits(promoterOLs)])) promoterGenicHits <- intersect( promoterHits, genicHits ) ## probes that hit both regions promotersAndGenes <- reduce(unlist(c(reduce(promotersByGene), reduce(txsByGene)))) intergenicHits <- varByAgeProbes[ which(!varByAgeProbes %over% promotersAndGenes) ] ## trust but verify: length(intersect(intergenicHits, genicHits)) ## [1] 0 length(intersect(intergenicHits, promoterHits)) ## [1] 0 So what, right? I mean, you could have figured this out from the manifest...maybe (i.e., assuming you didn't use the GEO manifest with its Excel-gene names, and assuming there are no recently updated or rare transcripts of interest in your data). But suppose you have a bunch of ChIP-seq or DNAse-seq results, or footprints of transcription factor binding sites, and you want to see which of those are hit. Or H3K27me3 peak calls, or CTCF peaks. Let's use the latter. (There's a very specific reason that I am looking at CTCF locations in this particular analysis) require(rtracklayer) CTCF.peaks <- import("CTCF_peaks.bed", genome='hg19') subsetByOverlaps( intergenicHits, CTCF.peaks ) You *could* do the above with just the manifest, but it would be a colossal pain in the... posterior. Now consider if you were looking at DNAse-inaccessible PU.1 sites, or HiC interacting domains, or the like. Anyways, I'll patch up the issues with FDb.InfiniumMethylation.hg18/19, try to make it a bit faster, and re-upload the packages over the weekend (or before, but I worry that can't be done). I hope the above clarifies my intent with the package. An alternative is to use minfi's mapToGenome() function, FWIW. On Tue, Jul 2, 2013 at 2:14 AM, Simone <enomis.bioc@gmail.com> wrote: > Dear Tim, > > > You will need to use toggleProbes() to get all probe locations (this is > an > > undesirable side effect of the db0 package infrastructure, which is > geared > > towards expression arrays where probes that map to multiple accessions > are a > > "bad thing", and that is the reason I deprecated the 27k.db and 450k.db > > packages). > > I understand. However, also with toggleProbes() I have got a problem. > I tried for example: > > > x <- toggleProbes(IlluminaHumanMethylation450kPATH2PROBE, "all") > > Error in (function (classes, fdef, mtable) : > unable to find an inherited method for function ‘toggleProbes’ for > signature ‘"AnnDbBimap"’ > > But for example the following works > > > x <- toggleProbes(IlluminaHumanMethylation450kENSEMBL2PROBE, "all") > > However, I would need the location information, so the column > "transcript_location" (e.g. 5'UTR, TSS200, Body, etc.) and it seems > that I cannot get it like this? > > > The FDb.InfiniumMethylation packages have the locations of the targeted > CpGs > > (and CpHs, and SNPs) for the corresponding build (just for the record, > the > > genome build targeted by 450k.db is hg19/GRch37), but they are returned > > using get450k() or get27k() as a GRanges object. > > Currently, this doesn't work for me neither (see my previous post). > Probably I'm doing something wrong ... > > > Essentially, the mistake I made in compiling the 27k.db and 450k.db > packages > > was to use an expression-centric (or gene- and transcript-centric) > > infrastructure to represent probes targeting individual loci. It's more > > useful to think of the 450k array, in particular, as a SNP array, with > > functions for a given target locus inferred after discovery. It's also > more > > useful to think of the chip as a SNP array when considering > differentially > > methylated regions, which could be loosely regarded as the methylation > > equivalent of copy number variations. Last but not least, you can in > fact > > use the array to estimate copy number along the genome, as for > determining > > sex from chrX copy number. > > Thank you for your hints and explanations! > But at the moment I am still stucked at a much simpler step, I would > just need an easy way to distinguish probes located in promoters from > those located in gene bodies (and to discard those which are annotated > to both). > > Best wishes, > Simone > -- *A model is a lie that helps you see the truth.* * * Howard Skipper<http: cancerres.aacrjournals.org="" content="" 31="" 9="" 1173.full.pdf=""> [[alternative HTML version deleted]]
ADD REPLY
0
Entering edit mode
Wow, Tim, thank you so much, this code works perfectly! I think I never received from anybody a code directly and completely working ... And you are definitely right, I have to learn how to use all the GRanges stuff, it seems to be really flexible. Will work through the vignette! Again, thank you for your great help. Simone On Tue, Jul 2, 2013 at 7:23 PM, Tim Triche, Jr. <tim.triche at="" gmail.com=""> wrote: > First off, my apologies about the BSgenome requirement. I realized > (...slowly...) that rtracklayer requires the appropriate build of BSgenome > to correctly assign the chromosome lengths when requested. It's either that > or assume the user has connectivity to UCSC, or store them statically (which > seems like a bad idea, but perhaps it isn't). I attempted to upload an > updated version of the package but due to missed communications with Marc, > it never made it onto the BioC repository. Mea culpa. That said... > > The following is for a bunch of hits from an experiment, but you could just > swap in something like > > if(!require(BSgenome.Hsapiens.UCSC.hg19)) > biocLite('BSgenome.Hsapiens.UCSC.hg19') > if(!require(FDb.InfiniumMethylation.hg19)) > biocLite('FDb.InfiniumMethylation.hg19') > > library(FDb.InfiniumMethylation.hg19) > hm450 <- get450k() > hm450.CpGs <- split(hm450, hm450$probeType)$cg > > strand(hm450.CpGs) <- '*' ## always worth double checking > > > > and use hm450.CpGs in the following code where I am using varByAgeProbes... > > > library(Homo.sapiens) > txsByGene <- transcriptsBy(TxDb.Hsapiens.UCSC.hg19.knownGene, 'gene') > names(txsByGene) <- mget(names(txsByGene), > org.Hs.egSYMBOL, ifnotfound=NA) > genicHits <- subsetByOverlaps(varByAgeProbes, txsByGene) > > genicOLs <- findOverlaps(varByAgeProbes, txsByGene) > genesHit <- sort(table(names(txsByGene)[subjectHits(genicOLs)])) > > promotersByGene <- flank(txsByGene, 1500) ## adjust the number to suit your > definitions > > promoterHits <- subsetByOverlaps(varByAgeProbes, promotersByGene) > > promoterOLs <- findOverlaps(varByAgeProbes, promotersByGene) > promotersHit <- > sort(table(names(promotersByGene)[subjectHits(promoterOLs)])) > > promoterGenicHits <- intersect( promoterHits, genicHits ) ## probes that > hit both regions > > promotersAndGenes <- reduce(unlist(c(reduce(promotersByGene), > reduce(txsByGene)))) > > intergenicHits <- varByAgeProbes[ which(!varByAgeProbes %over% > promotersAndGenes) ] > > ## trust but verify: > length(intersect(intergenicHits, genicHits)) > ## [1] 0 > length(intersect(intergenicHits, promoterHits)) > ## [1] 0 > > So what, right? I mean, you could have figured this out from the > manifest...maybe (i.e., assuming you didn't use the GEO manifest with its > Excel-gene names, and assuming there are no recently updated or rare > transcripts of interest in your data). But suppose you have a bunch of > ChIP-seq or DNAse-seq results, or footprints of transcription factor binding > sites, and you want to see which of those are hit. Or H3K27me3 peak calls, > or CTCF peaks. Let's use the latter. (There's a very specific reason that > I am looking at CTCF locations in this particular analysis) > > require(rtracklayer) > CTCF.peaks <- import("CTCF_peaks.bed", genome='hg19') > > subsetByOverlaps( intergenicHits, CTCF.peaks ) > > You *could* do the above with just the manifest, but it would be a colossal > pain in the... posterior. Now consider if you were looking at > DNAse-inaccessible PU.1 sites, or HiC interacting domains, or the like. > > Anyways, I'll patch up the issues with FDb.InfiniumMethylation.hg18/19, try > to make it a bit faster, and re-upload the packages over the weekend (or > before, but I worry that can't be done). I hope the above clarifies my > intent with the package. An alternative is to use minfi's mapToGenome() > function, FWIW. > > > > > > On Tue, Jul 2, 2013 at 2:14 AM, Simone <enomis.bioc at="" gmail.com=""> wrote: >> >> Dear Tim, >> >> > You will need to use toggleProbes() to get all probe locations (this is >> > an >> > undesirable side effect of the db0 package infrastructure, which is >> > geared >> > towards expression arrays where probes that map to multiple accessions >> > are a >> > "bad thing", and that is the reason I deprecated the 27k.db and 450k.db >> > packages). >> >> I understand. However, also with toggleProbes() I have got a problem. >> I tried for example: >> >> > x <- toggleProbes(IlluminaHumanMethylation450kPATH2PROBE, "all") >> >> Error in (function (classes, fdef, mtable) : >> unable to find an inherited method for function ?toggleProbes? for >> signature ?"AnnDbBimap"? >> >> But for example the following works >> >> > x <- toggleProbes(IlluminaHumanMethylation450kENSEMBL2PROBE, "all") >> >> However, I would need the location information, so the column >> "transcript_location" (e.g. 5'UTR, TSS200, Body, etc.) and it seems >> that I cannot get it like this? >> >> > The FDb.InfiniumMethylation packages have the locations of the targeted >> > CpGs >> > (and CpHs, and SNPs) for the corresponding build (just for the record, >> > the >> > genome build targeted by 450k.db is hg19/GRch37), but they are returned >> > using get450k() or get27k() as a GRanges object. >> >> Currently, this doesn't work for me neither (see my previous post). >> Probably I'm doing something wrong ... >> >> > Essentially, the mistake I made in compiling the 27k.db and 450k.db >> > packages >> > was to use an expression-centric (or gene- and transcript- centric) >> > infrastructure to represent probes targeting individual loci. It's more >> > useful to think of the 450k array, in particular, as a SNP array, with >> > functions for a given target locus inferred after discovery. It's also >> > more >> > useful to think of the chip as a SNP array when considering >> > differentially >> > methylated regions, which could be loosely regarded as the methylation >> > equivalent of copy number variations. Last but not least, you can in >> > fact >> > use the array to estimate copy number along the genome, as for >> > determining >> > sex from chrX copy number. >> >> Thank you for your hints and explanations! >> But at the moment I am still stucked at a much simpler step, I would >> just need an easy way to distinguish probes located in promoters from >> those located in gene bodies (and to discard those which are annotated >> to both). >> >> Best wishes, >> Simone > > > > > -- > A model is a lie that helps you see the truth. > > Howard Skipper
ADD REPLY
0
Entering edit mode
re: IlluminaHumanMethylation450kPATH2PROBE This one is new to me. I suspect that not all probes have a pathway mapped to them, but I will have to investigate further. If it's a KEGG issue, it may not be possible to resolve this due to KEGG's deprecation. On Tue, Jul 2, 2013 at 2:14 AM, Simone <enomis.bioc@gmail.com> wrote: > Dear Tim, > > > You will need to use toggleProbes() to get all probe locations (this is > an > > undesirable side effect of the db0 package infrastructure, which is > geared > > towards expression arrays where probes that map to multiple accessions > are a > > "bad thing", and that is the reason I deprecated the 27k.db and 450k.db > > packages). > > I understand. However, also with toggleProbes() I have got a problem. > I tried for example: > > > x <- toggleProbes(IlluminaHumanMethylation450kPATH2PROBE, "all") > > Error in (function (classes, fdef, mtable) : > unable to find an inherited method for function ‘toggleProbes’ for > signature ‘"AnnDbBimap"’ > > But for example the following works > > > x <- toggleProbes(IlluminaHumanMethylation450kENSEMBL2PROBE, "all") > > However, I would need the location information, so the column > "transcript_location" (e.g. 5'UTR, TSS200, Body, etc.) and it seems > that I cannot get it like this? > > > The FDb.InfiniumMethylation packages have the locations of the targeted > CpGs > > (and CpHs, and SNPs) for the corresponding build (just for the record, > the > > genome build targeted by 450k.db is hg19/GRch37), but they are returned > > using get450k() or get27k() as a GRanges object. > > Currently, this doesn't work for me neither (see my previous post). > Probably I'm doing something wrong ... > > > Essentially, the mistake I made in compiling the 27k.db and 450k.db > packages > > was to use an expression-centric (or gene- and transcript-centric) > > infrastructure to represent probes targeting individual loci. It's more > > useful to think of the 450k array, in particular, as a SNP array, with > > functions for a given target locus inferred after discovery. It's also > more > > useful to think of the chip as a SNP array when considering > differentially > > methylated regions, which could be loosely regarded as the methylation > > equivalent of copy number variations. Last but not least, you can in > fact > > use the array to estimate copy number along the genome, as for > determining > > sex from chrX copy number. > > Thank you for your hints and explanations! > But at the moment I am still stucked at a much simpler step, I would > just need an easy way to distinguish probes located in promoters from > those located in gene bodies (and to discard those which are annotated > to both). > > Best wishes, > Simone > -- *A model is a lie that helps you see the truth.* * * Howard Skipper<http: cancerres.aacrjournals.org="" content="" 31="" 9="" 1173.full.pdf=""> [[alternative HTML version deleted]]
ADD REPLY
0
Entering edit mode
Dear Tim, Thanks a lot for your comprehensive reply. The code you posted seems quite complicated, I'll have a closer look on it tomorrow. I am just writing already now to tell you that I made a mistake in my previous posting: I did not mean IlluminaHumanMethylation450kPATH2PROBE, but IlluminaHumanMethylation450kPROBELOCATION. I mixed it up with some tests I did regarding toggleProbes(). Sorry! However, I get the same error for IlluminaHumanMethylation450kPROBELOCATION (which is probably the reason why I finally mixed it up): x <- toggleProbes(IlluminaHumanMethylation450kPROBELOCATION, "all") Error in (function (classes, fdef, mtable) : unable to find an inherited method for function ?toggleProbes? for signature ?"AnnDbBimap"? Best wishes, Simone On Tue, Jul 2, 2013 at 7:25 PM, Tim Triche, Jr. <tim.triche at="" gmail.com=""> wrote: > re: IlluminaHumanMethylation450kPATH2PROBE > > This one is new to me. I suspect that not all probes have a pathway mapped > to them, but I will have to investigate further. If it's a KEGG issue, it > may not be possible to resolve this due to KEGG's deprecation. > > > > On Tue, Jul 2, 2013 at 2:14 AM, Simone <enomis.bioc at="" gmail.com=""> wrote: >> >> Dear Tim, >> >> > You will need to use toggleProbes() to get all probe locations (this is >> > an >> > undesirable side effect of the db0 package infrastructure, which is >> > geared >> > towards expression arrays where probes that map to multiple accessions >> > are a >> > "bad thing", and that is the reason I deprecated the 27k.db and 450k.db >> > packages). >> >> I understand. However, also with toggleProbes() I have got a problem. >> I tried for example: >> >> > x <- toggleProbes(IlluminaHumanMethylation450kPATH2PROBE, "all") >> >> Error in (function (classes, fdef, mtable) : >> unable to find an inherited method for function ?toggleProbes? for >> signature ?"AnnDbBimap"? >> >> But for example the following works >> >> > x <- toggleProbes(IlluminaHumanMethylation450kENSEMBL2PROBE, "all") >> >> However, I would need the location information, so the column >> "transcript_location" (e.g. 5'UTR, TSS200, Body, etc.) and it seems >> that I cannot get it like this? >> >> > The FDb.InfiniumMethylation packages have the locations of the targeted >> > CpGs >> > (and CpHs, and SNPs) for the corresponding build (just for the record, >> > the >> > genome build targeted by 450k.db is hg19/GRch37), but they are returned >> > using get450k() or get27k() as a GRanges object. >> >> Currently, this doesn't work for me neither (see my previous post). >> Probably I'm doing something wrong ... >> >> > Essentially, the mistake I made in compiling the 27k.db and 450k.db >> > packages >> > was to use an expression-centric (or gene- and transcript- centric) >> > infrastructure to represent probes targeting individual loci. It's more >> > useful to think of the 450k array, in particular, as a SNP array, with >> > functions for a given target locus inferred after discovery. It's also >> > more >> > useful to think of the chip as a SNP array when considering >> > differentially >> > methylated regions, which could be loosely regarded as the methylation >> > equivalent of copy number variations. Last but not least, you can in >> > fact >> > use the array to estimate copy number along the genome, as for >> > determining >> > sex from chrX copy number. >> >> Thank you for your hints and explanations! >> But at the moment I am still stucked at a much simpler step, I would >> just need an easy way to distinguish probes located in promoters from >> those located in gene bodies (and to discard those which are annotated >> to both). >> >> Best wishes, >> Simone > > > > > -- > A model is a lie that helps you see the truth. > > Howard Skipper
ADD REPLY
0
Entering edit mode
There's probably several ways to tighten up the GRanges part of the code, but looking at overlaps, etc. is much easier with fast interval mappings. It takes a bit of getting used to, however I have never seen a single person go back after getting used to the GenomicRanges approach to doing these things. YMMV Will look into the issue with toggleProbes. According to the documentation, it should work, but I get the same error that you do. I tried poking around a bit: showMethods("toggleProbes") ## Function: toggleProbes (package AnnotationDbi) ## x="ProbeAnnDbBimap" ## x="ProbeAnnDbMap" ## x="ProbeGo3AnnDbBimap" ## x="ProbeIpiAnnDbMap" toggleProbes(as(IlluminaHumanMethylation450kPROBELOCATION, 'ProbeAnnDbBimap'), 'all') ## Error in .toggleFilter(x, value) : ## This method can only be used on mappings where the Lkeys are probes. Perhaps the problem is that IlluminaHumanMethylation450kPROBELOCATION is joining tables? ## from zzz.R in the package... comments are verbatim from when I first put it together ## ## handy for finding out whether we're dealing with a promoter, body, UTR... IlluminaHumanMethylation450kPROBELOCATION<- AnnotationForge:::createSimpleBimap( "probelocation","Probe_ID","location",datacache,"PROBELOCATION","Illum inaHumanMe thylation450k.db") hmmm. On Tue, Jul 2, 2013 at 12:25 PM, Simone <enomis.bioc@gmail.com> wrote: > Dear Tim, > > Thanks a lot for your comprehensive reply. The code you posted seems > quite complicated, I'll have a closer look on it tomorrow. I am just > writing already now to tell you that I made a mistake in my previous > posting: I did not mean IlluminaHumanMethylation450kPATH2PROBE, but > IlluminaHumanMethylation450kPROBELOCATION. I mixed it up with some > tests I did regarding toggleProbes(). Sorry! > > However, I get the same error for > IlluminaHumanMethylation450kPROBELOCATION (which is probably the > reason why I finally mixed it up): > > x <- toggleProbes(IlluminaHumanMethylation450kPROBELOCATION, "all") > Error in (function (classes, fdef, mtable) : > unable to find an inherited method for function ‘toggleProbes’ for > signature ‘"AnnDbBimap"’ > > Best wishes, > Simone > > On Tue, Jul 2, 2013 at 7:25 PM, Tim Triche, Jr. <tim.triche@gmail.com> > wrote: > > re: IlluminaHumanMethylation450kPATH2PROBE > > > > This one is new to me. I suspect that not all probes have a pathway > mapped > > to them, but I will have to investigate further. If it's a KEGG issue, > it > > may not be possible to resolve this due to KEGG's deprecation. > > > > > > > > On Tue, Jul 2, 2013 at 2:14 AM, Simone <enomis.bioc@gmail.com> wrote: > >> > >> Dear Tim, > >> > >> > You will need to use toggleProbes() to get all probe locations (this > is > >> > an > >> > undesirable side effect of the db0 package infrastructure, which is > >> > geared > >> > towards expression arrays where probes that map to multiple accessions > >> > are a > >> > "bad thing", and that is the reason I deprecated the 27k.db and > 450k.db > >> > packages). > >> > >> I understand. However, also with toggleProbes() I have got a problem. > >> I tried for example: > >> > >> > x <- toggleProbes(IlluminaHumanMethylation450kPATH2PROBE, "all") > >> > >> Error in (function (classes, fdef, mtable) : > >> unable to find an inherited method for function ‘toggleProbes’ for > >> signature ‘"AnnDbBimap"’ > >> > >> But for example the following works > >> > >> > x <- toggleProbes(IlluminaHumanMethylation450kENSEMBL2PROBE, "all") > >> > >> However, I would need the location information, so the column > >> "transcript_location" (e.g. 5'UTR, TSS200, Body, etc.) and it seems > >> that I cannot get it like this? > >> > >> > The FDb.InfiniumMethylation packages have the locations of the > targeted > >> > CpGs > >> > (and CpHs, and SNPs) for the corresponding build (just for the record, > >> > the > >> > genome build targeted by 450k.db is hg19/GRch37), but they are > returned > >> > using get450k() or get27k() as a GRanges object. > >> > >> Currently, this doesn't work for me neither (see my previous post). > >> Probably I'm doing something wrong ... > >> > >> > Essentially, the mistake I made in compiling the 27k.db and 450k.db > >> > packages > >> > was to use an expression-centric (or gene- and transcript- centric) > >> > infrastructure to represent probes targeting individual loci. It's > more > >> > useful to think of the 450k array, in particular, as a SNP array, with > >> > functions for a given target locus inferred after discovery. It's > also > >> > more > >> > useful to think of the chip as a SNP array when considering > >> > differentially > >> > methylated regions, which could be loosely regarded as the methylation > >> > equivalent of copy number variations. Last but not least, you can in > >> > fact > >> > use the array to estimate copy number along the genome, as for > >> > determining > >> > sex from chrX copy number. > >> > >> Thank you for your hints and explanations! > >> But at the moment I am still stucked at a much simpler step, I would > >> just need an easy way to distinguish probes located in promoters from > >> those located in gene bodies (and to discard those which are annotated > >> to both). > >> > >> Best wishes, > >> Simone > > > > > > > > > > -- > > A model is a lie that helps you see the truth. > > > > Howard Skipper > -- *A model is a lie that helps you see the truth.* * * Howard Skipper<http: cancerres.aacrjournals.org="" content="" 31="" 9="" 1173.full.pdf=""> [[alternative HTML version deleted]]
ADD REPLY

Login before adding your answer.

Traffic: 935 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6