Weng Khong LIM
Help
"bioconductor-request@r-project.org" <bioconductor-request@r-project.org> wrote:
>Send Bioconductor mailing list submissions to
> bioconductor@r-project.org
>
>To subscribe or unsubscribe via the World Wide Web, visit
> https://stat.ethz.ch/mailman/listinfo/bioconductor
>or, via email, send a message with subject or body 'help' to
> bioconductor-request@r-project.org
>
>You can reach the person managing the list at
> bioconductor-owner@r-project.org
>
>When replying, please edit your Subject line so it is more specific
>than "Re: Contents of Bioconductor digest..."
>
>
>Today's Topics:
>
> 1. Open-rank faculty position, Dept of Biostatistics, Virginia
> Commonwealth University (Kellie J Archer/FS/VCU)
> 2. Re: question on easyRNASeq developer version (Yanju Zhang)
> 3. Re: question on easyRNASeq developer version (Nicolas Delhomme)
> 4. Re: question on easyRNASeq developer version (Yanju Zhang)
> 5. Re: Error of GTF Annotation in easyRNASeq (Nicolas Delhomme)
> 6. Feature request in readVcf (Sean Davis)
> 7. Re: Feature request in readVcf (Tim Triche, Jr.)
> 8. Re: GO annotation (Marc Carlson)
> 9. Re: GO annotation (Srinivasan, Sathish K)
> 10. Is normalization in edgeR required for small RNA sequencing
> data? (Daniela Lopes Paim Pinto)
> 11. NGS public data analysis (Jill Pleasance)
> 12. Re: GO annotation (KJ Lim)
> 13. Analysis of public GEO datasets - NGS (Jill [guest])
> 14. Re: Is normalization in edgeR required for small RNA
> sequencing data? (Mark Robinson)
> 15. Euro Bioc Devel 2012 Zurich CH -- Dec 13-14 2012 --
> registration open (Mark Robinson)
>
>
>----------------------------------------------------------------------
>
>Message: 1
>Date: Fri, 21 Sep 2012 09:45:27 -0400
>From: Kellie J Archer/FS/VCU <kjarcher@vcu.edu>
>To: bioconductor@r-project.org
>Subject: [BioC] Open-rank faculty position, Dept of Biostatistics,
> Virginia Commonwealth University
>Message-ID:
> <of6c640745.b02aba54-on85257a80.004b92b2-85257a80.004b92c1@vcu.edu>
>Content-Type: text/plain; charset="ISO-8859-1"
>
>
>The Department of Biostatistics at Virginia Commonwealth University (VCU)
>is seeking to fill a tenured/tenure-eligible faculty position at the level
>of assistant, associate, or full professor. We are seeking applicants with
>training and research interest in the design and statistical analysis of
>high-throughput genomic data (e.g., next generation sequencing, microarray,
>proteomic technologies), bioinformatics, computational biology, or closely
>related area. Additionally, applicants should have collaborative research
>experience. Primary responsibilities include teaching and advising graduate
>students as well as conducting independent methodological research. In
>addition, the successful applicant will be expected to collaborate with
>other VCU investigators in related fields in obtaining extramural grant
>support.
>
>The Department of Biostatistics has a 40+ year history in the VCU School of
>Medicine and is committed to excellence in both biostatistical research and
>graduate education. The department offers both M.S. and Ph.D. programs in
>Biostatistics, including a concentration in Genomic Biostatistics, a M.S.
>in Clinical Research in Biostatistics, and a Master of Public Health. Our
>biostatistics faculty, students, and staff collaborate with clinical
>investigators on the Medical College of Virginia Campus (which includes the
>Schools of Medicine, Dentistry, Pharmacy, Nursing, and Allied Health) in a
>wide variety of biomedical research projects. Located in Richmond,
>Virginia, VCU has established relationships with the Virginia Department of
>Health as well as local and regional health departments.
>
>Qualifications: For all levels, candidates should have a Ph.D. in
>biostatistics, statistics or related field, demonstrated experience in the
>analyses of high-throughput genomic or proteomic data, familiarity with
>statistical programming environments for analyzing such data, and excellent
>oral and written communication skills.
>
>By Level of Appointment:
>
>Full Professor: Applicants should have an established track record
>publishing in peer-reviewed journals, have national or international
>prominence in their area of expertise, and have demonstrated experience
>obtaining extramural research support.
>
>Associate Professor: Applicants should have an established track record
>publishing in peer-reviewed journals and have demonstrated experience
>obtaining extramural research support.
>
>Assistant Professor: Applicants should have at least two years of
>experience beyond completion of their degree program and must demonstrate
>excellent oral and written communication skills.
>
>All candidates should have demonstrated experience working in and fostering
>a diverse faculty, staff, and student environment or commitment to do so as
>a faculty member at VCU. Potential candidates can submit applications,
>including a statement of research, teaching philosophy, curriculum vitae
>and contact information for three professional references, via mail (to
>Yvonne Hargrove, Department of Biostatistics, Virginia Commonwealth
>University, P.O. Box 980032, Richmond, VA 23298-0032) or by e-mail to
>yfhargro@vcu.edu.
>
>Virginia Commonwealth University is an equal opportunity/affirmative action
>employer. Women, minorities and persons with disabilities are encouraged to
>apply.
> Kellie J. Archer, Ph.D.
> Associate Professor, Department of Biostatistics
> Director, VCU Massey Cancer Center Biostatistics Shared Resource
> Virginia Commonwealth University
> 830 East Main St., 718
> Richmond, VA 23298-0032
> phone: (804) 827-2039
> fax: (804) 828-8900
> e-mail: kjarcher@vcu.edu
> website: www.people.vcu.edu/~kjarcher
>
>
>------------------------------
>
>Message: 2
>Date: Fri, 21 Sep 2012 16:32:58 +0200
>From: Yanju Zhang <hollandorange.yanju@gmail.com>
>To: Nicolas Delhomme <delhomme@embl.de>
>Cc: bioconductor@r-project.org
>Subject: Re: [BioC] question on easyRNASeq developer version
>Message-ID:
> <cabnzwf6nfm0eqs_ht3n5mcqcn=mbfkt+gutycobsm2gwyaycow@mail.gmail.com>
>Content-Type: text/plain
>
>Hi Nico
>As mentioned in SEQAnswers, I also encountered this problem:
>
>> "Error in mk_singleBracketReplacementValue(x, value) :
>> 'value' must be a CompressedIntegerList object"
>
>In my BAM files, the reads have different lengths.
>
>I look forward to a solution. If you need more information, please let
>me know.
>
>Best wishes
>Yanju
>
> [[alternative HTML version deleted]]
>
>
>
>------------------------------
>
>Message: 3
>Date: Fri, 21 Sep 2012 16:37:13 +0200
>From: Nicolas Delhomme <delhomme@embl.de>
>To: Yanju Zhang <hollandorange.yanju@gmail.com>
>Cc: bioconductor@r-project.org
>Subject: Re: [BioC] question on easyRNASeq developer version
>Message-ID: <aafb721c-86af-49bf-acf8-47ae5dba320d@embl.de>
>Content-Type: text/plain; charset=us-ascii
>
>Hi Yanju,
>
>Would you be OK with uploading the file that creates the problem to my
>Dropbox? If that's OK, I'll send you a link to it. That would be the best
>way for me to reproduce the error.
>
>Cheers,
>
>Nico
>
>---------------------------------------------------------------
>Nicolas Delhomme
>
>Genome Biology Computational Support
>
>European Molecular Biology Laboratory
>
>Tel: +49 6221 387 8310
>Email: nicolas.delhomme@embl.de
>Meyerhofstrasse 1 - Postfach 10.2209
>69102 Heidelberg, Germany
>---------------------------------------------------------------
>
>
>
>
>
>
>
>
>------------------------------
>
>Message: 4
>Date: Fri, 21 Sep 2012 17:54:21 +0200
>From: Yanju Zhang <hollandorange.yanju@gmail.com>
>To: Nicolas Delhomme <delhomme@embl.de>
>Cc: bioconductor@r-project.org
>Subject: Re: [BioC] question on easyRNASeq developer version
>Message-ID:
> <cabnzwf45nimkwg3guqjwbyqqqhxbw7qmpd4dx_s9wxjh13vo2w@mail.gmail.com>
>Content-Type: text/plain
>
>Hi Nico,
>
>It is fine with me to upload my bam file. Please give me the link.
>
>Best wishes
>Yanju
>
>Code + error + sessionInfo
>> chr.sizes=as.list(seqlengths(Hsapiens))
>> bamfiles=dir(getwd(),pattern="*.sorted.bam$")
>> RNASeq<- easyRNASeq(filesDirectory=getwd(),
>+ organism="Hsapiens",
>+ chr.sizes=chr.sizes,
>+ #readLength=80L,
>+ annotationMethod="biomaRt",
>+ format="bam",
>+ count="genes",
>+ summarization="geneModels",
>+ filenames=bamfiles[1],
>+ outputFormat="RNAseq"
>+ )
>
>
>
>Checking arguments...
>Fetching annotations...
>Computing gene models...
>Summarizing counts...
>Processing test.sorted.bam
>Updating the read length information.
>The reads have been trimmed.
>Minimum length of 50 bp.
>Maximum length of 80 bp.
>Error in mk_singleBracketReplacementValue(x, value) :
> 'value' must be a CompressedIntegerList object
>In addition: Warning messages:
>1: The use of the list for providing chromosome sizes has been deprecated.
>Use a named numeric vector instead.
>2: In easyRNASeq(filesDirectory = getwd(), organism = "Hsapiens", chr.sizes = chr.sizes, :
>There are 16696 synthetic exons as determined from your annotation that
>overlap! This implies that some reads will be counted more than once! Is
>that really what you want?
>3: In fetchCoverage(rnaSeq, format = format, filename = filename, filter = filter, :
>You enforce UCSC chromosome conventions, however the provided alignments
>are not compliant. Correcting it.
>4: In fetchCoverage(rnaSeq, format = format, filename = filename, filter = filter, :
>Not all the chromosome names in your chromosome size list 'chr.sizes' are
>present in your read file(s) (aln or bam).
>5: In fetchCoverage(rnaSeq, format = format, filename = filename, filter = filter, :
>The available chromosomes in both your read file(s) (aln or bam) and
>'chr.sizes' list were restricted to their common term.
>These are: chr1, chr10, chr11, chr12, chr13, chr14, chr15, chr16, chr17,
>chr18, chr19, chr2, chr20, chr21, chr22, chr3, chr4, chr5, chr6, chr7,
>chr8, chr9, chrM, chrX, chrY.
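
Regarding warning 1 in the output above: a minimal base-R sketch of turning a named list of chromosome sizes into the named numeric vector the warning asks for. The two lengths are illustrative values typed in by hand, not read from BSgenome. With the real package, `seqlengths(Hsapiens)` should already return a named vector, so dropping the `as.list()` wrapper is presumably enough; that part is an assumption, not something tested here.

```r
# Illustrative chromosome sizes as a named list (values typed in by hand)
chr.list <- list(chr1 = 249250621, chr2 = 243199373)

# unlist() flattens a named list of scalars into a named numeric vector
chr.sizes <- unlist(chr.list)

# The result is a plain numeric vector that keeps the chromosome names
stopifnot(is.numeric(chr.sizes), !is.list(chr.sizes))
print(chr.sizes)
```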
>
>> sessionInfo()
>R version 2.15.1 (2012-06-22)
>Platform: x86_64-unknown-linux-gnu (64-bit)
>
>locale:
>[1] C
>
>attached base packages:
>[1] parallel stats graphics grDevices utils datasets methods
>[8] base
>
>other attached packages:
> [1] BSgenome.Hsapiens.UCSC.hg19_1.3.19 easyRNASeq_1.3.14
> [3] ShortRead_1.15.11 latticeExtra_0.6-24
> [5] RColorBrewer_1.0-5 Rsamtools_1.9.30
> [7] DESeq_1.9.14 lattice_0.20-6
> [9] locfit_1.5-8 BSgenome_1.25.8
>[11] GenomicRanges_1.9.65 Biostrings_2.25.12
>[13] IRanges_1.15.44 edgeR_2.99.8
>[15] limma_3.12.1 biomaRt_2.13.2
>[17] Biobase_2.17.7 genomeIntervals_1.13.3
>[19] BiocGenerics_0.3.1 intervals_0.13.3
>
>loaded via a namespace (and not attached):
> [1] AnnotationDbi_1.18.1 DBI_0.2-5 RCurl_1.91-1
> [4] RSQLite_0.11.1 XML_3.9-4 annotate_1.34.1
> [7] bitops_1.0-4.1 genefilter_1.38.0 geneplotter_1.35.1
>[10] grid_2.15.1 hwriter_1.3 splines_2.15.1
>[13] stats4_2.15.1 survival_2.36-14 xtable_1.7-0
>[16] zlibbioc_1.2.0
>
>
>
>
> [[alternative HTML version deleted]]
>
>
>
>------------------------------
>
>Message: 5
>Date: Fri, 21 Sep 2012 17:59:35 +0200
>From: Nicolas Delhomme <delhomme@embl.de>
>To: Nicolas Delhomme <delhomme@embl.de>
>Cc: Dadi Gao <dgao3450@uni.sydney.edu.au>, bioconductor@r-project.org
>Subject: Re: [BioC] Error of GTF Annotation in easyRNASeq
>Message-ID: <e8f88d62-00ca-45b9-97b9-6b8dfa8cc0a7@embl.de>
>Content-Type: text/plain; charset=us-ascii
>
>Hi Dadi,
>
>The error comes from a change of API that affects a package I depend
>upon. I've contacted the maintainer and will let you know once it gets
>fixed. It might take some time though (~ 1 week).
>
>Cheers,
>
>Nico
>
>
>
>
>
>
>On Sep 21, 2012, at 10:26 AM, Nicolas Delhomme wrote:
>
>> Moreover, to make sure that this is not a package conflict, can you
>> please NOT load the library(RnaSeqTutorial)? You do not need it to run
>> easyRNASeq. So your script should read:
>>
>> library(easyRNASeq)
>> library(BSgenome.Mmusculus.UCSC.mm9)
>>
>> setwd("/home/gao/RNA")
>>
>> ## the "." is your current directory.
>> count.table <- easyRNASeq(".",
>> pattern=".sorted.bam$",
>> organism="MMusculus",
>> annotationMethod="gtf",
>> annotationFile="mm9gene.gtf",
>> count="genes",
>> summarization="geneModels",
>> normalize=TRUE
>> )
>>
>>
>> Cheers,
>>
>> Nico
>>
>>
>>
>>
>>
>>
>> On Sep 21, 2012, at 10:19 AM, Nicolas Delhomme wrote:
>>
>>> Dear Dadi,
>>>
>>> I will need a little more information from you. In addition, it's
>>> best if you post such emails to the Bioconductor mailing list (which
>>> I've Cced, so please "reply to all" when you answer). See here for
>>> subscribing: http://www.bioconductor.org/help/mailing-list/. What I
>>> need to know from you first is what is described on this page:
>>> http://www.bioconductor.org/help/mailing-list/posting-guide/, mainly
>>> under the sections on preparing and composing. In essence, I need to
>>> know what version of R and Bioconductor packages you are using.
>>>
>>> Then, placing your folder in the installation directory of an existing
>>> package is not the safest option. You might a) disrupt that package's
>>> functionality b) possibly lose your data if that package gets updated.
>>> You'd better move your RNA folder to your home directory and use that
>>> directory, e.g. /home/gao/RNA, instead. Using the setwd command, you
>>> can make that your current working directory.
>>>
>>> So the following two blocks give the same result:
>>>
>>> setwd("/home/gao/RNA")
>>>
>>> ## the "." is your current directory.
>>> count.table <- easyRNASeq(".",
>>> pattern=".sorted.bam$",
>>> organism="MMusculus",
>>> annotationMethod="gtf",
>>> annotationFile="mm9gene.gtf",
>>> count="genes",
>>> summarization="geneModels",
>>> normalize=TRUE
>>> )
>>>
>>> Or:
>>>
>>> count.table <- easyRNASeq("/home/gao/RNA",
>>> pattern=".sorted.bam$",
>>> organism="MMusculus",
>>> annotationMethod="gtf",
>>> annotationFile="/home/gao/RNA/mm9gene.gtf",
>>> count="genes",
>>> summarization="geneModels",
>>> normalize=TRUE
>>> )
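
A small base-R sketch related to the two calls above, under stated assumptions: the scratch directory and file names are made up for illustration, and nothing here calls easyRNASeq. It shows `file.path()` as a way to build the annotation path without stray whitespace, and how an unescaped `.` in `pattern` matches any character.

```r
# Hypothetical scratch directory standing in for /home/gao/RNA
rna.dir <- file.path(tempdir(), "RNA")
dir.create(rna.dir, showWarnings = FALSE)
file.create(file.path(rna.dir, c("N1.sorted.bam", "N1.sorted.bam.bai",
                                 "N2.sorted.bam", "N3_sorted_bam")))

# file.path() joins the components with "/" and adds no whitespace
gtf <- file.path(rna.dir, "mm9gene.gtf")

# An unescaped "." matches any character, so N3_sorted_bam is picked up too
loose <- list.files(rna.dir, pattern = ".sorted.bam$")

# Escaping the dots restricts the match to a literal ".sorted.bam" suffix
strict <- list.files(rna.dir, pattern = "\\.sorted\\.bam$")

print(loose)
print(strict)
```

Whether the looser pattern matters depends on what else sits in the directory; the index files (`.bai`) are excluded by the `$` anchor in both cases.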
>>>
>>> Now, for the error, can you please tell me more about which aligner
>>> you used for your data, whether it is paired-end or not, and finally
>>> whether the reads have been dynamically trimmed (i.e. if reads of
>>> variable length are expected) or not?
>>>
>>> What actually bothers me in your error is that it mentions:
>>>
>>> easyRNASeq(system.file("miRNA", package = "RnaSeqTutorial"),
>>>
>>> instead of
>>>
>>> easyRNASeq(system.file("RNA", package="RnaSeqTutorial"),
>>>
>>> i.e. miRNA instead of RNA. So, to make sure that the error is
>>> reproducible, can you move your RNA folder to a different directory and
>>> re-run the command as above? I don't expect this to solve the error
>>> though, but at least we'd have a "cleaner" setup for reproducing it.
>>>
>>> Best,
>>>
>>> Nico
>>>
>>>
>>>
>>>
>>>
>>>
>>> On Sep 21, 2012, at 2:41 AM, Dadi Gao wrote:
>>>
>>>> Dear Dr. Delhomme,
>>>>
>>>> I'm currently studying gene expression patterns from deep sequencing
>>>> data of mouse blood cells using easyRNASeq.
>>>> I created a folder called "RNA" under the R package RnaSeqTutorial
>>>> path.
>>>> Within this folder, I put 3 RNA-seq data files called
>>>> "N1.sorted.bam", "N2.sorted.bam" and "N3.sorted.bam", with their bam
>>>> index files.
>>>> It also contains a GTF file for mouse gene annotation downloaded
>>>> from UCSC, called "mm9gene.gtf".
>>>>
>>>> I'm using the following code to normalize the gene expression:
>>>>
>>>> library(easyRNASeq)
>>>> library(RnaSeqTutorial)
>>>> library(BSgenome.Mmusculus.UCSC.mm9)
>>>>
>>>> count.table <- easyRNASeq(system.file("RNA", package="RnaSeqTutorial"),
>>>> pattern=".sorted.bam$",
>>>> organism="MMusculus",
>>>> annotationMethod="gtf",
>>>> annotationFile=system.file("RNA", "mm9gene.gtf", package="RnaSeqTutorial"),
>>>> count="genes",
>>>> summarization="geneModels",
>>>> normalize=TRUE
>>>> )
>>>>
>>>> But this runs with an error as:
>>>>
>>>> Checking arguments...
>>>> Fetching annotations...
>>>> Read 962651 records
>>>> Warning message:
>>>> In easyRNASeq(system.file("miRNA", package = "RnaSeqTutorial"), :
>>>> You enforce UCSC chromosome conventions, however the provided chromosome size list is not compliant. Correcting it.
>>>> Error in all.annotation[all.annotation$type %in% annotation.type, ] :
>>>> error in evaluating the argument 'i' in selecting a method for function '[': Error in all.annotation$type %in% annotation.type :
>>>> error in evaluating the argument 'x' in selecting a method for function '%in%': Error in function (classes, fdef, mtable) :
>>>> unable to find an inherited method for function "annotation", for signature "Genome_intervals_stranded"
>>>>
>>>> Did I do something wrong?
>>>>
>>>> Sincerely yours,
>>>> Dadi Gao
>>>>
>>>> Bioinformatics Group
>>>> Centenary Institute
>>>> Building 93, Royal Prince Alfred Hospital
>>>> Missenden Rd, Camperdown, NSW 2050
>>>> Australia
>>>>
>>>
>>> _______________________________________________
>>> Bioconductor mailing list
>>> Bioconductor@r-project.org
>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>> Search the archives:
>http://news.gmane.org/gmane.science.biology.informatics.conductor
>>
>
>
>
>------------------------------
>
>Message: 6
>Date: Fri, 21 Sep 2012 13:51:54 -0400
>From: Sean Davis <sdavis2@mail.nih.gov>
>To: bioconductor@r-project.org
>Subject: [BioC] Feature request in readVcf
>Message-ID:
> <caneavbmmjpjryjx8axvcdsd1xw4qbajd=s7b3oaep_ypr-mxwa@mail.gmail.com>
>Content-Type: text/plain
>
>Hi, Val.
>
>Is there an interest in simply ignoring unknown INFO and GENOTYPE fields
>when parsing VCF files, perhaps by issuing a warning instead of an error?
>There are LOTS of malformed VCF files out there. In some cases, they are
>not useable, but in this case, they can be perfectly useable if these
>unknown fields are simply ignored.
>
>> dat = readVcf('tmp.gatk.vcf',genome='hg19')
>Error: scanVcf: record 22 INFO 'KGPilot123' not found
> path:
>/Volumes/CCRBioinfo/projects/RosenbergImmuneStudy/staging/tmp.gatk.vcf
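
Until something like this is supported, one way to see which INFO keys would trip the parser is to compare the keys used in the records against those declared in the header. The following is a base-R sketch on made-up VCF lines. The helper `undeclared_info_keys()` is hypothetical and is not part of VariantAnnotation, and this is not how `readVcf` itself works.

```r
# Toy VCF content (made up for illustration); fields are tab-separated
vcf <- c(
  "##fileformat=VCFv4.1",
  "##INFO=<ID=DP,Number=1,Type=Integer,Description=\"Total depth\">",
  "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
  "chr1\t100\t.\tA\tG\t50\tPASS\tDP=12",
  "chr1\t200\t.\tC\tT\t60\tPASS\tDP=8;KGPilot123"
)

# Hypothetical helper: INFO keys used in records but absent from the header
undeclared_info_keys <- function(lines) {
  # Keys declared in "##INFO=<ID=...>" header lines
  header <- grep("^##INFO=<ID=", lines, value = TRUE)
  declared <- sub("^##INFO=<ID=([^,>]+).*$", "\\1", header)
  # Column 8 of each data line holds the ";"-separated INFO entries
  records <- lines[!startsWith(lines, "#")]
  info <- vapply(strsplit(records, "\t", fixed = TRUE), `[`, character(1), 8)
  entries <- unlist(strsplit(info, ";", fixed = TRUE))
  # Strip "=value" so flags and key=value pairs are treated alike
  used <- unique(sub("=.*$", "", entries))
  setdiff(used, c(declared, "."))
}

print(undeclared_info_keys(vcf))
```

For a real file the lines could come from `readLines()`, which is fine for a quick check but reads the whole file into memory.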
>
>Thanks,
>Sean
>
> [[alternative HTML version deleted]]
>
>
>
>------------------------------
>
>Message: 7
>Date: Fri, 21 Sep 2012 13:50:29 -0700
>From: "Tim Triche, Jr." <tim.triche@gmail.com>
>To: Sean Davis <sdavis2@mail.nih.gov>
>Cc: bioconductor@r-project.org
>Subject: Re: [BioC] Feature request in readVcf
>Message-ID:
> <cac+n9bvynua_smpcsfwwxmmu3ovzqnc314bq6+c=eyeozmgpmw@mail.gmail.com>
>Content-Type: text/plain
>
>+1
>
>thanks,
>
>--t
>
>
>
>
>
>
>--
>*A model is a lie that helps you see the truth.*
>*
>*
>Howard Skipper <http://cancerres.aacrjournals.org/content/31/9/1173.full.pdf>
>
> [[alternative HTML version deleted]]
>
>
>
>------------------------------
>
>Message: 8
>Date: Fri, 21 Sep 2012 17:49:26 -0700
>From: Marc Carlson <mcarlson@fhcrc.org>
>To: bioconductor@r-project.org
>Subject: Re: [BioC] GO annotation
>Message-ID: <505D0B16.80804@fhcrc.org>
>Content-Type: text/plain; charset=ISO-8859-1; format=flowed
>
>Hi Lim,
>
>First of all it all depends on what you have for gene identifiers. If
>you are like most people you will have entrez gene IDs. So for now I
>will assume you have those.
>
>## So let's further assume you are working with humans and just choose
>## the 1st two entrez gene IDs so that we can make a (hopefully
>## meaningful) example
>ids = c("1","2")
>## now load the org library for humans
>library(org.Hs.eg.db)
>## then you can call select to extract your GO IDs like this:
>select(org.Hs.eg.db, keys = ids, cols = "GO", keytype = "ENTREZID")
>
>Now one thing to notice is that if you have some other kind of
>identifier, then your keytype argument will have to be set to a
>different value. And hopefully, the kind of ID you are using is
>present in the package that you have to search... See the manual page
>for select for more information.
>
>?select
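
A note for anyone reading this thread later: as far as I know, later AnnotationDbi releases renamed the `cols` argument of `select()` to `columns`, so the call above may need that one change. The sketch below assumes the org.Hs.eg.db and AnnotationDbi packages are installed and skips the lookup otherwise. `keytypes()` lists the identifier types that can be passed as `keytype`.

```r
# The same two example Entrez gene IDs as in the message above
ids <- c("1", "2")

# Only run the lookup when the annotation packages are installed
if (requireNamespace("org.Hs.eg.db", quietly = TRUE) &&
    requireNamespace("AnnotationDbi", quietly = TRUE)) {
  db <- org.Hs.eg.db::org.Hs.eg.db
  # Identifier types that can be used as the keytype argument
  print(AnnotationDbi::keytypes(db))
  # Same query as above, with the argument spelled `columns`
  go <- AnnotationDbi::select(db, keys = ids, columns = "GO",
                              keytype = "ENTREZID")
  print(head(go))
}
```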
>
>From your question, I also recognize that you may not be able to do
>this because it sounds like you might be using a more unusual organism
>and not be using something commonplace like human. Well, don't give up
>just yet, because we may be able to help you there too. You can look at
>the manual page for the function makeOrgPackageFromNCBI to learn how you
>can try to generate an org package from just the taxonomy ID (which you
>can look up on NCBI's website). If the data is available at NCBI, then
>you should be able to generate a package from NCBI that will match your
>organism of choice.
>
>?makeOrgPackageFromNCBI
>
>
>Does that answer your question?
>
>
> Marc
>
>
>
>On 09/21/2012 02:35 AM, KJ Lim wrote:
>> Dear Bioconductor community,
>>
>> Good day.
>>
>> I did the differential expression analysis for my RNA-Seq data with the
>> edgeR package. I have a list of differentially expressed genes now and I
>> would like to find the GO terms for the genes.
>>
>> I have been reading and searching around for the right package. But I
>> found that several packages are developed based on model species. Could
>> the community kindly suggest what GO annotation package I can use for
>> a non-model species (plant RNA-Seq data)?
>>
>> Thank you very much and have a nice weekend.
>>
>> Best regards,
>> KJ Lim
>>
>> [[alternative HTML version deleted]]
>>
>
>
>
>------------------------------
>
>Message: 9
>Date: Sat, 22 Sep 2012 01:21:29 -0400
>From: "Srinivasan, Sathish K" <ssrinivasan@med.miami.edu>
>To: Marc Carlson <mcarlson@fhcrc.org>, "bioconductor@r-project.org"
> <bioconductor@r-project.org>
>Subject: Re: [BioC] GO annotation
>Message-ID:
> <4EB41664F8279A4CA870A740E038CFEA3E1E7EB0CB@MEDEXMB05.ad.med.miami.edu>
>
>Content-Type: text/plain; charset="us-ascii"
>
>Hi Marc,
>Could you suggest a go-to reference on annotating genomic data using
>Bioconductor packages, such as a book or something similar?
>Thanks
>
>~Sathish
>
>
>
>
>------------------------------
>
>Message: 10
>Date: Sat, 22 Sep 2012 00:23:09 +0200
>From: Daniela Lopes Paim Pinto <d.lopespaimpinto@sssup.it>
>To: bioconductor@r-project.org
>Subject: [BioC] Is normalization in edgeR required for small RNA
> sequencing data?
>Message-ID:
> <cahk-ra1aw+tcnkajrax7wfjychh22whn2sfwhat2wmu-=WWdsA@mail.gmail.com>
>Content-Type: text/plain
>
>Dear All,
>
>I am a PhD student, currently working on differential expression analysis
>of my small RNA library deep sequencing data and trying to identify
>differentially expressed miRNAs using the edgeR package. I have 24
>different samples with 2 biological replicates (48 libraries). I am
>performing multiple group comparisons using the GLM method and also an
>ANOVA-like test to identify DE miRNAs among the different groups of my
>samples.
>My question is:
>
>Do I need to normalize my input data using calcNormFactors() once I set
>up my DGEList, or could I proceed without any normalization? I assume in
>this case that edgeR performs a default normalization when it is
>"calculating library sizes from column totals"?
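
A small base-R sketch of what "library sizes from column totals" amounts to: the column sums of the count matrix, which can then be used to scale counts to counts per million. The count matrix is made up. As far as I understand edgeR, that message only refers to recording library sizes, and the normalization factors stay at 1 until `calcNormFactors()` is called, so the two are separate steps. The sketch does not use edgeR itself.

```r
# Made-up miRNA count matrix: 3 features x 2 libraries
counts <- matrix(c(10, 90, 900,
                   20, 180, 800),
                 ncol = 2,
                 dimnames = list(c("miR-a", "miR-b", "miR-c"),
                                 c("lib1", "lib2")))

# "Library sizes from column totals" is just the per-library sum
lib.size <- colSums(counts)

# Scaling by library size gives counts per million (CPM); this corrects
# for sequencing depth only, not for composition differences
cpm <- t(t(counts) / lib.size) * 1e6

print(lib.size)
print(round(cpm))
```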
>
>
>I would really appreciate any suggestion on this!
>
>
>Thanks in advance,
>
>
>Daniela
>
> [[alternative HTML version deleted]]
>
>
>
>------------------------------
>
>Message: 11
>Date: Sat, 22 Sep 2012 08:03:32 +0100
>From: Jill Pleasance <jpleasance@gmail.com>
>To: bioconductor@r-project.org
>Subject: [BioC] NGS public data analysis
>Message-ID:
> <cajhs7oj_ebg2vgbx=fpyvxxkbdq-tdzzmdbmsymnau=r6Gp6Lw@mail.gmail.com>
>Content-Type: text/plain
>
>Hi
>
>
>
>I am writing as I am trying to analyse NGS data from public data (GEO),
>specifically datasets with one sample per time point. The raw (somewhat
>processed) data is 3 samples at different time points where "The read
>count at exon, splice-junction, transcript and gene levels were summarized
>and normalized to relative abundance in Fragments Per Kilobase of exon
>model per Million (FPKM) in order to compare transcription level among
>samples."
>
>
>
>The authors of this paper then report that "The differentially expressed
>transcripts were identified using M-A based random sampling method
>implemented in DEGseq package in BioConductor
>(http://bioconductor.org/packages/2.5/bioc/html/DEGseq.html). The
>transcripts were further filtered at > 2-fold change and a minimum read
>count of 50 in either condition."
>
>
>
>I have read through some of your posts where Gordon suggested using a
>simple Excel formula to obtain fold changes when you don't have
>replicates:
>
>lib.size1 <- sum(y1)
>lib.size2 <- sum(y2)
>logFC <- log2((y1+0.5)/(lib.size1+0.5)/(y2+0.5)*(lib.size2+0.5))
>
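
The fold-change formula quoted above can be tried on toy numbers in R before applying it to real data. The counts below are made up, and the vectors `y1` and `y2` are assumed to hold counts for the same genes in the same order. A positive value means the gene is relatively more abundant in sample 1.

```r
# Made-up counts for the same four genes in two unreplicated samples
y1 <- c(geneA = 50, geneB = 0, geneC = 200, geneD = 750)
y2 <- c(geneA = 100, geneB = 10, geneC = 190, geneD = 700)

# Library size is the total count per sample
lib.size1 <- sum(y1)
lib.size2 <- sum(y2)

# The 0.5 offsets keep the ratio finite when a count is zero
logFC <- log2((y1 + 0.5) / (lib.size1 + 0.5) / (y2 + 0.5) * (lib.size2 + 0.5))

print(round(logFC, 2))
```

Without replicates this gives effect sizes only; it says nothing about significance.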
>
>Is this something I could apply to the current analysis? I have 3 files
>with gene ID and counts (one for each sample), and if genes are not listed
>in the sample files, I assume the counts are zero. Would you have any
>suggestions as to what to do with these zero count reads?
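
One way to handle genes missing from a sample file, sketched in base R on made-up tables: merge the per-sample tables on gene ID with a full outer join and fill the gaps with zeros. The column names `gene`, `t1`, `t2`, `t3` are hypothetical. Treating an absent gene as a zero count follows the assumption stated above, which should be checked against how the files were produced.

```r
# Made-up per-sample tables; each lists only the genes it has counts for
s1 <- data.frame(gene = c("g1", "g2"), t1 = c(5, 12))
s2 <- data.frame(gene = c("g2", "g3"), t2 = c(7, 3))
s3 <- data.frame(gene = c("g1", "g3"), t3 = c(9, 4))

# Full outer join on gene ID keeps genes missing from any one sample
counts <- Reduce(function(a, b) merge(a, b, by = "gene", all = TRUE),
                 list(s1, s2, s3))

# Genes absent from a sample come through as NA; treat them as zero counts
counts[is.na(counts)] <- 0

print(counts)
```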
>
>
>I am trying to avoid learning how to write scripts at the moment, to see
>if this analysis works and obviously when
--
Sent from my Android phone with K-9 Mail. Please excuse my brevity.
[[alternative HTML version deleted]]