Parthav,
Great to know that you got the correct strand information now.
To understand the meaning of each output variable, please type
help(annotatePeakInBatch) in R. Under the Value section, you will see
the description for each output variable. For example, distancetoFeature
is described as "distance to the nearest feature such as transcription
start site. By default, the distance is calculated as the distance
between the start of the binding site and the TSS that is the gene start
for genes located on the forward strand and the gene end for genes
located on the reverse strand."
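The default rule quoted above can be illustrated with a minimal Python sketch. It is not ChIPpeakAnno's code (the package is written in R), and the function names are hypothetical. The sign convention is also an assumption: negative means the peak start lies upstream in the feature's direction of transcription. For forward-strand features, the sketch reproduces the distancetoFeature values in the sample output quoted further down this thread.

```python
def tss_coordinate(feat_start, feat_end, strand):
    """TSS per the help text: gene start on the forward strand,
    gene end on the reverse strand."""
    return feat_end if strand == "-" else feat_start


def distance_to_tss(peak_start, feat_start, feat_end, strand):
    """Signed distance from the peak start to the TSS.

    Sign convention (assumed, not taken from ChIPpeakAnno): negative means
    the peak start is upstream of the TSS in the direction of transcription.
    """
    tss = tss_coordinate(feat_start, feat_end, strand)
    if strand == "-":
        # Transcription runs toward lower coordinates on the reverse strand,
        # so a peak start above the TSS gives a negative (upstream) distance.
        return tss - peak_start
    return peak_start - tss


# Sample row 4 from the output in this thread (forward strand).
print(distance_to_tss(195333431, 195172540, 195196491, "+"))  # 160891
# Hypothetical peak starting above the 'End' of Ddx3y (reverse strand).
print(distance_to_tss(630000, 597158, 623056, "-"))  # -6944
```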
Please see additional inline comments below.
Best regards,
Julie
On 12/3/13 11:37 AM, "Jailwala, Parthav (NIH/NCI) [C]"
<parthav.jailwala at nih.gov> wrote:
> Hi Julie,
>
> Thanks!
> I fixed the strand information in the annotation file, and now I do get
> the correct strand information in the output.
>
> However, when looking at the output, I am still confused about the
> 'upstream/downstream' determination for features that are on the -ve
> strand. My understanding is that for genes on the reverse strand, the
> Start = 3' end of the gene and the End = 5' end of the gene. Hence, when
> I chose 'TSS' as the option, all distances should have been calculated
> from the TSS, that is, the 'End' coordinate for that gene.
Correct.
> Also, for features on the negative strand, if the start of the peak is
> higher than the TSS of the feature, then the peak is actually 'Upstream'
> of the feature. However, in the output, for features on the -ve strand,
> when the start of the peak is higher than the TSS of the feature, the
> peak is determined to be 'Downstream' of the feature.
Could you please send me an example output row? Also, which version of
ChIPpeakAnno did you use? Please type sessionInfo() in R and copy the
output.
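For reference, the strand-aware convention described above (a peak that starts above the TSS of a reverse-strand feature is upstream) can be sketched in Python by mirroring reverse-strand coordinates. This is a hypothetical helper, not ChIPpeakAnno's implementation. The labels are borrowed from the insideFeature column of the output, and the tie handling at feature boundaries is an assumption. On the four forward-strand sample rows quoted further down this thread, it gives the same labels as the package.

```python
def classify_peak(peak_start, peak_end, feat_start, feat_end, strand):
    """Position of a peak relative to a feature, read in the feature's
    direction of transcription (hypothetical helper)."""
    if strand == "-":
        # Mirror both intervals so the reverse-strand feature's 5' end
        # (its 'End' coordinate) becomes the left-most point, as on '+'.
        peak_start, peak_end = -peak_end, -peak_start
        feat_start, feat_end = -feat_end, -feat_start
    if peak_end < feat_start:
        return "upstream"
    if peak_start > feat_end:
        return "downstream"
    if peak_start < feat_start and peak_end > feat_end:
        return "includeFeature"
    if peak_start < feat_start:
        return "overlapStart"  # peak overlaps the TSS end of the feature
    if peak_end > feat_end:
        return "overlapEnd"
    return "inside"


# Ddx3y (Y:597158-623056, '-') with hypothetical peaks on either side.
print(classify_peak(630000, 631000, 597158, 623056, "-"))  # upstream
print(classify_peak(590000, 591000, 597158, 623056, "-"))  # downstream
```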
>
> I would really appreciate it if you could advise whether my
> understanding is incorrect.
>
> Thanks
> Parthav
>
>
>
>
> On 12/3/13 11:09 AM, "Zhu, Lihua (Julie)" <julie.zhu at umassmed.edu> wrote:
>
>> Parthav,
>>
>> Your annotation file is not in BED format, i.e., the strand information
>> needs to be in the 6th column
>> (http://genome.ucsc.edu/FAQ/FAQformat#format1). You can fix it by
>> adding a score as the 5th column.
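A minimal Python sketch of that fix, assuming the annotation file is whitespace-separated with exactly the five columns shown in the sample later in this thread (chrom, start, end, name, strand). The helper name, the placeholder score of 0, and the file names in the comment are illustrative choices. The BED format allows scores from 0 to 1000.

```python
def to_bed6(lines, score="0"):
    """Rewrite 5-column rows (chrom, start, end, name, strand) as BED6 by
    inserting a placeholder score, so the strand ends up in column 6."""
    bed6 = []
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if len(fields) != 5:
            raise ValueError(f"expected 5 columns, got {len(fields)}: {line!r}")
        chrom, start, end, name, strand = fields
        bed6.append("\t".join([chrom, start, end, name, score, strand]))
    return bed6


rows = ["Y 597158 623056 Ddx3y -", "Y 346986 365290 Eif2s3y +"]
for row in to_bed6(rows):
    print(row)
# Converting a whole file (hypothetical file names):
# with open("annotation.txt") as src, open("annotation.bed", "w") as dst:
#     dst.write("\n".join(to_bed6(src)) + "\n")
```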
>>
>> Please let me know if you still have problems after fixing the
>> annotation file. Thanks!
>>
>> Best regards,
>>
>> Julie
>>
>>
>> On 12/3/13 10:10 AM, "Jailwala, Parthav (NIH/NCI) [C]"
>> <parthav.jailwala at nih.gov> wrote:
>>
>>> Julie,
>>>
>>> Thanks for your response. Attached are my input file of 'peaks'
>>> (2070 lincRNA_mergedGTF.txt) and the feature annotation file that I am
>>> using (23188PCGgroupEnsemblGTFwithstrand.txt; it has strand information
>>> coded as + and -).
>>>
>>> Also attached is the output file (2070lincRNAmergedGTF.annout), which
>>> shows the strand information as all positive.
>>>
>>> Here is the sessionInfo()
>>>
>>>
>>>> sessionInfo()
>>> R version 3.0.2 (2013-09-25)
>>> Platform: x86_64-unknown-linux-gnu (64-bit)
>>>
>>> locale:
>>> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
>>> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=C
>>> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
>>> [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
>>> [9] LC_ADDRESS=C LC_TELEPHONE=C
>>> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>>>
>>> attached base packages:
>>> [1] parallel grid stats graphics grDevices utils datasets
>>> [8] methods base
>>>
>>> other attached packages:
>>> [1] ChIPpeakAnno_2.10.0 GenomicFeatures_1.14.2
>>> [3] limma_3.18.3 org.Hs.eg.db_2.10.1
>>> [5] GO.db_2.10.1 RSQLite_0.11.4
>>> [7] DBI_0.2-7 AnnotationDbi_1.24.0
>>> [9] BSgenome.Ecoli.NCBI.20080805_1.3.17 BSgenome_1.30.0
>>> [11] GenomicRanges_1.14.3 Biostrings_2.30.1
>>> [13] XVector_0.2.0 IRanges_1.20.6
>>> [15] multtest_2.18.0 Biobase_2.22.0
>>> [17] biomaRt_2.18.0 BiocGenerics_0.8.0
>>> [19] VennDiagram_1.6.5
>>>
>>> loaded via a namespace (and not attached):
>>> [1] MASS_7.3-29 RCurl_1.95-4.1 Rsamtools_1.14.2 XML_3.98-1.1
>>> [5] bitops_1.0-6 rtracklayer_1.22.0 splines_3.0.2 stats4_3.0.2
>>> [9] survival_2.37-4 tools_3.0.2 zlibbioc_1.8.0
>>>>
>>>
>>>
>>> On 12/3/13 9:52 AM, "Zhu, Lihua (Julie)"
>>> <julie.zhu at umassmed.edu> wrote:
>>>
>>> Parthav,
>>>
>>> Could you please send us the code snippets, a test BED file, and the
>>> sessionInfo? Thanks!
>>>
>>> Best regards,
>>>
>>> Julie
>>>
>>>
>>> On 12/3/13 9:43 AM, "Jailwala, Parthav (NIH/NCI) [C]"
>>> <parthav.jailwala at nih.gov> wrote:
>>>
>>> Hi Julie,
>>> I have a strand issue with the annotatePeakInBatch function in the
>>> ChIPpeakAnno package and am reaching out to see if you can help me
>>> figure out what the issue is.
>>> I am trying to find the distance to the TSS for a set of lincRNAs. To
>>> do this, I am using my own BED file as the 'background' or annotation.
>>> The BED file looks like this:
>>> Y 597158 623056 Ddx3y -
>>> Y 346986 365290 Eif2s3y +
>>> Y 2118049 2129045 Gm10256 +
>>> Y 2156899 2168120 Gm10352 +
>>> Y 1976249 1976584 Gm16501 -
>>> Y 2390390 2398856 Gm3376 +
>>> As you can see, there is no header row for the column names, and the
>>> fifth column is the strand of the feature.
>>> Now, when I run the command, the 'strand' column in the output file is
>>> always + (even though the feature is on the -ve strand).
>>> Here is a sample from the output file:
>>>
>>> "","space","start","end","width","names","peak","strand","feature","start_position","end_position","insideFeature","distancetoFeature","shortestDistance","fromOverlappingOrNearest"
>>> "1","1",9708702,9782003,73302,"0001 23152","0001","+","23152",9708703,9738463,"includeFeature",-1,1,"NearestStart"
>>> "2","1",134088012,134153958,65947,"0002 22624","0002","+","22624",134088013,134153958,"overlapStart",-1,0,"NearestStart"
>>> "3","1",171899539,172040632,141094,"0003 22283","0003","+","22283",171902439,172040632,"overlapStart",-2900,0,"NearestStart"
>>> "4","1",195333431,195335997,2567,"0004 22164","0004","+","22164",195172540,195196491,"downstream",160891,136940,"NearestStart"
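A quick way to confirm this symptom is to tally the strand column of the output table. The Python sketch below assumes that the output is a comma-separated file with a header like the one above, as R's write.csv produces. The function name is hypothetical, and the sample keeps only the first nine columns of two rows.

```python
import csv
import io
from collections import Counter


def strand_counts(csv_text):
    """Count the values in the 'strand' column of an annotation table."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return Counter(row["strand"] for row in reader)


sample = '''"","space","start","end","width","names","peak","strand","feature"
"1","1",9708702,9782003,73302,"0001 23152","0001","+","23152"
"4","1",195333431,195335997,2567,"0004 22164","0004","+","22164"
'''
print(strand_counts(sample))  # Counter({'+': 2})
# For the attached output (assuming it is comma-separated):
# print(strand_counts(open("2070lincRNAmergedGTF.annout").read()))
```

If every row reports "+" while the annotation file contains "-" features, the strand column was most likely not read from the annotation.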
>>> I would really appreciate it if you could tell me what is wrong with
>>> my inputs.
>>> Thanks
>>> Parthav Jailwala
>>> Parthav Jailwala [Contractor]
>>> Bioinformatics Analyst, CCRIFX Bioinformatics Core
>>> Advanced Biomedical Computing Center (ABCC)
>>> Information Systems Program
>>> Leidos Biomedical Research, Inc.
>>> (formerly SAIC-Frederick, Inc.)
>>> Frederick National Laboratory for Cancer Research (FNLCR)
>>> P. O. Box B, Frederick, MD 21702
>>> Building 41-B620, NIH, Bethesda, MD
>>> E-mail: parthav.jailwala at nih.gov
>>> Bethesda: 301.451.3455
>>> Frederick: 301.846.5664
>>> Fax (Bethesda): 301.480.0391
>>>
>>> http://ccrifx.cancer.gov
>>>
>>>
>>
>