Hello, I am ***new*** to NGS data analysis. I am working with rice RNA-seq data. I have generated the BAM files using TopHat and wish to use edgeR. I need to identify the DEGs and also obtain RPKM values at the gene level. There are some issues which I am unable to resolve:
1) For the build I am using, three annotation files are available: locus.gff, transcript.gff and transcript_exon.gff. Which of these should I use for a gene-level analysis? For RPKM, I need the gene lengths as well. Excerpts from each file are given below.
2) There seems to be a problem with the format of the annotation files. I used the following command:
x <- featureCounts("accepted_hits.bam", annot.ext="transcripts_exon.gff", isGTFAnnotationFile=TRUE, GTF.featureType="exon", GTF.attrType="ID", useMetaFeatures=TRUE)
I have been checking many forums, but have been unable to find a satisfactory explanation. Kindly help.
locus.gff
chr01 irgsp1_locus gene 2983 10815 . + . ID=Os01g0100100;Name=Os01g0100100;Note=RabGAP/TBC domain containing protein. (Os01t0100100-01);Transcript variants=Os01t0100100-01
chr01 irgsp1_locus gene 11218 12435 . + . ID=Os01g0100200;Name=Os01g0100200;Note=Conserved hypothetical protein. (Os01t0100200-01);Transcript variants=Os01t0100200-01
chr01 irgsp1_locus gene 11372 12284 . - . ID=Os01g0100300;Name=Os01g0100300;Note=Cytochrome P450 domain containing protein. (Os01t0100300-00);Transcript variants=Os01t0100300-00
transcript.gff
chr01 irgsp1_rep mRNA 2983 10815 . + . ID=Os01t0100100-01;Name=Os01t0100100-01;
chr01 irgsp1_rep five_prime_UTR 2983 3268 . + . Parent=Os01t0100100-01
chr01 irgsp1_rep five_prime_UTR 3354 3448 . + . Parent=Os01t0100100-01
chr01 irgsp1_rep CDS 3449 3616 . + . Parent=Os01t0100100-01
chr01 irgsp1_rep CDS 4357 4455 . + . Parent=Os01t0100100-01
transcript_exon.gff
chr01 irgsp1_rep mRNA 2983 10815 . + . ID=Os01t0100100-01;Name=Os01t0100100-01
chr01 irgsp1_rep exon 2983 3268 . + . Parent=Os01t0100100-01
chr01 irgsp1_rep exon 3354 3616 . + . Parent=Os01t0100100-01
chr01 irgsp1_rep exon 4357 4455 . + . Parent=Os01t0100100-01
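
To make the length and RPKM part of the question concrete, here is a minimal base-R sketch of what I have in mind. It uses only the three exon lines from the transcript_exon.gff excerpt above (so the length covers just that excerpt, not the full transcript), and it assumes the real file is tab-separated. The read count and library size are made-up placeholder numbers, not real results. Note that the exon lines carry a `Parent` attribute rather than `ID`, so the sketch groups exons by `Parent`. For a true gene-level length, overlapping exons from different transcripts of the same gene would have to be merged first, which this sketch does not do.

```r
# Exon lines copied from the transcript_exon.gff excerpt (assumed tab-separated).
gff_lines <- c(
  "chr01\tirgsp1_rep\texon\t2983\t3268\t.\t+\t.\tParent=Os01t0100100-01",
  "chr01\tirgsp1_rep\texon\t3354\t3616\t.\t+\t.\tParent=Os01t0100100-01",
  "chr01\tirgsp1_rep\texon\t4357\t4455\t.\t+\t.\tParent=Os01t0100100-01"
)

# Parse the nine standard GFF columns.
gff <- read.delim(text = gff_lines, header = FALSE, stringsAsFactors = FALSE)
colnames(gff) <- c("seqid", "source", "type", "start", "end",
                   "score", "strand", "phase", "attributes")

# Keep exon rows and pull the transcript ID out of the Parent attribute.
exons <- gff[gff$type == "exon", ]
exons$parent <- sub(".*Parent=([^;]+).*", "\\1", exons$attributes)

# GFF coordinates are 1-based and inclusive, hence the +1.
exons$width <- exons$end - exons$start + 1

# Summed exon length per transcript (exons of one transcript do not overlap).
tx_length <- tapply(exons$width, exons$parent, sum)

# Placeholder values for illustration only (NOT real data).
counts   <- c("Os01t0100100-01" = 120)  # hypothetical read count
lib_size <- 2e7                         # hypothetical total mapped reads

# RPKM = reads / (length in kb) / (library size in millions).
rpkm <- counts / (tx_length[names(counts)] / 1e3) / (lib_size / 1e6)
print(tx_length)
print(rpkm)
```

As far as I understand, edgeR also provides an `rpkm()` function that accepts a `gene.length` argument, so the manual formula above is only meant to show which lengths would be needed.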