HTSeq-count on several gff3/gtf files for use in DESeq2
Jon Bråte ▴ 260
@jon-brate-6263
Last seen 4 months ago
Norway

I have several GTF files and a GFF3 file representing different sets of genes. I want to count expression with htseq-count and load all the counts into DESeq2, but I am not sure what the best approach is. I was thinking that I could simply concatenate all the GTF and GFF3 files, but some of the GTF files have overlapping gene names (with different isoforms), and the GFF3 file will not be identical if I convert it to GTF. Alternatively, if I count against each file separately, can I concatenate the count files afterwards? And what about the last few "special" lines produced by htseq-count if I use the HTSeq import function in DESeq2?

I might also use Cuffdiff for comparison later, so I guess my question will also apply to Cuffdiff.

htseqtools deseq2 counts cuffdiff • 3.6k views
@mikelove
Last seen 18 hours ago
United States

I'm not sure exactly what's the best approach here. htseq-count will count against the complete set of exons of a gene, so having multiple isoforms with the same gene name in both files is not a problem (even with the same exon listed twice).

So it seems the best approach would be to produce a combined GTF file, although I don't have advice on how to do this. You might try biostars.org for advice.

The special lines are ignored by DESeq2 when reading in from htseq-count.
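For anyone post-processing the count files outside DESeq2, the same filtering can be sketched in Python. This is only a minimal sketch, assuming the tab-separated two-column output of htseq-count and assuming that the summary rows start with a double underscore (e.g. `__no_feature`), as recent HTSeq versions write them; older versions wrote those names without the prefix, so check your own files. The function name `read_htseq_counts` is hypothetical.

```python
import io


def read_htseq_counts(handle):
    """Parse htseq-count output into {feature_id: count}, skipping summary rows."""
    counts = {}
    for line in handle:
        line = line.rstrip("\n")
        if not line:
            continue
        feature_id, value = line.split("\t")
        # Assumption: summary rows such as __no_feature start with "__".
        if feature_id.startswith("__"):
            continue
        counts[feature_id] = int(value)
    return counts


# Toy input standing in for one htseq-count output file.
example = io.StringIO(
    "geneA\t10\n"
    "geneB\t0\n"
    "__no_feature\t532\n"
    "__ambiguous\t17\n"
)
counts = read_htseq_counts(example)
print(counts)
```

Only the per-gene rows survive; the summary rows never reach the count table.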

I'd recommend a separate post for each software. The developers are pinged with an email when you post and tag with the software name, so it's sending out extra emails to busy developers.

Jon Bråte ▴ 260
@jon-brate-6263
Last seen 4 months ago
Norway

Thanks! I found that combining GTF files can be a bit tricky, especially when there is a mix of GFF and GTF files. For the moment I count against each file separately with htseq-count, import the results one by one into DESeq2 as a DESeqDataSet, and then merge the count matrices. Then I estimate size factors and dispersions on the combined counts. There is probably a smoother way, though.
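The merge step can be illustrated outside R with a small Python sketch. This is not the DESeq2 workflow itself, just the row-stacking idea under stated assumptions: each annotation set has already been counted for the same samples, and each table is a plain dict of the form {gene_id: {sample: count}}. The function name `merge_count_tables` and the toy gene and sample names are hypothetical. The sketch refuses to merge when a gene ID occurs in more than one set, since overlapping gene names would otherwise be silently overwritten.

```python
def merge_count_tables(tables):
    """Stack per-annotation count tables ({gene_id: {sample: count}}) into one.

    Raises ValueError if a gene ID occurs in more than one table or if the
    rows do not all cover the same samples.
    """
    merged = {}
    samples = None
    for name, table in tables.items():
        for gene_id, row in table.items():
            if samples is None:
                samples = set(row)
            elif set(row) != samples:
                raise ValueError(f"{name}: {gene_id} has a different sample set")
            if gene_id in merged:
                raise ValueError(f"{name}: duplicate gene ID {gene_id}")
            merged[gene_id] = dict(row)
    return merged


# Toy tables standing in for counts obtained against two annotation files.
tables = {
    "set1": {"geneA": {"s1": 10, "s2": 12}},
    "set2": {"lncRNA1": {"s1": 3, "s2": 0}},
}
merged = merge_count_tables(tables)
print(sorted(merged))
```

Normalisation (size factors, dispersions) would still be done once, on the merged table, as described above.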


If you want to combine your annotation files, an alternative is to read your GTF/GFF files into R, which gives you a data frame for each annotation. You can then combine the data frames and run featureCounts from the Rsubread package to get counts. featureCounts only needs five columns of annotation data: gene ID, chromosome, start, end and strand. Your data frames can therefore be merged easily. Type ?featureCounts for more details.
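The reduction to five columns can also be sketched in Python rather than R; treat this as an illustration under stated assumptions, not as the Rsubread interface. It assumes standard nine-column GTF/GFF3 lines and keeps only `exon` features. Which attribute carries the gene identifier differs between files (often `gene_id` in GTF; in GFF3, exons usually point to a transcript through `Parent`, so mapping to genes may need an extra step), so it is passed in as `id_key`. The helper names and the toy records, including the `gene` attribute in the GFF3 line, are hypothetical.

```python
import re

# GFF3 attributes look like key=value; GTF attributes look like key "value".
GFF3_PAIR = re.compile(r"(\S+?)=(.*)")


def parse_attributes(field):
    """Return a dict from column 9 of a GTF or GFF3 line."""
    attrs = {}
    for part in field.strip().split(";"):
        part = part.strip()
        if not part:
            continue
        match = GFF3_PAIR.match(part)
        if match:
            key, value = match.groups()
        else:
            key, _, value = part.partition(" ")
        attrs[key.strip()] = value.strip().strip('"')
    return attrs


def to_saf_rows(lines, id_key, feature_type="exon"):
    """Reduce annotation lines to (GeneID, Chr, Start, End, Strand) tuples."""
    rows = []
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != feature_type:
            continue
        attrs = parse_attributes(cols[8])
        if id_key not in attrs:
            continue
        rows.append((attrs[id_key], cols[0], int(cols[3]), int(cols[4]), cols[6]))
    return rows


# Toy records: one GTF exon and one GFF3 exon.
gtf = ['chr1\tsrc\texon\t100\t200\t.\t+\t.\tgene_id "geneA"; transcript_id "tA.1";\n']
gff3 = ["chr2\tsrc\texon\t50\t80\t.\t-\t.\tID=exon1;gene=geneB\n"]

saf = to_saf_rows(gtf, "gene_id") + to_saf_rows(gff3, "gene")
print("GeneID\tChr\tStart\tEnd\tStrand")
for row in saf:
    print("\t".join(str(value) for value in row))
```

Once every annotation is in this shape, combining them is a plain row-wise concatenation, which is the point made above.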
