Question

gff3 file for featureCounts

0

Entering edit mode

selmanurkeskin • 0

@a7dd715a

Last seen 2.7 years ago

Turkey

Hi, I read in this site that gff3 format is also acceptable for featureCounts and I don't have gtf but only gff3 file, but it doesn't work.

featureCounts -T 6 -t exon -g gene_id -Q 30 -F GTF -a medtr.R108_HM340.gnm1.ann1.85YW.gcv_genes.gff3  -o htseqresults.txt  
/w_NF18553root_sorted.bam

and the output is this

Load annotation file medtr.R108_HM340.gnm1.ann1.85YW.gcv_genes.gff3 ...    ||
ERROR: no features were loaded in format GTF. The annotation format can be specified by the '-F' option, and the required feature type can be specified by the '-t' option..
The porgram has to terminate.

gtf/gff3 subread featureCounts • 4.8k views

ADD COMMENT • link updated 2.7 years ago by Gordon Smyth 52k • written 2.7 years ago by selmanurkeskin • 0

score 0 · Answer 1 · 2022-07-12

You are using the command line featureCounts function, which is not part of Bioconductor. Looking at the help for that version, I don't see any mention of GFF file format being acceptable. The featureCounts function in the Bioconductor Rsubread package does say it will accept a GFF file as input, so you might try that.

However, the GFF file you are using is probably the wrong one. I believe you want medtr.R108_HM340.gnm1.ann1.85YW.gene_models_main.gff3.gz instead.

An alternative to using GFF (if it doesn't work, that is. I have no experience with using a GFF personally), you could always make a SAF file.

> library(GenomicFeatures)
> download.file("https://peanutbase.org/data/v1/Medicago_truncatula/R108_HM340.gnm1.ann1.85YW/medtr.R108_HM340.gnm1.ann1.85YW.gene_models_main.gff3.gz", "peanut.gff3.gz")
trying URL 'https://peanutbase.org/data/v1/Medicago_truncatula/R108_HM340.gnm1.ann1.85YW/medtr.R108_HM340.gnm1.ann1.85YW.gene_models_main.gff3.gz'
Content type 'application/x-gzip' length 12367722 bytes (11.8 MB)
downloaded 11.8 MB

## Make a TxDb object first

> txdb <- makeTxDbFromGFF("peanut.gff3.gz")
Import genomic features from the file as a GRanges object ... OK
Prepare the 'metadata' data frame ... OK
Make the TxDb object ... OK

## Get the exon data

> ex <- exonsBy(txdb, "gene")

## make an SAF data.frame

> len <- lengths(ex)
> ex2 <- unlist(ex)
> saf <- data.frame(GeneID = rep(names(len), len), Chr = seqnames(ex2), Start = start(ex2), End = end(ex2), Strand = as.character(strand(ex2)))
> head(saf)
            GeneID                          Chr Start   End Strand
1 BZG31_000s000010 medtr.R108_HM340.gnm1.scf000 16103 16197      -
2 BZG31_000s000010 medtr.R108_HM340.gnm1.scf000 16334 16401      -
3 BZG31_000s000010 medtr.R108_HM340.gnm1.scf000 16938 17116      -
4 BZG31_000s000010 medtr.R108_HM340.gnm1.scf000 17188 17196      -
5 BZG31_000s000020 medtr.R108_HM340.gnm1.scf000 18559 19824      -
6 BZG31_000s000030 medtr.R108_HM340.gnm1.scf000 21927 21971      +
>

And then you can use that directly. See ?featureCounts for more information.