I ran genome-wide association studies for wheat quality traits in wheat (Triticum aestivum) which returned a few significant regions of the genome. I would like to extract the genes within those regions using the version 2.1 of the wheat reference genome, using the "rtracklayer". I do not know where to start, I consulted the package documentation and I still don't understand it.
You don't need rtracklayer for this (not directly anyway), but instead you need txdbmaker which you can use to make a TxDb object that contains the gene locations, etc. There are (at least) two annotation services that have gene data for wheat; NCBI and Ensembl. I imagine MSU has one as well, so it depends on what genome you generated your SNP data from. Ideally you stick with the same annotation service because they all do things slightly differently, and trying to map things between any of them is a fool's errand. I generally use NCBI annotations, but they have a (terrible, IMO) habit of naming chromosomes using RefSeq IDs, and I don't know an easy way of changing the chromosomes, so I will show you how to use Ensembl instead.
> library(txdbmaker)
>> txdb <- makeTxDbFromGFF("https://ftp.ensemblgenomes.ebi.ac.uk/pub/plants/release-60/gff3/triticum_aestivum/Triticum_aestivum.IWGSC.60.gff3.gz")
Import genomic features from the file as a GRanges object ... trying URL 'https://ftp.ensemblgenomes.ebi.ac.uk/pub/plants/release-60/gff3/triticum_aestivum/Triticum_aestivum.IWGSC.60.gff3.gz'
Content type 'application/x-gzip' length 24463854 bytes (23.3 MB)
downloaded 23.3 MB
OK
Prepare the 'metadata' data frame ... OK
Make the TxDb object ... OK
Warning message:
In .extract_exons_from_GRanges(exon_IDX, gr, mcols0, tx_IDX, feature = "exon", :
68 exons couldn't be linked to a
transcript so were dropped
(showing only the first 6):
seqid start end strand
1 1A 326329197 326329304 +
2 1A 366114101 366114204 +
3 1A 403039760 403039877 -
4 1B 357916232 357916339 -
5 1B 395058787 395058890 +
6 1B 432758094 432758212 +
ID Name
1 <NA> ENSRNA050014038-E1
2 <NA> ENSRNA050013952-E1
3 <NA> ENSRNA050013948-E1
4 <NA> ENSRNA050017309-E1
5 <NA> ENSRNA050019912-E1
6 <NA> ENSRNA050022692-E1
Parent
1 transcript:ENSRNA050014038-T1
2 transcript:ENSRNA050013952-T1
3 transcript:ENSRNA050013948-T1
4 transcript:ENSRNA050017309-T1
5 transcript:ENSRNA050019912-T1
6 transcript:ENSRNA050022692-T1
Parent_type
1 <NA>
2 <NA>
3 <NA>
4 <NA>
5 <NA>
6 <NA>
> genes(txdb)
GRanges object with 120676 ranges and 1 metadata column:
seqnames
<Rle>
ENSRNA050007810 3A
ENSRNA050007817 3A
ENSRNA050007820 3A
ENSRNA050007823 3A
ENSRNA050007826 3A
... ...
TraesCSU02G272776 Un
TraesCSU02G272800 Un
TraesCSU02G272807 Un
TraesCSU02G272900 Un
TraesCSU02G272997 Un
ranges
<IRanges>
ENSRNA050007810 8594746-8594817
ENSRNA050007817 8598700-8598771
ENSRNA050007820 19771253-19771326
ENSRNA050007823 27007038-27007108
ENSRNA050007826 32161321-32161392
... ...
TraesCSU02G272776 478410340-478410699
TraesCSU02G272800 478934578-478934940
TraesCSU02G272807 478901536-478902039
TraesCSU02G272900 479850130-479850488
TraesCSU02G272997 480840068-480840121
strand |
<Rle> |
ENSRNA050007810 + |
ENSRNA050007817 + |
ENSRNA050007820 + |
ENSRNA050007823 + |
ENSRNA050007826 + |
... ... .
TraesCSU02G272776 - |
TraesCSU02G272800 + |
TraesCSU02G272807 - |
TraesCSU02G272900 + |
TraesCSU02G272997 + |
gene_id
<character>
ENSRNA050007810 ENSRNA050007810
ENSRNA050007817 ENSRNA050007817
ENSRNA050007820 ENSRNA050007820
ENSRNA050007823 ENSRNA050007823
ENSRNA050007826 ENSRNA050007826
... ...
TraesCSU02G272776 TraesCSU02G272776
TraesCSU02G272800 TraesCSU02G272800
TraesCSU02G272807 TraesCSU02G272807
TraesCSU02G272900 TraesCSU02G272900
TraesCSU02G272997 TraesCSU02G272997
-------
seqinfo: 22 sequences from an unspecified genome; no seqlengths
Now is where you have to take over. I showed you how to get a GRanges of the gene locations. You can do all sorts of things with these objects and it's well beyond the scope of a support site post to cover them. You can read the vignettes for GenomicFeatures and GenomicRanges and maybe GenomicAlignments.