Hi All,
I am using QuasR to analyze mt paired end RNAseq data, now I have bam files for my 12 samples and the next step is to generate a count table for Differential gene expression analysis. I follwed the QuasR vignette and I used qCount() but I had a problem (R could not find my genome file)
sampleFile <- "DOX_RNAseq.txt"
genomeFile<-"BSgenome.Hsapiens.NCBI.GRCh38"
proj_RNASeq<- qAlign(sampleFile, genomeFile,cacheDir = "E:/Doxorubicin Project RNA-seq Data")
library(GenomicFeatures)
annotationFile <- "BSgenome.Hsapiens.NCBI.GRCh38_annotation.gtf"
chrLen <- scanFaIndex(genomeFile)
Error in scanFaIndex(open(FaFile(file))) :
error in evaluating the argument 'file' in selecting a method for function 'scanFaIndex': Error in .io_check_exists(path(con)) : file(s) do not exist:
'BSgenome.Hsapiens.NCBI.GRCh38'
Thanks
Tarek
Hi Hans,
Thanks for you reply
-Sorry for the confusion, actually I am trying to generate a TxDB as query. According to QausR vignette they generate a TxDB using this script. They used scanFaIndex first then they proceed till TxDb
library(GenomicFeatures)
> annotFile <- "extdata/hg19sub_annotation.gtf"
> chrLen <- scanFaIndex(genomeFile)
> chrominfo <- data.frame(chrom=as.character(seqnames(chrLen)),length=width(chrLen),is_circular=rep(FALSE, length(chrLen)))
> txdb <- makeTxDbFromGFF(file=annotFile, format="gtf",chrominfo=chrominfo, dataSource="Ensembl",
organism="Homo sapiens")
- Can I get annotation file from BSgenome? so that I can use makeTxDbFromGFF () to get TxDb
- Alternatively, I tried to generate a TxDb from biomart which was successful, but when calling qCount() I got an error (see below)
>TxDb<- makeTxDbFromBiomart(dataset="hsapiens_gene_ensembl")
>TxDb
TxDb object:
# Db type: TxDb
# Supporting package: GenomicFeatures
# Data source: BioMart
# Organism: Homo sapiens
# Taxonomy ID: 9606
# Resource URL: www.ensembl.org:80
# BioMart database: ENSEMBL_MART_ENSEMBL
# BioMart database version: Ensembl Genes 83
# BioMart dataset: hsapiens_gene_ensembl
# BioMart dataset description: Homo sapiens genes (GRCh38.p5)
# BioMart dataset version: GRCh38.p5
# Full dataset: yes
# miRBase build ID: NA
# transcript_nrow: 217425
# exon_nrow: 735779
# cds_nrow: 295205
# Db created by: GenomicFeatures package from Bioconductor
# Creation time: 2016-01-15 14:29:41 -0600 (Fri, 15 Jan 2016)
# GenomicFeatures version at creation time: 1.22.7
# RSQLite version at cr
>proj_RNASeq_geneLevels <- qCount(proj_RNASeq, TxDb, reportLevel="gene")
Error in qCount(proj_RNASeq, TxDb, reportLevel = "gene") :
sequence levels in 'query' not found in alignment files: CHR_HG126_PATCH, CHR_HG1342_HG2282_PATCH, CHR_HG1362_PATCH, CHR_HG142_HG150_NOVEL_TEST, CHR_HG151_NOVEL_TEST, CHR_HG1651_PATCH, CHR_HG1832_PATCH, CHR_HG2021_PATCH, CHR_HG2022_PATCH, CHR_HG2030_PATCH, CHR_HG2058_PATCH, CHR_HG2062_PATCH, CHR_HG2066_PATCH, CHR_HG2072_PATCH, CHR_HG2095_PATCH, CHR_HG2104_PATCH, CHR_HG2116_PATCH, CHR_HG2128_PATCH, CHR_HG2191_PATCH, CHR_HG2213_PATCH, CHR_HG2216_PATCH, CHR_HG2217_PATCH, CHR_HG2232_PATCH, CHR_HG2233_PATCH, CHR_HG2235_PATCH, CHR_HG2237_PATCH, CHR_HG2239_PATCH, CHR_HG2241_PATCH, CHR_HG2242_HG2243_PATCH, CHR_HG2244_HG2245_PATCH, CHR_HG2247_PATCH, CHR_HG2249_PATCH, CHR_HG2288_HG2289_PATCH, CHR_HG2290_PATCH, CHR_HG2291_PATCH, CHR_HG2334_PATCH, CHR_HG23_PATCH, CHR_HG26_PATCH, CHR_HG986_PATCH, CHR_HSCHR10_1_CTG1, CHR_HSCHR10_1_CTG2, CHR_HSCHR10_1_CTG3, CHR_HSCHR10_1_CTG4, CHR_HSCHR10_1_CTG6, CHR_HSCHR11_1_CTG1_1, CHR_HSCHR11_1_CTG1_2, CHR_HSCHR11_1_CTG2, CHR_HSCHR11_1_CTG3, CHR_HSCHR11
Tarek
Hi Tarek
In the QuasR vignette, they can use 'chrLen <- scanFaIndex(genomeFile)' because "genome" file was defined like this:'genomeFile <- "extdata/hg19sub.fa"'.
Have you tried making a local fasta file out of the BSgenome package? and then called scanFaIndex , as I was suggesting last week?
WRT you new questions:
- BSgenome packages do not contain gene annotation
- yes, you can make a TxDb from Biomart, but you have to be careful, to make sure, it is the same assembly and the individual chromosome are called the same. Which is obviously not the case in your situation: "sequence levels in 'query' not found in alignment files"
Hans-Rudolf
Hi Hans,
Thanks a lot, I downloaded the annotation file from FTP ensemble download page (same asembly) and it worked.
>genome<-export(BSgenome.Hsapiens.NCBI.GRCh38, "BSgenome.Hsapiens.NCBI.GRCh38.fasta")
>genome_fa<-scanFaIndex("BSgenome.Hsapiens.NCBI.GRCh38.fasta")
>annotationfile<-"Homo_sapiens.GRCh38.83.chr.gtf.gz"
>chrominfo <- data.frame(chrom=as.character(seqnames(genome_fa)),length=width(genome_fa),is_circular=rep(FALSE, length(genome_fa)))
>txdb <- makeTxDbFromGFF(file=annotationfile, format="gtf",chrominfo=chrominfo,dataSource="NCBI",organism="Homo sapiens")
Tarek