Hi,
I am trying to produce ReadCount values for a RNA-Seq dataset using the Rsubread package.
I first converted each SRA file from the GEO site for the dataset to a fastq file. I then concatenated all of the fastq files into one file. Then I gzipped the file.
I then used the rsubread package to produce readcount values:
Here is my code:
buildindex(basename="reference_indexreal",reference="hg19.fa", colorspace=TRUE) align(index="reference_indexreal",readfile1="GSE55296.fastq.gz",output_file="alignResults55296.BAM")
#The align function had this output
Processed : 101042942 reads
Mapped : 15 reads (0.0%)
Indels : 0
Running time : 259.9 minutes
#I then used the featureCounts method to produce read count values
fc_SE <- featureCounts("alignResults55296.BAM",annot.inbuilt="hg19")
#This was the output
//================================= Running ==================================\\
|| ||
|| Load annotation file /Library/Frameworks/R.framework/Versions/3.2/Reso ... ||
|| Features : 225074 ||
|| Meta-features : 25702 ||
|| Chromosomes : 52 ||
|| ||
|| Process BAM file alignResults55296.BAM... ||
|| Single-end reads are included. ||
|| Assign reads to features... ||
|| Total reads : 101042942 ||
|| Successfully assigned reads : 8 (0.0%) ||
|| Running time : 2.38 minutes ||
|| ||
|| Read assignment finished. ||
|| ||
\\===================== http://subread.sourceforge.net/ ======================//
Then I tried to print the count values and got the result:
fc_SE$counts
alignResults55296.BAM
653635 0
100422834 0
645520 0
79501 0
Please advise on how I should fix this. Thanks in advance.
Wei Shi is the one to help you. In the meantime, you say you used the sratoolkit to convert to fastq files. It would probably be helpful to see the commands you used. Much of what you have done doesn't make sense to me. Did you really take 32 samples and pile the data into one large fastq file? I have no idea why you would do that. These are paired end data, so I would instead expect you to have used fastq-dump to convert into 64 individual fastq files, not one. Is there a reason for doing that?