Using the chrAliases
flag in Rsubread featureCounts is not working as expected.
Single-end reads mapped to genome file
ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/000/001/405/GCA000001405.15GRCh38/GCA000001405.15GRCh38_genomic.fna.gz
& annotation file ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/000/001/405/GCA000001405.15GRCh38/seqsforalignmentpipelines.ucscids/GCA000001405.15GRCh38fullanalysisset.refseqannotation.gtf.gz
featureCounts code as follows:
file.output<-featureCounts(files="input.bam",
+ annot.ext="GCA_000001405.15_GRCh38_full_analysis_set.refseq_annotation.gtf.gz",
+ isGTFAnnotationFile = TRUE, chrAliases = "chrAliaseskg.csv",
+ useMetaFeatures=TRUE, nthreads=8, allowMultiOverlap=TRUE, strandSpecific=2, verbose=TRUE)
chrAliaseskg.csv file extracted from ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/000/001/405/GCA000001405.15GRCh38/GCA000001405.15GRCh38assemblyreport+ucsc_names.txt
head chrAliaseskg.csv
chr1,CM000663.2
chr2,CM000664.2
chr3,CM000665.2
chr4,CM000666.2
chr5,CM000667.2
chr6,CM000668.2
chr7,CM000669.2
chr8,CM000670.2
chr9,CM000671.2
chr10, CM000672.2
Error file reads:
//========================== featureCounts setting ===========================\\
|| ||
|| Input files : 1 BAM file ||
|| S Hr02_ASO_kiga006_L001_acceptedhitssrt.bam ||
|| ||
|| Annotation : GCA_000001405.15_GRCh38_full_analysis_set.re ... ||
|| Chromosome alias file : chrAliaseskg.csv ||
|| Dir for temp files : . ||
|| Threads : 8 ||
|| Level : meta-feature level ||
|| Paired-end : no ||
|| Multimapping reads : counted ||
|| Multi-overlapping reads : counted ||
|| Min overlapping bases : 1 ||
|| ||
\\===================== http://subread.sourceforge.net/ ======================//
//================================= Running ==================================\\
|| ||
|| 455 chromosome name aliases are loaded. ||
|| Load annotation file GCA_000001405.15_GRCh38_full_analysis_set.refseq_ ... ||
|| Features : 1935025 ||
|| Meta-features : 43388 ||
|| Chromosomes/contigs : 303 ||
|| ||
|| Process BAM file Hr02_ASO_kiga006_L001_acceptedhitssrt.bam... ||
|| Single-end reads are included. ||
|| Strand specific : reversely stranded ||
|| Assign alignments to features... ||
|| ||
|| Chromosomes/contigs in annotation but not in input file ||
|| chr14_KI270847v1_alt ||
|| chr1_GL383520v2_alt ||
|| chr19_GL949748v2_alt ||
|| chr4_KI270785v1_alt ||
|| chr15_KI270906v1_alt ||
|| chr19_KI270889v1_alt ||
.............
|| Chromosomes/contigs in input file but not in annotation ||
|| KI270795.1 ||
|| KI270853.1 ||
|| KI270425.1 ||
|| J01415.2 ||
|| KI270731.1 ||
|| KI270303.1 ||
|| GL383530.1 ||
|| KI270849.1 ||
............
|| Total alignments : 3151547 ||
|| Successfully assigned alignments : 0 (0.0%) ||
|| Running time : 0.01 minutes ||
|| ||
It looks the way you ran featureCounts is correct. What it is the end-of-line character in chrAliaseskg.csv? It is '\n' or '\r\n'?
Looks like it ends with '\r\n', not '\n'. There's nothing in the Rsubread instructions to indicate that it matters. Does it?
The carriage return may affect things - you can try
dos2unix chrAliaseskg.csv
to see if that fixes it. Or, viased
:Also, I don't know if this entry in your file will affect anything:
That extra space should not be there.
Sorry for having this issue... Indeed it wasn't documented, and it causes problems to Windows users. I think a better way is to make it compatible with both "\n" and "\r\n" systems.