DEXSeq Exon count not working, please help with an alternative like featureCounts
0
0
Entering edit mode
NIK • 0
@nik-23185
Last seen 13 months ago
United States

Hi,

I am using DEXSeq to analyze differential exon usage. I did my alignment on GRCm39 (Ensembl) with fasta and .gtf file from Ensembl. Alignment was done by STAR (see below). When I run dexseq_count.py to quantify the reads on every exon in my .bam files, I see that the output file contains very few to no reads for each exon. In contrast, when I import the same .bam file into Igv, I see thousands of reads on the same exons. Why does the python script count only a few dozen in comparison? I think my alignment has worked perfectly, it is just the counting with the python scripts that is the problem. In addition, when I try later to import the counts.txt file with DEXSeqDataSetFromHTSeq. I get an error that is only resolved when I go back to the .gff file and the counts.txt and remove all the quotation marks. I would appreciate any help I can get.

EDIT: Since I am not getting any answer, can sb suggest how I could get a count matrix like the one below, where I am having counts per exon? I guess it would be possible to get sth like this with featureCounts. However, when I tried, I do not get the numbering for each exon (I get sth like this: ENSMUSG00000025902.7, without the ":002")

Here is an example output from the counts table:

"ENSMUSG00000000001":"001"  0
"ENSMUSG00000000001":"002"  0
"ENSMUSG00000000001":"003"  0
"ENSMUSG00000000001":"004"  0
"ENSMUSG00000000001":"005"  0
"ENSMUSG00000000001":"006"  0

removing the quotation marks:

ENSMUSG00000000001:001  0
ENSMUSG00000000001:002  0
ENSMUSG00000000001:003  0
ENSMUSG00000000001:004  0
ENSMUSG00000000001:005  0
ENSMUSG00000000001:006  0

output count table (bottom)

_ambiguous 30288 _ambiguous_readpair_position 0 _empty 43052386 _lowaqual 0 _notaligned 0

####===Genome Generate via STAR===###
STAR  --runMode genomeGenerate --genomeDir ./genomes/GRCm39/ --genomeFastaFiles Mus_musculus.GRCm39.dna.primary_assembly.fa --sjdbGTFfile  Mus_musculus.GRCm39.106.gtf --sjdbOverhang 99 –-runThreadN 6 --genomeSAsparseD 2 --limitGenomeGenerateRAM 409086573952

####===Alignment via STAR===###
STAR --genomeLoad LoadAndExit --genomeDir GRCm39/
for i in $(ls *READ1* | sed s/-READ1-Sequences.fastq// | sort -u); 
do
STAR --runThreadN 5 --genomeDir GRCm39/ --genomeLoad LoadAndKeep --outSAMtype BAM SortedByCoordinate --readFilesIn .projects/1day_4/${i}-READ1-Sequences.fastq ./projects/day_4/${i}-READ2-Sequences.fastq --outFileNamePrefix ./projects/day_4/bam_files/${i}.
done
STAR --genomeLoad Remove --genomeDir /genomes/GRCm39/ 


####===generating gff file===###
python /.python_scripts/dexseq_prepare_annotation.py "./GRCm39_ENSEMBL/Mus_musculus.GRCm39.106.gtf" GRCm39.106.gff

####===counting exons===###
!/bin/bash
for i in ./bam_files_ENSEMBL/*.bam
 do
python ./python_scripts/dexseq_count.py  -p yes -f bam -r pos ./GRCm39_ENSEMBL/Flattened_GRCm39.gff $i ${i}_counted.txt
done


R version 4.2.0 (2022-04-22)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Catalina 10.15.7

Matrix products: default
BLAS:   /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.2/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] writexl_1.4.0               pasilla_1.24.0              DEXSeq_1.42.0               RColorBrewer_1.1-3          BiocParallel_1.30.2        
 [6] edgeR_3.38.1                SingleCellExperiment_1.18.0 ensembldb_2.20.1            AnnotationFilter_1.20.0     GenomicFeatures_1.48.1     
[11] AnnotationDbi_1.58.0        limma_3.52.1                DESeq2_1.36.0               SummarizedExperiment_1.26.1 Biobase_2.56.0             
[16] MatrixGenerics_1.8.0        matrixStats_0.62.0          GenomicRanges_1.48.0        GenomeInfoDb_1.32.2         IRanges_2.30.0             
[21] S4Vectors_0.34.0            BiocGenerics_0.42.0        

loaded via a namespace (and not attached):
 [1] ProtGenerics_1.28.0      bitops_1.0-7             bit64_4.0.5              filelock_1.0.2           progress_1.2.2           httr_1.4.3              
 [7] tools_4.2.0              utf8_1.2.2               R6_2.5.1                 DBI_1.1.2                lazyeval_0.2.2           colorspace_2.0-3        
[13] tidyselect_1.1.2         prettyunits_1.1.1        bit_4.0.4                curl_4.3.2               compiler_4.2.0           cli_3.3.0               
[19] xml2_1.3.3               DelayedArray_0.22.0      rtracklayer_1.56.0       scales_1.2.0             genefilter_1.78.0        rappdirs_0.3.3          
[25] stringr_1.4.0            digest_0.6.29            Rsamtools_2.12.0         XVector_0.36.0           pkgconfig_2.0.3          dbplyr_2.1.1            
[31] fastmap_1.1.0            rlang_1.0.2              rstudioapi_0.13          RSQLite_2.2.14           BiocIO_1.6.0             generics_0.1.2          
[37] hwriter_1.3.2.1          dplyr_1.0.9              RCurl_1.98-1.6           magrittr_2.0.3           GenomeInfoDbData_1.2.8   Matrix_1.4-1            
[43] Rcpp_1.0.8.3             munsell_0.5.0            fansi_1.0.3              lifecycle_1.0.1          stringi_1.7.6            yaml_2.3.5              
[49] zlibbioc_1.42.0          BiocFileCache_2.4.0      grid_4.2.0               blob_1.2.3               parallel_4.2.0           crayon_1.5.1            
[55] lattice_0.20-45          Biostrings_2.64.0        splines_4.2.0            annotate_1.74.0          hms_1.1.1                KEGGREST_1.36.0         
[61] locfit_1.5-9.5           pillar_1.7.0             rjson_0.2.21             geneplotter_1.74.0       biomaRt_2.52.0           XML_3.99-0.9            
[67] glue_1.6.2               BiocManager_1.30.18      png_0.1-7                vctrs_0.4.1              gtable_0.3.0             purrr_0.3.4             
[73] assertthat_0.2.1         cachem_1.0.6             ggplot2_3.3.6            xtable_1.8-4             restfulr_0.0.13          survival_3.3-1          
[79] tibble_3.1.7             GenomicAlignments_1.32.0 memoise_2.0.1            statmod_1.4.36           ellipsis_0.3.2
splicing Exon featureCounts subread DEXSeq • 1.1k views
ADD COMMENT
0
Entering edit mode

OK, I found the problem. I used Truseq lib prep and therefore I had wrongly chosen the strand orientation. I needed to specify -s reverse in the python dexseq_count.py script. However, I still need to remove the ""

ADD REPLY

Login before adding your answer.

Traffic: 486 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6