Question

DEXSeq Exon count not working, please help with an alternative like featureCounts

0

Entering edit mode

NIK • 0

@nik-23185

Last seen 13 months ago

United States

Hi,

I am using DEXSeq to analyze differential exon usage. I did my alignment on GRCm39 (Ensembl) with fasta and .gtf file from Ensembl. Alignment was done by STAR (see below). When I run dexseq_count.py to quantify the reads on every exon in my .bam files, I see that the output file contains very few to no reads for each exon. In contrast, when I import the same .bam file into Igv, I see thousands of reads on the same exons. Why does the python script count only a few dozen in comparison? I think my alignment has worked perfectly, it is just the counting with the python scripts that is the problem. In addition, when I try later to import the counts.txt file with DEXSeqDataSetFromHTSeq. I get an error that is only resolved when I go back to the .gff file and the counts.txt and remove all the quotation marks. I would appreciate any help I can get.

EDIT: Since I am not getting any answer, can sb suggest how I could get a count matrix like the one below, where I am having counts per exon? I guess it would be possible to get sth like this with featureCounts. However, when I tried, I do not get the numbering for each exon (I get sth like this: ENSMUSG00000025902.7, without the ":002")

Here is an example output from the counts table:

"ENSMUSG00000000001":"001"  0
"ENSMUSG00000000001":"002"  0
"ENSMUSG00000000001":"003"  0
"ENSMUSG00000000001":"004"  0
"ENSMUSG00000000001":"005"  0
"ENSMUSG00000000001":"006"  0

removing the quotation marks:

ENSMUSG00000000001:001  0
ENSMUSG00000000001:002  0
ENSMUSG00000000001:003  0
ENSMUSG00000000001:004  0
ENSMUSG00000000001:005  0
ENSMUSG00000000001:006  0

output count table (bottom)

_ambiguous 30288 _ambiguous_readpair_position 0 _empty 43052386 _lowaqual 0 _notaligned 0

####===Genome Generate via STAR===###
STAR  --runMode genomeGenerate --genomeDir ./genomes/GRCm39/ --genomeFastaFiles Mus_musculus.GRCm39.dna.primary_assembly.fa --sjdbGTFfile  Mus_musculus.GRCm39.106.gtf --sjdbOverhang 99 –-runThreadN 6 --genomeSAsparseD 2 --limitGenomeGenerateRAM 409086573952

####===Alignment via STAR===###
STAR --genomeLoad LoadAndExit --genomeDir GRCm39/
for i in $(ls *READ1* | sed s/-READ1-Sequences.fastq// | sort -u); 
do
STAR --runThreadN 5 --genomeDir GRCm39/ --genomeLoad LoadAndKeep --outSAMtype BAM SortedByCoordinate --readFilesIn .projects/1day_4/${i}-READ1-Sequences.fastq ./projects/day_4/${i}-READ2-Sequences.fastq --outFileNamePrefix ./projects/day_4/bam_files/${i}.
done
STAR --genomeLoad Remove --genomeDir /genomes/GRCm39/ 


####===generating gff file===###
python /.python_scripts/dexseq_prepare_annotation.py "./GRCm39_ENSEMBL/Mus_musculus.GRCm39.106.gtf" GRCm39.106.gff

####===counting exons===###
!/bin/bash
for i in ./bam_files_ENSEMBL/*.bam
 do
python ./python_scripts/dexseq_count.py  -p yes -f bam -r pos ./GRCm39_ENSEMBL/Flattened_GRCm39.gff $i ${i}_counted.txt
done


R version 4.2.0 (2022-04-22)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Catalina 10.15.7

Matrix products: default
BLAS:   /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.2/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] writexl_1.4.0               pasilla_1.24.0              DEXSeq_1.42.0               RColorBrewer_1.1-3          BiocParallel_1.30.2        
 [6] edgeR_3.38.1                SingleCellExperiment_1.18.0 ensembldb_2.20.1            AnnotationFilter_1.20.0     GenomicFeatures_1.48.1     
[11] AnnotationDbi_1.58.0        limma_3.52.1                DESeq2_1.36.0               SummarizedExperiment_1.26.1 Biobase_2.56.0             
[16] MatrixGenerics_1.8.0        matrixStats_0.62.0          GenomicRanges_1.48.0        GenomeInfoDb_1.32.2         IRanges_2.30.0             
[21] S4Vectors_0.34.0            BiocGenerics_0.42.0        

loaded via a namespace (and not attached):
 [1] ProtGenerics_1.28.0      bitops_1.0-7             bit64_4.0.5              filelock_1.0.2           progress_1.2.2           httr_1.4.3              
 [7] tools_4.2.0              utf8_1.2.2               R6_2.5.1                 DBI_1.1.2                lazyeval_0.2.2           colorspace_2.0-3        
[13] tidyselect_1.1.2         prettyunits_1.1.1        bit_4.0.4                curl_4.3.2               compiler_4.2.0           cli_3.3.0               
[19] xml2_1.3.3               DelayedArray_0.22.0      rtracklayer_1.56.0       scales_1.2.0             genefilter_1.78.0        rappdirs_0.3.3          
[25] stringr_1.4.0            digest_0.6.29            Rsamtools_2.12.0         XVector_0.36.0           pkgconfig_2.0.3          dbplyr_2.1.1            
[31] fastmap_1.1.0            rlang_1.0.2              rstudioapi_0.13          RSQLite_2.2.14           BiocIO_1.6.0             generics_0.1.2          
[37] hwriter_1.3.2.1          dplyr_1.0.9              RCurl_1.98-1.6           magrittr_2.0.3           GenomeInfoDbData_1.2.8   Matrix_1.4-1            
[43] Rcpp_1.0.8.3             munsell_0.5.0            fansi_1.0.3              lifecycle_1.0.1          stringi_1.7.6            yaml_2.3.5              
[49] zlibbioc_1.42.0          BiocFileCache_2.4.0      grid_4.2.0               blob_1.2.3               parallel_4.2.0           crayon_1.5.1            
[55] lattice_0.20-45          Biostrings_2.64.0        splines_4.2.0            annotate_1.74.0          hms_1.1.1                KEGGREST_1.36.0         
[61] locfit_1.5-9.5           pillar_1.7.0             rjson_0.2.21             geneplotter_1.74.0       biomaRt_2.52.0           XML_3.99-0.9            
[67] glue_1.6.2               BiocManager_1.30.18      png_0.1-7                vctrs_0.4.1              gtable_0.3.0             purrr_0.3.4             
[73] assertthat_0.2.1         cachem_1.0.6             ggplot2_3.3.6            xtable_1.8-4             restfulr_0.0.13          survival_3.3-1          
[79] tibble_3.1.7             GenomicAlignments_1.32.0 memoise_2.0.1            statmod_1.4.36           ellipsis_0.3.2

splicing Exon featureCounts subread DEXSeq • 1.1k views

ADD COMMENT • link 2.5 years ago NIK • 0

0

Entering edit mode

OK, I found the problem. I used Truseq lib prep and therefore I had wrongly chosen the strand orientation. I needed to specify -s reverse in the python dexseq_count.py script. However, I still need to remove the ""

ADD REPLY • link 2.5 years ago NIK • 0