My DEXSeqDataSetFromSE is returning an Error, but all of my checks to seem to indicate that this error is not present
0
0
Entering edit mode
@277cabd6
Last seen 17 hours ago
United States

I am using DEXSeq to perform a DEU analysis on some bulk RNA-Seq data but I am getting some strange behavior. My data comes from a transgenic animal model, so I've introduced a custom gtf file for the transgene. I am able to follow the vignette for DEXSeq well, but I went back today to try and update my summarized experiment object with rowRanges because I realized this didn't get done the first time.

Here is the code I am trying to run:

library(GenomicFeatures)  # for the exonicParts() function
library(txdbmaker)        # for the makeTxDbFromGFF() function
library(GenomicRanges)

## Flatten GTF
gtf <- makeTxDbFromGFF(file = "/custom.gtf")

flattened_gtf <- exonicParts(gtf, linked.to.single.gene.only = TRUE)

# Subset the relevant ranges
subset_ranges <- subsetByOverlaps(flattened_gtf, ranges = GRanges(seqnames = "chrMouse_BAC", ranges = IRanges(start = 36912, end = 39292)))

# 1. Adjust E23 and E24
# Extend the end of E23 by 50 base pairs
end(subset_ranges[2]) <- end(subset_ranges[2]) + 50

# Shorten the start of E24 by 50 base pairs
start(subset_ranges[3]) <- start(subset_ranges[3]) + 50

# 2. Merge E25 and E26
# Merge the ranges for E25 (row 4) and E26 (row 5)
merged_range <- reduce(subset_ranges[4:5]) # Combine the two ranges into one

# Update metadata for the merged range (using metadata from E25 as a base)
merged_metadata <- subset_ranges[4] # Use metadata from E25
start(merged_metadata) <- start(merged_range) # Update start coordinate
end(merged_metadata) <- end(merged_range)     # Update end coordinate

# Remove the original E25 and E26
subset_ranges <- subset_ranges[-c(4, 5)]

# Add the new merged range back to the subset
subset_ranges <- append(subset_ranges, merged_metadata)

# 3. Replace the modified subset into the original GRanges object
# Remove the original ranges from the full object
flattened_gtf <- flattened_gtf[!flattened_gtf %over% GRanges(seqnames = "chrMouse_BAC", ranges = IRanges(start = 36912, end = 39292))]

# Add the modified ranges back into the GRanges object
flattened_gtf <- c(flattened_gtf, subset_ranges)

names(flattened_gtf) =
  sprintf("%s:E%0.3d", flattened_gtf$gene_id, flattened_gtf$exonic_part)

This all works great, and I then move on to setting up my DEXSeq object

## Load in BAM files
library(GenomicAlignments)
library(DEXSeq)
library(stringr)
library(SummarizedExperiment)
library(BiocParallel)

# load metadata
meta <- read.table(file = "/meta.txt", sep = "\t", header = TRUE)
meta$Pedigree <- paste0("S", meta$Pedigree)

# read exon count data
all_files <- dir("/final_exon_overlaps", pattern = "*.txt$", full.names = TRUE)

counts_list <- lapply(X = all_files, read.table, header = TRUE)
names(counts_list) <- paste0("S", str_extract(string = basename(all_files), pattern = "[[:digit:]]+"))

# merge exon count data
counts_matrix <- do.call(cbind, counts_list)
colnames(counts_matrix) <- gsub(pattern = "X", replacement = "S", str_extract(string = colnames(counts_matrix), pattern = ".*(?=_Aligned)"))

# arrange metadata to match order of counts_matrix
new_order <- match(colnames(counts_matrix), meta$Pedigree)
meta <- meta[new_order,]

if( all(match(colnames(counts_matrix), meta$Pedigree) == seq_len(length(meta$Pedigree))) ){ print("Correct Order") }

# transform into a summarized experiment
counts_matrix <- as.matrix(counts_matrix)

row_data <- DataFrame(flattened_gtf)[-1]
col_data <- DataFrame(meta)
rownames(col_data) <- col_data$Pedigree

se <- SummarizedExperiment(assays = list(counts = counts_matrix), rowData = row_data, colData = col_data)
colData(se)$Condition <- factor(c("baseline", "ifn"))
#rowRanges(se) <- flattened_gtf

# perform DEXSeq analysis
# load a BCPARAM object to allow multi-threading
param <- MulticoreParam(4)

dxd <- DEXSeqDataSetFromSE(se, design = ~ Pedigree + exon + Condition:exon)

If I run DEXSeqDataSetFromSE after adjusting the rowRanges, then I get the following error:

Error in SummarizedExperiment(nCountData, colData = colData, rowRanges = featureRanges) : the rownames of the supplied assay(s) must be NULL or identical to those of the RangedSummarizedExperiment object (or derivative) to construct

but if I don't adjust the rowRanges then I get no error, and the analysis seems to work. I can't quite work out what is causing this behavior, because everything I can think to check seems to be okay; the lengths are equal, the rownames are equal, the order is the same. Is anyone able to make any recommendations?

sessionInfo( )

R version 4.4.0 (2024-04-24)
Platform: x86_64-pc-linux-gnu
Running under: CentOS Linux 7 (Core)

Matrix products: default
BLAS/LAPACK: /n/app/openblas/0.3.19-gcc-9.2.0/lib/libopenblas_haswellp-r0.3.19.so;  LAPACK version 3.9.0

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

time zone: America/New_York
tzcode source: system (glibc)

attached base packages:
[1] stats4    stats     graphics  grDevices utils     datasets  methods
[8] base

other attached packages:
 [1] stringr_1.5.1               DEXSeq_1.52.0
 [3] RColorBrewer_1.1-3          DESeq2_1.46.0
 [5] BiocParallel_1.40.0         GenomicAlignments_1.42.0
 [7] Rsamtools_2.22.0            Biostrings_2.74.1
 [9] XVector_0.46.0              SummarizedExperiment_1.36.0
[11] MatrixGenerics_1.18.1       matrixStats_1.5.0
[13] txdbmaker_1.2.1             GenomicFeatures_1.58.0
[15] AnnotationDbi_1.68.0        Biobase_2.66.0
[17] GenomicRanges_1.58.0        GenomeInfoDb_1.42.1
[19] IRanges_2.40.1              S4Vectors_0.44.0
[21] BiocGenerics_0.52.0

loaded via a namespace (and not attached):
 [1] tidyselect_1.2.1        dplyr_1.1.4             blob_1.2.4
 [4] filelock_1.0.3          bitops_1.0-9            fastmap_1.2.0
 [7] RCurl_1.98-1.16         BiocFileCache_2.14.0    XML_3.99-0.18
[10] digest_0.6.37           lifecycle_1.0.4         statmod_1.5.0
[13] survival_3.5-8          KEGGREST_1.46.0         RSQLite_2.3.9
[16] magrittr_2.0.3          genefilter_1.88.0       compiler_4.4.0
[19] rlang_1.1.4             progress_1.2.3          tools_4.4.0
[22] utf8_1.2.4              yaml_2.3.10             rtracklayer_1.66.0
[25] prettyunits_1.2.0       S4Arrays_1.6.0          bit_4.5.0.1
[28] curl_4.3.2              DelayedArray_0.32.0     xml2_1.3.6
[31] abind_1.4-8             hwriter_1.3.2.1         grid_4.4.0
[34] fansi_1.0.6             xtable_1.8-4            colorspace_2.1-1
[37] ggplot2_3.5.1           scales_1.3.0            biomaRt_2.62.0
[40] cli_3.6.3               crayon_1.5.3            generics_0.1.3
[43] httr_1.4.7              rjson_0.2.23            DBI_1.2.3
[46] cachem_1.1.0            splines_4.4.0           zlibbioc_1.52.0
[49] parallel_4.4.0          restfulr_0.0.15         vctrs_0.6.5
[52] Matrix_1.7-0            jsonlite_1.8.9          geneplotter_1.84.0
[55] hms_1.1.3               bit64_4.5.2             locfit_1.5-9.10
[58] annotate_1.84.0         glue_1.8.0              codetools_0.2-20
[61] stringi_1.8.4           gtable_0.3.6            BiocIO_1.16.0
[64] UCSC.utils_1.2.0        munsell_0.5.1           tibble_3.2.1
[67] pillar_1.9.0            rappdirs_0.3.3          GenomeInfoDbData_1.2.13
[70] R6_2.5.1                dbplyr_2.5.0            httr2_1.0.7
[73] lattice_0.22-6          png_0.1-8               memoise_2.0.1
[76] Rcpp_1.0.14             SparseArray_1.6.0       pkgconfig_2.0.3
DifferentialSplicing RNASeqData DEXSeq • 35 views
ADD COMMENT

Login before adding your answer.

Traffic: 752 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6