Question

My DEXSeqDataSetFromSE is returning an Error, but all of my checks to seem to indicate that this error is not present

0

Entering edit mode

cftattersfield • 0

@277cabd6

Last seen 8 weeks ago

United States

I am using DEXSeq to perform a DEU analysis on some bulk RNA-Seq data but I am getting some strange behavior. My data comes from a transgenic animal model, so I've introduced a custom gtf file for the transgene. I am able to follow the vignette for DEXSeq well, but I went back today to try and update my summarized experiment object with rowRanges because I realized this didn't get done the first time.

Here is the code I am trying to run:

library(GenomicFeatures)  # for the exonicParts() function
library(txdbmaker)        # for the makeTxDbFromGFF() function
library(GenomicRanges)

## Flatten GTF
gtf <- makeTxDbFromGFF(file = "/custom.gtf")

flattened_gtf <- exonicParts(gtf, linked.to.single.gene.only = TRUE)

# Subset the relevant ranges
subset_ranges <- subsetByOverlaps(flattened_gtf, ranges = GRanges(seqnames = "chrMouse_BAC", ranges = IRanges(start = 36912, end = 39292)))

# 1. Adjust E23 and E24
# Extend the end of E23 by 50 base pairs
end(subset_ranges[2]) <- end(subset_ranges[2]) + 50

# Shorten the start of E24 by 50 base pairs
start(subset_ranges[3]) <- start(subset_ranges[3]) + 50

# 2. Merge E25 and E26
# Merge the ranges for E25 (row 4) and E26 (row 5)
merged_range <- reduce(subset_ranges[4:5]) # Combine the two ranges into one

# Update metadata for the merged range (using metadata from E25 as a base)
merged_metadata <- subset_ranges[4] # Use metadata from E25
start(merged_metadata) <- start(merged_range) # Update start coordinate
end(merged_metadata) <- end(merged_range)     # Update end coordinate

# Remove the original E25 and E26
subset_ranges <- subset_ranges[-c(4, 5)]

# Add the new merged range back to the subset
subset_ranges <- append(subset_ranges, merged_metadata)

# 3. Replace the modified subset into the original GRanges object
# Remove the original ranges from the full object
flattened_gtf <- flattened_gtf[!flattened_gtf %over% GRanges(seqnames = "chrMouse_BAC", ranges = IRanges(start = 36912, end = 39292))]

# Add the modified ranges back into the GRanges object
flattened_gtf <- c(flattened_gtf, subset_ranges)

names(flattened_gtf) =
  sprintf("%s:E%0.3d", flattened_gtf$gene_id, flattened_gtf$exonic_part)

This all works great, and I then move on to setting up my DEXSeq object

## Load in BAM files
library(GenomicAlignments)
library(DEXSeq)
library(stringr)
library(SummarizedExperiment)
library(BiocParallel)

# load metadata
meta <- read.table(file = "/meta.txt", sep = "\t", header = TRUE)
meta$Pedigree <- paste0("S", meta$Pedigree)

# read exon count data
all_files <- dir("/final_exon_overlaps", pattern = "*.txt$", full.names = TRUE)

counts_list <- lapply(X = all_files, read.table, header = TRUE)
names(counts_list) <- paste0("S", str_extract(string = basename(all_files), pattern = "[[:digit:]]+"))

# merge exon count data
counts_matrix <- do.call(cbind, counts_list)
colnames(counts_matrix) <- gsub(pattern = "X", replacement = "S", str_extract(string = colnames(counts_matrix), pattern = ".*(?=_Aligned)"))

# arrange metadata to match order of counts_matrix
new_order <- match(colnames(counts_matrix), meta$Pedigree)
meta <- meta[new_order,]

if( all(match(colnames(counts_matrix), meta$Pedigree) == seq_len(length(meta$Pedigree))) ){ print("Correct Order") }

# transform into a summarized experiment
counts_matrix <- as.matrix(counts_matrix)

row_data <- DataFrame(flattened_gtf)[-1]
col_data <- DataFrame(meta)
rownames(col_data) <- col_data$Pedigree

se <- SummarizedExperiment(assays = list(counts = counts_matrix), rowData = row_data, colData = col_data)
colData(se)$Condition <- factor(c("baseline", "ifn"))
#rowRanges(se) <- flattened_gtf

# perform DEXSeq analysis
# load a BCPARAM object to allow multi-threading
param <- MulticoreParam(4)

dxd <- DEXSeqDataSetFromSE(se, design = ~ Pedigree + exon + Condition:exon)

If I run DEXSeqDataSetFromSE after adjusting the rowRanges, then I get the following error:

Error in SummarizedExperiment(nCountData, colData = colData, rowRanges = featureRanges) : the rownames of the supplied assay(s) must be NULL or identical to those of the RangedSummarizedExperiment object (or derivative) to construct

but if I don't adjust the rowRanges then I get no error, and the analysis seems to work. I can't quite work out what is causing this behavior, because everything I can think to check seems to be okay; the lengths are equal, the rownames are equal, the order is the same. Is anyone able to make any recommendations?

sessionInfo( )

R version 4.4.0 (2024-04-24)
Platform: x86_64-pc-linux-gnu
Running under: CentOS Linux 7 (Core)

Matrix products: default
BLAS/LAPACK: /n/app/openblas/0.3.19-gcc-9.2.0/lib/libopenblas_haswellp-r0.3.19.so;  LAPACK version 3.9.0

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

time zone: America/New_York
tzcode source: system (glibc)

attached base packages:
[1] stats4    stats     graphics  grDevices utils     datasets  methods
[8] base

other attached packages:
 [1] stringr_1.5.1               DEXSeq_1.52.0
 [3] RColorBrewer_1.1-3          DESeq2_1.46.0
 [5] BiocParallel_1.40.0         GenomicAlignments_1.42.0
 [7] Rsamtools_2.22.0            Biostrings_2.74.1
 [9] XVector_0.46.0              SummarizedExperiment_1.36.0
[11] MatrixGenerics_1.18.1       matrixStats_1.5.0
[13] txdbmaker_1.2.1             GenomicFeatures_1.58.0
[15] AnnotationDbi_1.68.0        Biobase_2.66.0
[17] GenomicRanges_1.58.0        GenomeInfoDb_1.42.1
[19] IRanges_2.40.1              S4Vectors_0.44.0
[21] BiocGenerics_0.52.0

loaded via a namespace (and not attached):
 [1] tidyselect_1.2.1        dplyr_1.1.4             blob_1.2.4
 [4] filelock_1.0.3          bitops_1.0-9            fastmap_1.2.0
 [7] RCurl_1.98-1.16         BiocFileCache_2.14.0    XML_3.99-0.18
[10] digest_0.6.37           lifecycle_1.0.4         statmod_1.5.0
[13] survival_3.5-8          KEGGREST_1.46.0         RSQLite_2.3.9
[16] magrittr_2.0.3          genefilter_1.88.0       compiler_4.4.0
[19] rlang_1.1.4             progress_1.2.3          tools_4.4.0
[22] utf8_1.2.4              yaml_2.3.10             rtracklayer_1.66.0
[25] prettyunits_1.2.0       S4Arrays_1.6.0          bit_4.5.0.1
[28] curl_4.3.2              DelayedArray_0.32.0     xml2_1.3.6
[31] abind_1.4-8             hwriter_1.3.2.1         grid_4.4.0
[34] fansi_1.0.6             xtable_1.8-4            colorspace_2.1-1
[37] ggplot2_3.5.1           scales_1.3.0            biomaRt_2.62.0
[40] cli_3.6.3               crayon_1.5.3            generics_0.1.3
[43] httr_1.4.7              rjson_0.2.23            DBI_1.2.3
[46] cachem_1.1.0            splines_4.4.0           zlibbioc_1.52.0
[49] parallel_4.4.0          restfulr_0.0.15         vctrs_0.6.5
[52] Matrix_1.7-0            jsonlite_1.8.9          geneplotter_1.84.0
[55] hms_1.1.3               bit64_4.5.2             locfit_1.5-9.10
[58] annotate_1.84.0         glue_1.8.0              codetools_0.2-20
[61] stringi_1.8.4           gtable_0.3.6            BiocIO_1.16.0
[64] UCSC.utils_1.2.0        munsell_0.5.1           tibble_3.2.1
[67] pillar_1.9.0            rappdirs_0.3.3          GenomeInfoDbData_1.2.13
[70] R6_2.5.1                dbplyr_2.5.0            httr2_1.0.7
[73] lattice_0.22-6          png_0.1-8               memoise_2.0.1
[76] Rcpp_1.0.14             SparseArray_1.6.0       pkgconfig_2.0.3

DifferentialSplicing RNASeqData DEXSeq • 146 views

ADD COMMENT • link 12 weeks ago cftattersfield • 0