Hello,
This is my first question on Bioconductor support and I am completely new to R and bioinformatics, so I apologize if the answer to my question seems obvious.
I am trying to compare RNA-seq data of two different conditions: LG (5 replicates) vs HG (3 replicates) by using RUVSeq to remove unwanted variation between batches.
The data look like this: the columns are the samples (HG1, HG2, HG3, HG5, HG4, LG1, LG2, LG5) and the rows are the genes (NM000014, NM000015... up to 18000 genes).
The code I wrote is the following:
count_tab <- read.table("Human_islets_counts_Refseq_HG_vs_LG.csv",header = TRUE,row.names = 1,sep = ',')
filter <- apply(count_tab, 1, function(x) length(x[x>5])>=2)
filtered <- count_tab[filter,]
genes <- rownames(filtered)[grep("^NM", rownames(filtered))]
x <- as.factor(rep(c("HG", "LG"), each=5,3))
set <- newSeqExpressionSet(as.matrix(filtered),phenoData = data.frame(x, row.names=colnames(filtered)))
Error in data.frame(x, row.names = colnames(filtered)) :
row names supplied are of the wrong length
Could anyone give me a hint on why this is wrong?
Thanks in advance,
Cecilia
> sessionInfo()
R version 3.6.0 alpha (2019-04-08 r76348)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)
Matrix products: default
locale:
[1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252 LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C LC_TIME=English_United States.1252
attached base packages:
[1] stats4 parallel stats graphics grDevices utils datasets methods base
other attached packages:
[1] RUVSeq_1.17.1 edgeR_3.25.4 limma_3.39.15 EDASeq_2.17.4
[5] ShortRead_1.41.0 GenomicAlignments_1.19.1 SummarizedExperiment_1.13.0 DelayedArray_0.9.9
[9] matrixStats_0.54.0 Rsamtools_1.99.6 GenomicRanges_1.35.1 GenomeInfoDb_1.19.3
[13] lattice_0.20-38 locfit_1.5-9.1 zebrafishRNASeq_1.3.0 Biostrings_2.51.5
[17] XVector_0.23.2 IRanges_2.17.5 S4Vectors_0.21.23 BiocParallel_1.17.18
[21] Biobase_2.43.1 BiocGenerics_0.29.2
loaded via a namespace (and not attached):
Error in x[["Version"]] : subscript out of bounds
In addition: Warning messages:
1: In FUN(X[[i]], ...) :
DESCRIPTION file of package 'RCurl' is missing or broken
2: In FUN(X[[i]], ...) :
DESCRIPTION file of package 'bitops' is missing or broken
I am having a similar problem with my data. Is RUVSeq able to handle unequal numbers of replicates across the sample groups?
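The error in the original post is raised by data.frame() itself, before RUVSeq does anything, so it may help to look at the length of the factor on its own. Below is a minimal sketch in base R, assuming the column order listed in the question (five HG samples followed by three LG samples). The sample names are copied from that list rather than read from the CSV file, and the object names x_bad, x_from_names and pheno are only illustrative.

```r
# Sample names as listed in the question (assumed order: 5 HG, then 3 LG).
samples <- c("HG1", "HG2", "HG3", "HG5", "HG4", "LG1", "LG2", "LG5")

# The call from the question: the unnamed 3 is matched to `times`, so each
# label is repeated 5 times and the resulting vector is repeated 3 times.
x_bad <- as.factor(rep(c("HG", "LG"), each = 5, 3))
length(x_bad)    # longer than the 8 sample names, hence the error

# For unequal group sizes, pass one count per label to `times`.
x <- factor(rep(c("HG", "LG"), times = c(5, 3)))

# Alternative: derive the group from the column names by dropping the
# trailing replicate number, which avoids hard-coding the counts.
x_from_names <- factor(sub("[0-9]+$", "", samples))

# With a factor of matching length, the phenoData data frame can be built.
pheno <- data.frame(x, row.names = samples)
pheno
```

If the real columns are in a different order, or the groups are 5 LG and 3 HG as the text of the question says, the counts in `times` need to be adjusted accordingly. Deriving the factor from colnames(filtered) sidesteps that.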