"Error in xy.coords(x, y, setLab = FALSE) : 'x' and 'y' lengths differ" using glmQLFit
2
0
Entering edit mode
@michael-degaud-23371
Last seen 4.3 years ago

Hello,

I'm using R to analyse RNA-seq data. As a training, I work on normalized counts of RNA-seq data from the article Recurrent SPI1 (PU.1) fusions in high-risk pediatric T cell acute lymphoblastic leukemia, Nat Genet 2017, Seki et al.

I'm using the edgeR package. When I use the glmQLFit function I have the error Error in xy.coords(x, y, setLab = FALSE) : 'x' and 'y' lengths differ

Here is my script (I'm a beginner, it is probably not perfect) :

> norm.counts <- read.delim("normalized_counts.tsv",
+                           header = T, sep = "\t", row.names = 1)
> 
> colSums(norm.counts)
   TALL001    TALL002    TALL003    TALL004    TALL006    TALL007    TALL008    TALL009 
  33722027   30093322   34278280   37848244   25315474   23804890   35201963   36456236 
   TALL010    TALL011    TALL013    TALL014    TALL015    TALL016    TALL017    TALL018 
  38172855   35779360   32727294   43894216   34418442   36482047   34938025   37990134 
   TALL019    TALL020    TALL024    TALL025    TALL030    TALL031    TALL032    TALL033 
  38960315   25773150   30495549   32640739   35919181   43539093   42042277   41543688 
   TALL034    TALL035    TALL036    TALL037    TALL038    TALL039    TALL040    TALL042 
  30707698   35396923   43606066   33754135   37889274   36725286   36119062   35467025 
   TALL044    TALL045    TALL046    TALL047    TALL048    TALL049    TALL050    TALL051 
  29878225   41663520   32872072   57717587   41764592   36364493   35096191   43286063 
   TALL052    TALL053    TALL054    TALL055    TALL056    TALL057    TALL061    TALL062 
  36609160   31687412   31257274   28374736   35096045   40710641   45411284   58606528 
   TALL063    TALL064    TALL065    TALL066    TALL067    TALL068    TALL069    TALL071 
  63632920   37907728   31373424   48992507   31730288   52096648   18664742   53593920 
   TALL072    TALL073    TALL074    TALL075    TALL076    TALL077    TALL078    TALL079 
  18677207   17232588   29134564   50632375   14496115   59028227  110368139   44027517 
   TALL080    TALL081 TALL_VC001 TALL_VC003 TALL_VC005 TALL_VC010 TALL_VC011 TALL_VC013 
  49852029   39005896   23015426   57767116   34416753   20725098   33770006   23086144 
TALL_VC014 TALL_VC020 TALL_VC021 TALL_VC022 TALL_VC023 TALL_VC024 TALL_VC025 TALL_VC028 
  22658885   17152179    7989542    7992537   27377480   19922179   27544197   24572406 
TALL_VC050 TALL_VC058 TALL_VC059 TALL_VC062 TALL_VC065 TALL_VC067 TALL_VC068 TALL_VC069 
  38924366   35155377   24276703   23203659   30098537   13527342   40998764   27662921 
TALL_VC070 TALL_VC072 TALL_VC073 TALL_VC075 TALL_VC076 TALL_VC078 TALL_VC080 TALL_VC081 
  36701352   16129274   30069837   25375765   20900437   25040643   26063947   20015569 
TALL_VC082 TALL_VC083 TALL_VC084 TALL_VC086 TALL_VC087 TALL_VC088 TALL_VC090 TALL_VC091 
   4885073    8292829   27656565   34987607   25828277   33419992   16890930   32962403 
TALL_VC092 TALL_VC093 TALL_VC094 TALL_VC095 TALL_VC096 TALL_VC099 TALL_VC100 TALL_VC101 
  19414351   34597332   29012888   25723041   29418583   29752930   28421256   30689661 
TALL_VC102 TALL_VC103 TALL_VC104 TALL_VC105    TALL023    TALL028    TALL027    TALL041 
  35541147   28443857   28729420   25288812   35183906   37119618   37099201   40949315 
   TALL059    TALL060    TALL070 
  39259066   48815589    9645969 
> 
> # I create au norm.counts2 file to work on it
> norm.counts2 <- norm.counts
> 
> # I add a column with genes symbols
> Symbol = row.names(norm.counts2)
> head(Symbol)
[1] "5S_rRNA"  "7SK"      "A1BG"     "A1BG-AS1" "A1CF"     "A2M"     
> norm.counts2 <- cbind(norm.counts2, Symbol)
> norm.counts2 <- norm.counts2[, colnames(norm.counts2)[c(124, 1:123)]]
> 
> # I create the matrix
> group <- factor(colnames(norm.counts2), exclude = "Symbol")
> 
> 
> design <- model.matrix(~0 + group)
> colnames(design) <- levels(group)
> 
> 
> # I create the DGEList object, verify librairies sizes and use plotMDS
> norm.counts2 <- norm.counts2[2:124]
> 
> dgeObj <- DGEList(norm.counts2)
> 
> 
> dgeObj$samples$lib.size
  [1]  33722027  30093322  34278280  37848244  25315474  23804890  35201963  36456236  38172855
 [10]  35779360  32727294  43894216  34418442  36482047  34938025  37990134  38960315  25773150
 [19]  30495549  32640739  35919181  43539093  42042277  41543688  30707698  35396923  43606066
 [28]  33754135  37889274  36725286  36119062  35467025  29878225  41663520  32872072  57717587
 [37]  41764592  36364493  35096191  43286063  36609160  31687412  31257274  28374736  35096045
 [46]  40710641  45411284  58606528  63632920  37907728  31373424  48992507  31730288  52096648
 [55]  18664742  53593920  18677207  17232588  29134564  50632375  14496115  59028227 110368139
 [64]  44027517  49852029  39005896  23015426  57767116  34416753  20725098  33770006  23086144
 [73]  22658885  17152179   7989542   7992537  27377480  19922179  27544197  24572406  38924366
 [82]  35155377  24276703  23203659  30098537  13527342  40998764  27662921  36701352  16129274
 [91]  30069837  25375765  20900437  25040643  26063947  20015569   4885073   8292829  27656565
[100]  34987607  25828277  33419992  16890930  32962403  19414351  34597332  29012888  25723041
[109]  29418583  29752930  28421256  30689661  35541147  28443857  28729420  25288812  35183906
[118]  37119618  37099201  40949315  39259066  48815589   9645969
> 
> barplot(dgeObj$samples$lib.size, names = colnames(dgeObj), las = 2)
> 
> plotMDS(dgeObj)
> 
> # I estimate dispersion
> dgeObj<- estimateCommonDisp(dgeObj)
> dgeObj <- estimateGLMTrendedDisp(dgeObj)
> dgeObj <- estimateTagwiseDisp(dgeObj)
> plotBCV(dgeObj)

> # I use glmQLFit
> fit <- glmQLFit(dgeObj, design, robust = TRUE)
Error in xy.coords(x, y, setLab = FALSE) : 'x' and 'y' lengths differ

I don't know what to do. Thank you for your help. Tell me if you need more information.

glmQLFit edger • 2.1k views
ADD COMMENT
1
Entering edit mode
Yunshun Chen ▴ 900
@yunshun-chen-5451
Last seen 5 days ago
Australia

The way you created your group factor does not seem correct. You are treating each sample as a unique group. Therefore, the design matrix also doesn't make sense.

Please read through the edgeR user's guide. There are many detailed case studies in the user's guide that you can follow.

ADD COMMENT
1
Entering edit mode
@gordon-smyth
Last seen 9 hours ago
WEHI, Melbourne, Australia

Hi Michael,

Your code doesn't entirely make sense and I can't see how it would be possible to get an xy.coords error message from glmQLFit because that error message only comes from plotting functions.

The main issue though is that you're not actually doing a differential expression analysis yet. Before you can begin an analysis you first need to identify the patient groups that you want to compare. In order to do that, you first need to have the clinical data on the patients.

Another issue that might or might not be important is that edgeR is designed to run on raw counts rather than on "normalized counts" (whatever that might mean). So you should explain how you obtained the normalized count matrix.

ADD COMMENT
0
Entering edit mode

Thanks a lot for your answers Gordon and Yunshun,

Indeed, rethinking about it, my code is a little bit stupid. I'll rework on it and read better the EdgeR user guide. I'm taking the opportunity to thank you for this user guide too. It is really complete with lot of case studies. You, Yunshun and your colleagues did a great job.

ADD REPLY

Login before adding your answer.

Traffic: 820 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6