using limma package for paired t-test: Error: (subscript) logical subscript too long


viritha kaza ▴ 580


Hi group,

I am trying to perform a paired t-test on 6 samples. The samples are paired: one is from the normal tissue of a subject and the other is from the tumor tissue of the same subject. I am following the code given in the limma User's Guide (p. 40, section 8.3, Paired Samples).

Code:

    > source("http://bioconductor.org/biocLite.R")
    > biocLite("limma")
    > library(limma)
    > targets<-readTargets("targets.txt")
    > head(targets)
       FileName Pair Treatment
    1 GSM675890    1         N
    2 GSM675891    1         T
    3 GSM675892    2         N
    4 GSM675893    2         T
    5 GSM675894    3         N
    6 GSM675895    3         T
    > eset<-as.matrix(read.table("6samples.txt",sep='\t',header=TRUE,colClasses=c(rep('numeric',7)),nrow=133673))
    > head(eset)
          ID_REF GSM675890 GSM675891 GSM675892 GSM675893 GSM675894 GSM675895
    [1,] 2315129  30.32278  20.42571   7.60854  17.15130  14.57533  22.22889
    [2,] 2315145  12.74657   6.30117  11.43528   4.10696   3.12693  10.96096
    [3,] 2315163 175.96267 125.77725  52.19822 102.07567 116.91966 174.41690
    [4,] 2315198   6.57030   1.85541   3.34829   1.13516   0.34278   1.83917
    [5,] 2315353  88.49511  48.77128  50.60524  62.92448  47.10977  45.06430
    [6,] 2315371   2.01707   1.90644 536.07636   2.21359   0.00212   0.43249
    > Pair<-factor(targets$Pair)
    > Treat<-factor(targets$Treatment,levels=c("N","T"))
    > design<-model.matrix(~Pair+Treat)
    > fit_pair<-lmFit(eset,design)
    Error: (subscript) logical subscript too long

    > sessionInfo()
    R version 2.14.1 (2011-12-22)
    Platform: i386-pc-mingw32/i386 (32-bit)

    locale:
    [1] LC_COLLATE=English_United States.1252
    [2] LC_CTYPE=English_United States.1252
    [3] LC_MONETARY=English_United States.1252
    [4] LC_NUMERIC=C
    [5] LC_TIME=English_United States.1252

    attached base packages:
    [1] stats     graphics  grDevices utils     datasets  methods   base

    other attached packages:
    [1] limma_3.10.1        BiocInstaller_1.2.1

    loaded via a namespace (and not attached):
    [1] tools_2.14.1

Could you please suggest where the issue is?

Thanks,
Viritha


Martin Morgan 25k


On 01/17/2012 01:36 PM, viritha k wrote:

> > eset<-as.matrix(read.table("6samples.txt",sep='\t',header=TRUE,colClasses=c(rep('numeric',7)),nrow=133673))
> > head(eset)
>       ID_REF GSM675890 GSM675891 GSM675892 GSM675893 GSM675894 GSM675895
> [1,] 2315129  30.32278  20.42571   7.60854  17.15130  14.57533  22.22889

Hi --

The 'ID_REF' column is being treated as an expression value rather than as a row name, so eset has seven columns but targets lists only six samples. Try row.names=1 in read.table (though check that this does the trick with your own data).

Martin
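As a minimal sketch of the row.names suggestion above: the snippet below reads the first three rows posted in the question from an inline string (a stand-in for "6samples.txt", which is not available here) and compares the matrix dimensions with and without row.names = 1. The object names txt, eset_bad and eset_ok are made up for illustration, and colClasses and nrow are left out for brevity.

```r
# First three rows of the posted data, tab-separated, standing in for "6samples.txt".
txt <- paste(
  "ID_REF\tGSM675890\tGSM675891\tGSM675892\tGSM675893\tGSM675894\tGSM675895",
  "2315129\t30.32278\t20.42571\t7.60854\t17.15130\t14.57533\t22.22889",
  "2315145\t12.74657\t6.30117\t11.43528\t4.10696\t3.12693\t10.96096",
  "2315163\t175.96267\t125.77725\t52.19822\t102.07567\t116.91966\t174.41690",
  sep = "\n"
)

# As in the question: ID_REF ends up as a seventh numeric column.
eset_bad <- as.matrix(read.table(text = txt, sep = "\t", header = TRUE))

# With row.names = 1 the first column becomes the row names instead.
eset_ok <- as.matrix(read.table(text = txt, sep = "\t", header = TRUE,
                                row.names = 1))

dim(eset_bad)      # 3 rows, 7 columns
dim(eset_ok)       # 3 rows, 6 columns, one per sample in targets
rownames(eset_ok)  # probe IDs, now kept out of the expression values
```

With the real file, the same comparison of ncol(eset) against nrow(targets) is a quick check to run before calling lmFit().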


James W. MacDonald 68k


Hi Viritha,

On 1/17/2012 4:36 PM, viritha k wrote:

> > eset<-as.matrix(read.table("6samples.txt",sep='\t',header=TRUE,colClasses=c(rep('numeric',7)),nrow=133673))
> > head(eset)

At the very least you should add row.names = 1 to your call to read.table(). You want the ID to be the row.names of your matrix, not the first column.

Since the dimensions of your matrix don't match the number of rows of your design matrix, I would expect a different error:

    Error in lm.fit(design, t(M)) : incompatible dimensions

So there might be something else wrong. You don't show the final design matrix, so there is no telling.

Best,

Jim
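Since the design matrix was never shown, here is a small sketch of how it could be printed and checked against the expression matrix before fitting. The targets data frame is typed in from the head(targets) output in the question rather than read with readTargets(), and check_dims() is a hypothetical helper written for this example, not a limma function.

```r
# Targets table as posted in the question (typed in, not read from targets.txt).
targets <- data.frame(
  FileName  = c("GSM675890", "GSM675891", "GSM675892",
                "GSM675893", "GSM675894", "GSM675895"),
  Pair      = c(1, 1, 2, 2, 3, 3),
  Treatment = c("N", "T", "N", "T", "N", "T")
)

Pair   <- factor(targets$Pair)
Treat  <- factor(targets$Treatment, levels = c("N", "T"))
design <- model.matrix(~ Pair + Treat)

# Columns: (Intercept), Pair2, Pair3, TreatT; one row per sample.
print(design)

# Hypothetical helper: stop early if samples (columns of eset) and
# rows of the design matrix disagree.
check_dims <- function(eset, design) {
  if (ncol(eset) != nrow(design)) {
    stop(sprintf("eset has %d columns but design has %d rows",
                 ncol(eset), nrow(design)))
  }
  invisible(TRUE)
}

# A 7-column matrix (ID_REF plus six samples) fails the check;
# a 6-column matrix passes.
res_bad <- try(check_dims(matrix(0, nrow = 2, ncol = 7), design), silent = TRUE)
res_ok  <- check_dims(matrix(0, nrow = 2, ncol = 6), design)
```

Printing the design and running a check like this before lmFit() makes a sample-count mismatch visible regardless of which error message the fitting code would eventually raise.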


Hi James,

I used row.names = 1 as suggested by you and Martin. I did not get any error, so I added these steps:

    > fit_pair <- eBayes(fit_pair)
    > topTable(fit_pair, coef="TreatT")

and got this list:

                ID      logFC    AveExpr         t      P.Value adj.P.Val         B
    114659 3807490 1938.54365 4407.17510  93.62245 5.066916e-06 0.6773048 -4.555623
     93178 3536336   89.25101   76.95120  57.13317 2.044799e-05 0.9313141 -4.555652
     15914 2523632   85.55042   75.01551  48.02453 3.339131e-05 0.9313141 -4.555671
     82198 3401197  114.17548  185.84179  46.26832 3.709470e-05 0.9313141 -4.555677
     82576 3405396 -112.96339  214.93046 -43.24963 4.487733e-05 0.9313141 -4.555687
     44334 2900091  227.49472  197.41073  39.63887 5.739658e-05 0.9313141 -4.555702
      1373 2330451   76.19923  155.62751  38.99148 6.012674e-05 0.9313141 -4.555706
    124852 3923312   62.34274   96.81574  38.91146 6.047631e-05 0.9313141 -4.555706
      6531 2398894  279.18572  372.08333  38.28233 6.332302e-05 0.9313141 -4.555709
     34618 2772414 -162.60332  150.62089 -36.12449 7.458547e-05 0.9313141 -4.555722

Here I have considered only 6 samples and tumor vs. normal, because I would like to run the whole dataset on a 64-bit machine later (due to memory issues) if this code works.

My actual intention is to design the paired t-test for multiple subgroups, for 80 patients with tumor samples and their respective normal samples (80). The number of subjects is given in brackets.

Subgroups:

Group (160): Tumor (80), Normal (80)
Gender (80): Female (27), Male (53)
Stage (80): I (4), II (7), III (54), IV (15)
Age (77): >=55 (53), <55 (24); unknown (3)

How do I include these conditions? Is it just by listing them in the targets file? And how do I have to change the rest of the code to get this design? Is it possible to do this in one go, or should each condition be analyzed individually?

Waiting for your suggestions.

Thanks,
Viritha

Reply by viritha kaza, 13.3 years ago
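For readers wondering what coef="TreatT" refers to in the paired design ~Pair+Treat, the following base R sketch fits the same model to a single probe (2315129, values copied from the head(eset) output in the question) with lm(). In this balanced paired layout the TreatT coefficient is the mean of the within-pair tumor minus normal differences, and the ordinary t-statistic matches a classical paired t-test. This is only an illustration for one probe: limma's eBayes() reports a moderated t-statistic, which will generally differ from the ordinary one computed here.

```r
# Probe 2315129, in sample order N, T, N, T, N, T (pairs 1, 2, 3).
y     <- c(30.32278, 20.42571, 7.60854, 17.15130, 14.57533, 22.22889)
Pair  <- factor(c(1, 1, 2, 2, 3, 3))
Treat <- factor(c("N", "T", "N", "T", "N", "T"), levels = c("N", "T"))

# Same model formula as the design matrix in the thread, for one probe.
fit        <- lm(y ~ Pair + Treat)
coef_treat <- unname(coef(fit)["TreatT"])
t_lm       <- summary(fit)$coefficients["TreatT", "t value"]

# Classical paired comparison: tumor minus normal within each pair.
d  <- y[Treat == "T"] - y[Treat == "N"]
pt <- t.test(y[Treat == "T"], y[Treat == "N"], paired = TRUE)

coef_treat               # about 2.433, the mean of d
mean(d)
c(t_lm, pt$statistic)    # the two ordinary t-statistics agree
```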


Hi Viritha,

On 1/18/2012 2:47 PM, viritha k wrote:

> How to include these conditions, is it by just mentioning in the
> targets file? and how do I have to change the rest to get this design?
> Is it possible to perform this in one go or should it be performed as
> different conditions individually?

Hypothetically you could set this up with a correctly designed targets file, but I generally forgo the targets file in favor of constructing the design matrix directly.

That said, your question has diverged, IMO, from a technical question (how do I get the software to work) into a statistical one (how do I analyze these data). I am more than happy to help with technical issues, but I am not so keen to help with statistical questions. The reasons for this are many, but they include the fact that there is much more to a given analysis than setting up a design matrix (and without the data in hand, I cannot say what other issues may exist), as well as the fact that I get paid to do analyses and it isn't in my best interest to give my work away for free.

I would suggest a close reading of the limma User's Guide, as well as any number of linear modeling textbooks (or perhaps a consultation with a local statistician).

Best,

Jim

Reply by James W. MacDonald, 13.3 years ago
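To illustrate what constructing the design matrix directly, without a targets file, might look like for the 80-patient paired study mentioned in the thread, here is a rough sketch. It assumes the columns of the expression matrix are ordered normal, tumor, normal, tumor, ... patient by patient, which is an assumption for this example and would need to be checked against the real data. It covers only the paired tumor vs. normal comparison and deliberately does not attempt the subgroup questions (gender, stage, age).

```r
n_patients <- 80

# One factor level per patient, two samples (N then T) per patient.
# The N/T ordering within each patient is an assumption for this sketch.
Pair  <- factor(rep(seq_len(n_patients), each = 2))
Treat <- factor(rep(c("N", "T"), times = n_patients), levels = c("N", "T"))

design <- model.matrix(~ Pair + Treat)

dim(design)                 # 160 samples by 81 coefficients
tail(colnames(design), 1)   # "TreatT", the tumor vs. normal effect
```

The 81 columns are the intercept, 79 patient effects and the single TreatT column that topTable(fit, coef="TreatT") would refer to.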
