basic R question


Jing Huang ▴ 380

@jing-huang-4737

Last seen 10.6 years ago

Hi Expert!

I am trying to get rid of the features that contain more than half of samples with NA data. Could you help me?

Here is the example.

> head(exprs(eset.vsn))
           GSM48598  GSM48617  GSM48600  GSM48601  GSM48602  GSM48604  GSM48607  GSM48608
1000_at   11.324097 10.881484 11.004342 10.649591 11.196320 11.405016 11.159627 11.144816
1001_at    6.503407  6.940207  7.776485  6.744207  8.393132  7.994422  7.417291  8.383466
1002_f_at  6.682602  6.320622        NA  7.503875  5.969647        NA  5.394164  6.293754
1003_s_at  8.113777  7.298421        NA        NA        NA        NA        NA        NA
1004_at    7.133844  7.052989  6.986067        NA        NA        NA        NA        NA
1005_at    8.600065 13.149781  8.636922  8.862644 11.790418  6.276165 10.805382  6.908298
           GSM48614  GSM48616  GSM48599  GSM48603  GSM48605  GSM48606  GSM48609  GSM48615
1000_at   11.280698 11.008774 11.158076 10.083978 11.024338 10.641091 10.508528 10.836564
1001_at    8.278285  8.476460  7.702632  7.811951  6.955951  8.490921  6.632979  7.751188
1002_f_at  7.140539  5.791176  5.493847  8.379308  8.163210  6.900236  6.384235  6.620342
1003_s_at        NA        NA        NA  8.243218        NA        NA        NA        NA
1004_at    6.712877  7.176983        NA        NA  7.252336        NA        NA        NA
1005_at   12.894008 10.353165  8.762901  8.135442        NA  9.235085        NA 10.925639

Many many thanks

Jing
OHSU

• 1.3k views

updated 13.0 years ago by Tim Triche ★ 4.2k • written 13.0 years ago by Jing Huang ▴ 380


Tim Triche ★ 4.2k

@tim-triche-3561

Last seen 4.6 years ago

United States

R> eset.vsn[ which(rowSums(is.na(exprs(eset.vsn))) < (0.5*dim(eset.vsn)[1])), ]

--
A model is a lie that helps you see the truth.
Howard Skipper <http://cancerres.aacrjournals.org/content/31/9/1173.full.pdf>

written 13.0 years ago by Tim Triche ★ 4.2k
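The one-liner above rests on counting missing values per feature. A minimal sketch of that building block follows; the matrix `m` and its row and column names are made up for illustration, and with a real ExpressionSet the matrix would presumably come from `exprs(eset.vsn)`.

```r
# Toy matrix standing in for exprs(eset.vsn); values and names are illustrative only.
m <- matrix(c(1.0,  2.0,  NA,   4.0,
              NA,   NA,   NA,   8.0,
              9.0, 10.0, 11.0, 12.0),
            nrow = 3, byrow = TRUE,
            dimnames = list(c("probe_a", "probe_b", "probe_c"),
                            c("s1", "s2", "s3", "s4")))

# is.na() returns a logical matrix of the same shape (TRUE where a value is missing).
missing <- is.na(m)

# rowSums() treats TRUE as 1, so this is the number of NA samples per feature.
na_per_feature <- rowSums(missing)
print(na_per_feature)
```

Comparing `na_per_feature` against a cutoff gives a logical vector with one entry per feature, which can then be used as a row index.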


Steve Lianoglou ★ 13k

@steve-lianoglou-2771

Last seen 2.1 years ago

United States

Hi,

How about:

R> ae <- exprs(eset.vsn)
R> good <- ae[rowSums(is.na(ae)) / ncol(ae) < 0.5, ]

HTH,
-steve

written 13.0 years ago by Steve Lianoglou ★ 13k
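To see what the fraction-based filter suggested above does on the six rows from the question, here is a small sketch in base R. It rebuilds only the NA pattern of those rows: the non-missing values are replaced by a placeholder and the sample names are dropped, so `ae` here is a stand-in and not the real `exprs(eset.vsn)`.

```r
# Rebuild only the NA pattern of the six rows shown in the question.
# Non-missing values are replaced by the placeholder 1; sample names are omitted.
probes <- c("1000_at", "1001_at", "1002_f_at", "1003_s_at", "1004_at", "1005_at")
ae <- matrix(1, nrow = 6, ncol = 16, dimnames = list(probes, NULL))
ae["1002_f_at", c(3, 6)] <- NA                # 2 of 16 samples missing
ae["1003_s_at", -c(1, 2, 12)] <- NA           # 13 of 16 samples missing
ae["1004_at", c(4:8, 11, 12, 14:16)] <- NA    # 10 of 16 samples missing
ae["1005_at", c(13, 15)] <- NA                # 2 of 16 samples missing

# Fraction of samples that are NA for each feature.
na_frac <- rowSums(is.na(ae)) / ncol(ae)

# Keep features where at most half of the samples are missing.
# `<=` keeps a feature that is exactly half NA; the `<` used in the thread drops it.
keep <- na_frac <= 0.5
good <- ae[keep, , drop = FALSE]

print(na_frac)
print(rownames(good))
```

On these rows, `1003_s_at` and `1004_at` are dropped and the other four are kept. If the goal is to keep the ExpressionSet rather than a bare matrix, the same logical vector should work as a row index, e.g. `eset.vsn[keep, ]`, assuming `keep` was computed from `exprs(eset.vsn)`.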


Oops, you are right, I meant ncol(eset.vsn) a.k.a. dim(eset.vsn)[2] in the above. Not sure how the [2] became a [1] en route from R to Gmail. derp

written 13.0 years ago by Tim Triche ★ 4.2k
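The difference between `dim(...)[1]` and `dim(...)[2]` in that correction is easy to miss, so here is a small sketch of how the two cutoffs diverge. The matrix `x` and its feature names are hypothetical; the only assumption carried over from the thread is that rows are features and columns are samples.

```r
# Hypothetical matrix: 4 features (rows) x 10 samples (columns).
x <- matrix(1, nrow = 4, ncol = 10,
            dimnames = list(paste0("f", 1:4), NULL))
x["f2", 1:3] <- NA   # 3 of 10 samples missing (30%)
x["f3", 1:7] <- NA   # 7 of 10 samples missing (70%)

na_count <- rowSums(is.na(x))

# Wrong: dim(x)[1] is the number of features, so the cutoff is 0.5 * 4 = 2.
keep_wrong <- na_count < 0.5 * dim(x)[1]

# Right: dim(x)[2], i.e. ncol(x), is the number of samples, so the cutoff is 5.
keep_right <- na_count < 0.5 * ncol(x)

print(rownames(x)[keep_wrong])
print(rownames(x)[keep_right])
```

With the feature count as the base, `f2` is discarded even though only 30% of its samples are missing; with the sample count, only `f3` is removed. On a real dataset with far more features than samples the mistake goes the other way and the filter removes nothing, which makes it hard to spot.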


Many THANKS!! Works great.

Jing

written 13.0 years ago by Jing Huang ▴ 380
