Entering edit mode
Vittoria Roncalli
▴
40
@vittoria-roncalli-5633
Last seen 10.5 years ago
Hi,
Thanks a lot for your suggestions, they were helpful.
In order to better understand how the filter really works, I
considered
only a subset of my dataset, consisting in 30 tags.
I have 9 column in total, consisting of 3 different samples(C-L-T)
made by
3 replicates each(1-3).
So, I used the filter command :
> keep <- rowSums (cpm(d)>5) >=3
> d <- d[keep,]
Then I looked at the output of `cpm(d) > 5, and at the list of the
filtered libraries.
Where I get confused is how the 5cpm are calculated. If I am selecting
the
counts that in the single row (gene) are at least grater then 5cpm in
the 3
replicates of 1 sample, how is possible that the filter keep a gene
that
have 0 counts in all the 3 replicates of the sample?
Are the 5cpm calculated for each gene as an average of all the 9
samples in
my case?
How the 5cpm value is calculated? Is it dependent on the library size
of
each sample?
How can I get the minimum value considered for the filter?
gene 1 C1 C2 C3 L1 L2 L3 H1 H2 H3
0 0 0 1 0 0 0
1
2
output cpm>5 F F F T F F F T T
I apologize for the silly questions, but I am really new in this
field, and
I am trying to understand the tool before using it.
Thanks in advance
Vittoria
On Wed, Nov 28, 2012 at 5:54 PM, Steve Lianoglou <
mailinglist.honeypot@gmail.com> wrote:
> Hi,
>
> On Wed, Nov 28, 2012 at 10:14 PM, Vittoria Roncalli
<roncalli@hawaii.edu>
> wrote:
> > Hi,
> >
> > I would like to understand how the filter of low expression tags
works.
> If
> > I run the command
> >
> >>keep <- rowSums (cpm(d)>100) >=2
> > d <- d[keep,]
> > dim(d)
> >
> > as in the use guide page 32, this means that I am using a cutoff
of
> 100cpm,
> > but how are treated the 2 samples? Did are they averaged and then
the low
> > tags are removed?
> > Is each sample considered separate and filtered by itself?
> > Thanks foe the help in advance
>
> How many samples (columns) do you have?
>
> You should first look at the output of `cpm(d) > 100` to see what
you
> are getting -- this will be a logical (boolean) matrix that has the
> same dimensionality as `cpm(d)`.
>
> rowSums( a logical matrix )
>
> returns a vector that is as long as there are rows in the logical
> matrix, and each value indicates how many columns are TRUE in that
> row.
>
> HTH,
> -steve
>
> --
> Steve Lianoglou
> Graduate Student: Computational Systems Biology
> | Memorial Sloan-Kettering Cancer Center
> | Weill Medical College of Cornell University
> Contact Info: http://cbio.mskcc.org/~lianos/contact
>
--
Vittoria Roncalli
Graduate Research Assistant
Center Békésy Laboratory of Neurobiology
Pacific Biosciences Research Center
University of Hawaii at Manoa
1993 East-West Road
Honolulu, HI 96822 USA
Tel: 808-4695693
[[alternative HTML version deleted]]