Question

How to filter all columns using a loop

kritikamish99 ▴ 10

I have 100 columns: column 1 holds the names and columns 2 to 100 hold the sample values. I want to filter each of columns 2 to 100 against a threshold, say >= 5, using a loop. Each iteration should write a new file named after the corresponding sample column.

For example, filtering values above 5:

Col1 Col2 Col3 Col4 Col5 Col6 Col7
A 5 1 1 2 4 1    
B 6 2 2 5 3 6
C 7 3 3 8 9 3
D 8 4 6 9 1 3

Output (file name will be Col2)

Col1 Col2
B 6
C 7
D 8

This has to be repeated for all the columns.

ADD COMMENT • link written 7.5 years ago by kritikamish99 ▴ 10

Do you want all positions with values < 5 to be replaced by NA?

ADD REPLY • link 7.5 years ago hauken_heyken ▴ 80

score 0 · Answer 1 · 2017-11-14

hauken_heyken ▴ 80

I don't know if I understand your question correctly, but this will work if you only want to keep the columns that contain at least one value >= 5:

library(data.table)

a = c(1,2,3,4,5)
b = c(4,5,6,7,8)
c = c(6,7,8,9,10)

e = c(1,2,3,4,4) # <--- This column will be filtered out, because none of its values is >= 5

d = as.data.table(cbind(a,b,c,e))

indexesToRemove = lapply(1:ncol(d),function(x) ifelse(sum(d[,as.integer(x), with = F] >= 5 ) > 0,as.integer(x), NA ))

indexesToRemove = indexesToRemove[!is.na(indexesToRemove)]

d = d[,unlist(indexesToRemove), with = F] #<--- Columns with a value >= 5 are now here

Output:

   a b  c
1: 1 4  6
2: 2 5  7
3: 3 6  8
4: 4 7  9
5: 5 8 10

ADD COMMENT • link 7.5 years ago hauken_heyken ▴ 80

Hi hauken_heyken

My query is this: suppose I have a table with columns a, b, c and d:

a=c("A","B","C","D","E","F","G")

b=c(1,2,3,4,5,8,10)

c=c(1,2,3,4,6,7,10)

d=c(1,2,10,5,12,15,10)

e = as.data.table(cbind(a,b,c,d))

What I want is 3 files named b, c and d, after the column names.

File "b" will have E,F,G in the 1st column and the filtered values 5,8,10 in the 2nd column.

File "c" will have E,F,G in the 1st column and the filtered values 6,7,10 in the 2nd column.

File "d" will have C,D,E,F,G in the 1st column and the filtered values 10,5,12,15,10 in the 2nd column.

ADD REPLY • link 7.5 years ago kritikamish99 ▴ 10

Ah, okay, now it makes sense. Then this will work:

New version is:

library(data.table)

b=c(1,2,3,4,5,8,10)

c=c(1,2,3,4,6,7,10)

e=c(1,2,10,5,12,15,10)

d = as.data.table(cbind(b,c,e))

indexesToRemove = lapply(1:ncol(d),function(x) ifelse(d[,as.integer(x), with = F] >= 5,as.integer(x), NA ))

#<----Changed the lapply now, to not sum

#now save the cbind with a:

a=c("A","B","C","D","E","F","G")

# ---> remember to set setwd:  setwd("...Location of files to be saved..")

for(i in ncol(d)){

  out = cbind(d[!is.na(unlist(indexesToRemove[i])),i, with = F],a =  a[!is.na(unlist(indexesToRemove[i]))])

  write.csv(x = out,file = paste0(names(out[,1]), ".csv"), row.names = F) #<--- Remove row.names

   #<--- Choose something else if csv is not the format you want

}

First file created will be b.csv, and looks like this:

b	a
5	E
8	F
10	G

ADD REPLY • link 7.5 years ago hauken_heyken ▴ 80

Hi Hauken_Heyken

Thank you. The code is working!

But it is not giving me file b. The only result is a file with the values from column "e".

Also, how do I iterate over all the columns (files b and c)?

ADD REPLY • link 7.5 years ago kritikamish99 ▴ 10
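
The behaviour reported here is consistent with the loop header for(i in ncol(d)): ncol(d) is the single number 3, so the body runs once, for the last column only. Below is a minimal sketch of a loop that visits every sample column, using the same example vectors. The plain data.frame, seq_along(), the names out_dir and written, and tempdir() as the output folder are illustrative assumptions, not the code posted above.

```r
# Same example data as in the thread
a <- c("A", "B", "C", "D", "E", "F", "G")
b <- c(1, 2, 3, 4, 5, 8, 10)
c <- c(1, 2, 3, 4, 6, 7, 10)
e <- c(1, 2, 10, 5, 12, 15, 10)
d <- data.frame(b, c, e)

out_dir <- tempdir()  # assumption: replace with the folder you want to write to

written <- character(0)
for (i in seq_along(d)) {            # visits 1, 2, 3 instead of only ncol(d)
  keep <- d[[i]] >= 5                # rows that pass the threshold in column i
  out <- data.frame(a = a[keep], value = d[[i]][keep])
  names(out)[2] <- names(d)[i]       # second column keeps the sample name
  path <- file.path(out_dir, paste0(names(d)[i], ".csv"))
  write.csv(out, path, row.names = FALSE)
  written <- c(written, path)        # remember which files were created
}
```

With this header the loop writes b.csv, c.csv and e.csv, one per sample column, each holding the names in the first column and the values >= 5 in the second.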

score 0 · Answer 2 · 2017-11-15

Create a data.frame directly. Using cbind() on a mix of character and numeric vectors produces a character matrix, so the numeric values end up represented as character strings, which is not desired. There is no value in using data.table in the current example.

e = data.frame(a,b,c,d)

A data.frame is a list of vectors, so iterate over the columns that you're interested in, i.e., all but the first

result <- lapply(e[-1], function(value) value[value >= 5])

Create files with

for (fname in names(result))
    write.csv(data.frame(result[[fname]]), fname)

but that doesn't seem like a useful thing to do.

A "tidy" approach is to gather the original data.frame and then filter on the column of values, no iteration involved.

library(tidyverse)
gather(e, "filename", "value", -1) %>% filter(value >= 5)
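
To make the "no iteration" idea concrete, here is a minimal sketch of the same reshape-then-filter step on the data from the question. It uses base R's stack() as a stand-in for gather() and plain subsetting as a stand-in for filter(), so that it runs without extra packages; the names long, kept and per_file are assumptions made for illustration.

```r
# Data from the question
e <- data.frame(
  a = c("A", "B", "C", "D", "E", "F", "G"),
  b = c(1, 2, 3, 4, 5, 8, 10),
  c = c(1, 2, 3, 4, 6, 7, 10),
  d = c(1, 2, 10, 5, 12, 15, 10)
)

# Long format: one row per (name, column) pair
long <- data.frame(a = rep(e$a, times = ncol(e) - 1), stack(e[-1]))
names(long) <- c("a", "value", "filename")

# A single filter covers every sample column at once
kept <- long[long$value >= 5, ]

# Optional: split back into one table per original column
per_file <- split(kept[c("a", "value")], kept$filename)
```

Each element of per_file (named b, c and d) could then be passed to write.csv() if separate files are really needed.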

This isn't a Bioconductor question so should be asked elsewhere, on StackOverflow or the R-help mailing list for instance (checking first that similar questions have not already been asked).