How to filter all columns using a loop
@kritikamish99-9648
Last seen 6.3 years ago
India

I have 100 columns: column 1 holds the names and columns 2 to 100 hold sample values. I want to filter each of columns 2 to 100 by a threshold, say >= 5, using a loop. Each iteration should write a new file named after the corresponding sample.

For example, filtering values above 5:

Col1 Col2 Col3 Col4 Col5 Col6 Col7
A 5 1 1 2 4 1    
B 6 2 2 5 3 6
C 7 3 3 8 9 3
D 8 4 6 9 1 3

Output (file name will be Col2)

Col1 Col2
B 6
C 7
D 8

This has to be repeated for every column.


Do you want all positions with values < 5 to be replaced by NA?

@hauken_heyken-13992
Last seen 2.1 years ago
Bergen

I don't know if I understand your question correctly, but this will work if you only want to keep columns that have at least one value >= 5:

library(data.table)

a = c(1,2,3,4,5)
b = c(4,5,6,7,8)
c = c(6,7,8,9,10)

e = c(1,2,3,4,4) # <--- This column will be filtered out, because none of its values is >= 5

d = as.data.table(cbind(a,b,c,e))

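# For each column: its index if any value is >= 5, otherwise NA
indexesToKeep = lapply(1:ncol(d), function(x) ifelse(sum(d[, as.integer(x), with = FALSE] >= 5) > 0, as.integer(x), NA))

# Drop the NA entries, leaving only the indexes of the columns to keep
indexesToKeep = indexesToKeep[!is.na(indexesToKeep)]

d = d[, unlist(indexesToKeep), with = FALSE] #<--- Columns with a value >= 5 are now here

Output: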

   a b  c
1: 1 4  6
2: 2 5  7
3: 3 6  8
4: 4 7  9
5: 5 8 10
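
The same column filter can also be sketched in base R without data.table. This is only an illustration, assuming the goal is to keep every column that has at least one value >= 5; the object names `keep` and `d_filtered` are chosen here for the example.

```r
# Same toy data as above, built as a plain data.frame
d <- data.frame(a = c(1, 2, 3, 4, 5),
                b = c(4, 5, 6, 7, 8),
                c = c(6, 7, 8, 9, 10),
                e = c(1, 2, 3, 4, 4))

# TRUE for each column that has at least one value >= 5
keep <- colSums(d >= 5) > 0

# Keep only those columns; drop = FALSE keeps a data.frame even if one column remains
d_filtered <- d[, keep, drop = FALSE]
print(names(d_filtered))
```

Here `d >= 5` gives a logical matrix, so `colSums()` counts the passing values per column.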

Hi hauken_heyken,

My query is this: suppose I have a table with columns a, b, c, d:

a=c("A","B","C","D","E","F","G")

b=c(1,2,3,4,5,8,10)

c=c(1,2,3,4,6,7,10)

d=c(1,2,10,5,12,15,10)

e = as.data.table(cbind(a,b,c,d))

 

What I want is 3 files named b, c and d, after the column names.

File "b" will have E, F, G in the 1st column and the filtered values 5, 8, 10 in the 2nd column.

File "c" will have E, F, G in the 1st column and the filtered values 6, 7, 10 in the 2nd column.

File "d" will have C, D, E, F, G in the 1st column and the filtered values 10, 5, 12, 15, 10 in the 2nd column.


Ah, okay, now it makes sense. Then this will work:

New version is:

library(data.table)

b=c(1,2,3,4,5,8,10)

c=c(1,2,3,4,6,7,10)

e=c(1,2,10,5,12,15,10)

d = as.data.table(cbind(b,c,e))

# For each column, mark the rows to keep: the column index where the value is >= 5, NA elsewhere
rowsToKeep = lapply(1:ncol(d), function(x) ifelse(d[, as.integer(x), with = FALSE] >= 5, as.integer(x), NA))

#<----Changed the lapply now, to not sum

# Now save each filtered column together with the matching names from a:

a = c("A","B","C","D","E","F","G")

# ---> remember to set the working directory first: setwd("...Location of files to be saved..")

for (i in seq_len(ncol(d))) { # seq_len() visits every column; for (i in ncol(d)) would only visit the last one

  keep = !is.na(unlist(rowsToKeep[i]))

  out = cbind(d[keep, i, with = FALSE], a = a[keep])

  write.csv(x = out, file = paste0(names(out)[1], ".csv"), row.names = FALSE) #<--- Remove row.names

  #<--- Choose something else if csv is not the format you want

}

First file created will be b.csv, and looks like this:

b a
5 E
8 F
10 G

Hi Hauken_Heyken,

Thank you. The code is working!

But it is not giving me file b. The only result is the table with the "e" values.

Also, how do I iterate over all the columns (files b and c)?
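
One likely explanation, offered as a sketch rather than a confirmed diagnosis: a loop written as `for (i in ncol(d))` runs only once, with `i` equal to the number of columns, so only the last column (`e`) gets written. Looping over `seq_len(ncol(d))` or over the column names visits every column. The example below uses base R only; the output folder (`tempdir()`) and the `name` column label are assumptions made for illustration.

```r
a <- c("A", "B", "C", "D", "E", "F", "G")
d <- data.frame(b = c(1, 2, 3, 4, 5, 8, 10),
                c = c(1, 2, 3, 4, 6, 7, 10),
                e = c(1, 2, 10, 5, 12, 15, 10))

out_dir <- tempdir()  # assumption: replace with the folder you want to write to

for (col in names(d)) {            # visits "b", "c" and "e" in turn
  keep <- d[[col]] >= 5            # rows that pass the threshold in this column
  out <- data.frame(name = a[keep], value = d[[col]][keep])
  names(out)[2] <- col             # second column carries the sample name
  write.csv(out, file = file.path(out_dir, paste0(col, ".csv")), row.names = FALSE)
}

# Read one file back to check its contents
b_file <- read.csv(file.path(out_dir, "b.csv"))
print(b_file)
```

Iterating over `names(d)` rather than over indexes also gives the file name directly, with no need to look it up from the output table.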

@martin-morgan-1513
Last seen 4 months ago
United States

Create a data.frame directly. Using cbind() on a mix of character and numeric vectors produces a character matrix, so the numeric values end up represented as character strings, which is not desired. There is no value in using data.table in the current example.

e = data.frame(a,b,c,d)

A data.frame is a list of vectors, so iterate over the columns that you're interested in, i.e., all but the first

result <- lapply(e[-1], function(value) value[value >= 5])

Create files with

for (fname in names(result))
    write.csv(data.frame(result[[fname]]), fname)

but that doesn't seem like a useful thing to do.

A "tidy" approach is to gather the original data.frame and then filter on the column of values, no iteration involved.

library(tidyverse)
gather(e, "filename", "value", -1) %>% filter(value >= 5)
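
For readers who prefer to avoid the tidyverse dependency, a roughly equivalent long-format table can be sketched in base R with `stack()`. This is an alternative shown for illustration, not the approach used in this answer; the column name `name` and the object names `long` and `filtered` are chosen here.

```r
e <- data.frame(a = c("A", "B", "C", "D", "E", "F", "G"),
                b = c(1, 2, 3, 4, 5, 8, 10),
                c = c(1, 2, 3, 4, 6, 7, 10),
                d = c(1, 2, 10, 5, 12, 15, 10))

# stack() turns the numeric columns into two columns: values and ind (the source column)
stacked <- stack(e[-1])

# Re-attach the names from the first column, repeated once per stacked column
long <- data.frame(name = rep(e$a, times = ncol(e) - 1),
                   filename = as.character(stacked$ind),
                   value = stacked$values)

# One filter over the whole table, no loop
filtered <- long[long$value >= 5, ]
print(filtered)
```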

This isn't a Bioconductor question so should be asked elsewhere, on StackOverflow or the R-help mailing list for instance (checking first that similar questions have not already been asked).


Hi Martin Morgan,

I agree it is not a Bioconductor question. Actually, I have gene expression data with 103 samples of FC (fold change) values and 20000 probes.

What I wanted was to filter every sample at a cutoff of 1.5 FC, so I asked here.
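
Applied to that setting, the per-column filter might look like the sketch below. The probe IDs, sample names and fold-change values are invented for illustration, and the sketch assumes that only values at or above the cutoff are wanted.

```r
# Hypothetical fold-change table: 4 probes x 3 samples (values invented for illustration)
fc <- data.frame(probe = c("p1", "p2", "p3", "p4"),
                 s1 = c(0.8, 1.6, 2.1, 1.0),
                 s2 = c(1.5, 0.4, 1.9, 1.2),
                 s3 = c(1.1, 1.3, 1.4, 3.0))

cutoff <- 1.5

# For each sample column, the probes whose fold change is at or above the cutoff
passing <- lapply(fc[-1], function(value) fc$probe[value >= cutoff])

# Number of probes passing per sample, a quick check before writing any files
n_passing <- lengths(passing)
print(n_passing)
```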

 

 
