Question

Dendrogram from read counts to show correlation between biological replicates and samples

Jitendra ▴ 10

@nabiyogesh-11718

Last seen 10 months ago

United Kingdom

I have read counts for 34 samples, including biological replicates. I need to make a dendrogram from the read counts to show the correlation between biological replicates. Please share the Bioconductor package name and code. Do I need to normalize the counts before constructing the dendrogram? Thanks

bioconductor • 1.0k views

ADD COMMENT • link updated 7.9 years ago by James W. MacDonald 68k • written 7.9 years ago by Jitendra ▴ 10

score 0 · Answer 1 · 2017-06-09

James W. MacDonald 68k

@james-w-macdonald-5106

Last seen 2 hours ago

United States

You can find an example of an RNA-Seq analysis here.

ADD COMMENT • link 7.9 years ago James W. MacDonald 68k

I am trying the following code to make a dendrogram based on raw read counts, but I am getting an error:

nonzero_row <- mydatanew[rowSums(mydatanew) > 0, ] # remove rows with zero read counts across all columns

> dim(nonzero_row)

[1] 48538 33

> str(nonzero_row)

'data.frame': 48538 obs. of 33 variables:

$ 216_5W_Ca1: int 100 0 0 8 285 0 253 0 0 339 ...
$ 216_5W_Ca2: int 71 0 0 48 258 0 204 0 0 484

> x1 <- as.matrix(nonzero_row) # convert the data frame to a matrix
> x <- log2(x1 + 1) # transform read counts to the log2 scale
> head(x)

> d <- dist(x, method="euclidean")

> h <- hclust(d, method="complete")

Error:

*** caught segfault ***
address 0x7f8ca9becf28, cause 'memory not mapped'

Traceback:
1: hclust(d, method = "complete")

Possible actions:
1: abort (with core dump, if enabled)
2: normal R exit
3: exit R without saving workspace
4: exit R saving workspace
Selection:

Thanks

ADD REPLY • link 7.9 years ago Jitendra ▴ 10

Can anyone suggest why I am getting the error in the above code?

ADD REPLY • link 7.9 years ago Jitendra ▴ 10

R shouldn't segfault; can you add the output of sessionInfo() to your post? Here's mine

> sessionInfo()
R version 3.4.0 Patched (2017-05-24 r72729)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 16.04.2 LTS

Matrix products: default
BLAS: /home/mtmorgan/bin/R-3-4-branch/lib/libRblas.so
LAPACK: /home/mtmorgan/bin/R-3-4-branch/lib/libRlapack.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

loaded via a namespace (and not attached):
[1] compiler_3.4.0

It seems that you have made a very large distance matrix, with

> 48538 * (48538 - 1) / 2
[1] 1177944453

elements. Likely this is causing an overflow in R's internal code. But it doesn't really make sense to cluster across all genes, many of which will have extremely small influence on the result. Instead, filter based on appropriate criteria, e.g., selecting the most variable genes (matrixStats::rowVars()) or something more sophisticated.
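
A minimal sketch of that filtering idea, using simulated counts in place of the real data. The object names (`counts`, `n_top`, `x_top`) and the cutoff of 500 genes are assumptions for illustration, not values from this thread. It also assumes the goal is to cluster samples rather than genes: `dist()` computes distances between rows, so the matrix is transposed first, which keeps the distance object tiny. Library-size normalization is left out of this sketch.

```r
# Simulated stand-in for the real table: 2000 genes x 6 samples (hypothetical names).
set.seed(1)
counts <- matrix(rnbinom(2000 * 6, mu = 50, size = 1), nrow = 2000,
                 dimnames = list(paste0("gene", 1:2000), paste0("sample", 1:6)))

# dist() stores n * (n - 1) / 2 doubles, so the size depends heavily on what is clustered.
n_genes   <- 48538
n_samples <- 33
gene_pairs    <- n_genes * (n_genes - 1) / 2        # pairs when clustering genes
sample_pairs  <- n_samples * (n_samples - 1) / 2    # pairs when clustering samples
gene_dist_gib <- gene_pairs * 8 / 2^30              # memory for the gene-wise distances, in GiB

# Drop all-zero rows and move to the log2 scale, as in the code posted above.
counts <- counts[rowSums(counts) > 0, , drop = FALSE]
x <- log2(counts + 1)

# Keep the n_top most variable genes (n_top = 500 is an arbitrary choice).
# apply(x, 1, var) is used to avoid a package dependency;
# matrixStats::rowVars(x) is a faster way to get the same values.
n_top    <- 500
gene_var <- apply(x, 1, var)
top      <- order(gene_var, decreasing = TRUE)[seq_len(min(n_top, nrow(x)))]
x_top    <- x[top, , drop = FALSE]

# Transpose so that rows are samples, then cluster the samples.
d <- dist(t(x_top), method = "euclidean")
h <- hclust(d, method = "complete")
plot(h, main = "Samples clustered on the most variable genes")
```

With 33 samples the distance object holds only `sample_pairs` values, so `hclust()` has very little to do; replicates that cluster together should show up as neighbouring leaves in the plot.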

ADD REPLY • link 7.9 years ago Martin Morgan 25k