I have read counts for 34 samples, including biological replicates. I need to make a dendrogram from the read counts to show the correlation between biological replicates.
Please share the Bioconductor package name and code.
Do I need to normalize the counts before constructing the dendrogram?
Thanks
R shouldn't segfault; can you add the output of sessionInfo() to your post? Here's mine
> sessionInfo()
R version 3.4.0 Patched (2017-05-24 r72729)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 16.04.2 LTS
Matrix products: default
BLAS: /home/mtmorgan/bin/R-3-4-branch/lib/libRblas.so
LAPACK: /home/mtmorgan/bin/R-3-4-branch/lib/libRlapack.so
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
loaded via a namespace (and not attached):
[1] compiler_3.4.0
It seems that you have made a very large distance matrix, with
> 48538 * (48538 - 1) / 2
[1] 1177944453
elements. Likely this is causing an overflow in R's internal code. But it doesn't really make sense to cluster across all genes, many of which will have extremely small influence on the result. Instead, filter based on appropriate criteria, e.g., select the most variable genes (matrixStats::rowVars()) or use something more sophisticated.
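A minimal sketch of that filtering step, under stated assumptions: the simulated `counts` matrix, the sample names, and the cutoff of 500 genes are all made up for illustration and are not from the original post. Note also that dist() computes distances between rows, so the matrix is transposed here to get one distance per pair of samples rather than per pair of genes.

```r
# Simulated counts stand in for the real table (an assumption for illustration):
# 2000 genes x 6 samples, with hypothetical sample names.
set.seed(1)
counts <- matrix(rnbinom(2000 * 6, mu = 50, size = 2), nrow = 2000,
                 dimnames = list(paste0("gene", 1:2000),
                                 paste0("S", rep(1:3, each = 2), "_rep", 1:2)))

# Drop genes with zero counts in every sample, then log-transform.
counts <- counts[rowSums(counts) > 0, , drop = FALSE]
x <- log2(counts + 1)

# Keep the most variable genes; 500 is an arbitrary cutoff.
# apply(x, 1, var) is base R; matrixStats::rowVars(x) computes the same values faster.
n_top <- min(500, nrow(x))
v <- apply(x, 1, var)
top <- x[order(v, decreasing = TRUE)[seq_len(n_top)], , drop = FALSE]

# dist() works on rows, so transpose to get sample-by-sample distances.
d <- dist(t(top), method = "euclidean")
h <- hclust(d, method = "complete")
plot(h)
```

With 6 samples the distance object holds only 6 * 5 / 2 = 15 values, so hclust() has very little work to do. This sketch does not correct for sequencing depth; if library sizes differ a lot, a depth-aware transformation such as edgeR::cpm(log = TRUE) or DESeq2::vst() may be a better input than log2(counts + 1).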
I am trying the following code to make a dendrogram based on raw read counts, but I am getting an error:
nonzero_row <- mydatanew[rowSums(mydatanew) > 0, ] # remove rows with zero read counts across all columns
> dim(nonzero_row)
[1] 48538 33
> str(nonzero_row)
'data.frame': 48538 obs. of 33 variables:
$ 216_5W_Ca1: int 100 0 0 8 285 0 253 0 0 339 ...
$ 216_5W_Ca2: int 71 0 0 48 258 0 204 0 0 484
x1 <- as.matrix(nonzero_row) # convert the data frame to a matrix
> x <- log2(x1 + 1) # log-transform the read counts
> head(x)
> d <- dist(x, method="euclidean")
> h <- hclust(d, method="complete")
Error:
*** caught segfault ***
address 0x7f8ca9becf28, cause 'memory not mapped'
Traceback:
1: hclust(d, method = "complete")
Possible actions:
1: abort (with core dump, if enabled)
2: normal R exit
3: exit R without saving workspace
4: exit R saving workspace
Selection:
Thanks
Can anyone suggest why I am getting the error in the code above?