Hello,
I am having some technical difficulties with my Affymetrix gene
expression analysis. I recently received an account on a Linux cluster
so that I can analyze large data sets. I installed the extra packages I
need in a local library, compiled them as 64-bit, and tested that the
libraries load in an R session. R itself is compiled as a 64-bit
application.
I also have an account on another cluster, but I have little disk space
left there. In addition, I have my personal MacBook and my Windows
computer at work. I would now like to do my analysis on the new
cluster, and to start I want to put my AffyBatch object on the new
system. I tried transferring it in several different ways.
Transferring uncompressed objects from Linux to Linux with command-line
scp gave the best results, but I still have a problem when the
AffyBatch object loads, which I describe below.
I am still a bit confused about 32-bit vs. 64-bit systems. Do saved
objects carry information about the operating system with them?
Another side note: I had to load each library separately, including
its dependencies, in the proper order. For example,

library(puma, lib.loc = 'path/to_my_local/R_libraries')

fails unless I first run

library(ROCR, lib.loc = 'same_path')
library(gtools, lib.loc = 'same_path')

and so on.
When I load the data in R on the head node (64-bit login), all the
packages load and the AffyBatch displays correctly. To do my real work,
however, I need to submit a script to the queue, which I send to the
64-bit processors. The script copies the R object to the temporary
directory where I am supposed to do my work and then uses R CMD BATCH
to load the AffyBatch object. The object loads, but without the CDF
information attached, so it does not display properly. When I display
the AffyBatch object, I get this:
AffyBatch object
size of arrays=1164x1164 features
cdf=HG-U133_Plus_2 (??? affyids)
number of samples=55
Error in getCdfInfo(object) :
Could not obtain CDF environment, problems encountered:
Specified environment does not contain HG-U133_Plus_2
Library - package hgu133plus2cdf not installed
Data for package affy did not contain hgu133plus2cdf
Bioconductor - could not connect
Calls: <anonymous> ... <anonymous> -> cat -> featureNames ->
featureNames -> getCdfInfo
In addition: Warning message:
missing cdf environment! in show(AffyBatch)
Execution halted
Any ideas or clarification about what is going on would be helpful.
The computer support staff don't know much about Bioconductor or R, so
I would appreciate any advice, or even questions I could ask them.
Thank you in advance.
Hi Donna,
I don't think this issue is related to whether you are using a 64-bit
or a 32-bit system. When you install R packages locally, which is a
good idea, you need to tell R explicitly that from now on it should
look for packages there, too.
You can do this when attaching a package with the lib.loc argument, as
you apparently do. There are, however, better solutions.
One is to add the paths to your packages at the start of your R
session with the function .libPaths (the dot at the beginning is part
of the function name; see ?.libPaths). On Linux, an even better option
is to define an environment variable called R_LIBS in your shell
startup script, which is called ".cshrc" or ".bashrc" depending on
which shell you use, with the command

setenv R_LIBS "path/to_your_local/R_libraries"    # for csh and tcsh

or, for bash,

export R_LIBS="path/to_your_local/R_libraries"    # please double-check; I only use tcsh

You can also type that command at the shell prompt before starting R.
From then on, R will also look in that directory for installed
packages, which you can check in your R session by typing

.libPaths()
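A minimal sketch of the bash variant for ~/.bashrc, assuming the packages live in $HOME/R_libraries (a hypothetical path; R_LOCAL_LIB is likewise just an illustrative name). Note that bash allows no space around the "=" in an assignment. Unlike a plain assignment, this version keeps any directories R_LIBS already lists, since on Linux R reads R_LIBS as a colon-separated list.

```shell
# Sketch for ~/.bashrc (bash/sh syntax). R_LOCAL_LIB is a hypothetical
# variable name; point it at the directory holding your local packages.
R_LOCAL_LIB="$HOME/R_libraries"

# Put the local library first in R_LIBS, keeping any existing entries
# (R_LIBS is a colon-separated list of directories on Linux).
if [ -n "${R_LIBS:-}" ]; then
  R_LIBS="$R_LOCAL_LIB:$R_LIBS"
else
  R_LIBS="$R_LOCAL_LIB"
fi

# Export it so that R, started from this shell, sees the variable.
export R_LIBS
```

After opening a new shell, .libPaths() in R should list the directory, provided it actually exists (R silently drops non-existent entries). Sourcing the file twice just repeats the entry, which is harmless.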
This is also why you cannot display the AffyBatch: the CDF environment
is needed to display it, and since the package "hgu133plus2cdf" is not
installed in R's standard package directory but probably elsewhere, R
cannot find it. Using the R_LIBS variable (or the .libPaths function)
should solve that issue.
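A job submitted to the queue does not necessarily start with the same environment as your interactive login shell, so it may help to set R_LIBS in the job script itself and check that the CDF package is visible before calling R CMD BATCH. The helper below, find_r_package, is hypothetical (it is not part of R). It assumes R_LIBS is colon-separated and relies on each installed package directory containing a DESCRIPTION file.

```shell
# find_r_package PKG: print the first directory listed in R_LIBS that
# contains an installed copy of PKG (detected via PKG/DESCRIPTION).
# Returns non-zero if PKG is not found in any R_LIBS entry.
find_r_package() {
  pkg=$1
  old_ifs=$IFS
  IFS=:                          # split R_LIBS on colons
  for dir in ${R_LIBS:-}; do
    if [ -f "$dir/$pkg/DESCRIPTION" ]; then
      IFS=$old_ifs
      printf '%s\n' "$dir"
      return 0
    fi
  done
  IFS=$old_ifs
  return 1
}

# Example use near the top of a job script (hypothetical paths):
#   export R_LIBS="$HOME/R_libraries"
#   find_r_package hgu133plus2cdf || exit 1
#   R CMD BATCH analysis.R analysis.Rout
```

With this check in place, a job whose environment lacks the local library would stop at the check instead of reaching the getCdfInfo error shown in the output above.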
Best regards,
Joern
Donna Toleno wrote:
> [...]