Hi, I'm using R 2.5.0 on openSUSE 10.2 x86_64.

I'm struggling with what to use for the chromosome and maploc arguments of
the CNA function. I've got data from 3 Agilent 44k CGH arrays. I've created
the marrayNorm and Raw objects using read.Agilent. The CNA function usage is
as follows:

CNA(genomdat, chrom, maploc, data.type=c("logratio","binary"),
    sampleid=NULL)

I've read the CNA help page but am still struggling.
For genomdat I've worked out this is mnorm@maM, the average log ratios, so
"logratio" will be my data.type.

The column headers in my raw data file include systematic name in the
following format:

chr3:175483690-175483749

This seems to have been read into my work session using read.Agilent, but
how do I use this? It also isn't ordered; is this important? I've looked at
the coriell data example, but this is all nicely ordered and the headers are
different to my data file.

If anyone could point me in the right direction that would be great.

John
jhs1jjm at leeds.ac.uk wrote:
> The column headers in my raw data file include systematic name in the
> following format:
>
> chr3:175483690-175483749
>
> This seems to have been read into my work session using read.Agilent, but
> how do I use this? It also isn't ordered; is this important? I've looked at
> the coriell data example, but this is all nicely ordered and the headers are
> different to my data file.
>
> If anyone could point me in the right direction that would be great.
This is the information that you will need to use, yes. It contains the
chromosome and location information. You will need to manipulate this
column to get the chromosome and locations into separate columns. You
can do this in R or in Excel.
Sean
jhs1jjm at leeds.ac.uk wrote:
> Could you possibly tell me what functions/package I need to look at in R in
> order to do this, as I do not have Excel and may well need to handle data
> that exceeds the maximum number of rows in OpenOffice?
extractAgilentInfo <- function(charvec) {
  # Split the chromosome from the locations
  tmp <- do.call(rbind, strsplit(charvec, ':'))
  # Split the start and end locations
  tmp2 <- do.call(rbind, strsplit(tmp[,2], '-'))
  # Convert to a numeric chromosome if wanted
  tmp3 <- sub('chr', '', tmp[,1])
  tmp3[tmp3 == 'X'] <- 23  # May need to change these numbers to
  tmp3[tmp3 == 'Y'] <- 24  # match your species
  # Anything that is not a chromosome becomes NA here (with a warning)
  tmp3 <- as.integer(tmp3)
  tmp[is.na(tmp3), 1] <- NA
  return(data.frame(chromosome = tmp[,1],
                    location = as.integer(tmp2[,1]),
                    NumChrom = tmp3))
}
Use it like so:
agilentInfo <- extractAgilentInfo(as.character(rawdat$SystematicName))
And you will get back a data.frame of what you need, I think.
Sean
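With chromosome and position columns available, the remaining step toward CNA is lining them up with the log ratios. The following is a minimal sketch under stated assumptions, not code from the thread: the toy `agilentInfo` and `logratios` values are invented stand-ins for the output of extractAgilentInfo() and for mnorm@maM, and the CNA call uses only the arguments quoted in the original question. Sorting explicitly by chromosome and position means the result does not depend on whether CNA reorders its input.

```r
# Toy stand-ins (assumed values): in the thread these would come from
# extractAgilentInfo() and from the normalised log ratios (mnorm@maM).
agilentInfo <- data.frame(
  chromosome = c("chr3", NA, "chr1", "chrX", "chr1"),
  location   = c(175483690L, NA, 5000L, 1200L, 100L),
  NumChrom   = c(3L, NA, 1L, 23L, 1L)
)
logratios <- matrix(
  c( 0.10, NA, -0.20, 0.05,  0.30,   # array1
     0.00, NA,  0.15, 0.40, -0.10,   # array2
    -0.30, NA,  0.20, 0.10,  0.00),  # array3
  ncol = 3,
  dimnames = list(NULL, c("array1", "array2", "array3"))
)

# Drop probes without a genomic position (for example control probes)
keep <- !is.na(agilentInfo$NumChrom) & !is.na(agilentInfo$location)
info <- agilentInfo[keep, ]
lr   <- logratios[keep, , drop = FALSE]

# Sort by chromosome, then by position within each chromosome
ord  <- order(info$NumChrom, info$location)
info <- info[ord, ]
lr   <- lr[ord, , drop = FALSE]

# Build the CNA object only if the DNAcopy package is installed
if (requireNamespace("DNAcopy", quietly = TRUE)) {
  cna <- DNAcopy::CNA(lr, info$NumChrom, info$location,
                      data.type = "logratio",
                      sampleid = colnames(lr))
}
```

The same `keep` and `ord` indices are applied to both the annotation and the log-ratio matrix, so each row of `lr` still refers to the probe described in the matching row of `info`.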
jhs1jjm at leeds.ac.uk wrote:
> Sean,
>
> Thanks for that. Couldn't get it to work, but not to worry, as I wouldn't
> want to take credit for writing a function like that and my tutor wouldn't
> expect it. Someone has written some perl code to do it for him, but I want
> to get to grips with R. I've tried to decipher what you've done and daresay
> I can get there, although by a slightly long-winded method. I can bring up
> the systematic names with the following:
>
> x <- manorm@maGnames@maInfo[,3]
>
> I've had a look at the strsplit help:
>
> ch_loc_split <- strsplit(x,":")
>
> I'll have a look at the rest of the code and functions you've used, then get
> back to you. If there are any potential pitfalls for a newbie then by all
> means let me know.
>
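One pitfall with the strsplit step quoted above is that strsplit() returns a list with one character vector per input element, not a matrix, so the pieces still have to be pulled out. A minimal sketch, assuming two made-up systematic names in the format shown earlier in the thread:

```r
# Toy systematic names (assumed values) in the chrN:start-end format
x <- c("chr3:175483690-175483749", "chrX:1000-1059")

# One list element per name, each holding c(chromosome, "start-end")
ch_loc_split <- strsplit(x, ":")

# Pull the first and second piece out of every list element
chrom <- sapply(ch_loc_split, `[`, 1)
range <- sapply(ch_loc_split, `[`, 2)

# Split the range again and keep the start position as an integer
start <- as.integer(sapply(strsplit(range, "-"), `[`, 1))
```

This does by hand what the do.call(rbind, ...) calls in extractAgilentInfo do in one step.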
John,
What did you try and what didn't work? Error messages and actual
commands will help here.
Sean
jhs1jjm at leeds.ac.uk wrote:
> Sean,
>
> Awesome, seems to have worked. There were 2 warnings, NAs introduced by
> coercion. Just changed the end (after messing around with importing the raw
> data) to the marray object as follows:
>
> agilentInfo <- extractAgilentInfo(as.character(mnorm@maGnames@maInfo[,3]))
>
> Guessing that would have taken me a while to work out. Is there any reason
> why this wouldn't work for the 244k array, just for future reference? Will
> try to start the DNAcopy analysis now.
>
John,
Great to hear that it worked for you. The NAs introduced are expected
and are associated with the control probes on the array. It should work
just fine for all Agilent arrays as long as the systematic name is in
the same format. Agilent is actually pretty good about keeping things
stable over different arrays and over time.
Sean
P.S. In the future, feel free to reply back to the list. Doing so
allows everyone to learn from the interaction and has the added benefit
of creating a lasting record of any answers in the archive.
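
On the NAs attributed to control probes above: one way to make them explicit, rather than relying on coercion warnings, is to test each name against the expected pattern first. The following is a sketch only and swaps in a regular-expression approach in place of the strsplit-based function from the thread; `parseSystematicName` is a hypothetical helper, and the control-probe name in the example is invented.

```r
# Hypothetical helper (not part of any package): parse names of the form
# chrN:start-end and return NA for anything else, such as control probes.
parseSystematicName <- function(x) {
  pattern <- "^chr([0-9]+|X|Y):([0-9]+)-([0-9]+)$"
  ok <- grepl(pattern, x)

  # Extract the captured groups only where the pattern matched
  chrom <- ifelse(ok, sub(pattern, "\\1", x), NA_character_)
  start <- ifelse(ok, sub(pattern, "\\2", x), NA_character_)

  # Map X and Y to numbers; these may need changing to match your species
  numChrom <- chrom
  numChrom[chrom %in% "X"] <- "23"
  numChrom[chrom %in% "Y"] <- "24"

  data.frame(
    chromosome = ifelse(ok, paste0("chr", chrom), NA_character_),
    location   = as.integer(start),
    NumChrom   = as.integer(numChrom),
    stringsAsFactors = FALSE
  )
}

# Toy input: two probe names plus one made-up control-probe name
res <- parseSystematicName(
  c("chr3:175483690-175483749", "chrX:1000-1059", "SomeControlProbe")
)
```

Rows with NA in `NumChrom` can then be counted with sum(is.na(res$NumChrom)) and compared against the number of control probes expected on the array.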