You are right about the vignette, and the help page for readBismark is about as unhelpful. I also appreciate you re-posting rather than relying on a comment added to a years-old post, although it would have helped to re-post here what you posted there.
Anyway, in the other post you tried:
rrbs <- readBismark("RRBS_C1_S19_bismark_bt2.bismark.cov", "RRBS_C2_S20_bismark_bt2.bismark.cov", colData = DataFrame(row.names = c("control_1", "control_2"), group = "control"))
And you got the predictable error about an unused argument. This is because readBismark takes only two arguments, and you have passed three! The first argument is 'files' and the second is 'colData'. You have used a combination of positional and named arguments (a positional argument is matched by its position in the function call, while a named argument is matched by its name). R matches the named arguments first and then fills the remaining formal arguments, in order, with the unnamed ones. So you have essentially done this:
rrbs <- readBismark(files = "RRBS_C1_S19_bismark_bt2.bismark.cov", colData = DataFrame(row.names = c("control_1", "control_2"), group = "control"), <some unnamed argument that R has no idea about> = "RRBS_C2_S20_bismark_bt2.bismark.cov")
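To see the matching rule in isolation, here is a minimal base-R sketch. mock_read is a hypothetical stand-in that only mimics the two formals described above (files, colData); it is not readBismark itself, but R's argument matching rejects the extra value in the same way before the function body runs.

```r
# Hypothetical stand-in with the same two formals the post describes for
# readBismark(files, colData); it simply returns what it was given.
mock_read <- function(files, colData) {
  list(files = files, colData = colData)
}

# colData is matched by name first; the first unnamed value then fills
# 'files', and the second unnamed value has no formal left to fill.
res <- tryCatch(
  mock_read("C1.cov", "C2.cov", colData = "control"),
  error = function(e) conditionMessage(e)
)
print(res)  # an "unused argument" error message

# Passing one character vector for 'files' matches both formals cleanly.
ok <- mock_read(c("C1.cov", "C2.cov"), colData = "control")
print(ok$files)
```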
What is not clear in the help page, and is especially unclear in the vignette, is whether one can read in more than one file, and if so, how one might do that. You could check the function body to see how the 'files' argument is processed, and that would make things clear, but that's not something that should be expected of an end user; hence the existence of help files and the like.
Anyway, the 'files' argument is supposed to be a character vector of the file names you wish to read in. So the correct thing to do is either:
rrbs <- readBismark(files = c("RRBS_C1_S19_bismark_bt2.bismark.cov", "RRBS_C2_S20_bismark_bt2.bismark.cov"), colData = DataFrame(row.names = c("control_1", "control_2"), group = "control"))
Or you can use positional arguments:
rrbs <- readBismark(c("RRBS_C1_S19_bismark_bt2.bismark.cov", "RRBS_C2_S20_bismark_bt2.bismark.cov"), DataFrame(row.names = c("control_1", "control_2"), group = "control"))
Or you could mix'n'match, so long as you use a character vector for the first argument.
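Whichever form you use, each row of colData describes one sample and each sample needs its own file, so a cheap pre-flight check is to compare the two before calling readBismark. The sketch below is hypothetical (check_inputs is not part of any package) and uses only base R; the placeholder files are created in a temporary directory just so the example runs.

```r
# Hypothetical helper: confirm 'files' is a character vector with one entry
# per sample name, and that every file exists, before reading anything.
check_inputs <- function(files, sample_names) {
  stopifnot(is.character(files))
  if (length(files) != length(sample_names)) {
    stop(sprintf("%d file(s) but %d sample name(s)",
                 length(files), length(sample_names)))
  }
  missing <- files[!file.exists(files)]
  if (length(missing) > 0) {
    stop("file(s) not found: ", paste(missing, collapse = ", "))
  }
  invisible(TRUE)
}

# Demonstration with two empty placeholder files in a temporary directory.
demo_dir <- tempfile("cov_demo")
dir.create(demo_dir)
files <- file.path(demo_dir, c("RRBS_C1_S19_bismark_bt2.bismark.cov",
                               "RRBS_C2_S20_bismark_bt2.bismark.cov"))
invisible(file.create(files))

ok <- check_inputs(files, c("control_1", "control_2"))

# One file but two sample names: the mismatch is reported up front.
bad <- tryCatch(check_inputs(files[1], c("control_1", "control_2")),
                error = function(e) conditionMessage(e))
print(bad)
```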
Thank you for the reply! I think I am getting closer. Now, when I enter
I get an error message that says
Do you have any idea why this might be?
Yes. You are specifying a DataFrame with two rows, but only one row's worth of data: each row describes one sample, and each sample needs its own file.
Thanks for getting back to me. I tried this:
and I get this output:
I tried it a second time without the 'files =' and 'colData =' names and got exactly the same error message. I think the software is trying to read the colData instead of the files, but I am not sure. How might I fix this?
So when you get a message that says
Why do you think it's doing something wrong, rather than that there might not be any records in the file? Did you check "RRBS_C1_S19_bismark_bt2.bismark.cov" to make sure that file actually has anything in it?
Put a different way, consider
Here, if I read an empty file, scan tells me it's empty.
Or, more to the point, since readBismark uses scan with a list specifying the "what" argument:
That is exactly what you get, which indicates that you have some empty files.
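A minimal base-R sketch of the same idea: reading an empty file with a list-valued what gives zero-length columns rather than an error, while a file with content gives one element per record. It assumes a six-column, tab-separated layout like Bismark's coverage output (chromosome, start, end, methylation percentage, methylated count, unmethylated count); the cov_what list is illustrative and is not taken from readBismark's source.

```r
# Illustrative column layout (an assumption mirroring Bismark .cov files):
# chromosome, start, end, methylation %, count methylated, count unmethylated.
cov_what <- list(chr = "", start = 0L, end = 0L,
                 pct = 0, n_meth = 0L, n_unmeth = 0L)

# An empty file: scan() reports reading 0 records and returns a list of
# zero-length columns instead of failing.
empty <- tempfile(fileext = ".cov")
invisible(file.create(empty))
res_empty <- scan(empty, what = cov_what, sep = "\t")

# A two-record file for comparison.
small <- tempfile(fileext = ".cov")
writeLines(c("chr1\t10\t10\t50\t1\t1",
             "chr1\t12\t12\t100\t2\t0"), small)
res_small <- scan(small, what = cov_what, sep = "\t")

length(res_empty$chr)  # 0
length(res_small$chr)  # 2
```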
I have checked the files and they are not empty: each file has approx. 677642 elements. Also, if I am reading the "RRBS_C1_S19_bismark_bt2.bismark.cov" and "RRBS_C2_S20_bismark_bt2.bismark.cov" files, shouldn't those be the file names that appear after "Processing sample", so that the output reads
Processing sample "RRBS_C1_S19_bismark_bt2.bismark.cov"
instead of
Processing sample control_1
?
No. You called those samples control_1 and control_2 in your colData object, so you have already said they are called 'control_1' and 'control_2', not whatever the file names are. Anyway, all readBismark does is call scan on the file, so you can try that yourself to see what happens:
I got the readBismark command to work just now. The files had data, but for some reason it wasn't working. I deleted them and unzipped the .gz files, and now it's working. Thank you so much for your help!
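Following the suggestion to call scan on the files directly, a small base-R helper can report how many records each file yields before anything is handed to readBismark. count_records is hypothetical; it reads every field as character (only the count matters) and assumes six tab-separated columns per line.

```r
# Hypothetical helper: count the tab-separated records scan() finds in each
# file, so an empty or unreadable input shows up before readBismark is called.
count_records <- function(files, n_cols = 6) {
  vapply(files, function(f) {
    # Read every field as character; only the number of records matters.
    rec <- scan(f, what = as.list(rep("", n_cols)), sep = "\t", quiet = TRUE)
    length(rec[[1]])
  }, integer(1))
}

# Demonstration: one file with a single record and one deliberately empty file.
check_dir <- tempfile("cov_check")
dir.create(check_dir)
f1 <- file.path(check_dir, "sample_1.cov")
f2 <- file.path(check_dir, "sample_2.cov")
writeLines("chr1\t10\t10\t50\t1\t1", f1)
invisible(file.create(f2))

counts <- count_records(c(f1, f2))
print(counts)
```

A zero in the result points at a file worth inspecting before blaming the reader function.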