Entering edit mode
lpascual
▴
50
@lpascual-4906
Last seen 10.3 years ago
Hi all,
I am new with RNA-seq data analysis. I have data from illumina DGE
experiments from 8 different cultivars (duplicates). I also have 4
extra
samples, the F1 hybrids (duplicates), obtained from crossing the 8
cultivars as follows (1x2, 3x4, 5x6, 7x8). In total I have 24 DGE
illumina reactions, where all the samples came from the same
developmental stage and treatment. So the only different factor I have
is the genotype.
I am trying to analyze the data with the edgeR package, however I am
not
sure if I can use the GML function for multiple testings or how should
I
made my design matrix.
To find the differences among the 8 parental lines, I know I can made
two ways testing between the different samples, but I was wondering,
if
I could be able to do a test were the null hypothesis will be there is
no difference between the cultivars, and the alternative hypothesis,
at
least there is difference for one cultivar.
To do that, it is possible to use one design matrix like these:
Cultivar1 Cultivar2 Cultivar3 cultivar4 Cultivar5 Cultivar6 Cultivar7
Cultivar8
1-rep1 1 0 0 0 0 0 0 0
1-rep2 1 0 0 0 0 0 0 0
2-rep1 0 1 0 0 0 0 0 0
2-rep2 0 1 0 0 0 0 0 0
3-rep1 0 0 1 0 0 0 0 0
3-rep2 0 0 1 0 0 0 0 0
4-rep1 0 0 0 1 0 0 0 0
4-rep2 0 0 0 1 0 0 0 0
5-rep1 0 0 0 0 1 0 0 0
5-rep2 0 0 0 0 1 0 0 0
6-rep1 0 0 0 0 0 1 0 0
6-rep2 0 0 0 0 0 1 0 0
7-rep1 0 0 0 0 0 0 1 0
7-rep2 0 0 0 0 0 0 1 0
8-rep1 0 0 0 0 0 0 0 1
8-rep2 0 0 0 0 0 0 0 1
I have try it, but to calculate the null hypothesis the gmlLRT seams
just to remove the last column??
Besides I also want to test the differences between two parent lines
and
the F1 that I have derived from them I have try a design matrix as
follows,
Cultivar1 Cultivar2 F1
1-rep1 1 0 0
1-rep2 1 0 0
2-rep1 0 1 0
2-rep2 0 1 0
F1-rep1 0 0 1
F1-rep2 0 0 1
But I again face the above mentioned problem.
Besides I was wondering if I can employ a design matrix where I will
be
modeling how many alleles of each cultivar have my sample, doing
something like these:
Cultivar1 Cultivar2
1-rep1 2 0
1-rep2 2 0
2-rep1 0 2
2-rep2 0 2
F1-rep1 0 1
F1-rep2 0 1
I will appreciate any advice about the proper way to face these
analysis.
An just an extra question, in the manual it is recommended to
eliminate
all the tags with low counts, as they are no going to be sadistically
significance and they will slow the process. First it is said that the
tags with less than 5 counts are not useful, but then all the tags
with
lees than one tag per million (TPM) are removed, however in my case
having 50 million sequences that is all the tags with less than 50
counts.
In fact when I do it my tags decrease a lot
> dim(d)
[1] 2029802 16
> d <-d[rowSums(1e+06 * d$counts/expandAsMatrix(d$samples$lib.size,
dim(d)) >1) >= 2, ]
> dim (d)
[1] 73310 16
That is normal? Why did you choose to remove less than 1TPM?
Thanks in advance
Laura Pascual
[[alternative HTML version deleted]]