I am new to R and DESeq2. I am trying to analyze differential gene expression in a paired-end RNA-seq experiment. I did not do the alignment myself; I am starting from a gene count matrix and a colData table that I made.
I want to collapse replicates, but I am unsure how to handle the mate (paired-end) counts for each sample.
My count matrix looks like this (I haven't filtered out low counts yet); L001 and L002 represent R1 and R2:
Y1_L001 Y1_L002 Y2_L001 Y2_L002 Y3_L001 Y3_L002 Y4_L001 Y4_L002 Y5_L001
ENSMUSG00000102693 0 0 0 0 0 0 0 0 0
ENSMUSG00000064842 0 0 0 0 0 0 0 0 0
ENSMUSG00000051951 0 0 0 0 0 0 0 0 0
ENSMUSG00000102851 0 0 0 0 0 0 0 0 0
ENSMUSG00000103377 0 0 0 0 7 12 0 0 0
ENSMUSG00000104017 0 0 4 0 0 0 0 0 0
Y5_L002 Y6_L001 Y6_L002 Y7_L001 Y7_L002 Y8_L001 Y8_L002 Y9_L001 Y9_L002
ENSMUSG00000102693 0 0 0 0 0 0 0 0 0
ENSMUSG00000064842 0 0 0 0 0 0 0 0 0
ENSMUSG00000051951 0 0 0 0 0 0 0 0 0
ENSMUSG00000102851 0 0 0 0 0 0 0 0 0
ENSMUSG00000103377 0 0 0 0 0 20 21 30 38
ENSMUSG00000104017 0 0 0 0 0 0 0 0 0
.... and so on
My colData looks like this:
sample donor condition
Y1_L001 WT1 WT naive
Y1_L002 WT1 WT naive
Y2_L001 WT1 WT naive
Y2_L002 WT1 WT naive
Y25_L001 WT2 WT cisplatin
Y25_L002 WT2 WT cisplatin
... and so on, for the other donors and conditions.
Questions: Should I collapse the counts for the two mate reads first, or choose one of them? If I should choose one, which one? Why do I have counts for each mate at all; shouldn't there be just one count per sample? Or should I collapse the replicates of a sample separately for each mate, that is, all R1 (L001) replicates together and all R2 (L002) replicates together? If I do that, do I need to change my colData data frame and add a column specifying the mate?
The data were aligned with STAR using both ends, but I do not understand the output or how to use it for downstream analysis.
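To make the first option concrete, here is a minimal sketch of what I mean by collapsing the L001 and L002 columns of each sample into one column by summing them. It uses only base R on a toy matrix with made-up values; the names toy_counts, sample_id and collapsed are hypothetical and not part of my real data. I am not sure this is the right thing to do, which is what I am asking.

```r
# Toy count matrix (made-up values): 2 genes x 4 columns,
# with one L001 and one L002 column per sample.
toy_counts <- matrix(
  c(7L, 12L, 0L, 0L,
    0L,  0L, 4L, 0L),
  nrow = 2, byrow = TRUE,
  dimnames = list(
    c("geneA", "geneB"),
    c("Y1_L001", "Y1_L002", "Y2_L001", "Y2_L002")
  )
)

# Derive the sample ID by stripping the _L001 / _L002 suffix.
sample_id <- sub("_L00[12]$", "", colnames(toy_counts))

# rowsum() sums rows by group, so transpose, sum per sample,
# then transpose back to genes x samples.
collapsed <- t(rowsum(t(toy_counts), group = sample_id))

print(collapsed)
```

As far as I understand, DESeq2's collapseReplicates() does the same kind of summing on a DESeqDataSet when given a grouping factor, so in that case colData would need a column like sample_id above.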
I would appreciate any help.