Question

mate pairs read counts as input for DESeq2

jnaviapelaez ▴ 10

I am new to R and DESeq2. I am trying to analyze differential gene expression in a paired-end RNA-seq experiment. I did not do the alignment myself; I am starting from a gene count matrix and a colData table that I made.

I want to collapse replicates, but I am unsure how to deal with the mate (paired-end) counts for each sample.

My count matrix looks like this (I haven't filtered out low counts yet). L001 and L002 represent R1 and R2:

                   Y1_L001 Y1_L002 Y2_L001 Y2_L002 Y3_L001 Y3_L002 Y4_L001 Y4_L002 Y5_L001
ENSMUSG00000102693       0       0       0       0       0       0       0       0       0
ENSMUSG00000064842       0       0       0       0       0       0       0       0       0
ENSMUSG00000051951       0       0       0       0       0       0       0       0       0
ENSMUSG00000102851       0       0       0       0       0       0       0       0       0
ENSMUSG00000103377       0       0       0       0       7      12       0       0       0
ENSMUSG00000104017       0       0       4       0       0       0       0       0       0
                   Y5_L002 Y6_L001 Y6_L002 Y7_L001 Y7_L002 Y8_L001 Y8_L002 Y9_L001 Y9_L002
ENSMUSG00000102693       0       0       0       0       0       0       0       0       0
ENSMUSG00000064842       0       0       0       0       0       0       0       0       0
ENSMUSG00000051951       0       0       0       0       0       0       0       0       0
ENSMUSG00000102851       0       0       0       0       0       0       0       0       0
ENSMUSG00000103377       0       0       0       0       0      20      21      30      38
ENSMUSG00000104017       0       0       0       0       0       0       0       0       0

.... and so on

My colData looks like this:

         sample donor condition
Y1_L001     WT1    WT     naive
Y1_L002     WT1    WT     naive
Y2_L001     WT1    WT     naive
Y2_L002     WT1    WT     naive
Y25_L001    WT2    WT cisplatin
Y25_L002    WT2    WT cisplatin

... and so on, for the other donors and conditions.

Questions: Should I collapse the counts for the two mate reads first, or choose one? If so, which one? Why do I have counts for each mate at all; shouldn't there be just one set per sample? Or should I collapse the replicates of a sample separately, all R1 (L001) replicates together and all R2 (L002) replicates together? If I do that, do I need to change my colData data frame and add a column specifying the mate?

The data were aligned with STAR using both ends, but I do not understand the output or how to use it for downstream analysis.

I appreciate any help.

deseq2 • 497 views

updated 5.1 years ago by James W. MacDonald 68k • written 5.1 years ago by jnaviapelaez ▴ 10

score 1 · Answer 1 · 2020-03-06

Generally speaking, a sample name like Y1_L001 looks more like 'sample Y1, from lane 1', and Y1_L002 would then be 'sample Y1, from lane 2', which is a different thing from read counts for paired-end mates that were aligned separately.

But this isn't the place to ask what's up with your data, because how would we know? You need to talk to whoever did the alignment and have them tell you whether those are the same libraries run on different lanes (in which case you should sum the counts per sample) or the paired ends aligned separately.
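If the columns do turn out to be the same library sequenced on two lanes, summing the counts per sample could look roughly like the sketch below. It uses only base R and a toy matrix built from a few values in the question; the assumption that everything before the `_L00x` suffix identifies the library is mine and needs to be confirmed with whoever produced the counts.

```r
# Toy count matrix: genes in rows, one column per sample/lane combination.
# Values are taken from the count matrix shown in the question.
counts <- matrix(
  c(0, 0, 7, 12,
    4, 0, 0,  0),
  nrow = 2, byrow = TRUE,
  dimnames = list(
    c("ENSMUSG00000103377", "ENSMUSG00000104017"),
    c("Y2_L001", "Y2_L002", "Y3_L001", "Y3_L002")
  )
)

# ASSUMPTION: the part of the column name before "_L00x" identifies the library.
sample_id <- sub("_L[0-9]+$", "", colnames(counts))

# rowsum() sums rows by group, so transpose, sum per sample, transpose back.
collapsed <- t(rowsum(t(counts), group = sample_id))
collapsed
```

The colData would then need one row per collapsed sample rather than one per lane. If the data are already in a DESeqDataSet, DESeq2's `collapseReplicates()` with a `groupby` factor identifying the library performs the same kind of summing and collapses the colData as well. None of this applies if the two columns are paired-end mates counted separately.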