Here is my cleaned data; I could not post the dput output here:
https://gist.github.com/anonymous/1f8788a5f0f3c40e55995d5c303970c6
Here I try to find up- and down-regulated genes based on LFQ intensities using limma:
library(limma)
# 1 = first two samples, 0 = last two samples
design <- model.matrix(~c(rep(1,2),rep(0,2)))
fit <- lmFit(data, design)
fit2 <- eBayes(fit)
myt <- topTable(fit2, coef=2, number=Inf)
The resulting lists of up- and down-regulated genes are empty. That is because I don't have any adj.P.Val smaller than 0.05, but I don't know what criteria to use for the selection instead.
Where am I making a mistake?
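For reference, a minimal sketch of how up- and down-regulated lists are commonly pulled out of a topTable() result. The toy data frame stands in for myt and its values are invented; the cutoffs alpha and min_lfc are assumptions, not values from this thread. Only the column names logFC and adj.P.Val match what topTable() returns.

```r
# Toy stand-in for the topTable() output; values are invented for illustration.
myt <- data.frame(
  logFC     = c(2.1, -1.8, 0.3, -0.2),
  adj.P.Val = c(0.01, 0.03, 0.40, 0.90),
  row.names = c("P1", "P2", "P3", "P4")
)

alpha   <- 0.05  # assumed FDR cutoff
min_lfc <- 1     # assumed minimum absolute log fold change

# A protein is selected only if it passes both cutoffs.
sig  <- myt$adj.P.Val < alpha & abs(myt$logFC) >= min_lfc
up   <- rownames(myt)[sig & myt$logFC > 0]
down <- rownames(myt)[sig & myt$logFC < 0]
```

With the design above, coefficient 2 compares the samples coded 1 (the first two columns) against the samples coded 0, so a positive logFC means higher in the first two columns. If no row passes the adj.P.Val cutoff, both lists come back empty, which is the situation described in the question.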
@Laurent Gatto that is right. I guess the zeros for some proteins come from the fact that I analysed several groups of samples together. So a protein might not have been found in all samples of one group but could still have intensities in another group.
I read somewhere that someone kept only proteins with less than 50% zero values. I have 4 samples here, so that means I would discard a protein if it has no intensity in 2 or more samples. However, I am not sure how well this assumption holds, because the 4 samples are 2 control and 2 treated. So a protein with only one intensity value out of 4, in a treated sample, might still be OK, no?
That is why I removed only the genes that had no intensities in any sample.
Do you have any suggestions?
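As a rough sketch, the filtering rules discussed here could be written in base R as follows. The matrix lfq, the sample names, and the group vector are invented for illustration. The third rule (fully quantified in at least one group) is an additional assumption that does not come from this thread; it is shown only because it keeps proteins that are present in one group and absent in the other.

```r
# Toy LFQ matrix: rows are proteins, columns are samples; 0 means not quantified.
lfq <- matrix(
  c(10, 12,  0,  0,
     0,  0,  0,  0,
     8,  0,  9, 11,
     0,  0,  0,  7),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("P", 1:4), c("c1", "c2", "t1", "t2"))
)
group <- c("control", "control", "treated", "treated")  # assumed column order

n_zero <- rowSums(lfq == 0)

# Rule 1: drop only proteins with no intensity in any sample.
keep_any <- n_zero < ncol(lfq)

# Rule 2: keep proteins with less than 50% zeros across all samples.
keep_half <- n_zero < 0.5 * ncol(lfq)

# Rule 3 (assumption): keep proteins quantified in every sample of at least one group.
detected_by_group <- sapply(unique(group), function(g) {
  rowSums(lfq[, group == g, drop = FALSE] > 0)
})
keep_group <- apply(detected_by_group, 1, max) >= 2

filtered <- lfq[keep_group, , drop = FALSE]
```

On this toy matrix, rule 2 drops P1 even though it is quantified in both control samples, while rule 3 keeps it. None of these rules addresses why the zeros are there in the first place.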
The number of zeros in your data is concerning. Debating the number of allowed zeros is not going to help, because filtering is not going to fix your issue. You should probably reassess your data processing strategy in light of this problem.
@Laurent Gatto I accepted your answer and I appreciate your help. I found where those zeros were coming from and I solved the issue.
However, I have two questions that are off topic here, but it seems you know proteomics, so I wanted to ask. In a label-free quantification experiment, I used MaxQuant and identified many proteins. However, the gene names are missing for some of the proteins. How do you handle this when you want to do pathway analysis using IPA?
The other question: when you do pathway analysis using IPA, do you use the LFQ intensities of all individual samples (biological replicates) for control and for treated, or do you take the average of the replicates in each group and then perform the pathway analysis?
I am not familiar with IPA, so can't comment on that aspect.
I am not sure what leads to the absence of gene names. Where do the other ones come from? An online query, the protein fasta file, ...? I guess that tracking the provenance of that information will give a clue about the absence of some gene names.