Hello,
I am using EBSeq to look at differential expression in RNA-seq data. I have 99 samples, 20531 genes, and four conditions. I ran the code largely following the example in the "Gene level DE analysis (more than two conditions)" section of the EBSeq vignette PDF. Specifically, for my conditions matrix "Conditions" and my sequencing counts matrix "data", I ran:
PosParti <- GetPatterns(Conditions)
MultiSize <- MedianNorm(data)
MultiOut <- EBMultiTest(data, NgVector = NULL, Conditions = Conditions, AllParti = PosParti, sizeFactors = MultiSize, maxround = 10)
MultiPP <- GetMultiPP(MultiOut)
MultiFC <- GetMultiFC(MultiOut)
When I looked at my results, I saw that many rows of MultiOut$PPMat, as well as the corresponding rows of MultiPP, had NaN values in every column. There were three particular genes I cared about, and all three were NaN. When I looked at the data matrix for these three genes, there were a fair number of 0s, but plenty of samples had nonzero counts as well.
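A quick way to count how many genes are affected is to look for rows of MultiOut$PPMat in which every entry is NaN. The sketch below uses only base R; all_nan_rows is a hypothetical helper name, and the toy matrix merely stands in for PPMat, assuming genes in rows and patterns in columns.

```r
# Hypothetical helper (not part of EBSeq): return the indices, named by gene,
# of rows in which every entry is NaN.
all_nan_rows <- function(pp) {
  pp <- as.matrix(pp)
  # Count the non-NaN entries per row; rows with none are entirely NaN.
  which(rowSums(!is.nan(pp)) == 0)
}

# Toy matrix standing in for MultiOut$PPMat (rows = genes, columns = patterns).
# matrix() fills column by column, so geneC ends up NaN in every column.
toy_pp <- matrix(c(0.90, 0.10, NaN,
                   0.05, 0.80, NaN,
                   0.05, 0.10, NaN),
                 nrow = 3,
                 dimnames = list(c("geneA", "geneB", "geneC"),
                                 c("Pattern1", "Pattern2", "Pattern3")))

all_nan_rows(toy_pp)  # -> named index c(geneC = 3)
```

On the real output this would be something like length(all_nan_rows(MultiOut$PPMat)). is.nan() is used because the values reported were NaN; is.na() would also catch plain NA.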
Can anyone help me understand what these NaN values mean? I have included the expression matrix for the genes of interest below. The sample names are along the top and the genes (designated G#) down the side, with the counts in the middle. If you want the conditions matrix, I can include that as well, but I ran out of characters.
Thank you!
DATA
06-0171 26-1442 06-0747 06-5414 14-1829 06-0190 26-5139 06-0221
G1 5 1 5 249 2 18 27 0
G2 5 0 1 0 1 0 3 1
G3 0 0 0 0 1 0 0 17

27-1832 12-3652 06-0211 06-0129 12-0616 06-0882 06-0210 06-0178
G1 43 79 6 3 1 0 441 3
G2 0 5 3 1 2 0 40 0
G3 0 0 0 22 1 1 1 0

14-1034 06-0174 27-2528 06-0125 19-4065 26-5132 14-1825 15-0742
G1 0 115 138 0 1 3 1 0
G2 0 6 1 0 4 0 0 0
G3 0 0 1 2 0 0 0 0

06-5408 06-5410 26-5136 06-0750 14-0789 06-5413 26-5134 08-0386
G1 1 2 5 0 54 53 0 6
G2 0 0 0 0 0 1 0 10
G3 2 0 0 0 0 1 0 5

19-2620 41-5651 14-0871 41-4097 06-0158 06-0187 12-3650 06-0219
G1 0 297 7 30 221 0 8 114
G2 0 4 883 0 0 0 300 0
G3 1 1 5 0 0 0 1 0

19-2629 27-1835 27-1834 14-0817 06-0745 06-0743 06-5858 06-0156
G1 7 4 37 2 55 510 1 124
G2 0 0 3 4 0 10 11 0
G3 0 1 1 0 1 0 0 0

27-2519 06-0141 06-5416 27-1837 06-0138 06-0645 06-5856 06-0184
G1 2 1 1 0 104 3 21 0
G2 0 0 0 0 6 0 0 0
G3 5 0 1 1 0 0 0 0

06-0152 06-5859 12-3653 19-0957 27-2524 06-0878 02-0047 15-1444
G1 0 1383 3 50 1 0 0 1
G2 0 70 1 1 0 0 0 3
G3 1 1 0 465 2 1 1 28

06-5411 27-2521 41-2571 06-0238 14-0787 06-0649 19-2619 14-1823
G1 2 4 1 281 153 6 8 0
G2 0 5 0 22 0 0 0 0
G3 2 1 0 2 0 0 0 0

06-5417 06-5415 27-2523 19-5960 26-5135 06-0130 12-0619 06-0744
G1 0 3 1 305 6 9 4 27
G2 1 0 1 0 0 2 992 3
G3 4 0 10 0 304 0 2 0

06-5412 06-0644 27-1831 19-2624 06-0646 19-1389 14-1402 41-3915
G1 2 24 0 55 1 4 83 2
G2 0 0 2 1 0 0 3 0
G3 0 0 1 2 6 0 0 1

06-0168 12-0618 12-0821 06-5418 06-0157 06-0749 27-2526 27-1830
G1 19 631 4 1 0 0 1 2
G2 0 25 1 0 14 0 0 1
G3 0 1 2728 0 3 0 0 8

06-0686 41-2572 19-2625
G1 0 2 0
G2 4 0 0
G3 5 0 0
I don't know if this is the answer, but you might check the 0.75 quantile for these rows. From the EBSeq homepage:
"2014-1-30 In EBSeq 1.3.3, the default setting of EBTest function will remove low expressed genes (genes whose 75th quantile of normalized counts is less than 10)"
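To check whether the quoted filter explains the NaNs, one can compute each gene's 75th percentile of normalized counts directly. The sketch below is a base-R re-creation of the rule as the quote states it, not EBSeq's own code: median_ratio_sizes and flag_low_expressed are hypothetical names, the median-of-ratios normalization is assumed to match what MedianNorm does, and R's default quantile type (type 7) is an assumption about how EBSeq computes the quantile.

```r
# Median-of-ratios size factors, re-implemented so the sketch runs without EBSeq.
# Assumption: this mirrors MedianNorm (median normalization of Anders et al.).
median_ratio_sizes <- function(counts) {
  log_geo_means <- rowMeans(log(counts))   # -Inf for any gene with a zero count
  usable <- is.finite(log_geo_means)       # only genes with no zeros are used
  apply(counts, 2, function(col) {
    median(col[usable] / exp(log_geo_means[usable]))
  })
}

# Flag genes whose q-th quantile of normalized counts is below the cutoff,
# following the rule quoted above (defaults: 75th percentile, cutoff 10).
flag_low_expressed <- function(counts, size_factors, q = 0.75, cutoff = 10) {
  normalized <- sweep(counts, 2, size_factors, "/")  # divide each sample by its size factor
  q_vals <- apply(normalized, 1, quantile, probs = q, names = FALSE)
  q_vals < cutoff
}

# Toy counts: geneB is mostly zeros, like G2 and G3 in the question.
toy_counts <- matrix(c(100, 200, 100, 200,
                         0,   3,   0,  12,
                        50, 100,  50, 100),
                     nrow = 3, byrow = TRUE,
                     dimnames = list(c("geneA", "geneB", "geneC"),
                                     paste0("s", 1:4)))

sizes <- median_ratio_sizes(toy_counts)
flags <- flag_low_expressed(toy_counts, sizes)
flags  # geneA FALSE, geneB TRUE, geneC FALSE
```

With the objects from the question, flag_low_expressed(as.matrix(data), MultiSize) would use the size factors EBSeq itself computed; comparing the flagged genes with the all-NaN rows of MultiOut$PPMat would show whether the two sets coincide.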
I tried to follow up on Michael Love's suggestion, but when I changed
PoolLower=0, PoolUpper = 1
to include everything, I still got all NaNs.