Starting from a raw counts file generated by featureCounts, I used edgeR for the DE analysis and it went well. Now I am using the CPM-normalized values to explore the expression of specific genes in multiple pathways. I am aware that CPM corrects for library size but not for gene length. Is it OK to use these values for individual gene analysis and to generate plots for publication, or do I need a different normalization?

With that in mind, I was trying to get RPKM-normalized values. But even after reading similar posts, I am not sure how to supply gene lengths to the rpkm() function. This discussion says that recent versions of edgeR can find the gene length directly from the DGEList object. I am using edgeR_3.28.1. Can anyone direct me on how to get the gene lengths so that I can export RPKM?

Related info: I downloaded the rice genome from MSU, and the reads were aligned to the reference with Hisat2. Currently, I have only the raw counts files (i.e., no .bam files available).

Here is the code I used to generate the CPM-normalized values:
library(edgeR)
raw_counts <- read.delim("rawcounts.txt", row.names = "Locus", check.names = TRUE)
targets <- read.table("targets.txt", header = TRUE, sep = "\t")
group <- factor(paste(targets$Genotype, targets$Time, targets$Treatment, sep = "."))
cbind(targets, Group = group)
y <- DGEList(counts = raw_counts, group = group)
# Filter out low-count genes: keep genes with CPM >= 2 in at least 2 samples
keep <- rowSums(cpm(y) >= 2) >= 2
y <- y[keep, , keep.lib.sizes = FALSE]
y <- calcNormFactors(y)
CPM <- cpm(y)
# How can I incorporate gene length in rpkm()?
RPKM <- rpkm(y)
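For reference, here is a minimal sketch of one way to hand gene lengths to rpkm(). It uses a toy count matrix and made-up lengths, not the real rice data, so `counts`, `gene_length` and the gene names are purely illustrative. In a real run the lengths would have to come from somewhere else: the standard featureCounts output table has a Length column, which may or may not have been kept in rawcounts.txt, and otherwise they could be derived from the annotation file. The lengths must be matched to the count rows by gene ID, especially after filtering.

```r
library(edgeR)

# Toy counts standing in for rawcounts.txt (5 genes x 2 samples); illustrative only.
counts <- matrix(c(100, 200, 300, 50, 400,
                   150, 250, 600, 80, 350),
                 nrow = 5,
                 dimnames = list(paste0("gene", 1:5), c("S1", "S2")))

# Hypothetical gene lengths in bp (assumption: in practice these might come from
# the Length column of the featureCounts output, keyed by gene ID).
gene_length <- c(gene1 = 1000, gene2 = 2000, gene3 = 500, gene4 = 1500, gene5 = 3000)

# Store the lengths in the genes slot, ordered to match the count rows.
y <- DGEList(counts = counts,
             genes = data.frame(Length = gene_length[rownames(counts)]))
y <- calcNormFactors(y)

# Pass the lengths explicitly so the call does not depend on automatic lookup.
RPKM <- rpkm(y, gene.length = y$genes$Length)

# The same quantity by hand: counts per million of effective library size,
# divided by gene length in kb.
eff_lib <- y$samples$lib.size * y$samples$norm.factors
manual  <- t(t(counts) / eff_lib) * 1e6 / (gene_length / 1000)
```

Because the lengths travel with the DGEList in `y$genes`, subsetting such as `y[keep, , keep.lib.sizes = FALSE]` keeps them aligned with the remaining genes.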
Hello all,
I got this error when I ran the code from the question:
Error in data[[rowvar]] : attempt to select less than one element in get1index
Would you suggest a solution to fix it? Thank you.
Update: I figured it out.
Hi @aniking. Would you tell me where to get targets.txt? I am trying to calculate RPKM too. Thank you so much!