Hello everyone, I am struggling to get an FPKM matrix using the countToFPKM library. Everything seems to run, but I get an error at the last step. Here is my script:
library("devtools")
library("biomaRt")
library("dplyr")
library(countToFPKM)
file.readcounts<- as.matrix(read.csv("Matrix.csv", header = TRUE, row.names = 1))
nrow(file.readcounts)
ens_build = "sep2015"
dataset="hsapiens_gene_ensembl"
mart <- useEnsembl(biomart = "ENSEMBL_MART_ENSEMBL", dataset = dataset, version = 80)
gene.annotations <- biomaRt::getBM(mart = mart, attributes=c("ensembl_gene_id", "external_gene_name",
"start_position", "end_position"))
gene.annotations <- dplyr::transmute(gene.annotations, external_gene_name, ensembl_gene_id, length = end_position - start_position)
# Reorder the columns in gene.annotations and make the first column the row names
# Filter and re-order gene.annotations to match the order in feature counts matrix
gene.annotations <- gene.annotations %>% dplyr::filter(gene.annotations$ensembl_gene_id %in% row.names(file.readcounts))
gene.annotations <- gene.annotations[order(match(gene.annotations$ensembl_gene_id, rownames(file.readcounts))),]
# Assign feature lengths into a numeric vector.
featureLength <- gene.annotations$length
The feature lengths look fine (integer values), but the final step fails with this error:
fpkm_matrix <- fpkm (file.readcounts, featureLength=featureLength, meanFragmentLength=NULL)
Error in fpkm(file.readcounts, featureLength = featureLength, meanFragmentLength = NULL) :
length(featureLength) == nrow(counts) is not TRUE
I can't understand the error; the match function seems to be working fine...
Any help is welcome, thank you.
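One possible reading of the error, sketched here with toy data: the filter step only removes annotation rows that are absent from the count matrix. It never removes count rows that have no annotation, so featureLength can end up shorter than nrow(file.readcounts). Duplicate IDs in the annotation would make it longer instead. The object names and values below (counts, annot, the ENSG01-style IDs) are hypothetical stand-ins, not the real data, and this is only a minimal sketch of one way to align the two objects.

```r
# Toy stand-ins for the real objects (hypothetical values, for illustration only).
counts <- matrix(
  c(10, 20, 30, 40, 50, 60, 70, 80),
  nrow = 4,
  dimnames = list(c("ENSG01", "ENSG02", "ENSG03", "ENSG04"), c("s1", "s2"))
)
annot <- data.frame(
  ensembl_gene_id = c("ENSG03", "ENSG01", "ENSG04"),  # ENSG02 has no annotation
  length = c(3000L, 1000L, 4000L)
)

# IDs present in the count matrix but absent from the annotation.
missing_ids <- setdiff(rownames(counts), annot$ensembl_gene_id)

# Keep only the shared IDs, then index both objects in the same order.
shared <- intersect(rownames(counts), annot$ensembl_gene_id)
counts_matched <- counts[shared, , drop = FALSE]
feature_length <- annot$length[match(shared, annot$ensembl_gene_id)]

# The condition that fpkm() complained about now holds for the toy data.
stopifnot(length(feature_length) == nrow(counts_matched))
```

On the real objects, checking setdiff(rownames(file.readcounts), gene.annotations$ensembl_gene_id) and any(duplicated(gene.annotations$ensembl_gene_id)) should show which of the two cases applies.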
Hi swbarnes2, I was following the script suggested by the author of the countToFPKM library.
Here it is:
https://github.com/AAlhendi1707/countToFPKM/issues/2
He uses BioMart to extract gene length information.
thanks
Just because someone posts code doesn't mean it makes sense. Are you really sure that you want gene lengths, and not transcript lengths?
You're right. Transcript length still has introns, though, so maybe it would be better to use exon lengths.
thanks
Are you sure that just adding up every single exon of a gene is correct?
What else should I take into consideration? As far as I know, expression analysis generally considers the processed transcript.
Genes in eukaryotes do not have one single processed transcript per gene. They have many, and they can be different lengths.
What I'm trying to get across to you is that you cannot generate FPKM from gene counts alone.
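To make the point concrete, here is a small sketch using the textbook FPKM definition (count × 10^9 / (length × total mapped fragments)). It is not necessarily the exact formula countToFPKM applies, and the function name fpkm_simple and all numbers are made up for illustration. The same gene-level count gives very different values depending on which isoform length you assume:

```r
# Textbook FPKM: fragments per kilobase of feature per million mapped fragments.
# fpkm_simple is a hypothetical helper, not a function from countToFPKM.
fpkm_simple <- function(count, length_bp, total_fragments) {
  count * 1e9 / (length_bp * total_fragments)
}

# One gene, one count, two plausible isoform lengths (made-up numbers).
isoform_lengths <- c(short = 1000, long = 4000)
values <- fpkm_simple(count = 200, length_bp = isoform_lengths, total_fragments = 2e7)

print(values)
```

With these toy numbers the result is 10 if all reads came from the short isoform and 2.5 if they came from the long one. A gene-level count does not tell you which of these, or what mixture, is the case.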
What tools do you suggest to account for all these variables and get a better analysis? I'm using what the "market" offers.
Since no one knows what your end analysis goal is, no one can help you. All I can tell you is you can't correct for transcript lengths using gene counts.