Dear community, I have run into a problem while trying to import my data with tximport.
From what I know about the data, it was produced with RSEM, but the columns differ a little from what I have read about on the forum. First of all, I have 100 .gene.fpkm.csv files and another 100 .transcript.fpkm.csv files, which already look rather different from the files used in the tximport and tximportData vignettes.
What do I mean?
- Files in tximportData look like .genes.results.gz and .isoforms.results.gz
- My files are tab-delimited even though they have a .csv extension
- They have no "effective_length" column, only "length"; no "TPM" column, only "FPKM"; and the "gene_id" column is numeric. (Question 1: does the "gene_id" column have to be of character type? Question 2: is it a problem that I don't have the "effective_length" column?)
- Header looks like this: gene_id transcript_id(s) length expected_count FPKM SymbolID Cellular Component Molecular Function Biological Process Kegg Orthology Nr Description Desc,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
- The str() output of any of my files is very convoluted, not clean like that of the tximportData files. I can show it if necessary.
- Do these files need further cleaning? If yes, how?
- I know that the tximport pipeline is recommended, but would it be a problem if I extracted the "expected_count" columns from the files by some other method and used them with DESeqDataSetFromMatrix? (Someone told me that the expected_count column is not the same as raw counts, so the two approaches are not equivalent.)
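To make the last question concrete, here is a minimal sketch of the "other method" I have in mind, using base R only. The two toy files and their values are made up purely for illustration, and `read_counts` is a hypothetical helper, not part of tximport or DESeq2. RSEM expected counts can be fractional, so the sketch rounds them to integers before they would go into DESeqDataSetFromMatrix; whether that is acceptable is exactly what I am asking.

```r
# Toy stand-ins for two per-sample files (all values are invented for illustration).
make_toy_file <- function(path, counts) {
  toy <- data.frame(
    gene_id = c(101, 102, 103),          # numeric IDs, as in my real files
    length = c(1500, 900, 2100),
    expected_count = counts,
    FPKM = c(10, 0, 30)
  )
  # Tab-delimited despite the .csv extension, like the real files
  write.table(toy, path, sep = "\t", quote = FALSE, row.names = FALSE)
}

tmp <- tempdir()
files <- c(S1 = file.path(tmp, "S1.gene.fpkm.csv"),
           S2 = file.path(tmp, "S2.gene.fpkm.csv"))
make_toy_file(files[["S1"]], c(12.4, 0, 250.6))
make_toy_file(files[["S2"]], c(7.3, 3.2, 198.0))

# Hypothetical helper: read one file and return expected_count named by gene_id.
read_counts <- function(path) {
  tab <- read.delim(path, check.names = FALSE, stringsAsFactors = FALSE)
  # Convert gene_id to character so it can serve as a row name
  setNames(tab$expected_count, as.character(tab$gene_id))
}

count_list <- lapply(files, read_counts)

# All files must list the same genes in the same order before binding columns
same_genes <- vapply(count_list,
                     function(x) identical(names(x), names(count_list[[1]])),
                     logical(1))
stopifnot(all(same_genes))

# Genes in rows, samples in columns; round fractional expected counts to integers
counts <- round(do.call(cbind, count_list))
storage.mode(counts) <- "integer"
counts

# With DESeq2 installed, the matrix could then be passed on, for example:
# dds <- DESeqDataSetFromMatrix(countData = counts, colData = samples,
#                               design = ~ condition)  # 'condition' is an assumed column
```

Note that this route ignores the gene length information that tximport would otherwise carry along, which may be part of why the tximport pipeline is the recommended one.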
library(readr)
library(tximportData)
library(tximport)
dir <- "data/CountData"
list.files(dir)
samples <- read.csv(file.path("data", "colData.csv"), header = TRUE)
samples
files <- file.path(dir, paste0(samples$Sample, "_PBMC_24hLPS.gene.fpkm.csv"))
files
names(files) <- paste0(samples$Sample)
txi.rsem <- tximport(files, type = "none", txIn = FALSE, txOut = FALSE,
                     geneIdCol = "gene_id", abundanceCol = "FPKM",
                     lengthCol = "length", countsCol = "expected_count")
output:
reading in files with read_tsv
1 --- >"silent!!!"
txi.rsem <- tximport(files, type = "rsem", txIn = FALSE, txOut = FALSE)
output:
reading in files with read_tsv
1 Error in computeRsemGeneLevel(files, importer, geneIdCol, abundanceCol, :
all(c(geneIdCol, abundanceCol, lengthCol) %in% names(raw)) is not TRUE
In addition: Warning message:
One or more parsing issues, call `problems()` on your data frame for details, e.g.:
dat <- vroom(...)
problems(dat)
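
The failing check in the error above compares the expected column names with the names found in the file. As far as I understand, with type = "rsem" at gene level those defaults are "gene_id", "TPM" and "effective_length"; please treat that as my assumption. A small sketch of the same comparison against a shortened copy of my header (the trailing commas on the last field are kept, but shortened):

```r
# Column names the check looks for (assumed tximport defaults for type = "rsem")
required <- c("gene_id", "TPM", "effective_length")

# Shortened copy of my header, tab-separated, with trailing commas on the last field.
# For a real file one could use: readLines(files[1], n = 1)
header_line <- paste(c("gene_id", "transcript_id(s)", "length",
                       "expected_count", "FPKM", "SymbolID", "Desc,,,,"),
                     collapse = "\t")

# Split the header on tabs to get the column names as a reader would see them
header <- strsplit(header_line, "\t", fixed = TRUE)[[1]]

# Columns the check expects but my files do not have
missing_cols <- setdiff(required, header)
missing_cols
```

For my files this reports "TPM" and "effective_length" as missing, which matches the columns I listed above as absent. The trailing commas in the header may also be what triggers the parsing warning, but I have not confirmed that.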