I am doing differential expression (DE) analysis with DESeq2 and hit a new error when I re-ran my commands, starting from tximport, to regenerate the raw count file.
I now get a parsing warning when loading the feature table, followed by a failure to select columns from the loaded feat_table when creating the tx2gene object.
Based on the message, parsing seems to fail when it reaches genes on the "X" and "Y" chromosomes: the chromosome column is expected to be a double, so are the alphabetical values what trigger the error?
This did not happen before, and my code has not changed, but I would like to sort it out so I can be sure my results are reproducible.
Below is the code and its output:
> feat_table <- read_tsv('GCF_000001405.39_GRCh38.p13_feature_table.txt')
Parsed with column specification:
cols(
.default = col_character(),
chromosome = col_double(),
start = col_double(),
end = col_double(),
`non-redundant_refseq` = col_logical(),
GeneID = col_double(),
locus_tag = col_logical(),
feature_interval_length = col_double(),
product_length = col_double()
)
See spec(...) for full column specifications.
|=================================================================================| 100% 62 MB
Warning: 12784 parsing failures.
row col expected actual file
317601 chromosome a double X 'GCF_000001405.39_GRCh38.p13_feature_table.txt'
317602 chromosome a double X 'GCF_000001405.39_GRCh38.p13_feature_table.txt'
317603 chromosome a double X 'GCF_000001405.39_GRCh38.p13_feature_table.txt'
317604 chromosome a double X 'GCF_000001405.39_GRCh38.p13_feature_table.txt'
317605 chromosome a double X 'GCF_000001405.39_GRCh38.p13_feature_table.txt'
...... .......... ........ ...... ...............................................
See problems(...) for more details.
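The parsing failures above are consistent with readr guessing each column's type from only the first rows of the file, where `chromosome` holds nothing but numbers, so later values such as X and Y cannot be parsed as doubles. A minimal sketch, assuming that is the cause, declares the column as character explicitly. The small temporary file and its accessions are made up for illustration and only stand in for the real feature table.

```r
library(readr)

# Hypothetical stand-in for the feature table: numeric chromosomes first, then X and Y.
tmp <- tempfile(fileext = ".txt")
writeLines(c(
  "chromosome\tproduct_accession\tsymbol\tGeneID",
  "1\tNM_000001.1\tGENE1\t101",
  "2\tNM_000002.1\tGENE2\t102",
  "X\tNM_000003.1\tGENE3\t103",
  "Y\tNM_000004.1\tGENE4\t104"
), tmp)

# Declare chromosome as character instead of letting readr guess a double.
# Columns not named in cols() are still guessed as usual.
demo_table <- read_tsv(tmp, col_types = cols(chromosome = col_character()))

# X and Y are kept as text rather than being turned into NA.
unique(demo_table$chromosome)
```

Raising the `guess_max` argument of `read_tsv()` so that more rows are inspected is another option, though naming the type directly does not depend on the row order of the file.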
> feat_table <- dplyr::select(feat_table, feat_table$product_accession, feat_table$symbol, feat_table$GeneID)
Error: Can't subset columns that don't exist.
x The columns NA, etc. don't exist.
Run `rlang::last_error()` to see where the error occurred.
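The `select()` error looks separate from the parsing warning. Writing `feat_table$product_accession` hands the column's values to `select()`, which then appears to treat each value (including NA) as a column name to look up. A sketch of the usual form, assuming the three columns exist under these names, passes bare column names instead. The toy tibble and its contents are hypothetical.

```r
library(dplyr)

# Hypothetical toy table standing in for feat_table.
toy_table <- tibble(
  chromosome = c("1", "X"),
  product_accession = c("NM_000001.1", NA),
  symbol = c("GENE1", "GENE2"),
  GeneID = c(101, 102)
)

# Pass column names, not column contents, to select().
toy_tx2gene <- dplyr::select(toy_table, product_accession, symbol, GeneID)

colnames(toy_tx2gene)
```

The same call on the real table would be `dplyr::select(feat_table, product_accession, symbol, GeneID)`, provided those column names match the file header.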
The remainder of my code, which creates the tx2gene object for use with tximport and my Salmon quant.sf quantification files, is below:
table(quant$Name %in% feat_table$product_accession)
write.table(feat_table, "humantx2gene.tsv", quote = FALSE, row.names = FALSE, sep = "\t")
tx2gene <- read_tsv("humantx2gene.tsv")
colnames(tx2gene)
unique(x = tx2gene$symbol)
tx2gene <- dplyr::select(tx2gene, "product_accession", "symbol")
#load quant.sf files by directory location in the files$files column of samples doc
fx <- tximport(files = files$files, type = "salmon", tx2gene = tx2gene) #, ignoreTxVersion=TRUE)
fxcounts <- fx$counts
# change column names to sample names for provenance
colnames(fxcounts) <- files$sample
# write to file
write.csv(fxcounts, "temp/4.13.20_3primeTagSeq_Salmon_nonribo_counts_attempttorecreate_01.csv", quote = FALSE, row.names = TRUE)
OK I'll continue looking, thank you!