Question

Parsing error creating tx2gene object for tximport

knholm • 0

@knholm-18825

I am doing differential expression (DE) analysis using DESeq2, and hit a new error when I re-ran my tximport commands to regenerate the raw count file.

I am getting a new parsing error that results in a failure to select columns from the loaded feature_table to create the tx2gene object.

Based on the warning, the parser seems to expect a double in the chromosome column and fails when it reaches genes on the "X" and "Y" chromosomes. Is the alphabetic value what triggers the error?

This did not happen before, and my code has not changed, but I would like to sort it out so I can be sure my results are reproducible.

Below is the code:

> feat_table <- read_tsv('GCF_000001405.39_GRCh38.p13_feature_table.txt')
Parsed with column specification:
cols(
  .default = col_character(),
  chromosome = col_double(),
  start = col_double(),
  end = col_double(),
  `non-redundant_refseq` = col_logical(),
  GeneID = col_double(),
  locus_tag = col_logical(),
  feature_interval_length = col_double(),
  product_length = col_double()
)
See spec(...) for full column specifications.
|=================================================================================| 100%   62 MB
Warning: 12784 parsing failures.
   row        col expected actual                                            file
317601 chromosome a double      X 'GCF_000001405.39_GRCh38.p13_feature_table.txt'
317602 chromosome a double      X 'GCF_000001405.39_GRCh38.p13_feature_table.txt'
317603 chromosome a double      X 'GCF_000001405.39_GRCh38.p13_feature_table.txt'
317604 chromosome a double      X 'GCF_000001405.39_GRCh38.p13_feature_table.txt'
317605 chromosome a double      X 'GCF_000001405.39_GRCh38.p13_feature_table.txt'
...... .......... ........ ...... ...............................................
See problems(...) for more details.

> feat_table <- dplyr::select(feat_table, feat_table$product_accession, feat_table$symbol, feat_table$GeneID)
Error: Can't subset columns that don't exist.
x The columns NA, etc. don't exist.
Run `rlang::last_error()` to see where the error occurred.

The remainder of my code, which creates the tx2gene object for use with tximport and my quant.sf (Salmon) quantification files, is below:

# count how many entries of quant$Name appear in the product_accession column
table(quant$Name %in% feat_table$product_accession)
write.table(feat_table, "humantx2gene.tsv", quote = FALSE, row.names = FALSE, sep = "\t")

tx2gene <- read_tsv("humantx2gene.tsv")
colnames(tx2gene)
unique(x = tx2gene$symbol)
# keep transcript ID first and gene ID second, the column order tximport expects
tx2gene <- dplyr::select(tx2gene, "product_accession", "symbol")

# load quant.sf files by directory location in the files$files column of samples doc
fx <- tximport(files = files$files, type = "salmon", tx2gene = tx2gene) #, ignoreTxVersion=TRUE)
fxcounts <- fx$counts

# label the count columns with sample names for provenance
colnames(fxcounts) <- files$sample

# write to file
write.csv(fxcounts, "temp/4.13.20_3primeTagSeq_Salmon_nonribo_counts_attempttorecreate_01.csv", quote = FALSE, row.names = TRUE)

tximport GCF feature table parsing error chromosome • 1.1k views

updated 5.0 years ago by Michael Love 43k • written 5.0 years ago by knholm • 0

score 0 · Answer 1 · 2020-04-13

Michael Love 43k

@mikelove

United States

I don't have suggestions for parsing the file, but any approach that reads in a table matching transcripts to genes will work.
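
For anyone landing here with the same messages, one possible way to read such a table is sketched below. It rests on two assumptions that the thread does not confirm: that the parsing warnings come from readr guessing `chromosome` as a double from the first rows, and that the `select()` error comes from passing `feat_table$product_accession` (the column's values, including `NA`) where a bare column name is expected. The file contents and the `toy_path` variable are hypothetical stand-ins for the real feature table, which has more columns.

```r
library(readr)
library(dplyr)

# Hypothetical toy file standing in for the NCBI feature table
# (made-up rows, only a subset of the real columns).
toy_path <- tempfile(fileext = ".txt")
writeLines(c(
  "chromosome\tproduct_accession\tsymbol\tGeneID",
  "1\tNM_000001.1\tGENE1\t101",
  "1\t\tGENE1\t101",
  "X\tNM_000002.2\tGENE2\t202",
  "Y\tNM_000003.1\tGENE3\t303"
), toy_path)

# Declare column types up front instead of letting readr guess them,
# so "X" and "Y" in the chromosome column are read as text.
feat_table <- read_tsv(
  toy_path,
  col_types = cols(.default = col_character(), GeneID = col_double())
)

# Select by bare column name (not feat_table$column), drop rows that have
# no transcript accession, and keep transcript ID first, gene ID second.
tx2gene <- feat_table %>%
  select(product_accession, symbol) %>%
  filter(!is.na(product_accession)) %>%
  distinct()

print(tx2gene)
```

With the real file, the same `col_types` argument could be passed to the original `read_tsv()` call, and the resulting two-column table used directly as `tx2gene` in `tximport()`.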