error reading ballgown object
l.j.leach • 0
@ljleach-14914
University of Birmingham

Please could someone advise on how to solve the following error when trying to read in a ballgown object.

Tue Jan 30 13:15:04 2018
Tue Jan 30 13:15:04 2018: Reading linking tables
Tue Jan 30 13:15:05 2018: Reading intron data files
Tue Jan 30 13:15:08 2018: Merging intron data
Tue Jan 30 13:15:09 2018: Reading exon data files
Tue Jan 30 13:15:17 2018: Merging exon data
Tue Jan 30 13:15:18 2018: Reading transcript data files
Error in scan(file = file, what = what, sep = sep, quote = quote, dec = dec,  : 
  line 70810 did not have 12 elements
In addition: Warning messages:
1: package ‘dplyr’ was built under R version 3.3.3 
2: package ‘devtools’ was built under R version 3.3.3 

The traceback, shown below, was not much help to me:
> traceback()
8: scan(file = file, what = what, sep = sep, quote = quote, dec = dec, 
       nmax = nrows, skip = 0, na.strings = na.strings, quiet = TRUE, 
       fill = fill, strip.white = strip.white, blank.lines.skip = blank.lines.skip, 
       multi.line = FALSE, comment.char = comment.char, allowEscapes = allowEscapes, 
       flush = flush, encoding = encoding, skipNul = skipNul)
7: read.table(file, header = TRUE, sep = "\t", colClasses = cc, 
       quote = "")
6: .readTranscript(f, meas)
5: ballgown(samples = samples, pData = pheno_data) at rnaseq_ballgownLL.R#16
4: eval(expr, envir, enclos)
3: eval(ei, envir)
2: withVisible(eval(ei, envir))
1: source("rnaseq_ballgownLL.R")

The phenotype data file was read into R correctly:

> pheno_data
  ids population
1  HJ     China1
2  HL     China2
3  LS     China3
4  V7     China4

I have checked the gtf files for each sample and cannot find a problem with the format. 

Also, I was able to run the tutorial data and read in the bg object correctly, so the problem must relate to my input files somehow.

The code used was as follows:

library(ballgown)
library(RSkittleBrewer)
library(genefilter)
library(dplyr)
library(devtools)

inputdir <- "C:/Users/leachlj/Documents/Data/Potato/RNA_seq_Round2/BallGownAnalysis/"
pheno_data_file <- paste0(inputdir, "round2_rnaseq.txt")

## Read phenotype sample data
pheno_data <- read.csv(pheno_data_file)

## Read in expression data to create a BallGown object
samples<-c("HJ","HL","LS","V7")
bg_pot <- ballgown(samples=samples, pData=pheno_data)
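Since `samples` here is a vector of relative directory names, a quick sanity check before calling `ballgown()` is to confirm that each sample directory holds the .ctab files ballgown reads. The following is only a sketch: `missing_ctabs` is a hypothetical helper, not part of ballgown, and the temporary directories stand in for the real sample folders.

```r
# Hypothetical helper (not part of ballgown): list which of the expected
# .ctab files are missing from each sample directory.
expected <- c("e_data.ctab", "i_data.ctab", "t_data.ctab", "e2t.ctab", "i2t.ctab")

missing_ctabs <- function(sample_dirs) {
  out <- lapply(sample_dirs, function(d) {
    expected[!file.exists(file.path(d, expected))]
  })
  names(out) <- basename(sample_dirs)
  out
}

# Toy demonstration: temporary directories standing in for HJ and HL.
root <- tempfile("bg_demo")
dir.create(file.path(root, "HJ"), recursive = TRUE)
dir.create(file.path(root, "HL"), recursive = TRUE)
file.create(file.path(root, "HJ", expected))
file.create(file.path(root, "HL", expected[-3]))  # leave out t_data.ctab on purpose

res <- missing_ctabs(file.path(root, c("HJ", "HL")))
print(res)
```

An empty entry means that directory has all five files; anything listed is missing.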

Many thanks for any advice you can give.

Lindsey

Alyssa Frazee ▴ 210
@alyssa-frazee-6710
San Francisco, CA, USA

How was the ballgown input data created? Do you have .ctab files? Did you use Cufflinks, StringTie, or something else?

This looks like an issue with the formatting of one of the .ctab files used as ballgown input -- it's expecting TSV, but finding something with characters it can't parse (possibly quotes, too many tabs, etc). If we can figure out that issue, either we can try to make ballgown handle it (since this is a bug if it's valid TSV) or we can be sure the input programs are all producing valid TSVs.
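One way to hunt for such a line is to count the tab-separated fields on every line with base R's `count.fields()`. This is a rough sketch on a made-up four-column file, not a real t_data.ctab. It uses `quote = ""` and the default `comment.char = "#"` to mirror the `read.table()` call in the traceback, then repeats the count with `#` treated as ordinary text for comparison.

```r
# Toy stand-in for a t_data.ctab file (hypothetical contents, 4 columns only).
ctab <- tempfile(fileext = ".ctab")
writeLines(c(
  "t_id\tgene_id\tgene_name\tFPKM",
  "1\tMSTRG.1\tgeneA\t1.5",
  "2\tMSTRG.2\tB5 #5 (example)\t0.07",
  "3\tMSTRG.3\tgeneC\t2.0"
), ctab)

# Field counts with quote = "" and the default comment.char = "#":
# anything after a '#' on a line is dropped before counting.
n_default <- count.fields(ctab, sep = "\t", quote = "")

# Field counts when '#' is treated as ordinary text.
n_literal <- count.fields(ctab, sep = "\t", quote = "", comment.char = "")

# File lines whose field count differs from the header's are the suspects.
bad_lines <- which(n_default != n_default[1])
print(bad_lines)
```

If a line is flagged under the default settings but not with `comment.char = ""`, a `#` in one of its fields is the likely culprit.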


Dear Alyssa,

Thank you so much for your rapid reply. The input data was created with StringTie, using the same commands as in the tutorial procedure from the Pertea paper (2016). This worked well for me on the tutorial data, which could be read by ballgown.

I do have all the .ctab files in addition to the gtf for each sample (potato RNAseq), and will try to check their format now and get back to you.


I have figured it out now thanks to your response :-) thank you so much for helping to think it through!

I checked the t_data.ctab files produced by stringtie and they are tab separated with the correct number of fields.

However, at line 70810 it was:

70810 ST4.03ch12 - 4082665 4086448 PGSC0003DMT400000957 5 897 MSTRG.24856 B5 #5 (cytochrome b5 family protein #5) 0.528714 0.065659

I suspected that ballgown did not like the # as part of the gene name, so I replaced it with "num" as follows:

original gene name: B5 #5 (cytochrome b5 family protein #5)

replacement gene name: B5 num5 (cytochrome b5 family protein num5)

With these changes, the ballgown object is read in without any problem :-)
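A likely explanation, consistent with the traceback above, is that `read.table()` treats `#` as the start of a comment by default (`comment.char = "#"`), and the call in the traceback sets `quote = ""` but leaves `comment.char` alone. The rest of the line after the first `#` is then discarded, so the row comes up short of fields. The sketch below uses made-up toy data to reproduce the error and show two workarounds. The second one only helps when you call `read.table()` yourself; it does not change how ballgown reads the files.

```r
# Toy table; the last row has '#' inside a field (hypothetical data).
tsv <- tempfile(fileext = ".ctab")
writeLines(c(
  "t_id\tgene_name\tFPKM",
  "1\tgeneA\t1.5",
  "2\tB5 #5 (cytochrome b5 family protein #5)\t0.065659"
), tsv)

# Same arguments as the read.table() call in the traceback: this fails,
# because everything after the first '#' is discarded as a comment.
res <- try(read.table(tsv, header = TRUE, sep = "\t", quote = ""), silent = TRUE)
failed <- inherits(res, "try-error")

# Workaround 1: write a copy of the file with '#' replaced by "num".
fixed <- tempfile(fileext = ".ctab")
writeLines(gsub("#", "num", readLines(tsv), fixed = TRUE), fixed)
ok1 <- read.table(fixed, header = TRUE, sep = "\t", quote = "")

# Workaround 2: tell read.table() that '#' is ordinary text.
ok2 <- read.table(tsv, header = TRUE, sep = "\t", quote = "", comment.char = "")
```

Rewriting the files, as in workaround 1, is the same fix as the manual edit described above, just applied to every line at once.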
