Question

transcriptR ChIP-seq RNA-seq bam Error in mergeNamedAtomicVectors

0

Entering edit mode

jason.williams ▴ 10

@jasonwilliams-21306

Last seen 5.0 years ago

Hi,

I'm following this vignette (https://rdrr.io/bioc/transcriptR/f/vignettes/transcriptR.Rmd) for transcriptR, when estimating gap distance I get the following error.

# create a range of gap distances to test
# from 0 bp to 10000 bp with the step of 100 bp
gd <- seq(from = 0, to = 10000, by = 100)
estimateGapDistance(object = tds, annot = knownGene.genes, filter.annot = TRUE, 
                    fpkm.quantile = 0.25, gap.dist.range = gd)
# view the optimal gap distance minimazing the sum of two errors
tds

Error in mergeNamedAtomicVectors(seqlengths(x), seqlengths(y), what = c("sequence", : sequences chr1, chr2, chr3, chr4, chr5, chr6, chr7, chr8, chr9, chr10, chr11, chr12, chr13, chr14, chr15, chr16, chr17, chr18, chr19, chr20, chr21, chr22, chrX, chrY, chrM have incompatible seqlengths: - in 'x': 249250621, 243199373, 198022430, 191154276, 180915260, 171115067, 159138663, 146364022, 141213431, 135534747, 135006516, 133851895, 115169878, 107349540, 102531392, 90354753, 81195210, 78077248, 59128983, 63025520, 48129895, 51304566, 155270560, 59373566, 16571 - in 'y': 248956422, 242193529, 198295559, 190214555, 181538259, 170805979, 159345973, 145138636, 138394717, 133797422, 135086622, 133275309, 114364328, 107043718, 101991189, 90338345, 83257441, 80373285, 58617616, 64444167, 46709983, 50818468, 156040895, 57227415, 16569

I suspect this has something to do with (https://support.bioconductor.org/p/101694/).

But I am really not sure, any help would be appreciated. I processed these paired end reads using STAR to generate this bam file. There doesn't seem to be any advice for preprocessing to generate bam files as input for transcriptR. I assume reads should be aligned to the genome (in the case of this vignette hg19)?

Best,

Jason

TranscriptR RNA-seq ChIP-seq • 1.8k views

ADD COMMENT • link updated 5.5 years ago by James W. MacDonald 68k • written 5.5 years ago by jason.williams ▴ 10

0

Entering edit mode

When you are writing out your post, there is a box right below where you are typing that shows what your post will eventually look like. If you re-read your post, do you think that is particularly legible? Ideally you would use the preview box to ensure that what you are posting is legible. Part of getting people to help is on you to make it easy for them to do so.

Also, when you post a question there is this sentence right above the box where you type that says

This site uses markdown. See tutorial post for advise on post formatting.*

Please read that tutorial post and then fix your question.

ADD REPLY • link 5.5 years ago James W. MacDonald 68k

0

Entering edit mode

Hi James. I have edited it now. The preview actually wasn't the same as what was being posted. When I removed the code chunk title, the post became the same as the preview.

ADD REPLY • link 5.5 years ago jason.williams ▴ 10

score 1 · Answer 1 · 2019-10-31

So if I do

> library(BSgenome.Hsapiens.UCSC.hg19)
> seqlengths(Hsapiens)[1:24]
     chr1      chr2      chr3      chr4      chr5      chr6      chr7      chr8
249250621 243199373 198022430 191154276 180915260 171115067 159138663 146364022
     chr9     chr10     chr11     chr12     chr13     chr14     chr15     chr16
141213431 135534747 135006516 133851895 115169878 107349540 102531392  90354753
    chr17     chr18     chr19     chr20     chr21     chr22      chrX      chrY
 81195210  78077248  59128983  63025520  48129895  51304566 155270560  59373566
> library(BSgenome.Hsapiens.UCSC.hg38)

Attaching package: ‘BSgenome.Hsapiens.UCSC.hg38’

The following object is masked from ‘package:BSgenome.Hsapiens.UCSC.hg19’:

    Hsapiens

> seqlengths(Hsapiens)[1:24]
     chr1      chr2      chr3      chr4      chr5      chr6      chr7      chr8
248956422 242193529 198295559 190214555 181538259 170805979 159345973 145138636
     chr9     chr10     chr11     chr12     chr13     chr14     chr15     chr16
138394717 133797422 135086622 133275309 114364328 107043718 101991189  90338345
    chr17     chr18     chr19     chr20     chr21     chr22      chrX      chrY
 83257441  80373285  58617616  64444167  46709983  50818468 156040895  57227415

You can see that one of the things you are using is based on GRCh37, and the other is based on GRCh38 I would hazard a guess that you have aligned your data to GRCh38, and are blindly following the vignette, which uses TxDb.Hsapiens.UCSC.hg19.knownGene, where hg19 is the UCSC identifier for GRCh37. You seem to have realized that you have mis-matched your genomes, but I am not sure I agree with your prescription.

You could

A.) Re-align your data to GRCh37 in order to be able to exactly follow what it says in the vignette.

B.) Use the TxDb.Hsapiens.UCSC.hg38.knownGene, which matches your aligned data, and is not based on a ten year old genome build, which (apparently little known fact) is something like 70% based on sequencing from one dude from western New York state? H/T Kevin Blighe

The vignette is intended to be an example of how one could use a package to do a common task, not an edict of how that task must be accomplished, so all things equal I would choose B.