Error in featureCounts matrix
ryann • 0
@b1071729
Last seen 10 months ago
Canada

I am analyzing RNA-seq data from 24 mouse samples (I am a beginner at both coding and RNA-seq analysis; I got the mouse GTF file from NCBI). I have attached screenshots of the featureCounts summary and of the .txt file I received as output. Most columns of the .txt file look normal, but after converting it from .txt to .xlsx in R a few columns are blank, which looks like a column alignment problem. Am I missing something in this conversion, or is there something wrong with my featureCounts output file? As the attachment shows, the chromosome information was not removed and count values are missing for a couple of samples.

# Read text file. I have to specify fill = TRUE because otherwise I get an error message:
#   Error in scan(file = file, what = what, sep = sep, quote = quote, dec = dec,  :
#   line 218 did not have 30 elements
data <- read.table("featurecounts.txt", header = TRUE, fill = TRUE)

# Omit columns 2 to 6
columns_to_keep <- c(1, (7:ncol(data)))
data_subset <- data[, columns_to_keep]

# Write as Excel file
write.xlsx(data_subset, "featurecounts_final.xlsx")

output from R, a featureCounts Excel with matrix issues

output from featureCounts program, the text file

summary file from featureCounts program

@james-w-macdonald-5106
Last seen 2 hours ago
United States

As a general rule, there is no reason to write out data and then read it back into R. After running featureCounts, you can instantiate a DGEList object and then analyze it using edgeR or the limma-voom pipeline. At the end you might want to output the results from topTags or topTable to an Excel workbook, but I find it's better to go straight to Glimma to make interactive MA plots, which are usually more informative than a static Excel workbook.
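As a rough sketch of why the round trip through a text file is unnecessary: the list returned by featureCounts already holds the counts as a genes-by-samples matrix in its `counts` element, separate from the `annotation` columns, so nothing has to be dropped by column position. The `fc` object below is a hand-built mock with made-up gene and sample names (an assumption for illustration, not real output), so that the snippet runs without BAM files.

```r
# Mock stand-in for the list returned by Rsubread::featureCounts().
# Gene IDs, sample names and values are hypothetical.
fc <- list(
  counts = matrix(
    c(5L, 0L, 12L, 7L, 3L, 9L),
    nrow = 3,
    dimnames = list(c("Gene1", "Gene2", "Gene3"),
                    c("sampleA.bam", "sampleB.bam"))
  ),
  annotation = data.frame(
    GeneID = c("Gene1", "Gene2", "Gene3"),
    Chr    = c("NC_000067", "NC_000067", "NC_000067"),
    Length = c(101L, 202L, 81L)
  )
)

# The count matrix is already free of the chromosome/start/end columns.
dim(fc$counts)

# If a flat table is still wanted, build it from the matrix directly.
count_table <- data.frame(GeneID = rownames(fc$counts), fc$counts,
                          check.names = FALSE)
rownames(count_table) <- NULL
```

With a real featureCounts result, the same `fc$counts` matrix is what gets carried into the DGEList.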

You might also consider using the built-in SAF annotation for mouse rather than the GTF. NCBI uses names like NC_000067 instead of, say, chr1 for chromosome 1.

@gordon-smyth
Last seen 3 hours ago
WEHI, Melbourne, Australia

You can download up-to-date Rsubread SAF files for the latest NCBI RefSeq annotation from https://bioinf.wehi.edu.au/Rsubread/annot.

Continuing James MacDonald's comments, the errors you have are not from featureCounts itself but rather from the steps used to convert the output to Excel. You can avoid all the Excel problems by using R code:

library(Rsubread)
library(edgeR)
fc <- featureCounts(...)
y <- featureCounts2DGEList(fc)
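
If the text file does have to be read back in for some reason, the sketch below shows a stricter way to do it in base R. It rests on assumptions, since the screenshots are not available here: that the file is tab-delimited with one leading comment line, and that the misalignment comes from `read.table` defaults (splitting on any whitespace and treating `'` as a quote character), which `fill = TRUE` then hides by padding short rows. The file contents are a small mock written to a temporary path, not real featureCounts output.

```r
# Write a small mock featureCounts-style table (hypothetical contents).
path <- tempfile(fileext = ".txt")
rows <- list(
  c("Geneid", "Chr", "Start", "End", "Strand", "Length",
    "sampleA.bam", "sampleB.bam"),
  c("Gene1", "NC_000067", "100", "200", "+", "101", "5", "7"),
  # An apostrophe in an ID is enough to confuse the default quote handling.
  c("Gene2'", "NC_000067;NC_000067", "300;500", "400;600", "+;+", "202", "0", "3"),
  c("Gene3", "NC_000067", "10", "90", "-", "81", "12", "9")
)
writeLines(c("# mock header comment line",
             vapply(rows, paste, character(1), collapse = "\t")), path)

# Read strictly as tab-delimited: no quote characters, no comment characters,
# and fill = FALSE so a malformed row raises an error instead of being padded.
fc_table <- read.delim(path, skip = 1, quote = "", comment.char = "",
                       fill = FALSE, check.names = FALSE,
                       stringsAsFactors = FALSE)

# Drop the annotation columns by name rather than by position.
annotation_cols <- c("Chr", "Start", "End", "Strand", "Length")
counts_only <- fc_table[, !(names(fc_table) %in% annotation_cols), drop = FALSE]
```

Selecting by name means a shifted column is noticed right away rather than producing blank cells further down the sheet.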
