Error in DGEList(counts = cnt, group = group) : non-numeric values found in counts
Luis • 0
@681bb58b
Costa Rica

When I tried to create a DGEList with edgeR, I got the following error: "Error in DGEList(counts = cnt, group = group) : non-numeric values found in counts". I could not resolve it. [screenshot not preserved]

sessionInfo(): [screenshot not preserved]

Database information: [two screenshots not preserved]

I hope you can help me find out what causes the error and how to solve it.


Please show code and output as text rather than as screenshots. The error message concerns cnt, but you haven't shown the code that created cnt, so we can't tell you how to solve the problem.
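A minimal sketch of what such a text report might look like, assuming the object is called cnt as in the error message. The toy matrix and the name report are invented here only so that the snippet runs on its own; with real data only the summary calls would be needed.

```r
# Toy stand-in for the real object (hypothetical values, for illustration only)
cnt <- matrix(c(8228L, 1L, 573L, 0L), nrow = 2,
              dimnames = list(c("geneA", "geneB"), c("sample1", "sample2")))

# Text summaries that can be pasted into a post instead of a screenshot
report <- list(
  class = class(cnt),        # matrix, data.frame, ...
  mode  = storage.mode(cnt), # "integer" or "double" for counts
  dim   = dim(cnt)           # number of genes and samples
)
str(report)
str(cnt)             # compact view of the object's structure
print(cnt[1:2, 1:2]) # top-left corner of the table
# sessionInfo()      # package versions, also as plain text
```

Output like this shows at a glance whether the object is numeric and how large it is, which a cropped screenshot often does not.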


cnt:

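library(TCGAbiolinks)         # provides GDCquery(), GDCdownload(), GDCprepare()
library(SummarizedExperiment) # provides assay()
cancer <- "TCGA-CHOL"         # project ID, as reported in the output below
query <- GDCquery(project = cancer, data.category = "Transcriptome Profiling", 
                  data.type = "Gene Expression Quantification", 
                  workflow.type = "STAR - Counts")
GDCdownload(query, method = "api", files.per.chunk = 100)
cnt <- GDCprepare(query = query) # returns a SummarizedExperiment
cnt <- assay(cnt)                # extract the first assay as a matrix
head(cnt)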
Output:
--------------------------------------
o GDCquery: Searching in GDC database
--------------------------------------
Genome of reference: hg38
--------------------------------------------
oo Accessing GDC. This might take a while...
--------------------------------------------
ooo Project: TCGA-CHOL
--------------------
oo Filtering results
--------------------
ooo By data.type
ooo By workflow.type
----------------
oo Checking data
----------------
ooo Checking if there are duplicated cases
ooo Checking if there are results for the query
-------------------
o Preparing output
-------------------
Downloading data for project TCGA-CHOL
Of the 44 files for download 44 already exist.
All samples have been already downloaded
|================================================|100%                      Completed after 4 s 
Starting to add information to samples
 => Add clinical information to samples
 => Adding TCGA molecular information from marker papers
 => Information will have prefix 'paper_' 
chol subtype information from:doi:10.1016/j.celrep.2017.02.033
Available assays in SummarizedExperiment : 
  => unstranded
  => stranded_first
  => stranded_second
  => tpm_unstrand
  => fpkm_unstrand
  => fpkm_uq_unstrand
> head(cnt) # After execution, the cholangiocarcinoma RNA sequencing count data are downloaded and stored as "cnt", where rows represent Ensembl gene IDs and columns represent sample IDs. Note the digits at positions 14-15 of the sample IDs: 01 to 09 indicate tumors, and 10 to 19 indicate normal tissues.
                   TCGA-W5-AA38-01A-11R-A41I-07
ENSG00000000003.15                         8228
ENSG00000000005.6                             1
ENSG00000000419.13                         1694
ENSG00000000457.14                          385
ENSG00000000460.17                          214
ENSG00000000938.13                          181
                   TCGA-ZH-A8Y6-01A-11R-A41I-07
ENSG00000000003.15                          573
ENSG00000000005.6                             0
ENSG00000000419.13                         1309
ENSG00000000457.14                         1193
ENSG00000000460.17                          227
ENSG00000000938.13                          103
                   TCGA-4G-AAZT-01A-11R-A41I-07
ENSG00000000003.15                         7504
ENSG00000000005.6                             0
ENSG00000000419.13                         1120
ENSG00000000457.14                          300
ENSG00000000460.17                           71
ENSG00000000938.13                          254
                   TCGA-W5-AA30-11A-11R-A41I-07
ENSG00000000003.15                         3075
ENSG00000000005.6                             1
ENSG00000000419.13                          635
ENSG00000000457.14                          313
ENSG00000000460.17                           81
ENSG00000000938.13                          233
                   TCGA-W5-AA31-01A-11R-A41I-07
ENSG00000000003.15                        13662
ENSG00000000005.6                             0
ENSG00000000419.13                         2135
ENSG00000000457.14                         1841
ENSG00000000460.17                          568
ENSG00000000938.13                          511
                   TCGA-W5-AA31-11A-11R-A41I-07
ENSG00000000003.15                         4394
ENSG00000000005.6                             6
ENSG00000000419.13                          859
ENSG00000000457.14                          450
ENSG00000000460.17                          108
ENSG00000000938.13                          201
                   TCGA-W5-AA34-01A-11R-A41I-07
ENSG00000000003.15                         8124
ENSG00000000005.6                             0
ENSG00000000419.13                         1769
ENSG00000000457.14                          954
ENSG00000000460.17                          280
ENSG00000000938.13                          222
                   TCGA-ZH-A8Y1-01A-11R-A41I-07
ENSG00000000003.15                         2038
ENSG00000000005.6                             0
ENSG00000000419.13                          839
ENSG00000000457.14                         1271
ENSG00000000460.17                          390
ENSG00000000938.13                          298
                   TCGA-W5-AA2T-01A-12R-A41I-07
ENSG00000000003.15                        10316
ENSG00000000005.6                             1
ENSG00000000419.13                         1501
ENSG00000000457.14                         1330
ENSG00000000460.17                          482
ENSG00000000938.13                           56
                   TCGA-W5-AA36-01A-11R-A41I-07
ENSG00000000003.15                         2492
ENSG00000000005.6                             0
ENSG00000000419.13                         1267
ENSG00000000457.14                          365
ENSG00000000460.17                           94
ENSG00000000938.13                          191
                   TCGA-W5-AA2H-01A-31R-A41I-07
ENSG00000000003.15                          226
ENSG00000000005.6                             5
ENSG00000000419.13                         1139
ENSG00000000457.14                          461
ENSG00000000460.17                          130
ENSG00000000938.13                         1530
                   TCGA-W5-AA2Q-01A-11R-A41I-07
ENSG00000000003.15                         4649
ENSG00000000005.6                             0
ENSG00000000419.13                         1306
ENSG00000000457.14                         3975
ENSG00000000460.17                          810
ENSG00000000938.13                           83
                   TCGA-W5-AA2U-01A-11R-A41I-07
ENSG00000000003.15                         7422
ENSG00000000005.6                             1
ENSG00000000419.13                         1192
ENSG00000000457.14                          986
ENSG00000000460.17                          243
ENSG00000000938.13                          303
                   TCGA-3X-AAVA-01A-11R-A41I-07
ENSG00000000003.15                         4252
ENSG00000000005.6                             1
ENSG00000000419.13                         1251
ENSG00000000457.14                          581
ENSG00000000460.17                          211
ENSG00000000938.13                          334
                   TCGA-W5-AA2U-11A-11R-A41I-07
ENSG00000000003.15                         2462
ENSG00000000005.6                             1
ENSG00000000419.13                          653
ENSG00000000457.14                          279
ENSG00000000460.17                           28
ENSG00000000938.13                          310
                   TCGA-ZD-A8I3-01A-11R-A41I-07
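
The comment on head(cnt) describes how characters 14-15 of a sample ID encode the tissue type. A small sketch of that rule, using two sample IDs copied from the output above; the variable names are chosen here for illustration.

```r
# Sample IDs copied from the head(cnt) output above
barcodes <- c("TCGA-W5-AA38-01A-11R-A41I-07",
              "TCGA-W5-AA30-11A-11R-A41I-07")

# Characters 14-15 hold the sample type code
code <- substring(barcodes, 14, 15)

# 01-09 = tumor, 10-19 = normal, as stated in the script comment
type <- ifelse(as.integer(code) <= 9, "Cancer", "Normal")

data.frame(barcode = barcodes, code = code, type = type)
```

Running this on all column names of the real matrix, for example with table(substring(colnames(cnt), 14, 15)), would also show whether any codes other than 01 and 11 are present.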

Full pipeline:

source("Scripts/GDCquery.R") # Run the R code in "GDCquery.R" to download the data. The file "GDCquery.R" can be obtained from Supplementary files/Scripts
head(cnt) # After execution, the cholangiocarcinoma RNA sequencing count data are downloaded and stored as "cnt", where rows represent Ensembl gene IDs and columns represent sample IDs. Note the digits at positions 14-15 of the sample IDs: 01 to 09 indicate tumors, and 10 to 19 indicate normal tissues.

# 1.2 Conversion of Ensembl gene IDs to gene symbols
gtf_v22 <- rtracklayer::import('gencode.v22.annotation.gtf') # Import the annotation file into R from its storage path; the annotation file (gencode.v22.annotation.gtf) can be obtained from Supplementary files
source("Scripts/gtf_v22.R") # Run the R code in "gtf_v22.R", which can be obtained from Supplementary files/Scripts
cnt=ann(cnt,gtf_v22) # Apply the function "ann" to convert the Ensembl gene IDs to gene symbols

# 1.3 Filter low-expressed genes
# Click run to install the R package "edgeR"
BiocManager::install("edgeR")
# Click run to load the R package "edgeR"
library(edgeR)
# Run the following R code to keep genes with counts per million (CPM) values greater than one in at least two samples
keep <- rowSums(cpm(cnt)>1)>=2 
cnt <- as.matrix(cnt[keep,])

## 2 Differential expression analysis through "limma"
BiocManager::install("limma") # Click run to install R package "limma"
# Click run to load R packages "limma", "edgeR"
library(limma)
library(edgeR)
# Run the following R code to create design matrix
group <- substring(colnames(cnt),14,15) # Extract group information
group [group %in% "01"] <- "Cancer" # set "01" as tumor tissue
group [group %in% "11"] <- "Normal" # set "11" as normal tissue
group <- factor (group, levels = c("Normal","Cancer"))
design <- model.matrix (~group) # Create design matrix
rownames(design) <- colnames(cnt)
dge <- DGEList(counts = cnt, group = group) # Create the DGEList object
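
The filtering step in section 1.3 only makes sense when cnt holds numbers. As a rough illustration of what it computes, the sketch below reproduces the CPM filter in base R on an invented toy matrix. It assumes that, for a plain matrix without normalization factors, CPM is the count divided by the column total times one million; the names counts, cpm_manual, and filtered are hypothetical.

```r
# Toy count matrix (invented values): 3 genes x 3 samples, filled column by column
counts <- matrix(c(500, 0, 1,
                   400, 0, 0,
                   100, 0, 999999),
                 nrow = 3,
                 dimnames = list(c("g1", "g2", "g3"), c("s1", "s2", "s3")))

# Library size per sample = column total
lib_size <- colSums(counts)

# Counts per million: divide each column by its library size, scale by 1e6
cpm_manual <- t(t(counts) / lib_size) * 1e6

# Keep genes with CPM > 1 in at least two samples
keep <- rowSums(cpm_manual > 1) >= 2
filtered <- counts[keep, , drop = FALSE]
filtered
```

Here g2 is dropped because it has zero counts in every sample, while g1 and g3 pass the threshold in at least two samples.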
Yunshun Chen ▴ 900
@yunshun-chen-5451
Australia

As the error message suggests, your cnt is not a numeric count matrix. Based on your screenshot, your cnt seems to only contain the sample names.
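
A quick way to confirm this in text form is to inspect the object before passing it to DGEList(). The sketch below is only an illustration: describe_counts is a hypothetical helper, not an edgeR function, and the two small matrices are built from values and sample IDs shown earlier in this thread.

```r
# Hypothetical helper for illustration; not part of edgeR
describe_counts <- function(x) {
  m <- as.matrix(x)
  list(rows = nrow(m), cols = ncol(m), is_numeric = is.numeric(m))
}

# A proper count matrix: numeric values with genes as rows, samples as columns
numeric_counts <- matrix(
  c(8228, 1, 573, 0), nrow = 2,
  dimnames = list(c("ENSG00000000003.15", "ENSG00000000005.6"),
                  c("TCGA-W5-AA38-01A-11R-A41I-07", "TCGA-ZH-A8Y6-01A-11R-A41I-07")))

# An object that only holds sample names: a character matrix
names_only <- matrix(colnames(numeric_counts), ncol = 1)

str(describe_counts(numeric_counts)) # is_numeric is TRUE
str(describe_counts(names_only))     # is_numeric is FALSE
```

If is_numeric is FALSE for the real cnt, the problem lies in a step that ran before DGEList().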


Thank you so much!

@gordon-smyth
WEHI, Melbourne, Australia

It is clear from a Google search that you are following a published script from Liu et al (2021). If the script does not work for you, then you should write to the authors of that article. We cannot debug the scripts for you here. The problem is not with edgeR or DGEList() -- the edgeR functions are working correctly. My guess is that there is a problem with the line cnt=ann(cnt,gtf_v22).
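
One way to test this guess is to check the object right after the annotation step. The sketch below assumes, purely for illustration, that such a step returns a data frame with a gene-symbol column next to the counts. What ann() actually returns is not shown in this thread, and the object name, column names, and values here are all hypothetical.

```r
# Hypothetical result of an annotation step (structure assumed, values invented)
annotated <- data.frame(
  symbol  = c("geneA", "geneB", "geneC"),
  sample1 = c(10, 0, 25),
  sample2 = c(12, 3, 30),
  stringsAsFactors = FALSE
)

# as.matrix() on mixed column types yields a character matrix,
# which is the kind of input that a count-based function would reject
is.numeric(as.matrix(annotated))

# Move the identifier column into the row names and keep only the count columns
counts <- as.matrix(annotated[, -1, drop = FALSE])
rownames(counts) <- annotated$symbol

# Stop early if the result is still not numeric
stopifnot(is.numeric(counts))
counts
```

If the real object has a different shape, the same check (is.numeric on the matrix) placed before and after cnt=ann(cnt,gtf_v22) would still show which step introduces the non-numeric values.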

Reference

Liu, S., Wang, Z., Zhu, R., Wang, F., Cheng, Y., Liu, Y. Three Differential Expression Analysis Methods for RNA Sequencing: limma, EdgeR, DESeq2. J. Vis. Exp. (175), e62528, doi:10.3791/62528 (2021)


I understand, thank you very much for all the support and collaboration!
