Hopefully you are familiar with basic R data types such as vectors, factors, matrices, lists, and data frames, as well as basic data input functions such as read.table() and scan(). If not, a starting point is the documentation that comes with R, especially "An Introduction to R" and "R Data Import / Export".
If you had a matrix of values, where rows were independent of one another and values in each row conformed to the usual t-test assumptions, you might use the genefilter package rowttests() function.
> library(genefilter)
> m <- matrix(rnorm(2667 * 16), nrow=2667, dimnames=list(NULL, LETTERS[1:16]))
> f <- factor(c(rep("Control", 11), rep("Treatment", 5)))
> ttests <- rowttests(m, f)
> head(ttests)
    statistic          dm     p.value
1 -3.73436323 -1.56396205 0.002221023
2  0.33806524  0.22477913 0.740330027
3  0.57756344  0.38064961 0.572731198
4  1.56461409  0.90778264 0.139990940
5 -0.07785181 -0.04329018 0.939047670
6 -0.80640959 -0.36288556 0.433492328
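For reference, the same kind of per-row test can be written in base R. This is a minimal sketch on simulated data, assuming (as I read the genefilter documentation) that rowttests() performs an equal-variance two-sample t-test and reports the difference in group means as dm. The helper name row_t is hypothetical and not part of any package.

```r
set.seed(1)
m <- matrix(rnorm(20 * 16), nrow = 20, dimnames = list(NULL, LETTERS[1:16]))
f <- factor(c(rep("Control", 11), rep("Treatment", 5)))

# Hypothetical helper: equal-variance t-test for one row of the matrix
row_t <- function(x, group) {
    tt <- t.test(x[group == levels(group)[1]],
                 x[group == levels(group)[2]],
                 var.equal = TRUE)
    c(statistic = unname(tt$statistic),
      dm = unname(tt$estimate[1] - tt$estimate[2]),  # difference in group means
      p.value = tt$p.value)
}

# apply() returns one column per row of m, so transpose the result
base_ttests <- as.data.frame(t(apply(m, 1, row_t, group = f)))
head(base_ttests)
```

This is much slower than rowttests() on a large matrix, and it shares the limitations of the row-wise approach.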
HOWEVER, your rows are not independent and individual rows do not follow the assumptions of standard t-tests; the above approach will be under-powered and result in an inflated false discovery rate. Fortunately, you can 'stand on the shoulders of giants' and benefit from a sophisticated understanding of microarray analysis. I used the GEOquery package to retrieve the data, and the limma package for analysis.
library(GEOquery)
library(limma)
I retrieved the data using getGEO(). This returns a list of experiments, although in the present case there is only one experiment.
> gse_list = getGEO("GSE53179")
> class(gse_list)
[1] "list"
> names(gse_list)
[1] "GSE53179_series_matrix.txt.gz"
> gse = gse_list[["GSE53179_series_matrix.txt.gz"]]
> class(gse)
[1] "ExpressionSet"
attr(,"package")
[1] "Biobase"
I took a look at the 'pData' (description of sample phenotypes) of the expression set
> View(pData(gse))
and guessed that you were interested in the characteristics_ch1.1 column. I specified the statistical model I'd like to apply (care needs to be taken here!)
> design = model.matrix(~characteristics_ch1.1, pData(gse))
and then let limma fit the statistical model to each gene
> fit = lmFit(gse, design) ## THIS STEP IS NOT CORRECT!
Finally, I assessed statistical significance and generated a 'top table' of differentially expressed miRNAs.
> topTable(eBayes(fit))
I do not have to use an ExpressionSet in lmFit(); I could have used a matrix() of expression values and created the design matrix from any factor or factors describing the allocation of samples (matrix columns) to treatment groups. Getting the matrix of expression values from files downloaded from GEO, or from some other source, takes some level of R knowledge.
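As a sketch of the matrix-based route, assume a numeric matrix of values already on the log scale, with one column per sample. The simulated matrix expr and the factor group below are placeholders for your own data; they are not the GEO series.

```r
library(limma)

set.seed(1)
# Placeholder data: 100 features x 16 samples, standing in for log-scale expression
expr <- matrix(rnorm(100 * 16, mean = 8), nrow = 100,
               dimnames = list(paste0("feature", 1:100), LETTERS[1:16]))
# One factor entry per matrix column, in column order
group <- factor(c(rep("Control", 11), rep("Treatment", 5)))

design <- model.matrix(~ group)   # columns: (Intercept), groupTreatment
fit <- eBayes(lmFit(expr, design))

# Rank features by evidence for a Treatment vs Control difference
top <- topTable(fit, coef = "groupTreatment", number = 10)
top
```

The only requirement linking the two objects is that the rows of the design matrix are in the same order as the columns of the expression matrix.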
HOWEVER, returning to the lmFit() step, the help page for lmFit says that the expression values should be log-ratios or log-values of expression, and the general assumption is that these have been pre-processed (possibly background corrected and normalized) in a standard microarray work flow. But when I look at the distribution of expression values
> range(exprs(gse))
[1] -18.59629 431352.00000
> hist(exprs(gse))
> table(exprs(gse) < 0)
FALSE  TRUE
22099 19757
it's clear that I do not have log-ratios or log-values of expression: the range is far too large for either, and nearly half the values are negative, whereas log-transformed intensities would typically be positive. So you will have to explore the provenance of the data, identify how it was processed before being added to GEO, and from there determine what remedial steps are required. If the data have been pre-processed in a reasonable manner, there may be a simple transformation that makes them appropriate for analysis in limma. On the other hand, it may be that the provenance of the data is not clear, or that the steps taken were inappropriate or sub-optimal. You might then start from the supplementary files included with the GEO record, do your own pre-processing, and then use limma as above. A starting point for a new analysis might be this article and associated package, but I am not an expert in the analysis of miRNAs.
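To illustrate what a 'simple transformation' could look like, here is a hedged sketch on made-up values. Whether a floor-and-log2 step is appropriate for GSE53179 depends entirely on how the data were processed before submission; the floor of 1 is an arbitrary assumption for illustration, not a recommendation.

```r
# Made-up values with a similar shape to the range reported above
x <- matrix(c(-18.6, 0.5, 12, 250, 4096, 431352), nrow = 2)

# Fraction of values that cannot be log-transformed directly
mean(x <= 0)

# Assumed remedy: raise everything below an arbitrary floor of 1, then take log2
floor_value <- 1
x_log <- log2(pmax(x, floor_value))
range(x_log)
```

A floor discards information in the low-intensity values, so it is worth checking how many values it affects before relying on it.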
I recommend you read the limma User's Guide, which explains in detail how to use limma to analyze microarray data. Anything else I write here would just be duplicating the explanation in the guide.
I read everything and I'm not sure I can use limma.
My data is neither a raw file nor from a GDS experiment. I think I should use another method in R for the t-test. Do you have any idea?
If your data is from GEO, you can use the GEOquery package to get it into an ExpressionSet. But you can also use limma directly on your data matrix.
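For the second route, here is a minimal sketch of getting a delimited text file into a numeric matrix with base R. The file is a temporary one written by the example itself, and its layout (a first column of feature IDs, then one column per sample) is an assumption about what your file looks like.

```r
# Write a tiny tab-delimited file to stand in for your own data file
path <- tempfile(fileext = ".txt")
writeLines(c("ID\tA\tB\tC",
             "mir1\t7.1\t6.9\t8.4",
             "mir2\t5.0\t5.2\t4.8"), path)

# Read it, using the first column as row names, and convert to a matrix
df <- read.table(path, header = TRUE, sep = "\t", row.names = 1,
                 check.names = FALSE)
expr <- as.matrix(df)

dim(expr)
is.numeric(expr)
```

The resulting matrix can then be passed to lmFit() together with a design matrix built from a factor describing the columns.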