Short answer: You don't use spike-ins for RNA-Seq normalization for the same reason that you use an internal control gene in qPCR rather than adding a spike-in and using that as the control.
Long answer:
The idea that spike-ins are more reliable for normalisation is based on a misunderstanding of the purpose of RNA-Seq normalisation. As this is such a common question, I'll answer at some length.
First, note that in a typical experiment, the total amount of RNA extracted from each sample is usually of little interest. If we extract a bit more RNA from one sample than from another, this just means that there might have been a few more cells in it, which may have been caused by the treatment, but may also simply be because we seeded a few more cells initially, or pipetted a bit differently, or whatever. Even if it was the treatment that caused more cells to grow and hence more RNA to be yielded, this is not what we want to measure in an RNA-Seq experiment. There are other assays to measure growth.
Furthermore, the amount of RNA yielded by a sample has little to do with the number of reads obtained from its library.
The ratio of technical spike-ins to biological genes, however, does depend on the sample's total RNA yield, because we always spike in the exact same amount of the spike-in mix, while the biological amount varies from sample to sample.
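A toy calculation makes this concrete (all numbers are made up): the same amount of spike-in mix goes into two samples with different RNA yields, and the spike-in share of the material changes even though nothing biological happened.

```python
# Same 10 ng of spike-in mix added to two samples with different RNA yields
# (numbers are made up for illustration).
spike_ng = 10.0
sample_rna_ng = {"control": 1000.0, "treated": 2000.0}

for name, rna_ng in sample_rna_ng.items():
    spike_fraction = spike_ng / (spike_ng + rna_ng)
    print(f"{name}: {spike_fraction:.2%} of the material is spike-in")

# control: ~0.99%, treated: ~0.50% -- the spike-in share halves simply
# because the treated sample yielded twice as much total RNA, e.g. because
# it happened to contain more cells.
```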
In a typical experiment comparing treated and control samples, one hopes to find a number of genes that respond to the treatment, while assuming that a large number of genes, especially the so-called housekeeping genes, stay at the same expression level. If we know that a given gene is expressed at 10x the level of a housekeeping gene in treated samples but only 4x in control samples, it is differentially expressed. If, however, we know that the gene's transcripts amount to 8 femtomoles in one sample and only 6 fmol in another, this could just as well be because there were more cells in the second sample.
Normalization by comparing to the bulk of other genes removes differences in initial total material or total number of reads, and this is usually what we want.
Normalization with technical spike-ins, however, preserves differences in starting amount, and usually, this is not what we want!
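To show what "comparing to the bulk of other genes" looks like in practice, here is a minimal sketch of a median-of-ratios size-factor calculation (the idea behind DESeq2's default normalization). The count matrix is made up, and genes with zero counts are ignored for simplicity.

```python
import numpy as np

# Toy count matrix: rows are genes, columns are samples (made-up numbers;
# sample 2 was simply sequenced about twice as deeply).
counts = np.array([
    [100., 220.,  95.],
    [ 50., 110.,  48.],
    [300., 650., 310.],
    [ 10.,  25.,  11.],
])

# Median-of-ratios size factors:
# 1. Build a per-gene pseudo-reference: the geometric mean across samples.
# 2. For each sample, take the median ratio of its counts to that reference.
log_counts = np.log(counts)
log_geo_mean = log_counts.mean(axis=1)            # per-gene log geometric mean
log_ratios = log_counts - log_geo_mean[:, None]   # log(count / reference)
size_factors = np.exp(np.median(log_ratios, axis=0))

normalized = counts / size_factors
print(size_factors)  # ~[0.77, 1.72, 0.77]: depth/input differences removed
print(normalized)    # counts on a common scale, ready for DE comparisons
```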
There are cases where we want to preserve information on the exact starting amount, namely, if we have ensured (e.g. by flow cytometry) that each sample contains exactly the same number of cells, and we are expressly interested not in relative but in absolute changes of transcript material. For example, if the treatment is expected to affect transcription globally, i.e., to reduce the expression of all genes simultaneously, and we want to know how strongly overall mRNA abundance goes down. (However, in this case, RNA-Seq might not be the best assay.)
Exercise question: Why did the RNA-Seq data agree well with the qPCR measurements after normalising conventionally but not after normalising with spike-ins?
Answer: Because qPCR curves are also always compared to a biological control gene (maybe actin or GAPDH or the like). When comparing two qPCR samples, we do not compare the Ct values directly, but their respective differences to this housekeeping gene. If OP had used one of the spike-ins rather than one of the housekeeping genes as the internal qPCR control, the spike-in-normalized RNA-Seq data would have matched the qPCR results better than the conventionally normalized data.
There is, of course, a reason why nobody uses spike-ins that way for qPCR: one would mainly measure the dilution of the sample rather than the expression of the target gene.
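For concreteness, this is the usual ΔΔCt comparison against a housekeeping gene, with made-up Ct values and assuming roughly 100% PCR efficiency (so one cycle corresponds to a factor of two):

```python
# Made-up Ct values for a target gene and a housekeeping gene (e.g. GAPDH)
# in a control and a treated sample.
ct = {
    "control": {"target": 24.0, "housekeeping": 18.0},
    "treated": {"target": 22.5, "housekeeping": 18.1},
}

# Delta-Ct: the target relative to the housekeeping gene within each sample.
dct_control = ct["control"]["target"] - ct["control"]["housekeeping"]
dct_treated = ct["treated"]["target"] - ct["treated"]["housekeeping"]

# Delta-delta-Ct and fold change (base 2, i.e. assuming ~100% efficiency).
ddct = dct_treated - dct_control
fold_change = 2 ** (-ddct)
print(fold_change)  # ~3-fold up in treated, relative to the housekeeping gene
```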
I very much agree, but I do think there are two other uses for (ERCC) spike-ins:
1) (Extreme) cases where there are large changes in overall RNA composition (such as knockdown/knockout of decay factors, etc.)
2) For selecting expression cutoffs: since we know the exact concentration of the spike-ins, it is quite easy to see at which approximate level our expression estimates become unreliable (see the sketch below).
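For instance, a rough way to do this (made-up numbers; real pipelines fit this more carefully) is to compare observed spike-in counts against their known input amounts in log-log space and see where the points fall off the linear trend:

```python
import numpy as np

# Made-up ERCC-style spike-ins: known input amount versus the mean counts
# observed across samples.
known_amount = np.array([0.01, 0.1, 1.0, 10.0, 100.0, 1000.0])  # attomoles
mean_counts  = np.array([0.2,  0.4, 3.0, 35.0, 380.0, 3900.0])

# In log-log space, counts should scale linearly with input amount.
# Fit the clearly quantitative range (the four highest spike-ins here) and
# check where the lower spike-ins stop following that line.
x, y = np.log10(known_amount), np.log10(mean_counts)
slope, intercept = np.polyfit(x[2:], y[2:], 1)
residual = y - (slope * x + intercept)

for amount, count, r in zip(known_amount, mean_counts, residual):
    print(f"{amount:7.2f} amol  {count:7.1f} counts  off the line by {r:+.2f} log10")

# In this toy data the lowest spike-ins (a few counts or less) drift off the
# linear trend, so genes in that count range would be flagged as unreliable.
```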
You are saying: "For example, if the treatment is expected to affect transcription globally, i.e., to reduce the expression of all genes simultaneously, and we want to know how strongly overall mRNA abundance goes down. (However, in this case, RNA-Seq might not be the best assay.)" Why is RNA-Seq not the best assay? Which assays are better?
If you're cheap, you could just measure the RNA concentration with a Nanodrop (or Bioanalyzer, or whatever your tool of choice is). Divide the concentration by the number of cells in your sample to get the RNA content per cell. This allows you to quantify changes in RNA content between conditions - job done.
Or you could spike in RNA in proportion to the number of cells in your sample, and do standard RNA-seq on the resulting mixture of spike-in and endogenous RNA. Normalization based on the spike-in coverage will preserve differences in total RNA content that would be lost with standard methods. Or, if you can't be bothered getting an accurate measure of the number of cells, you could do single-cell RNA-seq and just add the same amount of spike-in RNA to each cell. A bit more expensive, but sexier.
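As a sketch of what that spike-in-based normalization could look like (made-up counts; the gene and spike-in names are placeholders, and a real analysis would use a dedicated tool with the spike-in rows flagged as controls), the size factors come from the spike-in counts only and are then applied to the endogenous genes:

```python
import numpy as np

# Made-up counts for three samples that each received the same amount of
# spike-in RNA per cell; sample 3 has globally lower endogenous expression.
genes = np.array([
    [500., 480., 250.],   # endogenous gene A
    [200., 210., 100.],   # endogenous gene B
])
spikes = np.array([
    [100., 105.,  98.],   # spike-in 1
    [ 50.,  48.,  52.],   # spike-in 2
])

# Size factors from the spike-ins only, scaled to have mean 1. Because every
# sample got the same spike-in amount, these factors capture technical
# depth/handling differences, not biology.
spike_totals = spikes.sum(axis=0)
size_factors = spike_totals / spike_totals.mean()

normalized = genes / size_factors
print(size_factors)   # ~[0.99, 1.01, 0.99]
print(normalized)     # sample 3 keeps its ~2x lower endogenous expression:
                      # the global drop in RNA content is preserved, not removed
```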
Of course, all of this is discussing changes in RNA content. If you want specifically changes in global transcription (i.e., creation of new transcripts), then the whole thing becomes harder. I guess you'd have to use a variant of the protocols used to capture nascent RNAs, e.g., GRO-seq.