Hi!
I have an RNA-Seq dataset of samples infected with different viruses (virus 1-3) and an uninfected control sample (control).
To some extent, the infectivity of my starting samples varies depending on the virus used to infect them. I would like to know whether there is a way to incorporate information about the infection level of the samples into the RNA-Seq analysis.
In short, I assume that e.g. samples infected with virus 1, which show low infectivity, will also show a lower response in my RNA-Seq data. In contrast, samples infected with virus 3, which show high infectivity, will show a greater response in my RNA-Seq data. I also assume that, for example, if virus 1 samples had the same level of infection, these samples would be more similar to virus 3 samples. (An assumption that can easily be criticized, but I want to make it anyway). I know the extent of the infection and I could express this as e.g. virus 1 = 1/10 of virus 3 and so on. Is there a way to incorporate this information into the RNA-Seq, resulting in a dataset that models equally infected samples?
Thanks!
Thanks James for your comment!
Yes, I realize that if infectivity were another factor in my model, it would be identical to the already present factor of the virus itself, so... not adding anything in the end.
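To make the confounding concrete, here is a minimal sketch in Python with NumPy. The group layout (two replicates per group) and the infectivity values are made up for illustration and are not my real numbers. It builds a design matrix with an intercept and one indicator per virus, then appends infectivity as a numeric covariate and checks the rank.

```python
import numpy as np

# Hypothetical layout: 2 replicates each of control and virus 1-3.
groups = ["control"] * 2 + ["virus1"] * 2 + ["virus2"] * 2 + ["virus3"] * 2
# Assumed relative infectivity per group (illustrative values only).
infectivity = {"control": 0.0, "virus1": 0.1, "virus2": 0.5, "virus3": 1.0}

levels = ["control", "virus1", "virus2", "virus3"]
# Intercept plus one indicator column per non-reference virus.
design = np.array([[1.0] + [float(g == lv) for lv in levels[1:]] for g in groups])
# Append infectivity as an extra numeric column.
covariate = np.array([[infectivity[g]] for g in groups])
design_with_cov = np.hstack([design, covariate])

rank_without = np.linalg.matrix_rank(design)
rank_with = np.linalg.matrix_rank(design_with_cov)
print("columns:", design_with_cov.shape[1], "rank:", rank_with)
```

Because infectivity takes a single value per virus, the new column is a linear combination of the virus indicators, so the rank does not increase and the extra coefficient cannot be estimated. That would only change if infectivity were measured per sample and varied within a virus group.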
What I think I am looking for is something like a numerical variable that normalizes my RNA-Seq dataset. Something along the lines of normalizing count data based on infectivity. Or, alternatively, providing a numerical variable to the model that contains e.g. scaled fold changes of infectivity? Or weighting low-infectivity samples differently than high-infectivity samples?
Again, I think there are assumptions here that are questionable, and my current feeling is that this is not possible at all, but this was a comment I received and I want to make sure I'm not missing something.
I hope this clarifies what I am looking for.
Thanks!
What you want to do probably doesn't make sense. I get the idea that you might want to identify genes that change in excess of the expected infectivity. So you want to zero out the role of infectivity to show differences that are due to other factors. Or maybe virus 1 will only infect 1% of the cells, and virus 3 will infect 10%, and you want to control for the level of infection. But it's probably much more complicated than that.
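As a rough illustration of the 1% versus 10% scenario, here is a toy calculation in Python. It assumes a simple two-population mixture in which uninfected cells in the infected culture look exactly like control cells, every cell contributes the same amount of RNA, and a gene is up 4-fold in infected cells. All three are assumptions made for the sketch, and `bulk_fold_change` is a hypothetical helper, not a function from any package.

```python
import math

def bulk_fold_change(infected_fraction, fold_change_in_infected):
    """Fold change seen in bulk vs. control under a two-population mixture.

    Uninfected cells contribute expression 1.0 (same as control); infected
    cells contribute `fold_change_in_infected`.
    """
    return (1 - infected_fraction) * 1.0 + infected_fraction * fold_change_in_infected

for fraction in (0.01, 0.10):
    fc = bulk_fold_change(fraction, 4.0)
    print(f"{fraction:.0%} infected: bulk FC = {fc:.2f}, log2FC = {math.log2(fc):.3f}")
```

Under these assumptions the same 4-fold change shows up in bulk as a fold change of about 1.03 at 1% infection and 1.30 at 10%. The relationship is not a simple rescaling on the log scale, and it falls apart as soon as bystander cells respond to the infection, which is part of why a defensible correction is hard.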
I recently analyzed data from a cell culture that was infected with a virus, where the investigators had done scRNA-Seq, which I originally thought was the dumbest thing ever (I mean scRNA-Seq on fibroblast cell culture? Come on). But they had the same goal - to identify changes that occur in the infected cells - and they couldn't really get at that using bulk analysis because it's a variable mixture of infected and uninfected cells, and there is no way to accurately assess the level of infection. And even if you could, how do you adjust for it in a bulk analysis in a way that is defensible?
By using scRNA-Seq and a hybrid genome that included the virus genome, we could identify infected and uninfected cells and then do pseudobulk analyses on the different cell types.
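The general idea can be sketched in Python with NumPy on simulated counts. This is not the actual pipeline from that project: the data are random, the sample labels are invented, and the rule "at least one viral UMI means infected" (`MIN_VIRAL_UMIS`) is an assumed threshold that would need tuning on real data, for example to deal with ambient viral RNA.

```python
import numpy as np

rng = np.random.default_rng(0)
n_cells, n_host_genes = 200, 5

# Simulated inputs (all hypothetical): host gene counts per cell, total
# reads mapping to the viral part of the hybrid genome, and sample labels.
host_counts = rng.poisson(3.0, size=(n_cells, n_host_genes))
viral_counts = rng.poisson(0.5, size=n_cells)
sample_ids = rng.choice(["s1", "s2"], size=n_cells)

MIN_VIRAL_UMIS = 1  # assumed cutoff for calling a cell infected
infected = viral_counts >= MIN_VIRAL_UMIS

# Pseudobulk: sum host counts over cells within each (sample, status) pair.
pseudobulk = {}
for sample in np.unique(sample_ids):
    for status, mask in (("infected", infected), ("uninfected", ~infected)):
        cells = (sample_ids == sample) & mask
        pseudobulk[(str(sample), status)] = host_counts[cells].sum(axis=0)

for key, counts in pseudobulk.items():
    print(key, counts)
```

Each summed profile can then be treated like one bulk sample in a standard count-based differential expression analysis, comparing infected against uninfected within sample.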
But even that analysis turned out to be way more complicated than it had any right to be. But then I'm just a dumb master's guy. Probably there are others here who might have good ideas?