In an article (link coming soon), we referred to a method for fishing out and identifying all RNAs bound to a Cas protein (crRNA, tracrRNA and others) within a cell. The method is called RIP-seq: RNA immunoprecipitation followed by sequencing.
The „fishhook“ is an antibody, which in this case recognizes and binds the Cas protein. Producing antibodies against a specific protein is laborious, so it is easier to attach a „tag“ to the Cas gene in the cells. The „tag“ is a short DNA sequence that codes for a short piece of another protein. The modified gene is introduced into cells and replaces the normal Cas gene. The tag is transcribed into RNA together with the gene and then translated into protein along with the Cas sequence. The resulting Cas enzyme is a little bigger, but it still works fine in most cases.
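To make the idea of a tagged gene concrete, here is a minimal Python sketch. It assumes an invented mini gene and, purely as an example, the widely used FLAG tag (amino acids DYKDDDDK); the article does not say which tag was actually used. The names `TOY_GENE`, `add_c_terminal_tag` and `translate` are hypothetical. The tag is placed in front of the stop codon, so it is translated along with the gene.

```python
# Standard genetic code, built from the usual T, C, A, G codon ordering.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
CODON_TABLE = dict(zip(CODONS, AMINO_ACIDS))

# Invented mini gene (assumption): Met-Ala-Lys followed by a stop codon.
TOY_GENE = "ATGGCTAAATAA"
# One possible DNA sequence encoding the FLAG tag DYKDDDDK (example tag only).
FLAG_TAG_DNA = "GACTACAAGGACGACGATGACAAG"


def add_c_terminal_tag(gene, tag):
    """Insert the tag directly in front of the gene's stop codon."""
    return gene[:-3] + tag + gene[-3:]


def translate(dna):
    """Translate codon by codon until the first stop codon."""
    protein = []
    for i in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE[dna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


tagged_gene = add_c_terminal_tag(TOY_GENE, FLAG_TAG_DNA)
print(translate(TOY_GENE))     # MAK
print(translate(tagged_gene))  # MAKDYKDDDDK
```

The tagged protein is the original protein plus eight extra amino acids at its end, and an antibody against the tag can grab hold of those.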
Scientists around the world use quite a few such tags for various purposes, and antibodies directed against these tags are available. Some companies produce these antibodies in large quantities, couple them to different „handles“ for different purposes, and sell them. This saves scientists a lot of time, money and work!
What exactly are the researchers doing?
This is shown in the flow chart below.
The cells with the „tagged“ Cas protein are opened up (lysed). Care must be taken to ensure that the complexes of RNA and protein do not disintegrate, so the complexes are often stabilized by the addition of chemicals.
The cell lysate contains all the components of the cell from which we want to extract only the Cas proteins and their attached RNAs. We now add antibodies against the tag, which are coupled as a „handle“ to a tiny magnetic bead. The antibodies bind to the Cas proteins and can be easily pulled out of the lysate with a magnet.
Now you have all the Cas proteins, all the attached RNAs, the antibodies and the magnetic beads in a test tube, and nothing else. None of the other cell components stick to the magnet, so they are discarded. The proteins, including the antibodies, can be degraded by an enzyme (a protease); the magnetic beads fall off, and the RNA remains intact and in solution. A few more purification steps follow, mostly in test tubes that look „empty“, because the amount of RNA is so small that it cannot be seen.
The RNA is then put into a high-throughput sequencing machine. This is again a complicated process that we may describe elsewhere. Some may know that you can’t sequence RNA that easily – it has to be reverse transcribed into DNA in another step. But then you’re ready to go!
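What reverse transcription does to the sequence can be sketched in a few lines of Python. This is only an illustration of the principle that the DNA copy (cDNA) is complementary to the RNA and runs in the opposite direction; the function name `reverse_transcribe` and the short example RNA are made up, and the real laboratory step involves an enzyme, primers and further processing.

```python
# Base pairing between RNA and the DNA copy made from it.
COMPLEMENT = {"A": "T", "U": "A", "G": "C", "C": "G"}


def reverse_transcribe(rna):
    """Return the first-strand cDNA (written 5'->3') for an RNA written 5'->3'."""
    # The copy is antiparallel, so we read the RNA backwards while pairing bases.
    return "".join(COMPLEMENT[base] for base in reversed(rna))


toy_rna = "AUGGCUAAA"  # invented example sequence
print(reverse_transcribe(toy_rna))  # TTTAGCCAT
```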
Today’s sequencing machines can analyze a few million DNA molecules in parallel – and that’s completely automated. More challenging is the bioinformatic analysis of the vast amount of data generated by an experiment! Here, too, there are programs that make work easier – but especially in basic research, every question is different and the programs have to be adapted accordingly.
1. Cell with various cell components, including the Cas protein (yellow) with the tag (red) and various RNAs (light blue).
2. The cell is lysed and the cell components flow out.
3. Antibodies (red) coupled to a magnetic bead (blue) are added to the cell components in a test tube. The antibodies bind specifically to the Cas complex.
4. The antibodies with the attached complexes are held by a magnet; the rest is poured off and discarded.
5. An enzyme solution is added to the mixture of antibodies, Cas protein and RNA. The enzyme, a protease, destroys all proteins (Cas protein and antibodies).
6. All remaining RNA is sequenced (after another intermediate step) in a sequencing machine.
7. The data from the sequencing machine are analyzed and sorted in the computer.
Note also that a single sequencing run is not enough! There are usually several samples plus several controls. And of course the whole experiment has to be repeated at least once or twice to exclude experimental errors! This then really is „Big Data“, with many millions of DNA sequences!
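A quick back-of-the-envelope calculation shows how fast the numbers grow. All values in this Python sketch are assumptions chosen for illustration, not figures from a real experiment.

```python
# All numbers below are assumed for illustration only.
samples = 3                    # tagged-Cas samples
controls = 2                   # control samples, e.g. cells without the tag
replicates = 3                 # the experiment plus two repeats
reads_per_library = 5_000_000  # "a few million" sequences per sample

libraries = (samples + controls) * replicates
total_reads = libraries * reads_per_library

print(libraries)           # 15
print(f"{total_reads:,}")  # 75,000,000
```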
What looks simple and unspectacular in the graph below rests on a gigantic amount of data!
What do the data look like?
After a RIP-seq experiment, all „reads“ (sequenced RNA pieces) are assigned to the appropriate location in the genome. Panel A of the image shows a tiny section of the genome of a unicellular organism. This particular genome has a size of 34 million nucleotides. The figure shows the section between nucleotides 2,366,820 and 2,366,900, with the As, Gs, Cs and Ts in the lower row. Between position 2,366,862 and position 2,366,884, one finds almost 2,000 RNAs (see scale on the left) that had bound to a specific protein. In B, a few of these „reads“ are shown as linear RNA pieces. They are not all exactly the same length for technical reasons. If you stack 2,000 of them on top of each other, you get a block like the one in C. Such blocks can also be seen in the article (link coming soon).
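The stacking of reads into a block can be sketched in Python. The example below uses the genome window from the figure but only five invented reads instead of almost 2,000; the read coordinates and the function names `coverage` and `covered_block` are hypothetical, and real analyses rely on dedicated alignment and coverage tools rather than a hand-written loop.

```python
def coverage(reads, region_start, region_end):
    """Count, for each position in the region, how many reads cover it.

    Reads are (start, end) pairs with both ends included.
    """
    length = region_end - region_start + 1
    # Difference array: +1 where a read begins, -1 just after it ends.
    diff = [0] * (length + 1)
    for start, end in reads:
        first = max(start, region_start)
        last = min(end, region_end)
        if first > last:
            continue  # read lies completely outside the region
        diff[first - region_start] += 1
        diff[last - region_start + 1] -= 1
    depth, running = [], 0
    for step in diff[:length]:
        running += step
        depth.append(running)
    return depth


def covered_block(depth, region_start, min_depth=1):
    """Return (first, last) position with at least min_depth reads, or None."""
    hits = [region_start + i for i, d in enumerate(depth) if d >= min_depth]
    return (hits[0], hits[-1]) if hits else None


# Five invented reads of slightly different lengths (assumption).
toy_reads = [
    (2366862, 2366884),
    (2366862, 2366883),
    (2366863, 2366884),
    (2366862, 2366884),
    (2366864, 2366882),
]
depth = coverage(toy_reads, 2366820, 2366900)
print(max(depth))                     # 5
print(covered_block(depth, 2366820))  # (2366862, 2366884)
```

With thousands of reads instead of five, the same counting produces the tall block seen in the figure, and the slightly different read lengths explain why its edges are not perfectly sharp.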
With RIP-seq you can identify all RNAs that are associated with a specific protein. By targeting an RNA polymerase with the antibody, you can fish for all RNAs that are currently being transcribed; by targeting a spliceosome, you can pull out all RNAs that are currently being spliced; and by targeting the Cas protein, you can pull out all RNAs that are associated with it.
RIP-seq is mostly used in basic science but could become a routine tool to detect misregulation of RNA-protein interactions in some human diseases.
Author: W. Nellen
Translation by DeepL with modifications by BioWissKomm.
Cover image: BioWissKomm by Midjourney.
Fig. 1 and 2: Copyright BioWissKomm
Data Fig. 2A: Dissertation Sara Müller (2011)