Write a Perl script which transcribes a DNA sequence to a complementary RNA sequence.
RNA
The DNA in your chromosomes encodes the information needed to make every part of your body. A small amount of additional information is stored in the mitochondria, but we will not worry about that here. Sequences of DNA in the chromosomes that encode proteins are called genes. The organization of genes can be a little complex, but we will not worry about these complexities either. Genes contain the information the body needs to make proteins. The first step in using this information to create a protein is called transcription. During transcription, an RNA sequence complementary to the DNA sequence of the gene is created.
There is one important difference between RNA and DNA which we will focus on in this lesson. RNA is composed of the nucleotides adenine (A), uracil (U), cytosine (C), and guanine (G). As you will recall, DNA is composed of the same nucleotides except instead of uracil, DNA uses thymine (T). So, in RNA we will see U's instead of T's. For instance, while a sequence of DNA might look like this:
CGATTACCGAGCCTA
a similar RNA sequence would look like this:
CGAUUACCGAGCCUA
The process of transcription is where the DNA is copied into RNA. The RNA will be complementary to the DNA and any T nucleotide will become a U nucleotide.
The student should realize that there are many types of RNA. Our discussion here pertains to mRNA (messenger RNA). We will also be discussing tRNA (transfer RNA) in another lesson. There are still more types of RNA which are involved in the whole process of making proteins out of genes.
Here's a short Perl script which makes simple substitutions:
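A minimal sketch of such a script, assuming Perl's s/// substitution operator with the g modifier. The helper name t_to_u and the variable $dna are illustrative choices, not names taken from the lesson; the sequence is the example DNA strand shown earlier.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# t_to_u is a hypothetical helper name used only for this sketch.
# It returns a copy of the sequence with every T replaced by U.
sub t_to_u {
    my ($seq) = @_;
    $seq =~ s/T/U/g;    # g = replace every match, not just the first one
    return $seq;
}

my $dna = "CGATTACCGAGCCTA";
print t_to_u($dna), "\n";    # prints CGAUUACCGAGCCUA
```

Without the g modifier, only the first T in the string would be replaced.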
Here's another, not biological, example:
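As one possible illustration, the sketch below replaces one word with another throughout a sentence. The sentence, the words, and the helper name swap_word are all made up for this example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# swap_word is a hypothetical helper: it replaces every occurrence of
# $old with $new in $text and returns the result.
sub swap_word {
    my ($text, $old, $new) = @_;
    # \Q...\E treats $old as literal text rather than as a pattern
    $text =~ s/\Q$old\E/$new/g;
    return $text;
}

my $sentence = "The cat sat next to another cat.";
print swap_word($sentence, "cat", "dog"), "\n";
# prints: The dog sat next to another dog.
```

The same s/// operator does the work here as in a biological substitution; only the text being matched changes.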
ASSIGNMENT:
Your job is to create an RNA sequence based on a DNA template. In other words, you will simulate the process of transcription. For example, if you had this strand of DNA:
TGACCGATAGATACCAGT
You would have this strand of RNA as your output:
ACUGGCUAUCUAUGGUCA
The easiest way to produce this output is to first create a complement to the original strand of DNA (covered in the last lesson) and then to substitute a U for each T in the resulting string.
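The two steps can be sketched as follows. This is only one way to approach the assignment: it assumes the complement is built with Perl's tr/// operator, which may differ from the method used in the last lesson, and the sub name transcribe is a hypothetical choice. Following the example above, the strand is complemented but not reversed.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# transcribe is a hypothetical helper name for this sketch.
sub transcribe {
    my ($dna) = @_;
    my $rna = uc $dna;          # work in upper case
    $rna =~ tr/ACGT/TGCA/;      # step 1: complement (A<->T, C<->G)
    $rna =~ s/T/U/g;            # step 2: substitute U for each T
    return $rna;
}

print transcribe("TGACCGATAGATACCAGT"), "\n";    # prints ACUGGCUAUCUAUGGUCA
```

tr/// translates character by character in a single pass, so an A that becomes a T is not then turned back into an A, which is what would happen with a chain of separate s/// substitutions.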