String Processing
Transcription and Translation
In this lesson the student will learn how to:
- Use pattern matching to read a string three letters at a time
- Use the if-elsif-else construct to simulate translation
- Use the unless construct
By the end of this lesson the student will be able to:
Write a Perl script that simulates the processes of
transcription and translation.
64 Codons
A GCU GCC GCA GCG
R CGU CGC CGA CGG AGA AGG
N AAU AAC
D GAU GAC
C UGU UGC
Q CAA CAG
E GAA GAG
G GGU GGC GGA GGG
H CAU CAC
I AUU AUC AUA
L UUA UUG CUU CUC CUA CUG
K AAA AAG
M AUG
F UUU UUC
P CCU CCC CCA CCG
S UCU UCC UCA UCG AGU AGC
T ACU ACC ACA ACG
W UGG
Y UAU UAC
V GUU GUC GUA GUG
. UAA UAG UGA
The column on the left shows the one-letter abbreviation for each of the
twenty amino acids, and the dot represents the stop codons. The groups of
three nucleotide letters on the right are the 64 possible codons.
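If you want a script to use the whole table, one option is to store it in a hash. The sketch below assumes you have already met Perl hashes (they are not covered in this lesson). It copies the table into a hash of amino acid -> list of codons, inverts it into a codon -> amino acid lookup, and checks that exactly 64 codons come out. The names %codons_for and %aa_for are made up for this example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# The codon table above: one-letter amino acid code => list of RNA codons.
# "." stands for the three stop codons. (%codons_for is an illustrative name.)
my %codons_for = (
    A   => [qw(GCU GCC GCA GCG)],
    R   => [qw(CGU CGC CGA CGG AGA AGG)],
    N   => [qw(AAU AAC)],
    D   => [qw(GAU GAC)],
    C   => [qw(UGU UGC)],
    Q   => [qw(CAA CAG)],
    E   => [qw(GAA GAG)],
    G   => [qw(GGU GGC GGA GGG)],
    H   => [qw(CAU CAC)],
    I   => [qw(AUU AUC AUA)],
    L   => [qw(UUA UUG CUU CUC CUA CUG)],
    K   => [qw(AAA AAG)],
    M   => [qw(AUG)],
    F   => [qw(UUU UUC)],
    P   => [qw(CCU CCC CCA CCG)],
    S   => [qw(UCU UCC UCA UCG AGU AGC)],
    T   => [qw(ACU ACC ACA ACG)],
    W   => [qw(UGG)],
    Y   => [qw(UAU UAC)],
    V   => [qw(GUU GUC GUA GUG)],
    '.' => [qw(UAA UAG UGA)],
);

# Invert the table: codon => one-letter amino acid code (%aa_for is illustrative)
my %aa_for;
for my $aa (keys %codons_for) {
    for my $codon (@{ $codons_for{$aa} }) {
        $aa_for{$codon} = $aa;
    }
}

print scalar(keys %aa_for), " codons\n";    # prints "64 codons"
print "AUG -> $aa_for{AUG}\n";               # prints "AUG -> M"
```

With the table in a hash, translating a codon becomes a single lookup instead of a long chain of tests.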
Here are the names of the amino acids along with their three-letter and
one-letter abbreviations:
One-letter Code | Three-letter Code | Amino Acid Name |
A | ala | alanine |
R | arg | arginine |
N | asn | asparagine |
D | asp | aspartic acid |
C | cys | cysteine |
Q | gln | glutamine |
E | glu | glutamic acid |
G | gly | glycine |
H | his | histidine |
I | ile | isoleucine |
L | leu | leucine |
K | lys | lysine |
M | met | methionine |
F | phe | phenylalanine |
P | pro | proline |
S | ser | serine |
T | thr | threonine |
W | trp | tryptophan |
Y | tyr | tyrosine |
V | val | valine |
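The name table above can also be used from a script. As a sketch (the hash %three_letter and the sub one_to_three are hypothetical names chosen for this example), the following converts a protein written in one-letter codes into the three-letter codes from the table:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# One-letter code => three-letter code, copied from the table above
# (%three_letter is a hypothetical name for this example)
my %three_letter = (
    A => 'ala', R => 'arg', N => 'asn', D => 'asp', C => 'cys',
    Q => 'gln', E => 'glu', G => 'gly', H => 'his', I => 'ile',
    L => 'leu', K => 'lys', M => 'met', F => 'phe', P => 'pro',
    S => 'ser', T => 'thr', W => 'trp', Y => 'tyr', V => 'val',
);

# Hypothetical helper: convert one-letter codes into dash-separated
# three-letter codes. Letters not in the table are shown as "???".
sub one_to_three {
    my ($protein) = @_;
    my @names;
    for my $letter (split //, uc $protein) {
        push @names, exists $three_letter{$letter} ? $three_letter{$letter} : '???';
    }
    return join '-', @names;
}

my $converted = one_to_three("MKF");
print "$converted\n";    # prints "met-lys-phe"
```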
Sample Scripts:
#!/usr/bin/perl
use strict;
use warnings;

my $str = "aaacccgggaaatttaaacccggg";
my $out = "";
print $str . "\n";
# Read the sequence one codon (three letters) at a time
while (length($str) > 2) {
    my $sb = substr($str, 0, 3);    # the current codon
    if ($sb =~ /aaa/) {
        $out .= "K";
    }
    elsif ($sb =~ /ccc/) {
        $out .= "P";
    }
    elsif ($sb =~ /ggg/) {
        $out .= "G";
    }
    elsif ($sb =~ /uuu/) {
        $out .= "F";
    }
    else {
        # e.g. "ttt": a DNA triplet, not an RNA codon, so it is not recognized
        $out .= "BAD CODON";
        print "$sb\n";
    }
    $str =~ s/^...//;    # removes the current codon from the front of the sequence
}
print $out . "\n";
This script translates four different codons into their corresponding
amino acids. Notice that an if-elsif-else construct is used to test each
possible match one at a time. A script that could translate any codon into
its corresponding amino acid this way would need a pattern for each of the
64 possible codons. (There are other approaches that work with fewer lines
of code; we will consider them in future lessons.)
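The first objective of this lesson, reading a string three letters at a time with pattern matching, can also be done without substr. In the sketch below, a variation on the sample script rather than the lesson's own solution, the pattern /(...)/g captures the next codon on each pass of the while loop, and a character class such as [ucag] lets one pattern cover a whole family of codons (for example, all four proline codons). Only a few amino acids are handled, as in the sample script, and any one or two letters left over at the end are ignored because the pattern needs three characters.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $rna = "aaacccggguuucca";
my $out = "";

# Each pass of the loop matches the next three characters and saves them in $1
while ($rna =~ /(...)/g) {
    my $codon = $1;
    if ($codon =~ /^aa[ag]$/) {          # AAA, AAG
        $out .= "K";
    }
    elsif ($codon =~ /^cc[ucag]$/) {     # CCU, CCC, CCA, CCG
        $out .= "P";
    }
    elsif ($codon =~ /^gg[ucag]$/) {     # GGU, GGC, GGA, GGG
        $out .= "G";
    }
    elsif ($codon =~ /^uu[uc]$/) {       # UUU, UUC
        $out .= "F";
    }
    else {
        $out .= "?";                     # a codon this sketch does not cover
    }
}
print "$out\n";    # prints "KPGFP"
```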
Here's a little script which prints out all 64 possible codons:
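#!/usr/bin/perl
use strict;
use warnings;

my @nuc = ("A", "U", "G", "C");
for (my $i = 0; $i < 64; $i++) {
    my $one   = int($i / 16);       # index of the first letter
    my $two   = int($i / 4) % 4;    # index of the second letter
    my $three = $i % 4;             # index of the third letter
    print "" . ($i + 1) . ") " . $nuc[$one] . $nuc[$two] . $nuc[$three] . "\n";
}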
There are better ways of writing a script to output the 64 possible codons,
but this one works. The most confusing part is the math involving the
division and modulus operations inside the loop. It might take you a while
to see why it works, so take some time to compare the output of this script
with the values these operations give for $one, $two and $three, and make
sure that you understand why they work as they do.
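One of those better ways is to let three nested loops do the counting. The sketch below is one possible alternative, not the only one: the outer loop supplies the first letter, the middle loop the second and the inner loop the third, so the 64 codons come out in the same order as in the division-and-modulus script. It also gives you a way to check that script's arithmetic by hand: for $i = 37, int(37/16) = 2, int(37/4) % 4 = 9 % 4 = 1, and 37 % 4 = 1, which picks G, U and U, so line 38 of the output is GUU. The array name @codons is chosen for this example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @nuc = ("A", "U", "G", "C");
my @codons;    # illustrative name: collects all 64 codons in order

# Outer loop = first letter, middle loop = second, inner loop = third
for my $first (@nuc) {
    for my $second (@nuc) {
        for my $third (@nuc) {
            push @codons, $first . $second . $third;
        }
    }
}

# Number the codons from 1, like the script above
for my $i (0 .. $#codons) {
    printf "%d) %s\n", $i + 1, $codons[$i];
}
```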
ASSIGNMENT:
Write a Perl script that simulates the entire process of transcription and
translation. Prompt the user for an input string and produce output like
this:
DNA: (whatever the user provides)
RNA: (transcription product)
AA: (translation product)
Make sure that you have a special output message for bad input.
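As a starting point, here is a sketch of the first two steps only: checking the input with unless and transcribing DNA to RNA; the translation step is left for you. It assumes the DNA is given as the coding strand, so transcription just replaces T with U, which the tr/// operator does in one step. The sub name transcribe is hypothetical, and two sample strings stand in for user input.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: returns the RNA for a DNA string, or undef for bad input.
# Assumes the input is the coding strand, so transcription is just T -> U.
sub transcribe {
    my ($dna) = @_;
    $dna = uc $dna;
    # unless means "if not": reject anything that is not made only of A, C, G, T
    unless ($dna =~ /^[ACGT]+\z/) {
        return;    # undef in scalar context signals bad input
    }
    (my $rna = $dna) =~ tr/T/U/;
    return $rna;
}

# Sample strings standing in for what the user would type
for my $dna ("atgaaattt", "atgxyz") {
    my $rna = transcribe($dna);
    if (defined $rna) {
        print "DNA: $dna\nRNA: $rna\n";
    }
    else {
        print "DNA: $dna\nBad input: only A, C, G and T are allowed.\n";
    }
}
```

In the assignment the sequence would come from the user instead, for example by reading a line with <STDIN> and removing the trailing newline with chomp before calling the check.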