String Processing
Recursion
In this lesson the student will learn how to:
- Write subroutines
- Pass variables to subroutines
- Write recursive subroutines
By the end of this lesson the student will be able to:
Write a recursive subroutine which creates a
random nucleotide sequence of specified length.
Random Sequences
Exon sequences can often be recognized by their GC content. Exons have a
higher GC content than introns or nucleotide sequences occurring between
gene sequences. This is because there is a tendency for methylated C's to be
deaminated to produce T's. Outside of exons, this rarely makes any
difference and so there is a tendency for these areas to drift towards a
lower GC content (remember that G's pair with C's). Within an exon, however,
such changes do make a difference and may render a gene non-functional. An
organism carrying a non-functional gene is less likely to leave offspring,
so the mutation tends not to spread through the population.
While there were still many genes to locate in the human genome, one process
used to locate potential genes was to search for GC-rich portions of the
genome. There are some rule-breaking genes, but for the most part genes
have an above-average GC content compared with non-gene portions of the
genome.
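GC content is simply the fraction of bases in a sequence that are G or C. The following is a minimal sketch of that calculation, not part of the original lesson: the helper name gc_content and the example sequence are made up for illustration, and the sub syntax it uses is explained in the Subroutines section.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: returns the fraction of G and C bases in a sequence.
sub gc_content {
    my ($seq) = @_;
    return 0 if length($seq) == 0;    # avoid dividing by zero
    # tr with an empty replacement list only counts the matching characters.
    my $gc = ($seq =~ tr/GCgc//);
    return $gc / length($seq);
}

# Made-up example sequence, for illustration only.
my $seq = "ATGCGCGTTA";
printf "GC content: %.2f\n", gc_content($seq);
```

Five of the ten bases in the example are G or C, so the script prints GC content: 0.50.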
Subroutines
Subroutines are a useful way to avoid writing the same code over and over.
They allow you to reuse the lines of code contained within them. Try out the
following example:
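#!/usr/bin/perl

# Ask for a name and return it without the trailing newline.
sub name(){
    print "ENTER YOUR NAME PLEASE: ";
    $hold = <STDIN>;
    chomp $hold;
    return $hold;
}

# Ask for a street address and return it.
sub street(){
    print "ENTER YOUR STREET ADDRESS: ";
    $hold = <STDIN>;
    chomp $hold;
    return $hold;
}

# Ask for city, state and ZIP and return them as one line.
sub zip(){
    print "ENTER YOUR CITY: ";
    $c = <STDIN>;
    chomp $c;
    print "ENTER YOUR STATE: ";
    $s = <STDIN>;
    chomp $s;
    print "ENTER YOUR ZIP: ";
    $z = <STDIN>;
    chomp $z;
    return "$c, $s $z";
}

# Print the address using the global variables filled in below.
sub addr(){
    print "\n\n";
    print $na . "\n";
    print $str . "\n";
    print $z . "\n";
}

# Nothing above runs until the subroutines are called here.
$na = name();
$str = street();
$z = zip();
addr();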
There are four subroutines in this example. Notice the keyword return at the
end of the first three: it hands a value back to the caller, which the
program then stores in a variable. Note also that defining a subroutine does
not run it. The subroutines execute only when they are called in the last
four lines of the program.
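To make the difference between defining and calling a subroutine visible, here is a small sketch that needs no keyboard input. The subroutine name start_codon and the messages are hypothetical, chosen only for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Defining the subroutine does not run it.
sub start_codon {
    print "start_codon is running\n";
    return "aug";                 # the value handed back to the caller
}

print "Before any call\n";

# The body executes here, once for each call.
my $first  = start_codon();
my $second = start_codon();
print "Got: $first and $second\n";
```

The line "Before any call" appears first, even though the subroutine is defined above it, and "start_codon is running" appears once per call.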
Now try out this example:
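#!/usr/bin/perl

# Add two numbers. The arguments arrive in the special array @_.
# ($a and $b are reserved for Perl's sort, so other names are used.)
sub madd{
    ($x,$y) = @_;
    $sum = $x + $y;
    return $sum;
}

# Print the prompt passed in, then read and return one line of input.
sub getnum{
    ($p) = @_;
    print $p;
    $n = <STDIN>;
    chomp $n;
    return $n;
}

$first = getnum("ENTER NUMBER: ");
$second = getnum("ENTER ANOTHER: ");
$ans = madd($first,$second);
print "ANSWER: $ans\n";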
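This example passes values to the subroutines. The values arrive in the
special array @_, and each subroutine copies them into its own variables. You
can call a subroutine as many times as you want: getnum is called twice
here, with a different prompt each time.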
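As a further sketch of passing values, the example below copies its arguments out of @_ into variables declared with my, so that each call works on its own private copies. The name count_base and the sample sequence are assumptions made for illustration and do not appear in the lesson.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: counts how often one base occurs in a sequence.
sub count_base {
    my ($seq, $base) = @_;        # copy the two arguments out of @_
    # Assigning the list of matches to () and then to a scalar yields
    # the number of matches (case-insensitive here).
    my $count = () = $seq =~ /\Q$base\E/gi;
    return $count;
}

# The same subroutine reused with different arguments.
my $seq = "augcgcuaa";
print "a: ", count_base($seq, "a"), "\n";
print "g: ", count_base($seq, "g"), "\n";
```

Because the variables inside count_base are declared with my, a second call cannot overwrite values left behind by the first.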
Finally, try this subroutine.
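#!/usr/bin/perl
@letters = ( "a", "g", "u", "c" );
$nuc = "aug";                     # every sequence begins with the start codon
srand(time);

# Append one random codon to $nuc, then call itself again
# unless that codon is a stop codon.
sub addnuc{
    my $hold = "";                # private to this call
    for(my $i=0; $i<3; $i++){
        my $n = int(rand(4));
        $hold .= $letters[$n];
    }
    $nuc .= $hold;
    if($hold ne "uaa" && $hold ne "uag" && $hold ne "uga"){
        addnuc();                 # the recursive call
    }
    return $hold;
}

# $nuc is extended inside the subroutine, so the return value is not
# appended again (that would duplicate the stop codon).
addnuc();
print $nuc . "\n";
print "LENGTH: " . length($nuc) . "\n";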
This subroutine is recursive, which means that it calls itself. The addnuc
subroutine keeps calling itself until it generates one of the three stop
codons (uaa, uag or uga). If you ran this one a few times, you probably
noticed that the length of the nucleotide sequence it creates varies from
one execution of the script to the next.
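Every recursive subroutine needs a condition under which it stops calling itself (the base case), and each call should move one step closer to that condition. The sketch below shows that shape on a different task from the assignment: counting the G's and C's in a sequence by examining one character per call. The name count_gc and the sample sequence are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical example: count G and C bases, one character per call.
sub count_gc {
    my ($seq) = @_;
    return 0 if $seq eq "";                    # base case: nothing left
    my $first = substr($seq, 0, 1);            # the character handled now
    my $rest  = substr($seq, 1);               # passed on to the next call
    my $here  = ($first =~ /[gc]/i) ? 1 : 0;
    return $here + count_gc($rest);            # recursive case
}

print count_gc("augcgcuaa"), "\n";
```

The sample sequence contains two G's and two C's, so the script prints 4. Note that with warnings enabled, Perl reports "Deep recursion" once a subroutine is nested more than 100 calls deep; this is only a warning and the script keeps running.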
ASSIGNMENT:
Write a script which first of all asks the user to specify the length of a
nucleotide sequence. Write a subroutine which recursively calls itself to
create a random nucleotide sequence of the length specified. (Obviously, a
more direct way to create a sequence of fixed length is with a for-loop, but
that's not how you are to do it here!) Your recursive function should only
add one nucleotide per call!