String Processing
Working With Substrings
In this lesson the student will learn how to:
- Use the substr function in a variety of situations
- Use the substr function to insert into a string
- Use the index and rindex functions to locate substrings
By the end of this lesson the student will be able to:
Write a script which allows the user to insert
strings within another string.
Start and Stop Codons
The genes of humans (and other eukaryotes) are not continuous sequences.
Rather, eukaryotic genes are broken into coding sequences called exons
and noncoding sequences called introns. A single gene may be broken into
several exons interrupted by several introns. To help you remember the terms
exon and intron, think of exons as expressed sequences and introns as
interruptions between expressed sequences. Exons in the human genome have an
average length of about 200 base pairs. Introns can be very long or very
short and vary widely in length.
EXons --> EXpressed sequences
INTrons --> INTerruptions
Within the transcribed RNA, an AUG codon (which also codes for methionine)
signals the start of the protein-coding region, and any of three stop codons
(UAA, UAG, UGA) signals its end. There are several other complications to
consider, but for now we will focus on exons and introns and start and stop
codons.
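The start and stop codons described above can be sketched as a walk over the
sequence three bases at a time. This is only an illustration under simplifying
assumptions: the sequence is lowercase RNA, the first "aug" is taken as the
start, and introns are ignored. The name find_orf is hypothetical and is not
part of Perl.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: returns (start, stop) offsets of the first "aug" and
# the first in-frame stop codon after it, or an empty list if either is missing.
sub find_orf {
    my ($seq) = @_;
    my %is_stop = map { $_ => 1 } qw(uaa uag uga);

    my $start = index($seq, "aug");
    return () if $start < 0;

    # Step three bases at a time so only in-frame codons are tested.
    for (my $i = $start + 3; $i + 3 <= length($seq); $i += 3) {
        return ($start, $i) if $is_stop{ substr($seq, $i, 3) };
    }
    return ();
}

my $seq = "aacggacuaaauguaagcuacacuacgagcuacgaucacgacucaauaaggagcaaaa";
my ($start, $stop) = find_orf($seq);
if (defined $start) {
    print "start codon at $start, first in-frame stop at $stop\n";
} else {
    print "no complete start/stop pair found\n";
}
```

For this sequence the sketch reports a start codon at offset 10 and a stop
codon at offset 13, because the codon immediately after "aug" is "uaa".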
The following script shows how specific codons can be located within an RNA
sequence.
#!/usr/bin/perl
use strict;
use warnings;

my $seq = "aacggacuaaauguaagcuacacuacgagcuacgaucacgacucaauaaggagcaaaa";
print $seq . "\n";

my $start = index($seq, "aug");    # first start codon, or -1 if absent
my $stop  = rindex($seq, "uaa");   # last "uaa" stop codon, or -1 if absent
die "start or stop codon not found\n" if $start < 0 || $stop < $start;

# Pad with spaces so each caret sits under the first base of its codon.
my $output = " " x $start . "^";
$output .= " " x ($stop - $start - 1) . "^";
print $output . "\n";
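The index function searches the string given as its first argument for the
substring given as its second argument and returns the zero-based position of
the first occurrence, or -1 if the substring is not found. The rindex function
takes the same arguments but returns the position of the last occurrence,
because it searches from the right end of the string.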
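The return values described above can be checked with a short sketch. Both
functions also accept an optional third argument giving the position at which
to begin searching, which makes it possible to find every occurrence rather
than just the first or last. The helper name all_positions is hypothetical
and is used here only for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $seq = "aacggacuaaauguaagcuacacuacgagcuacgaucacgacucaauaaggagcaaaa";

print index($seq, "uaa"), "\n";      # 7: first occurrence, counted from 0
print rindex($seq, "uaa"), "\n";     # 46: last occurrence
print index($seq, "ttt"), "\n";      # -1: not found

# Hypothetical helper: collect every offset at which $sub occurs in $str.
sub all_positions {
    my ($str, $sub) = @_;
    return () if !length $sub;       # an empty substring would never advance
    my @found;
    my $pos = index($str, $sub);
    while ($pos >= 0) {
        push @found, $pos;
        $pos = index($str, $sub, $pos + 1);   # resume just past the last hit
    }
    return @found;
}

print join(", ", all_positions($seq, "uaa")), "\n";   # 7, 13, 46
```

Always compare the result against -1 before using it as a position, since -1
is also a valid (negative) offset for substr.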
The following example shows how to insert using the substr function.
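#!/usr/bin/perl
use strict;
use warnings;

my $str = "xxxxxx";
print "ORIGINAL: $str\n";
substr($str, 3, 0) = "VVV";    # length 0: insert at offset 3, remove nothing
print "INSERT W/O OVERWRITE: $str\n";
substr($str, 3, 3) = "WWW";    # length 3: replace the 3 characters at offset 3
print "INSERT WITH OVERWRITE: $str\n";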
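Notice that the third argument (the length) decides whether you overwrite.
A length of 0 inserts the new text without removing anything, while a
positive length replaces that many characters starting at the offset.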
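A few other common uses of substr can be sketched as follows. The example
assumes the same six-character starting string as above and shows reading a
substring, the four-argument form (which replaces in place and returns the
text that was removed), and a negative offset.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $str = "xxxxxx";

# Read-only use: 3 characters starting at offset 1.
print substr($str, 1, 3), "\n";          # xxx

# Four-argument form: replace in place and return what was there before.
my $old = substr($str, 3, 0, "VVV");     # length 0, so nothing is removed
print "$str (removed '$old')\n";         # xxxVVVxxx (removed '')

$old = substr($str, 3, 3, "WWW");        # length 3, so "VVV" is replaced
print "$str (removed '$old')\n";         # xxxWWWxxx (removed 'VVV')

# A negative offset counts from the end of the string.
substr($str, -3, 3) = "yyy";
print "$str\n";                          # xxxWWWyyy
```

The assignment form and the four-argument form change the string in the same
way; the four-argument form is convenient when the removed text is also needed.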
ASSIGNMENT:
Write a script which allows the user to insert a string into another string
before or after a user-defined pattern. When the script starts, prompt the
user to enter an initial string. Next, enter a loop which prompts the user
for the following input:
- Action: insert before, insert after, or quit
- Pattern: The substring of the current string that marks where to insert
- Input: The string to be inserted
As the script writer, you can create your own command symbols (just make sure
that they are documented so the user can figure out what to do without you
standing there providing that information). Make sure that the modified
string is displayed after each loop iteration.