String Processing

Working With Substrings

In this lesson the student will learn how to:
  1. Use the substr function in a variety of situations
  2. Use the substr function to insert into a string
  3. Use the index and rindex functions to locate substrings

By the end of this lesson the student will be able to:

     Write a script which allows the user to insert
     strings within another string.

Start and Stop Codons

The genes of humans (and other eukaryotes) are not continuous sequences. Rather, eukaryotic genes are broken into coding sequences called exons and noncoding sequences called introns. A single gene may be broken into several exons interrupted by several introns. To help you remember the terms, think of exons as expressed sequences and introns as interruptions between expressed sequences. Exons in the human genome have an average length of about 200 base pairs. Introns vary widely in length, from very short to very long.


   EXons  --> EXpressed sequences

   INTrons  --> INTerruptions

The beginning of the protein-coding region is signaled by an AUG start codon (which also codes for methionine). The end of the coding region is signaled by any of three stop codons (UAA, UAG, UGA). There are several other complications to consider, but for now we will focus on exons and introns and on start and stop codons.

The following script shows how specific codons can be located within an RNA sequence.

    #!/usr/bin/perl

    $seq = "aacggacuaaauguaagcuacacuacgagcuacgaucacgacucaauaaggagcaaaa";
    print $seq . "\n";

    $pos = index($seq, "aug");    # position of the start codon
    $output = "";
    for ($i = 0; $i < $pos; $i++) {
        $output .= " ";           # pad up to the start codon
    }
    $output .= "^";

    $p2 = rindex($seq, "uaa");    # position of the last stop codon
    $pos = $p2 - $pos - 1;        # number of spaces between the two markers
    for ($i = 0; $i < $pos; $i++) {
        $output .= " ";
    }
    $output .= "^";

    print $output . "\n";

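The index function returns the zero-based position of the first occurrence of the substring given as the second argument within the string given as the first argument. The rindex function does the same thing except that it searches from the right end of the string, so it returns the position of the last occurrence. Both functions return -1 if the substring is not found.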
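
Both functions also accept an optional third argument giving the position at which to begin searching. The following is a minimal sketch, not part of the original lesson, that uses this argument to list every occurrence of a codon. The helper name find_all is hypothetical; it is not a Perl built-in.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $seq = "aacggacuaaauguaagcuacacuacgagcuacgaucacgacucaauaaggagcaaaa";

# Return every zero-based position at which $sub occurs in $str.
# find_all is a hypothetical helper written for this example.
sub find_all {
    my ($str, $sub) = @_;
    my @positions;
    my $pos = index($str, $sub);
    while ($pos != -1) {
        push @positions, $pos;
        # Resume the search one character past the previous match.
        $pos = index($str, $sub, $pos + 1);
    }
    return @positions;
}

print "first aug: ", index($seq, "aug"), "\n";                  # 10
print "last uaa:  ", rindex($seq, "uaa"), "\n";                 # 46
print "missing:   ", index($seq, "ttt"), "\n";                  # -1
print "all uaa:   ", join(", ", find_all($seq, "uaa")), "\n";   # 7, 13, 46
```

Checking the return value against -1 before using it matters: a script that pads with spaces up to a position of -1 will silently print a misplaced marker rather than report that the codon is missing.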

The following example shows how to insert using the substr function.

    #!/usr/bin/perl

    $str = "xxxxxx";
    print "ORIGINAL: $str\n";

    substr($str, 3, 0) = "VVV";    # length 0: nothing is replaced
    print "INSERT W/O OVERWRITE: $str\n";

    substr($str, 3, 3) = "WWW";    # length 3: three characters are replaced
    print "INSERT WITH OVERWRITE: $str\n";

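Notice that whether you overwrite depends on the third argument, the length. A length of 0 inserts the new text at the given position without replacing anything, while a positive length replaces that many characters with the new text.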
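
The substr function can also be used simply to extract part of a string, and it has a four-argument form that replaces text without using an assignment. The sketch below is an illustration added here, not part of the original scripts; the variable names are chosen only for this example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $seq = "aacggacuaaauguaagcuacacuacgagcuacgaucacgacucaauaaggagcaaaa";

# Extraction: substr(STRING, OFFSET, LENGTH) returns a copy of that part.
my $first3 = substr($seq, 0, 3);     # first three characters
my $last4  = substr($seq, -4);       # negative offset counts from the end

# Combine index/rindex with substr to pull out the region that runs
# from the first "aug" through the last "uaa" (3 is the codon length).
my $start  = index($seq, "aug");
my $stop   = rindex($seq, "uaa");
my $region = substr($seq, $start, $stop + 3 - $start);

print "first 3: $first3\n";          # aac
print "last 4:  $last4\n";           # aaaa
print "region:  $region\n";

# Four-argument form: substr(STRING, OFFSET, LENGTH, REPLACEMENT)
# modifies the string in place and returns the text that was replaced.
my $str = "xxxxxx";
substr($str, 3, 0, "VVV");           # insert without overwriting
print "$str\n";                      # xxxVVVxxx
my $old = substr($str, 3, 3, "WWW"); # overwrite three characters
print "$str (replaced $old)\n";      # xxxWWWxxx (replaced VVV)
```

The four-argument form and the assignment form shown earlier produce the same result; which one to use is mostly a matter of taste.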

ASSIGNMENT:

Write a script which allows the user to insert a string into another string before or after a user-defined pattern. When the script first begins, prompt the user to enter an initial string. Next, enter a loop which prompts the user for the following input:

  1. Action: insert before, insert after, or quit
  2. Pattern: The location to insert the input
  3. Input: The string to be inserted

As the script writer you can create your own command symbols. Just make sure that they are documented so the user can figure out what to do without you standing there to explain. Make sure that the modified string is displayed following each loop iteration.