Perl Basics and Biostatistics

Correlation Coefficient

In this lesson the student will learn how to:
  1. use for-loops
  2. calculate the correlation coefficient
By the end of this lesson the student will be able to:

  Write scripts containing simple for-loops

The equation for calculating r (the correlation coefficient) is programmed into many cheap calculators, so you don't really need to know how to calculate it yourself. The equation can be expressed in several ways; here is one:

          summation i = 1 to N [ ((Xi - mean of X) / SDx) * ((Yi - mean of Y) / SDy) ]
   r = ----------------------------------------------------------------------------------
                                        ( N - 1 )

As you will recall:
   
    SD = sqrt( [ summation i = 1 to N (Yi - mean)^2 ] / (N - 1) )

So calculating r requires several steps:
  1. Find the mean of your groups of numbers.
        
        X = ( 3, 5, 6, 7, 10, 12 )
        mean of X = 7.17
    
        Y = ( 5, 6, 7, 9, 10, 13 )
        mean of Y = 8.33
    
    
  2. Calculate the SD for each group.
    
      SDx = sqrt( [ (3-7.17)^2+(5-7.17)^2+(6-7.17)^2+(7-7.17)^2+(10-7.17)^2+(12-7.17)^2 ] / 5 )
      SDx = 3.31

      SDy = sqrt( [ (5-8.33)^2+(6-8.33)^2+(7-8.33)^2+(9-8.33)^2+(10-8.33)^2+(13-8.33)^2 ] / 5 )
      SDy = 2.94


  3. Calculate r:

      r = [ (3-7.17)/3.31*(5-8.33)/2.94 + (5-7.17)/3.31*(6-8.33)/2.94 + (6-7.17)/3.31*(7-8.33)/2.94 + (7-7.17)/3.31*(9-8.33)/2.94 + (10-7.17)/3.31*(10-8.33)/2.94 + (12-7.17)/3.31*(13-8.33)/2.94 ] / 5
      r = 0.978 (using unrounded means and SDs; the rounded values shown give about 0.980)
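
The three steps can also be checked with a short script. This is only a sketch, not the one right way to do it: the variable names (@x_vals, @y_vals, $sd_x and so on) are made up for this example, and it uses the foreach loop you already know.

```perl
#!/usr/bin/perl -w
# Sketch: correlation coefficient for the X and Y groups used above.
# All variable names here are illustrative choices.
@x_vals = (3, 5, 6, 7, 10, 12);
@y_vals = (5, 6, 7, 9, 10, 13);
$n = scalar(@x_vals);          # number of pairs (N)

# Step 1: find the mean of each group
$sum_x = 0;
$sum_y = 0;
foreach $i (0..$n-1){
    $sum_x += $x_vals[$i];
    $sum_y += $y_vals[$i];
}
$mean_x = $sum_x / $n;
$mean_y = $sum_y / $n;

# Step 2: find the SD of each group (square root of squared deviations / (N - 1))
$ss_x = 0;
$ss_y = 0;
foreach $i (0..$n-1){
    $ss_x += ($x_vals[$i] - $mean_x) ** 2;
    $ss_y += ($y_vals[$i] - $mean_y) ** 2;
}
$sd_x = sqrt($ss_x / ($n - 1));
$sd_y = sqrt($ss_y / ($n - 1));

# Step 3: add up the products of the standardized values, then divide by N - 1
$total = 0;
foreach $i (0..$n-1){
    $total += (($x_vals[$i] - $mean_x) / $sd_x) * (($y_vals[$i] - $mean_y) / $sd_y);
}
$r = $total / ($n - 1);

printf "mean of X = %.2f, mean of Y = %.2f\n", $mean_x, $mean_y;
printf "SDx = %.2f, SDy = %.2f\n", $sd_x, $sd_y;
printf "r = %.3f\n", $r;
```

Because the script keeps full precision at every step, it prints r = 0.978.
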
    
    

Nonparametric Methods

Statistical methods that do not make assumptions about the distribution of the population are called nonparametric tests. Most nonparametric methods are based on one simple idea: list the values in order from low to high, assign each value a rank, and base all further analyses on the ranks. By analyzing ranks rather than values, you don't need to care about the distribution of the population.
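
As a rough illustration of the ranking step, here is a sketch that ranks the X values from the correlation example (listed out of order on purpose). It assumes no two values are equal, since tied values need extra handling that is not shown. The names @values, @sorted and %rank_of are made up for this example.

```perl
#!/usr/bin/perl -w
# Sketch: rank a group of values from low (rank 1) to high.
# Assumes there are no tied values.
@values = (10, 3, 7, 12, 5, 6);
@sorted = sort { $a <=> $b } @values;   # numeric sort, low to high

# The rank of a value is its position in the sorted list, counting from 1
$rank = 0;
foreach $v (@sorted){
    $rank++;
    $rank_of{$v} = $rank;
}

# Print each original value next to its rank
foreach $v (@values){
    print "$v has rank $rank_of{$v}\n";
}
```

Further analysis would then work with the ranks (5, 1, 4, 6, 2, 3) instead of the values themselves.
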

Perl: For Loops

You've seen this construct many times:

    #!/usr/bin/perl -w
    foreach $n (1..10){
        print "$n\n";
    }
    exit;

You've also seen it used like this:

    #!/usr/bin/perl -w
    @stuff = ("hairpin", 33, "PIG", "lantern", "33-34", "earth");
    foreach $n (@stuff){
        print "$n\n";
    }
    exit;

So, that's the foreach loop. Now let's look at the for loop.

    #!/usr/bin/perl -w
    for($i = 0; $i < 10; $i++){
        print "$i\n";
    }
    exit;

Let's look at each portion of the for-loop construct:

The first part: $i = 0;

  This is the initializer. The variable which is being used to keep
  track of the count is initialized. Commonly this variable is initialized
  to zero, but it can be initialized to any integer value.

The middle part: $i < 10; 

  This is the exit condition. The loop will continue as long as this
  condition evaluates to true. Once this condition becomes false the
  loop ends.

The end: $i++

  This is the increment amount. In this case the value stored in the
  counter variable is increased by one on every iteration of the loop.
  The counter can be increased by any integer amount on each loop or
  it can even be decreased in value! 
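
To see all three parts change at once, here is a small sketch that starts at 20 instead of zero, keeps going while the counter is above zero, and counts down by four. The @seen list is only there to collect the values so they can be printed on one line.

```perl
#!/usr/bin/perl -w
# Sketch: a for-loop that starts at 20 and counts DOWN by four.
@seen = ();
for($i = 20; $i > 0; $i -= 4){
    push(@seen, $i);     # remember each value of the counter
}
print "@seen\n";         # prints: 20 16 12 8 4
```
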

Now consider what we can do when we utilize the comma operator:

    #!/usr/bin/perl -w
    for($i = 0, $v = 10; $i < 10; $i++, $v--){
        print "$i, $v\n";
    }
    exit;

Here the loop is under control of the $i variable and the $v variable is just sort of along for the ride. That is, exiting the loop is contingent upon the value of $i and $v isn't a factor in deciding when to end the loop.

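Now consider this useful trick using a foreach loop. Inside the loop, $n is not a copy of each element but another name for it, so assigning to $n changes @list itself:

    #!/usr/bin/perl -w
    @list = (1,2,3,4,5);
    print "@list\n";
    foreach $n (@list){
        $n = $n * 10;
    }
    print "@list\n";
    exit;
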
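
If you still need the original values afterwards, one option is to build a second list rather than assigning to $n. This is just a sketch, and @scaled is a name made up for this example.

```perl
#!/usr/bin/perl -w
# Sketch: multiply every element by ten WITHOUT changing the original list.
@list = (1,2,3,4,5);
@scaled = ();
foreach $n (@list){
    push(@scaled, $n * 10);   # add to the new list; @list is left alone
}
print "@list\n";      # prints: 1 2 3 4 5
print "@scaled\n";    # prints: 10 20 30 40 50
```
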
Consider this reverse-sorting program:

    #!/usr/bin/perl -w
    $str = "norman zoolander always wanted to purchase four umbrellas and seven turnips";
    foreach $word (reverse sort split(/ /, $str)){
        print "$word ";
    }
    print "\n";
    exit;

Finally, take a look at this little gem:

    #!/usr/bin/perl -w
    $str = "AGTABAVDADGADEDGATGADKLLPILIGADS";
    $pat = "GAD";
    $output = "";
    # <= (not <) so that the last possible start position is checked too
    for($i = 0; $i <= length($str) - length($pat); $i++){
        $seg = substr($str, $i, length($pat));   # string, start pt, length
        if($pat eq $seg){
            $output .= "^";
        }
        else{
            $output .= " ";
        }
    }
    print "SEARCH PATTERN: $pat\n";
    print "$str\n";
    print "$output\n\n";
    exit;
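
The three parts of a for-loop do not have to be a simple counter. As a sketch of the same search, the version below lets Perl's built-in index function jump straight from one match to the next. index returns the position of the pattern (counting from zero), or -1 when there are no more matches. The names @hits and $pos are made up for this example.

```perl
#!/usr/bin/perl -w
# Sketch: find every position of $pat in $str using index() inside a for-loop.
$str = "AGTABAVDADGADEDGATGADKLLPILIGADS";
$pat = "GAD";
@hits = ();

# initializer: find the first match
# condition:   keep going until index() reports -1 (no match)
# "increment": search again, starting one character past the last match
for($pos = index($str, $pat); $pos != -1; $pos = index($str, $pat, $pos + 1)){
    push(@hits, $pos);
}
print "$pat found at positions: @hits\n";   # prints: GAD found at positions: 10 18 28
```

These are the same three positions that the program above marks with "^".
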

ASSIGNMENT:

  1. Write a for-loop (not foreach loop) which outputs the multiples of five from five to 100
  2. Write a for-loop (not foreach loop) which outputs the first 15 powers of two
  3. Create a written explanation of how a for-loop could be used to calculate the correlation coefficient