Perl Basics and Biostatistics

Variability and Range

In this lesson the student will learn how to:
  1. Create and populate arrays
  2. Find the range of a group of numbers
  3. Use the foreach loop
  4. Use the comma operator

By the end of this lesson the student will be able to:

   Write a Perl script which will calculate the mean 
   and range for sets of numbers of any size.

Let's say that we conduct a study of 100 subjects with high blood pressure. Fifty of these subjects get an experimental treatment and the other 50 get a traditional treatment. For our purposes we are interested only in the systolic pressure (the higher of the two values in a blood pressure reading). How do we analyze the data we've collected? One thing we can do is calculate the mean of each group. Another is to determine the amount of variability within each group. The crudest and least informative way of doing this is to determine the range of each data set. The range is simply the difference between the highest and lowest values in a data set. We have two data sets, and just as we can figure out the mean for each one, we can also figure out the range for each one.
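
As a quick check on the definition of range, here is a minimal sketch that computes it for four made-up readings. It uses the min and max functions from the core List::Util module, which is a shortcut and not the approach taken by the lesson's script. The variable names and data values are assumptions chosen for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use List::Util qw(min max);   # core module, used here only as a cross-check

# Hypothetical systolic readings, for illustration only
my @readings = ( 120, 135, 150, 133 );

my $high  = max(@readings);   # largest value in the list
my $low   = min(@readings);   # smallest value in the list
my $range = $high - $low;     # range = highest minus lowest

print "HIGH: $high, LOW: $low, RANGE: $range\n";
```

With these four values the highest is 150 and the lowest is 120, so the range is 30.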

Here's a simple script which calculates the mean and range for two sets of ten numbers:

#!/usr/bin/perl

#data points
@g1 = ( 120, 135, 140, 150, 133, 141, 146, 155, 137, 144 );
@g2 = ( 133, 136, 139, 152, 139, 140, 147, 152, 150, 149 );

#calculate means
$t1 = 0;
foreach $v (@g1){
    $t1 = $t1 + $v;
}
$n = @g1;
$m1 = $t1 / $n;

$t2 = 0;
foreach $v (@g2){
    $t2 = $t2 + $v;
}
$n = @g2;
$m2 = $t2 / $n;

#find highest
$h1 = 0;
foreach $v (@g1){
    $h1 = ( $h1 > $v ? $h1 : $v );
}
$h2 = 0;
foreach $v (@g2){
    $h2 = ( $h2 > $v ? $h2 : $v );
}

#find lowest
$l1 = 1000; #set to value higher than any of the data points
foreach $v (@g1){
    $l1 = ( $l1 < $v ? $l1 : $v );
}
$l2 = 1000;
foreach $v (@g2){
    $l2 = ( $l2 < $v ? $l2 : $v );
}

#calculate ranges
$r1 = $h1 - $l1;
$r2 = $h2 - $l2;

#generate report
print "GROUP ONE: @g1\n",
      "MEAN: $m1, HIGH: $h1, LOW: $l1, RANGE: $r1\n\n";
print "GROUP TWO: @g2\n",
      "MEAN: $m2, HIGH: $h2, LOW: $l2, RANGE: $r2\n\n";

exit;

Get this script up and run it.

There are several new constructs in this script which we need to discuss.
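
The constructs the script introduces can be seen in isolation in the short sketch below: a list assigned to an array, an array read in scalar context to get its size, a foreach loop, the conditional ( ? : ) operator, and commas separating the items handed to print. The variable names and values are hypothetical, and the sketch adds "use strict" and "my" declarations, which the lesson's script does not use.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical values, for illustration only
my @values = ( 3, 9, 4 );      # a parenthesized list populates the array

my $count = @values;           # an array in scalar context yields its element count

my $total = 0;
foreach my $v (@values) {      # $v is set to each element in turn
    $total = $total + $v;
}

# Conditional operator: if the test is true take the value after "?",
# otherwise take the value after ":"
my ( $first, $second ) = ( 7, 5 );
my $larger = ( $first > $second ? $first : $second );

# Commas separate the strings passed to print; "@values" inside double
# quotes is expanded with a space between elements
print "VALUES: @values\n",
      "COUNT: $count, TOTAL: $total, LARGER: $larger\n";
```

Here the count is 3, the total is 16, and the larger of 7 and 5 is 7.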

ASSIGNMENT:

You will alter the example script in the following ways:

  1. You will populate your arrays with 20 values each. Your array values will represent the weights of individuals in a population. You should specify the age group, gender, and species of your population and then produce a list of reasonable weights (in pounds, kilograms, ounces, or grams) for that population.
  2. You may use only two foreach loops, that is, one foreach loop per group. The sample script uses six foreach loops, which is redundant: each group needs only one loop, and that loop can perform all of the calculations. So each of your loops will find the highest value, the lowest value, and the total for its group.
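
The single-loop idea in item 2 can be sketched for one small group as follows. This is only an illustration under assumed data (five made-up weights in kilograms), not a finished answer to the assignment. It also seeds the high and low values from the first array element, an assumption that differs from the sample script's starting values of 0 and 1000.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical weights in kilograms, for illustration only
my @w = ( 61, 58, 72, 66, 63 );

my $total = 0;
my $high  = $w[0];   # seed with the first element so any data range works
my $low   = $w[0];

# One pass over the group updates the total, the highest, and the lowest
foreach my $v (@w) {
    $total = $total + $v;
    $high  = ( $high > $v ? $high : $v );
    $low   = ( $low  < $v ? $low  : $v );
}

my $n     = @w;             # number of values in the group
my $mean  = $total / $n;
my $range = $high - $low;

print "MEAN: $mean, HIGH: $high, LOW: $low, RANGE: $range\n";
```

For these five values the total is 320, so the mean is 64, and the range is 72 - 58 = 14.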