Perl Basics and Biostatistics
Variability and Range
In this lesson the student will learn how to:
- Create and populate arrays
- Find the range of a group of numbers
- Use the foreach loop
- Use the comma operator
By the end of this lesson the student will be able to:
Write a Perl script which will calculate the mean
and range for sets of numbers of any size.
Let's say that we conduct a study of 100 subjects with high blood pressure.
Fifty of these subjects get an experimental treatment and the other 50 get a
traditional treatment. For our purposes we are only interested in the
systolic pressure (the higher of the two values in a blood pressure
reading). How do we analyze the data we've collected? One thing we can do is
calculate the mean of each group. Another thing we can do is determine the
amount of variability within each group. The crudest and least informative
way of doing this is to determine the range for each set of data points. The
range is simply the difference between the highest and lowest values in a
data set. We have two data sets, and just as we can figure out the mean for
each one, we can also figure out the range for each one.
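To make the two definitions concrete, here is a minimal sketch using five made-up readings (illustrative values, not study data). It takes a shortcut: the sum, min, and max functions come from Perl's core List::Util module, which is not the approach the lesson's script uses. The variable names, the "my" declarations, and the "use strict" and "use warnings" lines are also additions for this sketch only.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use List::Util qw(sum min max);

# five made-up systolic readings, for illustration only
my @readings = ( 120, 150, 135, 141, 144 );

# mean: the total of the values divided by how many there are
my $mean = sum(@readings) / @readings;

# range: the highest value minus the lowest value
my $range = max(@readings) - min(@readings);

print "MEAN: $mean, RANGE: $range\n";
```

The total is 690 over five readings, so this prints "MEAN: 138, RANGE: 30" (150 minus 120).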
Here's a simple script which calculates the mean and range for two sets of
ten numbers:
#!/usr/bin/perl
#data points
@g1 = ( 120, 135, 140, 150, 133, 141, 146, 155, 137, 144 );
@g2 = ( 133, 136, 139, 152, 139, 140, 147, 152, 150, 149 );
#calculate means
$t1 = 0;
foreach $v (@g1){
    $t1 = $t1 + $v;
}
$n = @g1; #number of values in the array
$m1 = $t1 / $n;
$t2 = 0;
foreach $v (@g2){
    $t2 = $t2 + $v;
}
$n = @g2;
$m2 = $t2 / $n;
#find highest
$h1 = 0; #set to value lower than any of the data points
foreach $v (@g1){
    $h1 = ( $h1 > $v ? $h1 : $v ); #keep whichever is larger
}
$h2 = 0;
foreach $v (@g2){
    $h2 = ( $h2 > $v ? $h2 : $v );
}
#find lowest
$l1 = 1000; #set to value higher than any of the data points
foreach $v (@g1){
    $l1 = ( $l1 < $v ? $l1 : $v ); #keep whichever is smaller
}
$l2 = 1000;
foreach $v (@g2){
    $l2 = ( $l2 < $v ? $l2 : $v );
}
#calculate ranges
$r1 = $h1 - $l1;
$r2 = $h2 - $l2;
#generate report
print "GROUP ONE: @g1\n",
    "MEAN: $m1, HIGH: $h1, LOW: $l1, RANGE: $r1\n\n";
print "GROUP TWO: @g2\n",
    "MEAN: $m2, HIGH: $h2, LOW: $l2, RANGE: $r2\n\n";
exit;
Save this script to a file and run it.
There are several new constructs in this script which we need to discuss.
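- Arrays, sometimes called lists, hold an ordered series of values. You
designate an array variable by using the @ sign instead of the $ sign. Two
arrays, @g1 and @g2, are created at the beginning of the sample script.
Assigning an array to a scalar variable, as in "$n = @g1;", gives the number
of values in the array, which is how the script gets the count it divides by.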
- foreach loops are a convenient way of iterating through the contents of
an array. Basically "foreach $i (@A){ ... }" is an easy way of saying: for
each item in the array @A, put that item in $i and do whatever is specified
between the curly braces. The example script uses six foreach loops.
- You will notice that each print statement at the end of the script spans
two lines. The first line of each pair ends with a comma. print accepts a
list of strings separated by commas, and a Perl statement does not end until
its semi-colon, so the list can continue over as many lines as we like. To
make a single print command print several lines, end each line with a comma
instead of a semi-colon until the last line, which ends with the semi-colon.
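The three constructs above can be seen together in a short sketch. The array contents and the variable names (@weights, $count, $total, $w) are made up for illustration, and the "my" declarations with "use strict" and "use warnings" are additions that the lesson's script does not use.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# an array holding three illustrative values
my @weights = ( 150, 162, 171 );

# assigning an array to a scalar gives its number of elements
my $count = @weights;

# foreach puts each element of the array into $w in turn
my $total = 0;
foreach my $w (@weights) {
    $total = $total + $w;
}

# one print statement: three strings separated by commas, one semi-colon
print "VALUES: @weights\n",
      "COUNT: $count\n",
      "TOTAL: $total\n";
```

Running it prints three lines: "VALUES: 150 162 171", "COUNT: 3", and "TOTAL: 483". Replacing either comma in the print statement with a semi-colon ends the statement early, which leaves the following string as a statement of its own that prints nothing.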
ASSIGNMENT:
You will alter the example script in the following ways:
- You will populate your arrays with 20 values each. Your array values
will represent the weights of individuals in a population. You should
specify the age group, gender, and species of your population and then
produce a list of reasonable weights (in pounds, kilograms, ounces, or
grams) for that population.
- You may use only two foreach loops, one for each group. The sample
script uses six foreach loops, which is redundant, because a single loop per
group can perform all the calculations. Each of your loops will therefore
find the highest value, lowest value, and total for its group.