Perl Basics and Biostatistics
Deviation From The Mean
In this lesson the student will learn how to:
- input values from standard input
- use the abs function
- use the chomp function
- compare non-numeric values
- calculate average deviation from the mean
By the end of this lesson the student will be able to:
Write a Perl script to calculate the deviation from the mean for a data set.
Finding the range of a list of numbers is one way to get information about the
variability of a group of numbers. Calculating the average deviation from
the mean is another way to measure variability. The first thing we have to
do to figure out the average deviation from the mean is to calculate the
mean. Then we can find the difference between each individual item and the
mean. We take the absolute values of these differences and add them all up
to come up with the total deviation. The average deviation from the mean is
derived by dividing the total deviation by the number of data points.
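The steps above can be written as a formula. The symbols are notation chosen here for illustration and do not come from the lesson: x_1 through x_n are the data points, n is the number of data points, and x-bar is the mean.

```latex
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i
\qquad\qquad
\text{average deviation from the mean} = \frac{1}{n}\sum_{i=1}^{n} \lvert x_i - \bar{x} \rvert
```

As a small worked example with made-up values 2, 4 and 9: the mean is 15 / 3 = 5, the absolute differences are 3, 1 and 4, the total deviation is 8, and the average deviation from the mean is 8 / 3, or about 2.67.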
The following script calculates the average deviation from the mean, which
it reports as the mean deviation.
#!/usr/bin/perl
#data points
@vals = ( );
$input = 0;
while( $input ne 'q' ){
    print "Enter a single value: ";
    $input = <STDIN>;
    chomp($input);
    if($input ne 'q'){
        push(@vals, $input);
    }
}
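#determine the mean for the group of numbers:
$n = @vals;
#stop here if no values were entered, to avoid dividing by zero
die "No values were entered.\n" if $n == 0;
$total = 0;
foreach $item (@vals){
    $total += $item;
}
$mean = $total / $n;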
#determine the deviation from the mean
$dev = 0;
foreach $item (@vals){
    $dev += abs( $item - $mean );
}
#determine mean deviation from the mean
$mean_dev = $dev / $n;
#REPORT
print "DATA SET: @vals\n";
print "MEAN: $mean\n";
print "MEAN DEVIATION: $mean_dev\n";
exit;
There are several new constructs introduced in this script:
- First of all we create an empty list like this: "@vals = ( );". We
populate this list using the push function. The push function takes two
arguments: the first is the list to add to, and the second is the value
that gets added to the end of that list.
- The while loop allows for iteration until a closing condition is met. In
this case the loop continues until the user enters a single 'q'. So, as long
as the input value is NOT EQUAL to 'q' the loop continues. The ne between
$input and 'q' stands for not equal; it compares its operands as strings,
which is how Perl compares non-numeric values. (The numeric counterpart
is !=.)
- The <STDIN> construct stands for standard input. Each time it is evaluated
in this script it reads one line of standard input, and that line is stored
in the variable called $input.
- A line read from standard input still ends with the newline character (\n)
produced when the user pressed Enter. The chomp function gets rid of this
newline character, so that the comparison with 'q' can succeed.
- The abs function returns the absolute value of its argument. In this
case the argument is an expression ($item - $mean). We must use the absolute
value because otherwise the positive and negative differences cancel out and
the total deviation will always equal zero.
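The constructs listed above can be tried in isolation with a short sketch like the one below. It is an illustration only, not part of the lesson's script: the sample values are made up, the string "q\n" stands in for what <STDIN> would return, and the "use strict", "use warnings" and "my" declarations are additions that the lesson's script does not use.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# chomp removes the trailing newline that a line read from <STDIN> carries.
my $line = "q\n";    # stand-in for the result of <STDIN> when the user types q
chomp($line);
print "quit requested\n" if $line eq 'q';

# ne and eq compare strings; != and == compare numbers.
my $left  = "1.0";
my $right = "1";
print "different as strings\n" if $left ne $right;
print "equal as numbers\n"     if $left == $right;

# Without abs, the signed differences from the mean cancel out.
my @sample = (2, 4, 9);            # made-up values
my $mean   = (2 + 4 + 9) / 3;      # 5
my ($signed, $absolute) = (0, 0);
foreach my $item (@sample) {
    $signed   += $item - $mean;         # -3, -1, +4
    $absolute += abs($item - $mean);    #  3,  1,  4
}
print "signed total: $signed\n";        # signed total: 0
print "absolute total: $absolute\n";    # absolute total: 8
```

All three conditional print statements fire, and the last two lines show why the script sums absolute values rather than the raw differences.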
Now let's look at two sets of numbers which have the same mean. Here's the
output of running these numbers through the script:
DATA SET: 13 15 22 24 27 25 18 16 17 21 23 19
MEAN: 20
MEAN DEVIATION: 3.66666666666667
DATA SET: 14 16 19 26 24 21 15 17 25 23 21 19
MEAN: 20
MEAN DEVIATION: 3.33333333333333
You will notice that the mean deviation from the mean differs between the two
groups even though their means are equal. The first group has the larger
value (about 3.67 versus 3.33), which indicates that its variability is
greater than that of the second group.
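To check both results without typing the values in by hand, the same calculation can be wrapped in a subroutine and applied to each data set. This is a minimal sketch under stated assumptions: the subroutine name mean_deviation is hypothetical, subroutines and "my" variables have not been introduced in this lesson, and the logic simply repeats the steps of the script above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: returns the mean deviation from the mean of its arguments.
sub mean_deviation {
    my @vals = @_;
    my $n = @vals;
    return unless $n;    # nothing to compute for an empty list

    # mean of the values
    my $total = 0;
    $total += $_ foreach @vals;
    my $mean = $total / $n;

    # sum of the absolute differences from the mean
    my $dev = 0;
    $dev += abs($_ - $mean) foreach @vals;

    return $dev / $n;
}

# The two data sets shown in the output above.
my @first  = (13, 15, 22, 24, 27, 25, 18, 16, 17, 21, 23, 19);
my @second = (14, 16, 19, 26, 24, 21, 15, 17, 25, 23, 21, 19);

print "FIRST MEAN DEVIATION: ",  mean_deviation(@first),  "\n";   # 3.66666666666667
print "SECOND MEAN DEVIATION: ", mean_deviation(@second), "\n";   # 3.33333333333333
```

For the first set the absolute differences add up to 44 and for the second to 40, so dividing each by 12 gives the two values reported by the script.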
ASSIGNMENT:
You will extend the sample script so that it also reports the range, high
value and low value. Next you will create two lists of twelve numbers. The
first list will have a greater range than the second. The second list will
have a greater mean deviation from the mean than the first. These are
conflicting demands. To make the creation of these lists even more
difficult, each list must have the same mean. Further, you may have no
repeated values within the same list and all values must lie between 50 and
80.