Perl Basics and Biostatistics

Deviation From The Mean

In this lesson the student will learn how to:
  1. input values from standard input
  2. use the abs function
  3. use the chomp function
  4. compare non-numeric values
  5. calculate average deviation from the mean

By the end of this lesson the student will be able to:

Write a Perl script to calculate the average deviation from the mean for a data set.

Finding the range of a list of numbers is one way to get information about the variability of a group of numbers. Calculating the average deviation from the mean is another. To find it, we first calculate the mean. Then we find the difference between each individual item and the mean. We take the absolute values of these differences, so that differences above and below the mean do not cancel each other out, and add them up to get the total deviation. Dividing the total deviation by the number of data points gives the average deviation from the mean.
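The calculation described above can be written compactly in symbols. The notation is introduced here for illustration and is not part of the lesson: x_i stands for the i-th data point, n for the number of data points, and x-bar for the mean.

```latex
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i
\qquad\qquad
\text{average deviation} = \frac{1}{n}\sum_{i=1}^{n} \left| x_i - \bar{x} \right|
```

As a small worked case, the values 2, 4 and 9 have a mean of 15 / 3 = 5. The absolute differences from the mean are 3, 1 and 4, so the total deviation is 8 and the average deviation is 8 / 3, or about 2.67.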

The following script calculates the average deviation from the mean, which it reports as MEAN DEVIATION.

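#!/usr/bin/perl
use strict;
use warnings;

# data points
my @vals  = ();
my $input = '';

# read values from standard input until the user enters 'q'
while ( $input ne 'q' ) {
    print "Enter a single value (q to quit): ";
    $input = <STDIN>;
    last unless defined $input;    # stop if the input ends without a 'q'
    chomp($input);                 # remove the trailing newline
    if ( $input ne 'q' ) {
        push( @vals, $input );
    }
}

# determine the mean for the group of numbers
my $n = @vals;                     # an array in scalar context gives its size
die "No values entered.\n" if $n == 0;
my $total = 0;
foreach my $item (@vals) {
    $total += $item;
}
my $mean = $total / $n;

# determine the total deviation from the mean
my $dev = 0;
foreach my $item (@vals) {
    $dev += abs( $item - $mean );
}

# determine the mean deviation from the mean
my $mean_dev = $dev / $n;

# REPORT
print "DATA SET: @vals\n";
print "MEAN: $mean\n";
print "MEAN DEVIATION: $mean_dev\n";
exit;
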
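There are several new constructs introduced in this script:
  1. <STDIN> reads one line from standard input, including the newline at the end of the line.
  2. chomp removes that trailing newline from the value.
  3. ne tests whether two strings are not equal. It is used here because 'q' is not a number (the != operator compares numeric values).
  4. abs returns the absolute value of a number.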
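The constructs this lesson introduces (reading a line of input, chomp, ne and abs) can be tried in isolation. The sketch below is only an illustration: to run without any typing, it assumes an in-memory filehandle in place of STDIN, and the variable names are made up for this example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Simulated input: an in-memory filehandle stands in for STDIN.
# This is an assumption made so the example runs without a keyboard.
my $typed = "17\nq\n";
open( my $in, '<', \$typed ) or die "Cannot open in-memory input: $!";

my $first = <$in>;     # reads "17\n", newline included
chomp($first);         # $first is now "17"

my $second = <$in>;    # reads "q\n"
chomp($second);        # $second is now "q"

# ne compares strings; it is true when the two strings differ
my $first_is_data  = ( $first ne 'q' )  ? 1 : 0;    # 1: "17" is not "q"
my $second_is_data = ( $second ne 'q' ) ? 1 : 0;    # 0: "q" means stop

# abs gives the distance from the mean regardless of sign
my $mean     = 20;
my $distance = abs( $first - $mean );               # abs(17 - 20) is 3

print "first: $first (data: $first_is_data)\n";
print "second: $second (data: $second_is_data)\n";
print "distance of $first from $mean: $distance\n";
```

Without the chomp calls, the second value would still be "q" followed by a newline, so the ne test against 'q' would never come out false and the input loop in the lesson script would not stop.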

Now let's look at two sets of numbers which have the same mean. Here's the output of running these numbers through the script:


DATA SET: 13 15 22 24 27 25 18 16 17 21 23 19
MEAN: 20
MEAN DEVIATION: 3.66666666666667

DATA SET: 14 16 19 26 24 21 15 17 25 23 21 19
MEAN: 20
MEAN DEVIATION: 3.33333333333333

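You will notice that, although both groups have a mean of 20, their mean deviations differ: about 3.67 for the first group and about 3.33 for the second. This indicates that the values in the first group lie, on average, slightly farther from the mean, so the first group has the greater variability.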
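The two results above can be reproduced without retyping the values at the prompt. The following is a minimal sketch, not the lesson's script: it wraps the same calculation in a subroutine, and the name mean_deviation is a hypothetical one chosen for this example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# mean_deviation is a hypothetical helper name used only in this sketch.
# It returns the mean and the mean deviation from the mean of its arguments.
sub mean_deviation {
    my @vals = @_;
    my $n    = @vals;
    die "mean_deviation needs at least one value\n" if $n == 0;

    my $total = 0;
    $total += $_ foreach @vals;
    my $mean = $total / $n;

    my $dev = 0;
    $dev += abs( $_ - $mean ) foreach @vals;

    return ( $mean, $dev / $n );
}

# the two data sets shown in the lesson output
my @first  = ( 13, 15, 22, 24, 27, 25, 18, 16, 17, 21, 23, 19 );
my @second = ( 14, 16, 19, 26, 24, 21, 15, 17, 25, 23, 21, 19 );

my ( $mean1, $md1 ) = mean_deviation(@first);
my ( $mean2, $md2 ) = mean_deviation(@second);

printf "first:  mean %d, mean deviation %.2f\n", $mean1, $md1;
printf "second: mean %d, mean deviation %.2f\n", $mean2, $md2;
```

Rounded to two decimal places, this prints a mean deviation of 3.67 for the first set and 3.33 for the second, matching the script output.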

ASSIGNMENT:

You will extend the sample script so that it also reports the range, high value and low value. Next you will create two lists of twelve numbers. The first list will have a greater range than the second. The second list will have a greater mean deviation from the mean than the first. These are conflicting demands. To make the creation of these lists even more difficult, each list must have the same mean. Further, you may have no repeated values within the same list and all values must lie between 50 and 80.
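As a starting point for the first part of the assignment, here is one possible sketch of finding the low value, high value and range of a list. It assumes the range is defined as the high value minus the low value. The subroutine name range_stats is hypothetical, and the sample data is the first data set from this lesson rather than a solution to the assignment.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# range_stats is a hypothetical helper name, not part of the lesson script.
# It returns the low value, the high value and the range (high minus low).
sub range_stats {
    my @vals = @_;
    die "range_stats needs at least one value\n" unless @vals;

    # start with the first value, then compare every item against it
    my ( $low, $high ) = ( $vals[0], $vals[0] );
    foreach my $item (@vals) {
        $low  = $item if $item < $low;
        $high = $item if $item > $high;
    }
    return ( $low, $high, $high - $low );
}

my @vals = ( 13, 15, 22, 24, 27, 25, 18, 16, 17, 21, 23, 19 );
my ( $low, $high, $range ) = range_stats(@vals);

print "LOW: $low\n";
print "HIGH: $high\n";
print "RANGE: $range\n";
```

For this data set the sketch reports a low of 13, a high of 27 and a range of 14. In the extended script the same loop could run over @vals after the input loop finishes, with the three print lines added to the REPORT section.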