Python Basics and Biostatistics

Deviation From The Mean

In this lesson the student will learn or review how to:
  1. input values from standard input
  2. use the abs function
  3. use the int and float functions
  4. calculate average deviation from the mean
By the end of this lesson the student will be able to:

Write a Python script to calculate the average deviation from the mean for a data set.

Finding the range of a list of numbers is one way to get information about the variability of a group of numbers. Calculating the average deviation from the mean is another way to measure variability. The first thing we have to do to figure out the average deviation from the mean is to calculate the mean. Then we can find the difference between each individual item and the mean. We take the absolute values of these differences and add them all up to come up with the total deviation. The average deviation from the mean is derived by dividing the total deviation by the number of data points.
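The steps described above can be traced on a very small data set. The following is only a sketch: the four values are hypothetical, chosen to keep the arithmetic easy to check by hand, and the built-in sum function and list comprehension are shorthand that the lesson script does not use.

```python
# Hypothetical four-value data set chosen only to keep the arithmetic small.
vals = [2, 4, 6, 8]

# Step 1: the mean. float() keeps the division from truncating in Python 2.
mean = float(sum(vals)) / len(vals)          # 20 / 4 = 5.0

# Step 2: the difference between each item and the mean.
diffs = [v - mean for v in vals]             # [-3.0, -1.0, 1.0, 3.0]

# Step 3: absolute values, added up to give the total deviation.
total_dev = sum(abs(d) for d in diffs)       # 3 + 1 + 1 + 3 = 8.0

# Step 4: divide the total deviation by the number of data points.
mean_dev = total_dev / len(vals)             # 8.0 / 4 = 2.0
```

Without the absolute values in step 3 the differences would cancel each other out and always add up to zero, which is why the signs are dropped before the total is taken.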

The following script calculates the average deviation from the mean, which it reports as the mean deviation.

#!/usr/bin/python

vals = []
total = 0
num = 0

#populate list
while True:
    n = raw_input("Enter an integer value (hit q to quit): ")
    if(n == 'q'):
        break
    else:
        vals.append(int(n))

num = len(vals)

#calculate the mean
for i in range(0,num):
    total += vals[i]

mean = float(total)/num

#calculate the deviation from the mean
dev = 0
for i in range(0,num):
    dev += abs(vals[i] - mean)

mean_dev = float(dev)/num

#REPORT
print "DATA SET: " , vals
print "MEAN: " , mean
print "MEAN DEVIATION: " , mean_dev

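There are several constructs worth reviewing in this script: raw_input, which reads a line from standard input; int and float, which convert values from one type to another; and abs, which returns an absolute value.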
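As a minimal sketch, the constructs named in the lesson objectives can be tried in isolation. To keep the example runnable without typing anything, the list named entries is a hypothetical stand-in for the strings that raw_input would return one at a time.

```python
# raw_input() always returns a string, so the simulated entries are strings.
# "entries" is a hypothetical stand-in for what a user might type.
entries = ["13", "15", "22", "q"]

vals = []
for n in entries:
    if n == 'q':             # the quit key ends data entry
        break
    vals.append(int(n))      # int() converts the string "13" to the integer 13

total = sum(vals)                    # 13 + 15 + 22 = 50
mean = float(total) / len(vals)      # float() forces true division: 16.66..., not 16
gap = abs(vals[0] - mean)            # abs() drops the sign of 13 - 16.66...
```

In Python 2, dividing one integer by another discards the fractional part, so 50/3 gives 16. Converting the total with float first is what keeps the fraction in the mean.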

Now let's look at two sets of numbers which have the same mean. Here's the output of running these numbers through the script:


DATA SET: 13 15 22 24 27 25 18 16 17 21 23 19
MEAN: 20
MEAN DEVIATION: 3.66666666666667

DATA SET: 14 16 19 26 24 21 15 17 25 23 21 19
MEAN: 20
MEAN DEVIATION: 3.33333333333333

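You will notice that the two sets share a mean of 20, but their average deviations from the mean differ: about 3.67 for the first set and about 3.33 for the second. The larger figure indicates that the values in the first set lie, on average, farther from the mean, so the first set has the greater variability.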
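The figures for the two data sets can be checked with a short function. This is a sketch rather than the lesson script itself: the name mean_deviation is hypothetical, and the data is placed directly in lists instead of being read from standard input.

```python
def mean_deviation(vals):
    """Return (mean, average deviation from the mean) for a non-empty list."""
    num = len(vals)
    mean = float(sum(vals)) / num
    dev = sum(abs(v - mean) for v in vals)
    return mean, dev / num

set_a = [13, 15, 22, 24, 27, 25, 18, 16, 17, 21, 23, 19]
set_b = [14, 16, 19, 26, 24, 21, 15, 17, 25, 23, 21, 19]

mean_a, dev_a = mean_deviation(set_a)   # total deviation 44 over 12 values
mean_b, dev_b = mean_deviation(set_b)   # total deviation 40 over 12 values

print("first set:  mean %.2f, mean deviation %.2f" % (mean_a, dev_a))
print("second set: mean %.2f, mean deviation %.2f" % (mean_b, dev_b))
```

Both sets add up to 240, which gives the shared mean of 20. The total deviations are 44 and 40, and dividing each by 12 gives the two mean deviations reported above.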

ASSIGNMENT:

You will extend the sample script so that it also reports the range, high value and low value. Next you will create two lists of twelve numbers. The first list will have a greater range than the second. The second list will have a greater mean deviation from the mean than the first. These are conflicting demands. To make the creation of these lists even more difficult, each list must have the same mean. Further, no value may be repeated within a list, and all values must lie between 50 and 80. (You will notice that the focus here is on data. Creating data sets for testing software is an important part of verifying that a script works correctly.)
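The rules that apply to each list on its own can be checked mechanically before the two lists are compared. The sketch below is one possible helper and not part of the assignment: the name check_list is hypothetical, and it assumes that "between 50 and 80" includes both end values. The comparisons between the two lists (range, mean and mean deviation) are left to the extended script.

```python
def check_list(values, low=50, high=80, size=12):
    """Return a list of problems; an empty list means the per-list rules hold.

    Assumption: the limits low and high are themselves allowed values.
    """
    problems = []
    if len(values) != size:
        problems.append("expected %d values, got %d" % (size, len(values)))
    if len(set(values)) != len(values):
        problems.append("repeated values found")
    if any(v < low or v > high for v in values):
        problems.append("value outside %d..%d" % (low, high))
    return problems

# A throwaway list used only to show the helper reporting a problem.
print(check_list([50, 51, 51]))
```

A set keeps only one copy of each value, so a set that is shorter than the original list means that at least one value was repeated.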