Python Basics and Biostatistics
Deviation From The Mean
In this lesson the student will learn or review how to:
- input values from standard input
- use the abs function
- use the int and float functions
- calculate the average deviation from the mean
By the end of this lesson the student will be able to:
Write a Python script to calculate the average deviation from the mean for a data set.
Finding the range of a list of numbers (the difference between its highest
and lowest values) is one way to get information about the variability of a
group of numbers. Calculating the average deviation from the mean is another.
To find the average deviation from the mean, we first calculate the mean.
Then we find the difference between each individual item and the mean, take
the absolute value of each difference, and add these absolute values up to
get the total deviation. Dividing the total deviation by the number of data
points gives the average deviation from the mean.
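The procedure just described can be written compactly as formulas. The notation below is chosen here for illustration: x_1, ..., x_n are the data points, \bar{x} is their mean and MD is the average deviation from the mean. The worked numbers use the first data set shown later in this lesson.

```latex
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i,
\qquad
\mathrm{MD} = \frac{1}{n}\sum_{i=1}^{n} \lvert x_i - \bar{x} \rvert,
\qquad
\text{range} = \max_i x_i - \min_i x_i

% First data set (n = 12, sum of values = 240, highest 27, lowest 13):
\bar{x} = \frac{240}{12} = 20,
\qquad
\sum_{i=1}^{12} \lvert x_i - 20 \rvert = 44,
\qquad
\mathrm{MD} = \frac{44}{12} \approx 3.67,
\qquad
\text{range} = 27 - 13 = 14
```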
The following Python 2 script calculates the average deviation from the mean,
which it reports as the MEAN DEVIATION.
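#!/usr/bin/env python2
# Python 2 script: read integers until the user types q, then report
# the mean and the average (mean) deviation from the mean.
import sys

vals = []
total = 0

# populate the list from standard input
while True:
    n = raw_input("Enter an integer value (hit q to quit): ")
    if n == 'q':
        break
    else:
        vals.append(int(n))   # convert the input string to an integer

num = len(vals)
if num == 0:
    sys.exit("No values were entered.")   # avoid dividing by zero below

# calculate the mean (float() prevents Python 2 integer division)
for i in range(num):
    total += vals[i]
mean = float(total) / num

# calculate the average deviation from the mean
dev = 0
for i in range(num):
    dev += abs(vals[i] - mean)
mean_dev = float(dev) / num

# REPORT
print "DATA SET: ", vals
print "MEAN: ", mean
print "MEAN DEVIATION: ", mean_dev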
There are several constructs worth reviewing in this script:
- First, we create an empty list with "vals = []" and then fill it using
the list's append method, which adds an item to the end of the list.
- The while loop repeats until a stopping condition is met. Here the loop
condition is simply True, so the loop would run forever if the break
statement were not triggered when the user enters "q" to quit.
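- The raw_input function reads a line typed at the command line (standard
input) and returns it as a string, without the trailing newline. (In Python 3
this function is named input.)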
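- The string returned by raw_input must be converted to an integer with the
int function before the program can do arithmetic with it; int raises a
ValueError if the entry is not a whole number (for example, 3.5). In the same
way, float(total) converts the sum to a floating-point number before
dividing. In Python 2, dividing one integer by another rounds the result down
to a whole number (7/2 gives 3), so without float the mean of 1 and 2 would
come out as 1 instead of 1.5.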
- The abs function returns the absolute value of its argument. Here the
argument is the expression (vals[i] - mean). The absolute value is needed
because the differences above the mean and the differences below it cancel
out: without abs the total deviation would always be zero (or, because of
floating-point rounding, a value extremely close to zero).
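To see why abs is needed, here is a small Python 3 sketch. The three-value list is only an illustration, not data from this lesson. It also shows a Python 2 versus Python 3 difference: in Python 3 the / operator always produces a float, so the float() call used in the script is no longer required, while // performs the rounding-down division that Python 2's / performs on two integers.

```python
# Illustrative list (not from the lesson); its mean is 5.
values = [2, 4, 9]
mean = sum(values) / len(values)      # Python 3: / returns a float, 5.0

# Signed differences from the mean cancel out: (-3) + (-1) + 4 == 0
signed_total = sum(v - mean for v in values)

# Absolute differences do not cancel: 3 + 1 + 4 == 8
absolute_total = sum(abs(v - mean) for v in values)

print(signed_total)                              # 0.0
print(absolute_total)                            # 8.0
print(round(absolute_total / len(values), 3))    # 2.667

# Division in Python 3: / gives a float, // rounds down to a whole number
print(7 / 2, 7 // 2)                             # 3.5 3
```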
Now let's look at two sets of numbers which have the same mean. Here's the
output of running these numbers through the script:
DATA SET: 13 15 22 24 27 25 18 16 17 21 23 19
MEAN: 20
MEAN DEVIATION: 3.66666666666667
DATA SET: 14 16 19 26 24 21 15 17 25 23 21 19
MEAN: 20
MEAN DEVIATION: 3.33333333333333
Although both groups have the same mean, their mean deviations differ (about
3.67 versus 3.33). This indicates that the values in the first group are
spread a little more widely around the mean than those in the second.
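The two results can be checked without typing the values in by hand. This Python 3 sketch wraps the script's calculation in a function (the name mean_deviation is our own, not part of the lesson) and applies it to both data sets. It uses sum() with a generator expression instead of an index loop, which is the more common Python idiom but computes the same totals.

```python
def mean_deviation(values):
    """Average absolute deviation from the mean of a non-empty list."""
    mean = sum(values) / len(values)
    total_deviation = sum(abs(v - mean) for v in values)
    return total_deviation / len(values)


set_one = [13, 15, 22, 24, 27, 25, 18, 16, 17, 21, 23, 19]
set_two = [14, 16, 19, 26, 24, 21, 15, 17, 25, 23, 21, 19]

for label, data in (("first", set_one), ("second", set_two)):
    mean = sum(data) / len(data)
    print(label, mean, round(mean_deviation(data), 4))
# first 20.0 3.6667
# second 20.0 3.3333
```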
ASSIGNMENT:
You will extend the sample script so that it also reports the high value,
the low value and the range. Next you will create two lists of twelve
numbers. The first list must have a greater range than the second, but the
second list must have a greater mean deviation from the mean than the first.
These demands pull against each other. To make the lists even harder to
create, both lists must have the same mean, no value may be repeated within a
list, and all values must lie between 50 and 80. (Notice that the focus here
is on the data: creating data sets to test software is an important part of
verifying that a script works correctly.)
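Since the assignment stresses building data to test software, it can help to let the computer check candidate lists against the rules. The Python 3 sketch below is one way to do that; the function names summarize and check_lists and the wording of the messages are our own, and it assumes that "between 50 and 80" includes both endpoints. It checks lists but does not build them. Run on the lesson's two data sets, which were not designed for this assignment, it reports which rules they break.

```python
def summarize(values):
    """Return (low, high, range, mean, mean deviation) for a non-empty list."""
    low = min(values)
    high = max(values)
    mean = sum(values) / len(values)
    mean_dev = sum(abs(v - mean) for v in values) / len(values)
    return low, high, high - low, mean, mean_dev


def check_lists(first, second):
    """Return a list of broken rules; an empty list means both lists pass."""
    if not first or not second:
        return ["both lists must contain values"]
    problems = []
    for name, values in (("first", first), ("second", second)):
        if len(values) != 12:
            problems.append(name + " list does not have twelve values")
        if len(set(values)) != len(values):
            problems.append(name + " list has repeated values")
        # Assumption: "between 50 and 80" includes both endpoints.
        if not all(50 <= v <= 80 for v in values):
            problems.append(name + " list has values outside 50 to 80")
    _, _, range1, _, dev1 = summarize(first)
    _, _, range2, _, dev2 = summarize(second)
    # Compare means without division: a/n == b/m exactly when a*m == b*n
    if sum(first) * len(second) != sum(second) * len(first):
        problems.append("the lists do not have the same mean")
    if range1 <= range2:
        problems.append("the first list does not have the greater range")
    if dev2 <= dev1:
        problems.append("the second list does not have the greater mean deviation")
    return problems


lesson_one = [13, 15, 22, 24, 27, 25, 18, 16, 17, 21, 23, 19]
lesson_two = [14, 16, 19, 26, 24, 21, 15, 17, 25, 23, 21, 19]
for problem in check_lists(lesson_one, lesson_two):
    print(problem)
# first list has values outside 50 to 80
# second list has repeated values
# second list has values outside 50 to 80
# the second list does not have the greater mean deviation
```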