Python Basics and Biostatistics

Finding the Median Value

In this lesson the student will learn how to:
  1. find the median of a data set
  2. use the sort function
  3. implement numeric or alphabetic sorting
  4. use an if/else construct
  5. use the modulus operator
  6. use array indexes
By the end of this lesson the student will be able to:

  Write a Python script to find the 
  median of a group of values.

The Median

The median of a group of numbers is the middle value once the numbers are sorted by size. Half of the data points lie above the median and half lie below. For a data set containing an odd number of values, the median is easy to find. Consider this list of values:


  5, 11, 14, 20, 22, 23, 24, 25, 30.

There are nine values in this list. They are sorted in order of size, so the middle value is the fifth value, which is 22. There are four values less than this number and four values greater than it. Twenty-two is the middle value and is therefore the median.
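
The odd-length case can be sketched in a few lines of Python 3. The variable names `values`, `middle`, and `median` are illustrative and are not part of the lesson's script. The middle position is found with integer division; Python counts positions from zero, so the fifth value sits at index 4.

```python
# Odd-length example from the text, already sorted.
values = [5, 11, 14, 20, 22, 23, 24, 25, 30]

# Integer division gives the middle index: 9 // 2 == 4.
# Indexes start at 0, so index 4 is the fifth value.
middle = len(values) // 2
median = values[middle]

print("MEDIAN:", median)  # prints MEDIAN: 22
```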

Now what do we do if we have an even number of values? Consider the following list:


  5, 11, 14, 20, 22, 23, 24, 25.

Here the middle two values are 20 and 22. To find the median we add these two values together and divide by two like this:

  20 + 22 = 42
  42 / 2 = 21

So, 21 is the median for this group of eight numbers.
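
The same arithmetic can be checked in Python 3. This sketch averages the two middle values by hand and then compares the result with `statistics.median` from the standard library, which treats an even-length list the same way. The variable names are illustrative.

```python
import statistics

# Even-length example from the text, already sorted.
values = [5, 11, 14, 20, 22, 23, 24, 25]

# With 8 values the two middle ones sit at indexes 3 and 4.
upper = len(values) // 2      # 4
lower = upper - 1             # 3
median = (values[lower] + values[upper]) / 2

print("MEDIAN:", median)              # prints MEDIAN: 21.0
print(statistics.median(values))      # prints 21.0
```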

The following script will find the median for any list of numeric values stored in the variable vals.

#!/usr/bin/env python3

# data points
vals = [33, 23, 55, 39, 41, 46, 38, 52, 34, 29, 27, 51, 33, 28]
output = 0

print("UNSORTED:", vals)

# sort data points in place, smallest to largest
vals.sort()
print("SORTED:", vals)

# test to see if length of vals is even or odd
if len(vals) % 2 == 0:
    # if even then: average the two middle values
    # (// is integer division, so the result is a valid index)
    total = vals[len(vals) // 2 - 1] + vals[len(vals) // 2]
    output = total / 2
else:
    # if odd then: take the single middle value
    output = vals[len(vals) // 2]

print("MEDIAN VALUE: " + str(output))

There are a number of things we need to discuss in order to understand this script:
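
Three of those pieces, sorting, the modulus operator, and list indexes, can be tried on their own. The following is a minimal sketch assuming Python 3; the sample lists and the names `numbers` and `digits` are made up for illustration.

```python
# sort() orders a list in place. Numbers sort by size...
numbers = [33, 5, 120, 14]
numbers.sort()
print(numbers)            # [5, 14, 33, 120]

# ...while strings sort alphabetically, character by character,
# so "120" comes before "14" and "5".
digits = ["33", "5", "120", "14"]
digits.sort()
print(digits)             # ['120', '14', '33', '5']

# Passing key=int asks sort() to compare the strings as numbers.
digits.sort(key=int)
print(digits)             # ['5', '14', '33', '120']

# The modulus operator % gives the remainder after division,
# so a length is even exactly when length % 2 is 0.
for length in (8, 9):
    if length % 2 == 0:
        print(length, "is even")
    else:
        print(length, "is odd")

# Indexes count from 0: the first item is [0], the last is [len - 1].
print(numbers[0], numbers[len(numbers) - 1])   # 5 120
```

Sorting strings that happen to contain digits is a common source of wrong medians, which is why the script stores its data points as numbers rather than as text.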

ASSIGNMENT:

Write a script that finds the mean and the median for a group of numbers. Your script should then count the number of values over the mean and under the mean and print a report that looks something like this:


  MEAN: 22.25
  MEDIAN: 24
  OVER MEAN: 8
  UNDER MEAN: 12

Your array will contain 20 values.