Perl Basics and Biostatistics

Finding the Median Value

In this lesson the student will learn how to:
  1. find the median of a data set
  2. use the sort function
  3. implement numeric or alphabetic sorting
  4. use an if/else construct
  5. use the modulus operator
  6. use array indexes

By the end of this lesson the student will be able to:

  Write a Perl script to find the median of a group of values.

The Median

The median of a group of numbers is the middle value when the numbers are arranged in order of size. Half the data points will be above the median and half will be below. For a data set containing an odd number of values, the median is easy to find. Consider this list of values:


  5, 11, 14, 20, 22, 23, 24, 25, 30.

There are nine values in this list. They are sorted in order of size, so the middle value is the fifth value, which is 22. There are four values less than this number and four values greater than this number. Twenty-two is the middle value and is therefore the median.

Now what do we do if we have an even number of values? Consider the following list:


  5, 11, 14, 20, 22, 23, 24, 25.

Here the middle two values are 20 and 22. To find the median we add these two values together and divide by two like this:

  20 + 22 = 42
  42 / 2 = 21

So, 21 is the median for this group of eight numbers.
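
The two cases above can be sketched in Perl as follows. This is only a minimal illustration, and it assumes the list is not empty and is already sorted in order of size. The name middle_of_sorted is a hypothetical helper made up for this example; it is not a built-in Perl function.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return the median of a list that is ALREADY sorted in order of size.
# (middle_of_sorted is a hypothetical helper name used only for this sketch.)
sub middle_of_sorted {
    my @sorted = @_;
    my $count  = scalar @sorted;     # number of values in the list
    my $mid    = int($count / 2);    # index of the (upper) middle value

    if ($count % 2 == 1) {
        # odd count: there is exactly one middle value
        return $sorted[$mid];
    }
    # even count: average the two middle values
    return ($sorted[$mid - 1] + $sorted[$mid]) / 2;
}

my @odd  = (5, 11, 14, 20, 22, 23, 24, 25, 30);
my @even = (5, 11, 14, 20, 22, 23, 24, 25);

print "Median of nine values:  ", middle_of_sorted(@odd),  "\n";   # 22
print "Median of eight values: ", middle_of_sorted(@even), "\n";   # 21
```

Because Perl array indexes start at 0, the fifth value of the nine-value list is found at index 4, and the two middle values of the eight-value list are found at indexes 3 and 4.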

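The following script finds the median of a list of numeric values. It relies on Perl's default sort, which works for the two-digit values used here; the limits of the default sort are discussed after the script.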

#!/usr/bin/perl

#data points
@vals = ( 33, 23, 55, 39, 41, 46, 38, 52, 34, 29, 27, 51, 33, 28 );
print "UNSORTED: @vals\n";

#sort data points
@vals = sort(@vals);
print "SORTED: @vals\n";

#test to see if there are an even number of data points
if( @vals % 2 == 0){
  #if even then:
  $sum = $vals[(@vals/2)-1] + $vals[(@vals/2)];
  $med = $sum/2;
  print "The median value is $med\n";
}
else{
  #if odd then:
  print "The median value is $vals[@vals/2]\n";
}

exit;

There are a number of things we need to discuss in order to understand this script:
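
One point worth a closer look is how the script treats @vals as a number. When an array is used where Perl expects a number, it gives the count of its elements, which is why the script can write @vals % 2 and @vals/2. The short sketch below illustrates this along with the modulus operator and array indexes. The five sample values are made up for the illustration and are listed in sorted order.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @vals = (23, 33, 39, 41, 55);    # five sample values, already sorted

# Used as a number, an array yields its element count.
my $count = @vals;
print "Number of values: $count\n";              # 5

# The modulus operator % gives the remainder after division,
# so the result is 0 for an even count and 1 for an odd count.
my $remainder = $count % 2;
print "Remainder of $count / 2: $remainder\n";   # 1

# Array indexes start at 0, and a fractional index is truncated,
# so $vals[5/2] means $vals[2.5], which Perl reads as $vals[2].
my $middle = $vals[@vals / 2];
print "Middle element: $middle\n";               # 39
```

The truncation is what lets the odd branch of the script use $vals[@vals/2] directly without rounding the index first.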

There is a catch with the sort function. By default it compares its items as strings, so it sorts in alphabetical order rather than numeric order. This is a problem when we are working with numbers, because alphabetically speaking 11 comes before 2. As long as all our numbers are two-digit numbers we don't notice any problem, but as soon as numbers with different numbers of digits are mixed, the sorted order is wrong. The following script shows how to fix this by giving sort a comparison block that uses the numeric comparison operator <=>:

#!/usr/bin/perl

@vals = ( "1", "11", "25", "13", "22", "23", "24", "2", "33", "3", "35", "36" );
print "ORIGINAL ARRAY: @vals\n";

@normal_sort = sort(@vals);
print "NORMAL SORT RESULTS: @normal_sort\n";

@numeric_sort = sort{$a <=> $b}(@vals);
print "NUMERIC SORT RESULTS: @numeric_sort\n";
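
To see why the choice of sort matters for the median, the sketch below sorts the same twelve values both ways and compares the results. It is a minimal illustration rather than a finished program: the sub name median is a hypothetical helper (Perl has no built-in median function), and it assumes it is given at least one value.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return the median of a list of numbers given in any order.
# (median is a hypothetical helper name, not a Perl built-in.)
sub median {
    my @sorted = sort { $a <=> $b } @_;    # numeric sort, smallest first
    my $mid    = int(@sorted / 2);         # index of the (upper) middle value

    return @sorted % 2
        ? $sorted[$mid]                              # odd count
        : ($sorted[$mid - 1] + $sorted[$mid]) / 2;   # even count
}

my @vals = (1, 11, 25, 13, 22, 23, 24, 2, 33, 3, 35, 36);

my @string_sorted  = sort @vals;                  # default: compared as strings
my @numeric_sorted = sort { $a <=> $b } @vals;    # compared as numbers

print "STRING SORT:  @string_sorted\n";    # 1 11 13 2 22 23 24 25 3 33 35 36
print "NUMERIC SORT: @numeric_sorted\n";   # 1 2 3 11 13 22 23 24 25 33 35 36

# Taking the two middle values of the string-sorted list gives a wrong answer.
my $wrong = ($string_sorted[5] + $string_sorted[6]) / 2;
print "MEDIAN FROM STRING SORT:  $wrong\n";             # 23.5
print "MEDIAN FROM NUMERIC SORT: ", median(@vals), "\n"; # 22.5
```

Doing the numeric sort inside the sub means the caller cannot forget it, which is one reasonable way to avoid the mistake shown in the string-sorted result.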

ASSIGNMENT:

Write a script that finds the mean and the median for a group of numbers. Your script should then count the number of values over the mean and the number under the mean, and print a report laid out roughly like this (the figures shown only illustrate the layout):


  MEAN: 22.25
  MEDIAN: 24
  OVER MEAN: 8
  UNDER MEAN: 12

Your array should contain 20 values.

Also don't forget to use a proper numeric sort.