Python Basics and Biostatistics

CI of the Difference Between Means

In this lesson the student will learn how to:

coordinate the use of values stored in arrays with if/elif/else constructs
calculate the 95% CI for the difference between means

By the end of this lesson the student will be able to:


  Write a script to calculate the 95% CI for the difference
  between means based on basic input data.

Let's suppose that we have the results of a weight loss study reported to us using only the following data:

              Mean Weight Loss   Standard Deviation    Number of Subjects
  Group One:        8.5 lbs          1.8 lbs                 64
  Group Two:        6.1 lbs          2.4 lbs                 64

In this study both groups were provided with organized exercise activities, nutritional guidance, and a supervised diet (typical weight loss camp activities, to which we might add guided imagery, self-esteem building, and goal setting). The key difference was that group one received an experimental weight loss drug and group two received a placebo.

From the numbers we are provided with we can calculate confidence intervals for each group. Recall the following equations:


   SEM = SD / sqrt(N)

   95% CI = MEAN +/- t * SEM

The 95% CI t value for a study with groups containing 64 subjects (df = N - 1 = 63) is 2.000. So, plugging in the numbers we get:


  SEM G1: 1.8/8 = .23
  SEM G2: 2.4/8 = .30

  95% CI G1: 8.5 +/- 2.0 * .23 
       high: 8.96
        low: 8.04

  95% CI G2: 6.1 +/- 2.0 * .30
       high: 6.70
        low: 5.50
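The two equations above can be sketched in Python as follows. This is a minimal illustration, not part of the original lesson: the function names `sem` and `ci95` are made up for this example, and t = 2.0 is passed in by hand rather than looked up. Because the script does not round the SEM to two decimals before multiplying, the bounds for Group One come out as roughly 8.05 and 8.95 rather than the hand-rounded 8.04 and 8.96.

```python
import math

def sem(sd, n):
    """Standard error of the mean: SD / sqrt(N)."""
    return sd / math.sqrt(n)

def ci95(mean, sd, n, t):
    """Return (low, high) for the 95% CI: MEAN +/- t * SEM."""
    margin = t * sem(sd, n)
    return mean - margin, mean + margin

# Group One and Group Two from the weight loss study.
# t = 2.0 is the value quoted in the lesson for groups of 64 subjects.
g1_low, g1_high = ci95(8.5, 1.8, 64, 2.0)
g2_low, g2_high = ci95(6.1, 2.4, 64, 2.0)

print("G1: %.2f to %.2f" % (g1_low, g1_high))
print("G2: %.2f to %.2f" % (g2_low, g2_high))
```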

You can see from these numbers that the confidence intervals of our two groups do not overlap. From a quick inspection of the data we can see that the difference between our means is 8.5 - 6.1 = 2.4. Further, the gap between the low end of the CI for the higher mean and the high end of the CI for the lower mean is 8.04 - 6.70 = 1.34. Which numbers do we use when reporting our results? What is the true difference between the means, and how confident can we be of these numbers? How should we handle these numbers?

Here are the equations we use to evaluate the difference between means of unpaired groups:


First, we calculate the SE of the difference:

  SE of the difference = sqrt( SEM1² + SEM2² )
                       = sqrt( .23² + .30² ) 
                       = sqrt( .1429 )
                       = .378

Second, we subtract our raw means:

  dm = 8.5 - 6.1 
     = 2.4

Third, we look up our value for t using:

  df = N1 + N2 - 2
     = 64 + 64 - 2
     = 126

The t value for 95% CI given df = 126 is 1.96 (assignment 11 has a table of t values).

Finally, we apply the following formula, multiplying the SE of the difference by the t value we just looked up:

  CI of mean difference = dm +/- t * SE of diff

                        = 2.4 +/- 1.96 * .378

                        = 2.4 +/- .741

                 high = 3.141
                  low = 1.659
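The four steps can be collected into one short function. This is a sketch under stated assumptions: the function name `ci_of_difference` is invented for this example, the two groups are unpaired with equal N, and the t value (1.96 for df = 126, as quoted in the lesson) is passed in by hand and multiplies the SE of the difference to give the margin. The script keeps the SEMs unrounded, so its results differ slightly in the second or third decimal from a hand calculation that rounds the SEMs to .23 and .30.

```python
import math

def ci_of_difference(mean1, sd1, n1, mean2, sd2, n2, t):
    """Return (dm, se_diff, df, low, high) for two unpaired groups."""
    sem1 = sd1 / math.sqrt(n1)
    sem2 = sd2 / math.sqrt(n2)
    se_diff = math.sqrt(sem1 ** 2 + sem2 ** 2)  # step one: SE of the difference
    dm = mean1 - mean2                          # step two: difference of raw means
    df = n1 + n2 - 2                            # step three: df used to look up t
    margin = t * se_diff                        # final step: t times SE of diff
    return dm, se_diff, df, dm - margin, dm + margin

# Weight loss study; t = 1.96 is the value the lesson quotes for df = 126.
dm, se_diff, df, low, high = ci_of_difference(8.5, 1.8, 64, 6.1, 2.4, 64, 1.96)

print("dm = %.1f, SE of diff = %.3f, df = %d" % (dm, se_diff, df))
print("95%% CI of the difference: %.3f to %.3f" % (low, high))
```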

Things get a whole lot more complicated if the Ns of our groups are not equal. (This is a situation we are not going to deal with in this unit.)

Arrays

In the script below, pay particular attention to the way that arrays (Python lists) are created and the way that values are retrieved from them by index. Further, consider how this method could be adapted for the lookup of t values. (The assignment called "Confidence Interval of the Mean" has a table of t values.)

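#!/usr/bin/env python3

# An array (a Python list) of labels; items are retrieved by index, starting at 0.
types = ["dork", "nerd", "geek", "dweeb", "moron", "jerk", "pukeface"]
res = ""

r = input("Please enter the number of cans of soda you can drink in a week: ")
r = int(r)

# Pick an array index based on the range that r falls into.
if r < 3:
    res = types[r]    # 0, 1, or 2 is used directly as the index
elif r < 6:
    res = types[3]
elif r < 9:
    res = types[4]
elif r < 15:
    res = types[5]
elif r < 20:
    res = types[6]
else:
    res = types[0]

print("You are a total " + res)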
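The same array-plus-if/elif pattern might be adapted to t values along these lines. This is only a sketch: the function name `lookup_t`, the choice of rows, and the cutoffs are assumptions made for this example, and the t values are two-tailed 95% values from a standard t table, which may not match the rows in the course table. With no interpolation, each df is mapped to the nearest lower row. Treating df of 120 or more as the large-sample value 1.960 matches the figure the lesson uses for df = 126; a fuller table would give about 1.98 there.

```python
# Two-tailed 95% t values for df = 10, 20, 30, 40, 60, and large samples.
# Assumption: taken from a standard t table, not from the course table.
T_VALUES = [2.228, 2.086, 2.042, 2.021, 2.000, 1.960]

def lookup_t(df):
    """Return a 95% t value for df using the nearest lower row (no interpolation)."""
    if df < 10:
        # This abbreviated table has no rows below df = 10.
        raise ValueError("df is below the first row of this table")
    elif df < 20:
        return T_VALUES[0]
    elif df < 30:
        return T_VALUES[1]
    elif df < 40:
        return T_VALUES[2]
    elif df < 60:
        return T_VALUES[3]
    elif df < 120:
        return T_VALUES[4]
    else:
        return T_VALUES[5]

print(lookup_t(63))    # one group of 64 subjects: df = 64 - 1
print(lookup_t(126))   # two groups of 64 subjects: df = 64 + 64 - 2
```

Using the lower row when df falls between two rows gives a slightly larger t, so the resulting interval errs on the wide side.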

ASSIGNMENT:

Write a Python script to calculate the 95% CI for the difference between the means of the two groups in a study. Here is the data from three studies for you to run your script on:

   STUDY ONE:
                mean          SD               N
     G1          66          10.2              22
     G2          70          11.4              22

   STUDY TWO:
                mean          SD               N
     G1          10.4         2.3              30
     G2           8.3         1.9              30

   STUDY THREE:
                mean          SD               N
     G1          125          9.5              45
     G2          105         12.2              45

You can make more of your own data to make sure everything is working correctly. Use the technique demonstrated above to look up the correct t value for your N. NOTE: You must use an array to store your t values. ALSO: Don't worry about interpolating t values. (Even if you know about Python dictionaries, you may not use a Python dictionary for this assignment.)