Python Basics and Biostatistics

Confidence Interval of the Mean

In this lesson the student will learn how to:
  1. interpolate between two values
  2. calculate the 95% CI of the mean for a data set
  3. use the pop function
By the end of this lesson the student will be able to:

  Write a Python script which calculates the 95% CI for a data
  set given the mean, sample SD, and N.

How sure can we be that the mean of our sample matches the mean of the population we are sampling from? Assuming random sampling, the larger our sample, the more confident we can be that the statistics we generate from it accurately reflect the population. But how confident, exactly? That is where the confidence interval (CI) of the mean comes in. Before we delve into this topic, note that a CI is written as a range, as in 64.6 to 66.7 inches or [64.6, 66.7].

Calculating the Confidence Interval of a Mean

The basic formula to calculate the confidence interval of a mean looks like this:


  95%CI: (m - t * ( s/ sqrt(N) ) ) to (m + t * ( s/ sqrt(N) ) )

In this equation m stands for mean, t is the coefficient for 95% CI, s is our sample SD, and N is the number of items in our data set. The value for t is derived from the degrees of freedom (which is just N - 1) as displayed in the following table:

  df             t
  1         12.706
  2          4.303
  3          3.182
  4          2.776
  5          2.571
  6          2.447
  7          2.365
  8          2.306
  9          2.262
  10         2.228
  11         2.201
  12         2.179
  13         2.160
  14         2.145
  15         2.131
  20         2.086
  25         2.060
  30         2.042
  40         2.021
  60         2.000
  120        1.980
  infinity   1.960
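
As a rough sketch, the table can be carried into a script as a list of (df, t) pairs. The names T_TABLE and t_exact are made up for this illustration and are not part of the lesson. The function only handles a df that appears in the table exactly; any other df has to be interpolated.

```python
# (df, t) pairs copied from the table above.
# float("inf") stands in for the "infinity" row.
T_TABLE = [
    (1, 12.706), (2, 4.303), (3, 3.182), (4, 2.776), (5, 2.571),
    (6, 2.447), (7, 2.365), (8, 2.306), (9, 2.262), (10, 2.228),
    (11, 2.201), (12, 2.179), (13, 2.160), (14, 2.145), (15, 2.131),
    (20, 2.086), (25, 2.060), (30, 2.042), (40, 2.021), (60, 2.000),
    (120, 1.980), (float("inf"), 1.960),
]

def t_exact(df):
    """Return t for a df listed in the table, or None if it is not listed."""
    for table_df, t in T_TABLE:
        if table_df == df:
            return t
    return None

print(t_exact(10))  # 2.228
print(t_exact(33))  # None, because 33 falls between the rows for 30 and 40
```

Each print call takes a single argument, so the snippet should behave the same under Python 2 and Python 3.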

Here's an example:


  s = 10.0
  m = 100
  N = 34
  
To find t we must interpolate from the table:

  We know that for df 30 t = 2.042 and that for df 40 t = 2.021

  For N = 34, df = 33
 
  33 is 7/10ths of the way from 40 down to 30. So to find t we just do the following calculation:
    
   t =  .7 * (2.042 - 2.021) + 2.021
     =  .7 * .021 + 2.021
     =  .0147 + 2.021
     =  2.036
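
The same interpolation can be sketched in Python. The function name interpolate_t and its parameter names are assumptions made for this example. It takes the two table rows that bracket the df and works down from the higher df, exactly as in the hand calculation.

```python
def interpolate_t(df, df_low, t_low, df_high, t_high):
    """Linearly interpolate t for a df lying between two table rows."""
    # fraction of the way from df_high down to df_low (7/10 for df = 33)
    fraction = (df_high - df) / float(df_high - df_low)
    return t_high + fraction * (t_low - t_high)

# rows for df 30 and df 40, taken from the table
t = interpolate_t(33, 30, 2.042, 40, 2.021)
print(round(t, 3))  # 2.036
```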

So now we can apply this to our equation:

  95%CI(high) = 100 + 2.036 * ( 10/sqrt(34) ) = 100 + 3.49 = 103.49

  95%CI(low) = 100 - 2.036 * ( 10/sqrt(34) ) = 100 - 3.49 = 96.51

So the 95% CI is 96.5 to 103.5.
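
As a check on the arithmetic, here is a minimal sketch of the same calculation in Python. The function name ci95 is invented for this example, and t is passed in as a plain number rather than being looked up.

```python
import math

def ci95(m, s, n, t):
    """Return (low, high) for the 95% CI of the mean."""
    margin = t * (s / math.sqrt(n))  # t * ( s / sqrt(N) )
    return (m - margin, m + margin)

# values from the example: m = 100, s = 10.0, N = 34, t = 2.036
low, high = ci95(100, 10.0, 34, 2.036)
print("95%% CI: %.1f to %.1f" % (low, high))  # 95% CI: 96.5 to 103.5
```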

Using pop

The following script shows a useful way to prompt the user and then use the information collected.

#!/usr/bin/python
prompts = ["YOUR NAME: ", "YOUR AGE: ", "YOUR HEIGHT: ", "YOUR WEIGHT: "]
responses = []

# ask each question and store the answer
for p in prompts:
    responses.append(raw_input(p))
print responses

# print each prompt followed by a stored answer
for r in range(0,len(responses)):
    print prompts[r] + responses.pop()
print responses

If you run the script you will notice a fundamental problem: the answers come back in reverse order, so each prompt is paired with the wrong response. This is easily remedied by adding the number zero between the parentheses following pop.

#!/usr/bin/python
prompts = ["YOUR NAME: ", "YOUR AGE: ", "YOUR HEIGHT: ", "YOUR WEIGHT: "]
responses = []

# ask each question and store the answer
for p in prompts:
    responses.append(raw_input(p))
print responses

# pop(0) takes answers from the front, so they match the prompts
for r in range(0,len(responses)):
    print prompts[r] + responses.pop(0)
print responses

Remember that the user's responses will be stored as:

  responses[0] --> contains response for name prompt
  responses[1] --> contains response for age prompt
  responses[2] --> contains response for height prompt
  responses[3] --> contains response for weight prompt

The pop function returns the value stored in the last item of an array (a list, in Python's terms) and deletes that item from the array, unless told to do otherwise, as in pop(0), which returns and removes the item at the front of the array.
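
The difference between pop() and pop(0) is easiest to see on a fixed list. This is only a small sketch, and the sample answers are made up so that it runs without prompting anyone.

```python
responses = ["Ann", "30", "65", "130"]  # made-up answers: name, age, height, weight

from_end = list(responses)    # work on a copy
print(from_end.pop())         # 130 (the last item)
print(from_end)               # ['Ann', '30', '65']

from_front = list(responses)  # another copy
print(from_front.pop(0))      # Ann (the first item)
print(from_front)             # ['30', '65', '130']
```

Either way the popped item is gone from the list afterwards, which is why both scripts above end by printing an empty list.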

ASSIGNMENT:

Write a script which takes as input the mean, sample SD, and N. The t value will be interpolated from the df (N-1) and then the 95% CI will be calculated. The results will be displayed like this:

  mean: 44
  sample SD: 2.3
  sample size: 67
  95% CI: [43.4,44.6]

You must store your responses and prompts in arrays and use the pop function to iterate through your prompts.