Perl Basics and Biostatistics

Confidence Interval of the Mean

In this lesson the student will learn how to:
  1. interpolate between two values
  2. calculate the 95% CI of the mean for a data set
  3. use the shift function
By the end of this lesson the student will be able to:

  Write a Perl script which calculates the 95% CI for a data
  set given the mean, sample SD, and N.

How sure can we be that the mean of our sample matches the mean of the population we are sampling from? Assuming random sampling, the larger our sample, the more confident we can be that the statistics we calculate from it accurately reflect the population. But how confident can we be? That is where the confidence interval (CI) of the mean comes in: a range constructed so that, across repeated samples, 95% of such ranges would contain the population mean. Before we go further, note that the correct way to express a CI is as a range, for example 64.6 to 66.7 inches or [64.6, 66.7].

Calculating the Confidence Interval of a Mean

The basic formula to calculate the confidence interval of a mean looks like this:


  95%CI: (m - t * ( s/ sqrt(N) ) ) to (m + t * ( s/ sqrt(N) ) )

In this equation m is the sample mean, s is the sample SD, N is the number of items in the data set, and t is the critical value of Student's t distribution for a 95% CI. The value of t depends on the degrees of freedom (df), which is simply N - 1, as shown in the following table:

  df        t
  1         12.706
  2         4.303
  3         3.182
  4         2.776
  5         2.571
  6         2.447
  7         2.365
  8         2.306
  9         2.262
  10        2.228
  11        2.201
  12        2.179
  13        2.160
  14        2.145
  15        2.131
  20        2.086
  25        2.060
  30        2.042
  40        2.021
  60        2.000
  120       1.980
  infinity  1.960
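One way to keep this table inside a Perl script is a hash keyed by df. The sketch below is only an illustration: the hash name %t_for_df and the sub lookup_t are hypothetical, and the "infinity" row is left out. It also shows why interpolation is needed: an exact lookup only works for df values that appear as rows.

```perl
use strict;
use warnings;

# 95% critical t values keyed by df, copied from the table above.
# %t_for_df is a hypothetical name; the 'infinity' row is omitted.
my %t_for_df = (
    1  => 12.706, 2  => 4.303, 3  => 3.182, 4  => 2.776, 5  => 2.571,
    6  => 2.447,  7  => 2.365, 8  => 2.306, 9  => 2.262, 10 => 2.228,
    11 => 2.201,  12 => 2.179, 13 => 2.160, 14 => 2.145, 15 => 2.131,
    20 => 2.086,  25 => 2.060, 30 => 2.042, 40 => 2.021, 60 => 2.000,
    120 => 1.980,
);

# Return the tabled t for an exact df, or undef when df falls between rows.
sub lookup_t {
    my ($df) = @_;
    return exists $t_for_df{$df} ? $t_for_df{$df} : undef;
}

for my $df (10, 33) {
    my $t = lookup_t($df);
    print "df $df: ", (defined $t ? $t : 'not in table, interpolate'), "\n";
}
```

Running it prints `df 10: 2.228` and `df 33: not in table, interpolate`.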

Here's an example:


  s = 10.0
  m = 100
  N = 34
  
To find t we must interpolate from the table:

  We know that for df 30 t = 2.042 and that for df 40 t = 2.021

  For N = 34, df = 33
 
  df = 33 lies 7/10 of the way from 40 down to 30 (40 - 33 = 7 out of a gap of 10), so t lies 7/10 of the way from 2.021 up to 2.042:

   t =  .7 * (2.042 - 2.021) + 2.021
     =  .7 * .021 + 2.021
     =  .0147 + 2.021
     =  2.0357, which rounds to 2.036
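The interpolation above can be sketched in Perl as a loop over the table rows that finds the two rows surrounding df and moves the same fraction of the way between their t values. This is a minimal sketch: the sub name interpolate_t and the array @rows are hypothetical, and the table stops at df 120 (the "infinity" row is left out, so larger df return undef).

```perl
use strict;
use warnings;

# Table rows as [df, t], sorted by df (values from the table above).
my @rows = (
    [1, 12.706], [2, 4.303], [3, 3.182], [4, 2.776], [5, 2.571],
    [6, 2.447],  [7, 2.365], [8, 2.306], [9, 2.262], [10, 2.228],
    [11, 2.201], [12, 2.179], [13, 2.160], [14, 2.145], [15, 2.131],
    [20, 2.086], [25, 2.060], [30, 2.042], [40, 2.021], [60, 2.000],
    [120, 1.980],
);

# Linearly interpolate t for df between the two surrounding rows.
# Returns undef when df is outside 1..120.
sub interpolate_t {
    my ($df) = @_;
    for my $i (0 .. $#rows - 1) {
        my ($df_lo, $t_lo) = @{ $rows[$i] };
        my ($df_hi, $t_hi) = @{ $rows[$i + 1] };
        next unless $df >= $df_lo && $df <= $df_hi;
        return $t_lo if $df == $df_lo;    # exact table row
        # Share of the way from df_hi down to df_lo (0.7 for df 33).
        my $frac = ($df_hi - $df) / ($df_hi - $df_lo);
        return $t_hi + $frac * ($t_lo - $t_hi);
    }
    return undef;
}

printf "df 33: t = %.3f\n", interpolate_t(33);   # prints "df 33: t = 2.036"
```

The formula inside the loop is the same one used by hand above: start from the t of the larger df and add the fraction times the difference between the two t values.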

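So now we can apply this to our equation. Since 10/sqrt(34) ≈ 1.715, the margin is 2.036 * 1.715 ≈ 3.49:

  95%CI(high) = 100 + 2.036 * ( 10/sqrt(34) ) ≈ 103.49

  95%CI(low) = 100 - 2.036 * ( 10/sqrt(34) ) ≈ 96.51

So the 95% CI is [96.51, 103.49].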
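The same calculation as a small Perl subroutine, a sketch that takes t as an argument (here the interpolated 2.036) rather than looking it up. The name ci_95 is hypothetical.

```perl
use strict;
use warnings;

# 95% CI of the mean: (m - t*(s/sqrt(N))) to (m + t*(s/sqrt(N))).
# ci_95 is a hypothetical name; t is passed in already interpolated.
sub ci_95 {
    my ($m, $s, $n, $t) = @_;
    my $margin = $t * ($s / sqrt($n));   # t times the standard error
    return ($m - $margin, $m + $margin);
}

my ($low, $high) = ci_95(100, 10.0, 34, 2.036);
printf "95%% CI: [%.2f, %.2f]\n", $low, $high;   # prints "95% CI: [96.51, 103.49]"
```

Note the `%%` in the format string: printf treats a single `%` as the start of a conversion, so a literal percent sign has to be doubled.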

Using Shift

The following script shows a useful way to prompt the user for several values and then work with the information the user entered.

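  #!/usr/bin/perl
  use strict;
  use warnings;

  my @prompts = ("YOUR NAME: ", "YOUR AGE: ", "YOUR HEIGHT: ", "YOUR WEIGHT: ");
  my @responses = ();

  # Copy the prompts so shift can consume @hold while @prompts stays intact.
  my @hold = @prompts;
  while (@hold) {
      my $item = shift @hold;        # take the next prompt off the front
      print "PLEASE ENTER $item";
      my $input = <STDIN>;
      chomp $input;                  # strip the trailing newline
      push(@responses, $input);
  }

  @hold = @prompts;
  print "YOUR RESPONSES:\n";
  while (@hold) {
      my $label = shift @hold;
      my $r = shift @responses;      # note: this empties @responses
      print "$label $r\n";
  }
  exit;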
After the first loop, the user's responses are stored as:

  $responses[0] --> response to the name prompt
  $responses[1] --> response to the age prompt
  $responses[2] --> response to the height prompt
  $responses[3] --> response to the weight prompt

The shift function removes the first element of an array and returns it, so the array gets one element shorter on every call. Because while(@hold) is true only while @hold still has elements, each loop runs exactly once per prompt.
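A short, non-interactive sketch of how shift behaves, with no <STDIN> involved. The variable names mirror the script above; the printed text is illustrative only.

```perl
use strict;
use warnings;

my @prompts = ("YOUR NAME: ", "YOUR AGE: ", "YOUR HEIGHT: ");

# Work on a copy, as the script above does with @hold, so that
# shifting does not empty @prompts itself.
my @hold = @prompts;

my $first = shift @hold;    # removes and returns the first element
print "first item: $first\n";
print "left in \@hold: ", scalar(@hold), "\n";   # prints "left in @hold: 2"

# while (@hold) stays true until shift has removed every element.
while (@hold) {
    my $item = shift @hold;
    print "next item: $item\n";
}

my $extra = shift @hold;    # shift on an empty array returns undef
print "shift on an empty array gave undef\n" unless defined $extra;
print "\@prompts still has ", scalar(@prompts), " items\n";   # 3 items
```

Copying the array first is what lets the original script loop over the same prompts twice.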

ASSIGNMENT:

Write a script which takes as input the mean, sample SD, and N. The t value will be interpolated from the df (N-1) and then the 95% CI will be calculated. The results will be displayed like this:

  mean: 44
  sample SD: 2.3
  sample size: 67
  95% CI: [43.4,44.6]

You must store your prompts and responses in arrays and use the shift function to iterate through your prompts.
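As a check on the expected output, the sample values (mean 44, sample SD 2.3, N 67) can be worked through the same steps as the example above. Here df = 66 falls between the 60 and 120 rows of the table, 0.9 of the way from 120 down to 60:

```latex
\begin{aligned}
df &= 67 - 1 = 66 \\
t &= 0.9 \times (2.000 - 1.980) + 1.980 = 1.998 \\
\frac{s}{\sqrt{N}} &= \frac{2.3}{\sqrt{67}} \approx 0.281 \\
t \times \frac{s}{\sqrt{N}} &\approx 1.998 \times 0.281 \approx 0.561 \\
95\%\ \mathrm{CI} &\approx [44 - 0.561,\ 44 + 0.561] \approx [43.4,\ 44.6]
\end{aligned}
```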