Perl Basics and Biostatistics
Confidence Interval of the Mean
In this lesson the student will learn how to:
- interpolate between two values
- calculate the 95% CI of the mean for a data set
- use the shift function
By the end of this lesson the student will be able to:
Write a Perl script which calculates the 95% CI for a data
set given the mean, sample SD, and N.
How sure can we be that the mean of our sample matches the mean of the
population we are sampling from? Assuming random sampling, the larger our
sample size, the more confident we can be that the statistics we generate
from our sample data are an accurate reflection of that population.
But how confident, exactly? That's where the calculation of a confidence
interval (CI) of the mean comes in. Before we delve into this topic, note
that a CI is conventionally written as a range, as in 64.6 to 66.7 inches
or [64.6, 66.7].
Calculating the Confidence Interval of a Mean
The basic formula to calculate the confidence interval of a mean looks
like this:
95%CI: (m - t * ( s/ sqrt(N) ) ) to (m + t * ( s/ sqrt(N) ) )
In this equation m stands for the sample mean, t is the critical value of
Student's t distribution for a 95% CI, s is our sample SD, and N is the
number of items in our data set. The value for t depends on the degrees of
freedom (df), which is just N - 1, as displayed in the following table:
df | t |
1 | 12.706 |
2 | 4.303 |
3 | 3.182 |
4 | 2.776 |
5 | 2.571 |
6 | 2.447 |
7 | 2.365 |
8 | 2.306 |
9 | 2.262 |
10 | 2.228 |
11 | 2.201 |
12 | 2.179 |
13 | 2.160 |
14 | 2.145 |
15 | 2.131 |
20 | 2.086 |
25 | 2.060 |
30 | 2.042 |
40 | 2.021 |
60 | 2.000 |
120 | 1.980 |
infinity | 1.960 |
Here's an example:
s = 10.0
m = 100
N = 34
The table has no row for our df, so to find t we must interpolate between the two nearest rows:
We know that for df 30 t = 2.042 and that for df 40 t = 2.021
For N = 34, df = N - 1 = 33
33 is 7/10ths of the way from 40 down to 30, because (40 - 33) / (40 - 30) = .7. So to find t we just do the following calculation:
t = .7 * (2.042 - 2.021) + 2.021
= .7 * .021 + 2.021
= .0147 + 2.021
= 2.036
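The interpolation above can be sketched in Perl roughly as follows. This is only one possible approach: the subroutine name interpolate_t is made up for illustration, the table rows are copied from this lesson, and returning 1.960 for any df above 120 is an assumption rather than something the lesson specifies.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Table rows copied from the lesson as [df, t] pairs. The "infinity" row is
# left out; its value (1.960) is used for any df above 120 (an assumption).
my @t_table = (
    [1, 12.706], [2, 4.303],  [3, 3.182],  [4, 2.776],  [5, 2.571],
    [6, 2.447],  [7, 2.365],  [8, 2.306],  [9, 2.262],  [10, 2.228],
    [11, 2.201], [12, 2.179], [13, 2.160], [14, 2.145], [15, 2.131],
    [20, 2.086], [25, 2.060], [30, 2.042], [40, 2.021], [60, 2.000],
    [120, 1.980],
);

# interpolate_t is a hypothetical helper name, not a Perl built-in.
sub interpolate_t {
    my ($df) = @_;
    die "df must be at least 1\n" if $df < 1;
    return 1.960 if $df > 120;
    for my $i (0 .. $#t_table) {
        my ($hi_df, $hi_t) = @{ $t_table[$i] };
        return $hi_t if $df == $hi_df;    # exact row, no interpolation needed
        next if $df > $hi_df;             # keep scanning until we pass $df
        # $df lies between the previous row and this one
        my ($lo_df, $lo_t) = @{ $t_table[$i - 1] };
        my $fraction = ($hi_df - $df) / ($hi_df - $lo_df);
        return $fraction * ($lo_t - $hi_t) + $hi_t;
    }
}

printf "t for df 33 = %.3f\n", interpolate_t(33);
```

For df 33 the fraction is .7, so the subroutine performs the same calculation as the worked example and the script prints t rounded to three decimal places.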
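So now we can apply this to our equation:
95%CI(high) = 100 + 2.036 * ( 10/sqrt(34) ) = 100 + 3.49 = 103.49
95%CI(low) = 100 - 2.036 * ( 10/sqrt(34) ) = 100 - 3.49 = 96.51
The 95% CI of the mean is therefore 96.51 to 103.49, or [96.51, 103.49].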
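As a rough illustration, the same calculation can be written as a short Perl script. The subroutine name ci95 is hypothetical, and the t value is passed in by hand here rather than looked up from the table.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# ci95 is a hypothetical helper name; it returns the list (low, high).
sub ci95 {
    my ($mean, $sd, $n, $t) = @_;
    my $margin = $t * ($sd / sqrt($n));    # t times the standard error
    return ($mean - $margin, $mean + $margin);
}

# Values from the worked example: m = 100, s = 10.0, N = 34, t = 2.036
my ($low, $high) = ci95(100, 10.0, 34, 2.036);
printf "95%% CI: [%.2f, %.2f]\n", $low, $high;
```

Note the doubled %% in the printf format string, which is how printf prints a literal percent sign.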
Using Shift
The following script shows you a useful method to prompt the user and
to then use the information collected from the user.
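#!/usr/bin/perl
use strict;
use warnings;

# Prompts shown to the user, in order
my @prompts = ("YOUR NAME: ", "YOUR AGE: ", "YOUR HEIGHT: ", "YOUR WEIGHT: ");
my @responses = ();

# Work on a copy, because shift removes items from the array
my @hold = @prompts;
while (@hold) {
    my $item = shift @hold;
    print "PLEASE ENTER $item";
    my $input = <STDIN>;    # read one line from the keyboard
    chomp $input;           # strip the trailing newline
    push(@responses, $input);
}

# Print each prompt label next to its response
@hold = @prompts;
print "YOUR RESPONSES:\n";
while (@hold) {
    my $label = shift @hold;
    my $r = shift @responses;
    print "$label $r\n";
}
exit;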
Remember that after the first loop the user's responses will be stored, in prompt order, as:
$responses[0] --> contains response for name prompt
$responses[1] --> contains response for age prompt
$responses[2] --> contains response for height prompt
$responses[3] --> contains response for weight prompt
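The shift function returns the value stored in the first item of an array
and deletes that item from the array. This is why the script loops over
@hold, a copy of @prompts, so that @prompts itself stays intact. The second
loop shifts @responses directly, so that array is empty once the script
has printed its output.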
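A minimal sketch of what shift does to an array, using made-up values chosen only for illustration:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @queue = ("mean", "sample SD", "N");    # illustrative values only
my $first = shift @queue;                  # removes and returns "mean"

print "shifted: $first\n";
print "remaining: @queue\n";
print "items left: ", scalar(@queue), "\n";
```

After the shift, @queue holds two items and "sample SD" has moved to index 0. Calling shift on an empty array returns undef, which is why the loops in the script test while(@hold) before shifting.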
ASSIGNMENT:
Write a script which takes as input the mean, sample SD, and N. The t value
will be interpolated from the df (N-1) and then the 95% CI will be
calculated. The results will be displayed like this:
mean: 44
sample SD: 2.3
sample size: 67
95% CI: [43.4,44.6]
You must store your responses and prompts in arrays and use the shift
function to iterate through your prompts.