Write a script to calculate the 95% CI for the difference between means based on basic input data.
Let's suppose that we have the results of a weight loss study reported to us using only the following data:
                Mean Weight Loss    Standard Deviation    Number of Subjects
Group One:      8.5 lbs             1.8 lbs               64
Group Two:      6.1 lbs             2.4 lbs               64

In this study both groups were provided with organized exercise activities, nutritional guidance, and a supervised diet. (These are typical fat camp activities, in which case we should also add guided imagery, self-esteem building, and goal-setting activities.) The key difference was that group one received an experimental weight loss drug and group two received a placebo.
From the numbers we are provided with we can calculate confidence intervals for each group. Recall the following equations:
SEM = SD / sqrt(N)
95% CI = MEAN +/- t * SEM

The 95% CI t value for a study with groups containing 64 subjects is 2.000. So, plugging in the numbers we get:
SEM G1: 1.8/8 = .23
SEM G2: 2.4/8 = .30

95% CI G1: 8.5 +/- 2.0 * .23    high: 8.96    low: 8.04
95% CI G2: 6.1 +/- 2.0 * .30    high: 6.70    low: 5.50

You can see from these numbers that the two confidence intervals do not overlap. From a quick inspection of the data we can see that the difference between our means is 8.5 - 6.1 = 2.4. Further, the difference between the low of the CI for the upper mean and the high of the CI for the lower mean is 8.04 - 6.70 = 1.34. Which numbers do we use when reporting our results? What is the true difference between the means, and how confident can we be of these numbers? How should we handle these numbers?
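The per-group arithmetic above can be checked with a short script. This is a minimal sketch in Python, used here only for illustration (the assignment itself asks for Perl). The function names `sem` and `ci95` are hypothetical, and the t value of 2.000 is the one quoted in the text.

```python
import math


def sem(sd, n):
    """Standard error of the mean: SD / sqrt(N)."""
    return sd / math.sqrt(n)


def ci95(mean, sd, n, t):
    """Return (low, high) for MEAN +/- t * SEM."""
    margin = t * sem(sd, n)
    return mean - margin, mean + margin


T_VALUE = 2.000  # t value the text quotes for groups of 64 subjects

g1_low, g1_high = ci95(8.5, 1.8, 64, T_VALUE)
g2_low, g2_high = ci95(6.1, 2.4, 64, T_VALUE)

print(f"G1: low {g1_low:.2f}  high {g1_high:.2f}")
print(f"G2: low {g2_low:.2f}  high {g2_high:.2f}")
```

Because the script keeps the unrounded SEM for group one (.225 rather than .23), its G1 interval is slightly narrower than the hand calculation, by about .01 at each end. The G2 interval matches exactly.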
Here are the equations we use to evaluate the difference between means of unpaired groups:
First, we calculate the SE of the difference:

SE of the difference = sqrt( SEM1^2 + SEM2^2 )
                     = sqrt( .23^2 + .30^2 )
                     = sqrt( .1429 )
                     = .378

Second, we subtract our raw means:

dm = 8.5 - 6.1 = 2.4

Third, we look up our value for t using:

df = N1 + N2 - 2 = 64 + 64 - 2 = 126

The t value for 95% CI given df = 126 is 1.96 (assignment 11 has a table of t values).

Finally, we apply the following formula:

CI of mean difference = dm +/- t * SE of diff
                      = 2.4 +/- 1.96 * .378
                      = 2.4 +/- .741
high = 3.141
low = 1.659

Things get a whole lot more complicated if the N for our groups is not equal. (This is a situation we are not going to deal with in this unit.)
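The four steps can be sketched as one function. This is a Python illustration under stated assumptions, not the Perl script the assignment asks for. The name `diff_ci` is hypothetical, both groups are assumed to have equal N, and the t value is applied to the SE of the difference in the same way as in the single-group formula 95% CI = MEAN +/- t * SEM.

```python
import math


def diff_ci(mean1, sd1, n1, mean2, sd2, n2, t):
    """CI for the difference between two unpaired group means.

    Returns (dm, se_diff, low, high).
    """
    sem1 = sd1 / math.sqrt(n1)
    sem2 = sd2 / math.sqrt(n2)
    # Step 1: SE of the difference = sqrt(SEM1^2 + SEM2^2)
    se_diff = math.sqrt(sem1 ** 2 + sem2 ** 2)
    # Step 2: difference between the raw means
    dm = mean1 - mean2
    # Steps 3 and 4: the caller supplies t for df = n1 + n2 - 2
    margin = t * se_diff
    return dm, se_diff, dm - margin, dm + margin


# Weight loss study from the text, with t = 1.96 for df = 126
dm, se_diff, low, high = diff_ci(8.5, 1.8, 64, 6.1, 2.4, 64, t=1.96)
print(f"dm = {dm:.3f}  SE of diff = {se_diff:.3f}")
print(f"low = {low:.3f}  high = {high:.3f}")
```

With unrounded SEMs the SE of the difference works out to .375 rather than the .378 obtained by hand, so the script's limits differ from a hand calculation in the third decimal place.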
Arrays
Pay particular attention to the way that arrays are created and the way that values are retrieved from them. Further, consider how this method could be adapted for the look up of t values. (The assignment called "Confidence Interval of the Mean" has a table of t values.)
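As one illustration of array-based lookup, here is a minimal Python sketch (the assignment itself asks for Perl). The numbers are standard two-tailed 95% t values, not necessarily the rows of the course's own table. The names `T_SMALL`, `T_LARGE`, `T_INFINITY`, and `t_lookup` are hypothetical. The fallback rules are assumptions: with no interpolation, a df between two rows uses the nearest lower row, and any df above 120 uses the large-sample value 1.960.

```python
# Two-tailed 95% t values for df = 1..30. Index 0 is a placeholder so
# that T_SMALL[df] gives the value for that df directly.
T_SMALL = [
    None,
    12.706, 4.303, 3.182, 2.776, 2.571, 2.447, 2.365, 2.306, 2.262, 2.228,
    2.201, 2.179, 2.160, 2.145, 2.131, 2.120, 2.110, 2.101, 2.093, 2.086,
    2.080, 2.074, 2.069, 2.064, 2.060, 2.056, 2.052, 2.048, 2.045, 2.042,
]

# Coarser rows for larger df, in ascending order: (df, t value).
T_LARGE = [(40, 2.021), (60, 2.000), (120, 1.980)]

# Large-sample value, used here for any df above 120 (an assumption).
T_INFINITY = 1.960


def t_lookup(df):
    """Return a 95% t value for df without interpolating."""
    if df < 1:
        raise ValueError("df must be at least 1")
    if df <= 30:
        return T_SMALL[df]
    if df > 120:
        return T_INFINITY
    # Start from the df = 30 row, then take the last coarse row
    # whose df does not exceed the requested df.
    t = T_SMALL[30]
    for row_df, row_t in T_LARGE:
        if df >= row_df:
            t = row_t
    return t


for df in (10, 63, 126):
    print(df, t_lookup(df))
```

Under these assumptions the lookup returns 2.0 for df = 63 and 1.96 for df = 126, which agrees with the t values quoted in the worked example.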
ASSIGNMENT:
Write a Perl script to calculate the 95% CI for the difference between the means of the two groups in a study. Here is the data from three studies for you to run your script on:
                mean    SD      N
STUDY ONE:
  G1            66      10.2    22
  G2            70      11.4    22
STUDY TWO:
  G1            10.4    2.3     30
  G2            8.3     1.9     30
STUDY THREE:
  G1            125     9.5     45
  G2            105     12.2    45

You can make more of your own data to make sure everything is working correctly. Use the technique demonstrated above to look up the correct t value for your N.

NOTE: You must use an array to store your t values.
ALSO: Don't worry about interpolating t values.