Perl Basics and Biostatistics
Paired Groups
In this lesson the student will learn how to:
- use array slicing to manipulate values in an array
- use the range operator in the context of an array
- calculate the 95% CI of the difference between means for paired groups
By the end of this lesson the student will be able to:
Write a script which calculates the 95% CI of the difference
between the means for paired groups based on individual scores
stored in a pair of arrays.
Let's redesign our fat camp experiment. This time instead of just randomly
assigning our chubby campers to either the experimental or control group, we
are going to only include pairs of chubby twins or siblings in our study.
One member from each pair will be given the experimental treatment and the
other will receive the placebo. Otherwise, everything will be as it was in
the first version of the fat camp experiment. When analyzing our data we
will calculate the difference between EACH PAIR and then find the mean
difference for all pairs before doing further statistical analysis on the
data. (Remember in the first experiment we found the average weight loss for
each group and THEN found the difference before proceeding on to more
statistical analysis.)
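The order of operations described above (subtract within each pair first, then average) can be sketched in Perl as follows. This is only an illustration: the array names and the four pairs of weight-loss values are made up, and `$#experimental` (the last index of the array) is used here as a convenience.

```perl
#!/usr/bin/perl -w
use strict;

# Hypothetical weight loss in pounds for four pairs (made-up values).
# Element 0 of each array belongs to the same pair, and so on.
my @experimental = (9, 7, 10, 8);
my @control      = (5, 4, 5, 4);

# Step 1: subtract within each pair, index by index.
my @diff;
foreach my $i (0..$#experimental) {
    $diff[$i] = $experimental[$i] - $control[$i];
}

# Step 2: average the per-pair differences.
my $sum = 0;
foreach my $d (@diff) {
    $sum += $d;
}
my $mean_diff = $sum / @diff;   # @diff in numeric context is its length

print "differences: @diff\n";
print "mean difference: $mean_diff\n";
```

With these made-up values the differences are 4 3 5 4 and the mean difference is 4.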
In this type of experiment we use the number of pairs to determine df:
df = Npairs - 1
Our experiment has 42 pairs of fat campers, so our df is 41. The t value for
a 95% CI at this df is about 2.02; we will use 2.021, the entry for df = 40,
which is the nearest row in many printed t tables. The mean of the paired
differences comes out to 4.4 pounds (a positive number indicates that our
experimental group lost more than our control group; a negative number would
indicate the reverse situation). The SD of the paired differences is 1.5
pounds. To calculate the 95% CI we plug our values into these equations:
95% CI of mean difference = MEAN +/- t * SE of paired differences
SE of paired differences = SD / sqrt(Npairs)
SE = 1.5 / sqrt(42)
= .23
95% CI of mean difference = 4.4 +/- 2.021 * .23
high: 4.86
low: 3.94
Because the whole interval (3.94 to 4.86 pounds) lies above zero, we can be
95% confident that our treatment produced a real difference in weight loss
between the paired subjects.
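The arithmetic above can be checked with a short script. This is a minimal sketch that uses the numbers from the example; the variable names are illustrative, and the t value is typed in by hand because it comes from a t table.

```perl
#!/usr/bin/perl -w
use strict;

# Values taken from the worked example above.
my $n    = 42;      # number of pairs
my $mean = 4.4;     # mean of the paired differences, in pounds
my $sd   = 1.5;     # SD of the paired differences, in pounds
my $t    = 2.021;   # t value quoted in the lesson, from a t table

# Standard error of the paired differences, then the margin around the mean.
my $se     = $sd / sqrt($n);
my $margin = $t * $se;

printf "SE   = %.2f\n", $se;
printf "high = %.2f\n", $mean + $margin;
printf "low  = %.2f\n", $mean - $margin;
```

Carried at full precision the limits print as 4.87 and 3.93. The hand calculation gets 4.86 and 3.94 because it rounds the SE to .23 before multiplying.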
Arrays, the Range Operator, and Slicing
Consider this sample Perl script:
#!/usr/bin/perl -w
use strict;
#assignment of array values using range (..) operator
my @a = (1, 5, 9, 10..15, 18, 20..30, 41);
#print slice from array for indicated indexes
print "@a[0,1,4]\n";
#print entire array
print "@a\n";
#assignment using slice of array
@a[0..4]=(100,101,102,103,104);
print "@a\n";
#assignment using slice of array
@a[5,10,15]=(1000,2000,3000);
print "@a\n";
exit;
This script demonstrates the use of slicing and the range operator with
arrays. Now consider this next script.
#!/usr/bin/perl -w
use strict;
my @data = (0..5);
print "FOREACH using array:\n";
foreach my $c (@data) {
    print "$c\n";
}
print "FOREACH using range:\n";
foreach my $c (0..5) {
    print "$c\n";
}
exit;
You should play around a bit with arrays, range operators, foreach loops,
and slicing. For instance, can you use a range operator in a print
statement? Can you substitute a string value for an integer value in an
array? Can you use a range operator with the larger number before the dots?
By the way, assigning an array to a scalar variable stores the length of the
array (its number of elements), like this:
my $len = @array;
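The length assignment works because an array used where Perl expects a single value (scalar context) yields its element count. A short sketch, with an illustrative array, shows the length next to `$#array`, the index of the last element, which is handy as the upper end of a range:

```perl
#!/usr/bin/perl -w
use strict;

my @array = (10..15);

my $len  = @array;     # scalar context: number of elements
my $last = $#array;    # index of the last element, always $len - 1

# A range from 0 to the last index covers every position in the array.
my @indexes = (0..$last);

print "length: $len\n";
print "last index: $last\n";
print "indexes: @indexes\n";
```

For this six-element array the script reports a length of 6, a last index of 5, and the indexes 0 1 2 3 4 5.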
ASSIGNMENT:
Write a Perl script which contains two arrays of the same length (25
subjects each). One array will contain data for group one (experimental
group) and the other will contain data for group two (control group). You
will find the difference for each pair, take the mean of those differences,
and then calculate the 95% CI of that mean difference. Report the raw mean
difference and the high and low of the confidence interval.
The values in your arrays will represent paired groups. Thus, $array1[0] and
$array2[0] will contain the values for the two members of the first pair,
$array1[1] and $array2[1] the values for the second pair, and so on.
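The SE formula needs the SD of the paired differences, which your script has to compute from the data. The sketch below shows one way to do that step. It assumes the usual sample SD with N - 1 in the denominator (this lesson does not spell out the SD formula) and uses four made-up differences rather than the 25 pairs of the assignment.

```perl
#!/usr/bin/perl -w
use strict;

# Hypothetical per-pair differences in pounds (made-up values).
my @diff = (4, 3, 5, 4);
my $n    = @diff;

# Mean of the differences.
my $sum = 0;
foreach my $d (@diff) {
    $sum += $d;
}
my $mean = $sum / $n;

# Sum of squared deviations from the mean.
my $ss = 0;
foreach my $d (@diff) {
    $ss += ($d - $mean) ** 2;
}

# Sample SD with N - 1 in the denominator (an assumption, see above),
# then the SE of the paired differences.
my $sd = sqrt($ss / ($n - 1));
my $se = $sd / sqrt($n);

printf "mean = %.2f, SD = %.2f, SE = %.2f\n", $mean, $sd, $se;
```

For these four values the script prints a mean of 4.00, an SD of 0.82, and an SE of 0.41.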