Perl Basics and Biostatistics

Paired Groups

In this lesson the student will learn how to:
  1. use array slicing to manipulate values in an array
  2. use the range operator in the context of an array
  3. calculate the 95% CI of the difference between means for paired groups
By the end of this lesson the student will be able to:

    Write a script which calculates the 95% CI of the difference
    between the means for paired groups based on individual scores
    stored in a pair of arrays.

Let's redesign our fat camp experiment. This time, instead of just randomly assigning our chubby campers to either the experimental or control group, we are going to include only pairs of chubby twins or siblings in our study. One member of each pair will be given the experimental treatment and the other will receive the placebo. Otherwise, everything will be as it was in the first version of the fat camp experiment. When analyzing our data we will calculate the difference within EACH PAIR and then find the mean of those differences across all pairs before doing further statistical analysis. (Remember that in the first experiment we found the average weight loss for each group and THEN found the difference before proceeding to more statistical analysis.)

In this type of experiment we use the number of pairs to determine df:


   df = Npairs - 1

Our experiment has 42 pairs of fat campers, so our df is 41. The t value for a 95% CI at a df of 41 is about 2.02; we will use 2.021, the value most t tables list for the nearest row, df = 40. The average difference within our pairs comes out to 4.4 pounds (a positive number indicates that the experimental members lost more than their control partners; a negative number would indicate the reverse). The SD of the differences is 1.5 pounds. To calculate the 95% CI we plug our values into this equation:

   95% CI of mean difference = MEAN +/- t * SE of paired differences

   SE of paired differences = SD / sqrt(N), where N is the number of pairs

   SE = 1.5 / sqrt(42)
      = .23

   95% CI of mean difference = 4.4 +/- 2.021 * .23
   high: 4.86
   low: 3.94

Because the entire 95% CI (3.94 to 4.86 pounds) lies above zero, we can conclude that our treatment made a real difference in the weight loss of our subjects.
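
The arithmetic above can be checked with a short script. This is only a minimal sketch, assuming the summary values from the text (mean difference 4.4, SD 1.5, 42 pairs, t of 2.021). The subroutine name paired_ci is made up for this illustration and is not part of Perl.

```perl
#!/usr/bin/perl -w
use strict;

# paired_ci is a hypothetical helper name chosen for this sketch.
# Arguments: mean difference, SD of the differences, number of pairs, t value.
# Returns:   the low and high limits of the confidence interval.
sub paired_ci {
    my ($mean, $sd, $n, $t) = @_;
    my $se     = $sd / sqrt($n);    # SE of paired differences
    my $margin = $t * $se;          # half-width of the interval
    return ($mean - $margin, $mean + $margin);
}

# Summary values taken from the worked example in the text.
my ($low, $high) = paired_ci(4.4, 1.5, 42, 2.021);
printf "low: %.2f\nhigh: %.2f\n", $low, $high;
```

Because the script does not round the SE to .23 before multiplying, its limits may differ from the hand calculation in the last decimal place.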

Arrays, the Range Operator, and Slicing

Consider this sample Perl script:

#!/usr/bin/perl -w
use strict;

#assignment of array values using range (..) operator
my @a = (1, 5, 9, 10..15, 18, 20..30, 41);

#print slice from array for indicated indexes
print "@a[0,1,4]\n";

#print entire array
print "@a\n";

#assignment using slice of array
@a[0..4]=(100,101,102,103,104);
print "@a\n";

#assignment using slice of array
@a[5,10,15]=(1000,2000,3000);
print "@a\n";

exit;

This script demonstrates the use of slicing and the range operator with arrays. Now consider this next script.

#!/usr/bin/perl -w
use strict;

my @data = (0..5);

print "FOREACH using array:\n";
foreach my $c (@data){
  print "$c\n";
}

print "FOREACH using range:\n";
foreach my $c (0..5){
  print "$c\n";
}

exit;

You should play around a bit with arrays, range operators, foreach loops, and slicing. For instance, can you use a range operator in a print statement? Can you substitute a string value for an integer value in an array? Can you use a range operator with the larger number before the dots?

BTW, the length of an array can be stored by assigning the array to a scalar variable:


  my $len = @array;
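
As a hedged illustration of how the array length ties in with the range operator, the sketch below walks two arrays of equal length by index and collects the difference for each pair. The weight-loss values are made up for illustration, and the subroutine name pair_differences is hypothetical. The arrays are passed to the subroutine by reference (the backslash), which may be ahead of where this lesson has gotten to.

```perl
#!/usr/bin/perl -w
use strict;

# pair_differences is a hypothetical helper written for this sketch.
# It takes references to two arrays of equal length and returns a list
# holding (first - second) for each index.
sub pair_differences {
    my ($first, $second) = @_;
    my $len = @$first;                  # array in scalar context gives its length
    die "arrays differ in length\n" unless $len == @$second;
    my @diff;
    foreach my $i (0 .. $len - 1) {     # range over the valid indexes
        push @diff, $first->[$i] - $second->[$i];
    }
    return @diff;
}

# Made-up weight loss (pounds) for four pairs; not real data.
my @experimental = (12, 9, 15, 10);
my @control      = (8, 6, 9, 7);

my @diff = pair_differences(\@experimental, \@control);
print "pairs: ", scalar(@diff), "\n";
print "differences: @diff\n";
```

Perl also offers $#experimental, which gives the last index directly, so 0 .. $#experimental is an equivalent range.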

ASSIGNMENT:

Write a Perl script which contains two arrays of the same length (25 subjects each). One array will contain data for group one (experimental group) and the other will contain data for group two (control group). You will find the difference for each pair and the mean of those differences, and then you will calculate the 95% CI for the mean difference. Report the raw mean and the high and low of the confidence interval.

The values in your arrays will represent paired groups. Thus, $array1[0] and $array2[0] will contain the values for the two members of one pair.