Random Sampling

In this lesson the student will learn how to:

explain the statistical reliability of random sampling
redirect script output to a file
push scalar values onto a list
generate random numbers

By the end of this lesson the student will be able to:


use a script to perform a simple statistical experiment.

To help us explore how reliable random sampling is, let's create two identical groups of subjects:

G1 = ( 88, 90, 79, 82, 99, 105, 91, 86, 74, 101, 80, 96 ).
G2 = ( 88, 90, 79, 82, 99, 105, 91, 86, 74, 101, 80, 96 ).

Now suppose we draw a random sample of five subjects from each group:

G1 Sample = ( 90, 82, 105, 74, 80 ). Sample Mean: 86
G2 Sample = ( 88, 82, 99, 101, 86 ). Sample Mean: 91

NOTE: sample means are rounded to the nearest whole number.

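The actual average for each group is 89 (89.25 before rounding). Even though the two groups are identical, their sample means differ by five points, and neither sample mean equals the group average. So random sampling doesn't guarantee that a sample will be very representative of the group it was drawn from. In fact, we could wind up with random samples like this: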

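G1 Sample = ( 99, 105, 91, 101, 96 ). Sample Mean: 98
G2 Sample = ( 79, 82, 74, 80, 86 ). Sample Mean: 80

Here the two sample means are 18 points apart, even though both samples come from identical groups.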
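To double-check the arithmetic, here is a small Perl sketch that recomputes each mean from the lists above. The subroutine names mean and round_whole are our own, and rounding by adding 0.5 and truncating with int works only for positive numbers such as these.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Arithmetic mean of a list of numbers.
sub mean {
    my @values = @_;
    my $tally  = 0;
    $tally += $_ foreach @values;
    return $tally / scalar(@values);
}

# Round a positive number to the nearest whole number.
sub round_whole {
    my ($x) = @_;
    return int($x + 0.5);
}

# Each entry: a label followed by the values listed in the text above.
my @sets = (
    [ 'Each group',   88, 90, 79, 82, 99, 105, 91, 86, 74, 101, 80, 96 ],
    [ 'G1 sample #1', 90, 82, 105, 74, 80 ],
    [ 'G2 sample #1', 88, 82, 99, 101, 86 ],
    [ 'G1 sample #2', 99, 105, 91, 101, 96 ],
    [ 'G2 sample #2', 79, 82, 74, 80, 86 ],
);

my %rounded;    # label => mean rounded to the nearest whole number
foreach my $set (@sets) {
    my ($label, @values) = @$set;
    my $m = mean(@values);
    $rounded{$label} = round_whole($m);
    printf "%-13s mean %.2f, rounded %d\n", $label, $m, $rounded{$label};
}
```

It prints a mean of 89.25 for each group and 86.20, 91.20, 98.40 and 80.20 for the four samples, which round to the values shown above.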

Random Sampling Script

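Consider the following Perl script, which illustrates the reliability of random sampling. It computes the actual average of a population of 40 values (71.5), then draws two random samples of five values and prints each sample's average for comparison: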

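#!/usr/bin/perl -w
use strict;

# The population: 40 measurements.
my @pop = (62, 65, 66, 72, 80, 75, 67, 73, 79, 69,
           70, 64, 64, 65, 67, 81, 61, 82, 68, 59,
           83, 77, 73, 74, 66, 67, 77, 78, 82, 80,
           73, 68, 67, 73, 84, 58, 75, 76, 72, 68);

# Compute the actual (population) average.
my $actual_average = 0;
my $num = @pop;    # array in scalar context: number of elements
my $tally = 0;
foreach my $n (@pop) {
    $tally += $n;
}
$actual_average = $tally / $num;
print "ACTUAL AVERAGE: $actual_average\n";

# Seed the random number generator with the current time.
srand(time);

# First sample: five random picks. The same index can come up twice,
# so a subject may appear twice in one sample (sampling with replacement).
my $s1 = 0;
my @g1 = ();
my $i = 0;    # index into @pop
$tally = 0;
foreach my $r (0..4) {
    $i = int rand($num);    # random index from 0 to $num - 1
    $tally += $pop[$i];
    push(@g1, $pop[$i]);
}
$s1 = $tally / 5;
print "SAMPLE: @g1, SAMPLE AVERAGE: $s1\n";

# Second sample: same procedure with a fresh tally.
my $s2 = 0;
my @g2 = ();
$tally = 0;
foreach my $r (0..4) {
    $i = int rand($num);
    $tally += $pop[$i];
    push(@g2, $pop[$i]);
}
$s2 = $tally / 5;
print "SAMPLE: @g2, SAMPLE AVERAGE: $s2\n";
exit;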
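The two sampling loops in the script differ only in the array they fill. As a sketch of one possible refactoring (the subroutine names are our own, not part of the lesson), the loop can live in a subroutine that is called once per sample. The second subroutine is an alternative the lesson script does not use: it samples without replacement by calling shuffle from the core List::Util module, so no subject can be chosen twice.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use List::Util qw(shuffle sum);

my @pop = (62, 65, 66, 72, 80, 75, 67, 73, 79, 69,
           70, 64, 64, 65, 67, 81, 61, 82, 68, 59,
           83, 77, 73, 74, 66, 67, 77, 78, 82, 80,
           73, 68, 67, 73, 84, 58, 75, 76, 72, 68);

# Same technique as the lesson script: $size independent random picks,
# so one subject can be picked more than once (sampling with replacement).
sub sample_with_replacement {
    my ($size, @values) = @_;
    my @sample = ();
    foreach (1 .. $size) {
        push(@sample, $values[int rand(scalar @values)]);
    }
    return @sample;
}

# Alternative: shuffle a copy of the values and keep the first $size,
# so each subject appears at most once (sampling without replacement).
sub sample_without_replacement {
    my ($size, @values) = @_;
    die "Sample size $size exceeds population size\n" if $size > @values;
    return (shuffle(@values))[0 .. $size - 1];
}

my $actual_average = sum(@pop) / @pop;
print "ACTUAL AVERAGE: $actual_average\n";

my @g1 = sample_with_replacement(5, @pop);
print "SAMPLE: @g1, SAMPLE AVERAGE: ", sum(@g1) / @g1, "\n";

my @g2 = sample_without_replacement(5, @pop);
print "SAMPLE: @g2, SAMPLE AVERAGE: ", sum(@g2) / @g2, "\n";
```

The first line always reads ACTUAL AVERAGE: 71.5; the sample lines change from run to run. Shuffling all 40 values to keep five would be wasteful for a very large population, but for a list this size it is short and easy to check.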

Notice the use of push, which appends one or more values to the end of an array. In the script, push(@g1,$pop[$i]) builds each sample one value at a time. This pattern is useful in many situations; one particularly useful context is building the rows of a table in a CGI script (something we'll discuss in unit four).
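Here is a minimal sketch of push on its own. The values come from the first G1 sample above, and the HTML row strings are only a rough illustration, not the CGI approach covered in unit four.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @sample = ();                  # start with an empty array
push(@sample, 90);                # append one value:       (90)
push(@sample, 82, 105);           # append several at once: (90, 82, 105)
my $count = push(@sample, 74);    # push returns the new number of elements
print "SAMPLE: @sample\n";        # prints: SAMPLE: 90 82 105 74
print "COUNT: $count\n";          # prints: COUNT: 4

# The same pattern collects rows for an HTML table, one row per value.
my @rows = ();
foreach my $value (@sample) {
    push(@rows, "<tr><td>$value</td></tr>");
}
print join("\n", "<table>", @rows, "</table>"), "\n";
```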

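Also notice the use of srand(time) and the int rand($num) construct. The srand function seeds the random number generator with the number it is given; here the seed is the current time in seconds, as returned by the time function. Because time changes only once per second, two runs started within the same second get the same seed and therefore draw the same samples. (If srand is never called, rand calls it automatically the first time it is used, so in Perl 5.004 and later the explicit call is optional.) The random index itself comes from int rand($num): rand($num) returns a fractional number that is at least 0 and less than $num, the size of the @pop array, and int drops the fractional part. The result is a whole number from 0 to $num - 1, which is exactly the range of valid indices for @pop.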
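The following sketch shows two properties of these functions: every value of int rand($num) is a valid index into the array, and seeding with the same number reproduces the same sequence. The subroutine draw_indices, the shortened population, and the seeds 42 and 43 are our own choices for illustration. The exact numbers printed depend on your platform's random number generator, so no output is shown.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @pop = (62, 65, 66, 72, 80, 75, 67, 73, 79, 69);
my $num = @pop;    # array in scalar context: number of elements (10)

# Seed the generator with $seed, then draw $k random indices into @pop.
# Re-seeding like this is only for the demonstration; a normal script
# seeds once (or lets rand seed itself).
sub draw_indices {
    my ($seed, $k) = @_;
    srand($seed);
    my @indices = ();
    foreach (1 .. $k) {
        push(@indices, int rand($num));    # whole number from 0 to $num - 1
    }
    return @indices;
}

my @first  = draw_indices(42, 5);
my @second = draw_indices(42, 5);    # same seed: same sequence as @first
my @other  = draw_indices(43, 5);    # different seed: usually a different sequence

print "seed 42: @first\n";
print "seed 42: @second\n";
print "seed 43: @other\n";
print "values drawn with seed 42: @pop[@first]\n";    # array slice
```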

ASSIGNMENT:

Present the results of this activity on a web page.

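Run the Perl script shown above ten times, redirecting the output to a file so that it can easily be reformatted into an HTML page. To redirect the output, run:

  ./perlScript.pl >> output_file.txt

The >> operator appends each run's output to output_file.txt (creating the file if it does not exist yet), so after ten runs the file holds all ten sets of results; a single > would overwrite the earlier runs. Running the script as ./perlScript.pl requires the file to be executable (chmod +x perlScript.pl); perl perlScript.pl >> output_file.txt works either way.

Once you have ten sets of output, calculate an average for the GROUP ONES (the first SAMPLE line of each run) and the GROUP TWOS (the second SAMPLE line of each run). Present this information on a well-organized and easily readable web page.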
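If you want to check your hand calculations, the following sketch reads output_file.txt and averages the first and second SAMPLE AVERAGE values of each run. It assumes the file contains only the lines the script prints (one ACTUAL AVERAGE line followed by two SAMPLE lines per run); the subroutines group_averages and average are hypothetical helpers, not part of the lesson.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Given the lines of the output file, return
# (average of the group-one sample averages, average of the group-two sample averages).
sub group_averages {
    my @lines = @_;
    my (@ones, @twos);
    my $which = 0;    # position of the next SAMPLE line within the current run
    foreach my $line (@lines) {
        if ($line =~ /^ACTUAL AVERAGE:/) {
            $which = 0;    # a new run starts here
        }
        elsif ($line =~ /SAMPLE AVERAGE:\s*([\d.]+)/) {
            if ($which == 0) { push(@ones, $1) } else { push(@twos, $1) }
            $which++;
        }
    }
    return (average(@ones), average(@twos));
}

# Mean of a list; undef for an empty list (always exactly one return value).
sub average {
    my @values = @_;
    return undef unless @values;
    my $tally = 0;
    $tally += $_ foreach @values;
    return $tally / @values;
}

my $file = 'output_file.txt';
if (-e $file) {
    open(my $fh, '<', $file) or die "Cannot open $file: $!\n";
    my ($one, $two) = group_averages(<$fh>);
    close($fh);
    print "GROUP ONE AVERAGE: $one\n";
    print "GROUP TWO AVERAGE: $two\n";
}
```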

Perl Basics and Biostatistics
