Perl Basics and Biostatistics

Numeric Variables

In this lesson the student will learn how to:
  1. Store values in variables
  2. Calculate the arithmetic mean
  3. Use the four basic arithmetic operators in a Perl script
  4. Name and identify scalar variables
By the end of this lesson the student will be able to:

  Write a Perl script that calculates the mean
  of a group of numbers.

Most computer languages provide a way for the programmer to store values. Perl's simplest type of variable is called a scalar variable. You can name variables almost anything you want, but there are limits: after the dollar sign, a name must begin with a letter or an underscore and may contain only letters, digits, and underscores. Names are also case-sensitive. Here are a few sample assignments:
$val = 88;
$v2 = 99;
$abc123 = 444;
These are all valid variable names, and each one here holds a whole number. It is just as permissible for a scalar variable to hold a decimal number or a string of text:
$xval = 3.44;
$hotdog = "How about some fries?";
$cold = "32 degrees";
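
As a minimal sketch of the point above, the following script stores a whole number, a decimal number, and a string in scalar variables and prints each one. The variable names are reused from the samples. The `use strict`, `use warnings`, and `my` keywords are not part of this lesson's scripts; they are included here only as a common safety habit.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# A scalar can hold a whole number, a decimal number, or a string.
my $val  = 88;
my $xval = 3.44;
my $cold = "32 degrees";

# Inside double quotes, Perl replaces each variable name with its value.
print "val  = $val\n";
print "xval = $xval\n";
print "cold = $cold\n";
```
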
Variables whose names begin with a dollar sign are referred to as scalar variables. We will use them to hold numeric values in this assignment. Consider the following short script:
#!/usr/bin/perl
$v1 = 2;
$v2 = 1;
$answer = $v1 + $v2;
print "$v1 + $v2 = $answer\n";
exit;
Run this script. It adds the two values, stores the sum in $answer, and prints 2 + 1 = 3.

In Perl we can also subtract, multiply, and divide. (We can do quite a few other things, but we'll save those for later.) Here's a script that adds, subtracts, multiplies, and divides:

#!/usr/bin/perl
$v1 = 22;
$v2 = 13;
$add = $v1 + $v2;
$sub = $v1 - $v2;
$mul = $v1 * $v2;
$div = $v1 / $v2;
print "$v1 + $v2 = $add\n";
print "$v1 - $v2 = $sub\n";
print "$v1 * $v2 = $mul\n";
print "$v1 / $v2 = $div\n";
exit;
Make sure you understand the difference between a line which looks like this:
$add = $v1 + $v2;
and one that looks like this:
print "$v1 + $v2 = $add\n";
The first line actually stores a value (in this case, the sum of the values stored in $v1 and $v2 is stored in $add). The second line merely prints the contents of the variables $v1, $v2, and $add in a formatted manner.
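
The distinction can be seen in a small sketch. Here the sum is computed once and stored. The print line only displays values, and the plus sign inside the quotes is ordinary text, not an addition. As an assumption beyond the lesson, the sketch declares its variables with `my` under `use strict` and `use warnings`.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $v1 = 22;
my $v2 = 13;

# Assignment: the right-hand side is evaluated and the result is stored in $add.
my $add = $v1 + $v2;

# Printing: variables are replaced by their values, but the "+" and "=" inside
# the quotes are plain characters. Nothing is calculated or stored here.
print "$v1 + $v2 = $add\n";    # prints: 22 + 13 = 35

# Changing $v1 afterwards does not change $add, which still holds 35.
$v1 = 100;
print "v1 is now $v1, add is still $add\n";
```
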

The Arithmetic Mean

You may already know how to calculate the arithmetic mean; if you don't, we'll go through the steps in just a moment. First, we should discuss why the arithmetic mean is important. We often refer to the arithmetic mean as the average of a group of values. For instance, we could weigh all the students in a seventh grade class and from this set of data calculate a mean weight for the class. Alternatively, we could be dealing with people suffering from some physical condition that we treat with a new medication. In a study of the treatment's effectiveness, we could record how long each person took to recover after beginning treatment, and then calculate the mean time to recovery for the group. The mean is a handy way of summarizing data.

Remember that the weight of living organisms and the time it takes a chemical or biological compound to have an effect on an organism are both examples of biological data. Keep in mind that we are discussing how to analyze biological data. One of the most basic ways to begin such analysis is to calculate the mean for the data you've gathered.

If someone asks, "How much do seventh graders weigh?" the most meaningful answer is probably the mean weight of seventh graders. If we weigh 40 seventh graders, there is a good chance that the heaviest one will weigh somewhere around 140 pounds and the lightest one will weigh around 70 pounds. As you can see, the heaviest one is likely to be twice as heavy as the lightest one. The average weight of seventh graders is probably going to be somewhere around 98 pounds (which is not the midpoint between 70 and 140; that would be 105). (NOTE: In the case of seventh graders it would probably be appropriate to report separate means for each gender.)

Likewise, when reporting the time to recovery after beginning treatment with some therapeutic agent, it makes the most sense to report the mean time to recovery. For instance, you might have one patient recover 13 days after beginning treatment and another not recover until 31 days after starting treatment (and some patients might not respond at all), but the most meaningful single statistic is the mean time to recovery, which might be something like 26 days (which is not the midpoint between 13 and 31; that would be 22).
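
To see why the mean need not sit halfway between the extremes, here is a rough sketch using six recovery times. The six values are invented for illustration and are not study data; they were chosen so that the fastest and slowest recoveries match the 13 and 31 days mentioned above. The `my` declarations and the `use strict` and `use warnings` lines are additions that the lesson's own scripts do not use.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical recovery times in days for six patients (invented values).
my $total = 13 + 31 + 28 + 29 + 27 + 28;
my $mean  = $total / 6;

# Midpoint between the fastest (13) and slowest (31) recovery.
my $midpoint = (13 + 31) / 2;

print "MEAN: $mean\n";          # prints: MEAN: 26
print "MIDPOINT: $midpoint\n";  # prints: MIDPOINT: 22
```

Because most of these made-up patients recover near the slow end of the range, the mean is pulled above the midpoint.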

We have limited this discussion to the arithmetic mean. Obviously, the mean or average is not the only statistic we can generate from a data set. We will discuss more statistics in upcoming lessons.

Calculating the Mean

To calculate the mean we simply add up all the values in our data set and then divide this total by the total number of values in the set. This can be summarized like this:

MEAN = sum_of_all_values / number_of_values

We can add up a bunch of values in Perl like this:
$total = $a + $b + $c + $d + $e + $f;
In this case we have only six values and so we could calculate the mean like this:
$mean = $total / 6;
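
Putting the two steps together, a complete script for six values might look like the sketch below. The six values are invented for illustration, and the names $v1 through $v6 and $count are assumptions of this sketch. It avoids $a and $b because those two names have a special role in Perl's built-in sort function. The `use strict`, `use warnings`, and `my` lines are likewise additions to the lesson's style.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Six illustrative values (invented for this sketch).
my $v1 = 72;
my $v2 = 85;
my $v3 = 91;
my $v4 = 64;
my $v5 = 103;
my $v6 = 95;

my $count = 6;                                   # number of values
my $total = $v1 + $v2 + $v3 + $v4 + $v5 + $v6;   # sum of all values
my $mean  = $total / $count;                     # MEAN = sum / number

print "TOTAL: $total\n";               # prints: TOTAL: 510
print "NUMBER OF VALUES: $count\n";    # prints: NUMBER OF VALUES: 6
print "MEAN: $mean\n";                 # prints: MEAN: 85
```

Keeping the count in its own variable means the divisor and the printed count cannot drift apart if the number of values changes.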

ASSIGNMENT:

Write a Perl script that calculates the mean of twelve values. Your values must fall between 60 and 110 and be approximately evenly distributed within this range. Format your output like this (the numbers shown only illustrate the layout; your script will report 12 values):

TOTAL: 857
NUMBER OF VALUES: 10
MEAN: 85.7