Python Basics and Biostatistics

Numeric Variables

In this lesson the student will learn how to:

Store values in variables
Calculate the arithmetic mean
Use the four basic arithmetic operators in a PYTHON script
Name and identify scalar variables

By the end of this lesson the student will be able to:


  Write a PYTHON script which calculates the mean
  of a group of numbers.

Most computer languages provide a way for the programmer to store values. Python's simplest type of variable is the numeric variable. You can name variables almost anything you want, but there are certain limits. Here are a few sample numeric variable assignments:

val = 88
v2 = 99
abc123 = 444

These are all valid names for variables which can contain numeric values. Actually, it is perfectly permissible for variables such as these to contain values such as:

xval = 3.44      # floating-point value
hotdog = 3.14e2  # floating-point value multiplied by 10^2
cold = 0x9aa     # hex value
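To see what these literals amount to, here is a minimal sketch. It reuses the variable names from the lines above. The print(...) form with a single argument is an assumption made so that the snippet behaves the same under Python 2 and Python 3; the scripts in this lesson use the Python 2 print statement.

```python
# Each literal below is just another way of writing a number.
xval = 3.44       # floating-point value
hotdog = 3.14e2   # scientific notation: 3.14 * 10**2, i.e. 314.0
cold = 0x9aa      # hexadecimal (base 16) integer, i.e. 2474 in decimal

# str() turns each number into text so it can be joined to a label.
print("xval = " + str(xval))
print("hotdog = " + str(hotdog))
print("cold = " + str(cold))
```

The hex value works out as 9 * 256 + 10 * 16 + 10 = 2474.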

In this lesson our variables hold only numeric values. (Python will let a variable be reassigned to a different type of value, but we will not rely on that here.) We will take a look at other types of values in future lessons. For now, inspect the following Python script:

#!/usr/bin/python
v1 = 2
v2 = 1
answer = v1 + v2
print str(v1) + " + " + str(v2) + " = " + str(answer)

Run this script. (Remember to make it executable by typing chmod +x nvals.py, assuming, of course, that you named it nvals.py.) For the most part this script should be easy to understand. However, you might have a couple of questions about the last line, so let's take a closer look at it.

print str(v1) + " + " + str(v2) + " = " + str(answer)

First of all, the print statement simply tells Python to print the rest of the line. In this script the output will appear at the command prompt. The complication here is that we want to include a plus sign and an equals sign along with the numbers. Values enclosed in quotation marks are known as string values. To join string values and numeric values into a single line with the plus sign, the numeric values must first be converted to string values using the str() function. If all went well, you observed the following output when you ran this script.

2 + 1 = 3
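The need for str() can be checked directly. The sketch below is not part of the original lesson: the names line and broken are invented for illustration, and the try/except construct is something this lesson has not covered yet. It is used here only so the snippet keeps running after the failed attempt. The print(...) form is chosen so the code runs under both Python 2 and Python 3.

```python
v1 = 2
v2 = 1
answer = v1 + v2

# Works: every piece is a string before the pieces are joined.
line = str(v1) + " + " + str(v2) + " = " + str(answer)
print(line)

# Fails: a string and a number cannot be joined directly with +.
try:
    broken = "answer = " + answer
except TypeError:
    broken = None
    print("TypeError: convert the number with str() first")
```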

In PYTHON we can also subtract, divide, and multiply. (We can do quite a few other things, but we'll save that stuff for later.) Here's a script which subtracts, divides, and multiplies.

#!/usr/bin/python
v1 = 22
v2 = 13
add = v1 + v2
sub = v1 - v2
mul = v1 * v2
div = v1 / v2
print str(v1) + " + " + str(v2) + " = " + str(add)
print str(v1) + " - " + str(v2) + " = " + str(sub)
print str(v1) + " * " + str(v2) + " = " + str(mul)
print str(v1) + " / " + str(v2) + " = " + str(div)

Also make sure that you understand the difference between this line:

add = v1 + v2

and one that looks like this:

print str(v1) + " + " + str(v2) + " = " + str(add)

The first line actually stores a value (in this case, the sum of the values stored in v1 and v2 is stored in add). The second line merely prints the contents of the variables v1, v2, and add in a formatted manner. When plus signs are used to join strings, they no longer perform an arithmetic operation. Instead they perform an operation known as concatenation. For instance, the concatenation of "thunder" and "struck" is "thunderstruck".
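The two roles of the plus sign can be compared side by side. This is a small illustrative sketch; the variable names number_sum, text_join, and word are invented here, and print(...) is used so the code runs under both Python 2 and Python 3.

```python
# Between two numbers, + performs arithmetic addition.
number_sum = 22 + 13

# Between two strings, + performs concatenation instead.
text_join = "22" + "13"
word = "thunder" + "struck"

print(str(number_sum))   # the number thirty-five
print(text_join)         # the characters 2, 2, 1, 3 in a row
print(word)
```

Note that "22" + "13" does not add anything: the quotation marks make both values strings, so the result is the text 2213.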

The Arithmetic Mean

Hopefully you already know how to calculate the arithmetic mean, but if you don't we'll go through the steps in just a moment. First, we should discuss the importance of an arithmetic mean. Often we refer to the arithmetic mean as the average of a group of values. For instance, we could weigh all the students in the seventh grade class and from this set of data calculate a mean weight for the class. Alternately, we could be dealing with people suffering from some physical condition which we treat with some new medication. In our study to determine the effectiveness of our new treatment we could collect data on how long it took each person to recover from the physical condition after beginning treatment with our new medication. We could then calculate an average time to recovery for this group of people using our medication. The mean is a handy way of summarizing data.

Remember that the weight of living organisms and the time it takes a chemical or biological compound to have an effect on an organism are both examples of biological data. Keep in mind that we are discussing how to analyze biological data. One of the most basic ways to begin such analysis is to calculate the mean for the data you've gathered.

If someone asks, "How much do seventh graders weigh?" the most meaningful answer is probably the mean weight of seventh graders. If we weigh forty seventh graders there is a good chance that the heaviest one will weigh somewhere around 140 pounds and the lightest one will weigh around 70 pounds. As you can see the heaviest one is likely to be twice as heavy as the lightest one. The average weight of seventh graders is probably going to be somewhere around 98 pounds (which is not exactly half way between 70 and 140). (NOTE: In the case of seventh graders it would probably be appropriate to report separate means for each gender.)

Likewise, in the case of reporting the time to recovery after beginning treatment with some therapeutic agent, it makes the most sense to report the average time to recovery. For instance, you might have one patient recover in 13 days after beginning treatment and another not recover until 31 days after starting treatment (and some patients might not respond at all), but the most meaningful single statistic is the mean time to recovery which might be something like 26 days (which is definitely not exactly half way between 13 and 31).

We have limited this discussion to the arithmetic mean. Obviously, the mean or average is not the only statistic we can generate from a data set. We will discuss more statistics in upcoming lessons.

Calculating the Mean

To calculate the mean we simply add up all the values in our data set and then divide this total by the total number of values in the set. This can be summarized like this:

MEAN = sum_of_all_values / number_of_values

We can add up a bunch of values in PYTHON like this:

total = a + b + c + d + e + f

In this case we have only six values and so we could calculate the mean like this:

mean = total / 6
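As a concrete illustration, here is a sketch with six recovery times in days. The values are hypothetical, invented for this example, and chosen so that the totals divide evenly. It also computes the point halfway between the smallest and largest value (the name midpoint is an assumption of this sketch) to show that the mean need not fall there.

```python
# Hypothetical recovery times in days (invented for illustration).
a = 13
b = 24
c = 28
d = 29
e = 31
f = 31

total = a + b + c + d + e + f   # 156
mean = total / 6                # 156 divided by 6 is exactly 26
midpoint = (a + f) / 2          # halfway between 13 and 31 is exactly 22

print("TOTAL: " + str(total))
print("MEAN: " + str(mean))
print("MIDPOINT: " + str(midpoint))
```

Most of the values sit near the high end of the range, which pulls the mean (26) above the halfway point (22).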

One last thing you should know before getting to the assignment is how to force integer values to behave as floating-point values. As a quick experiment run this simple PYTHON script:

#!/usr/bin/python
v1 = 22
v2 = 13
div = v1 / v2
print str(v1) + " / " + str(v2) + " = " + str(div)

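You will notice that the answer produced is 1. This is normal behavior for integer division in Python 2, the version these scripts are written for: when both values are integers, the result is rounded down to a whole number. In order to get PYTHON to store the answer as a floating-point value we need to make one small change: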

#!/usr/bin/python
v1 = 22
v2 = 13
div = float(v1) / v2
print str(v1) + " / " + str(v2) + " = " + str(div)

There are actually a couple of other ways to force floating-point division, but this is the most explicit. Having employed the float() function, we now get the following output:

22 / 13 = 1.69230769231
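The lesson does not say which other ways it has in mind, so the alternatives below are an assumption: converting the second value instead of the first, multiplying by 1.0, or writing the numbers as floating-point literals. All of them make at least one side of the division a floating-point value. The names div_a through div_d and whole are invented for this sketch, and print(...) is used so the code runs under both Python 2 and Python 3.

```python
v1 = 22
v2 = 13

div_a = float(v1) / v2    # convert the first value (the form used above)
div_b = v1 / float(v2)    # convert the second value instead
div_c = (v1 * 1.0) / v2   # multiplying by 1.0 produces a float
div_d = 22.0 / 13         # write one literal with a decimal point

# The // operator always rounds down, so it gives the whole-number
# result in both Python 2 and Python 3.
whole = v1 // v2

print(str(v1) + " / " + str(v2) + " = " + str(div_a))
print(str(v1) + " // " + str(v2) + " = " + str(whole))
```

All four floating-point forms give the same value; float() remains the clearest way to state the intent when the numbers are already stored in variables.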

ASSIGNMENT:

Write a PYTHON script which calculates the average for twelve values. Your values must range between 60 and 110 and be more or less (approximately) evenly distributed within this range. Format your output like this:

TOTAL: 857
NUMBER OF VALUES: 10
MEAN: 85.7

The numbers above only illustrate the layout; your script will report twelve values. All values must be stored in variables and you must use a minimum of fifteen variables in your script.
