Python Basics and Biostatistics

Correlation Coefficient

In this lesson the student will learn how to:
  1. use for-loops
  2. calculate the correlation coefficient
By the end of this lesson the student will be able to:
  1. write scripts containing simple for-loops

The equation for r (Pearson's correlation coefficient) is built into many inexpensive calculators, so you rarely need to calculate it by hand. The equation can be written in several equivalent ways; here is one:

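    r = \frac{1}{N - 1} \sum_{i=1}^{N} \left( \frac{X_i - \bar{X}}{SD_X} \cdot \frac{Y_i - \bar{Y}}{SD_Y} \right)

where N is the number of (X, Y) pairs, X_i and Y_i are the i-th pair, \bar{X} and \bar{Y} are the means of X and Y, and SD_X and SD_Y are their standard deviations. In words: express each value as the number of SDs it lies from its mean, multiply the X and Y results for each pair, add up the products, and divide by N - 1.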

Recall that the sample standard deviation of N values Y_1, ..., Y_N is:

    SD = \sqrt{ \frac{ \sum_{i=1}^{N} (Y_i - \bar{Y})^2 }{N - 1} }
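Substituting the SD formula into the equation for r shows why the different ways of writing r agree: the factors of N - 1 cancel, leaving a form that needs no separate SD step. A short derivation in LaTeX notation, using the same symbols as above:

```latex
SD_X = \sqrt{\frac{\sum_{i=1}^{N} (X_i - \bar{X})^2}{N - 1}}, \qquad
SD_Y = \sqrt{\frac{\sum_{i=1}^{N} (Y_i - \bar{Y})^2}{N - 1}}

r = \frac{1}{N - 1} \sum_{i=1}^{N} \frac{(X_i - \bar{X})(Y_i - \bar{Y})}{SD_X \, SD_Y}
  = \frac{\sum_{i=1}^{N} (X_i - \bar{X})(Y_i - \bar{Y})}{(N - 1) \, SD_X \, SD_Y}

(N - 1) \, SD_X \, SD_Y
  = (N - 1) \cdot \frac{\sqrt{\sum_{i=1}^{N} (X_i - \bar{X})^2 \; \sum_{i=1}^{N} (Y_i - \bar{Y})^2}}{N - 1}
  = \sqrt{\sum_{i=1}^{N} (X_i - \bar{X})^2 \; \sum_{i=1}^{N} (Y_i - \bar{Y})^2}

r = \frac{\sum_{i=1}^{N} (X_i - \bar{X})(Y_i - \bar{Y})}
         {\sqrt{\sum_{i=1}^{N} (X_i - \bar{X})^2 \; \sum_{i=1}^{N} (Y_i - \bar{Y})^2}}
```

For the example data used in the steps below, the three sums are 47.67, 54.83 and 43.33, so r = 47.67 / sqrt(54.83 x 43.33), which is about 0.978.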

Calculating r by hand therefore takes three steps:
  1. Find the mean of each group of numbers.
        
        X = ( 3, 5, 6, 7, 10, 12 )
        mean of X = 7.17
    
        Y = ( 5, 6, 7, 9, 10, 13 )
        mean of Y = 8.33
    
    
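  2. Calculate the SD for each group. Each group has N = 6 values, so divide by N - 1 = 5 and then take the square root.
    
      SD_X = \sqrt{ \frac{ (3-7.17)^2 + (5-7.17)^2 + (6-7.17)^2 + (7-7.17)^2 + (10-7.17)^2 + (12-7.17)^2 }{5} } = \sqrt{ \frac{54.83}{5} }
      SD_X = 3.31
    
      SD_Y = \sqrt{ \frac{ (5-8.33)^2 + (6-8.33)^2 + (7-8.33)^2 + (9-8.33)^2 + (10-8.33)^2 + (13-8.33)^2 }{5} } = \sqrt{ \frac{43.33}{5} }
      SD_Y = 2.94
    
    
  3. Calculate r, dividing the sum of the six products by N - 1 = 5:
    
      r = \frac{1}{5} \left[ \frac{3-7.17}{3.31} \cdot \frac{5-8.33}{2.94} + \frac{5-7.17}{3.31} \cdot \frac{6-8.33}{2.94} + \frac{6-7.17}{3.31} \cdot \frac{7-8.33}{2.94} + \frac{7-7.17}{3.31} \cdot \frac{9-8.33}{2.94} + \frac{10-7.17}{3.31} \cdot \frac{10-8.33}{2.94} + \frac{12-7.17}{3.31} \cdot \frac{13-8.33}{2.94} \right]
      r = 0.978 (computed without rounding intermediate values; plugging in the two-decimal means and SDs shown gives about 0.980)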
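The three steps above translate directly into for-loops: each summation becomes a loop that adds one term per data point to a running total. This is a minimal sketch in Python 3 using only the standard library. The function names mean, sample_sd and correlation are illustrative choices, not part of any library. Because the script keeps full precision throughout, it does not pick up rounding error from intermediate values:

```python
#!/usr/bin/env python3
from math import sqrt


def mean(values):
    """Step 1: add the values up with a for-loop, then divide by N."""
    total = 0.0
    for v in values:
        total += v
    return total / len(values)


def sample_sd(values):
    """Step 2: sample standard deviation (divides by N - 1)."""
    m = mean(values)
    total = 0.0
    for v in values:
        total += (v - m) ** 2
    return sqrt(total / (len(values) - 1))


def correlation(x, y):
    """Step 3: sum the products of standardized X and Y values, divide by N - 1."""
    if len(x) != len(y):
        raise ValueError("x and y must contain the same number of values")
    mean_x, mean_y = mean(x), mean(y)
    sd_x, sd_y = sample_sd(x), sample_sd(y)
    total = 0.0
    for xi, yi in zip(x, y):          # zip pairs X_i with Y_i
        total += (xi - mean_x) / sd_x * (yi - mean_y) / sd_y
    return total / (len(x) - 1)


X = [3, 5, 6, 7, 10, 12]
Y = [5, 6, 7, 9, 10, 13]

print("mean of X =", round(mean(X), 2))
print("mean of Y =", round(mean(Y), 2))
print("SDx =", round(sample_sd(X), 2))
print("SDy =", round(sample_sd(Y), 2))
print("r =", round(correlation(X, Y), 3))
```

Running the script prints means of 7.17 and 8.33, SDs of 3.31 and 2.94, and r = 0.978. Python's standard statistics module also provides mean() and stdev(). Like sample_sd here, stdev() divides by N - 1, so you can use these functions instead of writing the first two loops yourself.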
    
    

Nonparametric Methods

Statistical methods that do not assume a particular distribution for the population (for example, a normal distribution) are called nonparametric methods. Most nonparametric tests are based on a simple idea: list the values in order from low to high, assign each value a rank (1 for the smallest, 2 for the next, and so on), and base all further analyses on the ranks. Because the analysis uses ranks rather than the values themselves, you don't need to assume that the population follows any particular distribution.
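The ranking step can be written with two nested for-loops. This is a minimal sketch that assumes the common convention in which tied values share the average of the ranks they would occupy (other tie-handling rules exist). The name rank is a hypothetical helper, not a library function. Comparing every value with every other value is slow for large data sets, but it is easy to follow:

```python
#!/usr/bin/env python3


def rank(values):
    """Rank values from low (1) to high; tied values share the average rank."""
    ranks = []
    for v in values:
        smaller = 0   # how many values are below v
        equal = 0     # how many values equal v (including v itself)
        for w in values:
            if w < v:
                smaller += 1
            elif w == v:
                equal += 1
        # The tied group occupies ranks smaller+1 .. smaller+equal;
        # their average is smaller + (equal + 1) / 2.
        ranks.append(smaller + (equal + 1) / 2)
    return ranks


print(rank([3, 5, 6, 7, 10, 12]))   # [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
print(rank([10, 20, 20, 30]))       # [1.0, 2.5, 2.5, 4.0]
```

Applying the correlation formula to these ranks instead of to the raw values gives Spearman's rank correlation, a nonparametric counterpart of r.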

PYTHON: For Loops

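A for-loop repeats a block of code once for each item in a sequence. This one prints the numbers 1 through 10, because range(1, 11) stops just before its second argument:

    #!/usr/bin/env python3
    for n in range(1, 11):
        print(n)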
A for-loop can also step directly through the items of a tuple or list, even when the items are of different types:

    #!/usr/bin/env python3
    stuff = ("hairpin", 33, "PIG", "lantern", "33-34", "earth")
    for n in stuff:
        print(n)
An optional third argument to range() sets the increment. This loop prints 10, 15, 20, ..., 95 (it stops before 100):

    #!/usr/bin/env python3
    for x in range(10, 100, 5):
        print(x)
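One more example: range(0, len(s1)) produces every index of the string s1, so this loop prints the sentence one character per line:

    #!/usr/bin/env python3
    s1 = "This is a sentence that is kind of short."
    for n in range(0, len(s1)):
        print(s1[n])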
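Loops become useful for statistics once they keep a running total. As a small illustration (the variable names are arbitrary), this loop uses one index i to walk through the X and Y values from the correlation example together. It adds up the X values, the Y values and their products. Running sums of this kind are what the summation signs in the correlation formulas ask for:

```python
#!/usr/bin/env python3
X = [3, 5, 6, 7, 10, 12]
Y = [5, 6, 7, 9, 10, 13]

sum_x = 0
sum_y = 0
sum_xy = 0
for i in range(0, len(X)):      # i = 0, 1, ..., 5
    sum_x += X[i]
    sum_y += Y[i]
    sum_xy += X[i] * Y[i]       # the same index picks out a matching X, Y pair

print(sum_x, sum_y, sum_xy)     # 43 50 406
```

Dividing sum_x by len(X) gives the mean of X, about 7.17.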

ASSIGNMENT:

  1. Write a for-loop that prints the multiples of 12 from 12 to 144
  2. Write a for-loop that prints the first 15 powers of two
  3. Create a written explanation of how a for-loop could be used to calculate the correlation coefficient