Python Basics and Biostatistics

Relative Risk

In this lesson the student will learn how to:

use the Python log function
calculate the 95% CI of the logarithm of the relative risk (RR)
calculate the antilog of a number

By the end of this lesson the student will be able to:


    Write a Python script to find the 95% CI for ln(RR) for any relevant
    data set which is input by the user of the script.

For this and the next assignment we are going to use the following experimental situation as our reference point. Let's say that we want to find out what effect an experimental drug (such as AZT in the 1990s) has on the progression of a disease (such as AIDS). We would want to perform a properly designed study (employing double blinding, randomization, etc.) in which our experimental group received the drug and our control group received a placebo. We would determine a reasonable length for our study (perhaps six months) and report our findings at the end of the study: The first thing we can do with our data is to calculate the proportion of patients in which the disease didn't progress:


                             72
      Experimental Group   ----- = 15%
                            483


                            154
      Control Group        ----- = 31%
                            489

Next, we can calculate the difference between proportions:


    Control - Experimental = 31 - 15 = 16%

So, at this point we can say that our treatment will reduce the incidence of the disease by about 16%. We calculate the 95% CI for this number like this:


       D = P₁ - P₂
       D = 31 - 15 = 16

       SE of DIFF = sqrt(P₁(1-P₁)/N₁ + P₂(1-P₂)/N₂)
       SE of DIFF = sqrt( 31*69/489 + 15*85/483 )
       SE of DIFF = sqrt ( 4.37 + 3.17 ) = sqrt ( 7.54 ) = 2.75

       95% CI of DIFF = D +/- 1.96 * SE of DIFF
       95% CI of DIFF = 16 +/- 1.96 * 2.75
       high: 21.39 
       low: 10.61

So, what we really can say is that we are 95% sure that our treatment will reduce the incidence of the disease by BETWEEN 10.61% and 21.39%.

We can also use these numbers to calculate the relative risk. Here's the basic equation:


   relative risk = p₁ / p₂ = 15 / 31 = .48 = 48%

We can use this number to say that those receiving the treatment were 48% as likely to experience disease progression as those not receiving the treatment. Alternatively, we could calculate relative risk the other way around:


   relative risk = p₂ / p₁ = 31 / 15 = 2.07

In this case we would say that those not receiving the treatment were over twice as likely to experience disease progression as those receiving the treatment.

To find the 95% CI of the relative risk we do the following. (We will go with the first RR calculation of .48 for this example.)


  A = 72
  B = 411
  C = 154
  D = 335
  (see results from above for origin of these numbers)

  95% CI of ln(RR) = ln(RR) +/- 1.96 * sqrt( (B/A)/(A+B) + (D/C)/(D+C))
  95% CI of ln(RR) = 3.87 +/- 1.96 * sqrt ( .012 + .004 )
  95% CI of ln(RR) = 3.87 +/- 1.96 * .13 = 3.87 +/- .25
  high: 4.12
  low: 3.62

  Now we take the antilog of each limit:

  antilog = e^x
  high = e^4.12 = 61.56
  low = e^3.62 = 37.34
  e = 2.71828

This calculation of the CI of a relative risk is not straightforward. Several methods have been developed to calculate an approximate CI. The CI is not symmetrical around the relative risk. This makes sense, as the relative risk can never be less than 0, but it can get very high. However, the CI of the logarithm of the relative risk is approximately symmetrical. That's why this method is used.

The Log Function in Python

Consider this sample script:

#!/usr/bin/python import math n = 0 while(n!='q'): n = raw_input("INPUT NUMBER: ") if (n != 'q') : i = int(n) log = math.log(i) print "log: " + str(log) antilog = 2.71828**log print "antilog: " + str(antilog)

Lucky for us, the Python log function returns the natural log (as opposed to the log base 10) which is exactly what we need for the 95% CI of the RR equation.

Assignment:

Write a Python script to find the 95% CI for ln(RR) for any relevant data set which is input by the user of your script. Create data sets to test your script with.

Your data sets should take this form:


                 Disease
   Treatment	Progressed	No Progression		Total
   --------------------------------------------------------------
    AZT		  72		    411		          483
    Placebo	 154		    335			  489
    Total	 226		    746			  972
   --------------------------------------------------------------

Your script only needs to take input for the non-total values (the inner 2 x 2 table).