Python Basics and Biostatistics

Probabilities v. Odds

In this lesson the student will learn how to:
  1. explain the difference between probability and odds
  2. convert from odds to probability and vice versa
  3. calculate the odds ratio
  4. use split to convert a string into a list of integers
  5. sort odd and even numbers using the modulus operator

By the end of this lesson the student will be able to:

       Write a Python script to calculate the 95% CI of the Odds 
       Ratio based on information presented in a contingency table

Consider the following contingency table:

                              Disease
   Treatment     Progressed   No Progression   Total
   -------------------------------------------------
   AZT                   72              411     483
   Placebo              154              335     489
   Total                226              746     972
   -------------------------------------------------

A contingency table is a common way of presenting count data. The rows classify subjects one way (here, by treatment) and the columns classify them another way (here, by disease outcome). Each cell in the table contains the number of subjects that fall into one particular row and column.
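
One way to hold such a table in a script is a nested dictionary keyed first by row and then by column. The sketch below is only an illustration: the names `table`, `row_totals`, `column_totals` and `grand_total` are made up for this example and are not part of the lesson's scripts.

```python
# Counts from the contingency table above, keyed by treatment (row)
# and then by outcome (column). All names here are illustrative.
table = {
    "AZT":     {"Progressed": 72,  "No Progression": 411},
    "Placebo": {"Progressed": 154, "No Progression": 335},
}

# Row totals: number of subjects in each treatment group.
row_totals = {row: sum(cells.values()) for row, cells in table.items()}

# Column totals: number of subjects with each outcome.
column_totals = {}
for cells in table.values():
    for outcome, count in cells.items():
        column_totals[outcome] = column_totals.get(outcome, 0) + count

grand_total = sum(row_totals.values())

print(row_totals)     # {'AZT': 483, 'Placebo': 489}
print(column_totals)  # {'Progressed': 226, 'No Progression': 746}
print(grand_total)    # 972
```

The totals match the Total row and column of the table, which is a quick way to catch a mistyped cell.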

Probabilities and Odds

Probabilities and odds are two different ways of expressing the same concept. Probabilities can be converted to odds, and odds can be converted to probabilities. The probability that an event will occur is the fraction of times you expect to see that event in many trials. The odds are defined as the probability that the event will occur divided by the probability that the event will not occur. Here are the conversion formulas for probability and odds:



      Odds = probability / (1 - probability)

      Probability = odds / (1 + odds)
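
The two conversion formulas translate directly into Python. This is a minimal sketch; the function names `odds_from_probability` and `probability_from_odds` are assumptions chosen for this example.

```python
def odds_from_probability(p):
    """Convert a probability (0 <= p < 1) to odds."""
    if not 0 <= p < 1:
        # p = 1 would mean dividing by zero (infinite odds).
        raise ValueError("probability must be at least 0 and less than 1")
    return p / (1 - p)


def probability_from_odds(odds):
    """Convert odds (odds >= 0) to a probability."""
    if odds < 0:
        raise ValueError("odds cannot be negative")
    return odds / (1 + odds)


print(odds_from_probability(0.75))  # 3.0
print(probability_from_odds(3.0))   # 0.75
```

Converting a probability to odds and back returns the starting value, which shows that the two formulas are inverses of each other.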

For example, if we flip two coins, the chance of getting two tails is 25%, since there are four equally likely outcomes and only one of them is the one specified. The odds of getting two tails are then 25:75, which simplifies to 1:3 (one to three), or about .33. Here are the relationships expressed as equations:

     Probability = number of target outcomes / number of possible outcomes

     P of 2 tails = 1 / 4 = .25

     Odds = P / (1 - P)

     Odds of 2 tails = .25 / .75  = .33
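
The coin example can also be checked by listing every outcome and counting. The sketch below assumes fair coins, so that all four outcomes are equally likely, and uses the standard `fractions` and `itertools` modules to keep the arithmetic exact; the variable names are illustrative.

```python
from fractions import Fraction
from itertools import product

# All equally likely outcomes of flipping two coins: HH, HT, TH, TT.
outcomes = list(product("HT", repeat=2))

# Count the outcomes that match the target (two tails).
hits = sum(1 for outcome in outcomes if outcome == ("T", "T"))

p_two_tails = Fraction(hits, len(outcomes))       # 1/4
odds_two_tails = p_two_tails / (1 - p_two_tails)  # 1/3

print(p_two_tails, float(p_two_tails))                  # 1/4 0.25
print(odds_two_tails, round(float(odds_two_tails), 2))  # 1/3 0.33
```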

Probability values always range from 0 to 1.0. Odds range from 0 to infinity: as probability values go from 0 to .5, odds go from 0 to 1, and as probability values go from .5 to 1.0, odds go from 1 to infinity.
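
A short loop makes the two ranges visible. This is only a sketch; the sample probabilities and the name `odds_table` are arbitrary choices for illustration.

```python
probabilities = (0.0, 0.1, 0.25, 0.5, 0.75, 0.9, 0.99, 0.999)

# Pair each probability with its odds, p / (1 - p).
odds_table = [(p, p / (1 - p)) for p in probabilities]

for p, odds in odds_table:
    print(f"probability = {p:<6} odds = {odds:.3f}")
```

The odds stay below 1 until the probability reaches .5, then grow without limit as the probability approaches 1.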

Inspecting the contingency table from above we can make the following statements:

  1. The probability of the disease progressing for a patient receiving AZT is 72/483 or .15 (15%).
  2. The odds of the disease progressing for a patient receiving AZT are 72/411 or .18.
  3. The probability of the disease progressing for a patient not receiving AZT is 154/489 or .31 (31%).
  4. The odds of the disease progressing for a patient not receiving AZT are 154/335 or .46.
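
These four statements can be reproduced from the cell counts. The sketch below is illustrative and its variable names are assumptions. Note the different denominators: a probability divides by everyone in the group, while odds divide by those for whom the event did not occur.

```python
# Cell counts from the contingency table (names are illustrative).
azt_progressed, azt_no_progression = 72, 411
placebo_progressed, placebo_no_progression = 154, 335

# Probability: events divided by everyone in the group.
p_azt = azt_progressed / (azt_progressed + azt_no_progression)
p_placebo = placebo_progressed / (placebo_progressed + placebo_no_progression)

# Odds: events divided by non-events in the group.
odds_azt = azt_progressed / azt_no_progression
odds_placebo = placebo_progressed / placebo_no_progression

print(f"AZT:     probability = {p_azt:.2f}, odds = {odds_azt:.2f}")
print(f"Placebo: probability = {p_placebo:.2f}, odds = {odds_placebo:.2f}")
# AZT:     probability = 0.15, odds = 0.18
# Placebo: probability = 0.31, odds = 0.46
```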

The Odds Ratio:

From the odds information we can calculate what is known as the odds ratio:


   Odds ratio = ( odds AZT group / odds no AZT group ) = .18 / .46 = .39

So we can say that the odds of disease progression in AZT-treated subjects are .39 times the odds in control patients (.18/.46 = .39). In other words, the odds of disease progression in AZT-treated subjects are about two-fifths those of control patients. Both odds were rounded before dividing; working from the raw counts gives about .38.
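
The same ratio can be computed in Python from the raw counts. This is a sketch only: the labels A to D follow the ones this lesson uses for the confidence interval formula, and the other names are illustrative.

```python
# Cell counts: A and B are the AZT row, C and D are the placebo row.
A, B = 72, 411
C, D = 154, 335

odds_azt = A / B
odds_placebo = C / D

# Dividing the two odds is the same as the cross-product (A*D) / (B*C).
odds_ratio = odds_azt / odds_placebo
cross_product = (A * D) / (B * C)

print(round(odds_ratio, 3))     # 0.381
print(round(cross_product, 3))  # 0.381
```

Because the script does not round the two odds before dividing, it prints 0.381 rather than the hand-calculated .39.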

As with the relative risk, the CI of the odds ratio is not symmetrical. This makes sense as the odds ratio cannot be negative but can be any positive number. The asymmetry is especially noticeable when the odds ratio is low. Several methods can be used to approximate the CI of the odds ratio. Here's the one we use:


  A = 72
  B = 411
  C = 154
  D = 335
  OR = .39
  95% CI of ln(OR) = ln(OR)+/-1.96*sqrt(1/A + 1/B + 1/C + 1/D)
  Take the antilogarithm (e raised to the power of each value) of both
  limits to obtain the 95% CI of the OR.

  95% CI of ln(OR) = ln(.39) +/- 1.96*sqrt(1/72 + 1/411 + 1/154 + 1/335)
  high = -0.942 + 1.96 * sqrt(.0258) = -0.942 + 1.96 * .1606
       = -0.942 + .31 = -0.632
  low  = -0.942 - .31 = -1.252

  95% CI of OR:

  high = antilog(-0.632) = .53
  low  = antilog(-1.252) = .29

So, this means that we can be 95% sure that the true odds ratio is somewhere between .29 and .53.
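
As a check on the hand calculation, the steps can be sketched with Python's `math` module. This is one possible arrangement under the approximation given above, not a prescribed solution; the function name `odds_ratio_ci` and its `z` parameter are assumptions made for this example.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Return (odds_ratio, low, high) for a 2x2 table with rows (a, b), (c, d).

    Uses the approximation from this lesson:
    ln(OR) +/- z * sqrt(1/a + 1/b + 1/c + 1/d), then the antilog of each limit.
    """
    odds_ratio = (a / b) / (c / d)
    log_or = math.log(odds_ratio)                    # natural logarithm
    std_error = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    low = math.exp(log_or - z * std_error)           # antilog of lower limit
    high = math.exp(log_or + z * std_error)          # antilog of upper limit
    return odds_ratio, low, high


odds_ratio, low, high = odds_ratio_ci(72, 411, 154, 335)
print(f"OR = {odds_ratio:.2f}, 95% CI = {low:.2f} to {high:.2f}")
# OR = 0.38, 95% CI = 0.28 to 0.52
```

The script's limits differ slightly from the hand calculation (.29 to .53) because the hand calculation rounds the odds ratio to .39 before taking the logarithm, while the script carries full precision throughout.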

More Array Stuff in Python

Consider this script, which prompts the user for a line of input and then stores it in a list (Python's built-in array type):

    #!/usr/bin/env python3
    n = input("INPUT LIST OF INTEGER VALUES: ")
    print(n)

    # split into a list of integer values
    # using a list comprehension technique
    # program will crash if non-integers entered
    ra = [int(x) for x in n.split()]
    print(ra)

You should take a close look at the list comprehension technique used to convert a single string into a list of integer values. It takes the output of the split function and converts each item to an integer value. Obviously there will be a problem if any of the items entered by the user are not actually integers: int() raises a ValueError and the script stops.
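
If stopping on bad input is not acceptable, one option is to convert the items one at a time and set aside any that fail. The sketch below shows that idea; `parse_integers` is a hypothetical helper written for this example, not part of the lesson's script.

```python
def parse_integers(text):
    """Split text on whitespace; return (integers, rejected_tokens)."""
    values = []
    rejected = []
    for token in text.split():
        try:
            values.append(int(token))
        except ValueError:
            # int() raises ValueError for tokens such as "nine" or "3.5".
            rejected.append(token)
    return values, rejected


values, rejected = parse_integers("7 8 nine 10 3.5 -4")
print(values)    # [7, 8, 10, -4]
print(rejected)  # ['nine', '3.5']
```

The list comprehension is shorter, but the explicit loop lets the script report which items were skipped instead of stopping.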

Now inspect the behavior of the for-in loop shown in this script:

    #!/usr/bin/env python3
    items = [7, 8, 9, 12, 222, 333, 44, 55, 651, 778, 881]
    for x in items:
        if x % 2 == 0:
            print("EVEN: " + str(x))
        else:
            print("ODD: " + str(x))

The modulus operator (%) returns the remainder of a division. When dividing an integer by two, the remainder is either 0 or 1. Odd numbers divided by two produce a remainder of 1 and even numbers produce no remainder (that is, a remainder of zero). Thus, x % 2 returns 1 if x is odd and 0 if x is even.
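
The same test can be used to sort the numbers into two separate lists rather than printing them one by one. This is a short sketch that combines the modulus operator with the list comprehension technique shown earlier; the names `even_numbers` and `odd_numbers` are illustrative.

```python
items = [7, 8, 9, 12, 222, 333, 44, 55, 651, 778, 881]

# x % 2 is 0 for even numbers and 1 for odd numbers.
even_numbers = [x for x in items if x % 2 == 0]
odd_numbers = [x for x in items if x % 2 == 1]

print("EVEN:", even_numbers)  # EVEN: [8, 12, 222, 44, 778]
print("ODD:", odd_numbers)    # ODD: [7, 9, 333, 55, 651, 881]
```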

ASSIGNMENT:

Write a Python script to calculate the 95% CI of the Odds Ratio based on information presented in a contingency table. Create at least three contingency tables with which to test your script. Your script should report the odds ratio itself as well as the low and high limits of its 95% CI.