Write a Python script to find the 95% CI for ln(RR) for any relevant data set which is input by the user of the script.
72 Experimental Group ----- = 15% 483 154 Control Group ----- = 31% 489Next, we can calculate the difference between proportions:
Control - Experimental = 31 - 15 = 16%So, at this point we can say that our treatment will reduce the incidence of the disease by about 16%. We calculate the 95% CI for this number like this:
D = P1 - P2 D = 31 - 15 = 16 SE of DIFF = sqrt(P1(1-P1)/N1 + P2(1-P2)/N2) SE of DIFF = sqrt( 31*69/489 + 15*85/483 ) SE of DIFF = sqrt ( 4.37 + 3.17 ) = sqrt ( 7.54 ) = 2.75 95% CI of DIFF = D +/- 1.96 * SE of DIFF 95% CI of DIFF = 16 +/- 1.96 * 2.75 high: 21.39 low: 10.61So, what we really can say is that we are 95% sure that our treatment will reduce the incidence of the disease by BETWEEN 10.61% and 21.39%.
We can also use these numbers to calculate the relative risk. Here's the basic equation:
relative risk = p1 / p2 = 15 / 31 = .48 = 48%We can use this number to say that those receiving the treatment were 48% as likely to experience disease progression as those not receiving the treatment. Alternatively, we could calculate relative risk the other way around:
relative risk = p2 / p1 = 31 / 15 = 2.07In this case we would say that those not receiving the treatment were over twice as likely to experience disease progression as those receiving the treatment.
To find the 95% CI of the relative risk we do the following. (We will go with the first RR calculation of .48 for this example.)
A = 72 B = 411 C = 154 D = 335 (see results from above for origin of these numbers) 95% CI of ln(RR) = ln(RR) +/- 1.96 * sqrt( (B/A)/(A+B) + (D/C)/(D+C)) 95% CI of ln(RR) = 3.87 +/- 1.96 * sqrt ( .012 + .004 ) 95% CI of ln(RR) = 3.87 +/- 1.96 * .13 = 3.87 +/- .25 high: 4.12 low: 3.62 Now we take the antilog of each limit: antilog = ex high = e4.12 = 61.56 low = e3.62 = 37.34 e = 2.71828This calculation of the CI of a relative risk is not straightforward. Several methods have been developed to calculate an approximate CI. The CI is not symmetrical around the relative risk. This makes sense, as the relative risk can never be less than 0, but it can get very high. However, the CI of the logarithm of the relative risk is approximately symmetrical. That's why this method is used.
The Log Function in Python
Consider this sample script:
Assignment:
Your data sets should take this form:
Disease Treatment Progressed No Progression Total -------------------------------------------------------------- AZT 72 411 483 Placebo 154 335 489 Total 226 746 972 --------------------------------------------------------------Your script only needs to take input for the non-total values (the inner 2 x 2 table).