Python Basics and Biostatistics

Positive and Negative Predictive Values

In this lesson the student will learn how to:

calculate positive and negative predictive values
fill out a result matrix based on values for prevalence, specificity, and sensitivity
pass values to functions
redirect script output to files

By the end of this lesson the student will be able to:


  Write scripts that calculate positive and negative predictive
  values and produce output as an HTML table.

Neither the specificity nor the sensitivity answers the most important questions: If the test is positive, what is the chance that the patient really has the disease? If the test is negative, what is the chance that the patient really doesn't have the disease? The answers to those questions are quantified by the positive predictive value (PPV) and the negative predictive value (NPV):


                                   TP
    Positive Predictive Value = -------
                                TP + FP

                                   TN
    Negative Predictive Value = -------
                                TN + FN
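The two formulas translate directly into Python. This is a minimal sketch, assuming Python 3, where `/` between two integers returns a float (in Python 2 it would truncate the result to 0). The function names are our own, not part of any library:

```python
def positive_predictive_value(tp, fp):
    """PPV = TP / (TP + FP): the fraction of positive results that are true positives."""
    return tp / (tp + fp)


def negative_predictive_value(tn, fn):
    """NPV = TN / (TN + FN): the fraction of negative results that are true negatives."""
    return tn / (tn + fn)


# Counts taken from the 1-in-100 prevalence example in this lesson
print(round(positive_predictive_value(820, 3663), 2))   # prints 0.18
print(round(negative_predictive_value(95337, 180), 3))  # prints 0.998
```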

The sensitivity and specificity are properties of the test. In contrast, the positive predictive value and negative predictive value depend both on the test and on the prevalence of the disease in the population being studied. The lower the prevalence of the disease, the lower the ratio of true positives to false positives, and therefore the lower the positive predictive value.
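To see the effect of prevalence, the PPV can be computed from rates instead of counts: the fraction of the population that is sick and tests positive is sensitivity × prevalence, and the fraction that is healthy but tests positive is (1 − specificity) × (1 − prevalence). Dividing the first by their sum is algebraically the same as TP / (TP + FP) from a filled-in matrix. A minimal sketch, with a hypothetical function name, holding the test fixed (sensitivity .82, specificity .963) while the prevalence changes:

```python
def ppv_from_rates(prevalence, sensitivity, specificity):
    """PPV computed from rates; equivalent to TP / (TP + FP) from the matrix."""
    true_pos = sensitivity * prevalence                 # sick and test positive
    false_pos = (1 - specificity) * (1 - prevalence)    # healthy but test positive
    return true_pos / (true_pos + false_pos)


# Same test, populations with different prevalence
for prevalence in (0.001, 0.01, 0.1, 0.5):
    print(prevalence, round(ppv_from_rates(prevalence, 0.82, 0.963), 3))
```

With the same test, the PPV falls from about .96 at 50% prevalence to about .02 at 0.1% prevalence.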

Consider this two-by-two matrix (with row and column totals added), which tabulates positive and negative test results against the presence and absence of a disease:

    LAB TEST RESULTS:   Disease Present   Disease Absent   Total
    Test Positive       A                 D                G
    Test Negative       B                 E                H
    Total               C                 F                I

To fill in the numbers in a matrix like this one, we must know the population size (I) and three other things:

prevalence of disease
sensitivity of the test
specificity of the test

For a population of 100,000, a prevalence of 1 in 100, a sensitivity of .82, and a specificity of .963, we get:

    LAB TEST RESULTS:   Disease Present   Disease Absent   Total
    Test Positive       820               3663             4483
    Test Negative       180               95337            95517
    Total               1000              99000            100000

To fill this matrix out we perform the following steps:

Given the value for I (the size of your population, usually a power of ten large enough that every cell works out to a whole number), calculate C and F using the prevalence: C = I * prevalence and F = I - C.
Next, using the value in C, calculate A and B using the sensitivity: A = C * sensitivity and B = C - A.
Next, using the value in F, calculate D and E using the specificity: E = F * specificity and D = F - E.
Fill in G and H by adding across the rows: G = A + D and H = B + E.
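The four steps above can be sketched as a single Python function. This is one possible implementation, assuming Python 3; the name `fill_matrix` and the choice to return the cells as a dictionary keyed by the letters A through I are our own. Each product is rounded to a whole number, which is why I should be a large power of ten:

```python
def fill_matrix(population, prevalence, sensitivity, specificity):
    """Return the nine cells A-I of the lab-test matrix as a dictionary."""
    c = round(population * prevalence)   # C: disease present
    f = population - c                   # F: disease absent
    a = round(c * sensitivity)           # A: true positives
    b = c - a                            # B: false negatives
    e = round(f * specificity)           # E: true negatives
    d = f - e                            # D: false positives
    return {"A": a, "B": b, "C": c,
            "D": d, "E": e, "F": f,
            "G": a + d, "H": b + e, "I": population}


cells = fill_matrix(100000, 0.01, 0.82, 0.963)
print(cells["A"], cells["D"], cells["G"])   # prints 820 3663 4483
```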

Once the matrix is filled out we can calculate the positive and negative predictive values:


  PPV = 820 / 4483 = .18

  NPV = 95337 / 95517 = .998

In other words, a patient who tests positive has only about an 18% chance of actually having the disease, while a patient who tests negative has about a 99.8% chance of being free of it.

Here's another example. The prevalence of a disease is sometimes much higher among people whose siblings have the disease; suppose that in such a group the prevalence is 50%. Using the sensitivity and specificity from above, the table values look like this:

    LAB TEST RESULTS:   Disease Present   Disease Absent   Total
    Test Positive       41000             1850             42850
    Test Negative       9000              48150            57150
    Total               50000             50000            100000

Using these values we can calculate the PPV and the NPV:


  PPV = 41000 / 42850 = .96
  NPV = 48150 / 57150 = .84
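Both worked examples can be reproduced by filling the matrix and dividing in one step. A minimal sketch, assuming Python 3; the function name `predictive_values` is hypothetical, and it follows the fill-in steps from earlier in this lesson before applying the PPV and NPV formulas:

```python
def predictive_values(population, prevalence, sensitivity, specificity):
    """Fill the matrix, then return (PPV, NPV)."""
    sick = round(population * prevalence)    # C
    healthy = population - sick              # F
    tp = round(sick * sensitivity)           # A
    fn = sick - tp                           # B
    tn = round(healthy * specificity)        # E
    fp = healthy - tn                        # D
    return tp / (tp + fp), tn / (tn + fn)


for prevalence in (0.01, 0.5):
    ppv, npv = predictive_values(100000, prevalence, 0.82, 0.963)
    print("prevalence %.2f: PPV = %.3f, NPV = %.3f" % (prevalence, ppv, npv))
```

It prints PPV .183 and NPV .998 for a prevalence of .01, and PPV .957 and NPV .843 for a prevalence of .50. The same test that is nearly useless for confirming disease in the general population becomes highly informative among siblings.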

Passing Values to Functions

Often you want to send specific data to a function. Here's an example showing how this is done:

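#!/usr/bin/env python3

# x, y, and z are parameters: they receive the values passed in the call
def print_num(x, y, z):
    print("YOU ENTERED: " + x + ", " + y + ", " + z)
    # input() returns strings, so convert them before adding
    total = int(x) + int(y) + int(z)
    print("TOTAL: " + str(total))

a = input("ENTER NUMBER VALUE ONE: ")
b = input("ENTER NUMBER VALUE TWO: ")
c = input("ENTER NUMBER VALUE THREE: ")
print_num(a, b, c)  # a, b, and c are passed in as x, y, and z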
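A function can also hand its result back with `return` instead of printing it, which lets the caller decide what to do with the value. This is a minimal sketch, assuming Python 3; the name `add_three` is hypothetical. It also shows that values can be passed either by position or by parameter name:

```python
def add_three(x, y, z):
    """Convert three text values to integers and return their sum."""
    return int(x) + int(y) + int(z)


total = add_three("4", "5", "6")              # passed by position
same_total = add_three(z="6", x="4", y="5")   # passed by parameter name
print("TOTAL: " + str(total))                 # prints TOTAL: 15
```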

You can also pass lists like this:

#!/usr/bin/env python3
import re

flowers = ["rose", "tulip", "lilac", "iris", "flax",
           "lupine", "datura", "pansy", "petunia", "daisy"]

# stff receives the entire list that is passed in
def shorthand(stff):
    for x in stff:
        # Replace every lowercase vowel with an empty string
        replaced = re.sub(r'[aeiou]', '', x)
        print(replaced)

shorthand(flowers)

NOTE: re is Python's regular expression module. A regular expression defines a pattern, or group of patterns, to match in text. Here the pattern [aeiou] matches any lowercase vowel, and re.sub replaces each match with an empty string, so the vowels are removed. For this assignment you don't need to understand how to work with regular expressions. Just make sure you know how to define functions and pass values to them.
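The same list-passing pattern works without regular expressions. This sketch substitutes the built-in `str.maketrans` and `str.translate` for `re.sub`, and returns a new list instead of printing each word; the name `shorthand_list` is hypothetical:

```python
def shorthand_list(words):
    """Return a new list with the lowercase vowels removed from each word."""
    no_vowels = str.maketrans("", "", "aeiou")   # table that deletes a, e, i, o, u
    return [word.translate(no_vowels) for word in words]


flowers = ["rose", "tulip", "lilac", "iris", "flax"]
print(shorthand_list(flowers))   # prints ['rs', 'tlp', 'llc', 'rs', 'flx']
```

Because the function builds a new list, the original `flowers` list passed in is left unchanged.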

ASSIGNMENT:

Do the following:

Write a script that calculates the positive predictive value and the negative predictive value given input for TP, TN, FP, and FN. Use a separate function for each calculation, and pass the values to each function as a list (Python's closest built-in equivalent of an array).
Write a script that takes input for nine values. Write a function that displays these values in a 3x3 matrix like the one in the first section of this lesson. Your function should take the nine values as a single list. Your output should take the form of a web page, and your matrix should be an HTML table that uses the same labels as shown below:

    LAB TEST RESULTS:   Disease Present   Disease Absent   Total
    Test Positive       A                 D                G
    Test Negative       B                 E                H
    Total               C                 F                I
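One way to build the HTML part of the second script is to assemble the page as a list of lines. This is a sketch under stated assumptions: Python 3, a hypothetical function name `matrix_page`, and the nine values supplied as one list in the order A through I. Note that the matrix is filled column by column (A, B, C run down the "Disease Present" column), so each table row takes every third value:

```python
import html

HEADER = ["LAB TEST RESULTS:", "Disease Present", "Disease Absent", "Total"]


def matrix_page(values):
    """Return an HTML page showing values (ordered A..I) as the lab-test matrix."""
    a, b, c, d, e, f, g, h, i = values
    rows = [("Test Positive", a, d, g),
            ("Test Negative", b, e, h),
            ("Total", c, f, i)]
    lines = ["<html>", "<body>", '<table border="1">']
    lines.append("<tr>" + "".join("<th>%s</th>" % html.escape(s) for s in HEADER) + "</tr>")
    for label, *cells in rows:
        # Row label as a header cell, then the three numbers as data cells
        lines.append("<tr><th>%s</th>" % html.escape(label)
                     + "".join("<td>%s</td>" % html.escape(str(v)) for v in cells)
                     + "</tr>")
    lines += ["</table>", "</body>", "</html>"]
    return "\n".join(lines)


print(matrix_page([820, 180, 1000, 3663, 95337, 99000, 4483, 95517, 100000]))
```

Because the script prints the page to standard output, the shell can redirect it into a file, for example `python3 matrix_page.py > matrix.html` (both file names hypothetical), which you can then open in a web browser.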