Perl Basics and Biostatistics

Positive and Negative Predictive Values

In this lesson the student will learn how to:

calculate positive and negative predictive values
fill out a result matrix based on values for prevalence, specificity, and sensitivity
pass values to subroutines
redirect script output to files

By the end of this lesson the student will be able to:


  Write scripts which calculate positive and negative predictive
  values and produce output in an HTML table.

Neither specificity nor sensitivity answers the most important questions: If the test is positive, what is the chance that the patient really has the disease? If the test is negative, what is the chance that the patient really doesn't have the disease? The answers to those questions are quantified by the positive predictive value and the negative predictive value:


                                   TP
    Positive Predictive Value = -------
                                TP + FP

                                   TN
    Negative Predictive Value = -------
                                TN + FN

The sensitivity and specificity are properties of the test. In contrast, the positive predictive value and negative predictive value are determined both by the characteristics of the test and by the prevalence of the disease in the population being studied. The lower the prevalence of the disease, the lower the ratio of true positives to false positives, and so the lower the positive predictive value.

Consider this two by two matrix which tabulates positive and negative test results against the presence and absence of a disease:

LAB TEST RESULTS:   Disease Present   Disease Absent   Total
Test Positive       A                 D                G
Test Negative       B                 E                H
Total               C                 F                I

To fill in the numbers on a matrix like this one we must know three things:

prevalence of disease
sensitivity of the test
specificity of the test

For a population of 100,000 and prevalence of 1 in 100, sensitivity of .82, and specificity of .963, we get:

LAB TEST RESULTS:   Disease Present   Disease Absent    Total
Test Positive                   820             3663     4483
Test Negative                   180            95337    95517
Total                          1000            99000   100000

To fill this matrix out we perform the following steps:

Given the value for I (the size of your population, usually a power of ten large enough that every cell is a whole number), calculate C and F using the prevalence: C = I * prevalence and F = I - C.
Next, using the value in C, calculate A and B using the sensitivity: A = C * sensitivity and B = C - A.
Next, using the value in F, calculate D and E using the specificity: E = F * specificity and D = F - E.
Finally, fill in G and H by adding across each row: G = A + D and H = B + E.
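
The steps above can be sketched in Perl as follows. This is a minimal illustration, not part of the original lesson: the subroutine name fill_matrix and the hash keys A through I (matching the cell labels in the matrix) are assumptions, and the sketch uses "my" variables with "use strict" rather than the "local" style used elsewhere in this lesson. Cell values are rounded to whole numbers with sprintf.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Fill the result matrix from the population size, prevalence,
# sensitivity, and specificity. Returns the nine cells keyed A..I.
sub fill_matrix {
    my ($population, $prevalence, $sensitivity, $specificity) = @_;
    my %m;

    $m{I} = $population;
    $m{C} = sprintf("%.0f", $m{I} * $prevalence);    # disease present
    $m{F} = $m{I} - $m{C};                           # disease absent
    $m{A} = sprintf("%.0f", $m{C} * $sensitivity);   # true positives
    $m{B} = $m{C} - $m{A};                           # false negatives
    $m{E} = sprintf("%.0f", $m{F} * $specificity);   # true negatives
    $m{D} = $m{F} - $m{E};                           # false positives
    $m{G} = $m{A} + $m{D};                           # all positive tests
    $m{H} = $m{B} + $m{E};                           # all negative tests

    return %m;
}

# The first example from this lesson: 100,000 people, prevalence 1 in 100.
my %m = fill_matrix(100_000, 0.01, 0.82, 0.963);

printf "%-14s %8s %8s %8s\n", "", "Present", "Absent", "Total";
printf "%-14s %8d %8d %8d\n", "Test Positive", @m{qw(A D G)};
printf "%-14s %8d %8d %8d\n", "Test Negative", @m{qw(B E H)};
printf "%-14s %8d %8d %8d\n", "Total",         @m{qw(C F I)};
```

Calling fill_matrix(100_000, 0.5, 0.82, 0.963) instead should give the cell values for a prevalence of 50%.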

Once the matrix is filled out we can calculate the positive and negative predictive values:


  PPV = 820 / 4483 = .18

  NPV = 95337 / 95517 = .99

In other words, if you test positive, the chance that you really have the disease is only about 18%, and if you test negative, the chance that you really don't have the disease is more than 99%.

Here's another example. Sometimes the prevalence of a disease is greater among people who have siblings with the disease. In this case, the prevalence is often 50%. Using the numbers for sensitivity and specificity from above, the table values would look like this:

LAB TEST RESULTS:   Disease Present   Disease Absent    Total
Test Positive                 41000             1850    42850
Test Negative                  9000            48150    57150
Total                         50000            50000   100000

Using these values we can calculate the PPV and the NPV:


  PPV = 41000 / 42850 = .96
  NPV = 48150 / 57150 = .84
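
The two examples use the same test and differ only in prevalence. The following sketch, which is an illustration rather than part of the original lesson, computes both predictive values directly from rates instead of counts, so the effect of prevalence can be seen without filling in a matrix. The subroutine names ppv and npv and the list of prevalences are assumptions chosen for the example, and the code uses "my" variables rather than "local".

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Positive predictive value from prevalence, sensitivity, and specificity.
sub ppv {
    my ($prev, $sens, $spec) = @_;
    my $tp = $prev * $sens;                # fraction who are true positives
    my $fp = (1 - $prev) * (1 - $spec);    # fraction who are false positives
    return $tp / ($tp + $fp);
}

# Negative predictive value from the same three inputs.
sub npv {
    my ($prev, $sens, $spec) = @_;
    my $tn = (1 - $prev) * $spec;          # fraction who are true negatives
    my $fn = $prev * (1 - $sens);          # fraction who are false negatives
    return $tn / ($tn + $fn);
}

# Same test (sensitivity .82, specificity .963) at several prevalences.
foreach my $prev (0.001, 0.01, 0.1, 0.5) {
    printf "prevalence %.3f  PPV %.3f  NPV %.3f\n",
        $prev, ppv($prev, 0.82, 0.963), npv($prev, 0.82, 0.963);
}
```

For prevalences of 0.01 and 0.5 the results should agree with the values worked out from the two matrices. As the prevalence rises, the positive predictive value rises and the negative predictive value falls.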

Passing Values To Subroutines

Often you want to send specific data to a subroutine. Here's an example showing how this is done:

    #!/usr/bin/perl -w

    # Read three numbers from standard input.
    print "ENTER NUMBER VALUE: ";
    $n1 = <STDIN>;
    chomp($n1);
    print "ENTER NUMBER VALUE: ";
    $n2 = <STDIN>;
    chomp($n2);
    print "ENTER NUMBER VALUE: ";
    $n3 = <STDIN>;
    chomp($n3);

    # Pass the three values to the subroutine.
    &printnum($n1, $n2, $n3);

    # The arguments arrive in @_ and are copied into $A, $B, and $C.
    sub printnum{
        local($A, $B, $C) = @_;
        local($total);

        print "You entered: $A, $B, $C\n";
        $total = $A + $B + $C;
        print "TOTAL: $total\n";
    }

    exit;

Here's another sample script passing values to a subroutine:

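    #!/usr/bin/perl -w

    $wc = $cc = $lc = 0;
    $char = "";        # empty pattern: split into single characters
    $word = "\\s+";    # whitespace pattern: split into words

    while($line = <STDIN>){
        $cc += &cnt($line, $char);
        $line =~ s/^\s+|\s+$//g;    # trim leading and trailing whitespace
        $wc += &cnt($line, $word);
        $lc++;
    }

    print "TOTALS: $lc lines, $wc words, $cc characters\n";

    # Split $line on $pattern and count the pieces. The value of the
    # last statement ($C) is what the subroutine returns.
    sub cnt{
        local($line, $pattern) = @_;
        local($C);

        @items = split(/$pattern/, $line);
        $C = @items;
    }

    exit;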

(Use ^D to end input for this script.)

You can also pass lists like this:

    #!/usr/bin/perl -w

    @flowers = (
        "tulip", "rose", "pansy", "carnation", "flax",
        "lupine", "iris", "nasturtium", "datura"
    );
    @foods = ("taco", "salad", "hamburger", "apple");

    &alphalist(@flowers);
    &alphalist("dog", "cat", "bird", "lizard", "fish", "rabbit");
    &alphalist("burrito", "noodles", @foods, "rice", "spaghetti");

    # Copy all of the arguments into @L, sort them, and print them.
    sub alphalist{
        local(@L) = @_;

        @L = sort(@L);
        print "\nLIST: ";
        foreach $item (@L){
            print "$item ";
        }
        print "\n";
    }

    exit;

Also consider this slight alteration, in which the first argument is assigned to $f and the remaining arguments are assigned to @L:

    #!/usr/bin/perl -w

    @flowers = (
        "tulip", "rose", "pansy", "carnation", "flax",
        "lupine", "iris", "nasturtium", "datura"
    );
    @foods = ("taco", "salad", "hamburger", "apple");

    &alphalist(@flowers);
    &alphalist("dog", "cat", "bird", "lizard", "fish", "rabbit");
    &alphalist("burrito", "noodles", @foods, "rice", "spaghetti");

    # $f receives the first argument; @L receives all the rest.
    sub alphalist{
        local($f, @L) = @_;

        @L = sort(@L);
        print "FIRST ITEM: $f\n";
        print "LIST: ";
        foreach $item (@L){
            print "$item ";
        }
        print "\n";
    }

    exit;
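
Both versions work because Perl flattens everything passed to a subroutine into the single list @_, so an array placed in the middle of the arguments simply contributes its elements. The short sketch below illustrates this. The subroutine name first_and_rest is hypothetical, and the code uses "my" with "use strict" rather than the "local" style of the scripts above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return the first argument and a count of the remaining arguments.
sub first_and_rest {
    my ($first, @rest) = @_;    # the scalar must come before the array
    return ($first, scalar @rest);
}

my @foods = ("taco", "salad", "hamburger", "apple");

# Two strings, the four elements of @foods, and one more string
# arrive in the subroutine as one flat list of seven items.
my ($first, $count) = first_and_rest("burrito", "noodles", @foods, "rice");
print "FIRST ITEM: $first, followed by $count more items\n";
# prints "FIRST ITEM: burrito, followed by 6 more items"
```

Because an array in the parameter list absorbs every remaining argument, any scalar listed after it would receive nothing. This is why $f comes before @L in the script above.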

ASSIGNMENT:

Do the following:

Write a script which will calculate the positive predictive value and the negative predictive value given input for TP, TN, FP, and FN. Use separate subroutines for each calculation and pass these values in as an array to each subroutine.
Write a script which takes input for nine values. Write a subroutine which displays these values in a 3x3 matrix as shown in the first section of this lesson. Your subroutine will take the nine values in as a single array. Further, your output should take the form of a web page, and your matrix should be an HTML table which uses the same labels as shown above.

LAB TEST RESULTS:   Disease Present   Disease Absent   Total
Test Positive       A                 D                G
Test Negative       B                 E                H
Total               C                 F                I