Perl Basics and Biostatistics

Positive and Negative Predictive Values

In this lesson the student will learn how to:
  1. calculate positive and negative predictive values
  2. fill out a result matrix based on values for prevalence, specificity, and sensitivity
  3. pass values to subroutines
  4. redirect script output to files

By the end of this lesson the student will be able to:

  Write scripts that calculate positive and negative predictive
  values and produce output in an HTML table.

Neither specificity nor sensitivity answers the most important questions: If the test is positive, what is the chance that the patient really has the disease? If the test is negative, what is the chance that the patient really doesn't have the disease? The answers to those questions are quantified by the positive predictive value and the negative predictive value:

                                   TP
    Positive Predictive Value = -------
                                TP + FP

                                   TN
    Negative Predictive Value = -------
                                TN + FN

The sensitivity and specificity are properties of the test. In contrast, the positive predictive value and negative predictive value are determined by both the characteristics of the test and the prevalence of the disease in the population being studied. The lower the prevalence of the disease, the lower the ratio of true positives to false positives, and so the lower the positive predictive value.
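To see how strongly prevalence drives the positive predictive value, the following minimal Perl sketch holds the test fixed and varies only the prevalence. It assumes the standard identity PPV = (sensitivity * prevalence) / (sensitivity * prevalence + (1 - specificity) * (1 - prevalence)), which follows from the TP / (TP + FP) formula above. The subroutine name ppv_from_rates and the sample sensitivity (.82), specificity (.963), and prevalences are illustrative choices, not part of the lesson's required code.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (an assumption for illustration): PPV computed
# directly from sensitivity, specificity, and prevalence, all given
# as fractions between 0 and 1.
sub ppv_from_rates {
    my ($sens, $spec, $prev) = @_;
    my $true_pos  = $sens * $prev;                # fraction of population that is TP
    my $false_pos = (1 - $spec) * (1 - $prev);    # fraction of population that is FP
    return $true_pos / ($true_pos + $false_pos);
}

# Same test at several prevalences: only the population changes.
foreach my $prev (0.001, 0.01, 0.1, 0.5) {
    printf "prevalence %.3f  PPV %.2f\n",
        $prev, ppv_from_rates(0.82, 0.963, $prev);
}
```

With these inputs the PPV climbs from roughly 2% at a prevalence of 1 in 1,000 to roughly 96% at a prevalence of 50%, even though the test itself never changes.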

Consider this two-by-two matrix (shown with row and column totals), which tabulates positive and negative test results against the presence and absence of a disease:

LAB TEST RESULTS:   Disease Present   Disease Absent     Total
Test Positive                     A                D         G
Test Negative                     B                E         H
Total                             C                F         I

In order to fill the numbers in on a matrix like this one we must know three things:

  1. prevalence of disease
  2. sensitivity of the test
  3. specificity of the test

For a population of 100,000, a prevalence of 1 in 100, a sensitivity of .82, and a specificity of .963, we get:

LAB TEST RESULTS:   Disease Present   Disease Absent     Total
Test Positive                   820             3663      4483
Test Negative                   180            95337     95517
Total                          1000            99000   100,000

To fill this matrix out we perform the following steps:

  1. Given the value for I (the size of your population, usually a power of ten large enough that you can work with whole numbers), calculate C and F using the prevalence: C = I * prevalence and F = I - C.
  2. Next, using the value in C, calculate A and B using the sensitivity: A = C * sensitivity and B = C - A.
  3. Next, using the value in F, calculate D and E using the specificity: E = F * specificity and D = F - E.
  4. G and H can then be filled in by adding across each row: G = A + D and H = B + E.
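The four steps above can be sketched in Perl as follows. This is only one possible arrangement: the subroutine name fill_matrix, the use of a hash keyed by the cell letters A through I, and the rounding of each cell to a whole number with sprintf are assumptions made for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (an assumption for illustration): returns the
# nine cells A..I of the result matrix as a hash, following steps 1-4.
sub fill_matrix {
    my ($I, $prev, $sens, $spec) = @_;
    my %m = (I => $I);
    $m{C} = sprintf("%.0f", $I * $prev);       # step 1: disease present
    $m{F} = $I - $m{C};                        #         disease absent
    $m{A} = sprintf("%.0f", $m{C} * $sens);    # step 2: true positives
    $m{B} = $m{C} - $m{A};                     #         false negatives
    $m{E} = sprintf("%.0f", $m{F} * $spec);    # step 3: true negatives
    $m{D} = $m{F} - $m{E};                     #         false positives
    $m{G} = $m{A} + $m{D};                     # step 4: row totals
    $m{H} = $m{B} + $m{E};
    return %m;
}

# Population of 100,000, prevalence 1 in 100, sensitivity .82, specificity .963.
my %m = fill_matrix(100000, 0.01, 0.82, 0.963);
printf "%-14s %8s %8s %8s\n", "", "Present", "Absent", "Total";
printf "%-14s %8d %8d %8d\n", "Test Positive", @m{qw(A D G)};
printf "%-14s %8d %8d %8d\n", "Test Negative", @m{qw(B E H)};
printf "%-14s %8d %8d %8d\n", "Total",         @m{qw(C F I)};
```

Passing the population size and the three rates as separate arguments keeps the subroutine reusable for any prevalence; only the call at the bottom needs to change.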
Once the matrix is filled out we can calculate the positive and negative predictive values:

  PPV = 820 / 4483 = .18

  NPV = 95337 / 95517 = .99

In other words, if you test positive you can be 18% sure that you have the disease and if you test negative you can be 99% sure that you don't have the disease.

Here's another example. The prevalence of a disease is sometimes much greater among people who have a sibling with the disease. Suppose that in this group the prevalence is 50%. Using the sensitivity and specificity from above, the table values would look like this:

LAB TEST RESULTS:   Disease Present   Disease Absent     Total
Test Positive                 41000             1850     42850
Test Negative                  9000            48150     57150
Total                         50000            50000   100,000

Using these values we can calculate the PPV and the NPV:


  PPV = 41000 / 42850 = .96
  NPV = 48150 / 57150 = .84

Passing Values To Subroutines

Often you want to send specific data to a subroutine. Here's an example showing how this is done:

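    #!/usr/bin/perl -w

    # Read three numbers from standard input.
    print "ENTER NUMBER VALUE: ";
    $n1 = <STDIN>;
    chomp($n1);
    print "ENTER NUMBER VALUE: ";
    $n2 = <STDIN>;
    chomp($n2);
    print "ENTER NUMBER VALUE: ";
    $n3 = <STDIN>;
    chomp($n3);

    # Pass the three values to the subroutine as an argument list.
    &printnum($n1, $n2, $n3);

    # The arguments arrive in @_; copy them into private variables.
    sub printnum {
        my ($A, $B, $C) = @_;
        my $total;

        print "You entered: $A, $B, $C\n";
        $total = $A + $B + $C;
        print "TOTAL: $total\n";
    }

    exit;

Here's another sample script passing values to a subroutine:

    #!/usr/bin/perl -w

    $wc = $cc = $lc = 0;
    $char = "";        # empty pattern: split the line into single characters
    $word = "\\s+";    # whitespace pattern: split the line into words

    while ($line = <STDIN>) {
        $cc += &cnt($line, $char);    # count characters (newline included)
        $line =~ s/^\s+|\s+$//g;      # trim leading and trailing whitespace
        $wc += &cnt($line, $word);    # count words
        $lc++;                        # count lines
    }

    print "TOTALS: $lc lines, $wc words, $cc characters\n";

    # Split $line on $pattern and return the number of pieces.
    sub cnt {
        my ($line, $pattern) = @_;
        my ($C);

        @items = split(/$pattern/, $line);
        $C = @items;
        return $C;
    }

    exit;

(Use ^D to end input for this script.)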

You can also pass lists like this:

    #!/usr/bin/perl -w

    @flowers = ( "tulip", "rose", "pansy", "carnation", "flax",
                 "lupine", "iris", "nasturtium", "datura" );
    @foods = ("taco", "salad", "hamburger", "apple");

    &alphalist(@flowers);
    &alphalist("dog", "cat", "bird", "lizard", "fish", "rabbit");
    &alphalist("burrito", "noodles", @foods, "rice", "spaghetti");

    # Sort and print every item passed in.
    sub alphalist {
        my (@L) = @_;

        @L = sort(@L);
        print "\nLIST: ";
        foreach $item (@L) {
            print "$item ";
        }
        print "\n";
    }

    exit;

Also consider this slight alteration, in which the first argument is assigned to a scalar and the remaining arguments to the list:

    #!/usr/bin/perl -w

    @flowers = ( "tulip", "rose", "pansy", "carnation", "flax",
                 "lupine", "iris", "nasturtium", "datura" );
    @foods = ("taco", "salad", "hamburger", "apple");

    &alphalist(@flowers);
    &alphalist("dog", "cat", "bird", "lizard", "fish", "rabbit");
    &alphalist("burrito", "noodles", @foods, "rice", "spaghetti");

    # $f receives the first argument; @L receives all the rest.
    sub alphalist {
        my ($f, @L) = @_;

        @L = sort(@L);
        print "FIRST ITEM: $f\n";
        print "LIST: ";
        foreach $item (@L) {
            print "$item ";
        }
        print "\n";
    }

    exit;

ASSIGNMENT:

Do the following:

  1. Write a script which will calculate the positive predictive value and the negative predictive value given input for TP, TN, FP, and FN. Use separate subroutines for each calculation and pass these values in as an array to each subroutine.
  2. Write a script which takes input for nine values. Write a subroutine which displays these values in a 3x3 matrix as shown in the first section of this lesson. Your subroutine will take the nine values in as a single array. Further, your output should take the form of a web page, and your matrix should be an HTML table which uses the same labels as shown above.