Perl Basics and Biostatistics
Positive and Negative Predictive Values
In this lesson the student will learn how to:
- calculate positive and negative predictive values
- fill out a result matrix based on values for prevalence, specificity,
and sensitivity
- pass values to subroutines
- redirect script output to files
By the end of this lesson the student will be able to:
Write scripts which will calculate positive and negative predictive
value and produce output in an HTML table.
Neither specificity nor sensitivity answers the most important questions:
If the test is positive, what is the chance that the patient really has the
disease? If the test is negative, what is the chance that the patient really
doesn't have the disease? The answers to those questions are quantified by
the positive predictive value and negative predictive value:
Positive Predictive Value = TP / (TP + FP)
Negative Predictive Value = TN / (TN + FN)
The sensitivity and specificity are properties of the test. In contrast, the
positive predictive value and negative predictive value are determined by
characteristics of the test and the prevalence of the disease in the
population being studied. The lower the prevalence of the disease, the lower
the ratio of true positives to false positives, and therefore the lower the
positive predictive value.
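The effect of prevalence can be seen by holding the test fixed and varying
only the prevalence. The following is a minimal sketch, not part of the
original lesson: the subroutine name ppv_from_rates is made up for
illustration, and the sensitivity (.82) and specificity (.963) are the values
used in this lesson's examples. It is written with "use strict" and "my"
variables.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: positive predictive value from prevalence,
# sensitivity and specificity, all given as fractions between 0 and 1.
sub ppv_from_rates {
    my ($prev, $sens, $spec) = @_;
    my $tp = $sens * $prev;                 # true positives per person tested
    my $fp = (1 - $spec) * (1 - $prev);     # false positives per person tested
    return $tp / ($tp + $fp);
}

# Same test, four different prevalences.
foreach my $prev (0.001, 0.01, 0.1, 0.5) {
    printf "prevalence %.3f -> PPV %.2f\n",
        $prev, ppv_from_rates($prev, 0.82, 0.963);
}
```

With a prevalence of .01 this prints a PPV of 0.18, and with a prevalence of
.5 it prints 0.96, even though the test itself has not changed.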
Consider this two by two matrix which tabulates positive and negative test
results against the presence and absence of a disease:
LAB TEST RESULTS: | Disease Present | Disease Absent | Total |
Test Positive     | A               | D              | G     |
Test Negative     | B               | E              | H     |
Total             | C               | F              | I     |
In order to fill the numbers in on a matrix like this one we must know three
things:
- prevalence of disease
- sensitivity of the test
- specificity of the test
For a population of 100,000 and prevalence of 1 in 100, sensitivity of .82, and
specificity of .963, we get:
LAB TEST RESULTS: | Disease Present | Disease Absent | Total   |
Test Positive     | 820             | 3663           | 4483    |
Test Negative     | 180             | 95337          | 95517   |
Total             | 1000            | 99000          | 100,000 |
To fill this matrix out we perform the following steps:
- Given the value for I (the size of your population, which is usually a
power of ten large enough that you can deal with whole numbers), calculate
C and F using the prevalence: C = I x prevalence and F = I - C.
- Next, using the value in C, calculate A and B using the sensitivity:
A = C x sensitivity and B = C - A.
- Next, using the value in F, calculate D and E using the specificity:
E = F x specificity and D = F - E.
- Finally, fill in G and H by adding across each row: G = A + D and
H = B + E.
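The steps above can be sketched in Perl as follows. This is only an
illustration under stated assumptions: the subroutine name fill_matrix and
the order in which it returns the cells (A through I) are choices made for
this sketch, not something the lesson prescribes.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: returns the nine cells A..I, in that order, given the
# population size I, the prevalence, the sensitivity and the specificity.
sub fill_matrix {
    my ($I, $prev, $sens, $spec) = @_;
    my $C = $I * $prev;     # disease present
    my $F = $I - $C;        # disease absent
    my $A = $C * $sens;     # test positive, disease present
    my $B = $C - $A;        # test negative, disease present
    my $E = $F * $spec;     # test negative, disease absent
    my $D = $F - $E;        # test positive, disease absent
    my $G = $A + $D;        # all positive tests
    my $H = $B + $E;        # all negative tests
    return ($A, $B, $C, $D, $E, $F, $G, $H, $I);
}

my @names = ('A' .. 'I');
my @cells = fill_matrix(100000, 0.01, 0.82, 0.963);
foreach my $i (0 .. $#names) {
    # %.0f rounds away any floating-point noise in the products
    printf "%s = %.0f\n", $names[$i], $cells[$i];
}
```

Called with a population of 100,000, a prevalence of .01, a sensitivity of
.82 and a specificity of .963, it reproduces the cells of the first worked
example in this lesson.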
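Once the matrix is filled out we can calculate the positive and negative
predictive values. A holds the true positives and G all of the positive
tests; E holds the true negatives and H all of the negative tests:
PPV = A / G = 820 / 4483 = .18
NPV = E / H = 95337 / 95517 = .998
In other words, if you test positive you can be only about 18% sure that you
have the disease, and if you test negative you can be more than 99% sure that
you don't have the disease.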
Here's another example. Sometimes the prevalence of a disease is greater for
people who have siblings who have the disease. In this case, the prevalence
is often 50%. Using the numbers for sensitivity and specificity from above
the table values would look like this:
LAB TEST RESULTS: | Disease Present | Disease Absent | Total   |
Test Positive     | 41000           | 1850           | 42850   |
Test Negative     | 9000            | 48150          | 57150   |
Total             | 50000           | 50000          | 100,000 |
Using these values we can calculate the PPV and the NPV:
PPV = 41000 / 42850 = .96
NPV = 48150 / 57150 = .84
Passing Values To Subroutines
Often you want to send specific data to a subroutine. The values you pass
arrive inside the subroutine in the special array @_. Here's an example
showing how this is done:
#!/usr/bin/perl -w
# Read three numbers from standard input.
print "ENTER NUMBER VALUE: ";
$n1 = <STDIN>;
chomp($n1);
print "ENTER NUMBER VALUE: ";
$n2 = <STDIN>;
chomp($n2);
print "ENTER NUMBER VALUE: ";
$n3 = <STDIN>;
chomp($n3);
# Pass the three values to the subroutine.
&printnum($n1,$n2,$n3);
sub printnum{
    # Copy the passed values out of @_.
    local($A, $B, $C) = @_;
    local($total);
    print "You entered: $A, $B, $C\n";
    $total = $A + $B + $C;
    print "TOTAL: $total\n";
}
exit;
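Here's another sample script passing values to a subroutine. It counts the
lines, words, and characters in its input, and it uses the value that the
subroutine returns: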
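#!/usr/bin/perl -w
$wc = $cc = $lc = 0;
$char = "";       # empty pattern: split between every character
$word = "\\s+";   # split on runs of whitespace
while($line = <STDIN>){
    $cc += &cnt($line, $char);
    $line =~ s/^\s+|\s+$//g;   # trim leading and trailing whitespace
    $wc += &cnt($line, $word);
    $lc++;
}
print "TOTALS: $lc lines, $wc words, $cc characters\n";
# Return the number of pieces $line splits into on $pattern.
sub cnt{
    local($line, $pattern) = @_;
    local ($C);
    @items = split(/$pattern/, $line);
    $C = @items;   # the last value evaluated is the return value
}
exit;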
(Use ^D to end input for this script.)
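The subroutine in the script above hands back its count implicitly, as the
value of the last statement it evaluates. The same idea can be written with
an explicit return statement. The following is a minimal sketch, not part of
the original lesson: count_words is a made-up name, and the sketch uses
"use strict" and "my" variables.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: returns the number of whitespace-separated words.
sub count_words {
    my ($line) = @_;
    $line =~ s/^\s+|\s+$//g;            # trim leading and trailing whitespace
    my @words = split(/\s+/, $line);    # split on runs of whitespace
    return scalar(@words);              # explicit return of the element count
}

my $n = count_words("  the quick brown fox \n");
print "WORDS: $n\n";
```

This prints "WORDS: 4". Writing "return" makes it clear which value the
caller receives, which helps once a subroutine grows beyond a few lines.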
You can also pass lists like this:
#!/usr/bin/perl -w
@flowers = ( "tulip", "rose", "pansy", "carnation", "flax",
             "lupine", "iris", "nasturtium", "datura" );
@foods = ("taco", "salad", "hamburger", "apple");
&alphalist(@flowers);
&alphalist("dog", "cat", "bird", "lizard", "fish", "rabbit");
&alphalist("burrito", "noodles", @foods, "rice", "spaghetti");
sub alphalist{
    local(@L) = @_;
    @L = sort(@L);
    print "\nLIST: ";
    foreach $item (@L){
        print "$item ";
    }
    print "\n";
}
exit;
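Also consider this slight alteration, in which the first value passed is
assigned to $f and all of the remaining values are assigned to @L: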
#!/usr/bin/perl -w
@flowers = ( "tulip", "rose", "pansy", "carnation", "flax",
             "lupine", "iris", "nasturtium", "datura" );
@foods = ("taco", "salad", "hamburger", "apple");
&alphalist(@flowers);
&alphalist("dog", "cat", "bird", "lizard", "fish", "rabbit");
&alphalist("burrito", "noodles", @foods, "rice", "spaghetti");
sub alphalist{
    local($f, @L) = @_;
    @L = sort(@L);
    print "FIRST ITEM: $f\n";
    print "LIST: ";
    foreach $item (@L){
        print "$item ";
    }
    print "\n";
}
exit;
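When a subroutine receives both single values and a list, the order of the
variables on the left of "= @_" matters, because an array takes every value
that is left. The following is a minimal sketch, not part of the original
lesson: both subroutine names are made up for illustration, and the sketch
uses "use strict" and "my" variables.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical subroutines, for illustration only.
sub first_and_rest {
    my ($first, @rest) = @_;      # scalar first: it gets exactly one value
    return "first=$first rest=@rest";
}

sub rest_and_last {
    my (@rest, $last) = @_;       # array first: it takes every value
    return defined($last) ? "last=$last" : "last is undefined";
}

my @foods = ("taco", "salad");
print first_and_rest("burrito", @foods, "rice"), "\n";
print rest_and_last("burrito", @foods, "rice"), "\n";
```

The first line printed is "first=burrito rest=taco salad rice" and the second
is "last is undefined". Note also that @foods is flattened into the argument
list: the subroutine sees four separate values, not one array.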
ASSIGNMENT:
Do the following:
- Write a script which will calculate the positive predictive value and
the negative predictive value given input for TP, TN, FP, and FN. Use
separate subroutines for each calculation and pass these values in as an
array to each subroutine.
- Write a script which takes input for nine values. Write a subroutine
which displays these values in a 3x3 matrix as shown in the first section of
this lesson. Your subroutine will take the nine values in as a single array.
Further, your output should take the form of a web page and your matrix
should be an HTML table which uses the same labels as shown above.