NM  Geron Corporation
SY  GERN
CO  Thomas Okarma
AD  230 Constitution Drive, Menlo Park, CA, 94025
FO  nuclear transfer, stem cells, oncology

Each line begins with a two-letter code. Here's a key to the two-letter codes:
NM  company name
SY  stock symbol
CO  CEO of company
AD  address
FO  focus areas of research and development

Between the code and the information there are exactly two spaces, which makes the file easier to parse.
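Because the separator is always exactly two spaces, a single regular expression is enough to split a line into its code and its value. The sketch below is a hypothetical helper (parse_line is our own name, not something taken from makeHTML.pl or util.pm), and it assumes every code is two uppercase letters:

```perl
use strict;
use warnings;

# Hypothetical helper (not from makeHTML.pl or util.pm): split one line of a
# .des file into its two-letter code and its value. The pattern relies on the
# format rule: exactly two spaces separate the code from the data.
sub parse_line {
    my ($line) = @_;
    chomp $line;
    # Two uppercase letters, exactly two spaces, then the rest of the line.
    if ($line =~ /^([A-Z]{2})  (.*)$/) {
        return ($1, $2);
    }
    return;    # empty list: the line does not follow the format
}

my ($code, $value) = parse_line("SY  GERN\n");
print "$code => $value\n";    # prints "SY => GERN"
```

A line with only one space after the code (for example "NM Geron") does not match, so the helper returns an empty list instead of guessing.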
Here are four more examples:
NM  Perlegen
SY  private
CO  Brad Margus
AD  2021 Stierlin Ct, Mountain View, CA, 94043
FO  genomics, SNPs, haplotypes

NM  Incyte Genomics
SY  INCY
CO  Paul Friedman
AD  3160 Porter Drive, Palo Alto, CA, 94304
FO  genomic information and software

NM  Protein Design Labs, Incorporated
SY  PDLI
CO  Douglas Ebersole
AD  34801 Campus Drive, Fremont, CA, 94555
FO  monoclonal antibodies

NM  Nanogen
SY  NGEN
CO  V. Randy White
AD  10398 Pacific Center Court, San Diego, CA, 92121
FO  microelectronics for genomics research and medical diagnostics

While information in this format can be understood, it is not a suitable way to present it to a general audience. What we need is a way to parse this information and prepare it for display. We could obviously collect much more information about each company (and we will), so this assignment is a scaled-down exercise.
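One way to parse a whole record is to collect its lines into a hash keyed by two-letter code, then look up a readable label for each code when displaying it. This is a minimal sketch, not the assignment's parser: parse_record and %FIELD_NAME are hypothetical names, the labels are our own shortening of the key above, and lines that do not match the format are silently skipped (an assumption; a stricter parser might report them).

```perl
use strict;
use warnings;

# Display labels adapted (our choice) from the key of two-letter codes.
my %FIELD_NAME = (
    NM => 'Company name',
    SY => 'Stock symbol',
    CO => 'CEO',
    AD => 'Address',
    FO => 'Focus areas',
);

# Hypothetical helper: turn the lines of one record into a hash keyed by
# two-letter code. Blank or malformed lines are skipped.
sub parse_record {
    my (@lines) = @_;
    my %record;
    for my $line (@lines) {
        chomp $line;
        next unless $line =~ /^([A-Z]{2})  (.*)$/;
        $record{$1} = $2;
    }
    return \%record;
}

my $rec = parse_record(
    "NM  Perlegen\n",
    "SY  private\n",
    "CO  Brad Margus\n",
    "AD  2021 Stierlin Ct, Mountain View, CA, 94043\n",
    "FO  genomics, SNPs, haplotypes\n",
);

# Print the fields in the order of the key, with readable labels.
for my $code (qw(NM SY CO AD FO)) {
    print "$FIELD_NAME{$code}: $rec->{$code}\n" if exists $rec->{$code};
}
```

Storing the record in a hash means the order of the lines in the file does not matter; the display order is fixed by the list of codes in the final loop.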
Here's a parser written in Perl that outputs a properly coded web page:
makeHTML.pl
util.pm
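The code of makeHTML.pl is not reproduced here, so the following only illustrates the general idea of turning a parsed record into a web page; it is not a description of what that script does. record_to_html and html_escape are hypothetical names, and the layout (a heading plus a definition list) is an arbitrary choice. The data is escaped before it is placed in the HTML, so a name containing & or < still displays correctly.

```perl
use strict;
use warnings;

# Hypothetical helper: escape the characters that are special in HTML.
sub html_escape {
    my ($text) = @_;
    $text =~ s/&/&amp;/g;    # must run first so later entities are not re-escaped
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    $text =~ s/"/&quot;/g;
    return $text;
}

# Hypothetical helper: render one parsed record (a hash keyed by two-letter
# code) as a small HTML page. Missing fields are simply left out.
sub record_to_html {
    my ($rec) = @_;
    my @rows = (
        [ 'Stock symbol', 'SY' ],
        [ 'CEO',          'CO' ],
        [ 'Address',      'AD' ],
        [ 'Focus areas',  'FO' ],
    );
    my $name = html_escape($rec->{NM} // 'Unknown company');
    my $html = "<!DOCTYPE html>\n<html>\n<head><title>$name</title></head>\n<body>\n";
    $html .= "<h1>$name</h1>\n<dl>\n";
    for my $row (@rows) {
        my ($label, $code) = @$row;
        next unless defined $rec->{$code};
        $html .= "  <dt>$label</dt><dd>" . html_escape($rec->{$code}) . "</dd>\n";
    }
    $html .= "</dl>\n</body>\n</html>\n";
    return $html;
}

my $page = record_to_html({
    NM => 'Geron Corporation',
    SY => 'GERN',
    CO => 'Thomas Okarma',
    FO => 'nuclear transfer, stem cells, oncology',
});
print $page;
```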
(Make sure the information files are saved as company_name.des and that each company has its own file.)
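Since each company lives in its own .des file, a program can find them all with glob and parse each one in turn. The sketch below assumes the files sit in a single directory whose path contains no spaces (Perl's glob splits its pattern on whitespace), and it keys each record by the file's base name; read_des_dir is a hypothetical helper, not part of util.pm.

```perl
use strict;
use warnings;

# Hypothetical helper: read every *.des file in a directory (one company per
# file) and return a hash mapping each file's base name, e.g. "geron" for
# geron.des, to that company's record (a hash keyed by two-letter code).
sub read_des_dir {
    my ($dir) = @_;
    my %companies;
    my @paths = glob("$dir/*.des");
    for my $path (sort @paths) {
        open my $fh, '<', $path or die "Cannot open $path: $!";
        my %record;
        while (my $line = <$fh>) {
            chomp $line;
            # Two-letter code, exactly two spaces, then the data.
            $record{$1} = $2 if $line =~ /^([A-Z]{2})  (.*)$/;
        }
        close $fh;
        (my $name = $path) =~ s{^.*/}{};    # strip the directory
        $name =~ s/\.des$//;                 # strip the extension
        $companies{$name} = \%record;
    }
    return \%companies;
}

# Read the directory given on the command line, or the current one.
my $all = read_des_dir(@ARGV ? $ARGV[0] : '.');
for my $name (sort keys %$all) {
    my $nm = $all->{$name}{NM} // '(no NM line)';
    print "$name: $nm\n";
}
```

Keeping one company per file means adding a company is just a matter of dropping a new .des file into the directory; no other file has to change.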
We could implement this as a CGI script, but we won't go that far for this assignment. The important thing is to inspect the Perl code so that you understand what was done here.