Python Basics and Biostatistics

Introduction

In this lesson the student will learn how to:
  1. Write and execute a simple Python script
  2. Use the print statement
  3. Identify and use the newline character
  4. Locate and write the bang line in a Python script
  5. Associate common terms with statistics
  6. Explain in general terms how statistics can be useful
By the end of this lesson the student will be able to:

    Write a short Python script which prints out strings 
    containing common words associated with statistics.

Python is a popular, high-level programming language. It is one of the most widely used languages in existence. Other popular languages include C/C++, Java, and Perl. (We could also mention Ruby, PHP, and JavaScript as close contenders.) Python is an extremely useful language as you will see in the following lessons.

Follow these steps to get your first Python script up and running:

  1. Open JOE for a file called test.py and enter the following:
    #!/usr/bin/python
    print "HELLO FROM PYTHON"
    print "THIS IS A SIMPLE PYTHON SCRIPT"
  2. Save the file and run the following command at the command line:
    chmod +x test.py
  3. Next you are ready to run or execute the script. Here's how you do it:
    ./test.py
    The ./ indicates that the file is in the current directory.
The first line of this little script is called the bang line because it begins with the characters #!, and the exclamation mark is sometimes referred to as a bang by people who work with computers. This line specifies the location of the Python interpreter. The next two lines are simply print statements, which do nothing more than output the characters between the quotation marks. Note that print is a statement only in Python 2; under Python 3 it is a function, so the text to be printed must be wrapped in parentheses.
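The lesson objectives also mention the newline character, which the script above does not use. The following is a minimal sketch of how it might look, not a required form. It is written with parentheses around the printed text so that it runs under Python 3; with a single string, the same line should also work with a Python 2 interpreter. The variable name two_lines is only an illustration.

```python
#!/usr/bin/python
# The bang line above assumes the interpreter lives in /usr/bin;
# change the path if Python is installed elsewhere on your system.

# "\n" is the newline character: it ends the current line of output
# and starts a new one.
two_lines = "HELLO FROM PYTHON\nTHIS IS A SIMPLE PYTHON SCRIPT"

# A single print produces two lines of output because of the "\n".
print(two_lines)
```

Without the \n, both phrases would appear on one line of output.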


IMPORTANT NOTE: The Python interpreter comes as a standard part of most Linux installations. It is most often located in the /usr/bin directory. It could be located elsewhere; /usr/local/bin, for instance, is a frequent alternate location. Python does not come as standard on the Windows platform, so you must install it yourself. The documentation for the Windows platform installation should give you information about the path to the Python interpreter. Macintosh users running OS X will most likely find that Python is available.
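If you are not sure where the interpreter is located, one way to find out is to ask Python itself. The sketch below assumes you can already start Python in some way (for example by typing python test.py at the command line). It uses sys.executable from the standard library; the variable name interpreter_path is only an illustration.

```python
import sys

# sys.executable holds the absolute path of the interpreter that is
# running this script (it may be empty if Python cannot determine it).
interpreter_path = sys.executable

print("Interpreter: %s" % interpreter_path)
# The path reported above is what would follow #! on the bang line.
print("Suggested bang line: #!%s" % interpreter_path)
```

The path printed on your machine may differ from /usr/bin/python, which is the point of the check.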

Bioinformatics and Biostatistics

You've probably heard terms such as average, mean, standard deviation, median, range, probability, odds, interval, ratio, sensitivity, correlation, and prevalence. These are all terms which are part of the vocabulary of statistics. In this unit we will introduce you to statistics and show you a little about how statistics can be applied to biological data (and also how Python can help you with this endeavor).
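To give a first impression of how Python can help, here is a minimal sketch that computes a few of the terms listed above. It assumes Python 3.4 or later, where the statistics module is part of the standard library. The measurements are made-up values chosen only for illustration; they are not real biological data.

```python
import statistics

# Hypothetical measurements, invented for this illustration only.
measurements = [2, 4, 4, 4, 5, 5, 7, 9]

# Mean (the average): sum of the values divided by how many there are.
mean_value = statistics.mean(measurements)

# Median: the middle value once the data are sorted.
median_value = statistics.median(measurements)

# Range: the difference between the largest and smallest values.
range_value = max(measurements) - min(measurements)

# Population standard deviation: how far values typically lie from the mean.
std_dev = statistics.pstdev(measurements)

print("mean: %s" % mean_value)
print("median: %s" % median_value)
print("range: %s" % range_value)
print("standard deviation: %s" % std_dev)
```

The statistics module also offers stdev for the sample standard deviation, which divides by one less than the number of values.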

You've probably heard of the field of bioinformatics. Bioinformatics is all about making sense of biological data. Probably the most important tool used in bioinformatics is the computer. Computer languages commonly used in the field of bioinformatics include R, Perl, Java, JavaScript, C/C++, and Python. In this unit we will limit our discussion to biostatistics. Biostatistics is all about making sense of the statistics generated through the analysis of biological data. Biostatistics is a sub-discipline of bioinformatics. Computers and computer languages are also very important for performing biostatistical analysis.

One thing is certain about biological data: There is a lot of it. In fact, there is a HUGE amount of data being generated on a daily basis. The sheer fact that one human genome contains 3.1 billion base pairs (something we will discuss at great length in unit two) and that there are billions of humans (all with slightly different genomes) should help you to get some perspective on the enormous amount of information which could be gathered just on the contents of the human genome. Add to that the enormous number of species of living things (which all have genomes of their own) which could be studied and you get even more data which could potentially be analyzed.
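A rough back-of-the-envelope calculation can make this scale concrete. The sketch below takes the 3.1 billion base pairs from the text and adds two assumptions of its own: that each base (A, C, G, or T) is stored in 2 bits, and that the population is about 8 billion people. Both assumptions are for illustration only, and real sequence data is usually stored in much less compact formats.

```python
BASE_PAIRS_PER_GENOME = 3100000000  # 3.1 billion, as stated in the text
BITS_PER_BASE = 2                   # assumption: four bases fit in 2 bits
ASSUMED_POPULATION = 8000000000     # assumption: roughly 8 billion people

# 8 bits make one byte, so divide the total number of bits by 8.
bytes_per_genome = BASE_PAIRS_PER_GENOME * BITS_PER_BASE // 8

# Storing one genome per person, ignoring that genomes are mostly alike.
total_bytes = bytes_per_genome * ASSUMED_POPULATION

print("Bytes per genome: %d" % bytes_per_genome)
print("Bytes for all assumed genomes: %d" % total_bytes)
```

Under these assumptions a single genome takes 775 million bytes, and one genome per person comes to several billion billion bytes.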

Biology isn't just about the study of genomes, however. Biologists conduct experiments dealing with the effects of potential pharmaceuticals on living organisms, how proteins are constructed, the structure of various microorganisms, biological pathways in the body, the function of various systems within the body, and many other interesting topics. All of these studies yield information: usually lots of information. So, again we see the importance of bioinformatics and its sub-field, biostatistics.

ASSIGNMENT:

You will write a simple Python script which produces four lines of output. Each line will contain at least four common words associated with the field of statistics.

