Introduction to PROSITE

"PROSITE is a database of protein families and domains. It consists of biologically significant sites, patterns and profiles that help to reliably identify to which known protein family (if any) a new sequence belongs."

Click here to go to the PROSITE website.

"PROSITE is a database of protein families and domains. It is based on the observation that, while there is a huge number of different proteins, most of them can be grouped, on the basis of similarities in their sequences, into a limited number of families. Proteins or protein domains belonging to a particular family generally share functional attributes and are derived from a common ancestor."

"It is apparent, when studying protein sequence families, that some regions have been better conserved than others during evolution. These regions are generally important for the function of a protein and/or for the maintenance of its three-dimensional structure. By analyzing the constant and variable properties of such groups of similar sequences, it is possible to derive a signature for a protein family or domain, which distinguishes its members from all other unrelated proteins. A pertinent analogy is the use of fingerprints by the police for identification purposes. A fingerprint is generally sufficient to identify a given individual. Similarly, a protein signature can be used to assign a newly sequenced protein to a specific family of proteins and thus to formulate hypotheses about its function."

"PROSITE currently contains patterns and profiles specific for more than a thousand protein families or domains. Each of these signatures comes with documentation providing background information on the structure and function of these proteins."
- - - - - - - - - - - - - - - - - - - - - - - -
Here are three files you will work with during this and the next two lessons:

BROKEN LINKS: FILES TOO LARGE FOR LOCAL STORAGE!

PROSITE DOCUMENTATION (prosite.hlp)
PROSITE DATA (prosite.dat)
PROSITE LIST OF DOCUMENT ENTRIES (prosite.lis)

Actually, you will only work with the data file, but the other files have been included for your information.

Here is a little script which will list only the ID lines from the prosite.dat file.

Perl script to list only IDs
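
The linked script is not reproduced on this page. As a rough idea of what such a script might look like, here is a minimal sketch. It assumes that every ID line in prosite.dat begins with the two-letter code ID followed by whitespace, and that the file sits in the current directory unless another path is given on the command line. The name id_lines is made up for this illustration and is not taken from the course script.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Collect every line that starts with the two-letter "ID" line code.
# (Assumption: ID lines look like "ID   SOME_NAME; PATTERN.")
sub id_lines {
    my ($fh) = @_;
    my @ids;
    while (my $line = <$fh>) {
        push @ids, $line if $line =~ /^ID\s/;
    }
    return @ids;
}

# Assumed default file name; a different path may be given as the first argument.
my $file = shift(@ARGV) // 'prosite.dat';

if (open my $fh, '<', $file) {
    my @ids = id_lines($fh);
    print @ids;
    close $fh;
}
else {
    warn "Cannot open $file: $!\n";
}
```

Keeping the line test in its own subroutine means the matching rule can be tried on a few lines of text before it is run on the full data file.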

ASSIGNMENT:
Modify the sample script so that both the ID line and the AC line are printed for each record. Print the two together on a single line, with the leading ID and AC line codes stripped off and a comma between the two.
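
To make the target output format concrete, here is a small sketch of the stripping and joining step only. It works on hand-typed sample lines rather than on the data file, so combining it with the file-reading loop is still left to you. The names strip_code and id_ac_line are hypothetical, the sample lines are illustrative, and the choice of a comma followed by a space as the separator is an assumption.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Remove a leading "ID" or "AC" line code, the padding after it,
# and any trailing whitespace (including the newline).
sub strip_code {
    my ($line) = @_;
    $line =~ s/^(?:ID|AC)\s+//;
    $line =~ s/\s+$//;
    return $line;
}

# Build one output line from the lines of a single record.
sub id_ac_line {
    my @lines = @_;
    my ($id, $ac) = ('', '');
    for my $line (@lines) {
        $id = strip_code($line) if $line =~ /^ID\s/;
        $ac = strip_code($line) if $line =~ /^AC\s/;
    }
    return "$id, $ac";
}

# Illustrative sample lines, typed in by hand for this sketch.
my @record = (
    "ID   ASN_GLYCOSYLATION; PATTERN.\n",
    "AC   PS00001;\n",
);

print id_ac_line(@record), "\n";
```

Note that only the line codes are removed here; the punctuation that follows the ID and AC values is left as it appears in the input.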