Click here to go to the PROSITE website.
"PROSITE is a database of protein families and domains. It is based on the observation that, while there is a huge number of different proteins, most of them can be grouped, on the basis of similarities in their sequences, into a limited number of families. Proteins or protein domains belonging to a particular family generally share functional attributes and are derived from a common ancestor."
"It is apparent, when studying protein sequence families, that some regions have been better conserved than others during evolution. These regions are generally important for the function of a protein and/or for the maintenance of its three-dimensional structure. By analyzing the constant and variable properties of such groups of similar sequences, it is possible to derive a signature for a protein family or domain, which distinguishes its members from all other unrelated proteins. A pertinent analogy is the use of fingerprints by the police for identification purposes. A fingerprint is generally sufficient to identify a given individual. Similarly, a protein signature can be used to assign a newly sequenced protein to a specific family of proteins and thus to formulate hypotheses about its function."
"PROSITE currently contains patterns and profiles specific for more than a
thousand protein families or domains. Each of these signatures comes with
documentation providing background information on the structure and function
of these proteins."
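To make the idea of a signature concrete, here is a minimal Python sketch of how a PROSITE-style pattern might be matched against a sequence. It is not part of the PROSITE text quoted above. The function names prosite_to_regex and find_sites are hypothetical, the two test sequences are made up, and the pattern N-{P}-[ST]-{P} (the N-glycosylation site pattern) is used only as an illustration. The sketch handles the basic syntax (x for any residue, [..] for allowed residues, {..} for excluded residues, (n) or (n,m) for repeats) and ignores rarer cases such as anchors inside brackets.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Translate a PROSITE-style pattern into a Python regular expression."""
    pattern = pattern.rstrip(".")            # patterns end with a period
    parts = []
    for element in pattern.split("-"):       # elements are separated by '-'
        element = element.replace("{", "[^").replace("}", "]")  # {P}    -> [^P]
        element = element.replace("(", "{").replace(")", "}")   # x(2,4) -> x{2,4}
        element = element.replace("x", ".")                     # x      -> any residue
        element = element.replace("<", "^").replace(">", "$")   # sequence-end anchors
        parts.append(element)
    return "".join(parts)


def find_sites(pattern: str, sequence: str) -> list:
    """Return the 0-based start positions of non-overlapping matches."""
    regex = re.compile(prosite_to_regex(pattern))
    return [match.start() for match in regex.finditer(sequence)]


# Illustrative pattern and two made-up sequences (assumptions, not lesson data).
glyco = "N-{P}-[ST]-{P}."
print(prosite_to_regex(glyco))
print(find_sites(glyco, "MKNGSAL"))   # N-G-S-A is present
print(find_sites(glyco, "MKNPSAL"))   # proline after N is excluded
```

Because re.finditer reports only non-overlapping matches, a real scanning tool would need extra care to catch sites that overlap each other.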
- - - - - - - - - - - - - - - - - - - - - - - -
Here are three files you will work with during this and the next two
lessons:
BROKEN LINKS: FILES TOO LARGE FOR LOCAL STORAGE!
PROSITE DOCUMENTATION (prosite.hlp)
PROSITE DATA (prosite.dat)
PROSITE LIST OF DOCUMENT ENTRIES (prosite.lis)
Actually, you will only work with the data file, but the other files have
been included for your information.
Here is a little script which will list only the ID lines from the prosite.dat file.
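In outline, such a script reads the data file one line at a time and keeps only the lines whose two-letter line code is ID. The following is a minimal Python sketch of that logic, not the lesson's own script. It assumes that each line of prosite.dat starts with a two-letter code followed by spaces; the names id_lines and print_ids are hypothetical, and the sample lines are an abbreviated, made-up fragment in that format.

```python
def id_lines(lines):
    """Yield, without the trailing newline, each line whose line code is ID."""
    for line in lines:
        if line.startswith("ID "):
            yield line.rstrip("\n")


def print_ids(path="prosite.dat"):
    """Print every ID line of the file at 'path' (assumed to be a local copy)."""
    # latin-1 decodes any byte sequence, so unusual characters will not stop the run.
    with open(path, encoding="latin-1") as handle:
        for line in id_lines(handle):
            print(line)


# Abbreviated, made-up fragment in the assumed format, for illustration only.
sample = [
    "ID   ASN_GLYCOSYLATION; PATTERN.\n",
    "AC   PS00001;\n",
    "PA   N-{P}-[ST]-{P}.\n",
    "//\n",
]
for line in id_lines(sample):
    print(line)
```

Once prosite.dat has been downloaded, calling print_ids("prosite.dat") would list the ID line of every entry in the file.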