Write a script which will count the number of repeats of a pattern within a sequence.
Telomeres
At the ends of eukaryotic chromosomes are telomeres. In humans, telomeres consist of roughly 10,000 nucleotides of the TTAGGG motif. That is, TTAGGG is repeated over and over, hundreds to thousands of times, at the ends of chromosomes. Telomeres are important because each time a cell divides and each chromosome is replicated, some of the nucleotides at each end of each chromosome are lost. When the telomeres become too short the cell stops replicating, a state known as replicative cell senescence.
An enzyme called telomerase can repair the telomeres, and cells which produce telomerase are effectively immortal. This may, at first, sound like a good thing, but according to some researchers telomeres provide a mechanism to safeguard against uncontrolled cell proliferation (and thus they protect us from cancer). In cancer, cells replicate in an out-of-control manner, often because they inappropriately express the telomerase enzyme. By the way, the study of cancer is called oncology.
Another important part of a chromosome is the centromere, which is located near the middle of human chromosomes. Centromeres are not exactly in the middle, however. They are a little off center, and for this reason chromosomes are said to have a short and a long arm. In fact, the location of genes is specified in terms of which arm they are on. The short arm is annotated with a p (for petit) and the long arm is annotated with a q (which is the next letter after p). For instance, the KRTHB1 gene (which codes for one type of keratin) is located at 12q13 (long arm of chromosome 12). The MTNR1B gene (which codes for one type of melatonin receptor) is located at 11q21 (long arm of chromosome 11). The CD44 gene (which codes for the CD44 receptor) is located at 11p13 (short arm of chromosome 11). The number after the arm letter identifies the region and band, which are numbered outward from the centromere, so it indicates roughly how far the gene lies from the centromere.
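The location notation above can be taken apart with a regular expression. The following is a minimal Python sketch, not part of the original lesson: the function name parse_locus and the pattern LOCUS are hypothetical, and the pattern only handles simple locations such as 12q13 or 11p13 (a chromosome, an arm letter, and a band number).

```python
import re

# Hypothetical pattern for simple cytogenetic locations such as "12q13":
# chromosome (1-2 digits, X or Y), arm letter (p or q), then the band number.
LOCUS = re.compile(r"(?P<chromosome>\d{1,2}|X|Y)(?P<arm>[pq])(?P<band>\d+(?:\.\d+)?)")

def parse_locus(locus):
    """Split a location like '12q13' into (chromosome, arm, band)."""
    match = LOCUS.fullmatch(locus)
    if match is None:
        raise ValueError(f"not a recognised location: {locus!r}")
    arm = "short" if match.group("arm") == "p" else "long"
    return match.group("chromosome"), arm, match.group("band")

print(parse_locus("12q13"))  # ('12', 'long', '13')
print(parse_locus("11p13"))  # ('11', 'short', '13')
```

Ranges such as 11q21-q22 or named sub-bands would need a richer pattern than this sketch assumes.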
More Regular Expressions
CAAAC CAAAAC CAAAAAC CAAAAAAC CAAAAAAAC GTTTTG GTTTTTG GTTTTTTTG ACGCGCGA ACCCCA AAAAACCCCC AAAAAAAACCCCCCCCCC AAAACCCC
The {n,m} construct is called a quantifier since it allows you to specify exact numbers of repeats of a certain pattern. For instance, A{3} matches exactly three A's. A{3,5} matches from three to five A's. A{3,} matches three or more A's. Other quantifiers include +, *, and ?. The + matches one or more repeats of a pattern. The * matches zero or more. The ? matches zero or one.
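To make the quantifiers concrete, here is a short sketch in Python (chosen here for illustration; the lesson itself does not show which language its examples used). The patterns and the helper name matching are assumptions made for this example. It uses fullmatch so that a pattern must account for the whole sequence, not just part of it.

```python
import re

# A few of the example sequences listed above.
sequences = ["CAAAC", "CAAAAC", "CAAAAAAAC", "GTTTTG", "ACCCCA"]

def matching(pattern, seqs):
    """Return the sequences that the pattern matches in full."""
    regex = re.compile(pattern)
    return [s for s in seqs if regex.fullmatch(s)]

# {3}: exactly three A's between the flanking C's
print(matching(r"CA{3}C", sequences))    # ['CAAAC']
# {3,5}: from three to five A's
print(matching(r"CA{3,5}C", sequences))  # ['CAAAC', 'CAAAAC']
# {3,}: three or more A's
print(matching(r"CA{3,}C", sequences))   # ['CAAAC', 'CAAAAC', 'CAAAAAAAC']

# + needs at least one T; * also accepts zero C's; ? accepts at most one C.
print(bool(re.fullmatch(r"GT+G", "GTTTTG")))  # True
print(bool(re.fullmatch(r"AC*A", "AA")))      # True
print(bool(re.fullmatch(r"AC?A", "ACCCCA")))  # False
```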
g acgac acac aaccgaacc aaccgggaacc
You can also count the number of occurrences of a particular pattern in a sequence.
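One way to count occurrences of a pattern can be sketched in Python as follows. This is an illustrative assumption, not the lesson's original code: the function names count_motif and longest_tandem_run and the example sequence are made up for the sketch. The first function counts every non-overlapping occurrence; the second reports how many copies sit back to back in the longest uninterrupted run, which is closer to what a telomere repeat looks like.

```python
import re

def count_motif(sequence, motif="TTAGGG"):
    """Count non-overlapping occurrences of motif anywhere in sequence."""
    return len(re.findall(re.escape(motif), sequence))

def longest_tandem_run(sequence, motif="TTAGGG"):
    """Return the number of copies in the longest back-to-back run of motif."""
    runs = re.finditer(f"(?:{re.escape(motif)})+", sequence)
    return max((len(run.group(0)) // len(motif) for run in runs), default=0)

# Made-up example: three tandem copies, a two-base gap, then one more copy.
example = "ACGC" + "TTAGGG" * 3 + "CC" + "TTAGGG"

print(count_motif(example))         # 4
print(longest_tandem_run(example))  # 3
```

Because TTAGGG cannot overlap with a shifted copy of itself, the non-overlapping count from findall is also the total count for this motif; for a self-overlapping motif such as AAA the two would differ.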
ASSIGNMENT:
Write a script which checks for repetitions of the TTAGGG motif found in an input sequence. Make sure that any sequence containing characters other than the normal nucleotides (A, C, G, and T) gets rejected by the script.