Write a script which identifies a portion of a long sequence which is located between two patterns
Restriction Enzymes
Restriction enzymes (also called restriction endonucleases or "molecular scissors") are enzymes that cut DNA at specific locations called restriction sites. Many restriction enzymes are available, each recognizing its own sequence, so DNA can be cut at many different locations. A typical restriction enzyme recognizes a sequence about six nucleotides long and cuts somewhere within that sequence or at one of its ends. EcoRI, one of the most commonly used restriction enzymes, recognizes the sequence GAATTC and cuts between the G and the A.
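As a minimal sketch of the EcoRI example, the Python snippet below finds every GAATTC site in a sequence and splits the sequence one base into each site (between the G and the A). The function name `digest`, the constant names, and the test sequence are made up for illustration, and the sequence is assumed to be uppercase with no whitespace.

```python
import re

ECORI_SITE = "GAATTC"   # recognition sequence given in the text
ECORI_CUT_OFFSET = 1    # the cut falls after the first base: G|AATTC

def digest(sequence, site=ECORI_SITE, cut_offset=ECORI_CUT_OFFSET):
    """Return the fragments left after cutting at every occurrence of site."""
    fragments = []
    start = 0
    for match in re.finditer(site, sequence):
        cut = match.start() + cut_offset   # position of the cut in the sequence
        fragments.append(sequence[start:cut])
        start = cut
    fragments.append(sequence[start:])     # whatever follows the last cut
    return fragments

dna = "TTGAATTCAAGGGAATTCTT"  # made-up sequence containing two EcoRI sites
print(digest(dna))  # ['TTG', 'AATTCAAGGG', 'AATTCTT']
```

Joining the fragments back together reproduces the original sequence, which is a quick way to check that no bases were lost.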
Here are some more examples of restriction enzymes:
PunAI    C|YCGRG
ZanI     CC|WGG
BoxI     GACNN|NNGTC
BbeI     GGCGC|C
BamHI    G|GATCC
ApoI     R|AATTY
CacI     |GATC
CfoI     GCG|C
MthZI    C|TAG
SinI     G|GWCC
PfeI     G|AWTC
ECO24I   GRGCY|C
FauBII   CG|CG
FseI     GGCCGG|CC
HalII    CTGCA|G
HindII   GTY|RAC
Kzo49I   G|GWCC

The vertical line in each sequence shows the cut site for the enzyme.
You will notice that the recognition sequences include a few letters which do not stand for a single nucleotide. Here are their meanings:
R = G or A
Y = C or T
M = A or C
K = G or T
S = G or C
W = A or T
B = not A (C or G or T)
D = not C (A or G or T)
H = not G (A or C or T)
V = not T (A or C or G)
N = A or C or G or T
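One way to use these codes in a script is to translate each one into a regular-expression character class. The sketch below, in Python, turns a recognition sequence written with the cut marker (such as R|AATTY) into a pattern and a cut offset, then reports where the enzyme would cut. The names `AMBIGUITY`, `site_to_regex` and `cut_positions` and the test sequence are hypothetical, and the input is assumed to be uppercase.

```python
import re

# Ambiguity codes from the list above, written as regex character classes.
AMBIGUITY = {
    "R": "[GA]", "Y": "[CT]", "M": "[AC]", "K": "[GT]",
    "S": "[GC]", "W": "[AT]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}

def site_to_regex(site):
    """Turn a site such as 'G|GATCC' into (regex pattern, cut offset)."""
    cut_offset = site.index("|")         # bases that precede the cut marker
    bases = site.replace("|", "")
    pattern = "".join(AMBIGUITY.get(base, base) for base in bases)
    return pattern, cut_offset

def cut_positions(sequence, site):
    """Return every position in sequence at which the enzyme would cut."""
    pattern, cut_offset = site_to_regex(site)
    # The lookahead matches without consuming bases, so overlapping sites are found too.
    return [m.start() + cut_offset
            for m in re.finditer("(?=" + pattern + ")", sequence)]

dna = "AAGGATCCTTGAATTTAA"  # made-up sequence
print(site_to_regex("R|AATTY"))       # ('[GA]AATT[CT]', 1)
print(cut_positions(dna, "G|GATCC"))  # [3]   (BamHI)
print(cut_positions(dna, "R|AATTY"))  # [11]  (ApoI)
```

Plain A, C, G and T pass through unchanged because they are not keys in the lookup table.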
Parentheses in Regular Expressions
How do the results differ for this modified version of the last example?
ASSIGNMENT:
Write a script which identifies the portion of a long sequence which is located between two restriction sites. Use the BamHI site as the first restriction site and the EcoRI site as the second.
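One possible starting point, sketched in Python, uses parentheses to capture the text between the two recognition sequences. It rests on several assumptions: "between" is taken to mean the bases strictly between the BamHI recognition sequence and the next EcoRI recognition sequence (measuring from cut site to cut site would give a slightly different answer), the sequence is uppercase with no line breaks, and the function name `between_sites` and the test sequence are made up.

```python
import re

BAMHI_SITE = "GGATCC"  # G|GATCC with the cut marker removed
ECORI_SITE = "GAATTC"  # G|AATTC with the cut marker removed

def between_sites(sequence, first=BAMHI_SITE, second=ECORI_SITE):
    """Return the bases between the first `first` site and the next `second` site.

    Returns None when the two sites do not occur in that order.
    """
    # The parentheses capture the region of interest; .*? is non-greedy,
    # so the match stops at the nearest following `second` site.
    match = re.search(first + "(.*?)" + second, sequence)
    return match.group(1) if match else None

dna = "ACGTGGATCCTTTACGCAGAATTCGGCA"  # made-up test sequence
print(between_sites(dna))  # TTTACGCA
```

Replacing `.*?` with the greedy `.*` would instead extend the match to the last EcoRI site in the sequence, which is worth trying on a sequence that contains more than one.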