Write a script which allows the user to search for a pattern specified using IUB ambiguity codes.
IUB Ambiguity Codes
First of all, IUB stands for International Union of Biochemistry (the full name is actually IUBMB, the International Union of Biochemistry and Molecular Biology). The IUBMB is the organization responsible for the IUB ambiguity codes, which are standardized single-letter symbols for nucleotides. Besides A, C, G and T, the codes include symbols that stand for any one of several possible bases at a position. The standard symbols are shown below:
R = G or A
Y = C or T
M = A or C
K = G or T
S = G or C
W = A or T
B = not A (C or G or T)
D = not C (A or G or T)
H = not G (A or C or T)
V = not T (A or C or G)
N = A or C or G or T
So, if we encounter a sequence like this:
AGATCVWNNKAGATC
Any of the following sequences match it, because every base falls within the set allowed by the code at that position:
AGATCAAGTGAGATC
AGATCCTCATAGATC
AGATCGTAGGAGATC
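One way to make "match" precise: a sequence matches the pattern when, position by position, each base is one of the bases allowed by the code at that position. The Python sketch below encodes the code table as a dictionary and checks candidates of the same length as the pattern. The names `IUB_CODES` and `matches_exactly` are hypothetical, plain A, C, G and T are assumed to match only themselves, and the sample sequences are made up for illustration.

```python
# IUB ambiguity codes mapped to the set of bases each one stands for.
# Assumption: plain A, C, G and T are included so they match only themselves.
IUB_CODES = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "M": "AC", "K": "GT",
    "S": "CG", "W": "AT",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG",
    "N": "ACGT",
}


def matches_exactly(pattern: str, sequence: str) -> bool:
    """Return True if `sequence` is a full-length match for the IUB `pattern`.

    Raises KeyError if the pattern contains a letter that is not an IUB code.
    """
    if len(pattern) != len(sequence):
        return False
    # Each sequence base must be one of the bases allowed by the code at that position.
    return all(base in IUB_CODES[code]
               for code, base in zip(pattern.upper(), sequence.upper()))


if __name__ == "__main__":
    pattern = "AGATCVWNNKAGATC"
    # Hypothetical candidates: the last one has T where V (A, C or G) is required.
    for seq in ["AGATCAAGTGAGATC", "AGATCCTCATAGATC", "AGATCTAAAGAGATC"]:
        print(seq, matches_exactly(pattern, seq))
```

This only checks sequences of exactly the pattern's length; finding the pattern somewhere inside a longer sequence is where regular expressions come in.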
Regular Expressions Matching IUB Ambiguity Codes
Suppose that we want to find a match for AGNVWCCT. How would we write a regular expression for this?
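Each ambiguity code becomes a regular-expression character class listing the bases it allows, so AGNVWCCT becomes AG[ACGT][ACG][AT]CCT. A minimal Python sketch of that translation follows. The names `IUB_TO_REGEX` and `iub_to_regex` are hypothetical, and the sketch assumes patterns contain only the IUB codes plus A, C, G and T (any other letter raises KeyError).

```python
import re

# Regex character class for each IUB code; plain A/C/G/T match only themselves.
IUB_TO_REGEX = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "M": "[AC]", "K": "[GT]",
    "S": "[CG]", "W": "[AT]",
    "B": "[CGT]", "D": "[AGT]", "H": "[ACT]", "V": "[ACG]",
    "N": "[ACGT]",
}


def iub_to_regex(pattern: str) -> str:
    """Translate an IUB pattern such as 'AGNVWCCT' into a regular expression string."""
    return "".join(IUB_TO_REGEX[code] for code in pattern.upper())


if __name__ == "__main__":
    regex = iub_to_regex("AGNVWCCT")
    print(regex)  # AG[ACGT][ACG][AT]CCT
    # Hypothetical sequence containing AGTCACCT (N=T, V=C, W=A).
    print(bool(re.search(regex, "TTAGTCACCTGG")))
```

Writing N as `.` would also work on clean DNA, but `.` matches any character at all, so the explicit class `[ACGT]` is the stricter choice.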
ASSIGNMENT:
Write a script that lets the user enter a search pattern (written with IUB ambiguity codes) and a sequence to search. Report the part of the sequence that comes before the pattern match, the matched portion itself, and the part that comes after the match. Put a loop in your script that keeps prompting for a search pattern and a sequence until the user enters "q" to quit. Make sure the pattern and sequence are remembered between loop iterations unless the user enters new ones. (This way the user can reuse one pattern on several sequences, or one sequence with several patterns.)
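A minimal sketch of one possible solution in Python, under these assumptions: only the first match is reported, matching ignores case, pressing Enter keeps the previous pattern or sequence, and end of input is treated like "q". All names (`split_on_match`, `ask`, `main`, and so on) are hypothetical. The `read` and `write` parameters default to `input` and `print` but can be replaced, which keeps the loop easy to test.

```python
import re
from typing import Callable, Optional, Tuple

# Regex character class for each IUB code; plain A/C/G/T match only themselves.
IUB_TO_REGEX = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "M": "[AC]", "K": "[GT]",
    "S": "[CG]", "W": "[AT]",
    "B": "[CGT]", "D": "[AGT]", "H": "[ACT]", "V": "[ACG]",
    "N": "[ACGT]",
}


def iub_to_regex(pattern: str) -> str:
    """Translate an IUB pattern into a regex; raises KeyError on a non-IUB letter."""
    return "".join(IUB_TO_REGEX[code] for code in pattern.upper())


def split_on_match(pattern: str, sequence: str) -> Optional[Tuple[str, str, str]]:
    """Return (before, match, after) for the first match, or None if there is none."""
    m = re.search(iub_to_regex(pattern), sequence, re.IGNORECASE)
    if m is None:
        return None
    return sequence[:m.start()], m.group(), sequence[m.end():]


def ask(read: Callable[[str], str], prompt: str) -> str:
    """Read one reply; end of input (e.g. Ctrl-D) is treated the same as "q"."""
    try:
        return read(prompt).strip()
    except EOFError:
        return "q"


def main(read: Callable[[str], str] = input,
         write: Callable[[str], None] = print) -> None:
    pattern = sequence = ""
    while True:
        # An empty reply keeps the previous value; "q" (or "Q") quits.
        reply = ask(read, f"Pattern [{pattern}] (q to quit): ")
        if reply.lower() == "q":
            break
        pattern = reply or pattern
        reply = ask(read, f"Sequence [{sequence}] (q to quit): ")
        if reply.lower() == "q":
            break
        sequence = reply or sequence
        if not pattern or not sequence:
            write("Please enter both a pattern and a sequence.")
            continue
        try:
            result = split_on_match(pattern, sequence)
        except KeyError as err:
            write(f"Not an IUB code: {err}")
            continue
        if result is None:
            write("No match.")
        else:
            before, match, after = result
            write(f"Before: {before}")
            write(f"Match: {match}")
            write(f"After: {after}")


if __name__ == "__main__":
    main()
```

`re.search` returns only the first (leftmost) match; `re.finditer` would be the tool for reporting every match in the sequence.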