INTRODUCTION TO REGULAR EXPRESSIONS

Back to index

Regular expressions are used to search for patterns in strings. They are especially useful when searching for patterns in very long strings. Although patterns can be located without the use of regular expressions, their use can save many lines of code. Consider the following rather tedious program.
#!/usr/bin/python v1 = "415-555-0000" v2 = "415550000" def isPhoneNumber(text): if len(text) != 12: return False for i in range(0,3): if not text[i].isdigit(): return False if text[3] != '-': return False for i in range(4,7): if not text[i].isdigit(): return False if text[7] != '-': return False for i in range(8,12): if not text[i].isdigit(): return False return True print isPhoneNumber(v1) print isPhoneNumber(v2) #find phone number v3 = "Call me at 455-804-5555 this afternoon or at 452-774-3333 this evening. Alternate number: 468-357-9999" nums = 0 for i in range(0,len(v3)-12+1): if isPhoneNumber(v3[i:i+12]): print v3[i:i+12] nums = nums + 1 if nums==0: print "No phone numbers found."
Although this program will recognize twelve-character phone numbers, it uses a lot of lines of code to get the job done. The same job could be accomplished with much fewer lines of code using regular expressions as shown here:
#!/usr/bin/python import re v1 = "415-555-0000" v2 = "415550000" phoneRegex = re.compile(r'\d\d\d-\d\d\d-\d\d\d\d') def isPhoneNum(text): mo = phoneNumRegex.search(text) if mo: print mo.group() else: print "NOT PHONE NUMBER" isPhoneNum(v1) isPhoneNum(v2) v3 = "Call me at 455-804-5555 this afternoon or at 452-774-3333 this evening. Alternate number: 468-357-9999" print(phoneNumRegex.findall(v3))
There are things we can do to create more flexible regular expressions which will recognize alternative phone number patterns, but for now this will set the stage for the following assignment.
ASSIGNMENT:

Write a short program which will find all possible zip codes (five numeric consecutive digits) in a string which contains at least five zip codes and is at least a few lines in length. (Make sure your string consists of proper sentences which contain zip codes as well as four-digit years and monetary amounts. SUGGESTION: A fictitious crime report might be an easy theme to use since location, year, and cost of damage can be put into words with reasonable ease.) Once you have your five zip codes collected into an array they will be processed to convert them to a named location using a dictionary which will contain all zip codes in your string along with the names of towns they match. Your output will look something like this:

99900 --> Belmont 85854 --> Washmutville 93422 --> Rigton 47354 --> Tegula 49003 --> Petulema