INTRODUCTION TO REGULAR EXPRESSIONS
Regular expressions are used to search for patterns in strings. They are
especially useful when searching for patterns in very long strings. Although
patterns can be located without the use of regular expressions, their use
can save many lines of code. Consider the following rather tedious program.
#!/usr/bin/python
v1 = "415-555-0000"
v2 = "415550000"

def isPhoneNumber(text):
    # A match must be exactly 12 characters in the form ddd-ddd-dddd
    if len(text) != 12:
        return False
    for i in range(0, 3):
        if not text[i].isdigit():
            return False
    if text[3] != '-':
        return False
    for i in range(4, 7):
        if not text[i].isdigit():
            return False
    if text[7] != '-':
        return False
    for i in range(8, 12):
        if not text[i].isdigit():
            return False
    return True

print(isPhoneNumber(v1))
print(isPhoneNumber(v2))

# find phone numbers by testing every 12-character slice of the string
v3 = "Call me at 455-804-5555 this afternoon or at 452-774-3333 this evening. Alternate number: 468-357-9999"
nums = 0
for i in range(0, len(v3) - 12 + 1):
    if isPhoneNumber(v3[i:i+12]):
        print(v3[i:i+12])
        nums = nums + 1
if nums == 0:
    print("No phone numbers found.")
Although this program recognizes twelve-character phone numbers, it takes a
lot of lines of code to get the job done. The same job can be accomplished
with far fewer lines of code using regular expressions, as shown here:
#!/usr/bin/python
import re
v1 = "415-555-0000"
v2 = "415550000"
# \d matches a single digit; the pattern is ddd-ddd-dddd
phoneNumRegex = re.compile(r'\d\d\d-\d\d\d-\d\d\d\d')

def isPhoneNum(text):
    # search() returns a match object, or None if nothing matches
    mo = phoneNumRegex.search(text)
    if mo:
        print(mo.group())
    else:
        print("NOT PHONE NUMBER")

isPhoneNum(v1)
isPhoneNum(v2)
v3 = "Call me at 455-804-5555 this afternoon or at 452-774-3333 this evening. Alternate number: 468-357-9999"
# findall() returns a list of every non-overlapping match in the string
print(phoneNumRegex.findall(v3))
Regular expressions can be made more flexible so that they recognize
alternative phone number patterns, but for now this will set the stage for
the following assignment.
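As a brief aside, here is a minimal sketch of what a "more flexible" pattern
might look like. It is an illustration only, not part of the lesson's code: the
name flexiblePhoneRegex, the sample string, and the choice of allowed separators
(hyphen, dot, or space) are all assumptions made for this example. It uses the
{n} quantifier as shorthand for repeated \d, and an optional non-capturing
group for the area code.

```python
import re

# Hypothetical pattern (an assumption for illustration, not from the lesson):
#   (?:\d{3}[-. ])?  optional area code followed by a separator
#   \d{3}[-. ]\d{4}  three digits, a separator, then four digits
flexiblePhoneRegex = re.compile(r'(?:\d{3}[-. ])?\d{3}[-. ]\d{4}')

sample = "Office: 415-555-0000, cell: 415.555.1111, front desk: 555-2222"

# Because the only group is non-capturing, findall() returns whole matches
matches = flexiblePhoneRegex.findall(sample)
print(matches)  # ['415-555-0000', '415.555.1111', '555-2222']
```

Note that this sketch would also accept mixed separators such as 415-555.0000;
whether that is desirable depends on the data being searched.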
ASSIGNMENT:
Write a short program which will find all possible zip codes (five
consecutive digits) in a string which contains at least five zip codes and
is at least a few lines in length. (Make sure your string consists of proper
sentences which contain zip codes as well as four-digit years and monetary
amounts. SUGGESTION: A fictitious crime report might be an easy theme to use,
since location, year, and cost of damage can be put into words with
reasonable ease.) Once you have collected your five zip codes into a list,
convert each one to a named location using a dictionary which contains
every zip code in your string along with the name of the town it matches.
Your output will look something like this:
99900 --> Belmont
85854 --> Washmutville
93422 --> Rigton
47354 --> Tegula
49003 --> Petulema
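The dictionary lookup step alone could be sketched roughly as follows. This is
a hint rather than a solution: it assumes the zip codes have already been
collected into a list (here typed in by hand), and the names foundZips,
zipToTown, and describe, as well as the "unknown" fallback, are hypothetical
choices for this example. The two towns are taken from the sample output above.

```python
# Assumed input: in the assignment this list would come from a regular
# expression search; here it is written out by hand for illustration.
foundZips = ["99900", "85854", "12345"]

# Dictionary mapping each zip code (as a string) to its town name
zipToTown = {"99900": "Belmont", "85854": "Washmutville"}

def describe(zips, table):
    # Build one "zip --> town" line per zip code; get() supplies a
    # fallback value for any zip code missing from the dictionary.
    lines = []
    for z in zips:
        lines.append("%s --> %s" % (z, table.get(z, "unknown")))
    return lines

for line in describe(foundZips, zipToTown):
    print(line)
```

Using get() instead of indexing with square brackets avoids a KeyError if a zip
code turns up that the dictionary does not contain.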