Bioinformatics Algorithms
Hamming Distance
This example builds the list of all sequences within
Hamming distance d of an original sequence. If d is
small, the list is short and its sequences closely
resemble the original; as d grows, the list grows
rapidly.
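To get a feel for how quickly the list grows, the number of neighbors can be counted without generating them. The sketch below assumes a four-letter DNA alphabet and sums, for each number of mismatches i from 0 to d, the ways to choose i positions times the 3 alternative letters at each. The name neighborhood_size and its alphabet_size parameter are hypothetical helpers introduced here, not part of the sample program.

```python
from math import comb  # available in Python 3.8 and later

def neighborhood_size(k, d, alphabet_size=4):
    """Count the sequences of length k within Hamming distance d of a given one.

    Hypothetical helper for illustration; assumes every position can take
    any of alphabet_size letters.
    """
    d = min(d, k)  # at most k positions can differ
    return sum(comb(k, i) * (alphabet_size - 1) ** i for i in range(d + 1))

# Size of the neighborhood of a 5-letter sequence for d = 0..5
for d in range(6):
    print(d, neighborhood_size(5, d))
```

For a 5-letter sequence this prints 1, 16, 106, 376, 781 and 1024 for d = 0 through 5.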
The HammingDistance function compares two sequences of equal length and counts the positions at which they differ.
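As a small illustration of that comparison, here is one possible compact way to write it, using zip to pair up the letters. The function name hamming_distance and the explicit length check are assumptions of this sketch; the sample program's HammingDistance does not check lengths.

```python
def hamming_distance(p, q):
    """Count the positions at which two equal-length sequences differ."""
    if len(p) != len(q):
        # Assumption of this sketch: unequal lengths are treated as an error.
        raise ValueError("sequences must have the same length")
    return sum(a != b for a, b in zip(p, q))

print(hamming_distance("ACGAC", "ACGAC"))  # identical sequences
print(hamming_distance("ACGAC", "ATGAT"))  # differs at positions 1 and 4
```

The two calls print 0 and 2.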
The neighbors function generates all possible substitutions in a
pattern and returns a neighborhood of sequences within Hamming distance d of
the original sequence. Note that a value of d greater than the length of the
original sequence is problematic. What happens if you use such a value in
the sample program?
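#!/usr/bin/python
seq = "ACGAC"
d = 1

def HammingDistance(p, q):
    # Count the positions at which p and q differ (equal lengths assumed).
    score = 0
    for n in range(0, len(p)):
        if p[n] != q[n]:
            score = score + 1
    return score

def neighbors(patt, d):
    # Return all sequences within Hamming distance d of patt.
    if d == 0:
        return [patt]  # a list, so every call returns the same type
    if len(patt) == 1:
        return ["A", "C", "G", "T"]
    neighborhood = []
    suffixN = neighbors(patt[1:], d)
    for s in suffixN:
        if HammingDistance(patt[1:], s) < d:
            # The suffix has a mismatch to spare: any first letter is allowed.
            for x in ["A", "C", "G", "T"]:
                neighborhood.append(x + s)
        else:
            # The suffix already uses all d mismatches: keep the first letter.
            neighborhood.append(patt[0] + s)
    return neighborhood

output = neighbors(seq, d)
print(seq)
print(output)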
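One way to check the output of the sample program is to compare it with a brute-force enumeration. The sketch below is not the recursive method used above: it simply generates every sequence of the same length with itertools.product and keeps those within distance d. The names ALPHABET, hamming_distance and brute_force_neighbors are hypothetical and introduced only for this check.

```python
from itertools import product

ALPHABET = "ACGT"  # assumed DNA alphabet

def hamming_distance(p, q):
    """Count the positions at which p and q differ (equal lengths assumed)."""
    return sum(a != b for a, b in zip(p, q))

def brute_force_neighbors(patt, d):
    """Enumerate every sequence of len(patt) and keep those within distance d."""
    candidates = ("".join(letters) for letters in product(ALPHABET, repeat=len(patt)))
    return sorted(c for c in candidates if hamming_distance(patt, c) <= d)

result = brute_force_neighbors("ACGAC", 1)
print(len(result))
print(result)
```

For seq "ACGAC" and d = 1 this finds 16 sequences. Sorting the list returned by neighbors(seq, d) and comparing it with this result is a quick consistency check, though the enumeration visits all 4^k candidates and is only practical for short sequences.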