Bioinformatics Algorithms
Entropy
One way to measure the degree to which a motif is conserved is to examine
it column by column and produce an entropy score. Entropy is a measure of
the uncertainty of a probability distribution. Conserved columns produce
a lower score, and a perfectly conserved column scores 0.
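As a minimal sketch of that definition, the function below computes the Shannon entropy, in bits, of a single column given as a string of nucleotides. The name `column_entropy` is hypothetical, chosen for illustration, and is not part of the script on this page.

```python
import math
from collections import Counter

def column_entropy(column):
    """Shannon entropy (in bits) of the nucleotide distribution in one column."""
    total = len(column)
    entropy = 0.0
    for count in Counter(column).values():
        p = count / float(total)
        # Only nucleotides that occur are in the Counter, so p is never 0.
        entropy -= p * math.log(p, 2)
    return entropy

print(column_entropy("gggggggggg"))  # fully conserved column
print(column_entropy("tcatattttt"))  # mixed column scores higher
```

The motif's overall score is then the sum of `column_entropy` over all of its columns.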
Counterintuitively, the resulting matrix of per-column nucleotide
frequencies can be scored by row or by column and the overall score will be
the same, because the total is simply the sum of -p*log2(p) over every cell.
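A small check of that claim, under the assumption that "the resulting matrix" means the profile matrix of nucleotide frequencies (one row per nucleotide, one column per motif position). The helper name `cell_score` and the variable names are hypothetical. Since each cell contributes one term to the total, adding the terms up row by row or column by column should agree up to floating-point rounding.

```python
import math

# Column strings copied from the script on this page.
columns = ["tcatattttt", "ccctatccac", "gggggggggg", "gggggggggg", "gtgggggggg",
           "gggggggggt", "gaaaaaaaaa", "tctcccttat", "ttttttttca", "ttttttccta",
           "tattccacac", "tcctccttcc"]

def cell_score(p):
    """Contribution -p*log2(p) of one profile cell, taken as 0 when p is 0."""
    return 0.0 if p == 0 else -p * math.log(p, 2)

# profile[n][j] is the frequency of nucleotide n at position j.
profile = {n: [col.count(n) / float(len(col)) for col in columns] for n in "acgt"}

# Sum the cell scores one nucleotide (row) at a time.
by_row = sum(sum(cell_score(p) for p in profile[n]) for n in "acgt")

# Sum the cell scores one position (column) at a time.
by_column = sum(sum(cell_score(profile[n][j]) for n in "acgt")
                for j in range(len(columns)))

print(by_row)
print(by_column)
```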
#!/usr/bin/python
import math

# Each string is treated as one column of the motif (10 entries per column).
seqs = [ "tcatattttt","ccctatccac","gggggggggg","gggggggggg","gtgggggggg",
         "gggggggggt","gaaaaaaaaa","tctcccttat","ttttttttca","ttttttccta",
         "tattccacac","tcctccttcc"]
score=0
for s in seqs:
    print(s)
    # Count each nucleotide in this column.
    g=0
    c=0
    t=0
    a=0
    for x in s:
        if x == "g":
            g=g+1
        elif x == "c":
            c=c+1
        elif x == "t":
            t=t+1
        else:
            a=a+1
    print(t)  # number of t's in this column
    # A count of 0 or 10 adds nothing: p*log2(p) is 0 when p is 0 or 1.
    if g == 10 or g == 0:
        score = score + 0
    else:
        score = score + (-g/float(10))*math.log(g/float(10),2)
    if c == 10 or c == 0:
        score = score + 0
    else:
        score = score + (-c/float(10))*math.log(c/float(10),2)
    if t == 10 or t == 0:
        score = score + 0
    else:
        score = score + (-t/float(10))*math.log(t/float(10),2)
    if a == 10 or a == 0:
        score = score + 0
    else:
        score = score + (-a/float(10))*math.log(a/float(10),2)
    print(score)  # running total after this column
print(score)  # total entropy of the motif
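The script above hard-codes the motif as column strings. If the motif is instead available as aligned sequences, one per row, the columns can be recovered by transposing. The sketch below assumes all rows have equal length. The `rows` data is simply the transpose of `seqs` from the script, and the variable names are illustrative.

```python
# Aligned motif instances, one per row (the transpose of the script's column strings).
rows = ["tcgggggttttt", "ccggtgacttac", "acggggattttc", "ttggggactttt",
        "aaggggacttcc", "ttggggacttcc", "tcggggattcat", "tcggggattcct",
        "taggggaactac", "tcgggtataacc"]

# zip(*rows) yields one tuple of characters per position;
# joining each tuple gives that position's column string.
columns = ["".join(chars) for chars in zip(*rows)]

for column in columns:
    print(column)
```

Each resulting string has one character per motif instance, which is the layout the script's counting loop expects.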