Bioinformatics Algorithms

Entropy


One way to measure the degree to which a motif is conserved is to compare
it column by column and compute an entropy score for each column. Entropy
measures the uncertainty of a probability distribution, here the
distribution of bases within a column. A perfectly conserved column has an
entropy of 0, so more conserved columns produce lower scores.
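The formula below uses base-2 logarithms, as the script does. Here p_b is the fraction of the column made up of base b, and any term with p_b = 0 is taken as 0. A perfectly conserved column such as "gggggggggg" scores 0, and a column with four equally frequent bases scores the maximum of 2 bits. The column "gtgggggggg" (nine g, one t) works out as follows:

```latex
$$H = -\sum_{b \in \{a,c,g,t\}} p_b \log_2 p_b$$

$$H(\texttt{gtgggggggg}) = -0.9\log_2 0.9 - 0.1\log_2 0.1 \approx 0.137 + 0.332 = 0.469\ \text{bits}$$
```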

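The overall score of the motif is the sum of its column entropies. Scoring the matrix row by row instead does not, in general, give the same total. For the two aligned sequences "ac" and "ac", each row contains two different bases and scores 1 bit, for a total of 2 bits. Both columns, however, are perfectly conserved and score 0. In the script below, each string holds one column of the motif.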
#!/usr/bin/env python3
import math

# Each string is scored as one column of the motif: the ten bases found
# at that position across the aligned sequences.
seqs = ["tcatattttt", "ccctatccac", "gggggggggg", "gggggggggg", "gtgggggggg",
        "gggggggggt", "gaaaaaaaaa", "tctcccttat", "ttttttttca", "ttttttccta",
        "tattccacac", "tcctccttcc"]

score = 0.0
for s in seqs:
    n = len(s)          # column length (10 here), no longer hard-coded
    col_score = 0.0
    for base in "acgt":
        count = s.count(base)
        # Absent bases contribute 0 (p*log2(p) -> 0 as p -> 0), and
        # math.log(0) would raise, so skip them. A fully conserved
        # base (p == 1) contributes log2(1) == 0 on its own.
        if count > 0:
            p = count / n
            col_score -= p * math.log(p, 2)
    score += col_score
    print(s, col_score)   # per-column entropy
print(score)              # total entropy of the motif

SOURCE CODE:
ENTROPY.py
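ENTROPY.py takes its input as one string per motif column. Alignments are more often stored as one string per sequence. The sketch below is an alternative written for illustration, not the original author's code. It assumes the motif is supplied as equal-length aligned sequences (rows), transposes them into columns with zip(*rows), and counts symbols with collections.Counter, so any column length or alphabet works. The helper names column_entropy and motif_entropy are hypothetical.

```python
import math
from collections import Counter


def column_entropy(column):
    """Shannon entropy, in bits, of the symbols in one alignment column."""
    n = len(column)
    # Sum of p * log2(1/p); Counter only reports symbols that occur,
    # so the p == 0 case (log of zero) never arises.
    return sum((k / n) * math.log2(n / k) for k in Counter(column).values())


def motif_entropy(rows):
    """Total entropy of a motif given as equal-length aligned sequences (rows)."""
    if len({len(r) for r in rows}) != 1:
        raise ValueError("need one or more sequences of equal length")
    # zip(*rows) transposes the alignment: each tuple it yields is one column.
    return sum(column_entropy(col) for col in zip(*rows))


# The twelve strings scored by ENTROPY.py, one string per column.
columns = ["tcatattttt", "ccctatccac", "gggggggggg", "gggggggggg", "gtgggggggg",
           "gggggggggt", "gaaaaaaaaa", "tctcccttat", "ttttttttca", "ttttttccta",
           "tattccacac", "tcctccttcc"]
# Rebuild the ten aligned sequences (rows) from those columns.
rows = ["".join(chars) for chars in zip(*columns)]

print(len(rows), "sequences of length", len(rows[0]))
print("total entropy: %.3f bits" % motif_entropy(rows))
```

Scoring the rebuilt rows gives the same total as summing the entropies of the original twelve strings. Transposing the rows recovers exactly those columns.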
