Bioinformatics Algorithms
Reverse Complement and Counting K-mers
The first script constructs the reverse complement of a
nucleotide sequence. The second finds the most frequent k-mers
in a sequence.
revcomp.py Script Outline
Takes a nucleotide sequence (conventionally written in the 5'-->3' direction)
and builds the complementary strand, reversed so that it also reads 5'-->3'.
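#!/usr/bin/env python3
# Prints the reverse complement of a DNA string (Python 3).
# To read the sequence from standard input instead of prompting:
#   import sys
#   p = sys.stdin.read().splitlines()[0]
# Sample value for testing: p = "GCTAGCT"
p = input("ENTER NUCLEOTIDE STRING: ")
comp = ""
for i in p:
    if i == 'T':
        comp = comp + 'A'
    elif i == 'A':
        comp = comp + 'T'
    elif i == 'C':
        comp = comp + 'G'
    else:
        # Assumes uppercase A, C, G, T only: G and any
        # other character are mapped to C here.
        comp = comp + 'C'
# Reverse the complemented string before printing.
print(comp[::-1])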
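The same base mapping can also be written with a translation table. The following is a minimal Python 3 sketch rather than the revcomp.py script itself: the name `reverse_complement` is hypothetical, and the sketch assumes the input should contain only the uppercase letters A, C, G, and T, raising an error for anything else instead of silently mapping it to C.

```python
# Hypothetical helper for illustration; assumes uppercase DNA letters only.
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    bad = set(seq) - set("ACGT")
    if bad:
        raise ValueError("unexpected characters: " + "".join(sorted(bad)))
    # Complement each base, then reverse the result.
    return seq.translate(COMPLEMENT)[::-1]

print(reverse_complement("GCTAGCT"))  # AGCTAGC
```

Rejecting unknown characters is a design choice made for this sketch; lowercase input or ambiguity codes such as N would need their own entries in the table.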
Counting K-mers Script Outline
Reads a nucleotide sequence from the first line of standard input and k from
the second, then uses a dictionary to count every k-mer that occurs in the
sequence. It then goes through the dictionary and reports the highest count
and the patterns of length k that reach it.
TO RUN THIS SCRIPT:
./pdic.py < pdic_data.txt
#!/usr/bin/env python3
import sys
# Input: line 1 is the sequence, line 2 is k.
lines = sys.stdin.read().splitlines()
t = lines[0]
k = int(lines[1])
# Count every k-mer that occurs in t.
dlist = {}
for i in range(0, len(t) - k + 1):
    item = t[i:i+k]
    if item in dlist:
        dlist[item] = dlist[item] + 1
    else:
        dlist[item] = 1
print(dlist)
# Find the highest count.
largest = 0
for i in dlist.items():
    if i[1] > largest:
        largest = i[1]
print(largest)
# Collect every k-mer with that count, in sorted order.
output = ""
for j in sorted(dlist.items()):
    if j[1] == largest:
        output = output + j[0] + " "
print(output)
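For comparison, the counting step can be sketched with `collections.Counter` from the standard library. This is a Python 3 sketch under assumptions, not the pdic.py script: the function name `most_frequent_kmers` is hypothetical, and the sample sequence is made up for illustration rather than taken from pdic_data.txt.

```python
from collections import Counter

def most_frequent_kmers(text, k):
    """Return (sorted list of the most frequent k-mers, their count)."""
    # Slide a window of length k across text: len(text) - k + 1 windows.
    counts = Counter(text[i:i + k] for i in range(len(text) - k + 1))
    if not counts:
        # text is shorter than k, so there are no windows to count.
        return [], 0
    largest = max(counts.values())
    winners = sorted(kmer for kmer, n in counts.items() if n == largest)
    return winners, largest

# Illustrative input only.
print(most_frequent_kmers("ACGTTGCATGTCGCATGATGCATGAGAGCT", 4))
# (['CATG', 'GCAT'], 3)
```

Returning the k-mers as a sorted list mirrors the sorted output of the script while leaving the formatting to the caller.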