Bioinformatics Algorithms

Reverse Complement and Counting K-mers


The first script constructs the reverse complement of a
nucleotide sequence. The second finds the most frequent k-mers
in a sequence.

revcomp.py Script Outline

Takes a nucleotide sequence (by convention written in the 5'-->3' direction) and outputs the complementary strand. That strand runs antiparallel to the input, so the complement is reversed in order to be written 5'-->3' as well.

#!/usr/bin/env python3
# Print the reverse complement of a nucleotide string typed at the prompt.

p = input("ENTER NUCLEOTIDE STRING: ")

# Build the complement base by base. Any character other than
# T, A, or C is treated as G.
comp = ""
for i in p:
    if i == 'T':
        comp = comp + 'A'
    elif i == 'A':
        comp = comp + 'T'
    elif i == 'C':
        comp = comp + 'G'
    else:
        comp = comp + 'C'

# Reverse the complement to get the reverse complement.
print(comp[::-1])
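The per-base mapping can also be expressed as a lookup table. The following is a minimal sketch for Python 3, not the script above: the function name reverse_complement and the COMPLEMENT table are hypothetical, and it assumes uppercase input limited to A, C, G, and T. Unlike the if/elif chain, it raises an error on any other character rather than mapping it to C.

```python
# Hypothetical names; neither COMPLEMENT nor reverse_complement is in revcomp.py.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    # Assumption: only A, C, G, T are valid; reject everything else.
    invalid = set(seq) - set("ACGT")
    if invalid:
        raise ValueError("unexpected characters: " + "".join(sorted(invalid)))
    # Complement every base, then reverse the result.
    return seq.translate(COMPLEMENT)[::-1]


print(reverse_complement("GCTAGCT"))  # prints AGCTAGC
```

Rejecting unknown characters is a design choice made for this sketch; a script that has to accept lowercase or ambiguity codes such as N would need a larger table.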

Counting K-mers Script Outline

Takes a nucleotide sequence and a length k as input (the sequence on the first line, k on the second) and uses a dictionary to count every k-mer that occurs in the sequence. It then goes through the dictionary and reports the most frequent patterns of length k.

TO RUN THIS SCRIPT:


     ./pdic.py < pdic_data.txt

#!/usr/bin/env python3
import sys

# Input: the sequence on the first line, k on the second.
lines = sys.stdin.read().splitlines()
t = lines[0]
k = int(lines[1])

# Count every k-mer that occurs in t.
dlist = {}
for i in range(0, len(t) - k + 1):
    item = t[i:i+k]
    if item in dlist:
        dlist[item] = dlist[item] + 1
    else:
        dlist[item] = 1
print(dlist)

# Find the highest count.
largest = 0
for i in dlist.items():
    if i[1] > largest:
        largest = i[1]
print(largest)

# Report every k-mer with that count, in sorted order.
output = ""
for j in sorted(dlist.items()):
    if j[1] == largest:
        output = output + j[0] + " "
print(output)
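The counting step can be written more compactly with collections.Counter from the standard library. This is a sketch for Python 3 under stated assumptions, not the script above: the function name most_frequent_kmers is hypothetical, and the input string is an illustrative example, not the contents of pdic_data.txt.

```python
from collections import Counter


def most_frequent_kmers(text, k):
    """Return (sorted list of the most frequent k-mers, their count)."""
    # Slide a window of length k over text and count each substring.
    counts = Counter(text[i:i + k] for i in range(len(text) - k + 1))
    if not counts:
        # text is shorter than k, so there are no k-mers to report.
        return [], 0
    largest = max(counts.values())
    # Keep every k-mer tied for the highest count, in sorted order.
    return sorted(kmer for kmer, n in counts.items() if n == largest), largest


kmers, count = most_frequent_kmers("ATATA", 2)
print(" ".join(kmers), count)  # prints AT TA 2
```

As in the dictionary version, ties are all reported: here AT and TA each occur twice.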

SOURCE CODE:
Reverse Complement
Counting K-mers
Data for Counting K-mers
