Category archive: Thinking Questions from a Certain Hardcore Teacher

Genomics HW2 Crash Guide

Fairly sure I did not post this too late

On the whole, HW2 is fairly easy. The difficulty lies in the two questions below, and here are my solutions:

Question 3

Assemble
Please assemble the sequence in the file "Assemble.fa" into a complete sequence (10') and determine whether it is a circular DNA. (5') (Sequence length=45)

Seq_1
ACAATCGGGC
Seq_2
TCGCTTGAGA
Seq_3
GAGCACAACT
Seq_4
TTGAGAAGGA
Seq_5
CACCTATCGC
Seq_6
CGACCTCAAC
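
To get a feel for what the assembly has to do, it helps to look at a few of these reads by hand. The sketch below is only an illustration, not the course code: `suffix_prefix_overlap` and `merge` are hypothetical helper names introduced here, and the search simply tries every overlap length from longest to shortest.

```python
def suffix_prefix_overlap(a, b):
    """Length of the longest suffix of a that is also a prefix of b (0 if none)."""
    for n in range(min(len(a), len(b)), 0, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def merge(a, b):
    """Append to a the part of b that is not covered by the overlap."""
    return a + b[suffix_prefix_overlap(a, b):]


# Three of the reads listed above
seq_5 = "CACCTATCGC"
seq_2 = "TCGCTTGAGA"
seq_4 = "TTGAGAAGGA"

print(suffix_prefix_overlap(seq_5, seq_2))  # 4, the shared "TCGC"
print(suffix_prefix_overlap(seq_2, seq_4))  # 6, the shared "TTGAGA"
print(merge(merge(seq_5, seq_2), seq_4))    # CACCTATCGCTTGAGAAGGA
```

Chaining such merges over a good ordering of all six reads is what a shortest-common-superstring search does; the brute-force version simply tries every ordering.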

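How to tell whether the result is circular once the assembly is done... I could not be bothered to work that out. That said, the question already gives the length: if the assembled sequence comes out longer than that, it is most likely circular, since a linear assembly of a circular molecule tends to repeat the junction region at both ends.
Below is the code used for assembly: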

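# Assemble a set of short reads
import re
import itertools
# Read the FASTA file and append every read to a list
seq = []
with open("Assembleliu.fa", 'r') as fasta:
    for line in fasta:
        if line.startswith('>'):
            continue
        # Strip every character that is not a base, using a regular expression
        line = re.sub('[^ATCGatcg]', '', line)
        # Skip lines that are empty after cleaning (e.g. blank lines)
        if line:
            seq.append(line)

# Now do the assembly; this part of the code comes from the course slides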

def overlap(a, b, min_length=3):
    """Return length of the longest suffix of 'a' that matches a
    prefix of 'b' and is at least 'min_length' characters long.
    If no such overlap exists, return 0.
    """
    start = 0  # start all the way at the left
    while True:
        start = a.find(b[:min_length], start)  # look for b's prefix in a
        if start == -1:  # no more occurrences to right
            return 0
        # found occurrence; check for full suffix/prefix match
        if b.startswith(a[start:]):
            return len(a)-start
        start += 1  # move just past previous match


def scs(ss):
    """Return shortest common superstring."""
    shortest_sup = None
    for ssperm in itertools.permutations(ss):
        sup = ssperm[0]  # superstring starts as first string
        for i in range(len(ss)-1):
            # overlap adjacent strings A and B in the permutation
            olen = overlap(ssperm[i], ssperm[i+1], min_length=1)
            # add non-overlapping portion of B to superstring
            sup += ssperm[i+1][olen:]
        if shortest_sup is None or len(sup) < len(shortest_sup):
            shortest_sup = sup  # found shorter superstring
    return shortest_sup  # return shortest



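# Print the result
result = scs(seq)  # compute once; scs() tries every permutation
print("SCS result: ", result)
print("SCS length: ", len(result))
# Comparing the SCS length with the length given in the question tells you whether it is a circle.
# If it is, the SCS result is not reliable. My own assignment was linear, so I did not bother with output for the circular case [->_->]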
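
The length comparison mentioned in the comment can be turned into a small check. This is a minimal sketch under the assumption that a circular molecule shows up as a linear assembly whose surplus bases at the end repeat its beginning; `looks_circular` and `trim_to_circle` are hypothetical names, not part of the course code, and the toy sequence is made up.

```python
def looks_circular(assembled, expected_length):
    """Heuristic: True if the assembly is longer than expected and the
    surplus bases at its end repeat its beginning (a wrap-around)."""
    extra = len(assembled) - expected_length
    if extra <= 0:
        return False  # not longer than expected: consistent with a linear molecule
    # For a circle cut open at an arbitrary point, the tail re-reads the head
    return assembled[-extra:] == assembled[:extra]


def trim_to_circle(assembled, expected_length):
    """Drop the repeated tail so that one copy of the circle remains."""
    return assembled[:expected_length]


# Toy example (made-up sequence, not the homework data)
toy = "ACGTTGCAAC"  # 10 bases; the final "AC" repeats the first two bases
print(looks_circular(toy, 8))   # True
print(trim_to_circle(toy, 8))   # ACGTTGCA
print(looks_circular(toy, 10))  # False
```

With the homework data this would be called as `looks_circular(scs(seq), 45)`. A `False` result only means this particular heuristic found no wrap-around, not that the molecule is proven linear.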

Sure enough, I am still a Ctrl+C/V-oriented programmer.

Question 4

K-mer depth
1. Please calculate the kmer-depth of the sequence in the "GenomeKmerDepth.fa" file and answer, what is the maximum kmer-depth (5') and what is the corresponding kmer (5'). (k=3)
2. Please read the document "GenomeRepeatKmerDepth.pdf" and find out the pictures correspond with random genome (5'), large fragment repeat genome (5') and SINE enrichment genome (5') separately.

The code for this question is much easier to write than the one above:

genomeSeq = ""
with open("./GenomeKmerDepth.fa", 'r') as genome:
    for line in genome:
        # Skip FASTA header lines and blank lines
        if line.startswith('>') or line == '\n':
            continue
        # str.strip() returns a new string, so use its result to drop the newline
        genomeSeq += line.strip()

# A new k-mer is added to the dictionary with count 1; one already present gets its count incremented
genomeSeq = genomeSeq.rstrip()
k = 3
dictKmer = {}
total = 0
# Slide a window of width k; the "+ 1" makes sure the last k-mer is included
for i in range(len(genomeSeq) - k + 1):
    tempMer = genomeSeq[i:i+k]
    dictKmer[tempMer] = dictKmer.get(tempMer, 0) + 1
    total += 1


# Print the most frequent k-mer together with its count
output = sorted(dictKmer.items(), key=lambda item: item[1])
print(output[-1])
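
Sorting and taking the last item reports only one k-mer even when several share the maximum depth. A possible alternative, sketched here with `collections.Counter` from the standard library, collects every k-mer tied for the top count. `kmer_depths` and `max_depth_kmers` are hypothetical names, and the toy sequence is made up for illustration.

```python
from collections import Counter


def kmer_depths(sequence, k=3):
    """Count every k-mer in the sequence with a sliding window of width k."""
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))


def max_depth_kmers(sequence, k=3):
    """Return (maximum depth, sorted list of k-mers that reach it)."""
    depths = kmer_depths(sequence, k)
    if not depths:
        return 0, []  # sequence shorter than k
    top = max(depths.values())
    return top, sorted(kmer for kmer, n in depths.items() if n == top)


# Toy sequence, not the homework file
print(max_depth_kmers("ATGATGCATG"))  # (3, ['ATG'])
```

For the homework, the string passed in would be the `genomeSeq` read from the FASTA file, with all newlines removed.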

Sixth Day of the Fourth Lunar Month: Molecular Lab Thinking Questions

"Thinking questions may arrive late, but they never fail to show up."

Three weeks on, a new round of molecular lab thinking questions is here.

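1. When purifying His-tagged proteins, the nickel columns we use include Ni-IDA and Ni-NTA. What is the difference between them?
2. Read the assigned paper and answer the following questions. ① What are the advantages of using E. coli as a protein expression host? ② When using the T7 promoter expression system, what measures can prevent basal (leaky) expression of the protein? ③ We usually select positive clones with antibiotics; the paper also mentions another selection system, plasmid addiction. Briefly describe its principle and the types it comes in. ④ From a genomic point of view, explain the advantages of BL21(DE3) as a protein expression host strain.
3. Visit http://www.ruf.rice.edu/~bioslabs/studies/sds-page/sdsgoofs.html and take a look at other people's SDS-PAGE results.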
