0701 D3Similarity done
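Summary
Similarity search:
Input: 1878 compounds from the Huashi Baidu formula (1894 submitted, 16 failed with errors), compared against 617 positive compounds across 32 targets
36 compounds show more than 90% 2D similarity to anti-COVID-19 positive compounds; these positives belong to 5 targets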
1 3C-like protease
2 Papain-like protease
11 Ca,Na,-exchanging ATPase pump
27 Na,K-exchanging ATPase pump
30 50S ribosomal protein L2
Collected the structures of these 36 compounds (opt)
Visualization location: 172.21.85.2 /home/dddc/zzy/test/visualize
Names + structures stored at: 172.21.85.12 /home/zjxu/zzy/hsbd/similarity/analysis/36mol
Log
- # 172.21.85.12 /home/zjxu/zzy/hsbd/similarity/analysis
- # First tally targets 2-32 (31 targets)
- for i in {2..32};do awk '$3>0.9 {print}' ../data/hsbd-1894mol-2d_results-sorted2dx3d_tgt${i}.txt|sed "s/^/target$i /" >>2-32_up0.9.txt;done
- for i in {2..32};do awk '$3>0.8 {print}' ../data/hsbd-1894mol-2d_results-sorted2dx3d_tgt${i}.txt|sed "s/^/target$i /" >>2-32_up0.8.txt;done
- for i in {2..32};do awk '$3>0.7 {print}' ../data/hsbd-1894mol-2d_results-sorted2dx3d_tgt${i}.txt|sed "s/^/target$i /" >>2-32_up0.7.txt;done
- (base) [zjxu@R820 analysis]$ awk '{print $2}' 2-32_up0.9.txt |sort|uniq |wc -l
- 34
- (base) [zjxu@R820 analysis]$ awk '{print $1}' 2-32_up0.9.txt |sort|uniq|wc -l
- 4
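The threshold loops above append with `>>`, so rerunning them duplicates rows in the `up0.x` files. Below is a minimal sketch of the same filter-and-count step that overwrites its outputs instead. It runs on toy data, and the column layout (query compound, positive compound, 2D score), the `tgt*.txt` file names and the `filter_hits` helper are assumptions made for illustration, not the real result files.

```shell
set -eu
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT

# Toy result files; assumed layout: query compound, positive compound, 2D score.
printf '%s\n' 'MOL_1 POS_A 0.95' 'MOL_2 POS_A 0.85' 'MOL_3 POS_B 0.40' > "$workdir/tgt2.txt"
printf '%s\n' 'MOL_1 POS_C 0.92' 'MOL_4 POS_D 0.75' > "$workdir/tgt3.txt"

# Hypothetical helper: print every row scoring above the cutoff,
# prefixed with its target label (target2, target3, ...).
filter_hits() {
  cutoff=$1
  for f in "$workdir"/tgt*.txt; do
    id=$(basename "$f" .txt)
    awk -v label="target${id#tgt}" -v c="$cutoff" '$3 > c {print label, $0}' "$f"
  done
}

# One output file per cutoff, written with > so a rerun replaces old rows.
for cutoff in 0.9 0.8 0.7; do
  filter_hits "$cutoff" > "$workdir/up$cutoff.txt"
done

# Unique compounds (column 2) and unique targets (column 1) above 0.9.
n_compounds=$(awk '{print $2}' "$workdir/up0.9.txt" | sort -u | wc -l | tr -d ' ')
n_targets=$(awk '{print $1}' "$workdir/up0.9.txt" | sort -u | wc -l | tr -d ' ')
echo "above 0.9: $n_compounds compounds, $n_targets targets"
```

Passing the cutoff with `-v` keeps the awk program identical for all three thresholds.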
- for a in `cat 34mol_index`;do c=`grep TCMID ../5/${a}.mol2`;sed -i "s/\b$a\b/$c/g" test.txt;done
- for a in `cat result/34mol_index`;do c=`grep TCMID ../5/${a}.mol2`;sed -i "s/\b$a\b/$c/g" result.txt;done
- The names in 5 are not the same as those in the later submissions (6-32 / 2-4), check!!!!
- for a in `cat 34mol_index`;do c=`grep TCMID ../../6/${a}.mol2`;sed -i "s/\b$a\b/$c/g" test2.txt;done
- for a in `cat result/34mol_index`;do c=`grep TCMID ../../6/${a}.mol2`;sed -i "s/\b$a\b/$c/g" result.txt;done
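The rename loops above run one `sed -i` per compound and paste raw grep output into the sed expression, which breaks if a name contains `/` or `&`. The sketch below does the same index-to-name mapping in a single pass with a lookup table. It uses toy data, and the directory layout, the file names `name_map.txt` and `hits.txt`, and the assumption that the mol2 name line contains `TCMID` with no whitespace are all illustrative.

```shell
set -eu
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
mkdir "$workdir/mol"

# Toy mol2 stubs: line 2 holds the compound name (assumed to contain TCMID).
printf '%s\n' '@<TRIPOS>MOLECULE' 'TCMID_0001' > "$workdir/mol/MOL_1.mol2"
printf '%s\n' '@<TRIPOS>MOLECULE' 'TCMID_0002' > "$workdir/mol/MOL_12.mol2"

# Toy hit table; assumed layout: target, query index, positive compound, 2D score.
printf '%s\n' 'target2 MOL_1 POS_A 0.95' 'target3 MOL_12 POS_B 0.91' > "$workdir/hits.txt"

# Build the lookup table once: index name -> first line containing TCMID.
for f in "$workdir"/mol/*.mol2; do
  printf '%s %s\n' "$(basename "$f" .mol2)" "$(grep TCMID "$f" | head -n 1)"
done > "$workdir/name_map.txt"

# Replace column 2 by exact lookup; rows without a mapping pass through unchanged.
awk 'NR == FNR { name[$1] = $2; next }
     ($2 in name) { $2 = name[$2] }
     { print }' "$workdir/name_map.txt" "$workdir/hits.txt" > "$workdir/hits_named.txt"

cat "$workdir/hits_named.txt"
```

Because the lookup is an exact match on the whole field, `MOL_1` cannot accidentally match inside `MOL_12`, which is what the `\b` anchors guard against in the sed version.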
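- # Target 1 has many positive compounds (400+), so it is tallied separately
- # The job was split into chunks, so the names have to be mapped back first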
- for i in `cat index`;do for a in {1..200};do c=`grep TCMID ../${i}/MOL_${a}.mol2`;sed -i "s/\bMOL_${a}\b/$c/g" hsbd-mol-list-${i}_results-sorted2dx3d_tgt1.txt;done;done
- for a in {201..294};do c=`grep TCMID ../1894/MOL_${a}.mol2`;sed -i "s/\bMOL_${a}\b/$c/g" hsbd-mol-list-1894_results-sorted2dx3d_tgt1.txt;done
- for i in `cat index`;do awk '$3>0.9 {print}' hsbd-mol-list-${i}_results-sorted2dx3d_tgt1.txt|sed "s/^/target1 /" >>1_up0.9.txt;done
- for i in `cat index`;do awk '$3>0.8 {print}' hsbd-mol-list-${i}_results-sorted2dx3d_tgt1.txt|sed "s/^/target1 /" >>1_up0.8.txt;done
- for i in `cat index`;do awk '$3>0.7 {print}' hsbd-mol-list-${i}_results-sorted2dx3d_tgt1.txt|sed "s/^/target1 /" >>1_up0.7.txt;done
- (base) [zjxu@R820 analysis]$ awk '{print $2}' 1_up0.9.txt |sort|uniq|wc -l
- 16
- (base) [zjxu@R820 analysis]$ awk '{print $2}' 1_up0.8.txt |sort|uniq|wc -l
- 51
- (base) [zjxu@R820 analysis]$ awk '{print $2}' 1_up0.7.txt |sort|uniq|wc -l
- 117
- # Count the positive compounds in the library
- (base) [zjxu@R820 data]$ for i in {1..32};do wc -l ~/xbzhang/2019-nCov-final/D3Similarity/D3Similarity-yqyang/VS/target-id/molall_${i}.lst>>count;done
- (base) [zjxu@R820 data]$ awk '{print $1}' count |awk '{sum+=$1}END{print sum}'
- 617
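The count above goes through an intermediate `count` file written with `>>`, so running it twice would double the total. A minimal sketch that sums the line counts directly is shown here on toy files; the `molall_*.lst` contents are made up, and one compound per line is assumed.

```shell
set -eu
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT

# Toy stand-ins for molall_<target>.lst, one positive compound per line (assumed).
printf '%s\n' POS_A POS_B POS_C > "$workdir/molall_1.lst"
printf '%s\n' POS_D POS_E > "$workdir/molall_2.lst"

total=0
for f in "$workdir"/molall_*.lst; do
  # Reading from stdin makes wc print only the number, without the path.
  n=$(wc -l < "$f" | tr -d ' ')
  total=$((total + n))
  echo "$(basename "$f") $n"
done
echo "total $total"
```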
- # Locate the structures of the 36 compounds
- /home/zjxu/zzy/hsbd/similarity/analysis/36mol
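- for a in `cat 1894index`;do c=`grep TCMID ../../6/MOL_${a}.mol2`;mv 1894mol/MOL_${a}-opt.mol2 1894mol/${c}.mol2;done
- # Some compounds resemble positive compounds of more than one target, so a direct replacement was not possible. Brute-force workaround: use uniq -c -d to see which compounds fall into this case, then manually change their target to the form targetAandB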
- # Then batch-replace the line that stores the name in each mol2 file (line 2)
- for i in `cat 36index`;do c=`grep $i pro-mol.txt`;sed "2s/.*/$c/" 36/$i.mol2 >>name-36.mol2;done
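The manual targetAandB relabelling described above could probably be automated. The sketch below groups a list of target/compound pairs by compound and joins the target numbers with `and`. The input layout (target label, compound name), the compound names and the exact label format `target1and2` are assumptions made for illustration; this is not the procedure that was actually run.

```shell
set -eu
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT

# Toy pairs; assumed layout: target label, compound name. Duplicates are allowed.
printf '%s\n' 'target1 CPD_A' 'target2 CPD_A' 'target27 CPD_B' 'target1 CPD_A' \
  > "$workdir/pairs.txt"

# Deduplicate the pairs, then build one combined label per compound.
merged=$(LC_ALL=C sort -u "$workdir/pairs.txt" | awk '
  {
    num = $1
    sub(/^target/, "", num)            # keep only the target number
    if ($2 in label) {
      label[$2] = label[$2] "and" num  # compound already seen: append target
    } else {
      label[$2] = "target" num
      order[++count] = $2              # remember first-seen order
    }
  }
  END { for (i = 1; i <= count; i++) print order[i], label[order[i]] }')

printf '%s\n' "$merged"
```

Targets are joined in byte-wise sort order, so `target11` would come before `target2`; a numeric sort on the target number would be needed if the label order matters.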