KL divergence, Python version: test report

1. Black-box testing
1) Python version
Code under test: kldiv-lywu
Dependencies: MDAnalysis
Test input:
{
  • top_ref = "./demo/step1/1-D2_DA_WT/prot.pdb"
  • traj_ref = ["./demo/step1/1-D2_DA_WT/run1/traj.dcd", "./demo/step1/1-D2_DA_WT/run2/traj.dcd"]
  • top_test = "./demo/step1/2-D2_BRC_WT/prot.pdb"
  • traj_test = ["./demo/step1/2-D2_BRC_WT/run1/traj.dcd", "./demo/step1/2-D2_BRC_WT/run2/traj.dcd"]
}## Two systems: 1-D2 is the reference system and 2-D2 is the test system
Test command: python kldiv_example_usuage.py
Test output:
{
     resid resname    kl_div
7        9     LEU  0.166126
71      73     ARG  0.179004
128    130     VAL  0.371105
## from kl_div_res.csv; rows with kl_div equal to 0 are excluded
##from kldiv_example_usuage.png
}
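The zero-filtering step mentioned above can be sketched as follows. This is a minimal illustration, assuming kl_div_res.csv is comma-separated with the header names resid, resname and kl_div; the helper name nonzero_kl_rows and the in-memory sample (including the made-up ALA row) are hypothetical and only stand in for the real file.

```python
import csv
import io


def nonzero_kl_rows(csv_file):
    """Return the rows whose kl_div is non-zero, with numeric fields parsed."""
    rows = []
    for row in csv.DictReader(csv_file):
        kl = float(row["kl_div"])
        if kl != 0.0:
            rows.append(
                {"resid": int(row["resid"]), "resname": row["resname"], "kl_div": kl}
            )
    return rows


# In-memory stand-in for kl_div_res.csv (assumed layout).
# The ALA row is a made-up placeholder for a zero-valued residue.
sample = io.StringIO(
    "resid,resname,kl_div\n"
    "8,ALA,0.0\n"
    "9,LEU,0.166126\n"
    "73,ARG,0.179004\n"
)

filtered = nonzero_kl_rows(sample)
for row in filtered:
    print(row["resid"], row["resname"], row["kl_div"])
```

For the real file, the same function could be called on an open file handle, for example with open("kl_div_res.csv", newline="") as fh.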
2) MATLAB version
Code under test: AlloCraft-main
Dependencies: Bioinformatics Toolbox, MDprot, VMD
Test input:
{
  • testdir: ~/demo/step1/2-D2_BRC_WT
  • refdir: ~/demo/step1/
  • databasePath: ~/demo/step1/database
  • ## other parameters are left at their defaults
}## loaded by input_kldiv
Test commands:
addpath:{
  • src;
  • util;
}## this indicates that the remaining code is used for MI
input_kldiv
kldivMain
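Test output:
1) test_1101_1 (md2pathdev generated during this test)
kl1 visualization (probably)
kl-res distribution
dihedral distributions (Ref vs. Test)
2) test_1101_2 (md2pathdev originally shipped in /demo)
Note:
Running kldivMain in MATLAB requires md2pathdev, the output of the MI calculation for 1-D2_DA_WT (see Error 1). Both variants were tested here: the md2pathdev folder generated by md2path and the md2pathdev folder shipped with the demo.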
Error 1: Error using save
Cannot create 'dihedrals.mat' because 'D:\Matlab\Works\AlloCraft\demo\step1\1-D2_DA_WT\md2pathdev' does not exist.
Error in Simulation/computeDihedrals (line 89)
                    save(options.Path, 'dihedrals', 'reSort');
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Error in kldivMain (line 139)
refSim.computeDihedrals( 'Path', fullfile(settings.refdir, "md2pathdev","dihedrals.mat"),...
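Summary: The black-box test does not meet expectations, and the KL calculation in the MATLAB version depends on the MI calculation for the reference system. Testing shows that the specific dependencies are dihedrals.mat and BS_residu.txt. It is recommended to clarify whether the Python version implements the same dependency chain.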
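One way to surface this dependency early is a pre-flight check on the reference directory. The sketch below is only an illustration: the helper name missing_prerequisites is hypothetical, and the assumption that both files live directly inside md2pathdev is inferred from the error message and the summary above, not confirmed against either code base.

```python
from pathlib import Path
import tempfile

# Files the MATLAB KL run was observed to depend on (per the summary above).
REQUIRED_FILES = ("dihedrals.mat", "BS_residu.txt")


def missing_prerequisites(refdir, subdir="md2pathdev", required=REQUIRED_FILES):
    """List the paths that are absent under refdir/subdir.

    If the folder itself is missing, only the folder is reported.
    """
    folder = Path(refdir) / subdir
    if not folder.is_dir():
        return [str(folder)]
    return [str(folder / name) for name in required if not (folder / name).is_file()]


# Demonstration on a temporary directory instead of the real demo tree.
with tempfile.TemporaryDirectory() as tmp:
    before = missing_prerequisites(tmp)  # folder does not exist yet
    (Path(tmp) / "md2pathdev").mkdir()
    (Path(tmp) / "md2pathdev" / "dihedrals.mat").touch()
    after = missing_prerequisites(tmp)  # only BS_residu.txt is still missing

print(before)
print(after)
```

Running such a check before the KL step would turn the save failure in Error 1 into an explicit list of missing inputs.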
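2. End-to-end testing
Test criterion: equivalence of key variables captured at breakpoints
Key variables: {
  • kl1: first-order KL divergence
  • kl2: second-order KL divergence
  • dihedralsRef: dihedral angle data of the reference system
  • dihedralsTest: dihedral angle data of the test system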
}
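For orientation, a first-order KL divergence between the dihedral distributions of two systems can be sketched as below. This is a generic histogram-based estimate, not the method of either implementation: the bin count, the smoothing constant eps, the direction D_KL(test || ref) and the function names are all assumptions chosen for illustration.

```python
import math


def histogram(angles, n_bins=50):
    """Normalized histogram of angles (radians) wrapped to [0, 2*pi)."""
    counts = [0] * n_bins
    width = 2 * math.pi / n_bins
    for a in angles:
        idx = int((a % (2 * math.pi)) / width)
        counts[min(idx, n_bins - 1)] += 1  # clamp guards against rounding at 2*pi
    total = len(angles)
    return [c / total for c in counts]


def smooth(p, eps=1e-10):
    """Add a small pseudo-count to every bin and renormalize."""
    total = sum(p) + eps * len(p)
    return [(pi + eps) / total for pi in p]


def kl_divergence(p, q, eps=1e-10):
    """D_KL(P || Q) = sum_i p_i * ln(p_i / q_i), computed on smoothed histograms."""
    p_s, q_s = smooth(p, eps), smooth(q, eps)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p_s, q_s))


# Made-up angle samples for one dihedral in the two systems.
ref_angles = [0.50, 0.60, 0.55, 3.00]
test_angles = [5.90, 6.00, 5.95, 3.10]

p_test = histogram(test_angles, n_bins=12)
p_ref = histogram(ref_angles, n_bins=12)
kl_same = kl_divergence(p_ref, p_ref)
kl_diff = kl_divergence(p_test, p_ref)
print(kl_same, kl_diff)
```

Identical histograms give a divergence of zero, which is why residues with kl_div equal to 0 carry no signal in the black-box output.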
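### The MATLAB data used below is the same as that produced in test_1101_1
Test plan:
The MATLAB version saves the key variables as *.mat
The Python version likewise saves the key variables as *.npy
Compare the corresponding key variables:
1) kl1
Comparison results: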
MATLAB (MATLAB kl1):
- Non-zero values: 118
- Mean: 0.0681
- Length: 1
- Shape: (1, 1066)
Python (Python kl1):
- Non-zero values: 3
- Mean: 0.0015
- Length: 295
- Shape: (295,)
Comparison Results:
- Non-zero count difference: 115
- Mean difference: 0.0666
- Length difference: -294
- Cosine similarity: Cannot compare - different lengths
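The statistics above can be reproduced with a short comparison script. The sketch below is not the script used for this report; the function names are hypothetical, and the arrays are small made-up stand-ins. It flattens each array before counting, because the MATLAB-side "Length: 1" is consistent with taking len() of a (1, 1066) array, which reports the first dimension rather than the number of elements.

```python
import numpy as np


def summarize(arr):
    """Shape, element count, non-zero count and mean, ignoring NaN entries."""
    flat = np.asarray(arr, dtype=float).ravel()
    finite = flat[~np.isnan(flat)]
    return {
        "shape": np.shape(arr),
        "n_elements": int(flat.size),
        "non_zero": int(np.count_nonzero(finite)),
        "mean": float(finite.mean()) if finite.size else float("nan"),
    }


def cosine_similarity(a, b):
    """Cosine similarity of two flattened arrays, or None if not comparable."""
    a = np.asarray(a, dtype=float).ravel()
    b = np.asarray(b, dtype=float).ravel()
    if a.size != b.size:
        return None  # different element counts: no element-wise comparison
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else None


# Made-up stand-ins: a (1, 6) row vector and a (4,) vector.
matlab_like = np.zeros((1, 6))
matlab_like[0, [1, 4]] = [0.2, 0.4]
python_like = np.zeros(4)
python_like[2] = 0.3

stats_m = summarize(matlab_like)
stats_p = summarize(python_like)
similarity = cosine_similarity(matlab_like, python_like)
print(stats_m)
print(stats_p)
print(similarity)
```

With real data, the inputs would typically come from scipy.io.loadmat for the *.mat files and numpy.load for the *.npy files; the variable key inside each *.mat file depends on how it was saved.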
2) kl2 (not computed by default)
3) dihedralsRef
Comparison results:
MATLAB dihedralsRef stats:
- Shape: (314, 1066)
- Non-zero count: 334724
- Mean: 4.2881
Python dihedralsRef stats:
- Shape: (295, 8)
Notably, dihedralsTest/Ref in the Python version contain many all-NaN arrays, which is not the case in the MATLAB version:
  array([nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan,
         nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan,
         nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan,
         nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan,
         nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan,
         nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan,
         nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan,
         nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan,
         nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan,
         nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan,
         nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan,
         nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan,
         nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan,
         nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan,
         nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan,
         nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan,
         nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan,
         nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan,
         nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan,
         nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan,
         nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan,
         nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan,
         nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan,
         nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan,
         nan, nan, nan, nan])                                            ]]
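To quantify how widespread these empty entries are, the all-NaN arrays can be tallied per column. The sketch below assumes the Python dihedrals variable is a 2-D object array whose cells each hold one angle series (a layout inferred from the reported shape (295, 8), not confirmed); the function name and demo values are hypothetical. Some all-NaN cells may be legitimate, since not every residue has every side-chain dihedral (glycine, for example, has no chi angles), so a per-column count helps separate expected gaps from a failed backbone calculation.

```python
import numpy as np


def all_nan_mask(dihedrals):
    """Boolean mask marking cells whose angle series is empty or entirely NaN."""
    mask = np.zeros(dihedrals.shape, dtype=bool)
    for idx in np.ndindex(dihedrals.shape):
        series = np.asarray(dihedrals[idx], dtype=float)
        mask[idx] = series.size == 0 or bool(np.isnan(series).all())
    return mask


# Made-up 2 x 2 stand-in for the (295, 8) object array.
demo = np.empty((2, 2), dtype=object)
demo[0, 0] = np.array([3.1, 0.4, 2.2])
demo[0, 1] = np.full(3, np.nan)            # entirely NaN
demo[1, 0] = np.array([1.0, np.nan, 2.0])  # partially NaN, not flagged
demo[1, 1] = np.full(3, np.nan)            # entirely NaN

mask = all_nan_mask(demo)
per_column = mask.sum(axis=0)
print(int(mask.sum()), "of", mask.size, "cells are all-NaN")
print(per_column)
```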
4) dihedralsTest
Comparison results:
MATLAB dihedralsTest content (first 5 rows):
[[6.16714437 0.45286268 0.43257636 ... 2.38117311 1.18061491 5.25285963]
 [2.69939788 6.02492648 5.98582133 ... 2.19857802 1.08216082 2.98703869]
 [3.05658081 0.57866149 0.30948201 ... 4.23180163 4.79413621 3.44941891]
 [3.0065602  0.59420209 0.5067571  ... 2.06935103 4.52787803 3.31760294]
 [2.93705326 5.94524694 5.94910143 ... 4.29951669 3.86043934 3.15148747]]
MATLAB dihedralsTest stats:
- Shape: (316, 1066)
- Non-zero count: 336856
- Mean: 4.2851
Python dihedralsTest stats:
- Shape: (295, 8)
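Summary: Both the kl and the dihedrals results differ substantially from the corresponding MATLAB variables, in values as well as in dimensions. In addition, the Python results contain many consecutive NaN values, which makes an error in the calculation likely. Since the input data are the same overall yet the results differ this much, it is recommended to re-verify the workflow.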
3. Parameter check
1) n_blocks (Python) == 4
nBlocks (MATLAB) == 6
2) dihs (Python) == ['phi','psi','chi_1','chi_2','chi2','chi_3','chi_4']
MATLAB has no explicit dihs variable; the only related variables are dihText and dihType
dihText (MATLAB) == "\chi" (determined by dihType == 0)
4. Retest after parameter adjustment
Modified parameter: n_blocks == 6
Setting dihs = ['chi'] raises a KeyError, so only n_blocks was changed
Test results:
     resid resname    kl_div
128    130     VAL  0.226054
241    243     TYR  0.19875
250    252     VAL  0.020061
Summary: The differences in the results are not caused by the parameter difference.