KL div Python version comparison report

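1. Approach:
1) Dihedral calculation: the computed dihedrals do not match, so trace back through the inputs and the calculation {
  • Inputs: {
  • m.[chain,HigherOrder,Path,ReSortPath,ResIds,StartFrame]
  • p.[u_ref,n_frames_ref,residues_ref]
  • # not much difference
  • }
  • Calculation: {
  • ## most likely the culprit: the dimensions do not even match
  • ### the residues do not match
  • }
}
Visualize the distributions from the Python version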
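2) KL div calculation:
To pinpoint where the error comes from, save the dihedral data computed by Python in .mat format, load it in MATLAB, and check the result.
Scripts:
1) Added code to kldiv_example_usuage.py that writes the .mat files: /variables/new_format/*.mat
(the angles must be converted to 0-2pi)
2) Checked the content and format of the written files with inspect_mat_format.py:
Result: dihedrals is written correctly; reSort is not (see log1).
reSort is now written correctly as well (see log2).
3) Loaded the data from new_format/*mat. First skipped reconcileDihedralList and checked the result: it did not work. Then ran without skipping reconcileDihedralList and found that the residues of the test system and the reference system do not match, and that the Python code indeed never used the database; the code is to be modified.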
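A minimal sketch of the export step described in 1), not the code actually added to kldiv_example_usuage.py. It assumes the Python side holds one (n_frames, n_dihedrals) array per residue in degrees; the function names wrap_to_two_pi, pack_as_cell and write_mat are hypothetical. scipy.io.savemat stores an object array as a MATLAB cell array, which matches the (N, 1) object shape seen in the logs.

```python
import numpy as np


def wrap_to_two_pi(angles_deg):
    """Convert degrees to radians in [0, 2*pi); NaN entries stay NaN."""
    radians = np.deg2rad(np.asarray(angles_deg, dtype=float))
    return np.mod(radians, 2.0 * np.pi)


def pack_as_cell(per_residue):
    """Pack one array per residue into an (N, 1) object array (MATLAB cell)."""
    cell = np.empty((len(per_residue), 1), dtype=object)
    for i, arr in enumerate(per_residue):
        cell[i, 0] = wrap_to_two_pi(arr)
    return cell


def write_mat(path, per_residue, resort):
    """Write dihedrals and reSort under the same keys as the original files."""
    from scipy.io import savemat  # imported lazily; only needed for the I/O step

    savemat(path, {
        "dihedrals": pack_as_cell(per_residue),
        # Assumption: reSort is stored as uint16, as the inspected files report.
        "reSort": np.asarray(resort, dtype=np.uint16),
    })
```

Passing the real reSort table to write_mat, rather than a pre-allocated zero array, is what avoids the all-zero reSort seen in log1.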
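Conclusion: the dihedral distributions differ for the same system, so the dihedral calculation itself is inconsistent between the two versions.
Supplement: test file bundle: kldiv_lyw_tested.tar.gz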
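To put a number on "the distributions differ", a KL divergence between two angle histograms can be computed directly. This is only a sketch under stated assumptions, not the kldiv code from either version: the bin count (36) and the pseudocount (1e-6) are arbitrary choices, and the names angle_histogram and kl_divergence are hypothetical.

```python
import numpy as np


def angle_histogram(angles, n_bins=36, pseudocount=1e-6):
    """Normalized histogram of angles (radians) over [0, 2*pi), ignoring NaN."""
    angles = np.asarray(angles, dtype=float).ravel()
    angles = angles[~np.isnan(angles)]
    wrapped = np.mod(angles, 2.0 * np.pi)
    counts, _ = np.histogram(wrapped, bins=n_bins, range=(0.0, 2.0 * np.pi))
    # The pseudocount keeps empty bins from producing log(0) or division by zero.
    p = counts.astype(float) + pseudocount
    return p / p.sum()


def kl_divergence(p, q):
    """KL(p || q) in nats for two histograms defined on the same bins."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    return float(np.sum(p * np.log(p / q)))
```

Running the same trajectory through both implementations should give a divergence close to zero for every dihedral; a large value for the same system points at the dihedral calculation rather than at the KL step.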
Useful information:
dihedral.mat data structure:
Reference data .mat file structure:
Keys: dict_keys(['__header__', '__version__', '__globals__', 'dihedrals', 'reSort'])
dihedrals: type=<class 'numpy.ndarray'>, shape=(278, 1)
reSort: type=<class 'numpy.ndarray'>, shape=(1066, 6)
Test data .mat file structure:
Keys: dict_keys(['__header__', '__version__', '__globals__', 'dihedrals', 'reSort'])
dihedrals: type=<class 'numpy.ndarray'>, shape=(278, 1)
reSort: type=<class 'numpy.ndarray'>, shape=(1066, 6)
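A listing like the one above can be produced from the dictionary that scipy.io.loadmat returns. The helper below is a hypothetical stand-in for illustration, not the actual inspect_mat_format.py; it takes the already loaded dictionary so that it can be tried without the .mat files.

```python
import numpy as np


def describe_mat(mat):
    """Return one summary line per variable of a loadmat-style dictionary."""
    lines = ["keys: " + str(list(mat.keys()))]
    for name, value in mat.items():
        if name.startswith("__"):
            continue  # skip the __header__, __version__, __globals__ entries
        if isinstance(value, np.ndarray):
            lines.append(
                f"{name}: type={type(value)}, shape={value.shape}, dtype={value.dtype}"
            )
        else:
            lines.append(f"{name}: type={type(value)}")
    return lines
```

Typical use would be `print("\n".join(describe_mat(scipy.io.loadmat("variables/dihedralsRef.mat"))))`. Including the dtype makes the cell-array case (dtype object) visible at a glance.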
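2. Problem summary
1) Value ranges of the computed dihedral variables {
  • |dihedralsRef|: (0,10)
  • |dihedralsTest|: (0,10)
  • } versus {
  • |dihedralsRef/Test|: (0,...180)
  • }
  • Python probably used degrees √
2) What does Resort.txt contain, what does reSortKLDiv.mat represent, what are reSortCommonRef and reSortCommon, and what exactly is reSort?
Resort is the file that maps residues to dihedrals.
3) The m code mentions an alignment step (reconcileDihedralList). Does it affect the result? Probably, but the effect should not be large.
4) One major issue is certainly the residue-to-dihedral mapping: according to Resort.txt and dihedral.mat, a residue has anywhere from 1 to 8 dihedrals. The reSort files are in each system's output directory. The originally saved dihedrals.mat files also contain reSort, and the reSort of the test system differs slightly from that of the reference system.
5) Does the dihedral clustering step have an effect? No.
6) Could the cause be that Python does not use radians? No; the values get converted later anyway.
7) MATLAB organizes the data by dihedral type; Python organizes it by residue.
8) The reSort in the .mat produced by Python was all zeros. Solved by fixing the assignment.
9) Remaining question: why does the kldiv code not use the 'database', i.e. the reconcileDihedralList step?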
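For items 3), 4) and 9), a quick check of how two reSort tables line up can be sketched as follows. It is not the reconcileDihedralList logic, only a diagnostic. It assumes column 0 of reSort can be read as a residue index (the "unique residues" count in log2 suggests this); the function names are hypothetical.

```python
import numpy as np


def dihedrals_per_residue(resort):
    """Map residue index (column 0, by assumption) to its number of dihedrals."""
    resort = np.asarray(resort)
    residues, counts = np.unique(resort[:, 0], return_counts=True)
    return dict(zip(residues.tolist(), counts.tolist()))


def compare_resort(ref, test):
    """Report residues missing on either side and residues whose counts differ."""
    ref_counts = dihedrals_per_residue(ref)
    test_counts = dihedrals_per_residue(test)
    common = sorted(set(ref_counts) & set(test_counts))
    return {
        "common": common,
        "only_ref": sorted(set(ref_counts) - set(test_counts)),
        "only_test": sorted(set(test_counts) - set(ref_counts)),
        "count_mismatch": [r for r in common if ref_counts[r] != test_counts[r]],
    }
```

Any non-empty only_ref, only_test or count_mismatch entry means the two dihedral lists cannot be compared element by element without an alignment step first.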
Supplement:
computeDihedrals uses calcalldihedralsfromtrajs, which comes from MDprot.
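Since the two implementations disagree, an independent reference for a single dihedral can help decide which side is off. The sketch below uses the standard atan2 formulation for four atom positions and wraps the result to [0, 2*pi); it is not MDprot's calcalldihedralsfromtrajs, and the function name dihedral is hypothetical.

```python
import numpy as np


def dihedral(p0, p1, p2, p3):
    """Dihedral angle in radians, in [0, 2*pi), for points of shape (..., 3)."""
    p0, p1, p2, p3 = (np.asarray(p, dtype=float) for p in (p0, p1, p2, p3))
    b0 = p0 - p1
    b1 = p2 - p1
    b2 = p3 - p2
    # Unit vector along the central bond.
    b1 = b1 / np.linalg.norm(b1, axis=-1, keepdims=True)
    # Components of the outer bonds perpendicular to the central bond.
    v = b0 - np.sum(b0 * b1, axis=-1, keepdims=True) * b1
    w = b2 - np.sum(b2 * b1, axis=-1, keepdims=True) * b1
    x = np.sum(v * w, axis=-1)
    y = np.sum(np.cross(b1, v) * w, axis=-1)
    return np.mod(np.arctan2(y, x), 2.0 * np.pi)
```

Feeding the coordinates of the four atoms listed in one reSort row into this function, frame by frame, gives values that can be compared against the corresponding column from MATLAB and from Python.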
log1: inspect_mat_format results
python inspect_mat_format.py
Inspecting original .mat file formats
Inspecting file: variables/dihedralsRef.mat
Keys in the .mat file:
['__header__', '__version__', '__globals__', 'dihedrals', 'reSort']
Structure of each variable:
Variable: dihedrals
Type: <class 'numpy.ndarray'>
Shape: (278, 1)
Data type: object
First few elements:
[array([[       nan, 6.26505222, 0.60820916, 0.41011049],
        [       nan, 3.11268983, 0.68137676, 0.4586515 ],
        [       nan, 3.29773689, 5.73342441, 5.85807488],
        ...,
        [       nan, 2.95955215, 5.54917202, 5.84895467],
        [       nan, 3.03202124, 5.58968382, 5.8660925 ],
        [       nan, 2.46877949, 0.38443584, 0.37765127]])]
Variable: reSort
Type: <class 'numpy.ndarray'>
Shape: (1066, 6)
Data type: uint16
First few elements:
[ 1  2  1  7 15 17]
Inspecting file: variables/dihedralsTest.mat
Keys in the .mat file:
['__header__', '__version__', '__globals__', 'dihedrals', 'reSort']
Structure of each variable:
Variable: dihedrals
Type: <class 'numpy.ndarray'>
Shape: (278, 1)
Data type: object
First few elements:
[array([[       nan, 6.16714437, 0.45286268, 0.43257636],
        [       nan, 2.69939788, 6.02492648, 5.98582133],
        [       nan, 3.05658081, 0.57866149, 0.30948201],
        ...,
        [       nan, 3.265389  , 5.67503575, 5.87973518],
        [       nan, 2.88017197, 6.08011377, 6.25613019],
        [       nan, 2.98105289, 5.53090046, 5.81683091]])]
Variable: reSort
Type: <class 'numpy.ndarray'>
Shape: (1066, 6)
Data type: uint16
First few elements:
[ 1  2  1  7 15 17]
Inspecting new format .mat files
Inspecting file: variables/new_format/dihedralsRef.mat
Keys in the .mat file:
['__header__', '__version__', '__globals__', 'dihedrals', 'reSort']
Structure of each variable:
Variable: dihedrals
Type: <class 'numpy.ndarray'>
Shape: (295, 1)
Data type: object
First few elements:
[array([[3.64495732, 2.42104095, 1.54115443,        nan],
        [1.18972157, 1.15738416, 2.07942048,        nan],
        [1.90693256, 5.72842837, 5.6320921 ,        nan],
        ...,
        [1.77526998, 1.5740378 , 2.12850539,        nan],
        [2.97226818, 2.5168138 , 0.36831886,        nan],
        [1.09061554, 6.22481215, 1.69429451,        nan]])]
Variable: reSort
Type: <class 'numpy.ndarray'>
Shape: (1066, 6)
Data type: uint16
First few elements:
[0 0 0 0 0 0]
Inspecting file: variables/new_format/dihedralsTest.mat
Keys in the .mat file:
['__header__', '__version__', '__globals__', 'dihedrals', 'reSort']
Structure of each variable:
Variable: dihedrals
Type: <class 'numpy.ndarray'>
Shape: (295, 1)
Data type: object
First few elements:
[array([[2.75087261, 5.87442114, 5.72130917,        nan],
        [6.25840421, 3.30164357, 2.23651414,        nan],
        [5.68586318, 3.89117689, 4.28636175,        nan],
        ...,
        [0.97210598, 2.58374201, 4.17642761,        nan],
        [1.28241095, 2.97127412, 4.97082563,        nan],
        [1.30699512, 5.13538554, 5.81914576,        nan]])]
Variable: reSort
Type: <class 'numpy.ndarray'>
Shape: (1066, 6)
Data type: uint16
First few elements:
[0 0 0 0 0 0]
Flattened copy of another run (new-format values in degrees):
Inspecting new format .mat files
Inspecting file: variables/new_format/dihedralsRef.mat
Keys in the .mat file:
['__header__', '__version__', '__globals__', 'dihedrals', 'reSort']
Structure of each variable:
Variable: dihedrals
Type: <class 'numpy.ndarray'>
Shape: (295, 1)
Data type: object
First few elements:
[array([[ -78.03645168,   -3.86214436,  -61.29069864,           nan],
        [ -93.05805803,  120.537905  ,  -60.75243259,           nan],
        [-130.03995889,   24.57798429,  -63.48294628,           nan],
        ...,
        [ -98.75569494,   14.14040841,  -54.42016237,           nan],
        [ -91.27551143,   84.19822279,  -87.59627544,           nan],
        [ -86.87397876,   87.90622114,  -61.13755856,           nan]])]
Variable: reSort
Type: <class 'numpy.ndarray'>
Shape: (1066, 6)
Data type: uint16
First few elements:
[0 0 0 0 0 0]
Inspecting file: variables/new_format/dihedralsTest.mat
Keys in the .mat file:
['__header__', '__version__', '__globals__', 'dihedrals', 'reSort']
Structure of each variable:
Variable: dihedrals
Type: <class 'numpy.ndarray'>
Shape: (295, 1)
Data type: object
First few elements:
[array([[ -72.64735107,  -12.97513478,  -63.39372921,           nan],
        [-125.68848724,   15.86801418,  -54.31215363,           nan],
        [ -69.7123605 ,  135.83806834,  -45.9791207 ,           nan],
        ...,
        [ -68.1429324 ,  115.68107754,  -77.50498138,           nan],
        [  64.11426402,   46.95357127,  -57.86102744,           nan],
        [  57.85566288,   42.83449738,  -44.4463367 ,           nan]])]
Variable: reSort
Type: <class 'numpy.ndarray'>
Shape: (1066, 6)
Data type: uint16
First few elements:
[0 0 0 0 0 0]
log2:
python inspect_mat_format.py
Inspecting original .mat file formats
Inspecting file: variables/dihedralsRef.mat
Keys in the .mat file:
['__header__', '__version__', '__globals__', 'dihedrals', 'reSort']
Structure of each variable:
Variable: dihedrals
Type: <class 'numpy.ndarray'>
Shape: (278, 1)
Data type: object
First few elements:
[array([[       nan, 6.26505222, 0.60820916, 0.41011049],
        [       nan, 3.11268983, 0.68137676, 0.4586515 ],
        [       nan, 3.29773689, 5.73342441, 5.85807488],
        ...,
        [       nan, 2.95955215, 5.54917202, 5.84895467],
        [       nan, 3.03202124, 5.58968382, 5.8660925 ],
        [       nan, 2.46877949, 0.38443584, 0.37765127]])]
Variable: reSort
Type: <class 'numpy.ndarray'>
Shape: (1066, 6)
Data type: uint16
Detailed reSort array analysis:
Shape: (1066, 6)
Total entries: 1066
Column descriptions (MATLAB reference):
Column 0: First atom index
Column 1: Dihedral type (0=SC, 1=BB1/phi, 2=BB2/psi)
Columns 2-4: Other atom indices forming the dihedral
Column 5: Additional parameter (if present)
Found 278 unique residues
First 10 entries with full details:
Entry 0: [ 1  2  1  7 15 17]
Entry 1: [ 1  0  1  7  9 12]
Entry 2: [ 1  0  4  7  9 12]
Entry 3: [ 2  1 15 17 19 32]
Entry 4: [ 2  2 17 19 32 34]
Entry 5: [ 2  0 17 19 21 26]
Entry 6: [ 2  0 19 21 24 26]
Entry 7: [ 3  1 32 34 36 53]
Entry 8: [ 3  2 34 36 53 55]
Entry 9: [ 3  0 34 36 38 41]
Dihedral type counts:
SC: 512
BB1/phi: 277
BB2/psi: 277
Inspecting file: variables/dihedralsTest.mat
Keys in the .mat file:
['__header__', '__version__', '__globals__', 'dihedrals', 'reSort']
Structure of each variable:
Variable: dihedrals
Type: <class 'numpy.ndarray'>
Shape: (278, 1)
Data type: object
First few elements:
[array([[       nan, 6.16714437, 0.45286268, 0.43257636],
        [       nan, 2.69939788, 6.02492648, 5.98582133],
        [       nan, 3.05658081, 0.57866149, 0.30948201],
        ...,
        [       nan, 3.265389  , 5.67503575, 5.87973518],
        [       nan, 2.88017197, 6.08011377, 6.25613019],
        [       nan, 2.98105289, 5.53090046, 5.81683091]])]
Variable: reSort
Type: <class 'numpy.ndarray'>
Shape: (1066, 6)
Data type: uint16
Detailed reSort array analysis:
Shape: (1066, 6)
Total entries: 1066
Column descriptions (MATLAB reference):
Column 0: First atom index
Column 1: Dihedral type (0=SC, 1=BB1/phi, 2=BB2/psi)
Columns 2-4: Other atom indices forming the dihedral
Column 5: Additional parameter (if present)
Found 278 unique residues
First 10 entries with full details:
Entry 0: [ 1  2  1  7 15 17]
Entry 1: [ 1  0  1  7  9 12]
Entry 2: [ 1  0  4  7  9 12]
Entry 3: [ 2  1 15 17 19 32]
Entry 4: [ 2  2 17 19 32 34]
Entry 5: [ 2  0 17 19 21 26]
Entry 6: [ 2  0 19 21 24 26]
Entry 7: [ 3  1 32 34 36 53]
Entry 8: [ 3  2 34 36 53 55]
Entry 9: [ 3  0 34 36 38 41]
Dihedral type counts:
SC: 512
BB1/phi: 277
BB2/psi: 277
Inspecting new format .mat files
Inspecting file: variables/new_format/dihedralsRef.mat
Keys in the .mat file:
['__header__', '__version__', '__globals__', 'dihedrals', 'reSort']
Structure of each variable:
Variable: dihedrals
Type: <class 'numpy.ndarray'>
Shape: (295, 1)
Data type: object
First few elements:
[array([[3.64495732, 2.42104095, 1.54115443,        nan],
        [1.18972157, 1.15738416, 2.07942048,        nan],
        [1.90693256, 5.72842837, 5.6320921 ,        nan],
        ...,
        [1.77526998, 1.5740378 , 2.12850539,        nan],
        [2.97226818, 2.5168138 , 0.36831886,        nan],
        [1.09061554, 6.22481215, 1.69429451,        nan]])]
Variable: reSort
Type: <class 'numpy.ndarray'>
Shape: (775, 6)
Data type: uint16
Detailed reSort array analysis:
Shape: (775, 6)
Total entries: 775
Column descriptions (MATLAB reference):
Column 0: First atom index
Column 1: Dihedral type (0=SC, 1=BB1/phi, 2=BB2/psi)
Columns 2-4: Other atom indices forming the dihedral
Column 5: Additional parameter (if present)
Found 295 unique residues
First 10 entries with full details:
Entry 0: [ 2  1 14 16 18 31]
Entry 1: [ 2  2 16 18 31 33]
Entry 2: [ 2  0 16 18 20 25]
Entry 3: [ 3  1 31 33 35 52]
Entry 4: [ 3  2 33 35 52 54]
Entry 5: [ 3  0 33 35 37 40]
Entry 6: [ 4  1 52 54 56 66]
Entry 7: [ 4  2 54 56 66 68]
Entry 8: [ 4  0 54 56 58 61]
Entry 9: [ 5  1 66 68 70 87]
Dihedral type counts:
SC: 185
BB1/phi: 295
BB2/psi: 295
Inspecting file: variables/new_format/dihedralsTest.mat
Keys in the .mat file:
['__header__', '__version__', '__globals__', 'dihedrals', 'reSort']
Structure of each variable:
Variable: dihedrals
Type: <class 'numpy.ndarray'>
Shape: (295, 1)
Data type: object
First few elements:
[array([[2.75087261, 5.87442114, 5.72130917,        nan],
        [6.25840421, 3.30164357, 2.23651414,        nan],
        [5.68586318, 3.89117689, 4.28636175,        nan],
        ...,
        [0.97210598, 2.58374201, 4.17642761,        nan],
        [1.28241095, 2.97127412, 4.97082563,        nan],
        [1.30699512, 5.13538554, 5.81914576,        nan]])]
Variable: reSort
Type: <class 'numpy.ndarray'>
Shape: (775, 6)
Data type: uint16
Detailed reSort array analysis:
Shape: (775, 6)
Total entries: 775
Column descriptions (MATLAB reference):
Column 0: First atom index
Column 1: Dihedral type (0=SC, 1=BB1/phi, 2=BB2/psi)
Columns 2-4: Other atom indices forming the dihedral
Column 5: Additional parameter (if present)
Found 295 unique residues
First 10 entries with full details:
Entry 0: [ 2  1 14 16 18 31]
Entry 1: [ 2  2 16 18 31 33]
Entry 2: [ 2  0 16 18 20 25]
Entry 3: [ 3  1 31 33 35 52]
Entry 4: [ 3  2 33 35 52 54]
Entry 5: [ 3  0 33 35 37 40]
Entry 6: [ 4  1 52 54 56 66]
Entry 7: [ 4  2 54 56 66 68]
Entry 8: [ 4  0 54 56 58 61]
Entry 9: [ 5  1 66 68 70 87]
Dihedral type counts:
SC: 185
BB1/phi: 295
BB2/psi: 295
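The "Dihedral type counts" blocks in log2 (SC 512, BB1/phi 277, BB2/psi 277 for the original files versus SC 185, BB1/phi 295, BB2/psi 295 for the new format) can be recomputed from a reSort array with a few lines. This is a sketch that assumes the column layout printed in the log (column 1 holds the type code); type_counts and is_all_zero are hypothetical names, not functions from inspect_mat_format.py.

```python
import numpy as np

# Type codes as printed in the log: 0=SC, 1=BB1/phi, 2=BB2/psi.
TYPE_NAMES = {0: "SC", 1: "BB1/phi", 2: "BB2/psi"}


def type_counts(resort):
    """Count reSort rows per dihedral type, reading the code from column 1."""
    codes = np.asarray(resort)[:, 1]
    return {
        name: int(np.count_nonzero(codes == code))
        for code, name in TYPE_NAMES.items()
    }


def is_all_zero(resort):
    """True if the table was never filled in (the log1 symptom)."""
    return not np.any(np.asarray(resort))
```

The side-chain count is the figure that differs most between the two formats in log2, so it is a natural first thing to compare after each change to the Python code.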