NMI是Normalized Mutual Information的简称,用于比较社团划分结果与ground-truth之间的接近程度,取值范围为[0, 1],出自2006年
Danon 的论文 [1]。 有两种计算方法,为了方便大家检测结果,写了一个通用的Python版计算函数,当然也可以直接调用库函数计算。代码如下:
# -*- coding: utf-8 -*-
import math
def NMI(c1, c2):
'''
Calculate Normalized Mutual Information (NMI) of community detection result c1 and ground_truth c2.
Format:
c1 = [[1, 2, 3], [4, 5, 6]]
c2 = [[1, 2 ,3], [4, 5, 6]]
nmi = NMI(c1, c2)
print(nmi)
Return: NMI value in range [0, 1]
Reference: Danon L, Diazguilera A, Arenas A. Effect of size heterogeneity
on community identification in complex networks[J]. Arxiv Physics, 2006, 2006(11):11010.
'''
nmi = 0
u = 0
d = 0
n1 = 0
n2 = 0
for i in c1:
n1 += len(i)
for j in c2:
n2 += len(j)
assert n1 == n2, 'ERROR: Size of c1 is not equal to c2'
s = n1
for i in c1:
ni = len(i)
for j in c2:
nj = len(j)
nij = len(set(i) & set(j))
if nij == 0:
continue
logt = math.log((nij * s)/(ni*nj))
u += nij * logt
u *= -2
for i in c1:
ni = len(i)
d += ni * math.log(ni / s)
for j in c2:
nj = len(j)
d += nj * math.log(nj / s)
if d != 0:
nmi = u / d
return nmi
def get_ground_truth(G, key):
'''
根据网络节点的key值获取社团划分的标准结构
例如:
输入为 G, G 中包含节点的实际社团标签,如果用label表示
G.node[1]['label'] = 1 # 表示G中的节点1属于1社团
G.node[2]['label'] = 1 # 表示G中的节点2属于1社团
G.node[3]['label'] = 2 # 表示G中的节点3属于1社团
输出结果为:
[[1,2], [3]]
'''
community = []
groups = set()
for n in G.nodes():
groups.add(G.nodes[n][key])
for g in groups:
community.append([n for n in G.nodes() if G.nodes[n][key]==g])
return community
if __name__ == '__main__':
c1 = [[1, 2, 3], [4, 5, 6]]
c2 = [[1, 2 ,3], [4, 5, 6]]
nmi = NMI(c1, c2)
print(nmi)
c1 = [[1, 2, 3], [4, 5, 6]]
c2 = [[1, 2],[3], [4], [5, 6]]
nmi = NMI(c1, c2)
print(nmi)
c1 = [[1, 2, 3, 4, 5, 6]]
c2 = [[1, 2],[3], [4, 5, 6]]
nmi = NMI(c1, c2)
print(nmi)
c1 = [[1, 2, 3], [4, 5, 6]]
c2 = [[1, 2],[3], [4, 5, 6], [7]]
nmi = NMI(c1, c2)
print(nmi)
参考资料:
- Danon L, Diazguilera A, Arenas A. Effect of size heterogeneity on community identification in complex networks[J]. Arxiv Physics, 2006, 2006(11):11010.