Generating Labeled Network Datasets with the LFR Benchmark

hxy, June 12, 2019, 09:47:31

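In practice, large labeled network datasets are hard to come by. Stanford's SNAP [1] and Mark Newman's personal homepage [2] do offer many valuable datasets, but building a dataset with ground truth that fits your own requirements is harder. Santo Fortunato [3], another leading researcher in complex networks, provides a Linux program that generates suitable datasets from a set of parameters. Download: LFR Benchmark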

Usage:

Command:

.\benchmark.exe -N 1000 -k 15 -maxk 20 -mu 0.1 -minc 20 -maxc 30
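The post does not explain the flags in the command above. As I understand the LFR benchmark's own documentation, -N is the number of nodes, -k the average degree, -maxk the maximum degree, -mu the mixing parameter, and -minc / -maxc the minimum and maximum community sizes; it is worth checking these against the README shipped with your copy. Below is a minimal sketch of assembling the same command from Python. The helper names build_lfr_command and run_lfr are hypothetical, and the executable path is whatever your build produced.

```python
import subprocess

# Values copied from the command above. The meanings in the comments are my
# reading of the benchmark's README, not something stated in this post.
LFR_PARAMS = {
    "N": 1000,    # number of nodes
    "k": 15,      # average degree
    "maxk": 20,   # maximum degree
    "mu": 0.1,    # mixing parameter
    "minc": 20,   # minimum community size
    "maxc": 30,   # maximum community size
}


def build_lfr_command(executable, params):
    """Return the argument list for one benchmark run (hypothetical helper)."""
    args = [executable]
    for flag, value in params.items():
        args += ["-" + flag, str(value)]
    return args


def run_lfr(executable, params):
    """Run the generator and raise if it exits with an error."""
    subprocess.run(build_lfr_command(executable, params), check=True)


# Only print the command here; call run_lfr(...) to actually execute it.
print(" ".join(build_lfr_command(r".\benchmark.exe", LFR_PARAMS)))
```

Keeping the parameters in a dictionary makes it easy to sweep one of them, for example mu, and generate a series of datasets.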

The program writes .dat files, so some processing is needed before the data can be used in Python.

I wrote a small conversion function, shown below:

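import csv

import networkx as nx
from networkx.algorithms.community.quality import modularity


def read_LFR():
    ''' Process the output of the LFR benchmark: preview the graph and write a GML file.
        Reference: A. Lancichinetti, S. Fortunato, F. Radicchi, Benchmark graphs for testing community detection algorithms, Phys. Rev. E 78 (2008) 046110.
    '''
    G = nx.Graph()

    # Edge file and community file
    edge_file = './data/LFR_benchmark/network.dat'
    community_file = './data/LFR_benchmark/community.dat'

    # Read the edge list, splitting each line on tabs
    with open(edge_file, 'r') as f:
        edges = [(d[0], d[1]) for d in csv.reader(f, delimiter='\t')]
    # Add the edges
    G.add_edges_from(edges)
    # Read the community file: node id -> community label
    with open(community_file, 'r') as f:
        labels = {d[0]: d[1] for d in csv.reader(f, delimiter='\t')}
    # Attach the community label to each node
    for n in G.nodes():
        G.nodes[n]['group'] = labels[n]

    # One node list per distinct label. Without set(), each community would be
    # repeated once per member and modularity() would reject the partition.
    groups = set(labels.values())
    partition = [[n for n in G.nodes() if labels[n] == g] for g in groups]

    # Plot. draw_communities is my own plotting helper and is not shown in this post.
    draw_communities(G, partition)

    # Save as a GML file. After registering a NEUSNCP account you can upload the
    # file and check it with the visualization tool at https://neusncp.com/api/cd
    nx.write_gml(G, './res/result.gml')

    # Compute the modularity of the ground-truth partition
    print(modularity(G, partition))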
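The partition step in the function above scans all nodes once per community. A single pass with a dictionary produces the same grouping. The sketch below uses only the standard library and a small made-up sample; the tab-separated "node, community" layout is assumed from the reader code above, and the function name group_by_community is hypothetical.

```python
import csv
import io
from collections import defaultdict


def group_by_community(lines):
    """Map each community label to the list of its node ids."""
    communities = defaultdict(list)
    for row in csv.reader(lines, delimiter="\t"):
        if len(row) < 2:      # skip blank or malformed lines
            continue
        node, label = row[0], row[1]
        communities[label].append(node)
    return dict(communities)


# Made-up sample in the assumed layout, not real benchmark output.
sample = io.StringIO("1\t1\n2\t1\n3\t2\n4\t2\n5\t2\n")
communities = group_by_community(sample)

# A list of node lists, the shape that modularity() expects for a partition.
partition = list(communities.values())
print(partition)
```

For a real file, pass the open file object instead of the StringIO sample.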

Let's take a look at the result.

It looks pretty good.

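Something to look forward to: 12 days before this post was written, the NetworkX project moved the LFR benchmark into generators for the 2.4 release. In NetworkX 2.3 the function is not yet available under generators; from 2.4 on it can be called directly. The commit message reads:
Move LFR_benchmark to generators (#3411)
* Move LFR_benchmark to generators
* Correct import line in docstring
* Removed LFR from algorithms.community.rst and put in generators.rst
Fixes #3404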

NetworkX source repository: https://github.com/networkx/networkx
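With a NetworkX release that ships the generator, the benchmark can be produced without the external program. This is a minimal sketch, assuming NetworkX 2.4 or later. The parameter values follow the example in the NetworkX documentation rather than the command earlier in this post, because the generator may fail to converge for some parameter combinations.

```python
import networkx as nx

# Parameter values follow the example in the NetworkX documentation.
n = 250        # number of nodes
tau1 = 3       # exponent of the degree distribution
tau2 = 1.5     # exponent of the community-size distribution
mu = 0.1       # mixing parameter
G = nx.LFR_benchmark_graph(
    n, tau1, tau2, mu, average_degree=5, min_community=20, seed=10
)

# Each node stores the set of nodes in its own community under "community".
communities = {frozenset(G.nodes[v]["community"]) for v in G}
print(len(G), "nodes,", len(communities), "communities")
```

The resulting communities set plays the same role as the ground-truth partition read from community.dat.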

References:

[1] SNAP. http://snap.stanford.edu/data/index.html
[2] Newman's personal homepage. http://www-personal.umich.edu/~mejn/
[3] A. Lancichinetti, S. Fortunato, F. Radicchi, Benchmark graphs for testing community detection algorithms, Phys. Rev. E 78 (2008) 046110.

Last updated: May 2, 2020, 09:04:12
