全部数据集 上传数据集
图预览 · 显示
显示 / 隐藏      
数据集介绍

malaria.net

You have downloaded the "malaria" data set that was used by Larremore, Clauset, and Jacobs in the paper "Efficiently inferring community structure in bipartite networks."

http://danlarremore.com/bipartiteSBM
larremor@hsph.harvard.edu

// FILE LIST

There are 4 files:
    1. malaria.edgelist - a tab-separated list of edges in the malaria network, in the form: i j w. In this case, all weights w are equal to 1.
    2. malaria.types - a list of the types of all vertices in the malaria network, which is bipartite and comprises genes (type 1) and substrings (type 2).
    3. malaria.partition - the partition shown in Figures 6 and 7 of the paper.
    4. malaria.mat - A MATLAB file that contains:
        A - the adjacency matrix
        B - the bipartite adjacency matrix
        N_a, N_b - the numbers of genes and substrings, respectively.
        P_a, P_b - both weighted one-mode projections.
        geneSequences - the genes themselves, at the amino acid level. See note below.
        geneSequenceHeaders - the names of the genes
        substrings - the substrings that were extracted from the sequences
        g - the partition shown in Figures 6 and 7 of the paper.

// A NOTE ABOUT SEQUENCE DATA

These sequences were initially published by Thomas S. Rask, et al. but were analyzed using more traditional genetic techniques:

    Rask, T. S., Hansen, D. A., Theander, T. G., Gorm Pedersen, A., & Lavstsen, T. (2010). Plasmodium falciparum Erythrocyte Membrane Protein 1 Diversity in Seven Genomes – Divide and Conquer. PLoS Computational Biology, 6(9), e1000933. doi:10.1371/journal.pcbi.1000933

The same sequences were reanalyzed using complex networks in their Highly Variable Regions by Daniel B. Larremore, Aaron Clauset, and Caroline O. Buckee. The sequence data provided here correspond to HVR6 of their paper.

    Larremore, D. B., Clauset, A., & Buckee, C. O. (2013). A Network Approach to Analyzing Highly Recombinant Malaria Parasite Genes. PLoS Computational Biology, 9(10), e1003268. doi:10.1371/journal.pcbi.1003268.s010
数据预览
起点 终点 数值
社团结果
社团 大小 节点
图信息
基本统计 · 计算 说明

N and E are the number of nodes and links. 〈k〉 and 〈d〉 are the average degree and the average distance, respectively. C and r are the average clustering coefficient and the assortative coefficient. H is the degree heterogeneity. βc is the epidemic threshold of the SIR model.

N 1103
E 2964
<k>  
<d>  
<C>  
r  
H  
beta_c
度分布 · 绘制
结果

社团个数:

模块度 (Q)

运行时间(秒)

导出格式