Authors are nodes. An undirected edge joins two authors if they co-published at least one paper across the 13 venues indexed by the atlas (ICRA · IROS · RA-L · T-RO · RSS · IJRR · Sci-Rob · SoRo · T-Mech · T-FR · RA-P · T-ASE · RAM). Edge weight = how many papers they co-authored together.
Two thresholds prune the graph before visualisation:
MIN_AUTHOR_PAPERS = 3: each author must have ≥ 3 papers in the dataset. This keeps students and one-off attendees out.
MIN_EDGE_COLLABS = 2: each edge must represent ≥ 2 co-authored papers. This filters out accidental single-paper collaborations.
Result: 23,806 authors connected by 68,598 edges in the exported
dataset. The Edge threshold slider on the viewer further filters
edges at render time (default = 5, i.e. only show pairs who co-wrote 5+ papers).
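The construction and pruning described above can be sketched in a few lines of plain Python. This is only an illustration, not the code in the atlas scripts: the function names `build_coauthor_graph` and `visible_edges` and the input shape (one list of author names per paper) are assumptions, and the sketch keeps every author who passes the paper threshold, whether or not any edge survives for them.

```python
from collections import Counter
from itertools import combinations

MIN_AUTHOR_PAPERS = 3   # threshold named in the text
MIN_EDGE_COLLABS = 2    # threshold named in the text


def build_coauthor_graph(papers,
                         min_author_papers=MIN_AUTHOR_PAPERS,
                         min_edge_collabs=MIN_EDGE_COLLABS):
    """papers: iterable of author-name lists, one list per paper (assumed shape).

    Returns (authors, edges) where edges maps a sorted author pair to the
    number of papers the pair co-authored.
    """
    paper_counts = Counter()
    pair_counts = Counter()
    for authors in papers:
        unique = sorted(set(authors))          # ignore duplicate names on one paper
        paper_counts.update(unique)
        pair_counts.update(combinations(unique, 2))

    kept = {a for a, n in paper_counts.items() if n >= min_author_papers}
    edges = {
        pair: w
        for pair, w in pair_counts.items()
        if w >= min_edge_collabs and pair[0] in kept and pair[1] in kept
    }
    return kept, edges


def visible_edges(edges, threshold=5):
    """Render-time filter, mirroring the viewer's edge threshold slider."""
    return {pair: w for pair, w in edges.items() if w >= threshold}
```

Storing each pair as a sorted tuple keeps the graph undirected without needing a graph library.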
Definition: authors with the most distinct co-authors visible in the current view.
The ranking is recomputed whenever the edge threshold changes — a researcher may be a top-5 hub at threshold 2 (lots of weak collaborations) but drop out at threshold 10 (only a handful of deeply repeated co-authors).
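A minimal sketch of that recomputation, assuming the edge dictionary maps author pairs to co-authored paper counts (the function name `top_hubs` and the alphabetical tie-break are illustrative choices, not taken from the viewer):

```python
from collections import defaultdict


def top_hubs(edges, threshold, k=100):
    """Rank authors by number of distinct co-authors among edges >= threshold."""
    neighbours = defaultdict(set)
    for (a, b), weight in edges.items():
        if weight >= threshold:
            neighbours[a].add(b)
            neighbours[b].add(a)
    # Highest degree first; ties broken alphabetically (an assumption).
    ranked = sorted(neighbours, key=lambda a: (-len(neighbours[a]), a))
    return [(a, len(neighbours[a])) for a in ranked[:k]]


# Hypothetical toy data: "A" has many weak ties, "E" has few strong ones.
toy_edges = {("A", "B"): 2, ("A", "C"): 2, ("A", "D"): 3,
             ("E", "F"): 12, ("E", "G"): 10}
weak_view = top_hubs(toy_edges, threshold=2, k=2)
strong_view = top_hubs(toy_edges, threshold=10, k=2)
```

In the toy data, "A" leads at threshold 2 but disappears entirely at threshold 10, which is the effect described above.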
A community is a group of authors who collaborate more densely among themselves than with the rest of the network. The atlas discovers 93 of them.
We extract the giant component at edge weight ≥ 5, the "mainland" (6,375 authors connected by 10,284 strong-tie edges), and run the Leiden algorithm (Traag, Waltman & van Eck, 2019) on it. Leiden is the modern successor to Louvain (Blondel et al., 2008): it optimises modularity and repairs a known Louvain defect whereby a community can end up internally disconnected. On the mainland we get:
| algorithm | communities | modularity Q | agreement (Louvain vs Leiden) |
|---|---|---|---|
| Louvain | 76 | 0.9398 | ARI 0.875 · NMI 0.953 (virtually the same partition) |
| Leiden (used) | 83 | 0.9408 | (reference partition) |
A Q above 0.3 is already considered strong community structure; 0.94 is textbook-grade — robotics is an extremely well-clustered field.
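For readers who want to see what Q measures, here is a small sketch of the standard weighted modularity score, Q = Σ_c [ w_in(c)/m − (s(c)/2m)² ], where m is the total edge weight, w_in(c) the weight inside community c, and s(c) the summed weighted degree of its members. Whether the atlas uses the weighted or unweighted variant is an assumption here, and the Leiden optimisation itself is not reproduced.

```python
from collections import defaultdict


def modularity(edges, community):
    """Weighted modularity of a partition.

    edges: {(a, b): weight}; community: {author: community_id}.
    """
    m = sum(edges.values())
    internal = defaultdict(float)   # edge weight inside each community
    strength = defaultdict(float)   # summed weighted degree per community
    for (a, b), w in edges.items():
        strength[community[a]] += w
        strength[community[b]] += w
        if community[a] == community[b]:
            internal[community[a]] += w
    return sum(internal[c] / m - (strength[c] / (2 * m)) ** 2 for c in strength)


# Hypothetical toy graph: two triangles joined by a single bridge edge.
toy = {("a", "b"): 1, ("b", "c"): 1, ("a", "c"): 1,
       ("d", "e"): 1, ("e", "f"): 1, ("d", "f"): 1,
       ("c", "d"): 1}
two_groups = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
one_group = {n: 0 for n in "abcdef"}
q_split = modularity(toy, two_groups)
q_merged = modularity(toy, one_group)
```

Splitting the toy graph at the bridge scores higher than lumping everything together, which always gives Q = 0.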
The mainland Leiden run only labels 6,375 authors. The remaining 17,431 have only weak ties (edge weight 2–4). We iterate a weighted-majority vote: an unlabelled author joins the community most represented among their co-authors, weighted by collaboration count. Two to three passes converge. 15,000+ fringe authors get absorbed into their strongest-connected mainland community.
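The voting step could look roughly like the sketch below. It assumes synchronous passes (all votes in a pass use the labels from the previous pass) and an arbitrary but deterministic tie-break; the name `propagate_labels` and both of those choices are illustrative, not confirmed details of the atlas script.

```python
from collections import defaultdict


def propagate_labels(edges, labels, max_passes=3):
    """Assign unlabelled authors to the community with the largest
    collaboration-weighted vote among their already-labelled co-authors."""
    adj = defaultdict(dict)
    for (a, b), w in edges.items():
        adj[a][b] = w
        adj[b][a] = w

    labels = dict(labels)               # do not mutate the caller's mapping
    for _ in range(max_passes):
        newly_labelled = {}
        for node in adj:
            if node in labels:
                continue
            votes = defaultdict(float)
            for neighbour, w in adj[node].items():
                if neighbour in labels:
                    votes[labels[neighbour]] += w
            if votes:
                # Ties go to the smallest community id (an assumption).
                newly_labelled[node] = max(sorted(votes), key=votes.get)
        if not newly_labelled:
            break                       # converged: nothing left to absorb
        labels.update(newly_labelled)
    return labels


# Hypothetical toy data: x touches both communities, y only touches x,
# and i1/i2 form an island with no path to any labelled author.
toy_edges = {("m1", "x"): 3, ("m2", "x"): 2, ("m3", "x"): 2,
             ("x", "y"): 2, ("i1", "i2"): 4}
seed_labels = {"m1": 0, "m2": 1, "m3": 1}
result = propagate_labels(toy_edges, seed_labels)
```

Authors with no path to a labelled node, like `i1` and `i2` here, stay unlabelled however many passes are run.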
The few thousand authors who are still unlabelled live on islands — connected subgraphs with no weight-2+ path to the mainland (niche labs who never collaborate outward):
Final: 83 mainland + 9 islands + 1 misc = 93 communities. Every visible node has a community.
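For each community we concatenate every paper title authored by a member into one document. The 93 documents are fed to scikit-learn's TF-IDF vectorizer, which scores each word by how often it occurs in that community's document, discounted by how many of the other communities' documents also contain it.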
The top 10 words by TF-IDF become the community's label — so we get quadrupedal · visual-inertial · racing for ETH instead of generic words like robot or control. Preprocessing: lower-case, strip a hand-picked robotics stopword list (robot, system, control, learning…), and require tokens to be ≥ 3 letters.
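To make the labelling step concrete, here is a dependency-free sketch. It is a simplified stand-in for scikit-learn's vectorizer rather than the atlas code: it uses the smoothed inverse document frequency ln((1 + N) / (1 + df)) + 1 and skips the per-document normalisation, which does not change the ranking within a document. The stopword set is only the four examples named above, and the tokenizer regex and function names are assumptions.

```python
import math
import re
from collections import Counter

# Illustrative subset only; the real hand-picked list is longer.
STOPWORDS = {"robot", "system", "control", "learning"}


def tokenize(text):
    """Lower-case, keep hyphenated words whole, drop short tokens and stopwords."""
    tokens = re.findall(r"[a-z]+(?:-[a-z]+)*", text.lower())
    return [t for t in tokens if len(t) >= 3 and t not in STOPWORDS]


def community_labels(docs, top_n=10):
    """docs: {community_id: all member paper titles concatenated}."""
    counts = {c: Counter(tokenize(text)) for c, text in docs.items()}
    n_docs = len(counts)
    doc_freq = Counter()
    for term_counts in counts.values():
        doc_freq.update(term_counts.keys())

    labels = {}
    for c, term_counts in counts.items():
        score = {
            w: tf * (math.log((1 + n_docs) / (1 + doc_freq[w])) + 1)
            for w, tf in term_counts.items()
        }
        ranked = sorted(score, key=lambda w: (-score[w], w))
        labels[c] = ranked[:top_n]
    return labels


# Hypothetical titles for two made-up communities.
docs = {
    "legged": "Quadrupedal robot locomotion. Quadrupedal control learning. "
              "Visual-inertial odometry and racing",
    "soft": "Soft gripper design. Soft robot actuator. Odometry with soft sensors",
}
labels = community_labels(docs)
```

Words shared by both documents (here "odometry") are discounted, while generic robotics vocabulary never reaches the ranking because it is removed before scoring.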
Within each community we rank members by their weighted degree in the full (weight ≥ 2) graph and keep the top 5. The tooltip shows the first 3. This picks "the professor with the most collaboration mass inside their community".
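As a rough sketch of that ranking, assuming the same `{(a, b): weight}` edge mapping and an `{author: community_id}` assignment (the function name and the alphabetical tie-break are illustrative):

```python
from collections import defaultdict


def community_hubs(edges, community, keep=5):
    """Top `keep` members of each community by weighted degree."""
    strength = defaultdict(int)
    for (a, b), w in edges.items():
        strength[a] += w      # weighted degree = summed co-authored paper counts
        strength[b] += w

    members = defaultdict(list)
    for author, c in community.items():
        members[c].append(author)

    return {
        c: sorted(ms, key=lambda a: (-strength[a], a))[:keep]
        for c, ms in members.items()
    }


# Hypothetical toy data.
toy_edges = {("A", "B"): 5, ("A", "C"): 2, ("C", "D"): 3}
toy_community = {"A": 0, "B": 0, "C": 0, "D": 1}
hubs = community_hubs(toy_edges, toy_community, keep=2)
tooltip = {c: names[:3] for c, names in hubs.items()}   # tooltip shows the first 3
```

Note that the weighted degree in this sketch counts all of an author's edges in the graph, not only those that stay inside the community.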
| | Top 100 hubs | Communities |
|---|---|---|
| Unit | individual | group |
| Metric | degree (co-author count) | Leiden modularity + TF-IDF |
| Question | "Who is the most connected researcher?" | "Which research neighbourhoods exist?" |
| Overlap | one person can span two groups | partition (each author in exactly one) |
| Sensitive to edge threshold? | yes (ranking recomputed) | no (assignment fixed at build time) |
On the mainland, Leiden finds a partition with modularity Q = 0.94. To calibrate: Q > 0.3 is usually considered "a clear community structure" in the literature; Q > 0.7 is rare outside small engineered networks. Robotics sits right at the high end — labs really are labs. Collaboration happens overwhelmingly inside the group with only occasional cross-community edges (advisor → new appointee, sabbatical, big collaborations, dual appointments).
Size distribution of the 92 non-misc communities:
| size band | count | rough analogue |
|---|---|---|
| 500+ authors | 15 | megacity — multi-decade flagship labs with many sub-teams |
| 200–499 | 18 | big city — established research groups |
| 100–199 | 26 | mid city — active single-PI or small-PI groups |
| 50–99 | 12 | town |
| 10–49 | 19 | village — niche lab or national-institute group |
| < 10 (misc) | 1,655 people | hamlets — one-off or very small teams lumped together |
"Six degrees of separation" is the claim that any two people in the world are connected by at most six acquaintance-links. It originated in a 1929 short story by Hungarian writer Frigyes Karinthy ("Chains") and was empirically tested by psychologist Stanley Milgram (1967): letters forwarded hand-to-hand across the US reached their target in a median of 5.2 steps. Popular culture knows it as the Kevin Bacon game (actor co-starring distance, median ≈ 3) and the Erdős number (mathematician co-authorship distance, median ≈ 5). Facebook (2011) measured 4.7 hops between arbitrary users worldwide.
We measured the same quantity on the weight ≥ 2 giant component (22,043 authors, 66,866 edges) using 2,000 random node pairs:
Robotics researchers are on average about 6 hops apart (5.94 in our sample), the same "six degrees" Karinthy guessed in 1929 for all of humanity, reproduced within a single research field. The field is a textbook small-world network: mostly-local clustering plus a handful of long-range shortcuts. For context, ln(22,043) ≈ 10.0, and the usual random-graph estimate ln N / ln⟨k⟩, with mean degree ⟨k⟩ = 2 × 66,866 / 22,043 ≈ 6.1, gives about 5.5. The observed 5.94 is therefore roughly as short as pure-random short-cutting would predict, even though collaboration is overwhelmingly local; the hubs (big labs collaborating widely) supply the shortcuts that keep everyone close.
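A sampling measurement of this kind can be sketched with a breadth-first search over random node pairs. The helper names, the fixed random seed, and the choice to ignore edge weights when counting hops are assumptions; the last function computes the common random-graph yardstick ln N / ln⟨k⟩ for comparison.

```python
import math
import random
from collections import defaultdict, deque


def adjacency(edges):
    """Build an unweighted adjacency map from an iterable of (a, b) pairs."""
    adj = defaultdict(set)
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    return adj


def hop_distance(adj, source, target):
    """Shortest path length in hops via BFS; None if the two are disconnected."""
    if source == target:
        return 0
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        node, dist = queue.popleft()
        for neighbour in adj[node]:
            if neighbour == target:
                return dist + 1
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, dist + 1))
    return None


def mean_separation(adj, n_pairs=2000, seed=0):
    """Average hop distance over randomly sampled connected node pairs."""
    rng = random.Random(seed)
    nodes = sorted(adj)
    distances = []
    for _ in range(n_pairs):
        s, t = rng.sample(nodes, 2)
        d = hop_distance(adj, s, t)
        if d is not None:
            distances.append(d)
    return sum(distances) / len(distances)


def random_graph_estimate(n_nodes, n_edges):
    """ln N / ln<k>: typical path length of a random graph of the same density."""
    mean_degree = 2 * n_edges / n_nodes
    return math.log(n_nodes) / math.log(mean_degree)


# Hypothetical toy graph: a ring of six authors.
ring = adjacency([("a", "b"), ("b", "c"), ("c", "d"),
                  ("d", "e"), ("e", "f"), ("f", "a")])
ring_mean = mean_separation(ring, n_pairs=200)
yardstick = random_graph_estimate(22_043, 66_866)
```

Sampling pairs keeps the cost manageable: an exact all-pairs average on a component of this size would need one BFS per node.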
The Show 2nd/3rd degree connections toggle highlights neighbours up to 3 hops from the selected author. But the path-length distribution above shows that only about 11% of author pairs are within 3 hops; the remaining 89% are 4 hops or more apart. Inside a community, too, members are often 5–8 hops apart (think PhD student → advisor → co-author → long-time collaborator → …). So when you select an author and their same-community peers don't light up, it's not a bug: it's the small-world structure at work. To visually reach everyone in the community, the toggle would need to extend to roughly 6 hops, at which point almost the entire mainland lights up.
A community is best thought of as a research family tree: not everyone shares an edge with the patriarch, but everyone is connected via advisor / co-author chains to a few top hubs at the centre. The hubs field in the tooltip lists the patriarchs; the rest are second/third/fourth-cousins who inherited the lab's style through their supervisors. This is related to the established concept of academic genealogy, which traces explicit advisor-advisee links — our network recovers a noisier version of the same structure from pure publication co-authorship.
All of the above is implemented in two scripts:
_make_coauthor_network.py: builds the graph and the HTML viewer.
_enrich_communities.py: runs Leiden + label propagation + TF-IDF and writes the community field back into coauthor_network.json.
Source: github.com/gisbi-kim/robopaper-atlas.