RoboPaper Atlas
← Back to the co-author network

How the co-author network works

Two analyses share the same graph: Top 100 hubs ranks individuals, Communities discovers groups. They answer different questions.

The underlying graph

Authors are nodes. An undirected edge joins two authors if they co-published at least one paper across the 13 venues indexed by the atlas (ICRA · IROS · RA-L · T-RO · RSS · IJRR · Sci-Rob · SoRo · T-Mech · T-FR · RA-P · T-ASE · RAM). Edge weight = how many papers they co-authored together.

Two thresholds prune the graph before visualisation:

Result: 23,806 authors connected by 68,598 edges in the exported dataset. The Edge threshold slider on the viewer further filters edges at render time (default = 5, i.e. only show pairs who co-wrote 5+ papers).

Top 100 hubs — who is the most connected?

Definition: authors with the most distinct co-authors visible in the current view.

hub_score(author) = degree(author)     (= number of co-authors in the currently-visible subgraph)
tie-break = weighted degree(author)   (= Σ edge weights = total collaborations)

The ranking is recomputed whenever the edge threshold changes — a researcher may be a top-5 hub at threshold 2 (lots of weak collaborations) but drop out at threshold 10 (only a handful of deeply repeated co-authors).

Hubs measure individual reach. A hub can bridge two unrelated research communities (e.g. someone who writes occasionally with both an aerial lab and a SLAM lab); that bridging role is not captured by community detection, which must ultimately assign them to one side.

Communities — who are the research neighbourhoods?

A community is a group of authors who collaborate more densely among themselves than with the rest of the network. The atlas discovers 93 of them.

Step 1 — Detect the mainland with Leiden

We extract the giant component at edge weight ≥ 5 (6,375 authors connected by 10,284 strong-tie edges) and run the Leiden algorithm (Traag, Waltman & van Eck, 2019) on it. Leiden is the modern successor to Louvain (Blondel et al., 2008): it optimises modularity

Q = Σcommunities [ (edges inside) − (expected edges under random rewiring) ]

and repairs Louvain's bug where a community can end up internally disconnected. On the mainland we get:

algorithmcommunitiesmodularity Qagreement
Louvain760.9398ARI 0.875 · NMI 0.953 — virtually the same partition
Leiden (used)830.9408

A Q above 0.3 is already considered strong community structure; 0.94 is textbook-grade — robotics is an extremely well-clustered field.

Step 2 — Label propagation to absorb the fringe

The mainland Leiden run only labels 6,375 authors. The remaining 17,431 have only weak ties (edge weight 2–4). We iterate a weighted-majority vote: an unlabelled author joins the community most represented among their co-authors, weighted by collaboration count. Two to three passes converge. 15,000+ fringe authors get absorbed into their strongest-connected mainland community.

Step 3 — Islands and the misc bucket

The few thousand authors who are still unlabelled live on islands — connected subgraphs with no weight-2+ path to the mainland (niche labs who never collaborate outward):

Final: 83 mainland + 9 islands + 1 misc = 93 communities. Every visible node has a community.

Step 4 — Topic labels (TF-IDF on paper titles)

For each community we concatenate every paper title authored by a member into one document. 93 documents are fed to scikit-learn's TF-IDF vectorizer, which scores each word by

tfidf(word, community) = tf(word in this community) × log(total communities / communities containing word)

The top 10 words by TF-IDF become the community's label — so we get quadrupedal · visual-inertial · racing for ETH instead of generic words like robot or control. Preprocessing: lower-case, strip a hand-picked robotics stopword list (robot, system, control, learning…), and require tokens to be ≥ 3 letters.

Step 5 — Hub authors per community

Within each community we rank members by their weighted degree in the full (weight ≥ 2) graph and keep the top 5. The tooltip shows the first 3. This picks "the professor with the most collaboration mass inside their community".

When to use which

Top 100 hubsCommunities
Unitindividualgroup
Metricdegree (co-author count)Leiden modularity + TF-IDF
Question"Who is the most connected researcher?""Which research neighbourhoods exist?"
Overlapone person can span two groupspartition (each author in exactly one)
Sensitive to edge threshold?yes (ranking recomputed)no (assignment fixed at build time)

Findings & discussion

The robotics community is textbook-modular

On the mainland, Leiden finds a partition with modularity Q = 0.94. To calibrate: Q > 0.3 is usually considered "a clear community structure" in the literature; Q > 0.7 is rare outside small engineered networks. Robotics sits right at the high end — labs really are labs. Collaboration happens overwhelmingly inside the group with only occasional cross-community edges (advisor → new appointee, sabbatical, big collaborations, dual appointments).

Size distribution of the 92 non-misc communities:

size bandcountrough analogue
500+ authors15megacity — multi-decade flagship labs with many sub-teams
200–49918big city — established research groups
100–19926mid city — active single-PI or small-PI groups
50–9912town
10–4919village — niche lab or national-institute group
< 10 (misc)1,655 peoplehamlets — one-off or very small teams lumped together

Six degrees of separation — in our data

"Six degrees of separation" is the claim that any two people in the world are connected by at most six acquaintance-links. It originated in a 1929 short story by Hungarian writer Frigyes Karinthy ("Chains") and was empirically tested by psychologist Stanley Milgram (1967): letters forwarded hand-to-hand across the US reached their target in a median of 5.2 steps. Popular culture knows it as the Kevin Bacon game (actor co-starring distance, median ≈ 3) and the Erdős number (mathematician co-authorship distance, median ≈ 5). Facebook (2011) measured 4.7 hops between arbitrary users worldwide.

We measured the same quantity on the weight ≥ 2 giant component (22,043 authors, 66,866 edges) using 2,000 random node pairs:

mean shortest path = 5.94 hops median = 6 diameter (2-sweep) ≈ 17 hops   1 hop: 0.1% 2: 0.2% 3: 1.9% 4: 8.6% ███ 5: 26.2% ██████████ 6: 34.1% █████████████ 7: 18.1% ███████ 8: 7.7% ███ 9: 2.2% 10+: 0.9%

Robotics researchers are on average 6 hops apart — the same "six degrees" Karinthy guessed in 1929 for all of humanity, reproduced within a single research field. The field is a textbook small-world network: mostly-local clustering plus a handful of long-range shortcuts. For context, log(22,043) ≈ 9.9, so the observed 5.94 is tighter than what pure-random short-cutting would predict — the hub structure (big labs collaborating widely) pulls everyone closer together.

Why "Show 2nd/3rd degree connections" leaves gaps

The Show 2nd/3rd degree connections toggle highlights neighbours up to 3 hops from the selected author. But the path-length distribution above shows that only about 11% of author pairs are within 3 hops — the remaining 89% are 4 hops or more. Inside a community, too, members are often 5–8 hops apart (think PhD student → advisor → co-author → long-time collaborator → …). So when you select an author and their same-community peers don't light up, it's not a bug — it's the small-world structure at work. To actually visually bridge to everyone in the community, the toggle would need to go to roughly 6 hops, at which point almost the entire mainland lights up.

The "academic family" analogy

A community is best thought of as a research family tree: not everyone shares an edge with the patriarch, but everyone is connected via advisor / co-author chains to a few top hubs at the centre. The hubs field in the tooltip lists the patriarchs; the rest are second/third/fourth-cousins who inherited the lab's style through their supervisors. This is related to the established concept of academic genealogy, which traces explicit advisor-advisee links — our network recovers a noisier version of the same structure from pure publication co-authorship.

Repo & source code

All of the above is implemented in two scripts:

Source: github.com/gisbi-kim/robopaper-atlas.

← Back to the co-author network