The Citation Graveyard: What 20 Years of Wikipedia Link Churn Reveals About SEO

Every link that ever pointed from a Wikipedia article to an SEO tool, agency, or thought leader is a small historical artifact. Most of them are dead now. And today I built a museum for them.

The Problem With Wikipedia Data

I have been building the Wiki Analysis feature on seobandwagon.dev — a dashboard that maps 20 years of SEO history through 51 Wikipedia articles. We are talking everything from Search Engine Optimization to Google Ads to Danny Sullivan, with historical snapshots pulled from the Wayback Machine and SERP data from DataForSEO.

The dataset was getting good: 298 historical snapshots across 51 articles, with more than 900,000 in estimated monthly traffic between them. But something was missing. The raw data tells you what is cited now, but it does not tell you what got dropped along the way.

That is where the Citation Graveyard was born.

Building the Graveyard

The idea was simple: compare citations across time periods and surface the ones that disappeared. A domain that appears in a 2008 Wikipedia snapshot but vanishes by 2024 is not just gone; it is historically informative. It means Wikipedia editors (and by extension, the SEO community's collective wisdom) decided it was not worth citing anymore.
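To make the comparison concrete, here is a minimal sketch of the idea, not the production query. It assumes each snapshot has already been reduced to a set of cited domains keyed by year; the function name `citation_graveyard`, the `years_cited` field, and the sample data are all illustrative.

```python
from typing import Dict, List, Set


def citation_graveyard(snapshots: Dict[int, Set[str]]) -> List[dict]:
    """Return domains cited in some snapshot but absent from the latest one."""
    years = sorted(snapshots)
    latest = snapshots[years[-1]]

    first_seen: Dict[str, int] = {}
    last_seen: Dict[str, int] = {}
    for year in years:
        for domain in snapshots[year]:
            first_seen.setdefault(domain, year)  # keep the earliest year only
            last_seen[domain] = year             # overwritten until the last sighting

    dropped = [
        {
            "domain": d,
            "first_seen": first_seen[d],
            "last_seen": last_seen[d],
            # Span between first and last sighting; gaps in between are ignored.
            "years_cited": last_seen[d] - first_seen[d],
        }
        for d in first_seen
        if d not in latest
    ]
    # Longest-cited first, ties broken alphabetically.
    dropped.sort(key=lambda row: (-row["years_cited"], row["domain"]))
    return dropped


# Illustrative data only; example.org stands in for any dropped domain.
example = {
    2008: {"seomoz.org", "example.org", "google.com"},
    2011: {"seomoz.org", "google.com"},
    2024: {"moz.com", "google.com"},
}
graveyard = citation_graveyard(example)
```

Because snapshots are sparse, `years_cited` here is only a lower bound on how long a domain was actually cited.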

The implementation was less simple. Historical Wikipedia data is messy. Not all articles have snapshots from every year. Some 2008 snapshots have suspiciously low word counts, which points to data gaps from the original pipeline. And the link extraction varied by era because Wikipedia's markup format evolved.
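One way to tame the era-to-era variation is to reduce every link to a normalized domain before comparing anything. The sketch below is an assumption about how that could look, not the pipeline described here: it uses only the Python standard library, unwraps Wayback Machine rewritten links of the form `web.archive.org/web/<timestamp>/<original-url>`, and skips Wikipedia's own links. The class and function names are hypothetical.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urlsplit

# Wayback Machine rewrites links as /web/<timestamp>[modifier]/<original URL>.
WAYBACK_PREFIX = re.compile(r"^https?://web\.archive\.org/web/\d+(?:[a-z]{2}_)?/(.+)$")


class ExternalLinkParser(HTMLParser):
    """Collects normalized external domains from <a href> attributes."""

    def __init__(self) -> None:
        super().__init__()
        self.domains: set = set()

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href") or ""
        match = WAYBACK_PREFIX.match(href)
        if match:
            href = match.group(1)          # recover the originally cited URL
        if href.startswith("//"):
            href = "https:" + href         # protocol-relative link
        parts = urlsplit(href)
        if parts.scheme not in ("http", "https") or not parts.hostname:
            return                         # relative links, anchors, mailto:, etc.
        host = parts.hostname              # urlsplit already lowercases this
        if host.startswith("www."):
            host = host[4:]
        if host.endswith("wikipedia.org") or host.endswith("archive.org"):
            return                         # internal or archive plumbing, not a citation
        self.domains.add(host)


def extract_domains(html: str) -> set:
    parser = ExternalLinkParser()
    parser.feed(html)
    return parser.domains


sample = (
    '<a href="https://web.archive.org/web/20080101000000/http://www.seomoz.org/blog">a</a>'
    '<a href="/wiki/SEO">b</a>'
    '<a href="//en.wikipedia.org/wiki/Foo">c</a>'
    '<a href="http://WWW.Example.com/a">d</a>'
)
domains = extract_domains(sample)
```

Stripping `www.` and lowercasing keeps the same site from showing up as two different citations across snapshots.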

After a few rounds of query iteration, I had it: a new public page at /wiki-analysis/churn with three tabs:

  • Citation Graveyard — domains dropped from Wikipedia SEO articles, sorted by how long they were cited before disappearing
  • New Citations (2017+) — fresh faces that earned a Wikipedia mention in the last several years
  • SEOMoz to Moz Rebrand Timeline — and this is where it got interesting

The Plot Twist Hidden in the Data

I was not expecting to find a clean confirmation of a major industry rebrand hiding in link data — but there it was. seomoz.org appears consistently in Wikipedia citations from 2008 through 2011, then drops completely. moz.com shows up starting in 2014 and holds through the present.
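A handoff like this can be detected mechanically. The following is a rough sketch under the same assumption as before (one set of cited domains per snapshot year); the helper names and the snapshot years in the example are illustrative, chosen only to match the ranges described above.

```python
from typing import Dict, List, Optional, Set, Tuple


def years_present(snapshots: Dict[int, Set[str]], domain: str) -> List[int]:
    """Sorted list of snapshot years in which the domain is cited."""
    return sorted(year for year, domains in snapshots.items() if domain in domains)


def handoff_window(
    snapshots: Dict[int, Set[str]], old: str, new: str
) -> Optional[Tuple[int, int]]:
    """Return (last year of old, first year of new) if old ends before new begins."""
    old_years = years_present(snapshots, old)
    new_years = years_present(snapshots, new)
    if not old_years or not new_years:
        return None
    last_old, first_new = old_years[-1], new_years[0]
    return (last_old, first_new) if last_old < first_new else None


# Illustrative snapshot years; only the two domains of interest are shown.
snapshots = {
    2008: {"seomoz.org"},
    2011: {"seomoz.org"},
    2014: {"moz.com"},
    2024: {"moz.com"},
}
window = handoff_window(snapshots, "seomoz.org", "moz.com")
```

The returned window brackets the years in which the change must have happened, which is as precise as sparse snapshots allow.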

That is not a coincidence. That is the 2013 rebrand of SEOmoz to Moz under Rand Fishkin, preserved in amber inside Wikipedia's citation history. The community knew: the tool changed names, so the citations changed domains. The data validated itself.

It is one of those moments where you are doing routine data plumbing and suddenly realize you have accidentally built a timeline of SEO industry history. The Matt Cutts article dropped from 8 external links in 2008 to 1 by 2024. That reads like a story about trust signals eroding as Google became more opaque. The graveyard tells that story without editorial commentary, just data.

The Vocabulary of SEO: 20 Years

After shipping the graveyard, I kept the momentum going and built the N-gram timeline: a section that tracks which words dominated SEO writing over time.

Three tabs here too:

  • Trending — rising terms (content marketing, mobile, user experience) vs. fading ones (pagerank sculpting, spamdexing)
  • Article View — pick any two years for any article, compare the top terms side-by-side
  • Term Search — type a term, get a sparkline across 7 years plus which articles used it each year

The trending data tells a story you already know intuitively but rarely see quantified: pagerank sculpting was a real thing people wrote about in SEO articles. Then Google made it clear that was gaming the system, and the term literally disappeared from the corpus. Mobile went from an occasional mention to a dominant signal. User experience went from zero to everywhere.
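For a sense of how a term's rise or fall can be quantified, here is a small sketch that measures a term's share of all n-grams of the same length in each year's text. It assumes plain article text per year is available; the tokenizer, the function names, and the two-sentence corpus are simplifications made up for illustration.

```python
import re
from collections import Counter
from typing import Dict, List


def ngrams(text: str, n: int) -> List[str]:
    """Lowercase word n-grams, using a deliberately simple tokenizer."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def term_share(text: str, term: str) -> float:
    """Fraction of same-length n-grams in the text that equal the term."""
    n = len(term.split())
    grams = ngrams(text, n)
    if not grams:
        return 0.0
    return Counter(grams)[term.lower()] / len(grams)


def trend(corpus_by_year: Dict[int, str], term: str) -> Dict[int, float]:
    """Term share per year, in chronological order."""
    return {
        year: term_share(text, term)
        for year, text in sorted(corpus_by_year.items())
    }


# Toy corpus for illustration only.
corpus = {
    2008: "PageRank sculpting and link building",
    2024: "Mobile user experience and content marketing",
}
fading = trend(corpus, "pagerank sculpting")
rising = trend(corpus, "user experience")
```

Using a share rather than a raw count keeps longer snapshots from dominating the trend line.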

The algorithm changed, and the vocabulary followed.

SERP Data for the People Pages

We also added people and company pages to the dataset today: Danny Sullivan, Matt Cutts, Barry Schwartz, John Mueller, Semrush, SpyFu, Neil Patel, and others. That meant pulling DataForSEO SERP data for their Wikipedia pages.

Quick lesson learned: DataForSEO's ranked_keywords/live endpoint is domain-level, not page-level. You cannot pass a specific Wikipedia URL as the target. You have to target the entire domain and then filter the results by URL. Once I knew that, the pull worked cleanly. 49 of 51 pages now have SERP data. Two pages (Search Engine Land and Yoast) legitimately have no Wikipedia ranking data, which is itself interesting.
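The filtering step might look something like the sketch below. It does not call the API and makes no claim about DataForSEO's actual response format: it assumes the domain-level results have already been flattened into dictionaries with hypothetical `keyword`, `url`, and `position` keys, and the second sample row is made up.

```python
from typing import Dict, List
from urllib.parse import unquote, urlsplit


def same_page(url_a: str, url_b: str) -> bool:
    """Compare host and path, ignoring scheme, query, fragment, and trailing slash."""
    a, b = urlsplit(url_a), urlsplit(url_b)
    return (
        a.hostname == b.hostname
        and unquote(a.path).rstrip("/") == unquote(b.path).rstrip("/")
    )


def keywords_for_page(rows: List[Dict], page_url: str) -> List[Dict]:
    """Keep only the domain-level rows whose ranking URL is the given page."""
    return [row for row in rows if same_page(row["url"], page_url)]


# Hypothetical flattened rows for the whole en.wikipedia.org domain.
rows = [
    {"keyword": "semrush", "url": "https://en.wikipedia.org/wiki/Semrush", "position": 3},
    {"keyword": "seo", "url": "https://en.wikipedia.org/wiki/Search_engine_optimization", "position": 1},
]
hits = keywords_for_page(rows, "http://en.wikipedia.org/wiki/Semrush/")
```

Matching on host and path rather than the raw string avoids missing rows over an `http` versus `https` or trailing-slash difference.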

Semrush's Wikipedia article ranks #3 for the keyword "semrush", which gets 90,500 monthly searches. That is what brand authority looks like at scale.

What This Is Actually Building Toward

The Wiki Analysis tool is getting close to public launch. The citation churn and vocabulary data are not just interesting side features — they are the kind of insight that makes someone bookmark a tool and come back to it.

Wikipedia has been cataloging what the SEO industry considers credible for two decades. Every link added, every link dropped, every rebrand, every shift in vocabulary — it is all in there. We are just finally reading it properly.

The graveyard will keep growing. So will everything buried in it.
