The networking industry has been undergoing a quiet revolution, moving away from archaic, string-heavy command-line interfaces (CLIs) and legacy protocols like SNMP. In modern networks, gNMI (gRPC Network Management Interface) is establishing itself as the standard for telemetry and configuration. But protocol interfaces are only half the battle. Agreeing on the data schemas (representing the thousands of configuration knobs and operational states of a core router) is a far harder challenge.
This is where YANG (Yet Another Next Generation) comes in. Published by the IETF, YANG is a data modeling language designed specifically to model the configuration, state, and RPCs of network hardware.
To most software engineers, YANG looks completely alien. But it contains architectural design choices that solve problems web developers are only beginning to grapple with.
Decoupling the Wire from the Schema
The most striking design choice in YANG is the total separation of the domain model from the serialization format and transport protocol.
In the web development world, schema systems are almost always tightly bound to their transport layer. An OpenAPI (Swagger) spec defines HTTP verbs, JSON payloads, and REST endpoints simultaneously. A GraphQL schema defines the queries and mutations clients can send, and in practice those operations almost always travel over HTTP.
YANG, by contrast, models the domain (e.g., a physical linecard, an IP interface, or a BGP routing table) as an abstract tree, completely independent of the wire format. The exact same YANG model can be:
- Serialized to XML and transported over SSH using NETCONF.
- Serialized to JSON and transported over HTTP using RESTCONF.
- Serialized to Protocol Buffers or JSON and streamed over HTTP/2 using gNMI.
graph TD
Y[YANG Schema] -->|Serialized as XML| NC[NETCONF over SSH]
Y -->|Serialized as JSON| RC[RESTCONF over HTTP]
Y -->|Serialized as Protobuf| GN[gNMI over HTTP/2]
This decoupling allows network hardware to support multiple management protocols simultaneously without changing the underlying business logic or data structures.
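Go’s standard library offers a small-scale analogy for this decoupling. In the sketch below (an illustration, not YANG tooling), one hypothetical `Interface` struct is the model, and its struct tags describe two independent encodings of it. Real NETCONF or gNMI stacks add their own envelopes and path encodings on top; only the payload layer is shown here.

```go
package main

import (
	"encoding/json"
	"encoding/xml"
	"fmt"
)

// Interface is a hypothetical domain model, loosely mirroring a YANG list entry.
// The struct tags describe two independent wire encodings of the same data.
type Interface struct {
	XMLName xml.Name `json:"-" xml:"interface"`
	Name    string   `json:"name" xml:"name"`
	Enabled bool     `json:"enabled" xml:"enabled"`
}

// encodeBoth serializes one model instance to JSON and to XML.
func encodeBoth(iface Interface) (string, string, error) {
	j, err := json.Marshal(iface)
	if err != nil {
		return "", "", err
	}
	x, err := xml.Marshal(iface)
	if err != nil {
		return "", "", err
	}
	return string(j), string(x), nil
}

func main() {
	j, x, err := encodeBoth(Interface{Name: "eth0", Enabled: true})
	if err != nil {
		panic(err)
	}
	fmt.Println(j) // {"name":"eth0","enabled":true}
	fmt.Println(x) // <interface><name>eth0</name><enabled>true</enabled></interface>
}
```

The model never mentions a transport; adding a third encoding would mean adding tags or a marshaler, not changing the struct’s meaning.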
Why Not JSON Schema, GraphQL, or Smithy?
When software engineers first encounter YANG, they naturally ask: Why invent a new DSL? Why not just use JSON Schema, GraphQL, or AWS Smithy?
Part of the answer is historical: YANG development started in 2008, long before GraphQL or Smithy existed. But even today, standard software engineering tools fail to address the unique constraints of hardware configuration.
- The Config vs. State Split: A router has a strict boundary between Intended State (what you want the device to do, e.g., configure BGP neighbor 192.0.2.1) and Operational State (what the hardware is actually doing right now, e.g., BGP session is `ESTABLISHED`, packet counters are incrementing). YANG tags nodes natively with `config true` (read-write config) and `config false` (read-only state).
- Referential Integrity: In database systems, foreign keys guarantee that if a row references a user, that user must exist. In a router, there is no SQL database. Instead, the configuration payload itself must maintain referential integrity. If you configure a routing protocol to use interface `GigabitEthernet0/1`, YANG’s native `leafref` type validates at the schema layer that the interface is actually defined in the configuration. The router will reject the entire transaction before applying it if the reference is dangling.
- Derived Identity Types: YANG supports extensible `identity` definitions, allowing vendors to subclass base identities (like routing protocols or interface media types) without rewriting the core schemas.
GraphQL SDL can model config and state via queries and mutations, but it is an API contract detailing client-server transport rather than a domain modeling language. AWS Smithy is conceptually the closest modern equivalent, modeling cloud services independently of transport protocols. However, Smithy’s tooling focuses on generating cloud SDKs and API gateways, whereas YANG is optimized for embedded hardware validation, transaction commits, and deep configuration trees.
The Promise of OpenConfig
Historically, hardware vendors (Cisco, Juniper, Arista) published their own proprietary YANG models. The path to enable an interface on a Cisco router was completely different from a Juniper switch, forcing operators to write complex translation layers.
To solve this, the OpenConfig working group (backed by major operators like Google, Microsoft, and Meta) designed a set of vendor-neutral YANG models. They enforced a strict clean architecture pattern separating config and state into sibling containers under every list item.
Here is a simplified look at the OpenConfig interfaces model:
container interfaces {
list interface {
key "name";
leaf name {
type leafref {
path "../config/name";
}
}
container config {
leaf name {
type string;
}
leaf enabled {
type boolean;
}
}
container state {
config false; // Locks this entire branch as read-only operational state
leaf enabled {
type boolean;
}
leaf oper-status {
type enumeration {
enum UP;
enum DOWN;
}
}
}
}
}
Using tools like pyang (with -f tree), this YANG schema renders as a clean, readable tree:
module: openconfig-interfaces
+--rw interfaces
+--rw interface* [name]
+--rw name -> ../config/name
+--rw config
| +--rw name? string
| +--rw enabled? boolean
+--ro state
+--ro enabled? boolean
+--ro oper-status? enumeration
A network automation script reading this schema instantly knows it can edit /interfaces/interface/config/enabled, but must only read /interfaces/interface/state/oper-status.
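The inheritance rule behind `config false` is simple enough to sketch. Assuming a hand-built schema tree that loosely mirrors the interfaces model above (the `node` type and `isWritable` helper are hypothetical, not ygot or pyang APIs), a path is writable only if no node along it is marked `config false`:

```go
package main

import (
	"fmt"
	"strings"
)

// node is a hypothetical, minimal schema node. config mirrors YANG's optional
// "config" statement: nil means "inherit from the parent".
type node struct {
	config   *bool
	children map[string]*node
}

func boolp(b bool) *bool { return &b }

// schema loosely mirrors the interfaces tree (list keys omitted).
var schema = &node{children: map[string]*node{
	"interfaces": {children: map[string]*node{
		"interface": {children: map[string]*node{
			"name": {},
			"config": {children: map[string]*node{
				"name":    {},
				"enabled": {},
			}},
			"state": {config: boolp(false), children: map[string]*node{
				"enabled":     {},
				"oper-status": {},
			}},
		}},
	}},
}}

// isWritable walks a slash-separated data path. Top-level nodes default to
// config true; once a node is config false, its whole subtree stays read-only
// (YANG does not allow config true beneath config false).
func isWritable(path string) (bool, error) {
	cur, writable := schema, true
	for _, seg := range strings.Split(strings.Trim(path, "/"), "/") {
		next, ok := cur.children[seg]
		if !ok {
			return false, fmt.Errorf("unknown node %q in %s", seg, path)
		}
		if writable && next.config != nil {
			writable = *next.config
		}
		cur = next
	}
	return writable, nil
}

func main() {
	for _, p := range []string{
		"/interfaces/interface/config/enabled",
		"/interfaces/interface/state/oper-status",
	} {
		w, err := isWritable(p)
		fmt.Println(p, w, err)
	}
}
```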
Dev UX
While YANG is architecturally brilliant, the developer experience for software engineers is notoriously painful.
Because YANG is a custom DSL, the software ecosystem is narrow. If you write C or C++, you use libyang. Python developers rely on pyang and pyangbind. In Go, the standard tool is ygot, which generates large Go structs from YANG files, complete with path-mapping helpers. But if you work in Rust, TypeScript, or Swift, you are largely on your own.
Furthermore, compiling and validating these schemas is computationally heavy. A modern router’s schema consists of hundreds of imported modules, augmented fields, and cross-tree validation rules. Building the schema AST in memory can consume hundreds of megabytes of RAM and take several seconds of CPU time, long before a single byte of telemetry is processed.
To mitigate this, tooling often compiles YANG models into optimized native code (such as static Go/C structs) or intermediate schemas (like Protocol Buffers) at build time. This allows applications to reap the architectural benefits of YANG’s validation and type safety without directly parsing or loading expensive schemas at runtime.
Interesting Observations
YANG’s design is best read as a set of trade-offs rather than hard-and-fast rules. Each of its technical choices has distinct pros and cons:
- Model the Domain, Not the API:
  - Pros: Building schemas that describe the core domain (like AWS Smithy or YANG) rather than binding schemas directly to JSON/REST protocols ensures longevity and enables protocol migration (e.g., REST to gRPC) without structural rewrites.
  - Cons: It introduces substantial mapping and translation overhead, requiring serialization layers to map abstract domain structures onto concrete transport envelopes.
- First-Class Operational State:
  - Pros: Separating the “intended configuration” from the “current operational reality” is a powerful pattern that prevents exposing mutable and immutable resource fields in a single flat JSON object.
  - Cons: It doubles the schema complexity, forcing developers to model every attribute twice (once as configuration and once as state) and manage the reconciliation logic between them.
- Schema-Level Referential Validation:
  - Pros: Enforcing relational integrity directly in the schema payload (using constructs like `leafref`) is highly effective for distributed pipelines, catching invalid references before they reach the device.
  - Cons: It makes validation computationally expensive. Validating a single payload requires loading the entire dependency graph in memory, making schema evaluation resource-heavy and slow.
- Self-Documenting Schemas:
  - Pros: Similar to OpenAPI, YANG is excellent at acting as the authoritative documentation for the system. Native `description` and `reference` statements make YANG modules highly self-documenting, and tooling can easily parse them to generate clean API documentation or interactive tree diagrams.
  - Cons: Without specialized rendering tools (like `pyang`), the raw YANG DSL is extremely verbose and difficult for developers outside the networking domain to read or scan directly.
Ultimately, YANG’s architecture is a beast to learn and expensive to run, but it is tailored to handle the complexity of physical infrastructure. It presents a clear trade-off: a heavy runtime and developer tax worth paying to replace fragile CLI regex parsing with type-safe, validated automation contracts.
You can write the perfect README, record a flawless terminal GIF, and polish your API docs for days. But the second a developer has to run brew install or copy a weird curl script just to test your tool, half of them will bounce. People are busy. They do not want to trash their local environment for a test drive.
The best approach is to just show them how it works. For Go developers, WebAssembly (WASM) is the most effective way to do that.
Seeing is Believing
The most direct way to prove your library works is to let someone use it right now in the browser.
For JavaScript developers, this has been the standard for a long time. For systems languages like Go, providing a live demo used to require a backend to execute code safely. That was usually expensive, slow, and hard to secure.
WASM changes that. You can compile your Go code into a binary that runs in the user’s browser. There is no backend, no network round-trip after the first load, and it runs in a safe sandbox.
I ran into this wall recently with FauxRPC. It generates mock data from Protobufs. Explaining that in a README resulted in a lot of blank stares. But letting someone paste their own .proto file into a browser window and immediately see the JSON spit out? That converted them instantly.

FauxRPC running in the browser via WASM.
Building a WASM Bridge
Bridging Go and the browser’s JavaScript environment mostly comes down to the syscall/js package. You just need to register your function in the global JS scope. It looks something like this:
package main
import (
"fmt"
"slices"
"syscall/js"
)
func main() {
fmt.Println("Go WebAssembly Initialized")
// Register a function in the global JavaScript scope
js.Global().Set("processData", js.FuncOf(processData))
// Keep the program alive so functions remain callable
select {}
}
func processData(this js.Value, args []js.Value) any {
if len(args) < 1 {
return "Error: No input provided"
}
input := args[0].String()
fmt.Printf("Processing input: %s\n", input)
// In a real demo, this is where you'd call your library.
// We'll reverse the string as a simple placeholder logic.
runes := []rune(input)
slices.Reverse(runes)
return string(runes)
}
The select {} part is important. It keeps the Go program running indefinitely so your functions stay available to JavaScript. Without it, the program would exit and the bridge would break.
Compiling to WASM
Compiling your Go code for the browser is a one-liner. You just need to set the GOOS and GOARCH environment variables:
GOOS=js GOARCH=wasm go build -o demo.wasm ./go/demo/main.go
Loading in the Browser
You will need a glue file called wasm_exec.js to actually load this in the browser. Luckily, Go already ships with it. Just pull it from your installation (Go 1.24 and later keep it in lib/wasm; older releases ship it in misc/wasm):
cp "$(go env GOROOT)/lib/wasm/wasm_exec.js" ./static/js/
After that, you can load and run your WASM module with a little bit of JavaScript. Using an input event listener with a debounce makes the interaction feel more natural:
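// Minimal debounce helper: runs fn only after `ms` milliseconds without a new call.
function debounce(fn, ms) {
  let timer;
  return (...args) => {
    clearTimeout(timer);
    timer = setTimeout(() => fn(...args), ms);
  };
}

const go = new Go();
WebAssembly.instantiateStreaming(fetch("demo.wasm"), go.importObject).then((result) => {
  go.run(result.instance); // main() registers processData, then blocks on select {}
  const input = document.getElementById('my-input');
  input.addEventListener('input', debounce(() => {
    console.log(processData(input.value));
  }, 300));
});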
Binary Size
Let’s be honest, Go binaries are notoriously thick. A basic hello-world WASM build sits at around 2MB. Start importing packages like encoding/json or protoreflect, and suddenly you are staring at a 10MB payload.
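If you need a smaller binary, you can use TinyGo (note that TinyGo builds must be loaded with TinyGo’s own wasm_exec.js, not the one from the standard Go toolchain):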
tinygo build -o demo.wasm -target wasm ./go/demo/main.go
For the minimal example we used above, here is the size difference:
| Compiler | Raw Size | Gzipped Size |
|---|---|---|
| Go | 2.5 MB | 758 KB |
| TinyGo | 702 KB | 239 KB |
A 10MB payload isn’t great for a landing page. Even the minimal example is close to 800 KB gzipped with the standard Go compiler.
Reducing the impact
- TinyGo: It produces much smaller binaries, but it doesn’t support the entire standard library. If your code uses complex reflection, TinyGo might not work out of the box.
- Selective Imports: Be ruthless about what you import. Some standard library packages are surprisingly heavy in a WASM context:
  - fmt: Including `fmt.Printf` or `fmt.Sprintf` can pull in a large chunk of the reflection and formatting logic. For simple debugging, the builtin `println()` adds almost no weight.
  - encoding/json: This relies heavily on reflection. If you have a massive JSON structure, consider whether you can simplify the interaction or use a code-gen based parser like easyjson.
  - net/http: This is the big one. If you need to make API calls, don’t use Go’s `http.Client`.
- Analyze Your Binary: You can use `go tool nm` to see what is taking up space: `go tool nm -size -sort size demo.wasm | head -n 20`
- Compression: Make sure your web server is using Gzip or Brotli. It can turn a 10MB WASM file into 2 or 3MB.
- UI Feedback: Don’t show a blank screen while the WASM loads. Use a progress bar or a simple “Initializing…” message.
- Deferred Loading: Only load the WASM when the user actually gets to the demo section.
Managing the Toolchain
These builds rely on specific versions of Go and TinyGo. To keep things consistent, I use mise to manage them.
Adding a .mise.toml to your project helps ensure everyone is using the same compiler versions:
[tools]
go = "1.26.3"
tinygo = "0.41.1"
Documentation as a Sandbox
Writing good docs is still important, but giving people a sandbox to play in is a game changer. WASM finally makes that practical for languages other than JavaScript. If you are building something new, let people break it in the browser before showing them the installation steps. It saves everyone a lot of time.
Before we get into the weeds of how this was built, go check out bgp.kmcd.dev right now. Play around with the interactive elements:
- Learn about BGP through interactive diagrams
- Try the RPKI safety test. You might not like what you find about your own ISP’s routing security.
Once you have a feel for it, come back here. Or don’t. I’m not your dad or anything.
The Ultimate Interview Question
My favorite interview question for software engineers is wonderfully simple: “How does the internet work?”
If a candidate walks me through DHCP, DNS, TCP, TLS, and HTTP, I know I am talking to someone with solid real-world experience. But there is almost always a glaring omission in their answer. A shockingly small number of people ever mention the role of BGP.
It makes sense why so many engineers miss it. Most software development today is incredibly abstracted. An engineer building microservices in AWS or configuring a Kubernetes cluster spends their whole day thinking about application-layer protocols. BGP operates at a layer of the infrastructure that is almost entirely invisible to them; it is treated as “the network’s problem” or something only ISPs and cloud providers need to worry about.
But without BGP, the internet as we know it would literally not exist.
Why BGP Matters
BGP (Border Gateway Protocol) is essentially the “glue” that holds the internet together. It is the protocol that determines how data travels from one network to another across the globe. When you click a link, BGP is what decided the path those packets took to get to you.
It is also incredibly fragile. A single misconfiguration or a malicious “route leak” can accidentally divert traffic for an entire country or knock major services offline. Despite being the backbone of the global internet, much of it still relies on trust. This is why security measures like RPKI (Resource Public Key Infrastructure) are so critical yet inconsistently adopted.
The Power of Interactive Explainers
I am really happy with how this project turned out. I have always found “interactive explainer” microsites to be super effective for learning complex technical concepts. Reading a whitepaper about the Border Gateway Protocol (BGP) is one thing, but discovering that your own ISP is vulnerable to a BGP hijack, or visually seeing how a Remote Triggered Black Hole (RTBH) can mitigate a DDoS attack, makes the concepts actually stick.
Static documentation often fails to convey the dynamic nature of protocols. An interactive explainer unlocks a “feedback loop” that docs can’t: you change a variable, and you see the consequence immediately. It bridges the gap between abstract theory and practical intuition. I find these sites are worth building whenever a concept involves state transitions, complex spatial relationships, or high-stakes edge cases that are hard to replicate in a lab.
Embracing the Evolution
This whole thing started because I wanted to learn more about BGP, so I built a visually cool (and mostly useless) 24/7 live stream. The natural next step was to turn some of the insights I gathered into a dashboard. But as I built the early version of the dashboard, the explanatory text became more interesting and more powerful than the raw dashboard data.
What began as a simple monitoring visualization shifted into a massive interactive learning resource. This taught me a valuable lesson: projects don’t need to start useful. The most valuable outcome isn’t always the original goal. It is often the observations you make while building toward it. By staying flexible, I was able to reshape the project into something far more impactful than just another “live map.”
This shift aligns with how I tend to learn best: by doing. I’ve always found that I don’t truly understand a protocol until I’ve had to handle its edge cases in code. This is why I write about HTTP from Scratch, gRPC From Scratch, and gRPC Over HTTP/3: to push myself to build one layer deeper than I strictly need for my day-to-day work.
But there is another layer to it: I strongly believe that to properly learn something, you must be able to teach it, or at the very least, communicate it clearly to others. There is a strange shift that happens in my brain when I approach a topic with the intent to present it. It forces a level of rigor that I might otherwise skip. It is the same reason I am such a huge fan of self-reviews while a PR is in draft; looking at my own code through the lens of an external reviewer often reveals “perfect” code to be anything but. It is also 95% of the reason that I write this blog (the other 5% is vanity).
Interactive Tools
I replaced static images with interactive SVG diagrams driven by the same data models used in the backend. You can watch different BGP behaviors play out interactively, from route advertisements and withdrawals to full-blown route leaks.
The most useful tool on the site is the ISP RPKI Safety Test. It lets you check if your own Internet provider is using RPKI to sign and validate routes.
When I first ran this, I was shocked to see that my own home ISP fails this check. This means they are effectively trusting the “word” of any other network on the planet without cryptographically verifying it. It is a sobering reminder that the backbone of our digital lives is often held together by conventions and good faith rather than hard security. If your ISP fails, it is a great excuse to reach out to their support and ask why. This test is powered by isbgpsafeyet.com, and they encourage you to tweet about your ISP if they fail.
Is your ISP BGP safe?
This tool checks if your ISP is filtering BGP routes based on RPKI. It attempts to fetch two resources: one from a validly signed prefix and one from an invalidly signed prefix. This is built using the same endpoints as isbgpsafeyet.com. Go there for more information.
How It Was Made
Getting a live global heartbeat of the internet to run smoothly required completely rethinking the architecture. The original implementation handled data collection and GPU rendering in a single Go process. It worked fine at first, but garbage collector pauses during high-volume routing bursts (30,000+ updates per second) caused dropped frames in the 24/7 live stream.
The new architecture separates these concerns into two distinct outputs: a real-time 4K live stream on YouTube and an interactive microsite at bgp.kmcd.dev.
Here is the high-level flow: raw BGP updates from global sensors come in, a Rust backend processes and validates them in real-time. This processed data is then broadcast to a Go client (which renders the 60 FPS YouTube stream) and a Go indexer (which generates the static data for the microsite).
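The “broadcast to two consumers” step can be sketched as a fan-out. The real collector is Rust and its broadcast mechanism isn’t described here, so the Go below is only a hypothetical illustration: each subscriber gets its own buffered channel, and a full buffer drops the update instead of blocking the producer, which is the behavior you want in front of a renderer that must never stall.

```go
package main

import (
	"fmt"
	"sync"
)

// Update is a hypothetical processed BGP event.
type Update struct {
	Prefix string
	Origin uint32
}

// Broadcaster fans each update out to every subscriber. Sends are non-blocking:
// a subscriber whose buffer is full misses that update instead of stalling
// the producer or the other subscribers.
type Broadcaster struct {
	mu      sync.Mutex
	subs    []chan Update
	dropped int
}

// Subscribe registers a new consumer with the given buffer size.
func (b *Broadcaster) Subscribe(buffer int) <-chan Update {
	ch := make(chan Update, buffer)
	b.mu.Lock()
	b.subs = append(b.subs, ch)
	b.mu.Unlock()
	return ch
}

// Publish delivers u to every subscriber that has buffer space.
func (b *Broadcaster) Publish(u Update) {
	b.mu.Lock()
	defer b.mu.Unlock()
	for _, ch := range b.subs {
		select {
		case ch <- u:
		default:
			b.dropped++
		}
	}
}

// Dropped reports how many per-subscriber deliveries were skipped.
func (b *Broadcaster) Dropped() int {
	b.mu.Lock()
	defer b.mu.Unlock()
	return b.dropped
}

func main() {
	b := &Broadcaster{}
	viewer := b.Subscribe(2)   // small buffer: a renderer only wants fresh data
	indexer := b.Subscribe(64) // larger buffer for a consumer that must keep up
	for i := 0; i < 4; i++ {
		b.Publish(Update{Prefix: "192.0.2.0/24", Origin: uint32(64500 + i)})
	}
	fmt.Println(len(viewer), len(indexer), b.Dropped()) // prints "2 4 2"
}
```

For the indexer, dropping updates would corrupt the snapshots, so a real system would more likely give it a larger buffer or a blocking, backpressured stream.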
The Rust Rewrite
I rewrote the telemetry collector in Rust using the BGPKit ecosystem. Offloading the heavy lifting of parsing BMP and RIS-Live streams to a language built for high-throughput, memory-safe concurrency completely solved the performance bottlenecks.
Could I have just optimized the Go version? Probably. But the sheer volume of small allocations during BGP parsing was a “worst-case scenario” for Go’s garbage collector. Moving to Rust allowed me to manage memory exactly where it mattered, ensuring that even the most massive routing bursts wouldn’t stutter the visualization. Plus, I had been wanting to dip my toes into Rust, and this proved to be a great project for it.
Go and Ebitengine for the Live Stream
With Rust handling the data ingestion, the Go viewer was freed up to focus entirely on the 24/7 YouTube live stream. Using the Ebitengine game engine, the Go application is now just a lean client that renders a 2D Mollweide projection of the globe at 60 FPS. This output is captured by OBS and pushed to YouTube. The 60 FPS target is now reached nearly all the time; with the first architecture, it was a pipe dream.
Yes, I know how insane I sound when I say “oh, and Go is used for the frontend”, but I learned to respect the performance and robustness of Ebitengine for this specific real-time visualization task. This is what personal projects are for: to do things you wouldn’t normally do in ways you wouldn’t normally do them.
Unifying on Protobuf
To manage the complex schema between Go, Rust, and TypeScript, I leveraged Protocol Buffers and gRPC.
Defining the interface between the Rust collector and the Go viewer in Protobuf simplified the Go code significantly. Instead of managing internal channels, it just subscribes to a gRPC stream of events. I can even restart the Rust collector to update logic without the visualizer dropping a single frame.
Static Hourly Snapshots
To keep the web platform fast without maintaining a live database to service requests, I built a Go indexer. It generates snapshots of the global routing state every hour and commits them to a GitHub repository. This triggers a build on Cloudflare Pages, which deploys the updated snapshots as static assets. This has proven “reliable enough” for this project. As a result, the data on the website is updated hourly.
Why a Microsite?
I chose to build bgp.kmcd.dev as a standalone microsite rather than integrating it directly into this Hugo blog for a few key reasons:
- Freedom of Choice: Starting fresh gave me complete control over the HTML, CSS, and JavaScript. I wasn’t constrained by the blog’s existing design or Hugo’s template system, allowing me to use the best tools for this specific project.
- Cohesion: It makes more sense for the frontend to live in the same repository as the data collection and processing code. Since they are part of the same system, they can evolve together without being tied to the blog’s codebase.
- Deployment: By keeping it separate, the microsite has its own build and deployment pipeline. It can be updated or refactored independently, which is much cleaner than jamming dynamic data features into a static blog.
The Result
This project shifted from a monolithic live map to a distributed educational tool. Choosing specialized tools for each layer (Rust for throughput, Go for rendering, and Protobuf for data delivery) made the system more stable and capable.
I set out to visualize the internet, but ended up understanding it. I even built something that might help others do the same. If there’s one takeaway here, it is that “learning by building” involves more than the code you write. It is also about the clarity you gain when you try to explain that code to the rest of the world.
Explore the tools and live data at bgp.kmcd.dev. Source code is on GitHub.
Two years ago, I wrote Making gRPC more approachable with ConnectRPC. At the time, ConnectRPC was the “new kid on the block”, a library promising to fix the “gRPC tax” by supporting HTTP/1.1 and JSON without an extra proxy.
Today, ConnectRPC isn’t just a library. It is the core of a toolchain that makes traditional protoc workflows look completely dated. Companies like Anthropic are using it in production to power their SDKs, even maintaining their own ConnectRPC library in Rust.
Let’s look at how far things have come and how tools like Buf Remote Plugins, Protobuf SDKs, FauxRPC, and native HTTP/3 are changing API development.
Code Generation
One of my biggest complaints in Working with Protobuf in 2024 was the compatibility matrix from hell. Managing local installations of protoc, protoc-gen-go, and half a dozen other plugins was a miserable onboarding experience. If one person had a slightly different version of a plugin, the generated code drifted, and the CI build would fail for reasons that took twenty minutes to track down.
We can finally stop doing that. Buf Remote Plugins effectively killed the “it works on my machine” version of protoc. By pointing buf.gen.yaml to remote plugins on the Buf Schema Registry (BSR), we get deterministic, zero-install code generation.
# buf.gen.yaml in 2026
version: v2
plugins:
- remote: buf.build/connectrpc/go:v1.19.1
out: gen/go
opt: paths=source_relative
- remote: buf.build/protocolbuffers/go:v1.34.1
out: gen/go
opt: paths=source_relative
Your CI pipeline doesn’t need a bloated custom Docker image packed with binaries anymore. You just need the buf CLI. New hires clone the repo, run one command, and they’re done. It’s the level of “it just works” that we should have had a decade ago.
First-Class IDE Support
Writing Protobuf used to feel like coding in a glorified Notepad. We lacked the basic editor intelligence that almost every other major language enjoys.
That changed in early 2026 when Buf released a production-grade Language Server Protocol (LSP) server for Protobuf. It’s bundled directly into the buf CLI, which means whether you use VSCode or Neovim, you finally get go-to-definition and reference finding that actually works.
The LSP is workspace-aware, too. You can cmd-click an imported message from a third-party library and jump straight to the definition on the BSR without manually syncing files. It also catches syntax errors and duplicate modifiers before you even try to compile, which saves you from that annoying “context switch to terminal, run build, see error, switch back” loop.
Format, Lint, and Breaking Changes
The Buf CLI also provides the kind of guardrails that keep a team from moving into “legacy debt” territory too quickly.
If you’ve ever sat through a PR review where someone spent ten comments arguing about whether a field should be camelCase or snake_case, buf format and buf lint are for you. They end the debate. You run the commands, the code is formatted and linted, and the team moves on to actually solving problems.
The real winner is buf breaking. In a microservices setup, accidentally deleting a field or changing a data type in your schema is a great way to wake up the on-call engineer. By running buf breaking in CI, you verify the current schema against previous commits. It catches destructive changes before they hit the main branch, ensuring your contracts stay stable without requiring a human to manually audit every .proto change.
Data Validation
Validation has historically been a tedious chore. Writing endless if req.Age < 0 or if req.Email == "" checks in every single handler is a waste of time and a magnet for bugs.
protovalidate (which recently hit v1.0) moves those rules directly into the Protobuf schema. Since it’s built on Google’s Common Expression Language (CEL), you can do more than just check for nulls; you can write complex cross-field logic, like ensuring a “start date” is always before an “end date.”
By dropping the protovalidate interceptor into your server, requests are automatically validated before they touch your business logic. But the real “aha!” moment is the frontend. Your TypeScript client can run these same rules in the browser before the request even leaves. No more maintaining a separate Zod or Yup schema that inevitably gets out of sync with the backend. One source of truth, enforced everywhere.
Docs and Mocks
Sharing a gRPC endpoint used to be a pain; you couldn’t just hand someone a cURL command and expect it to work. ConnectRPC solved that fundamental issue by supporting standard HTTP/1.1 and JSON. But to truly treat these services like REST APIs, we needed the documentation tooling to match. That is why I spent part of 2024 working on protoc-gen-connect-openapi.
Now, Self-Documenting Connect Services are essentially the default for me. Because ConnectRPC skips binary framing for unary calls and uses standard HTTP status codes, we can generate an OpenAPI spec directly from the Protobuf definitions. You can spin up a Swagger UI directly from your server, let external users test with JSON, and keep your strict internal contracts intact.
We’ve also mostly solved the “waiting for the backend” bottleneck. FauxRPC uses your Protobuf descriptors to spin up a mock server in seconds. When you pair it with protovalidate, the fake data is actually realistic enough to build a frontend against. Some teams are even running FauxRPC in Testcontainers for integration tests, which is much cleaner than trying to manage a “staging” backend for every test run.
Why gRPC-Web Failed
To understand why ConnectRPC won the frontend, you have to look at the history of the protocol. Native gRPC relies on HTTP/2 trailers for status codes, but browsers do not expose those trailers to JavaScript. This originally made gRPC effectively unusable on the web.
The official solution to this problem was gRPC-Web. Its intended goal was straightforward: allow developers to use gRPC directly from web applications. As far as that specific goal goes, it was a success. You could finally make gRPC calls from a browser.
But there is a big difference between a technical success and a widely adopted standard. gRPC-Web never truly took off for a number of reasons. First, it was fundamentally unfriendly to modern infrastructure. It required a separate proxy (usually Envoy) just to translate the frontend requests into something the gRPC backend could understand. This added immediate operational overhead to every project.
Worse, it preserved the most frustrating parts of gRPC. Every single request returned a 200 OK, regardless of whether the server crashed or the resource was missing. It was a baffling design choice that broke the internet’s existing contract for observability. You could not rely on standard load balancer metrics, standard browser dev tools, or your generic APM to see if your site was actually healthy. You were forced to use specialized, protocol-aware tooling just to perform basic debugging. If I have to open a dedicated “gRPC-aware” network tab just to see why a login failed, I feel like it hasn’t actually earned the “web” part of the gRPC-Web name.
ConnectRPC stepped in and completely erased the proxy requirement. It also fixed the integration issues with the traditional web. A unary JSON request in ConnectRPC acts exactly like a standard REST call. If a resource is missing, you get a real 404 Not Found, and your existing monitoring stack just works. It gave frontend developers the familiar, straightforward debugging experience they actually wanted while keeping the strict schema safety that backend teams need.
Why it’s my default choice
In 2024, ConnectRPC was about making gRPC more approachable. Now, the underlying protocol is almost an implementation detail. We get the benefits of typed schemas and code generation, but the friction of the “gRPC tax” is gone.
If you’re still hand-rolling JSON/REST APIs or wrestling with legacy gRPC-go stubs and Envoy proxies, it’s time to move on. The tools are ready, the workflow is better, and your on-call engineer will thank you.
In today’s interconnected world, APIs (Application Programming Interfaces) are the glue that connects computers. They allow different applications to talk to each other, share data, and perform actions. However, traditional methods of creating APIs often lead to frustrating challenges: breaking changes in JSON APIs, silent failures due to missing fields, frontend and backend drift, or schema mismatches that result in the classic “works on my machine” excuse.
Imagine a real-world scenario where the backend team renames a userId field to user_id and deploys their changes. Instantly, the frontend checkout process breaks in production because the API had no strict enforcement to catch the mismatch.
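Here is a small Go sketch of exactly that failure. It uses encoding/json as a stand-in for any lenient JSON client, and the CheckoutRequest type and payload are hypothetical. By default, unknown keys are ignored and missing keys keep their zero value, so the rename produces no error at all. Opting into DisallowUnknownFields at least makes the error visible, but it still only catches the mistake at runtime, after the deploy.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// CheckoutRequest is the frontend's view of the payload: it still expects "userId".
type CheckoutRequest struct {
	UserID string `json:"userId"`
	Total  int    `json:"total"`
}

// payloadAfterRename is what the backend sends once "userId" became "user_id".
const payloadAfterRename = `{"user_id": "u-123", "total": 4200}`

// decodeLenient uses the default behavior: unknown fields are ignored and
// missing fields silently keep their zero value.
func decodeLenient(data string) (CheckoutRequest, error) {
	var req CheckoutRequest
	err := json.Unmarshal([]byte(data), &req)
	return req, err
}

// decodeStrict rejects fields the struct does not know about.
func decodeStrict(data string) (CheckoutRequest, error) {
	var req CheckoutRequest
	dec := json.NewDecoder(strings.NewReader(data))
	dec.DisallowUnknownFields()
	err := dec.Decode(&req)
	return req, err
}

func main() {
	lenient, err := decodeLenient(payloadAfterRename)
	fmt.Printf("lenient: %+v, err: %v\n", lenient, err) // UserID silently empty, err is nil

	_, err = decodeStrict(payloadAfterRename)
	fmt.Println("strict:", err) // an "unknown field" error
}
```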
This is where contract-based APIs come in. A contract-based API is one where the schema is defined first in a formal specification, and both client and server are generated or validated against that contract. They reduce ambiguity and enforce consistency across services.
The Power of Pre-defined API Contracts
A contract-based API defines exactly what data can be exchanged, in what format, and what actions can be performed. This strict, pre-defined agreement unlocks several immediate advantages:
- Improved Developer Experience: Developers on both sides (client and server) have a clear understanding of what is expected, making integration smoother.
- Automated Documentation: Contracts serve as self-documenting artifacts. This reduces the need for manual documentation maintenance and ensures the docs stay in sync with the actual API implementation.
- Reduced Errors: Mismatched data formats or API changes become less likely, leading to fewer bugs. Contracts act as a validation layer that catches potential issues early.
- Easier Integration: Contracts act as a single source of truth. Developers can quickly understand how to interact with the API without extensive back-and-forth communication.
- Streamlined Development: These APIs often enable tools to automatically generate code for both client and server implementations. This eliminates manual boilerplate so you can focus on core logic.
Protobuf: The Language of APIs
In modern distributed systems, the foundation of many contract-based APIs lies in Protocol Buffers (protobuf). It is a language-neutral data format specifically designed for structured messages.
Unlike JSON, which is a text-based format designed to be human-readable, Protobuf is a binary format. This means you trade the ability to natively read the raw data in transit for significant performance gains:
- Smaller Message Sizes: Protobuf messages are compact and efficient, which leads to faster transmission and reduced bandwidth usage.
- Faster Parsing: Parsing binary protobuf messages is typically much faster than parsing text formats like JSON or XML.
- Built-in Versioning: Protobuf uses field numbers (the “= 1”, “= 2” in the code below) to identify data. This allows for excellent backward and forward compatibility. You can add new fields without breaking older clients that do not know about them yet.
- Cross-language Compatibility: Protobuf definitions are language-agnostic. Code for interacting with the API can be generated for almost any modern programming language.
Because the data is binary, you cannot simply open your browser’s network tab and read the payloads by default. You will usually need to rely on modern browser extensions (like the gRPC-Web or Connect dev tools) to decode the traffic. It also requires setting up specialized tooling and build steps to compile the generated code.
Here is a basic example of a .proto file defining messages for a user and an address:
syntax = "proto3";
message User {
string name = 1;
int32 id = 2;
string email = 3;
Address address = 4;
}
message Address {
string street = 1;
string city = 2;
string state = 3;
string zip = 4;
}
In this example, the User message has fields for name, ID, email, and an Address message. These defined structures ensure consistent data exchange between applications.
Key idea: Protobuf identifies fields by their numbers, not their names. Never changing or reusing a field number is the golden rule that keeps messages backward and forward compatible.
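To see why numbers rather than names matter, here is a hand-rolled Go sketch of the protobuf wire format for two fields of the User message above (name = 1, id = 2). It is for illustration only. Real code should use generated types and google.golang.org/protobuf. Each field is written as a varint key of (field_number << 3) | wire_type followed by the value, so the field name never goes over the wire. The sketch relies on encoding/binary.AppendUvarint (Go 1.19+), which uses the same base-128 varint encoding as protobuf.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// Wire types from the protobuf encoding spec (only the two used here).
const (
	wireVarint = 0 // int32, int64, bool, enums
	wireLen    = 2 // strings, bytes, embedded messages
)

// appendTag writes the field key: (field_number << 3) | wire_type, as a varint.
func appendTag(b []byte, field, wireType uint64) []byte {
	return binary.AppendUvarint(b, field<<3|wireType)
}

// appendInt32Field encodes an int32 field as a varint. Negative values
// sign-extend to ten bytes, which matches protobuf's int32 encoding.
func appendInt32Field(b []byte, field uint64, v int32) []byte {
	b = appendTag(b, field, wireVarint)
	return binary.AppendUvarint(b, uint64(v))
}

// appendStringField encodes a string field as key + length + raw bytes.
func appendStringField(b []byte, field uint64, s string) []byte {
	b = appendTag(b, field, wireLen)
	b = binary.AppendUvarint(b, uint64(len(s)))
	return append(b, s...)
}

func main() {
	// User{name: "Ada", id: 150}, using the field numbers from the .proto above.
	var msg []byte
	msg = appendStringField(msg, 1, "Ada")
	msg = appendInt32Field(msg, 2, 150)
	fmt.Printf("% x\n", msg) // 0a 03 41 64 61 10 96 01
}
```

Renaming name to full_name in the .proto produces byte-for-byte identical output. Renumbering the field would make old readers misinterpret the data.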
gRPC: Building APIs on a Solid Foundation
gRPC (gRPC Remote Procedure Call) is a high-performance framework that builds upon protobuf’s strengths. It provides a powerful way to implement remote procedure calls, allowing applications to interact using clients generated for each language.
Introducing Services and Request/Response Types with gRPC
We can expand the .proto file to define a service called UserService with methods for user management:
syntax = "proto3";
service UserService {
rpc CreateUser(CreateUserRequest) returns (User) {}
rpc GetUser(GetUserRequest) returns (User) {}
}
message CreateUserRequest {
User user = 1;
}
message GetUserRequest {
int32 id = 1;
}
This example defines a UserService with two methods: CreateUser and GetUser. Each method takes a specific request message and returns a response.
Notice how clear the intention is. A helpful mental model to contrast modern APIs is:
- REST is resource-oriented (relying on URLs and HTTP verbs).
- gRPC is action-oriented (relying on explicit methods).
A reader of this spec does not have to map vague HTTP verbs like “POST” to actions like “create.” Also, these method names are greppable. It is trivial to locate every use of CreateUser across several repositories, making refactoring and impact analysis much easier.
Server Reflection
Another powerful feature of the gRPC ecosystem is Server Reflection. This allows clients or debugging tools (like Postman or grpcurl) to query the server at runtime to discover the available services and methods. This eliminates the need to distribute .proto files to developers just so they can explore the API structure.
Distributing API Contracts
Defining a contract is only half the battle. How do the frontend and backend teams actually share that .proto file? If the schema is not easily accessible, the contract is useless.
In practice, teams usually solve this distribution problem in one of three ways:
- Monorepos: Storing the backend, frontend, and API definitions in a single repository so all code shares the same source of truth.
- Package Managers: Generating the client SDKs in a CI/CD pipeline and publishing them as internal NPM, Maven, or Go packages.
- Schema Registries: Using dedicated tools like the Buf Schema Registry to manage, version, and distribute Protobuf files securely across an organization.
What about public APIs?
Historically, strict RPC contracts were tough for external, public-facing APIs. If your primary consumers were third-party developers, handing them a raw Protobuf file or expecting them to set up gRPC clients caused massive friction. They just wanted to use standard REST with JSON.
This is where tools like ConnectRPC shine. ConnectRPC allows you to define your API using Protobuf, but it automatically exposes endpoints that support standard HTTP/1.1 and JSON serialization as a fallback format.
This hybrid approach also solves the local debugging problem. You can configure ConnectRPC to use JSON during local development specifically so you can read the network tab in plain text, and then flip it to highly efficient binary for production. In practice, you write Protobuf once, and get both gRPC and REST/JSON APIs for free.
Even better, because the source of truth is still Protobuf, you can use ecosystem plugins to automatically generate an OpenAPI specification directly from your .proto files. You get a highly maintainable, contract-driven architecture on the backend, while your external users can still curl standard REST endpoints, read plain JSON, and explore your API via a generated Swagger UI. It offers the best of both worlds without compromising the developer experience on either side.
Key idea: Tools like ConnectRPC allow you to maintain strict internal Protobuf contracts while exposing standard REST/JSON APIs to external consumers.
Alternatives
While Protobuf and gRPC are a powerful duo, there are other contract-based API solutions to consider depending on your architecture:
- OpenAPI (Swagger): Contracts are not exclusive to RPC. You can use OpenAPI to define strict contracts for RESTful services. However, a harsh reality of the industry is that OpenAPI specs often drift from the actual code because they are bolted on after the fact. To make OpenAPI truly safe, teams must rely on strict framework integration (like FastAPI in Python or tsoa in Node) where the code generates the spec, or vice versa.
- GraphQL: Arguably the most mainstream contract-driven API paradigm for frontend developers. Its strictly typed schema defines the exact shape of the available data. Unlike gRPC, which has fixed responses, GraphQL allows the client to dictate the exact payload it wants to receive.
- Twirp: Developed by Twitch, Twirp is a lightweight RPC framework built on top of Protobuf and HTTP/1.1. It shares similarities with ConnectRPC but focuses on absolute simplicity. It avoids the complexity of HTTP/2 and gRPC streams while still providing generated clients, making it an excellent alternative if full gRPC is overkill for your needs.
- Thrift: Originally developed at Facebook, Thrift is a language-neutral protocol for defining service contracts similar to Protobuf. It is often found in large-scale data environments and supports various RPC protocols.
- tRPC: This tool defines the API schema directly in TypeScript code to be reused on both the client and the server. While it often pairs with libraries like Zod for runtime validation, it lacks true language-agnostic safety across the network boundary since it relies entirely on a TypeScript ecosystem.
- Avro: This format uses JSON-like schemas but stores data in a compact binary format. It is a staple in the Apache Kafka ecosystem for streaming data pipelines. It handles schema evolution differently than Protobuf (often sending the schema alongside the data), making it highly flexible for dynamic systems.
When NOT to Use API Contracts
While these tools are powerful, they are not a silver bullet. You should reconsider using strict API contracts if:
- You are building small projects or MVPs: The initial setup, code generation, and boilerplate can slow delivery when rapid iteration is the top priority.
- Simplicity for external consumers outweighs strict contracts: If you are building a straightforward public API and are not using a hybrid tool like ConnectRPC, raw JSON over REST remains the path of least resistance for third-party developers.
- Your team lacks tooling maturity: Implementing gRPC or Protobuf requires solid CI/CD pipelines and a team that is comfortable managing build steps, code generation, and backward-compatible schema evolutions.
Key idea: Strict API contracts add overhead and may not be suitable for small MVPs, simple public APIs, or teams lacking tooling maturity.
Conclusion
Contract-based APIs offer a significant advantage in building robust and scalable communication between applications. Protobuf and gRPC provide a powerful combination for defining clear contracts and generating highly efficient code.
As a general rule of thumb: if you are building an early-stage prototype, stick to what is fast and familiar. But if you are scaling a complex system across multiple teams and services, contract-based APIs transition from a nice-to-have to an absolute necessity. Once multiple teams depend on your API, contracts stop being optional. They are how you avoid chaos.
Featured Posts
Right now, thousands of routers are arguing about how to reach each other. That’s expected. It’s how the Internet works. This website wouldn’t load without it. BGP (Border Gateway Protocol) continuously announces and withdraws prefixes, adjusting how traffic moves globally. Most people see URLs and apps; routers see prefixes and AS paths.
I made a map that lets us listen in on this conversation, but in a relaxing, aesthetically pleasing way.
In my last post, I mentioned a websocket-based streaming API from RIPE. At the time, I set it aside. Soon, it became my obsession and the live view was born. While this visualization occasionally stumbles into being practically useful for spotting global outages, my primary requirement was simply to build a really cool looking map.
You can check out the source code for this project on GitHub or watch the map in action on my YouTube channel or here:
What are we looking at?
This map is a live visualization of the Border Gateway Protocol (BGP). This is the “language” routers use to talk to each other and decide the best path for your data to travel across the globe.
Imagine a router trying to find the best way to send traffic to Google. It receives multiple path advertisements from its neighbors, and it has to pick the most efficient route:
Every pulse on the map represents a real routing update. Sometimes it’s routine churn. Sometimes it’s maintenance, an outage, or a path change somewhere along the way.
The Global Game of Telephone
To understand why the map pulses, you have to look at how routers talk. BGP is a path-vector protocol, which is effectively a global game of telephone. When a network (an Autonomous System, or AS) wants to be found, it tells its immediate neighbors, who tell their neighbors, and so on.
- The Announcement: When a router in Tokyo says, “I have a path to 8.8.8.0/24,” it sends an Update to its peers. Every peer that hears this stamps the message with its own ID before passing it along. This list of stamps is called the AS Path.
- The Selection: Routers generally prefer the shortest path. If an observer in New York hears the same news from London (2 hops) and Sydney (5 hops), it will pick the shorter London route, all else being equal. On the live map, you will see this selection light up as a purple pulse.
- The Withdrawal: If a fiber line is cut, for example, the router sends a Withdrawal. This is where the game of telephone gets frantic. Neighbors start checking their old notes: “Wait, I can’t go through London anymore? What about that longer path through Sydney I heard about earlier?”
Here is how that “envelope” looks as it travels from Tokyo to New York. Notice how the path grows longer at every step.
Because routers often wait a few seconds before passing news along (to avoid “vibrating” the whole internet with every tiny hiccup), these updates arrive in waves. On the live map, this looks like a ripple of activity starting at the origin and washing over the globe as the “signatures” accumulate.
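The announce, stamp, and select steps can be sketched in a few lines of Go. This is a toy model, not router code. Each hop prepends its ASN to the path, and selection looks only at AS path length. Real BGP best-path selection checks attributes such as local preference before it compares path length. The ASNs are hypothetical, taken from the 64496 to 64511 range reserved for documentation, and stand in for the Tokyo, London, and Sydney networks.

```go
package main

import "fmt"

// Announcement is a toy BGP UPDATE: a prefix plus the AS path it has travelled.
type Announcement struct {
	Prefix string
	ASPath []uint32 // most recent AS first, origin AS last
}

// forward models one hop: the sending AS prepends ("stamps") its own ASN.
// A fresh slice is built so earlier copies of the announcement are untouched.
func forward(a Announcement, via uint32) Announcement {
	path := make([]uint32, 0, len(a.ASPath)+1)
	path = append(path, via)
	path = append(path, a.ASPath...)
	return Announcement{Prefix: a.Prefix, ASPath: path}
}

// bestPath picks the candidate with the shortest AS path. Real routers apply
// other attributes (e.g. local preference) before comparing path length.
func bestPath(candidates []Announcement) (Announcement, bool) {
	if len(candidates) == 0 {
		return Announcement{}, false
	}
	best := candidates[0]
	for _, c := range candidates[1:] {
		if len(c.ASPath) < len(best.ASPath) {
			best = c
		}
	}
	return best, true
}

func main() {
	// Hypothetical ASNs: 64500 = Tokyo origin, 64501 = London,
	// 64502..64505 = the longer route via Sydney.
	origin := Announcement{Prefix: "8.8.8.0/24", ASPath: []uint32{64500}}
	viaLondon := forward(origin, 64501)
	viaSydney := origin
	for _, asn := range []uint32{64502, 64503, 64504, 64505} {
		viaSydney = forward(viaSydney, asn)
	}
	best, _ := bestPath([]Announcement{viaSydney, viaLondon})
	fmt.Println(viaLondon.ASPath, viaSydney.ASPath, "->", best.ASPath)
}
```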
Spotting a BGP Flap
If you are watching the map and suddenly see a wave of pulses lighting up all over the world at the exact same time, you might be witnessing a BGP flap.
In networking, flapping happens when a route rapidly appears and disappears. Imagine a misconfigured router or a loose fiber cable. The router yells to the Internet, “I have a path to Google!” only to drop the connection a second later and say, “Never mind, it is gone.” That single localized hiccup doesn’t stay local. It ripples outward as routers everywhere recalculate their paths. To keep the whole system from grinding to a halt, many routers can apply Route Flap Damping, which essentially puts the noisy network in a time-out until it proves it can stay stable.
Decoding the Pulses
When you see those colored pulses popping off on the map, they represent BGP updates processed through a multi-stage classification engine. Rather than just showing raw protocol messages (which is what the earlier version of the map did), the map categorizes events into four distinct colors based on their behavior and potential impact.
Anomalies and Behaviors
The classification engine also assigns events to Level 2 categories (anomalies) using heuristics applied over recent activity windows. To make sense of the noise, specific triggers then drop each event into one of four severity tiers, each shown in its own color:
| Severity Tier | Color | Description |
|---|---|---|
| Critical | Red | Significant routing failures, such as a prefix sustaining multiple withdrawals with no announcements, or path violations that suggest a route leak. |
| Bad | Orange | Highly volatile or inefficient behavior, including rapid “flapping” of routes, excessive “babbling” (a term I coined for this project) with unchanged attributes, or frequent next-hop changes. |
| Normal / Policy | Purple | Standard routing adjustments, such as traffic engineering (Policy Churn), path length oscillations, or the natural “Path Hunting” process where routers explore alternatives during convergence. |
| Normal / Discovery | Blue | Routine background noise, including standard prefix origination or redundant gossip pulses that keep routing tables current. |
When you zoom out and see all those colors firing at once, the true scale of the Internet comes to life. It tells the story of over 70,000 independent networks coordinating in real time.
What else is on the map?
To make sure the map isn’t just a wall of moving dots, I included several dashboard elements that provide context to the chaos:
Path Hunting and Anycast
When I first started watching the live data, I was confused by why a single localized outage would trigger a massive global explosion of pulses.
I’ve since learned this is likely due to a phenomenon called “Path Hunting.” When a route dies, the Internet doesn’t instantly agree it’s gone. Instead, routers desperately try to find backup paths. They’ll try a longer route, fail, try an even longer one, fail again, and generate a new BGP update every single time. Those massive bursts of purple pulses are basically the routers “thinking out loud” as they scramble to route around the damage.
This scramble to find backup paths can occasionally leave behind an interesting anomaly known as a “BGP zombie.” If a router fails to process a withdrawal message due to a software bug or slow propagation, it will stubbornly keep announcing a dead path to its neighbors, creating a localized black hole for traffic. Cloudflare has a great write-up on hunting down these undead routes if you want to fall down that rabbit hole.
Anycast routing amplifies this chatter even further. Huge networks (like Google or Cloudflare) announce the exact same /24 prefix from dozens of different physical locations globally so their services are fast everywhere. But if a major transit provider drops a peering session, or a provider intentionally shifts traffic away from a datacenter for maintenance, thousands of routers might suddenly decide to shift their traffic to a different Anycast node all at once. The result is a sudden surge of routing adjustments across the map.
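One crude way to spot path hunting in the updates for a single prefix is to look for announcements whose AS path keeps getting longer, followed by a withdrawal. The Go sketch below is a hypothetical heuristic along those lines. The Update type and the minSteps threshold are assumptions, not the map's actual detector.

```go
package main

import "fmt"

// Update is a simplified per-prefix event seen from one vantage point.
type Update struct {
	Withdrawal bool
	PathLen    int // AS path length; ignored for withdrawals
}

// looksLikePathHunting reports whether the last minSteps announcements before a
// final withdrawal never got shorter and grew overall. This matches the "try a
// longer route, fail, try an even longer one" pattern. Hypothetical heuristic.
func looksLikePathHunting(updates []Update, minSteps int) bool {
	n := len(updates)
	if minSteps < 2 || n < minSteps+1 || !updates[n-1].Withdrawal {
		return false
	}
	anns := updates[n-1-minSteps : n-1]
	for i, u := range anns {
		if u.Withdrawal {
			return false
		}
		if i > 0 && u.PathLen < anns[i-1].PathLen {
			return false // the path got shorter, so this is not a hunt
		}
	}
	return anns[len(anns)-1].PathLen > anns[0].PathLen
}

func main() {
	hunt := []Update{{PathLen: 3}, {PathLen: 4}, {PathLen: 6}, {Withdrawal: true}}
	steady := []Update{{PathLen: 3}, {PathLen: 3}, {PathLen: 3}, {Withdrawal: true}}
	fmt.Println(looksLikePathHunting(hunt, 3), looksLikePathHunting(steady, 3))
}
```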

RIPE RIS Beacons and Anchors
While building the “Most Active Prefixes” list, I kept noticing the exact same thing: /24 subnets were overrepresented on the leaderboard.
A /24 (256 IPs) is effectively the smallest globally routable unit, so most churn naturally happens at that granularity.
But there was another reason for seeing the same /24 subnets appearing on the list. Not all activity on the map comes from failing links or organic traffic shifts. There is also intentional ‘breakage’ happening behind the scenes to test BGP propagation.
It turns out RIPE RIS operates Routing Beacons. Routing Beacons are prefixes deliberately announced and withdrawn on a fixed schedule, typically every two hours. One of them announces and withdraws every 10 minutes. Researchers use these beacons as a controlled signal inside the global routing table to study BGP propagation and convergence. To make the activity list useful, I had to write logic to classify and filter these beacons out of the ranking.
RIPE also runs “Anchors” alongside these beacons. While a beacon prefix constantly flips on and off, an anchor is a prefix permanently announced from the exact same physical router. This gives researchers a stable control group. They can compare the volatile beacon traffic against a baseline of stable routing from the identical location.
I eventually added a Beacon Analysis view that separates “organic” updates from beacon-driven ones. It makes the metrics more accurate and highlights how much traffic is from deliberate live validation.
BGP Babbling and Attribute Churn
So if a burst of updates isn’t a dying link, a desperate search for a backup path, or a research beacon, what else could it be? Sometimes a network is just fidgeting. I call this babbling. While not an official industry term, it perfectly describes the constant, repetitive “talk” of updates that don’t actually change anything meaningful about the route.
I caught a great example of this while watching the stream. A Finnish fiber provider (AS43016) was firing off nearly 100 pulses per second, and this went on for days. The raw data showed the route wasn’t actually dropping. Instead, a single piece of metadata called the Aggregator ID just kept flipping back and forth.
This creates a localized flurry of activity. Some router somewhere was probably misconfigured and couldn’t make up its mind about how to summarize its own network. Every time it changed its mind, even by a single bit, it had to update every other router on Earth. Standard monitoring tools usually miss these “attribute flaps” because the network stays perfectly reachable. But on the map, they paint a very clear picture: a constant, rhythmic heartbeat of orange “bad behavior” pulses.
I built a tool to debug noisy prefixes like this. It aggregates BGP update stats and tries to diagnose the root cause, such as path oscillation, a flapping link, or heavy Anycast routing. Here is the output for our problem child over at AS43016:
$ just debug-prefix 195.155.146.0/24
BGP Prefix Monitor Stats (Running for 293.4s)
--------------------------------------------------
Announcements: 4576 (15.60/s)
Withdrawals: 1422 (4.85/s)
Total Msgs: 5101 (17.39/s)
Unique Peers: 310
--------------------------------------------------
GLOBAL CHURN EVENTS:
AS-Path Changes: 2275
Community Changes: 3259
Next-Hop Changes: 0
Aggregator Flaps: 0
Path Length Flaps: 1255
--------------------------------------------------
LIKELY CONCLUSIONS:
- Path Length Oscillation (Route is toggling between different path lengths)
- BGP Babbling (Excessive update rate detected)
--------------------------------------------------
Top 5 Churning Peers:
187.16.220.216: 149 attribute changes
5.188.4.211: 142 attribute changes
103.152.35.254: 142 attribute changes
177.221.140.2: 138 attribute changes
154.18.4.110: 132 attribute changes
At the time of publishing, this prefix is still babbling away. This script became the basis for the classification engine that I discuss later on in the article.
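The counters in that output boil down to comparing each announcement with the previous one from the same peer. Here is a minimal Go sketch of that idea, not the script's actual code. The Attrs and ChurnStats types are hypothetical simplifications: the aggregator is reduced to a single string, for example. Counting any change in AS path length as a “path length flap” is also an assumption.

```go
package main

import "fmt"

// Attrs is a simplified, hypothetical view of one announcement's path attributes.
type Attrs struct {
	ASPath      []uint32
	Communities []string
	NextHop     string
	Aggregator  string
}

// ChurnStats loosely mirrors a few counters from the debug-prefix output.
type ChurnStats struct {
	ASPathChanges, CommunityChanges, NextHopChanges int
	AggregatorFlaps, PathLengthFlaps               int
	Unchanged int            // exact repeats of the previous attributes ("babbling")
	PerPeer   map[string]int // announcements with any change, per peer
}

type tracker struct {
	last  map[string]Attrs // previous attributes, keyed by peer address
	stats ChurnStats
}

func newTracker() *tracker {
	return &tracker{last: map[string]Attrs{}, stats: ChurnStats{PerPeer: map[string]int{}}}
}

func equal[T comparable](a, b []T) bool {
	if len(a) != len(b) {
		return false
	}
	for i := range a {
		if a[i] != b[i] {
			return false
		}
	}
	return true
}

// Observe diffs an announcement against the previous one from the same peer.
func (t *tracker) Observe(peer string, a Attrs) {
	prev, seen := t.last[peer]
	t.last[peer] = a
	if !seen {
		return // nothing to compare against yet
	}
	changed := false
	if !equal(prev.ASPath, a.ASPath) {
		t.stats.ASPathChanges++
		changed = true
		if len(prev.ASPath) != len(a.ASPath) {
			t.stats.PathLengthFlaps++
		}
	}
	if !equal(prev.Communities, a.Communities) {
		t.stats.CommunityChanges++
		changed = true
	}
	if prev.NextHop != a.NextHop {
		t.stats.NextHopChanges++
		changed = true
	}
	if prev.Aggregator != a.Aggregator {
		t.stats.AggregatorFlaps++
		changed = true
	}
	if changed {
		t.stats.PerPeer[peer]++
	} else {
		t.stats.Unchanged++
	}
}

func main() {
	tr := newTracker()
	path := []uint32{64500, 64501}
	// The aggregator flips back and forth while the route stays reachable.
	for _, agg := range []string{"192.0.2.1", "192.0.2.2", "192.0.2.1", "192.0.2.1"} {
		tr.Observe("peer-a", Attrs{ASPath: path, Aggregator: agg})
	}
	fmt.Printf("%+v\n", tr.stats)
}
```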

Making the map
Handling 30,000+ BGP updates per second takes more than plotting points on a canvas. The project is written in Go for its concurrency model and relies on Ebitengine for hardware-accelerated 2D rendering.
Why a Stream?
I originally planned to build this as a standard web frontend, similar to my previous map. However, I hit two massive walls almost immediately.
The first problem was the sheer volume of data. BGP updates can easily peak at over 30,000 events per second. Forcing a web browser to process that firehose while maintaining a smooth 30 FPS with complex blending is just not in the cards today.
The second problem was scaling. If the map actually got popular, having thousands of browsers opening individual websocket connections to the RIPE RIS-Live service would be a disaster. It is wildly inefficient, and accidentally DDoSing a service designed to monitor Internet stability was not on my to-do list.
Here is what that scenario looks like:
To protect the RIPE service from being overwhelmed, the logical next step was to put a middleman in place to handle the multiplexing. This led me to a standard client-server architecture:
Multiplexing solves the connection problem, but it completely ignores the browser rendering issues I was having. To guarantee a smooth 30 FPS for everyone without melting their CPUs, I decided to bypass the browser canvas entirely. I pivoted the architecture to a centralized YouTube stream:
Now I had a choice. Scenario 1 was dead on arrival because it could make the operators of RIPE RIS-Live very sad and potentially angry. That left me with the choice between building a complex backend service to multiplex that single RIPE connection to all my users (Scenario 2), or completely changing how people view the map by streaming to YouTube (Scenario 3). I went with the latter option.
Rendering the entire visualization on my own server and broadcasting it guarantees that every viewer gets the exact same high-fidelity experience, regardless of their hardware. It is easy to run on a TV where the browser version isn’t really viable. This pivot also made the tech stack an obvious choice. Once I started experimenting with Ebitengine, hardware-accelerated rendering in Go gave me crisper, far more fluid visuals than I could ever squeeze out of a standard browser canvas.
The downside is reduced interaction: no zooming, no toggling UI, no customization. I think this tradeoff was ultimately worth it, but I just want to note what I lost from making this dramatic change in architecture.
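For comparison, here is a minimal Go sketch of the multiplexing core that Scenario 2 would have needed: one upstream feed fanned out to many subscribers over buffered channels. It is hypothetical, since the map streams to YouTube instead. It makes one deliberate design choice: a slow viewer drops messages rather than stalling the shared feed.

```go
package main

import (
	"fmt"
	"sync"
)

// Broadcaster fans one upstream feed out to many subscribers, so only a
// single connection to the upstream service is ever needed.
type Broadcaster struct {
	mu      sync.Mutex
	subs    map[int]chan string
	nextID  int
	dropped int // messages skipped because a subscriber's buffer was full
}

func NewBroadcaster() *Broadcaster {
	return &Broadcaster{subs: make(map[int]chan string)}
}

// Subscribe registers a viewer with a bounded buffer and returns its ID.
func (b *Broadcaster) Subscribe(buf int) (int, <-chan string) {
	b.mu.Lock()
	defer b.mu.Unlock()
	id := b.nextID
	b.nextID++
	ch := make(chan string, buf)
	b.subs[id] = ch
	return id, ch
}

// Unsubscribe removes a viewer and closes its channel.
func (b *Broadcaster) Unsubscribe(id int) {
	b.mu.Lock()
	defer b.mu.Unlock()
	if ch, ok := b.subs[id]; ok {
		close(ch)
		delete(b.subs, id)
	}
}

// Publish copies one upstream message to every subscriber without blocking.
func (b *Broadcaster) Publish(msg string) {
	b.mu.Lock()
	defer b.mu.Unlock()
	for _, ch := range b.subs {
		select {
		case ch <- msg:
		default:
			b.dropped++ // slow viewer: drop instead of stalling everyone else
		}
	}
}

func main() {
	b := NewBroadcaster()
	_, viewerA := b.Subscribe(8)
	idB, viewerB := b.Subscribe(8)
	b.Publish("bgp update #1")        // one upstream message...
	fmt.Println(<-viewerA, <-viewerB) // ...delivered to both viewers
	b.Unsubscribe(idB)
}
```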
Flattening IP Space
To map a BGP update to a geographic location, you need reliable IP-to-region data. I am currently only focusing on IPv4, and that data comes from five Regional Internet Registries (RIRs). Each registry publishes large and sometimes overlapping delegated stats files.
Fragmented lookups across raw datasets might be fine for offline processing, but we have a strict frame rate budget. If the engine had to search through five separate datasets for every single update, the visualization would stutter. At 30,000+ updates per second, efficiency is pretty important.
To solve this, I preprocess all the data upfront using a sweep-line algorithm. Each IP range acts as a segment on a 1D number line. The algorithm walks across this space, resolves any overlaps between registries, and collapses millions of ranges into a single, clean, non-overlapping index.
For example, take two overlapping registry entries:
- Range A (ARIN): 10.0.0.0 to 10.0.0.255
- Range B (RIPE): 10.0.0.128 to 10.0.1.255
The algorithm flattens these into three distinct, non-overlapping segments:
- 10.0.0.0 to 10.0.0.127 (ARIN only)
- 10.0.0.128 to 10.0.0.255 (Conflict resolved)
- 10.0.1.0 to 10.0.1.255 (RIPE only)
This preprocessing seems like overkill, but it’s worth it since it makes lookups super cheap. I back this index with BadgerDB and a DiskTrie for high-performance persistent storage. This allows the engine to track “seen” prefixes seamlessly across different sessions without eating up memory.
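Here is a compact Go sketch of that sweep-line pass. It reproduces the ARIN/RIPE example above. Each range opens at its start and closes just past its end. After processing every event at a point, the sweep emits one segment up to the next event. The post does not say how conflicts are resolved, so the narrowest-range-wins rule in pickWinner is an assumption, as are all type and function names. Because the output is sorted, a lookup becomes a binary search.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"net/netip"
	"sort"
)

// Range is one delegated-stats entry: an inclusive IPv4 range owned by a registry.
type Range struct {
	Start, End uint32
	Registry   string
}

// Segment is one piece of the flattened, non-overlapping index.
type Segment struct {
	Start, End uint32
	Registry   string
	Conflict   bool // more than one input range covered this piece
}

func ip(s string) uint32 { a := netip.MustParseAddr(s).As4(); return binary.BigEndian.Uint32(a[:]) }

func fmtIP(v uint32) string {
	var b [4]byte
	binary.BigEndian.PutUint32(b[:], v)
	return netip.AddrFrom4(b).String()
}

// pickWinner ASSUMES the narrowest range wins, with ties going to the earlier input.
func pickWinner(ranges []Range, active map[int]bool) int {
	best := -1
	for i := range active {
		size := uint64(ranges[i].End) - uint64(ranges[i].Start)
		if best == -1 {
			best = i
			continue
		}
		bestSize := uint64(ranges[best].End) - uint64(ranges[best].Start)
		if size < bestSize || (size == bestSize && i < best) {
			best = i
		}
	}
	return best
}

// Flatten sweeps the 1D address line and emits a segment between consecutive event points.
func Flatten(ranges []Range) []Segment {
	type event struct {
		pos  uint64 // uint64 so End+1 cannot overflow at 255.255.255.255
		idx  int
		open bool
	}
	events := make([]event, 0, 2*len(ranges))
	for i, r := range ranges {
		events = append(events, event{uint64(r.Start), i, true}, event{uint64(r.End) + 1, i, false})
	}
	sort.Slice(events, func(a, b int) bool { return events[a].pos < events[b].pos })

	active := map[int]bool{}
	var out []Segment
	for i := 0; i < len(events); {
		pos := events[i].pos
		for ; i < len(events) && events[i].pos == pos; i++ { // apply every event at this point
			if events[i].open {
				active[events[i].idx] = true
			} else {
				delete(active, events[i].idx)
			}
		}
		if len(active) == 0 || i == len(events) {
			continue // gap between ranges, or end of input
		}
		w := pickWinner(ranges, active)
		out = append(out, Segment{uint32(pos), uint32(events[i].pos - 1), ranges[w].Registry, len(active) > 1})
	}
	return out
}

// Lookup binary-searches the flattened index.
func Lookup(segs []Segment, addr uint32) (string, bool) {
	i := sort.Search(len(segs), func(i int) bool { return segs[i].End >= addr })
	if i < len(segs) && segs[i].Start <= addr {
		return segs[i].Registry, true
	}
	return "", false
}

func main() {
	segs := Flatten([]Range{
		{ip("10.0.0.0"), ip("10.0.0.255"), "ARIN"},
		{ip("10.0.0.128"), ip("10.0.1.255"), "RIPE"},
	})
	for _, s := range segs {
		fmt.Println(fmtIP(s.Start), "to", fmtIP(s.End), s.Registry, "conflict:", s.Conflict)
	}
	reg, _ := Lookup(segs, ip("10.0.1.7"))
	fmt.Println("10.0.1.7 ->", reg)
}
```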
Managing the Firehose
BGP updates arrive continuously, and during route flapping events the volume spikes hard.
To keep the visualization readable without becoming an incomprehensible mess, the pipeline waits 10 seconds to ensure a withdrawal isn’t just a rapid path re-convergence, and paces the visual output so spikes are emitted smoothly every 500ms.
Aesthetics and Motion
Animations use interpolation instead of snapping to the next state. For parts of the map which update infrequently, I wanted to highlight that a change occurred. For that, I added a “glitch” effect to the “Top Activity Hubs” and “Most Active Prefixes” to make it more obvious and to add to the cyberpunk aesthetic. These effects add polish, but too much motion detracts from the vibes of the map. Finding that balance took restraint and a surprisingly large amount of experimentation.
The pulses are what actually bring the data to life. In the engine, each pulse is a simple generated glow texture. I add a bit of spatial jitter so concurrent events do not stack perfectly on top of each other, and I scale their sizes logarithmically so massive data spikes do not turn the map into a solid wall of color.
The colors map directly to the severity tiers: red for critical events, orange for bad behavior, purple for policy churn and hunting, and blue for routine discovery. Because they use additive blending, overlapping pulses naturally create a bright hotspot over regions with a ton of routing activity. They pop onto the map, expand, and fade out smoothly. Managing this entire visual lifecycle efficiently is what keeps the map feeling dynamic without tanking the frame rate.
The Mollweide Projection
Mercator would have been easy, but it heavily distorts size near the poles. For a global activity map, that felt misleading.
I chose the Mollweide projection.
This is an equal-area projection, which means it accurately represents the physical footprint of different regions. It produces a world view that still feels familiar without exaggerating high-latitude areas.
More Meaningful Events
Raw BGP messages only tell us two things: a route was announced, or a route was withdrawn. So how does the dashboard know when to declare a ‘link flap’, a ‘route leak’, or a massive ‘outage’? The short answer is that I built a classification engine that takes the pattern of raw announcement and withdrawal updates that BGP provides and converts them into more meaningful events. Some kinds of events are easier to detect than others.
Route leaks are a great example of how messy this can get. Initially, I tried to validate routes using databases like Cloudflare’s RPKI portal, specifically hooking into their rpki.json endpoint. The goal was to check if the announcements for networks actually matched their registered ASNs. In practice, this resulted in way too many false positives because a massive number of announcements just were not matching the registered ASNs. If I had kept that logic, the map would have been permanently covered in red alert pulses.
Because of the noise, I ended up implementing a check for the valley-free routing principle. To understand why this works, we have to look at how BGP treats business relationships. BGP routing policies are built around who is paying whom. A network typically has providers it pays for transit, customers who pay it for access, and peers it swaps traffic with for mutual benefit. The valley-free rule dictates that a network should never act as a free transit bridge between two of its providers or peers.
Imagine a small regional network buys internet access from both AT&T and Verizon for redundancy. AT&T shares its global routing table with this small network so it knows where to send data. If that small network accidentally announces all of those AT&T routes to Verizon, it is inadvertently telling the entire internet to send all traffic between Verizon and AT&T through its local routers. Traffic would flow down from Verizon, into the small regional network, and back up to AT&T. That “down and back up” path is what creates the valley shape in the AS path. Because that small network does not have the capacity to handle global Tier-1 traffic, it immediately gets crushed under the weight of the data. The network drops packets and causes a massive localized internet outage, which is a classic route leak.
So, when the classification engine sees an AS path that violates this principle by dipping down into a lower-tier network and back up to a major provider, the system flags it as a route leak. While it serves as a decent baseline, I am generally uncertain about relying solely on this method. It is definitely a part of the classification engine that I want to explore and refine over time.
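The valley check can be sketched in a few lines of Go. This is a minimal sketch and not the engine's actual code. It assumes AS paths arrive as `[]uint32`, it ignores AS_SETs, and it uses a small, hand-picked Tier-1 set. The names `tier1`, `collapsePrepends`, and `looksLikeLeak` are hypothetical.

```go
package main

import "fmt"

// tier1 is an illustrative, incomplete set of Tier-1 ASNs (an assumption for
// this sketch, not the classifier's real list).
var tier1 = map[uint32]bool{
	174: true, 701: true, 1299: true, 2914: true, 3257: true,
	3356: true, 6453: true, 6762: true, 7018: true,
}

// collapsePrepends removes consecutive duplicate ASNs (AS-path prepending),
// so "64500 64500 64500" counts as a single hop.
func collapsePrepends(path []uint32) []uint32 {
	var out []uint32
	for _, asn := range path {
		if len(out) == 0 || out[len(out)-1] != asn {
			out = append(out, asn)
		}
	}
	return out
}

// looksLikeLeak reports whether the path goes Tier-1 -> non-Tier-1 -> Tier-1,
// which is the "valley" shape described above.
func looksLikeLeak(path []uint32) bool {
	seenTier1, inValley := false, false
	for _, asn := range collapsePrepends(path) {
		switch {
		case tier1[asn] && inValley:
			return true // climbed back up to a Tier-1 after dipping down
		case tier1[asn]:
			seenTier1 = true
		case seenTier1:
			inValley = true // dipped below a Tier-1
		}
	}
	return false
}

func main() {
	// The AT&T/Verizon example: AS7018 -> small regional network -> AS701.
	fmt.Println(looksLikeLeak([]uint32{7018, 64500, 701})) // true
	fmt.Println(looksLikeLeak([]uint32{3356, 15169}))      // false
}
```

Because the sketch relies only on a static Tier-1 list, it shares the weakness noted above: it cannot see the actual customer/provider/peer relationships between the networks it has not labeled.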
Other rules are slightly more straightforward:
| Event | Detection Trigger |
|---|---|
| Outage | >= 3 withdrawals, 0 announcements |
| Route Leak | path contains Tier-1 to non-Tier-1 to Tier-1 |
| Link Flap | > 5 withdrawals, announce:withdrawal ratio < 2.5 |
| Babbling | High volume, unchanged attributes |
| Next-Hop Flap | >= 5 next-hop changes, stable path length |
| Aggregator Flap | > 10 AGGREGATOR changes |
| Policy Churn | Elevated attribute changes |
| Path Oscillation | Frequent path length switching |
| Path Hunting | Increasing path length, then withdrawal |
| Discovery | Prolonged announcements, few changes |
To make these determinations, the engine analyzes the last five minutes of event data. Once classified, a prefix holds its state for ten minutes before being reevaluated. This means a prefix might eventually downgrade from a Link Flap to a routine Discovery. Outage states are the sole exception and clear immediately upon any new announcement. These initial rules are just a baseline I plan to refine as the project evolves.
Here is the final result, which I’ve gazed at for far too long already:
This project turned into a deeper dive into BGP than I expected. Watching as routing updates happen live exposes patterns that are impossible to find with a static snapshot. It has been a rewarding project and I am extremely happy with the result.
So please, toss the live stream on your TV, sit back, relax, and watch the Internet route the world’s network traffic as you listen to relaxing lofi in the background.
For the past few years, I’ve been trying to make the physical reality of the Internet visible with my Internet Infrastructure Map. This map shows the network of undersea fiber-optic cables along with peering bandwidth, grouped by city. I update the map annually, but I don’t want to just pull the latest data and call it a day. In this post I discuss how the map evolved this year and what I did to make it happen, but you can skip to the good part by viewing it here: map.kmcd.dev.
For the 2026 edition, I wanted to better answer the question: where does the Internet actually live? By layering on BGP routing tables alongside physical infrastructure data, I’m now closer to answering that question.
The result is a concept I call “Logical Dominance.” Each city’s dominance is calculated by summing the total address space of IPv4 subnets that are “homed” in that city. How can I tell where IP addresses are homed? This required analyzing global routing tables to trace IP ownership back to specific geographies. Read on to find out how I accomplished this!
How the Internet Routes Traffic
Previous versions of the map focused on physical infrastructure: cables and exchange points. The physical path is only half the story. To understand how data moves, we have to look at BGP (Border Gateway Protocol).
BGP is the protocol that distinct networks, known as Autonomous Systems (AS), use to announce which IP addresses they own and how to reach them. If the cables are the hardware, BGP is the software that ties the Internet together. Cloudflare has an excellent primer.
When you load a webpage, your request doesn’t just “know” the path. Your ISP’s routers consult the global BGP routing table to decide the best next hop. Visualized, it looks a little bit like this:
In this state, the route from Router -> Netstream (AS8283) -> Google (AS15169) was chosen, at least for now. The underlying routes of the global Internet change thousands of times per second, constantly reshaping the topology.
Sources of BGP Data
To visualize this layer, we need access to routing tables. I explored three ways to get this data, each with its own trade-offs between real-time visibility and historical context.
Query a Looking Glass
We can connect to public routers via projects like University of Oregon Route Views. These allow you to telnet in and run standard CLI commands like show ip bgp to see exactly what a backbone router sees.
BGP routes for 8.8.8.8:
$ telnet route-views.routeviews.org 23
**********************************************************************
RouteViews BGP Route Viewer
route-views.routeviews.org
RouteViews data is archived on https://archive.routeviews.org
This hardware is part of a grant by the NSF.
Please contact [email protected] if you have questions, or
if you wish to contribute your view.
This router has views of full routing tables from several ASes.
The current list of all RouteViews peers is at
https://www.routeviews.org/peers/peering-status.html
NOTE: If you are using macOS and seeing the error message
"no default Kerberos realm" when logging in, you may want to
add "default unset autologin" to your ~/.telnetrc
To login, use the username "rviews".
**********************************************************************
User Access Verification
Username: rviews
route-views>show ip bgp 8.8.8.8
BGP routing table entry for 8.8.8.0/24, version 941738530
Paths: (16 available, best #15, table default)
Not advertised to any peer
Refresh Epoch 1
4826 15169
114.31.199.16 from 114.31.199.16 (114.31.199.16)
Origin IGP, localpref 100, valid, external
Community: 4826:5203 4826:6510 4826:52032
path 7F168F059710 RPKI State valid
rx pathid: 0, tx pathid: 0
Refresh Epoch 1
57866 15169
37.139.139.17 from 37.139.139.17 (37.139.139.17)
Origin IGP, metric 0, localpref 100, valid, external
Community: 57866:200 65102:56393 65103:1 65104:31
unknown transitive attribute: flag 0xE0 type 0x20 length 0x30
value 0000 E20A 0000 0065 0000 00C8 0000 E20A
0000 0066 0000 DC49 0000 E20A 0000 0067
0000 0001 0000 E20A 0000 0068 0000 001F
path 7F15A63304E8 RPKI State valid
rx pathid: 0, tx pathid: 0
Refresh Epoch 1
6939 15169
64.71.137.241 from 64.71.137.241 (216.218.253.53)
Origin IGP, localpref 100, valid, external
path 7F1555102CB8 RPKI State valid
rx pathid: 0, tx pathid: 0
Refresh Epoch 1
20130 6939 15169
140.192.8.16 from 140.192.8.16 (140.192.8.16)
Origin IGP, localpref 100, valid, external
path 7F1588A8BD60 RPKI State valid
rx pathid: 0, tx pathid: 0
Refresh Epoch 1
3333 1257 15169
193.0.0.56 from 193.0.0.56 (193.0.0.56)
Origin IGP, localpref 100, valid, external
Community: 1257:50 1257:51 1257:3528
path 7F16B16DD098 RPKI State valid
rx pathid: 0, tx pathid: 0
Refresh Epoch 1
7018 15169
12.0.1.63 from 12.0.1.63 (12.0.1.63)
Origin IGP, localpref 100, valid, external
Community: 7018:2500 7018:37232
path 7F1626828FD8 RPKI State valid
rx pathid: 0, tx pathid: 0
Refresh Epoch 1
20912 15169
77.39.192.30 from 77.39.192.30 (77.39.192.1)
Origin IGP, localpref 100, valid, external
Community: 20912:65002 20912:65022
path 7F15D19801C0 RPKI State valid
rx pathid: 0, tx pathid: 0
Refresh Epoch 1
3356 15169
4.68.4.46 from 4.68.4.46 (4.69.184.201)
Origin IGP, metric 0, localpref 100, valid, external
Community: 3356:3 3356:86 3356:576 3356:666 3356:901 3356:2012
path 7F1679EB7640 RPKI State valid
rx pathid: 0, tx pathid: 0
Refresh Epoch 1
3549 3356 15169
208.51.134.254 from 208.51.134.254 (67.16.168.191)
Origin IGP, metric 0, localpref 100, valid, external
Community: 3356:3 3356:22 3356:86 3356:575 3356:666 3356:901 3356:2011 3549:2581 3549:30840
path 7F16C7D32728 RPKI State valid
rx pathid: 0, tx pathid: 0
Refresh Epoch 1
101 15169
209.124.176.223 from 209.124.176.223 (209.124.176.224)
Origin IGP, localpref 100, valid, external
Community: 101:20400 101:22200 101:24100
path 7F1648388978 RPKI State valid
rx pathid: 0, tx pathid: 0
Refresh Epoch 1
1351 15169
132.198.255.253 from 132.198.255.253 (132.198.255.253)
Origin IGP, localpref 100, valid, external
path 7F15C90D61B8 RPKI State valid
rx pathid: 0, tx pathid: 0
Refresh Epoch 1
3257 15169
89.149.178.10 from 89.149.178.10 (213.200.83.26)
Origin IGP, metric 10, localpref 100, valid, external
Community: 3257:8052 3257:30306 3257:50001 3257:54900 3257:54901
path 7F154665E650 RPKI State valid
rx pathid: 0, tx pathid: 0
Refresh Epoch 1
2497 15169
202.232.0.2 from 202.232.0.2 (58.138.96.254)
Origin IGP, localpref 100, valid, external
path 7F1660B0A1C8 RPKI State valid
rx pathid: 0, tx pathid: 0
Refresh Epoch 2
3303 15169
217.192.89.50 from 217.192.89.50 (138.187.128.158)
Origin IGP, localpref 100, valid, external
Community: 3303:1004 3303:1007 3303:3067
path 7F16E10C0508 RPKI State valid
rx pathid: 0, tx pathid: 0
Refresh Epoch 1
8283 15169
94.142.247.3 from 94.142.247.3 (94.142.247.3)
Origin IGP, localpref 100, valid, external, best
Community: 8283:1 8283:101 8283:102
unknown transitive attribute: flag 0xE0 type 0x20 length 0x30
value 0000 205B 0000 0000 0000 0001 0000 205B
0000 0005 0000 0001 0000 205B 0000 0005
0000 0002 0000 205B 0000 0008 0000 001A
path 7F16DC0E5B28 RPKI State valid
rx pathid: 0, tx pathid: 0x0
Refresh Epoch 1
49788 12552 15169
91.218.184.60 from 91.218.184.60 (91.218.184.60)
Origin IGP, localpref 100, valid, external
Community: 12552:10000 12552:14000 12552:14100 12552:14101 12552:24000
Extended Community: 0x43:100:0
path 7F15CAD0C378 RPKI State valid
rx pathid: 0, tx pathid: 0
route-views>
These paths often carry metadata called BGP Communities. These are optional tags that networks use to signal things like geographic origin or peering policy. While perfect for debugging today’s Internet, this approach lacks historical context; you can’t telnet into 2012 to check a routing table from 14 years ago.
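A standard BGP community (RFC 1997) is a 32-bit value. By convention it is written as two 16-bit halves, "ASN:value", as in the `Community:` lines above. Parsing that text form is straightforward. This is a minimal sketch, and `Community` and `parseCommunity` are illustrative names. The meaning of each value is defined by the network that sets it, so this code only splits the tag and does not interpret it.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// Community is a standard (RFC 1997) BGP community: two 16-bit halves,
// conventionally written as "ASN:value".
type Community struct {
	ASN   uint16
	Value uint16
}

// parseCommunity parses the "ASN:value" text form printed by route servers.
func parseCommunity(s string) (Community, error) {
	left, right, ok := strings.Cut(s, ":")
	if !ok {
		return Community{}, fmt.Errorf("missing ':' in %q", s)
	}
	asn, err := strconv.ParseUint(left, 10, 16)
	if err != nil {
		return Community{}, fmt.Errorf("bad ASN in %q: %w", s, err)
	}
	val, err := strconv.ParseUint(right, 10, 16)
	if err != nil {
		return Community{}, fmt.Errorf("bad value in %q: %w", s, err)
	}
	return Community{ASN: uint16(asn), Value: uint16(val)}, nil
}

func main() {
	// Communities attached to the AS3356 path in the output above.
	for _, f := range strings.Fields("3356:3 3356:86 3356:576 3356:666 3356:901 3356:2012") {
		c, err := parseCommunity(f)
		if err != nil {
			fmt.Println(err)
			continue
		}
		fmt.Printf("ASN %d, value %d\n", c.ASN, c.Value)
	}
}
```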
Subscribe to a Stream
For real-time views, services like RIPE RIS Live aggregate BGP data from global collectors and stream it over a public WebSocket. You can watch the Internet “breathe” as routes are announced and withdrawn thousands of times per second. This is fascinating for a live dashboard, but useless for backfilling history.
Here’s an example script consuming this stream (go/stream_bgp/main.go):
package main
import (
"fmt"
"log"
"os"
"os/signal"
"time"
"github.com/gorilla/websocket"
)
// RIPE RIS Live WebSocket URL
const risLiveURL = "wss://ris-live.ripe.net/v1/ws/?client=kmcd-internet-map"
// Message defines the structure of the JSON messages we receive
type Message struct {
Type string `json:"type"`
Data map[string]interface{} `json:"data"`
}
func main() {
// Handle Ctrl+C gracefully
interrupt := make(chan os.Signal, 1)
signal.Notify(interrupt, os.Interrupt)
fmt.Printf("Connecting to %s...\n", risLiveURL)
c, _, err := websocket.DefaultDialer.Dial(risLiveURL, nil)
if err != nil {
log.Fatal("dial:", err)
}
defer c.Close()
// Subscribe to the firehose (all messages)
// You can filter this! e.g., {"host": "rrc21"} for a specific collector
subscribeMsg := map[string]interface{}{
"type": "ris_subscribe",
"data": map[string]interface{}{
"moreSpecific": true,
"type": "UPDATE", // Only show route updates
},
}
if err := c.WriteJSON(subscribeMsg); err != nil {
log.Fatal("subscribe:", err)
}
fmt.Println("Connected! Streaming global BGP updates...")
fmt.Println("------------------------------------------------")
done := make(chan struct{})
go func() {
defer close(done)
for {
var msg Message
err := c.ReadJSON(&msg)
if err != nil {
log.Println("read:", err)
return
}
// We only care about BGP UPDATE messages
if msg.Type == "ris_message" {
path := msg.Data["path"]
prefix := msg.Data["announcements"]
// Handle withdrawals (routes being removed)
if prefix == nil {
prefix = "WITHDRAWAL"
}
// Print the timestamp, the route prefix, and the AS path
fmt.Printf("[%s] Prefix: %v | Path: %v\n",
time.Now().Format("15:04:05"),
prefix,
path,
)
}
}
}()
// Wait for Ctrl+C, or exit if the server closes the stream first
select {
case <-done:
return
case <-interrupt:
}
fmt.Println("\nDisconnecting...")
err = c.WriteMessage(websocket.CloseMessage, websocket.FormatCloseMessage(websocket.CloseNormalClosure, ""))
if err != nil {
log.Println("write close:", err)
return
}
select {
case <-done:
case <-time.After(time.Second):
}
}
The output looks like this:
Connecting to wss://ris-live.ripe.net/v1/ws/?client=kmcd-internet-map...
Connected! Streaming global BGP updates...
------------------------------------------------
[22:52:35] Prefix: [map[next_hop:2001:7f8:24::b1 prefixes:[2804:70c0::/32]]] | Path: [58057 6939 22381 1031 263444 22381 270746]
[22:52:35] Prefix: [map[next_hop:2001:7f8:24::aa,fe80::8a7e:25ff:fed3:420b prefixes:[2a13:9404::/32]]] | Path: [6939 215120 34689]
[22:52:35] Prefix: [map[next_hop:2001:7f8:24::aa,fe80::470:71ff:fec5:b6ad prefixes:[2a14:7c0:1740::/48 2a10:ccc7:b110::/44]]] | Path: [196621 6939 215120 214497 6204 215120 214497]
[22:52:35] Prefix: [map[next_hop:2001:7f8:24::aa,fe80::224:38ff:fea4:a907 prefixes:[2a14:7c0:1740::/48 2a10:ccc7:b110::/44]]] | Path: [29691 6939 215120 214497 6204 215120 214497]
[22:52:35] Prefix: [map[next_hop:91.206.52.177 prefixes:[2a10:ccc7:b110::/44 2a14:7c0:1740::/48]]] | Path: [58057 6939 215120 214497 6204 215120 214497]
[22:52:35] Prefix: [map[next_hop:2001:7f8:24::b1 prefixes:[2803:5d10::/32]]] | Path: [58057 6939 3356 28343 272053]
[22:52:35] Prefix: [map[next_hop:2001:7f8:24::aa,fe80::8a7e:25ff:fed3:420b prefixes:[2a14:7c0:1740::/48 2a10:ccc7:b110::/44]]] | Path: [6939 215120 214497]
[22:52:35] Prefix: [map[next_hop:2001:7f8:24::aa,fe80::470:71ff:fec5:b6ad prefixes:[2a13:9404::/32]]] | Path: [196621 6939 215120 34689]
2026/02/09 22:52:35 read: websocket: close 1000 (normal)
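The script above decodes `data` into a generic map, so withdrawn prefixes never appear in its output. Decoding into typed structs makes both sides of an update visible. This is a minimal sketch. The field names `announcements`, `next_hop`, `prefixes`, `withdrawals`, and `path` come from RIS Live messages like the ones printed above, and the struct and function names (`RISMessage`, `tally`) are hypothetical. Only the fields used here are modeled.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Announcement mirrors one entry of the "announcements" array.
type Announcement struct {
	NextHop  string   `json:"next_hop"`
	Prefixes []string `json:"prefixes"`
}

// UpdateData covers only the fields used here; RIS Live sends more.
type UpdateData struct {
	Path          []any          `json:"path"` // may contain AS_SETs as nested arrays
	Announcements []Announcement `json:"announcements"`
	Withdrawals   []string       `json:"withdrawals"`
}

// RISMessage is the outer envelope of each WebSocket message.
type RISMessage struct {
	Type string     `json:"type"`
	Data UpdateData `json:"data"`
}

// tally splits one message into announced and withdrawn prefixes, which are
// the raw counts a per-prefix classifier would aggregate over time.
func tally(raw []byte) (announced, withdrawn []string, err error) {
	var msg RISMessage
	if err := json.Unmarshal(raw, &msg); err != nil {
		return nil, nil, err
	}
	if msg.Type != "ris_message" {
		return nil, nil, nil
	}
	for _, a := range msg.Data.Announcements {
		announced = append(announced, a.Prefixes...)
	}
	return announced, msg.Data.Withdrawals, nil
}

func main() {
	raw := []byte(`{"type":"ris_message","data":{"path":[6939,215120,34689],
		"announcements":[{"next_hop":"2001:7f8:24::aa","prefixes":["2a13:9404::/32"]}],
		"withdrawals":["192.0.2.0/24"]}}`)
	a, w, err := tally(raw)
	if err != nil {
		fmt.Println("decode:", err)
		return
	}
	fmt.Println("announced:", a, "withdrawn:", w)
}
```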
Download Historical Snapshots
To build the historical model, I processed raw RIB (Routing Information Base) files. These are snapshots of the entire routing table as seen by a backbone router at a specific moment in time. Because BGP is incremental and only sends changes once a session is established, these full table dumps are essential for reconstructing the state of the Internet at any point in the past.
I specifically fetched snapshots from February 1st at 12:00 UTC for every year in my timeline. To ensure a comprehensive view, I aggregated data from multiple global collectors maintained by the University of Oregon Route Views project.
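Fetching one snapshot per year mostly comes down to building the right archive URL. This is a minimal sketch. The directory layout is an assumption based on the public RouteViews archive listing, where the original `route-views` collector sits at the root and other collectors (for example `route-views.linx`) have their own directories. `ribURL` is a hypothetical helper. The downloaded files are bzip2-compressed MRT (RFC 6396) dumps that still need a BGP parser.

```go
package main

import "fmt"

// ribURL builds the URL of the February 1st, 12:00 UTC RIB dump for a
// RouteViews collector. The path layout is assumed, not guaranteed.
func ribURL(collector string, year int) string {
	base := "https://archive.routeviews.org"
	if collector != "" && collector != "route-views" {
		base += "/" + collector // non-default collectors live in subdirectories
	}
	return fmt.Sprintf("%s/bgpdata/%d.02/RIBS/rib.%d0201.1200.bz2", base, year, year)
}

func main() {
	for _, c := range []string{"route-views", "route-views.linx"} {
		for year := 2012; year <= 2026; year += 7 {
			fmt.Println(ribURL(c, year))
		}
	}
}
```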
Other excellent resources for this kind of data include:
- RIPE RIS (Routing Information Service): Provides high-fidelity snapshots from a dense network of collectors, primarily in Europe.
- CAIDA BGPStream: A framework for analyzing both real-time and historical data from various sources.
How BGP Shapes the Global Internet Map
For this edition, I processed over 15 years of BGP snapshots and PeeringDB archives to build the Logical Dominance model. Reconstructing this history was easily the hardest part of the project. I quickly realized that reliable archival data for physical peering effectively vanishes before 2010, which set a hard limit on how far back I could take the timeline.
Defining Logical Dominance
Logical Dominance is calculated by summing the number of unique IPv4 addresses originated by an ASN and attributed to a given city. Overlapping prefixes are deduplicated using longest-prefix normalization so that no address space is counted twice.
The Scaling Problem: Why not IPv6?
You might notice this model focuses entirely on IPv4. While IPv6 is the future of the protocol, its sheer scale currently breaks the “Logical Dominance” math. I measure dominance by counting unique IP addresses; if I treated IPv4 and IPv6 as equals, the numbers wouldn’t just be skewed; they’d be nonsensical.
Consider the math: The smallest standard IPv6 subnet is a /64. That single subnet contains 18,446,744,073,709,551,616 addresses. You could fit the entire IPv4 address space (4,294,967,296 addresses) inside that one subnet 4.3 billion times over.
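Written out, a /64 leaves 128 − 64 = 64 host bits. The ratio is:

$$
\frac{\lvert \text{one IPv6 } /64 \rvert}{\lvert \text{all of IPv4} \rvert}
= \frac{2^{128-64}}{2^{32}}
= \frac{2^{64}}{2^{32}}
= 2^{32}
= 4{,}294{,}967{,}296 \approx 4.3 \times 10^{9}
$$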
If I treated every IP equally, a single residential IPv6 connection would statistically obliterate a city hosting the entire legacy IPv4 Internet. Until I develop a weighted model for IPv6, perhaps based on prefix density rather than raw address count, IPv4 remains the only way to compare global “weight” on a 1:1 scale.
Finding the Truth in the Noise
Mapping a BGP prefix to a specific city is more difficult than you may think. A subnet might be registered to a corporate HQ but serve users thousands of miles away. To solve this, I built a prioritized “waterfall” of attribution logic. I check sources in order of reliability, stopping as soon as I find a match:
- Geofeeds (RFC 8805): These are machine-readable CSVs where network operators explicitly self-report where their subnets are used.
- Cloud Provider Ranges: I ingest live IP lists from AWS, Google Cloud, and others, mapping logical regions (like eu-west-1) to their physical locations (Dublin).
- Network Hints (Communities & Next-Hops): At this point, I look to the routing table itself for hints. If a prefix is only announced at the London Internet Exchange, or tagged with a “London” BGP Community, I attribute it there.
- Historical WHOIS: My final fallback for specific location data is the APNIC/RIPE databases.
- Footprint Heuristic: For anything remaining, I assign the IP weight to every city where that network maintains physical peering capacity as listed in PeeringDB.
This approach ensures that accurate, granular data (like a specific cloud region) always overrides broad, administrative data (like a generic WHOIS entry).
Building this pipeline presented unique engineering hurdles; here are the most significant ones:
The Local Cache
Downloading 15 years of archives is slow. I threw together a quick file-based cache to avoid hitting the network repeatedly. It was the simplest code I wrote but easily the most valuable, turning 30-minute download waits into near-instant local reads.
RAM remains stubbornly finite
Loading millions of IP prefixes, WHOIS records, PeeringDB entries, and their associated metadata into a standard in-memory map consumes gigabytes of RAM almost instantly. Frustratingly, my laptop only has so much. To avoid out-of-memory errors, I built a custom on-disk trie data structure on top of BadgerDB v4, a Go key-value store built on an LSM tree. By using IP prefixes as keys in a sorted KV store, I can perform efficient longest-prefix matching directly against the disk. I might show it off in a later blog post after I clean it up a little bit.
Cleaning Up the Spaghetti
While investigating all of these different data sources, I ended up writing several programs that each produced output in a different shape for other programs to consume. It all made sense to me at the time, but it spiraled into a confusing mess. Now I have a single script for generating the city data, which was only possible because of the improvements mentioned above: caching and on-disk data structures. The script has clear stages:
- Fetch: Downloads and caches raw data (WHOIS, BGP, PeeringDB).
- Index: Builds searchable on-disk tries and resolves authoritative network names from RIRs.
- Process: Scans BGP routes and attributes each prefix using the various data sources mentioned above.
- Output: Produces clean, normalized city results without duplicate entries (e.g., merging “Seoul” and “SEOUL”).
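The deduplication in the Output stage can be as simple as folding case and whitespace into a merge key. This is a minimal sketch. Including the country code in the key (an assumption) keeps same-named cities in different countries apart. The first spelling seen is kept for display. `CityResult`, `cityKey`, and `mergeCities` are hypothetical names.

```go
package main

import (
	"fmt"
	"strings"
)

// CityResult is a hypothetical output row.
type CityResult struct {
	Name    string
	Country string
	IPs     uint64
}

// cityKey folds case and whitespace so "Seoul", "SEOUL" and " seoul " collide.
func cityKey(name, country string) string {
	return strings.ToLower(strings.Join(strings.Fields(name), " ")) + "|" + strings.ToUpper(country)
}

// mergeCities sums duplicate rows, keeping the first spelling for display.
func mergeCities(rows []CityResult) []CityResult {
	index := make(map[string]int)
	var out []CityResult
	for _, r := range rows {
		k := cityKey(r.Name, r.Country)
		if i, ok := index[k]; ok {
			out[i].IPs += r.IPs
			continue
		}
		index[k] = len(out)
		out = append(out, r)
	}
	return out
}

func main() {
	fmt.Println(mergeCities([]CityResult{
		{"Seoul", "KR", 10}, {"SEOUL", "kr", 5}, {"Portland", "US", 3},
	}))
}
```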
What Changed When IP Dominance Was Added
When I layered IP dominance onto the physical map, many additional cities became visible.
In earlier versions, visibility depended heavily on registered Internet Exchange Points. That highlighted the traditional coastal hubs and major peering metros. But once routing table data was incorporated, the map revealed cities without major IXPs. These are places with substantial address space and large originating networks, even if they do not host a major public exchange. This is most noticeable in India, Japan, China, Indonesia, and in secondary metros beyond traditional hubs in the EU and United States.
The physical meeting points of networks only tell us a part of the story. The global routing table reveals where address space is actually controlled and originated. Some cities carry significant weight without being major public peering hubs. The IP dominance layer makes that distinction visible.
The Chinese Internet
The Chinese Internet is enormous, but it presents a unique attribution challenge. Because so much of China’s domestic routing stays inside the national carriers, the global BGP table often only sees these massive networks where they peer at international hubs like Hong Kong, Los Angeles, or Frankfurt. An earlier version of my attribution code assigned all of China’s IP space to those few international hubs, which was clearly incorrect: it made China Telecom look like the biggest ISP in Germany, as if it dominated the country. It does not, at least not yet.
To fix this, I implemented specific logic for China-based networks. I used pattern matching to parse provincial hints from APNIC WHOIS data, mapping province codes like GD (Guangdong) or SH (Shanghai) to their respective provincial capitals or major cities. I also linked ASNs to their parent organizations in PeeringDB to prevent Chinese networks from being misattributed to foreign exchange points. This resolved attribution for the vast majority of prefixes. Any remaining IP space attributed only at the country level is distributed across major domestic hubs.
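The exact patterns aren't shown, so here is a minimal sketch that assumes the province code appears as a hyphen-separated token in a WHOIS netname, as in the hypothetical `CHINANET-GD`. Splitting on hyphens avoids the adjacent-match problem that a consuming regular expression would have. The code table is a small illustrative subset, and `provinceCity` and `cityFromNetname` are hypothetical names.

```go
package main

import (
	"fmt"
	"strings"
)

// provinceCity maps a few province/municipality codes to one representative
// city. Illustrative subset only.
var provinceCity = map[string]string{
	"BJ": "Beijing",
	"SH": "Shanghai",
	"GD": "Guangzhou", // Guangdong
	"JS": "Nanjing",   // Jiangsu
	"ZJ": "Hangzhou",  // Zhejiang
}

// cityFromNetname returns the city for the first hyphen-separated token that
// is a known province code, e.g. "CHINANET-GD" -> Guangzhou.
func cityFromNetname(netname string) (string, bool) {
	for _, part := range strings.Split(strings.ToUpper(netname), "-") {
		if city, ok := provinceCity[part]; ok {
			return city, true
		}
	}
	return "", false
}

func main() {
	for _, n := range []string{"CHINANET-GD", "chinanet-ap-js", "CHINANET-BACKBONE"} {
		city, ok := cityFromNetname(n)
		fmt.Println(n, city, ok)
	}
}
```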
The result is a far more realistic view of China’s internal internet topology.
Ghost Networks and Spurious ASNs
Not every entry in the global routing table represents a real network with a physical footprint. While investigating the data, I found several “spurious” Autonomous Systems that I had to filter out to keep the map accurate.
For example, I had to add safety checks to prevent “IP swallowing.” There is a massive 0.0.0.0/0 block often pinned to Australia in the APNIC database. Since 0.0.0.0/0 matches every single IPv4 address, that one entry would incorrectly claim the entire global IP space for Australia. I know they have a lot of open space down there, but that seemed excessive.
Another prominent example was the Department of Defense (DoD). The DoD holds several massive /8 blocks (like 7.0.0.0/8 and 11.0.0.0/8). While this space is technically routed, it does not represent commercial internet traffic. In early versions of my model, the registration data for these blocks linked them to administrative offices in New York City. This caused my script to dump millions of military IPs onto Manhattan and incorrectly made it look like the absolute center of the universe.
I also built a blocklist to ignore other non-geographic entities:
- Administrative Containers: I filter out WHOIS entries containing IANA-NETBLOCK, CIDR-BLOCK, or ERX-NETBLOCK. These are typically placeholders for unassigned pools managed by regional registries rather than active networks.
- Registry Placeholders: Specific ASNs like 721, 56, and 37069 often function as loopbacks or registry tests.
By explicitly ignoring these, the resulting map represents the actual commercial Internet rather than the administrative database of the Internet.
UX and Rendering
In addition to adding more data to the map, I’ve also made several improvements to the map itself.
Dynamic Cluster Grouping
Layering BGP data onto an already complex physical map created a major design challenge: information density. With hundreds of new cities “lighting up” globally, the map became significantly cluttered when zoomed out.
To solve this, I implemented Dynamic Cluster Grouping. Close-by cities now group together into aggregate hubs at low zoom levels, which then split into individual markers as you zoom in. This isn’t just a visual fix; by reducing the number of active SVG shapes in the DOM, it significantly improves panning performance on mobile devices.


Dynamic Cluster Grouping ensures the map remains legible, preventing the increased data density from overwhelming the map. When you click on a cluster, the details panel expands to list every city contained within that group.

Viewport Culling
I also introduced Viewport Culling. The map now only renders assets currently within your bounds. As you pan to a new region, cities “pop in” dynamically, ensuring the browser isn’t wasting resources on rendering things on the other side of the planet.
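The culling check is a bounds test run before rendering. As with clustering, the real map does this in the browser, so this Go version is only a sketch of the logic. It assumes bounds are given as south/west/north/east degrees, and it treats `West > East` as a viewport that crosses the antimeridian. `Bounds`, `Marker`, and `visible` are hypothetical names.

```go
package main

import "fmt"

type Bounds struct{ South, West, North, East float64 }

// contains handles viewports that wrap across the antimeridian (West > East).
func (b Bounds) contains(lat, lng float64) bool {
	if lat < b.South || lat > b.North {
		return false
	}
	if b.West <= b.East {
		return lng >= b.West && lng <= b.East
	}
	return lng >= b.West || lng <= b.East
}

type Marker struct {
	Name     string
	Lat, Lng float64
}

// visible keeps only the markers inside the current viewport.
func visible(markers []Marker, b Bounds) []Marker {
	var out []Marker
	for _, m := range markers {
		if b.contains(m.Lat, m.Lng) {
			out = append(out, m)
		}
	}
	return out
}

func main() {
	ms := []Marker{{"Singapore", 1.35, 103.82}, {"London", 51.51, -0.13}}
	fmt.Println(visible(ms, Bounds{South: -10, West: 90, North: 30, East: 130}))
}
```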
Updates to City Sizing
The visual size of cities on the map also now dynamically reflects their importance. Previously, cities were sized based only on their relative peering bandwidth. Now, their size depends on a weighted combination of aggregate peering bandwidth and IP dominance, contributing 80% and 20% to the size calculation respectively. Although this ratio is arbitrary and was picked for aesthetic reasons, peering bandwidth is a stronger signal of real traffic concentration than raw IP space alone, so I think it should be emphasized significantly more.
Enable/disable layers
Now, the map can be sliced into three layers: cables, peering bandwidth, and IP allocations. There are controls that allow you to show or hide each of these layers individually.

Permalinks
I also added permalinks to make the map state fully shareable. The URL now encodes the current latitude, longitude, zoom level, selected year, and active text filters. If you zoom into Southeast Asia in 2016 and search for “Singapore”, that exact view can be copied and shared. The resulting link will look like this:
https://map.kmcd.dev/?lat=3.1625&lng=103.4033&z=5.00&year=2016&q=singapore
…which will show exactly how amazingly connected Singapore is when others click on it.
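Building such a link takes only a few lines with the standard `net/url` package. This is a minimal sketch, and `View` and `permalink` are hypothetical names. The decimal precision matches the example link above. `url.Values.Encode` sorts keys alphabetically, so the parameter order differs from that link, which makes no difference to anything parsing the query string.

```go
package main

import (
	"fmt"
	"net/url"
	"strconv"
)

// View is a hypothetical struct holding the state the permalink encodes.
type View struct {
	Lat, Lng float64
	Zoom     float64
	Year     int
	Query    string
}

func permalink(base string, v View) string {
	q := url.Values{}
	q.Set("lat", strconv.FormatFloat(v.Lat, 'f', 4, 64))
	q.Set("lng", strconv.FormatFloat(v.Lng, 'f', 4, 64))
	q.Set("z", strconv.FormatFloat(v.Zoom, 'f', 2, 64))
	q.Set("year", strconv.Itoa(v.Year))
	if v.Query != "" {
		q.Set("q", v.Query)
	}
	return base + "?" + q.Encode() // Encode escapes values and sorts keys
}

func main() {
	fmt.Println(permalink("https://map.kmcd.dev/",
		View{Lat: 3.1625, Lng: 103.4033, Zoom: 5, Year: 2016, Query: "singapore"}))
}
```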
Better Exports
One of the most requested features for the map has been a way to export the current view for use in presentations, reports, posters, or just as a high-quality wallpaper.
Previously, I was using a standard Leaflet plugin for this, but it was not great. It would often fail in weird ways, leaving you with a glitched or incomplete rendering of the map. It also exported as PNG, which meant the beautiful vector data of the cables and cities was flattened into a low-resolution raster format.
Now there’s a new export button that renders an isolated SVG. Because the map itself is built on SVGs, this new export method is lossless. It respects your current zoom level and position, allowing you to focus on a specific region and generate an incredibly high-quality vector file that you can scale to any size without losing a single pixel of detail. Most images in this post were generated using this new export feature.
Show Me the Data
Another one of the biggest requests I’ve had in previous years is for access to the raw data behind the visualizations. For the 2026 edition, I have exposed the underlying JSON datasets that power the map. These files are curated from TeleGeography (modern cables) and PeeringDB (IXPs), while the historical data comes from various sources including submarinenetworks.com and archived maps.
You can access these directly to build your own visualizations, analyze the growth of global bandwidth, or double check my numbers.
- all_cables.json: The core map data. A GeoJSON FeatureCollection containing all submarine cables. Each feature includes properties like name, rfs_year (Ready for Service), decommission_year, owners, and landing_points. This follows the standard GeoJSON format.
- year-summaries.json: Brief textual descriptions of notable events or milestones for specific years, displayed in the footer.
- city-dominance/{year}.json: Per-year JSON files (e.g., 2026.json) with detailed city-level peering capacity, regional information, and coordinates. Used for rendering city markers and calculating regional statistics.
- meta.json: Metadata including the minimum and maximum years covered by the visualization.
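Here is one way to start using `all_cables.json`: list the cables that were in service in a given year. This is a minimal sketch under stated assumptions. It assumes a local copy of the file, and it assumes `rfs_year` and `decommission_year` are JSON numbers, with a missing or null `decommission_year` meaning the cable is still active. Properties are decoded into a generic map so other fields are tolerated. `featureCollection`, `inService`, and `cablesInService` are hypothetical names.

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
)

// featureCollection decodes just enough GeoJSON to inspect cable properties.
type featureCollection struct {
	Features []struct {
		Properties map[string]any `json:"properties"`
	} `json:"features"`
}

// inService assumes numeric years, which encoding/json decodes as float64.
func inService(props map[string]any, year int) bool {
	rfs, ok := props["rfs_year"].(float64)
	if !ok || int(rfs) > year {
		return false
	}
	if dec, ok := props["decommission_year"].(float64); ok && int(dec) <= year {
		return false
	}
	return true
}

// cablesInService returns the names of cables active in the given year.
func cablesInService(data []byte, year int) ([]string, error) {
	var fc featureCollection
	if err := json.Unmarshal(data, &fc); err != nil {
		return nil, err
	}
	var names []string
	for _, f := range fc.Features {
		name, _ := f.Properties["name"].(string)
		if inService(f.Properties, year) {
			names = append(names, name)
		}
	}
	return names, nil
}

func main() {
	data, err := os.ReadFile("all_cables.json") // assumes a local copy
	if err != nil {
		fmt.Println("read:", err)
		return
	}
	names, err := cablesInService(data, 2016)
	if err != nil {
		fmt.Println("decode:", err)
		return
	}
	fmt.Println(len(names), "cables in service in 2016")
}
```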
See You Next Year
You might ask why I burned so much time manually attributing IP space when services like MaxMind or IPInfo already exist. The honest answer? Buying the data isn’t fun. The joy of this project comes from the archaeology and the work involved in bringing order to chaotic and disjointed datasets and transforming them into something beautiful.
This was a great project, and I am extremely happy with the results. If you’ve gotten this far without checking out the map, I’m impressed with your restraint, but here’s one more link for you to take a look: Explore the Map »
Introduction
Welcome to the first installment of our “HTTP from Scratch” blog series! In this series, we’ll embark on a journey through the evolution of the Hypertext Transfer Protocol (HTTP), the backbone of the World Wide Web. By building simple implementations of each major HTTP version in Go, we’ll gain a deep understanding of how this essential protocol has shaped the internet we use every day and how it evolved into what we have now. Be warned that none of the code will be the most performant, secure, or featureful.
In this post, we’ll travel back to the early days of the web (1991) and explore HTTP/0.9, HTTP’s initial incarnation. At the time, HTTP/0.9 was a groundbreaking technology that enabled the first web browsers and servers to communicate, laying the foundation for the World Wide Web that we know today. But HTTP/0.9 had its limitations. We’ll discuss these shortcomings, which ultimately paved the way for subsequent versions of HTTP that introduced headers, status codes, additional HTTP methods, connection reuse, binary framing and, eventually, a move from TCP to QUIC (which runs over UDP) for lower latency and better performance. For a more formal description of the HTTP/0.9 specification, you can reference http.dev or w3.org.
To get a hands-on understanding of HTTP/0.9, we’ll take a practical approach. Since modern web servers generally no longer support this early version, we’ll create our own HTTP/0.9 server from scratch using Go. This will allow us to experiment with the protocol and gain valuable insights into its inner workings.
By the end of this post, you’ll have a solid grasp of HTTP/0.9 and the foundation it laid for the modern web. You will be well-equipped to continue our journey through the evolution of HTTP in the upcoming articles.
Understanding HTTP/0.9
HTTP/0.9, the inaugural version of the Hypertext Transfer Protocol, laid the groundwork for the web as we know it today. It introduced a simple request-response model that facilitated communication between web clients and servers.
The Request
In HTTP/0.9, a client sends a single-line request to the server. This request line consists of only two components:
- GET Method: The only method supported in HTTP/0.9. It instructs the server to retrieve the specified resource.
- Resource Path: The path to the resource on the server that the client wants to access.
For example, to request the index.html file from the server’s root directory, the client would send:
GET /index.html
That’s… it. It can’t get much simpler than that. Note that there are no headers to indicate content size, compression, content type, etc. This isn’t a part of HTTP/0.9 as all of that was introduced in HTTP/1.0.
The Response
Upon receiving a request, the server responds with the contents of the requested resource. This response is a simple stream of bytes, usually representing an HTML document. Notably, there are no headers in an HTTP/0.9 response or a status code that tells us if we’re receiving an error page or the resource that we asked for. Since there were no status codes, there was no way to indicate if a requested resource was not found — a concept that would later be introduced with the 404 status code.
Full example
> GET /index.html
< <html>
< Hello World!
< </html>
Again, this is an extremely simple protocol at this point. Request a resource and get the data. There’s no place to put metadata when requesting or responding. By the way, lines prefixed with > are from the client to the server and < are from the server to the client. Remember this, because this is important for understanding verbose curl output, which we’ll use a good amount in the future.
sequenceDiagram
actor Client
rect rgb(47,75,124)
Client ->> Server: TCP SYN
Server ->> Client: TCP SYN-ACK
Client ->> Server: TCP ACK
end
rect rgb(200,80,96)
Client ->> Server: HTTP Request
Server ->> Client: HTTP Response
end
rect rgb(47,75,124)
Server ->> Client: TCP FIN
Client ->> Server: TCP ACK
end
Limitations
While revolutionary for its time, HTTP/0.9 had some significant limitations that paved the way for future improvements:
- No Headers: The absence of headers meant that the server could not convey any additional information about the response, such as content type or length.
- Only GET: The only supported method was GET, which meant that clients could only request resources and not submit data to the server.
- No Status Codes: There was no mechanism to indicate errors or other status information.
Implementing an HTTP/0.9 Server
Okay, now let’s implement the server.
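package main
// Imports for the whole server file; later snippets use bufio, log, url and strings.
import (
	"bufio"
	"log"
	"net"
	"net/http"
	"net/url"
	"strings"
)
// Server is a minimal HTTP/0.9 server that hands every request to a standard http.Handler.
type Server struct {
	Addr    string
	Handler http.Handler
}
// ListenAndServe accepts TCP connections on s.Addr and serves each one in its own goroutine.
func (s *Server) ListenAndServe() error {
	if s.Handler == nil {
		panic("http server started without a handler")
	}
	l, err := net.Listen("tcp", s.Addr)
	if err != nil {
		return err
	}
	defer l.Close()
	for {
		conn, err := l.Accept()
		if err != nil {
			return err // report Accept failures to the caller instead of exiting the process
		}
		go s.handleConnection(conn)
	}
}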
Our ListenAndServe() listens on the configured TCP address and then loops forever, accepting each new connection and handling it in its own goroutine. Next, let’s look at s.handleConnection():
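// handleConnection serves exactly one request: read the request line, run the handler, hang up.
func (s *Server) handleConnection(conn net.Conn) {
	defer conn.Close() // closing the connection is what marks the end of the response
	reader := bufio.NewReader(conn)
	line, _, err := reader.ReadLine()
	if err != nil {
		return
	}
	// "GET /path" (HTTP/0.9) or "GET /path HTTP/1.1" (e.g. from curl): method first, then path.
	fields := strings.Fields(string(line))
	if len(fields) < 2 {
		return
	}
	r := &http.Request{
		Method:     fields[0],
		URL:        &url.URL{Scheme: "http", Path: fields[1]},
		Proto:      "HTTP/0.9",
		ProtoMajor: 0,
		ProtoMinor: 9,
		Header:     make(http.Header), // HTTP/0.9 requests never carry headers...
		Body:       http.NoBody,       // ...or a body, but handlers may still read both
		RemoteAddr: conn.RemoteAddr().String(),
	}
	s.Handler.ServeHTTP(newWriter(conn), r)
}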
Handling a connection is very simple in HTTP/0.9 because each connection carries exactly one request: the client connects, sends a single request line, and the server closes the connection after responding. This code reads the first line sent by the client and then calls strings.Fields to split it into its parts (the method and the path). As a reminder, this is what a request looks like in HTTP/0.9:
GET /path/to/resource
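Because strings.Fields splits on any run of whitespace, the same parsing accepts both this two-word HTTP/0.9 line and a three-word HTTP/1.1 line such as the one curl sends. The sketch below makes that explicit. parseRequestLine is a hypothetical helper: it is stricter than handleConnection (it rejects lines with more than three words), and labelling two-word lines as HTTP/0.9 is our own convention.

```go
package main

import (
	"fmt"
	"strings"
)

// parseRequestLine (hypothetical) splits a request line with strings.Fields,
// as handleConnection does, and reports the protocol the client claimed.
// A two-word line carries no version, which is what an HTTP/0.9 request looks like.
func parseRequestLine(line string) (method, target, proto string, err error) {
	fields := strings.Fields(line)
	switch len(fields) {
	case 2:
		return fields[0], fields[1], "HTTP/0.9", nil
	case 3:
		// e.g. "GET /path HTTP/1.1", which curl sends even with --http0.9
		return fields[0], fields[1], fields[2], nil
	default:
		return "", "", "", fmt.Errorf("malformed request line %q", line)
	}
}

func main() {
	for _, line := range []string{"GET /path/to/resource", "GET /path/to/resource HTTP/1.1", "GET"} {
		m, t, p, err := parseRequestLine(line)
		fmt.Println(m, t, p, err)
	}
}
```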
Okay, now let’s look at what newWriter does. ServeHTTP expects an http.ResponseWriter, which net/http defines like this:
type ResponseWriter interface {
	Header() Header
	Write([]byte) (int, error)
	WriteHeader(statusCode int)
}
Here’s what our HTTP/0.9 response writer looks like:
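// responseBodyWriter sends only the response body: HTTP/0.9 has no status line or headers.
type responseBodyWriter struct {
	conn   net.Conn
	header http.Header // accepted from handlers but never sent
}
func (r *responseBodyWriter) Header() http.Header {
	// unsupported with HTTP/0.9, but handlers (and helpers like http.Error) call
	// Header().Set, which panics on a nil map, so hand back a map we never send
	return r.header
}
func (r *responseBodyWriter) Write(b []byte) (int, error) {
	return r.conn.Write(b)
}
func (r *responseBodyWriter) WriteHeader(statusCode int) {
	// unsupported with HTTP/0.9: there is no status line to write
}
func newWriter(c net.Conn) http.ResponseWriter {
	return &responseBodyWriter{
		conn:   c,
		header: make(http.Header),
	}
}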
The important thing to note here is that HTTP/0.9 doesn’t support status codes or headers, so Header() and WriteHeader(statusCode) never send anything to the client; only Write() puts bytes on the wire.
Now let’s put it together in a main function:
func main() {
	addr := "127.0.0.1:9000"
	s := Server{
		Addr:    addr,
		Handler: http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
			w.Write([]byte("Hello World!"))
		}),
	}
	log.Printf("Listening on %s", addr)
	if err := s.ListenAndServe(); err != nil {
		log.Fatal(err)
	}
}
Note that the handler is an ordinary http.Handler, so in principle this server can host existing HTTP routers and frameworks, as long as they don’t depend on request headers or on status codes reaching the client.
See the full source on GitHub: main.go.
Testing the server
Now we just need to run the server:
$ go run server/main.go
2024/07/27 21:55:28 Listening on 127.0.0.1:9000
Now that we have an HTTP/0.9 server, how do we test it? Since HTTP/0.9 is essentially unused today, how do we find a client to test this server with? Luckily, curl supports HTTP/0.9, so let’s try that!
$ curl --http0.9 http://127.0.0.1:9000/this/is/a/test
Hello World!
The --http0.9 flag tells curl to accept HTTP/0.9’s headerless responses. Note that curl still sends its request as HTTP/1.1; the flag only changes how it interprets the response. Here’s what the curl man page says about the flag:
--http0.9
(HTTP) Tells curl to be fine with HTTP version 0.9 response.
HTTP/0.9 is a completely headerless response and therefore you can also connect with this to non-HTTP servers and still get a response since curl will simply transparently downgrade - if allowed.
I verified this behavior a bit further by combining --http1.0 with --http0.9. Instead of declaring HTTP/1.1 in the request line, curl now declares HTTP/1.0, while still handling the response correctly (not expecting a status code or headers):
$ curl --http1.0 --http0.9 http://127.0.0.1:9000/this/is/a/test
Hello World!
By the way, if you omit --http0.9 (or if your server returns a status line or headers that curl can’t make sense of), curl fails with the error “Received HTTP/0.9 when not allowed”:
$ curl http://127.0.0.1:9000/this/is/a/test
curl: (1) Received HTTP/0.9 when not allowed
You can also test this server with simpler tools. For example, netcat (here, the ncat implementation) can make the same request. This will be a recurring theme for the text-based protocols: it’s often simpler to write the request text directly to netcat than to reach for other tooling:
$ echo GET /this/is/a/test | ncat 127.0.0.1 9000
Hello World!
After writing most of this article, I realized I had never tested my web server with a real web browser. It seems like the only major browser that still supports HTTP/0.9 is Firefox, so let’s see it!

Yes, it works! Success!
Implementing an HTTP/0.9 Client
Now that we’ve made a server and used existing clients, we might as well make a client in Go. Don’t worry, this one is super simple:
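package main
import (
	"fmt"
	"io"
	"log"
	"net"
)
func main() {
	conn, err := net.Dial("tcp", "127.0.0.1:9000")
	if err != nil {
		log.Fatalf("err: %s", err)
	}
	defer conn.Close()
	// The entire HTTP/0.9 request: method and path, no version, no headers.
	if _, err := conn.Write([]byte("GET /this/is/a/test\r\n")); err != nil {
		log.Fatalf("err: %s", err)
	}
	// There is no Content-Length, so read until the server closes the connection.
	body, err := io.ReadAll(conn)
	if err != nil {
		log.Fatalf("err: %s", err)
	}
	fmt.Println(string(body))
}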
See the full source on GitHub: main.go.
This code does the following:
- Establishes a TCP connection to the server.
- Sends the HTTP/0.9 request.
- Receives the response and displays it to the user.
Simple, right? Almost too simple.
$ go run client/main.go
Hello World!
Conclusion
In this post, we delved into the origins of the web by exploring HTTP/0.9. While simple in its design, HTTP/0.9 was a groundbreaking protocol that enabled the first web browsers and servers to communicate.
We explored the basic request-response model of HTTP/0.9, understanding how clients could request resources and servers could deliver them. We also acknowledged the limitations of this early version, including the lack of headers, status codes, and support for other HTTP methods besides GET.
By building a rudimentary HTTP/0.9 server and client in Go, we gained hands-on experience with the protocol’s core concepts. We learned how to handle TCP connections, parse requests, and send responses, laying a solid foundation for understanding more advanced HTTP versions.
In the upcoming parts of this series, we’ll delve into how HTTP evolved to overcome the limitations of HTTP/0.9 and address the needs of the evolving World Wide Web. We’ll explore the introduction of headers, status codes, and additional methods, which enabled more robust and feature-rich communication between web clients and servers. Stay tuned for the next part of our series, where we’ll dive into HTTP/1.0 and its significant enhancements over HTTP/0.9. As a sneak peek, these are the major features added to each version:
- HTTP/0.9: First attempt to transfer generic resources by paths
- HTTP/1.0: Adds status codes, headers, verbs
- HTTP/1.1: Makes persistent connections the default, so multiple requests can reuse one connection
- HTTP/2: Switches from a text-based protocol to binary
- HTTP/3: Built on QUIC instead of TCP