The Missing Faces

Three people write one paper and the graph remembers three friendships. A reactor valve reports the command it was sent, not the position it reached. A finished skyscraper turns out to have a load case nobody ever checked. The relation that only means something all at once is exactly the one a clean diagram throws away — and the piece it drops is where the accident waits.

June 13, 2026 · 14 min read

essays
systems
topology

A clean systems diagram lies in a particular way.

It draws boxes for the parts and arrows for the ways they touch, and if its author has read a little topology it will draw triangles for the three-way relationships and leave holes where the structure runs out. The result is genuinely useful, and it is also too generous, because the diagram quietly grants every small relationship implied by every large one — and most real systems never agreed to that.

The last piece, Seams and Signals, drew a clean line between a local measurement and a global explanation. A cochain could be locally quiet and globally loud; a seam could pass every small test and still announce itself around a loop. But that whole story assumed we already knew the space the measurements lived on. The vertices, the edges, the triangles, the holes — the mesh — were handed to us. This piece is about the step before that one. What if the mesh is wrong? What if the triangle we are measuring around was never really there?

When triangles lie

Three people write a paper together. A graph turns that into three collaborations: Alice with Bob, Bob with Carol, Alice with Carol. If all you ever wanted to know was whether two of them have worked together, the translation costs nothing. If the thing you needed to preserve was the paper — one event, three names, written at once — the translation has already thrown it away. Three pairwise collaborations are not the same object as one three-way collaboration, and no amount of staring at the three edges will rebuild the triangle that was actually there.
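A minimal sketch of that loss, under the assumption that a publication record is just a list of author sets. The function name `pairwise_projection` and the second, three-paper history are illustrative inventions, not anything from a real library or dataset. Two different histories flatten to the same graph:

```python
from itertools import combinations

def pairwise_projection(hyperedges):
    """Flatten each hyperedge into the set of all pairs it contains."""
    pairs = set()
    for edge in hyperedges:
        for a, b in combinations(sorted(edge), 2):
            pairs.add((a, b))
    return pairs

# One paper written by three people at once.
one_paper = [frozenset({"Alice", "Bob", "Carol"})]

# A different (hypothetical) history: three separate two-author papers.
three_papers = [
    frozenset({"Alice", "Bob"}),
    frozenset({"Bob", "Carol"}),
    frozenset({"Alice", "Carol"}),
]

same_graph = pairwise_projection(one_paper) == pairwise_projection(three_papers)
print(same_graph)  # True: the graph cannot tell the two histories apart
```

The projection is many-to-one, so nothing computed from the three edges alone can decide which history produced them.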

The stakes climb the moment the relation is something a system leans on. A safe mode engages only when the sensor is alive, the override is enabled, the operator is trained, the upstream service is degraded but not dead, and the policy engine still holds a fresh token. Remove any one of those and the safe mode does not weaken by a fifth. It stops being the safe mode at all. That condition is not five independent edges between pairs of components; it is one relation that happens to involve five things, and it is meaningless in pieces.

A simplicial complex is the object that makes the generous reading official. Its defining rule is downward closure: if a triangle belongs to the complex, all three of its edges do; if a four-way simplex belongs, so does every smaller face inside it. Any higher relation drags its lower relations along by law. For a mesh this is exactly right and exactly why the machinery runs: a filled triangle missing an edge is not much of a triangle. For a system it is often fiction. A hypergraph makes the weaker, more honest promise: an edge may contain any number of vertices, and it carries none of its sub-edges automatically. The five-way safe mode is a single hyperedge. Whether one pair inside it, say sensor alive and override enabled, means anything on its own is a question, not a guarantee. The simplicial view is excellent when that lower-order compatibility is real. It is dangerous when the compatibility is assumed because the mathematics wanted it.

The safe mode is not five edges. It is one hyperedge.

A simplicial model says: if the whole interaction exists, its faces exist. A hypergraph asks whether they do. The gap between those two sentences is where the accident lives.
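To see how much the generous reading grants, here is a small sketch that enumerates everything downward closure would imply for the five-way safe mode. The vertex names are shorthand invented for the five conditions described earlier, and `downward_closure` is a hypothetical helper, not a library call.

```python
from itertools import combinations

def downward_closure(hyperedge):
    """Every nonempty proper subset the simplicial rule would grant for free."""
    items = sorted(hyperedge)
    faces = set()
    for size in range(1, len(items)):
        for combo in combinations(items, size):
            faces.add(frozenset(combo))
    return faces

# Illustrative labels for the five conditions of the safe mode.
safe_mode = frozenset({
    "sensor_alive", "override_enabled", "operator_trained",
    "upstream_degraded_not_dead", "fresh_token",
})

implied = downward_closure(safe_mode)
print(len(implied))  # 30 smaller relations asserted by the model, not by the system
```

A hypergraph stores only `safe_mode` itself. Each of the thirty smaller faces stays an open question until someone checks it.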

Three Mile Island and the false face

The accident at Three Mile Island, on March 28, 1979, is usually told as a valve story. A pilot-operated relief valve stuck open, coolant escaped, the operators misread what was happening, and the core overheated. That telling is true and too flat. The more honest object is not the valve but the relation among five things: the valve, the indicator that was supposed to report it, the plant model the operators carried in their heads, the direct measurement they did not have, and the procedure that made a full pressurizer look more dangerous than a starving core.

The indicator is the part worth slowing down on. It did not show the valve's actual position; it showed that a close signal had been sent. Other instruments were misleading or simply absent — there was no direct water-level instrument for the reactor vessel, so the operators inferred core coverage from the pressurizer, read it as full, and cut back the emergency cooling that was keeping the core alive. [1]

Every local reading had a defensible story. The close signal was sent, so the valve was treated as closed. The pressurizer was high, so the core was treated as covered. The procedure discouraged a solid pressurizer, so reducing emergency flow looked locally correct. Treat the situation simplicially and the triangle fills itself in — command, indication, interpretation, each consistent with its neighbor. The face that was missing was the actual state of the plant.

This is the exact hinge between the last piece and this one. Closed but not exact was the signal problem: a measurement that passes every local test and still fails to come from a global potential. The hypergraph problem sits one level beneath it. Before you can ask whether a measurement is exact, you have to ask which local tests were ever legal faces of the real relation. At Three Mile Island, the test “a close signal was sent” was being read as the test “the valve is shut,” and those were never the same face.

Challenger and the launch hyperedge

Challenger gets flattened too, just along different seams. One telling blames the O-rings. Another blames managers who overrode their engineers. Another blames the cold; another the schedule. Each is true and none is sufficient, because the launch decision was not any one of those facts. It was a hyperedge that held all of them at once: the overnight temperature, the O-ring's resilience when cold, the behavior of the field joint, the documented history of seal erosion, the contractor's recommendation, the management reversal, the ice on the pad, and the institutional pressure to fly.

The Rogers Commission found the decision flawed in a very specific way. The people who made it lacked the recent history of O-ring and joint problems, were unaware that the contractor had recommended in writing against launching below fifty-three degrees Fahrenheit, and did not know that Thiokol's engineers had kept objecting after their own management reversed position. [2] Read simplicially, the review process is a tidy chain of local certifications — contractor to project office, project office to readiness review, readiness review to launch — and every interface has its paperwork. But the fatal condition was never pairwise. “The O-ring concern was communicated to some managers” is not the same fact as “the concern survived as a hard launch constraint across the whole decision.” “Erosion was seen before” is not “erosion has changed the temperature we are allowed to launch in.” “The engineers objected” is not “the objection outlived the management caucus as a system state.”

A model that respected the system would not just draw those communication edges. It would mark the high-arity relation directly: do not launch when cold-weather seal uncertainty, prior erosion evidence, unresolved engineering dissent, and incomplete escalation all coincide. That is not a triangle waiting to be filled by its smaller faces. It is a named operating mode — a forbidden simplex, illegal unless the missing faces underneath it have actually been built. Writing it down is the move an earlier piece called loop shedding: you cut an ambiguous circulation by turning it into a boundary, so that below some temperature no quantity of local reassurance can recirculate the system back into go. That is not bureaucracy. It is topology repair.

The blackout was not a line

The 2003 Northeast blackout is usually filed under cascade, and the word does quiet damage. A cascade sounds like a row of dominoes, each one knocking the next, a single line of causation you could have interrupted anywhere along its length. What actually happened on the afternoon of August 14 was a many-way loss of the ability to see and act, spread across systems that were never supposed to fail together.

The U.S.–Canada task force described a normal afternoon degrading through several interacting phases. The MISO state estimator became ineffective because it was fed inaccurate data. FirstEnergy lost its Eastlake 5 generator. Shortly after 2:14 in the afternoon, FirstEnergy's alarm and logging system failed and was not restored until after the blackout, so the operators lost both the visual and the audible signals they relied on and went on making decisions from stale information while, out in the field, lines sagged into trees and tripped one after another. [3]

Sort those events by layer and they refuse to stay sorted. A line touching a tree is physics. An alarm processor dying is software. A state estimator running on old data is control-room epistemology. A phone call that did not happen is human coordination. The blackout was not any one layer's failure; it was a single relation among high load, lost generation, a stale estimate, blind alarms, tree contacts, and no timely load shedding, lethal not because any pair of those is fatal but because the whole of it is. The task force's timeline reads almost like a hypergraph written out in prose: electrical events, computer events, and human events that become dangerous only where they align.

The management lesson everyone draws — trim the trees, fix the alarms, answer the phone — is correct and too small. The systems lesson is stricter. Any layer that believes it can stay locally correct while another layer has gone blind has already become a source of curvature: the operating loop no longer returns to the same state after a trip around it. Dispatch action, telemetry, line loading, alarm state, a neighbor's awareness, and an operator's belief do not commute. Walk the loop one way and the grid looks tolerable; walk it another way and it is already gone.

Curvature shedding is the deliberate removal of the path-dependence a system cannot afford to keep, and the cheapest version of it on that grid would have been almost embarrassingly small: an alarm whose only job is to announce that the other alarms can no longer be trusted. That single signal converts a hidden cross-layer dependency into a visible boundary — if the state estimate is stale and alarm processing is down, the operating regime changes by itself, before anyone has to notice. The bad loop is not repaired by adding more loop. It is repaired by changing what the loop is allowed to enclose.
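As a rough sketch of that alarm about the alarms, assume the control room can see two ages: how old the last good state estimate is, and how long since the alarm processor last sent a heartbeat. The class, the thresholds, and the regime names are all hypothetical, chosen for illustration rather than taken from any grid operator's practice.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ControlRoomView:
    estimator_age_s: float        # seconds since the last good state estimate
    alarm_heartbeat_age_s: float  # seconds since the alarm processor last checked in

# Hypothetical thresholds, chosen only for illustration.
MAX_ESTIMATOR_AGE_S = 300.0
MAX_HEARTBEAT_AGE_S = 60.0

def operating_regime(view):
    """Return the regime implied by how much of the picture can be trusted."""
    estimate_stale = view.estimator_age_s > MAX_ESTIMATOR_AGE_S
    alarms_blind = view.alarm_heartbeat_age_s > MAX_HEARTBEAT_AGE_S
    if estimate_stale and alarms_blind:
        return "BLIND"     # silence no longer means "no alarms"
    if estimate_stale or alarms_blind:
        return "DEGRADED"
    return "NORMAL"

print(operating_regime(ControlRoomView(20.0, 5.0)))       # NORMAL
print(operating_regime(ControlRoomView(900.0, 3600.0)))   # BLIND
```

The regime is computed from the joint condition, so the change of mode does not depend on an operator noticing that the room has gone quiet.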

Citicorp and the load that had no face

The Citicorp Center story earns its place here precisely because it is not a disaster. It is the ghost of one. In 1978, after the tower in midtown Manhattan was already built and occupied, its structural engineer, William LeMessurier, reconsidered how it would behave under quartering winds — wind striking the building on the diagonal rather than head-on. The stresses in certain members went up, and the problem was sharper than it should have been, because the bracing's joints had been built bolted rather than welded, a substitution made for cost that had never been checked against the quartering-wind case. Revised wind-tunnel work confirmed the danger, and the building was quietly reinforced, welder by welder, through the autumn. [4]

The error people remember is a calculation. The error that matters is representational. The design process had treated a list of facts as if they were separable: the code's perpendicular-wind requirement, the building's quartering-wind behavior, the switch from welds to bolts, the way a load figure was interpreted, the assumptions behind the tuned mass damper, the probability of a great storm, and the fact that people slept inside. Each had been checked. A downward-closed review can validate every one of those faces and feel, honestly, like it has validated the building. But the dangerous thing was the relation among them — quartering wind and bolted joints and that particular bracing and that reading of the code — and that relation was a face nobody had built. The repair was steel. The deeper repair was learning that those facts had always been one object.

The hole in the simplicial story

Step back from the cases and the pattern is a warning about a tool. The appeal of a general theory of systems is that one language might cover circuits and cities and software and aircraft and power grids and supply chains without grinding them all into the same mush. The graph view bought us a lot: flow, centrality, bottlenecks, reachability, feedback. The simplicial view from the last piece buys more — higher-order consistency, the difference between a local obstruction and a global one, holes, cochains, the Hodge decomposition, signals that cannot be explained away. It is a real upgrade, and it has a hole in it when it is reached for too early.

It assumes the face relation. It assumes that when a many-way relation exists, its parts exist in the right way underneath it. For a mesh that assumption is the whole point. For a system it is a thing you have to check. Most systems are not born simplicial. They contain simplicial patches — a truss calculation really does rest on its lower faces, a ranking really is assembled from pairwise comparisons — but the system around those patches is usually hypergraphic. Some interactions are irreducibly many-way. Some of the sub-relations the mathematics expects are simply absent. Some pairwise edges are legal only inside a larger state, some local tests are meaningless outside a named mode, and some interfaces behave perfectly until a rare boundary condition combines them. This is not merely a modeling preference: the homological-algebra work on hypergraphs is explicit that multi-way data cannot always be represented unambiguously by a graph, and that forcing the extra topological structure can manufacture interactions the data never contained. [5]

So the simplicial reflex — fill the missing faces so the machinery can run — is the wrong first move. The systems instinct is to stop and ask whether a face is missing for a reason.

Missing faces are evidence

That gives a rule you can actually use. When you find a high-order condition in a system, resist the urge to decompose it into pairwise dependencies right away. Write it as a hyperedge first, all of it on one line, and only then go looking for which of its faces are real. If A, B, and C are safe together, ask whether A and B alone mean anything, whether B and C do, whether A by itself does. Where the answer is no, that face is missing — and the diagram should not be allowed to fill it in behind your back.
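The rule can be sketched as a small audit. It assumes you keep an explicit set of sub-relations that someone has actually confirmed; the helper names and the confirmed set below are hypothetical.

```python
from itertools import combinations

def proper_faces(hyperedge):
    """All nonempty proper subsets of a hyperedge."""
    items = sorted(hyperedge)
    return {frozenset(c) for k in range(1, len(items)) for c in combinations(items, k)}

def missing_faces(hyperedge, real_relations):
    """Faces downward closure would assume but nobody has confirmed exist."""
    return proper_faces(hyperedge) - set(real_relations)

# The three-way condition: A, B, and C are safe together.
condition = frozenset({"A", "B", "C"})

# Hypothetical audit result: each part means something alone, and so does A with B.
confirmed = {
    frozenset({"A"}), frozenset({"B"}), frozenset({"C"}),
    frozenset({"A", "B"}),
}

for face in sorted(missing_faces(condition, confirmed), key=sorted):
    print("missing face:", sorted(face))
# missing face: ['A', 'C']
# missing face: ['B', 'C']
```

The output is the list of faces the diagram must not draw until someone has shown they are real.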

A deployment is a clean example. Migration complete, feature flag off, old workers drained, new schema backward-compatible, rollback still possible: five green checks that a dashboard will happily show as five independent lights. They are not independent. “Migration complete” together with “old workers still running” can be an actively dangerous state; “flag off” together with “rollback impossible” is worse than either looks alone. The safe deploy is not the conjunction of five checkboxes. It is a hyperedge with several illegal faces, and a release process that treats it as a checklist will pass every individual test on its way into an outage.
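One way to make that concrete is to evaluate the whole snapshot at once instead of reading each light separately. In the sketch below, the two illegal pairings come from the paragraph above; the field names, the `assess` function, and the three status labels are assumptions made for illustration.

```python
DEPLOY_HYPEREDGE = (
    "migration_complete", "flag_off", "old_workers_drained",
    "schema_backward_compatible", "rollback_possible",
)

# Partial states that are dangerous in themselves, not merely unfinished.
ILLEGAL_FACES = [
    {"migration_complete": True, "old_workers_drained": False},
    {"flag_off": True, "rollback_possible": False},
]

def assess(snapshot):
    """Classify one simultaneous snapshot of all five conditions."""
    hits = [face for face in ILLEGAL_FACES
            if all(snapshot[name] == value for name, value in face.items())]
    if hits:
        return "UNSAFE", hits
    if all(snapshot[name] for name in DEPLOY_HYPEREDGE):
        return "SAFE", []
    return "INCOMPLETE", []

mid_rollout = {
    "migration_complete": True, "flag_off": True, "old_workers_drained": False,
    "schema_backward_compatible": True, "rollback_possible": True,
}
status, reasons = assess(mid_rollout)
print(status)  # UNSAFE: four of five lights are green, and one pairing is illegal
```

The useful distinction is between INCOMPLETE and UNSAFE. A checklist reports both as "one box still unchecked"; the hyperedge view says one of them is a state the rollout should never pass through.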

The same shape turns up on a shop floor — gas selected, regulator open, material profile loaded, fixture grounded, laser enabled, the right goggles on, assist pressure confirmed — where the safe operating mode is a single many-way condition, and a machine that exposes it as seven independent toggles can satisfy every switch and still sit in a globally unsafe state. It turns up in a small business, where the customer promise, the inventory, the cash timing, the supplier lead time, the labor, the weather, and the permit can each be reasonable in every pair and jointly impossible. The missing face is where the future lawsuit lives.

Shedding the loop

A loop is not the enemy. Life is loops; control is loops; learning is loops; a system with no feedback at all is a corpse with a dashboard. The enemy is the loop that encloses something it cannot see.

Watch one form of it. A support team notices a billing bug and stands up a manual refund process. Product notices the refund volume and adds a retention offer. Finance notices the margin leak and tightens approvals. Support notices the new delays and invents a side channel. Sales notices the side channel and starts promising it to customers. Every move around that loop is locally sensible, and every trip around it adds curvature — come back to where you started and the business is not the same business. It has more hidden policy, more exception debt, and less ability to say plainly what it sells.

Loop shedding is not “remove the feedback.” It is removing or weakening the feedback path that is carrying curvature nobody is accounting for. Sometimes you shed it by cutting it: no side-channel refunds, no override without a ticket type, no launch below the temperature rule. Sometimes you shed it by drawing a boundary around it — this is now an exception mode, with an owner, entry and exit criteria, a log, and a timer. And sometimes you shed it by changing its dimension, taking a messy five-way condition the organization has been pretending is a checklist and naming it as a single state. That last move is the most underrated of the three. A great many systems are haunted precisely because they have states that exist operationally but not representationally: they live only in Slack threads, in tribal knowledge, in a weird calendar ritual, and in a dashboard nobody trusts. Name the hyperedge and half the ghost leaves the room.

Shedding the curvature

Curvature is what you have when the order of the local corrections changes the result. Patch the software, migrate the data, notify support — fine. Notify, then patch, then migrate — probably fine. Migrate, then notify, then patch — and suddenly the system is a different shape. When a loop hands back a different state depending on the path you took around it, that is curvature, and not all of it is bad. Design has curvature; learning has curvature; markets have curvature. A perfectly flat system would be rigid and useless. The goal is not zero. The goal is to know which curvature you are paying for.
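A toy model of that order-dependence, under one loudly stated assumption: version 1 code cannot write safely to the version 2 schema, so migrating before patching strands some writes. The state fields and step functions are hypothetical and exist only to show three steps that do not commute.

```python
def patch(state):
    """Deploy the v2 code."""
    return {**state, "code": "v2"}

def migrate(state):
    """Move the data to the v2 schema."""
    # Assumption for the sketch: v1 code cannot write safely to the v2 schema.
    stranded = state["stranded_writes"] or state["code"] == "v1"
    return {**state, "schema": "v2", "stranded_writes": stranded}

def notify(state):
    """Tell support that a change is happening."""
    return {**state, "support_notified": True}

START = {"code": "v1", "schema": "v1",
         "support_notified": False, "stranded_writes": False}

def run(steps, state=START):
    for step in steps:
        state = step(state)
    return state

a = run([patch, migrate, notify])
b = run([notify, patch, migrate])
c = run([migrate, notify, patch])
print(a == b)  # True: these two orders land in the same state
print(a == c)  # False: migrating first leaves stranded writes behind
```

All three runs apply the same three corrections. Only the path differs, and the final state records which path was taken.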

Curvature shedding is the deliberate removal of the path-dependence a system cannot afford. Sometimes it means building a true missing face: at Three Mile Island, a direct readout of the valve's actual position and a real core-state instrument would have closed the gap between the command that was sent and the condition in the vessel. Sometimes it means forbidding a false one: at Challenger, a hard constraint tying temperature and seal uncertainty to a launch hold would have stopped local managerial circulation from filling in a safety face that did not exist. Sometimes it means isolating a region — islanding and automatic load shedding are not admissions of defeat but controlled cuts through a dangerous loop. And sometimes it means refusing to decompose at all: some procedures should stay atomic precisely because their safety lives in the co-occurrence of their parts, and optimizing them into independent checkboxes is how you lose it. The aim is never to make the system simpler in the lazy sense. It is to make the system honest about which of its parts are actually simple.

Refusing the wrong topology

None of this is a finished mathematics, but it is already a method. Begin by listing the vertices — not just the components, but the people, services, permissions, physical states, time windows, instruments, procedures, and load-bearing assumptions. Then list the hyperedges, and resist writing “A talks to B” unless that really is the important fact; what you are after are the conditions that require several of those vertices to hold at once. For each hyperedge, test downward closure the slow way: ask whether its smaller faces actually exist, and wherever the whole relation is real but a sub-relation is not, mark the missing face. Those marks are not cleanup to be tidied away later. They are the most valuable thing on the diagram.

Then put the signals back on the object. The manual interventions, the standing exception meetings, the overrides, the rework, the stale dashboards, the alarm floods, the unexplained variance, the emergency calls — these are cochains in work clothes, the system measuring itself in the only language it has. Read them for loops whose correction pressure comes back changed each time around, because those are the loops carrying curvature. And when you find one, choose a move rather than reach for another coordination layer:

Cut the loop when the feedback path has no business existing.
Fence the loop when it is real but dangerous — give it an owner, entry and exit criteria, a log, and a timer.
Name the hyperedge when the system has been pretending a many-way state is a checklist.
Fill the face when the missing lower-order relation should exist and can be built for real.
Instrument the seam when the missing face cannot be filled but must be watched.

None of these makes the system simpler than it is. They make it stop lying about its own shape. That is the whole of the discipline: refusing the wrong topology, even when the tool is begging you to accept one.

The honest mesh

The seductive thing about a simplicial complex is that it hands you homology for free. Once you have the mesh, the machinery just runs: cycles, boundaries, cohomology, Laplacians, the harmonic signal that finds the hole. The last piece lived happily inside that gift. But a system does not owe you a mesh. Sometimes you have to earn one, and a hypergraph is where the earning starts — it is the more honest object, the one that still lets you say this interaction is irreducibly many-way, this lower face is absent, this condition is atomic, this overlap is weighted, this missing relation is structural. Only after that have you earned the right to decide what to close, what to leave open, what to collapse, and what to merely watch.

Seams and Signals ended with a seam turning into a signal — the data noticing the shape of the room it lives in. This piece adds the question that comes before it: did we draw that seam, or did the model erase it? A theory of systems worth using has to keep that question alive. It has to tell a real triangle from one we filled because the tool wanted a triangle, a loop that teaches from a loop that only hoards hidden state, the curvature we are paying for on purpose from the curvature we are simply too scared to name.

The world is not made of boxes and arrows. It is not even made only of loops. It is made of conditions, and some of those conditions have faces and some do not. So the next time a system you maintain passes every check it has and still feels unsafe — every light green, every interface signed off, and a quiet wrongness no single test will name — try asking which of those checks were ever really separate, and which one is a face you have been filling in because the diagram looked tidier that way. The missing face is where the work begins.

References

[1] U.S. Nuclear Regulatory Commission, Backgrounder on the Three Mile Island Accident. Describes the stuck-open pilot-operated relief valve, the indicator that reported the close signal rather than the valve's position, the absence of a direct reactor-vessel water-level instrument, and the operator actions that reduced emergency cooling.

[2] Report of the Presidential Commission on the Space Shuttle Challenger Accident (the Rogers Commission), Volume 1, Chapter 5. Finds the launch decision flawed and documents the missing O-ring and joint history, the contractor's written recommendation against launching below fifty-three degrees Fahrenheit, and the continuing opposition of Thiokol engineers after management reversed position.

[3] U.S.–Canada Power System Outage Task Force, Final Report on the August 14, 2003 Blackout in the United States and Canada. Describes the ineffective state estimator, the FirstEnergy alarm and logging failure, the line-to-tree contacts, the loss of situational awareness, and the interleaving of electrical, computer, and human event phases.

[4] Online Ethics Center, William LeMessurier — The Fifty-Nine-Story Crisis: A Lesson in Professional Behavior. Summarizes the quartering-wind reconsideration, the bolted joints substituted for welds, the underestimated joint stress, and the eventual reinforcement of Citicorp Center.

[5] Gasparovic, Purvine, Sazdanovic, Bei Wang, Yusu Wang, and Ziegelmeier, “A survey of simplicial, relative, and chain complex homology theories for hypergraphs.” Argues that multi-way data cannot always be represented unambiguously by a graph, and that forcing extra topological structure can add interactions not present in the data; the weighted nerve construction preserves enough multi-way intersection information to recover the hypergraph up to isomorphism under stated conditions.