arXiv:2026.01235 [math.CT]

Categorical Probability

Probability Maps as First-Class Arrows

Treat probability maps as composable arrows with Wasserstein distance built in—clean plumbing for stochastic systems and robust belief comparison.

§0 The Big Idea

Treat probability maps as first-class arrows you can compose, and equip distributions with a principled distance (Wasserstein). That buys clean plumbing for stochastic systems and a robust way to compare beliefs.

§1 Three Foundational Concepts

Giry Monad

A functor sends each measurable space X to the space of distributions on X, with unit (point mass) and bind (pushforward/mixture). Its Kleisli category has Markov kernels as morphisms — stochastic programs you can compose.
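The unit/bind structure is concrete on finite sets. The sketch below assumes a distribution is just a dict from outcomes to probabilities; `noisy_step` is an illustrative kernel, not from the text.

```python
# A minimal sketch of the Giry monad on finite sets, where a
# distribution is a dict mapping outcomes to probabilities.

def unit(x):
    """Point mass (Dirac delta) at x."""
    return {x: 1.0}

def bind(dist, kernel):
    """Push a distribution through a stochastic map X -> Dist(Y)
    and flatten the resulting mixture of distributions."""
    out = {}
    for x, px in dist.items():
        for y, py in kernel(x).items():
            out[y] = out.get(y, 0.0) + px * py
    return out

# A toy kernel on {0, 1}: stay with probability 0.9, flip with 0.1
def noisy_step(state):
    return {state: 0.9, 1 - state: 0.1}

prior = {0: 0.5, 1: 0.5}
posterior = bind(prior, noisy_step)
print(posterior)  # symmetric noise preserves the uniform prior
```

Kleisli composition of two kernels is then just `lambda x: bind(kernel1(x), kernel2)` — the same flatten-after-apply pattern at every stage.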

Markov Categories

A diagrammatic axiomatization where conditioning, disintegration, and conditional independence are algebraic laws. Prove statistical theorems at the type level.

Kantorovich Monad

On metric spaces, the probability functor carries a Wasserstein metric. Measures arise as colimits of finite samples — bridging sampling code and measure theory.

§2 Why This Matters Practically

Compositional Pipelines

Model your system as Sensors → Latent State → Controller using arrows X → Dist(Y). Compose kernels instead of juggling PDFs. For epistemic transport, this provides clean glue between sheaf layers and transport costs.

Robustness Knobs

Penalize Wasserstein (W₁/W₂) between predicted and observed belief states. Fuse models by minimizing distance to the set/mixture you trust. This enables robust MPC, ensemble fusion, and distributional RL.
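Fusion by minimizing Wasserstein distance has a closed form in one dimension: the W₂ barycenter of equally weighted empirical measures with equal sample counts is the average of sorted samples (i.e., the average of quantile functions). A minimal sketch, with `model_a`/`model_b` as illustrative sample batches:

```python
import numpy as np

# Two model beliefs as equal-size sample batches (illustrative values)
model_a = np.array([0.1, 0.2, 0.2, 0.8])
model_b = np.array([0.15, 0.25, 0.35, 0.7])

# In 1D, the equally weighted Wasserstein-2 barycenter of two empirical
# measures with matching sample counts is the average of sorted samples.
fused = (np.sort(model_a) + np.sort(model_b)) / 2
print(fused)  # a belief "between" the two models in Wasserstein geometry
```

In higher dimensions there is no such shortcut; barycenters are computed iteratively (e.g., with an optimal-transport solver), but the 1D case is often enough for scalar belief summaries.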

Sample ↔ Measure Sanity

The Kantorovich monad justifies treating batches as measures (and vice-versa), so your empirical code matches the math. Finite samples converge to true distributions in a principled way.
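The convergence claim can be checked numerically: the W₁ distance between an empirical batch and a large reference sample from the same distribution shrinks as the batch grows. A sketch assuming SciPy is available; the normal distribution and sample sizes are illustrative.

```python
import numpy as np
from scipy.stats import wasserstein_distance

rng = np.random.default_rng(0)
reference = rng.standard_normal(100_000)  # stand-in for the "true" measure

# W1 between a finite batch and the reference, for growing batch sizes
distances = {}
for n in (10, 100, 10_000):
    batch = rng.standard_normal(n)
    distances[n] = wasserstein_distance(batch, reference)
    print(n, distances[n])
# the distance shrinks as the batch grows, so treating a batch as a
# measure is an approximation whose error you can actually bound
```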

§3 Runnable Example: Compare Beliefs

A minimal example showing how to compute the Wasserstein distance between two belief snapshots. Use this number to regularize a controller (e.g., add λ·W₁ to your cost) or to gate model switching.

belief_distance.py
import numpy as np
from scipy.stats import wasserstein_distance

# Two belief snapshots as samples (sample counts need not match)
a = np.array([0.1, 0.2, 0.2, 0.8])
b = np.array([0.15, 0.25, 0.7])

# Compute the Wasserstein-1 distance between the empirical measures
w1 = wasserstein_distance(a, b)
print("W1 =", w1)  # lower = closer beliefs

# Use in a controller cost function. compute_base_cost, predict_belief,
# and get_observation are placeholders for your own model's functions;
# the beliefs they return are assumed to be 1D sample arrays.
def augmented_cost(state, action, lambda_reg=0.1):
    base_cost = compute_base_cost(state, action)
    predicted_belief = predict_belief(state, action)
    observed_belief = get_observation()

    # Robustness penalty: mismatch between predicted and observed beliefs
    belief_mismatch = wasserstein_distance(
        predicted_belief,
        observed_belief
    )

    return base_cost + lambda_reg * belief_mismatch

§3.1 Kernel Composition Pattern

Build stochastic pipelines by composing Markov kernels. Each arrow transforms distributions through a stochastic step.

kernel_composition.py
import random

class MarkovKernel:
    """A stochastic map X -> Dist(Y), where a distribution is
    represented as a list of samples."""

    def __init__(self, transition_fn):
        self.transition = transition_fn  # x -> list of samples from Dist(Y)

    def __call__(self, x):
        """Apply the kernel to a point, get a distribution (sample list)."""
        return self.transition(x)

    def pushforward(self, samples_x):
        """Apply the kernel to a distribution (monadic bind):
        push every sample through, then flatten the results."""
        return [y for x in samples_x for y in self(x)]

    def compose(self, other):
        """Kleisli composition: (self >=> other)(x) = bind(self(x), other)."""
        def composed(x):
            return other.pushforward(self(x))
        return MarkovKernel(composed)

# Build a sensor-state-controller pipeline.
# sensor_model and policy are toy placeholders: a noisy observation
# model and a threshold policy over the latent estimate.
sensor_model = lambda obs: [obs + random.gauss(0.0, 0.1) for _ in range(5)]
policy = lambda z: [1 if z > 0.5 else 0]

sensor_to_latent = MarkovKernel(sensor_model)
latent_to_action = MarkovKernel(policy)

# Compose into an end-to-end pipeline
sensor_to_action = sensor_to_latent.compose(latent_to_action)
print(sensor_to_action(0.6))  # a list of sampled actions

§4 Minimal Mental Model

  • Unit: Wrap a value into a degenerate distribution (Dirac delta). This is how deterministic values enter the stochastic world.
  • Map / Pushforward: Transform a distribution through a deterministic function. If f: X → Y and μ is a distribution on X, then f#μ is the distribution on Y.
  • Bind (>>=): Run a stochastic step that returns a new distribution, then flatten. This is kernel composition — the essential plumbing operation.

§4.1 Connection to Sheaf Fusion

In your epistemic transport framework, categorical probability provides:

  • Clean glue between sheaf layers via kernel composition
  • Transport costs via built-in Wasserstein metrics
  • Type-level reasoning about conditional independence via Markov categories

§5 Further Reading

1. Giry's Categorical Approach

The foundational paper on treating probability as a monad, with stochastic maps as morphisms in a Kleisli category.

2. Fritz's Markov Categories

A synthetic approach to Markov kernels with algebraic laws for conditioning, disintegration, and conditional independence.

3. Kantorovich Monad

Perrone & Fritz on measures as colimits of finite samples, with Wasserstein metrics built into the categorical structure.