CategoricalProbability
Probability Maps as First-Class Arrows
Treat probability maps as composable arrows with Wasserstein distance built in—clean plumbing for stochastic systems and robust belief comparison.
§0 The Big Idea
Treat probability maps as first-class arrows you can compose, and equip distributions with a principled distance (Wasserstein). That buys clean plumbing for stochastic systems and a robust way to compare beliefs.
§1 Three Foundational Concepts
Giry Monad
A functor sends each space X to 'distributions on X,' with unit (point mass) and bind (pushforward/mixture). The Kleisli category has Markov kernels as morphisms — stochastic programs you can compose.
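A minimal sketch of the unit and bind, assuming distributions are represented as lists of samples (an illustrative encoding, not the measure-theoretic construction):

```python
def unit(x):
    """Unit: the point mass (Dirac delta) at x."""
    return [x]

def bind(dist, kernel):
    """Bind: apply a kernel X -> Dist(Y) to each sample, then flatten the mixture."""
    return [y for x in dist for y in kernel(x)]

# A deterministic map embedded as a kernel via unit
double = lambda x: unit(2 * x)

print(bind(unit(3), double))  # [6]: point masses act as identities
print(bind([1, 2], double))   # [2, 4]: a mixture over inputs
```

Kleisli composition of two kernels is then `lambda x: bind(f(x), g)`, which is exactly how stochastic programs chain.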
Markov Categories
A diagrammatic axiomatization where conditioning, disintegration, and conditional independence are algebraic laws. Prove statistical theorems at the type level.
Kantorovich Monad
On metric spaces, the probability functor carries a Wasserstein metric. Measures arise as colimits of finite samples — bridging sampling code and measure theory.
§2 Why This Matters Practically
Compositional Pipelines
X → Dist(Y). Compose kernels instead of juggling PDFs. For epistemic transport, this provides clean glue between sheaf layers and transport costs.
Robustness Knobs
Wasserstein distance gives a principled, metric-aware way to compare beliefs: a tunable knob for regularizers and model-switching gates.
Sample ↔ Measure Sanity
The Kantorovich monad derives measures as colimits of finite samples, so sampling code and measure-theoretic reasoning stay consistent.
§3 Runnable Example: Compare Beliefs
A minimal example showing how to compute the Wasserstein distance between two belief snapshots. Use this number to regularize a controller (e.g., add λ·W₁ to your cost) or to gate model switching.
import numpy as np
from scipy.stats import wasserstein_distance

# Two belief snapshots as samples
a = np.array([0.1, 0.2, 0.2, 0.8])
b = np.array([0.15, 0.25, 0.7])

# Compute the Wasserstein-1 distance
w1 = wasserstein_distance(a, b)
print("W1 =", w1)  # lower = closer beliefs

# Use in a controller cost function.
# compute_base_cost, predict_belief, and get_observation are placeholders
# for your own model; both beliefs are 1-D sample arrays.
def augmented_cost(state, action, lambda_reg=0.1):
    base_cost = compute_base_cost(state, action)
    predicted_belief = predict_belief(state, action)
    observed_belief = get_observation()
    # Robustness penalty: transport cost between predicted and observed beliefs
    belief_mismatch = wasserstein_distance(predicted_belief, observed_belief)
    return base_cost + lambda_reg * belief_mismatch

§3.1 Kernel Composition Pattern
Build stochastic pipelines by composing Markov kernels. Each arrow transforms distributions through a stochastic step.
class MarkovKernel:
    """A stochastic map X -> Dist(Y); a distribution is a list of samples."""
    def __init__(self, transition_fn):
        self.transition = transition_fn

    def __call__(self, x):
        """Apply the kernel to a point; get a distribution (list of samples)."""
        return self.transition(x)

    def pushforward(self, samples_x):
        """Apply the kernel to a distribution (monadic bind): map, then flatten."""
        return [y for x in samples_x for y in self(x)]

    def compose(self, other):
        """Kleisli composition: (f >=> g)(x) = bind(f(x), g)"""
        def composed(x):
            intermediate = self(x)
            return other.pushforward(intermediate)
        return MarkovKernel(composed)

import random

# Toy stand-ins for a real sensor model and policy
sensor_model = lambda reading: [reading + random.gauss(0, 0.1) for _ in range(5)]
policy = lambda z: [1 if z > 0.5 else 0]

# Build a sensor-state-controller pipeline
sensor_to_latent = MarkovKernel(sensor_model)
latent_to_action = MarkovKernel(policy)

# Compose into an end-to-end pipeline
sensor_to_action = sensor_to_latent.compose(latent_to_action)

§4 Minimal Mental Model
- Unit: Wrap a value into a degenerate distribution (Dirac delta). This is how deterministic values enter the stochastic world.
- Map / Pushforward: Transform a distribution through a deterministic function. If f: X → Y and μ is a distribution on X, then f#μ is the distribution on Y.
- Bind (>>=): Run a stochastic step that returns a new distribution, then flatten. This is kernel composition — the essential plumbing operation.
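The three operations above can be sketched with distributions as sample lists (a hypothetical encoding for illustration): pushforward preserves the sample count, while bind fans out and then flattens.

```python
def pushforward(f, dist):
    """Map: f#mu, a deterministic transform applied to each sample."""
    return [f(x) for x in dist]

def bind(dist, kernel):
    """Bind: one stochastic step per sample, then flatten the mixture."""
    return [y for x in dist for y in kernel(x)]

mu = [1, 2, 3]
print(pushforward(lambda x: 10 * x, mu))  # [10, 20, 30]: same sample count
print(bind(mu, lambda x: [x, -x]))        # [1, -1, 2, -2, 3, -3]: fan out, flatten
```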
§4.1 Connection to Sheaf Fusion
In your epistemic transport framework, categorical probability provides:
- Clean glue between sheaf layers via kernel composition
- Transport costs via built-in Wasserstein metrics
- Type-level reasoning about conditional independence via Markov categories
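A sketch of how these pieces fit together, with a toy kernel standing in for a real sheaf layer (the `layer` function and the observed samples below are invented for illustration): compose kernels to push a belief forward, then score the mismatch with `scipy.stats.wasserstein_distance`.

```python
import random
from scipy.stats import wasserstein_distance

def bind(dist, kernel):
    """Kleisli plumbing: one stochastic step per sample, then flatten."""
    return [y for x in dist for y in kernel(x)]

random.seed(0)
layer = lambda x: [x + random.gauss(0, 0.1) for _ in range(4)]  # toy layer

prior = [0.0, 0.5, 1.0]
predicted = bind(prior, layer)    # belief pushed through one layer
observed = [0.1, 0.4, 0.9, 0.55]  # toy observed samples

cost = wasserstein_distance(predicted, observed)  # transport cost between beliefs
print(f"transport cost W1 = {cost:.3f}")
```

The resulting scalar is the same quantity used as the robustness penalty in the controller example above.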
§5 Further Reading
Giry's Categorical Approach
The foundational paper on treating probability as a monad, with stochastic maps as morphisms in a Kleisli category.
Fritz's Markov Categories
A synthetic approach to Markov kernels with algebraic laws for conditioning, disintegration, and conditional independence.
Kantorovich Monad
Perrone & Fritz on measures as colimits of finite samples, with Wasserstein metrics built into the categorical structure.