Horcrux has been running in production since early 2021 — threshold Ed25519 signing for Tendermint validators. Split your key across multiple signers, require 2-of-3 (or 3-of-5) to sign blocks, get high availability without double-sign risk.

The cryptography worked from day one. The operations were another story.

v1 required careful manual configuration. State synchronization between signers was fragile. Running a cluster took expertise and vigilance — the kind where you’re watching logs at 2am hoping nothing drifts out of sync.

v2 introduces Raft, and it changes everything.


What Raft Brings

Leader election: The cluster automatically elects a leader to coordinate signing. No more manual designation, no more confusion about who’s in charge.

High watermark consensus: Before signing, signers agree on the latest signed height, round, and step (the same ordering Tendermint validators use to guard against double-signing). This prevents double-signs even during network partitions or signer restarts: the cluster as a whole remembers what’s been signed, not just individual signers.

Automatic failover: If the leader goes down, another takes over. No manual intervention. No pager alerts at 3am.

The result: a cluster that’s harder to misconfigure and dramatically easier to operate.
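A minimal sketch of the check a high watermark enforces, assuming a hypothetical `HRS` type ordered by height, then round, then step. It shows only the comparison: in Horcrux the agreed watermark is replicated through Raft, which this sketch leaves out, and real signers may return a cached signature for an identical re-request (Tendermint's file-based signer does this) instead of rejecting it.

```go
package main

import "fmt"

// HRS is a hypothetical height/round/step watermark.
type HRS struct {
	Height int64
	Round  int64
	Step   int8
}

// Less reports whether a comes strictly before b in (height, round, step) order.
func (a HRS) Less(b HRS) bool {
	if a.Height != b.Height {
		return a.Height < b.Height
	}
	if a.Round != b.Round {
		return a.Round < b.Round
	}
	return a.Step < b.Step
}

// Watermark tracks the highest HRS the cluster has agreed was signed (hypothetical type).
type Watermark struct {
	last HRS
}

// TryAdvance accepts a request only if it is strictly above the watermark,
// and leaves the watermark unchanged when it rejects.
func (w *Watermark) TryAdvance(req HRS) error {
	if !w.last.Less(req) {
		return fmt.Errorf("refusing to sign %+v: watermark is %+v", req, w.last)
	}
	w.last = req
	return nil
}

func main() {
	var w Watermark
	// Step values here are illustrative.
	fmt.Println(w.TryAdvance(HRS{Height: 100, Round: 0, Step: 1}))        // <nil>
	fmt.Println(w.TryAdvance(HRS{Height: 100, Round: 0, Step: 2}))        // <nil>
	fmt.Println(w.TryAdvance(HRS{Height: 100, Round: 0, Step: 1}) != nil) // true: would move backwards
}
```

Because the comparison is strict, a signer that restarts with a stale local view still cannot sign below the height, round, and step the cluster has already agreed on.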


Performance

Raft adds coordination overhead, but the implementation is fast:

[Figure: benchmark showing signing performance]

Signing latency stays well under block time, so the extra Raft coordination is a small price for the operational benefits.


The Architecture

                    ┌─────────────┐
                    │   Sentry    │
                    │   Node      │
                    └──────┬──────┘
                           │
┌─────────┐  ┌─────────┐  ┌─────────┐
│ Horcrux │◄─┤ Horcrux │◄─┤ Horcrux │
│ Signer 1│  │ Signer 2│  │ Signer 3│
│ (Raft)  │  │ (Raft)  │  │ (Raft)  │
└─────────┘  └─────────┘  └─────────┘
         threshold = 2

Signers form a Raft cluster for leader election and watermark consensus. They receive signing requests from sentry nodes rather than connecting to the p2p network themselves. Geographic distribution is straightforward: put signers in different datacenters to tolerate regional failures.
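A back-of-the-envelope sketch of why three signers with threshold 2 survive one failure, assuming the cluster needs both a Raft majority (to elect a leader and commit the watermark) and at least t cosigners online to sign. The helper names are hypothetical.

```go
package main

import "fmt"

// raftQuorum is the majority a Raft cluster of n voters needs to elect a
// leader and commit log entries.
func raftQuorum(n int) int { return n/2 + 1 }

// tolerableFailures returns how many of n signers can be down while the
// cluster can still sign with threshold t. Assumes 1 <= t <= n.
func tolerableFailures(n, t int) int {
	need := t
	if q := raftQuorum(n); q > need {
		need = q
	}
	return n - need
}

func main() {
	for _, c := range []struct{ n, t int }{{3, 2}, {5, 3}} {
		fmt.Printf("%d-of-%d: quorum %d, tolerates %d down\n",
			c.t, c.n, raftQuorum(c.n), tolerableFailures(c.n, c.t))
	}
	// 2-of-3: quorum 2, tolerates 1 down
	// 3-of-5: quorum 3, tolerates 2 down
}
```

With one signer per datacenter, a 2-of-3 cluster under these assumptions keeps signing through the loss of any single region, and a 3-of-5 cluster through the loss of two.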


Migrating from v1

# Existing v1 configs work with minimal changes
horcrux config migrate

# Or start fresh
horcrux config init --cosigner

The migration docs cover upgrading from v1 or from a traditional single-signer setup.


What’s Next

  • Prometheus metrics for signing performance and cluster health
  • Simplified key migration tooling
  • Continued hardening based on production deployments

Horcrux is deployed across hundreds of validators now. The original threshold signing work came from Roman Shtylman at Polychain Labs. Andrew Gouin led the Raft integration that makes v2 what it is.

Code at github.com/strangelove-ventures/horcrux.