Your AI. Your hardware. Your rules.
Trigan operates a distributed LLM inference mesh across owned GPU infrastructure. We built the serving layer, the load balancers, the model routing, the observability stack. All of it from scratch.
What it is
A production-grade distributed inference system running across multiple locations on owned hardware.
Three load balancers. Seven managed nodes. Heterogeneous GPU support, with serving backends including Ollama, llama.cpp, and vLLM.
Raft consensus. Automatic model distribution. Hardware-aware routing.
This is not a prototype. It is running today at mesh.trigan.org.
Why it matters
Most AI companies rent their compute.
We own ours.
Cloud-dependent
Costs scale against you
When you rent compute, your bill grows in lockstep with usage. Every token is billed, and every request carries the provider's margin.
Sovereign
Costs scale with you
When you own the hardware, its purchase price is spread across every request you serve, so your unit economics improve as you scale.
Our marginal cost per inference is electricity.
Yours could be too.
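The scaling argument above can be written as a rough per-request cost model. This is a simplified sketch, not Trigan's pricing: the symbols are introduced here for illustration, with $p$ a rented per-request price, $H$ the one-off hardware purchase, $n$ the number of requests served over the hardware's life, and $e$ the electricity and maintenance cost per request.

$$
c_{\text{rented}}(n) = p, \qquad c_{\text{owned}}(n) = \frac{H}{n} + e, \qquad c_{\text{owned}}(n) < c_{\text{rented}}(n) \iff n > \frac{H}{p - e} \quad (p > e)
$$

As $n$ grows, $H/n$ shrinks toward zero, so the owned cost per request approaches $e$ while the rented cost stays at $p$.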
The economics
£400 for eight Tesla K80 cards.
That's the economics of sovereign compute done right.
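The headline figure unpacks as follows. The per-card specifications used here (two GPUs and 24 GB of GDDR5 per Tesla K80 board, 12 GB per GPU) are NVIDIA's published figures; everything else is division on the numbers quoted above.

```python
# Worked arithmetic for the £400 / 8x Tesla K80 purchase.
# Per-card specs are NVIDIA's published Tesla K80 figures:
# two GK210 GPUs and 24 GB of GDDR5 (12 GB per GPU) on each board.
TOTAL_PRICE_GBP = 400
CARDS = 8
GPUS_PER_CARD = 2
MEMORY_GB_PER_CARD = 24

price_per_card = TOTAL_PRICE_GBP / CARDS          # £ per board
gpu_count = CARDS * GPUS_PER_CARD                 # separate CUDA devices exposed
total_memory_gb = CARDS * MEMORY_GB_PER_CARD      # aggregate GPU memory
price_per_gb = TOTAL_PRICE_GBP / total_memory_gb  # £ per GB of GPU memory

print(f"£{price_per_card:.2f} per card")                         # £50.00 per card
print(f"{gpu_count} GPUs, {total_memory_gb} GB of GPU memory")   # 16 GPUs, 192 GB of GPU memory
print(f"£{price_per_gb:.2f} per GB of GPU memory")               # £2.08 per GB of GPU memory
```

Because each K80 appears to the system as two separate 12 GB devices, any model that needs more than 12 GB has to be split across devices.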
£400 for 8× Tesla K80 cards
100+ GPUs in the cluster
0 cloud dependencies
3 load balancers
7 managed nodes
Owned hardware. No monthly cloud bills. No per-token markup. No vendor lock-in.
The mesh runs on hardware we bought outright. The only ongoing costs are electricity and maintenance.
For enterprise
Your compliance requirements.
Met without compromise.
On-premises deployment. Data never leaves your infrastructure. Full audit trail. Designed for EU data sovereignty. GDPR-aligned by design.
On-premises deployment
Run the entire mesh on hardware you control. No data ever leaves your infrastructure.
Data sovereignty
Designed around EU data-residency requirements. Data stays within your jurisdiction at all times.
Full audit trail
Every inference request, routing decision, and model invocation is logged and traceable.
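One way to make every request traceable is an append-only log with one structured record per request. The sketch below is illustrative only: the record fields (`request_id`, `load_balancer`, `node`, `model`, `routing_reason`) and the JSON Lines file layout are assumptions, not Trigan's actual audit format.

```python
from __future__ import annotations

import json
import os
import tempfile
import time
import uuid
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class AuditRecord:
    """Hypothetical shape of one audit entry per inference request."""
    request_id: str
    timestamp: float      # Unix time the request was accepted
    load_balancer: str    # edge balancer that accepted the request
    node: str             # GPU node the request was routed to
    model: str            # model that was invoked
    routing_reason: str   # why the router chose this node


def append_audit(path: str, record: AuditRecord) -> None:
    """Append one record as a JSON line; existing lines are never rewritten."""
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(asdict(record), sort_keys=True) + "\n")


def read_audit(path: str) -> list[AuditRecord]:
    """Load every record back, skipping blank lines."""
    with open(path, encoding="utf-8") as f:
        return [AuditRecord(**json.loads(line)) for line in f if line.strip()]


# Usage: all names and values here are illustrative.
with tempfile.TemporaryDirectory() as tmp:
    log_path = os.path.join(tmp, "audit.jsonl")
    rec = AuditRecord(
        request_id=str(uuid.uuid4()),
        timestamp=time.time(),
        load_balancer="lb-1",
        node="node-3",
        model="example-model",
        routing_reason="model resident; lowest queue depth",
    )
    append_audit(log_path, rec)
    restored = read_audit(log_path)
```

Recording the routing reason alongside the node makes it possible to answer not just where a request ran, but why.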
GDPR-aligned by design
Privacy-first architecture. No data sharing, no third-party processing, no exceptions.
Zero vendor lock-in
Standard interfaces. Open model formats. Export everything, anytime, with no friction.
Your models. Your data. Your compliance requirements.
Architecture
How the mesh works
A layered architecture designed for resilience, performance, and observability.
Load balancers at the edge
Three independent load balancers distribute traffic across the mesh.
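The source does not say how clients reach the three balancers. One common pattern that makes independent balancers pay off is client-side failover: try one balancer and, on a connection failure, move to the next. A minimal sketch, assuming hypothetical balancer names and a stand-in `send` function in place of real network calls.

```python
from typing import Callable, Sequence


class AllBalancersDown(Exception):
    """Raised when no balancer in the list accepted the request."""


def send_with_failover(
    balancers: Sequence[str],
    send: Callable[[str], str],
    start: int = 0,
) -> str:
    """Try each balancer at most once, beginning at index `start` and wrapping around."""
    errors = []
    for offset in range(len(balancers)):
        lb = balancers[(start + offset) % len(balancers)]
        try:
            return send(lb)
        except ConnectionError as exc:
            errors.append(f"{lb}: {exc}")  # remember why, then try the next one
    raise AllBalancersDown("; ".join(errors))


# Simulation: hypothetical balancer names, with lb-a unreachable.
BALANCERS = ["lb-a", "lb-b", "lb-c"]
DOWN = {"lb-a"}


def fake_send(lb: str) -> str:
    if lb in DOWN:
        raise ConnectionError("unreachable")
    return f"handled by {lb}"


print(send_with_failover(BALANCERS, fake_send))  # handled by lb-b
```

Varying `start` per client spreads first attempts across all three balancers instead of piling them onto the first one.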
Managed GPU nodes across locations
Seven managed nodes with heterogeneous GPU support.
Raft consensus for coordination
Fault-tolerant coordination across all mesh participants.
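Raft elects a leader and commits log entries only when a strict majority of the group agrees, which fixes how many failures a group can survive. The source does not say which machines form the Raft group, so this sketch simply evaluates the standard majority rule for a few group sizes, including 3 and 7 to match the load-balancer and node counts.

```python
def quorum_size(members: int) -> int:
    """Votes needed for a Raft leader election or log commit: a strict majority."""
    return members // 2 + 1


def tolerated_failures(members: int) -> int:
    """Members that can fail while a majority can still be formed."""
    return members - quorum_size(members)  # equals (members - 1) // 2


for n in (3, 5, 7):
    print(n, quorum_size(n), tolerated_failures(n))
# 3 2 1
# 5 3 2
# 7 4 3
```

Even group sizes add no fault tolerance over the next odd size down: a four-member group still tolerates only one failure.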
Hardware-aware routing
Requests are routed based on GPU capability, model availability, and current load.
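The three routing inputs named above (GPU capability, model availability, current load) can be combined in many ways, and Trigan's exact policy is not published. A minimal sketch, assuming each node reports free GPU memory, its resident models, and its in-flight request count (all field names hypothetical): drop nodes that cannot serve the model, prefer nodes where it is already loaded, and break ties by relative load.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class NodeState:
    name: str
    free_vram_gb: float      # GPU memory currently available
    loaded_models: set[str]  # models already resident on this node
    in_flight: int           # requests currently being served
    max_concurrency: int     # rough capacity of this node's GPUs


def route(model: str, model_vram_gb: float, nodes: list[NodeState]) -> NodeState | None:
    """Pick a node for `model`, or None if no node can serve it right now."""
    def eligible(n: NodeState) -> bool:
        if n.in_flight >= n.max_concurrency:
            return False  # node is saturated
        # Either the model is resident, or there is room to load it.
        return model in n.loaded_models or n.free_vram_gb >= model_vram_gb

    candidates = [n for n in nodes if eligible(n)]
    if not candidates:
        return None
    # Sort key: resident models first (False sorts before True), then lowest utilisation.
    return min(
        candidates,
        key=lambda n: (model not in n.loaded_models, n.in_flight / n.max_concurrency),
    )


nodes = [
    NodeState("node-a", free_vram_gb=4, loaded_models={"model-x"}, in_flight=6, max_concurrency=8),
    NodeState("node-b", free_vram_gb=20, loaded_models=set(), in_flight=0, max_concurrency=8),
    NodeState("node-c", free_vram_gb=2, loaded_models={"model-x"}, in_flight=1, max_concurrency=4),
]
print(route("model-x", 10, nodes).name)  # node-c
print(route("model-y", 10, nodes).name)  # node-b
```

Preferring a node where the model is already resident avoids a cold load, which for large models can take far longer than the request itself.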
Automatic model distribution
Models are distributed to nodes based on hardware capabilities and demand.
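A simple way to distribute models by demand and hardware is a greedy pass: handle the busiest models first, give each a replica count proportional to its demand, and put each replica on the node with the most free memory that does not already host it. This is an illustrative sketch under assumed inputs (`demand`, `model_size_gb`, `node_vram_gb`, `requests_per_replica` are hypothetical), not Trigan's placement algorithm.

```python
from __future__ import annotations


def place_models(
    demand: dict[str, int],           # recent request count per model
    model_size_gb: dict[str, float],  # memory each model needs on a node
    node_vram_gb: dict[str, float],   # total GPU memory per node
    requests_per_replica: int,
) -> dict[str, list[str]]:
    """Greedy sketch: busiest models first, each replica onto the roomiest node that fits."""
    free = dict(node_vram_gb)
    placement: dict[str, list[str]] = {node: [] for node in node_vram_gb}
    for model in sorted(demand, key=demand.get, reverse=True):
        # Ceiling division: one replica per `requests_per_replica` requests, at least one.
        wanted = max(1, -(-demand[model] // requests_per_replica))
        for _ in range(wanted):
            fits = [n for n in free if free[n] >= model_size_gb[model] and model not in placement[n]]
            if not fits:
                break  # no node has room; this model ends up under-replicated
            target = max(fits, key=lambda n: free[n])
            placement[target].append(model)
            free[target] -= model_size_gb[model]
    return placement


placement = place_models(
    demand={"model-a": 900, "model-b": 150},
    model_size_gb={"model-a": 8, "model-b": 14},
    node_vram_gb={"node-1": 24, "node-2": 24, "node-3": 12},
    requests_per_replica=400,
)
print(placement)
# {'node-1': ['model-a', 'model-b'], 'node-2': ['model-a'], 'node-3': ['model-a']}
```

A greedy pass like this can strand a large, less popular model once smaller, busier ones have filled the nodes, so a production placer would need a fallback for that case.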
Observability and monitoring built in
Full-stack observability: metrics, traces, and logs across every layer.
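Traces are what tie a single request together across layers. The stdlib-only sketch below illustrates the idea and is not Trigan's tooling, which the source does not name: each span records a shared `trace_id`, its own `span_id`, and its parent's ID, so the edge, routing, and inference steps of one request can be reassembled. The span names and the in-memory `FINISHED` list are hypothetical stand-ins for a real tracing backend.

```python
from __future__ import annotations

import time
import uuid
from contextlib import contextmanager
from dataclasses import dataclass, field


@dataclass
class Span:
    name: str
    trace_id: str
    span_id: str = field(default_factory=lambda: uuid.uuid4().hex[:16])
    parent_id: str | None = None
    start: float = field(default_factory=time.perf_counter)
    duration_s: float | None = None


FINISHED: list[Span] = []  # stand-in for an exporter to a tracing backend


@contextmanager
def span(name: str, parent: Span | None = None):
    """Open a span; a child inherits its parent's trace_id so one request links end to end."""
    s = Span(
        name=name,
        trace_id=parent.trace_id if parent else uuid.uuid4().hex,
        parent_id=parent.span_id if parent else None,
    )
    try:
        yield s
    finally:
        s.duration_s = time.perf_counter() - s.start
        FINISHED.append(s)


# Hypothetical request path: edge load balancer -> router -> GPU node.
with span("load_balancer") as lb:
    with span("route", parent=lb) as rt:
        with span("inference", parent=rt):
            time.sleep(0.01)  # stand-in for model execution
```

Spans finish innermost first, so the inference span is exported before the routing and load-balancer spans that enclose it.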
Own your AI infrastructure.
Sovereign compute. Zero cloud dependency. Near-zero marginal cost.