HiveOps
Early access - now open

Distributed AI inference at any scale

Run top open-source LLMs across a global network of GPUs and Apple Silicon through a simple API. Pay per request in standard USD - no contracts, no minimums.

< 40% Cost

OF TRADITIONAL CLOUD APIs

< 100ms

AVG TIME TO FIRST TOKEN

100% Prepaid

TOP UP WITH CREDIT CARD

Natively supporting open-source champions

Llama 3 (8B & 70B) · Mistral v0.3 · Qwen 2.5 · Gemma 2

One API, any model

Get an API key, top up your balance, and run inference across the network with a single endpoint.

True pay-per-use

No subscriptions or seat licenses. Deposit with Stripe and spend only what your workload needs.
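With pay-per-use billing, each request draws down your prepaid balance based on tokens consumed. A minimal sketch of estimating per-request spend from the token counts an OpenAI-style response reports - the per-token prices here are hypothetical, not HiveOps's actual rates:

```python
# Hypothetical per-token prices in USD (illustrative only; check the
# pricing page for actual HiveOps rates).
PRICES = {"llama-3-70b": {"input": 0.60e-6, "output": 0.80e-6}}

def estimate_cost(model: str, prompt_tokens: int, completion_tokens: int) -> float:
    """Estimate the USD cost of one request from its token counts."""
    p = PRICES[model]
    return prompt_tokens * p["input"] + completion_tokens * p["output"]

# After a call, responses in the OpenAI style expose token counts, e.g.:
#   estimate_cost("llama-3-70b",
#                 response.usage.prompt_tokens,
#                 response.usage.completion_tokens)
```

Because billing is prepaid, a quick estimate like this lets you project how far a given top-up will stretch for your workload.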

Hardware-optimized

Requests are routed to the best available hardware, from high-VRAM gaming rigs to massive Apple Silicon unified memory.

example.py
import hiveops

# Create a client with your prepaid API key
client = hiveops.Client(api_key="your-api-key")

# The network routes the request to the best available hardware
response = client.chat.completions.create(
    model="llama-3-70b",
    messages=[
        {"role": "user", "content": "Hello, world!"}
    ]
)

print(response.choices[0].message.content)