
HiveNode

HiveNode is the worker component of the Hive system. It connects to a central HiveCore proxy and runs local inference using Ollama. By running HiveNode on any machine (on-premise, cloud, or behind firewalls), you can seamlessly join that machine’s compute resources to the HiveCore network, making them available to serve requests routed by the central proxy.

Table of Contents

  1. Overview
  2. Key Features
  3. Installation & Setup
  4. Configuration
  5. Running
  6. How it Works
  7. Logging & Monitoring
  8. Contributing
  9. License

1. Overview

In the Hive architecture:

  • HiveCore serves as the central proxy and gateway, managing queues and distributing inference requests.
  • HiveNode runs on worker machines. It connects out to HiveCore (so the worker does not need to be publicly accessible). Once connected, HiveNode polls HiveCore for inference jobs and uses its local Ollama server to process them.

This design allows multiple machines, possibly spread across different networks, to operate as a single, unified inference cluster.

2. Key Features

  • Local Ollama Integration

    HiveNode processes requests by forwarding them to an Ollama instance running on the same machine (or accessible from that machine).

  • Multiple Concurrent Connections

    Each HiveNode can open several parallel connections to HiveCore, letting you scale inference throughput per worker.

  • Centralized Configuration & Scaling

    Workers require minimal configuration—just point them to HiveCore and set a valid key.

  • Extensible Logging

    Built-in InfluxDB logging for GPU usage (via NVML) and system metrics if environment variables are set.

3. Installation & Setup

  1. Prerequisites

    • Rust toolchain (for building HiveNode)
    • Ollama installed locally (or accessible at a defined URL). See Configuration for help with this; additionally, you can use our provided script to handle installation and multi-instance setup. (A quick verification sketch follows the setup steps below.)
    • A valid Worker key generated by HiveCore. (See HiveCore’s management endpoints for instructions.)
  2. Clone the repository

    git clone https://github.com/VakeDomen/HiveNode.git
    cd HiveNode
  3. Configure

    • rename .env.sample to .env
    mv .env.sample .env
  4. Run

    cargo run --release
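
If the first run fails to build or connect, it can help to confirm the prerequisites from step 1 are in place. The commands below are a minimal sketch, assuming Ollama listens on its default port 11434:

# Check that the Rust toolchain is installed
rustc --version
cargo --version

# Check that a local Ollama server responds (lists the models it has pulled)
curl http://localhost:11434/api/tags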

4. Configuration

A sample .env might look like:

# The address and port of HiveCore’s node connection server (default 7777 in HiveCore).
HIVE_CORE_URL=hivecore.example.com:7777

# Worker key provided by HiveCore admin. Must have "Worker" role.
HIVE_KEY=my-secret-key

# URL to your local or remote Ollama instance, e.g. "http://localhost:11434"
OLLAMA_URL=http://localhost:11434

# Number of parallel connections to open to HiveCore. Best to match Ollama configuration 
CONCURRENT_RQEUESTS=4

# (Optional) InfluxDB settings for logging
INFLUX_HOST=http://localhost:8086
INFLUX_ORG=MY_ORG
INFLUX_TOKEN=my-token
  • HIVE_CORE_URL: Where HiveNode connects to HiveCore (must match HiveCore’s NODE_CONNECTION_PORT, by default 7777).
  • HIVE_KEY: The Worker key from HiveCore’s admin interface. Required for authentication.
  • OLLAMA_URL: The local or remote address of the Ollama service. Typically, Ollama runs on the same machine at http://localhost:11434.
  • CONCURRENT_RQEUESTS: Sets how many parallel connections (and thus concurrent tasks) this HiveNode should proxy. Adjust based on your hardware resources and Ollama configuration.
  • INFLUX_*: (Optional) If configured, HiveNode will record logs and GPU usage metrics to InfluxDB. If not provided, it simply won’t log to Influx.
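
As a rough illustration of matching CONCURRENT_RQEUESTS to your Ollama setup (the value 4 below is an example, not a recommendation, and OLLAMA_NUM_PARALLEL is an Ollama setting, not a HiveNode one):

# Ollama side: allow 4 requests to be processed in parallel
OLLAMA_NUM_PARALLEL=4 ollama serve

# HiveNode side (.env): open the same number of connections to HiveCore
CONCURRENT_RQEUESTS=4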

Ollama setup

HiveNode expects an active Ollama server at the URL specified by OLLAMA_URL in your .env. The default installation of Ollama is sufficient if you only need one Ollama instance and want to route all requests through it.
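
HiveNode advertises the models available on its Ollama instance, so make sure at least one model is pulled before connecting. A minimal example (the model name is only an illustration):

ollama pull llama3
ollama list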

If you have multiple GPUs or want to run several Ollama servers (perhaps each pinned to different GPUs), you can use our provided start_ollama.sh helper script.

This script:

  • Checks if netstat and curl are installed (and installs them if not).
  • Installs Ollama if not already present.
  • Lets you pick how many Ollama instances to run and how to assign GPUs to each instance.
  • Finds free ports for each Ollama instance, runs it, and logs their output.
chmod +x start_ollama.sh
./start_ollama.sh

5. Running

After configuring the .env file, run:

cargo run --release

Or compile and execute the binary directly:

cargo build --release
./target/release/hivenode
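
For longer-running deployments you will usually want the process to outlive the terminal session. One simple, illustrative option is nohup; a proper service manager (systemd, supervisor, etc.) works just as well:

nohup ./target/release/hivenode >> hivenode.log 2>&1 &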

6. How it Works

  1. Authentication
    • On startup, each of the CONCURRENT_RQEUESTS threads tries to authenticate to HiveCore using the key in HIVE_KEY.
    • Upon successful auth, HiveNode advertises its version information and its supported models to HiveCore.
  2. Polling & Proxying
    • HiveNode periodically polls HiveCore for incoming tasks. If HiveCore’s queue has work for a model the node supports, HiveCore dispatches it to the node.
    • HiveNode forwards the request to OLLAMA_URL for local inference (via the Ollama HTTP API), then streams the response back to HiveCore.
  3. Reconnection & Control
    • If the connection drops or an error occurs, HiveNode waits briefly, then reconnects.
    • HiveCore can issue commands like REBOOT or SHUTDOWN, which HiveNode listens for in the incoming messages (handy for maintenance or upgrades).
  4. Scaling
    • To allow more capacity on the same machine, increase the CONCURRENT_RQEUESTS count.
    • To add more workers across multiple machines, simply run additional HiveNode instances (each with its own .env and valid Worker key).
    • Hive supports running multiple workers on the same machine, but each worker should have its own token (handy if multiple Ollama servers are split among multiple GPUs); see the sketch below.
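
A sketch of the multi-worker layout, with hypothetical directory names, ports, and keys:

# Two copies of HiveNode on one machine, each with its own .env and Worker key.
# worker-gpu0/.env: OLLAMA_URL=http://localhost:11434, HIVE_KEY=<key-for-worker-0>
# worker-gpu1/.env: OLLAMA_URL=http://localhost:11435, HIVE_KEY=<key-for-worker-1>
(cd worker-gpu0 && cargo run --release) &
(cd worker-gpu1 && cargo run --release) &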

7. Logging & Monitoring

HiveNode can log system metrics (such as GPU usage, memory, and requests proxied) to InfluxDB:

  • Enable Influx Logging: Provide INFLUX_HOST, INFLUX_ORG, and INFLUX_TOKEN in the .env.
  • GPU Metrics: HiveNode uses NVML to gather GPU info (power usage, memory usage, etc.). This is only collected if an NVIDIA GPU is present and NVML is available on the system.
  • Request Streaming: All inference requests and responses can be logged with success/error tags.

These metrics are pushed to Influx in the background. If any of the Influx environment variables are missing or invalid, HiveNode just skips that monitoring.
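
Once metrics are flowing you can inspect them with your usual InfluxDB tooling. The query below is purely illustrative; the bucket and measurement names are assumptions, not HiveNode’s actual schema:

influx query 'from(bucket: "hive") |> range(start: -1h) |> filter(fn: (r) => r._measurement == "gpu_usage")'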

8. Contributing

We welcome pull requests! Before submitting, please open an issue to discuss your proposed changes. Make sure to:

  • Keep code style consistent.
  • Update documentation if adding or changing features.

9. License

HiveNode is distributed under the MIT License, just like HiveCore. See the LICENSE file in this repository for details.
