Algroveon-AI – Local AI Infrastructure on Your Own Home Server
Fully local AI infrastructure on dedicated hardware: GPU passthrough, Ollama-based language model inference, speech input and output.
Fully local AI infrastructure on dedicated hardware that enables language, text, and image processing without cloud dependency.
Proxmox home server with GPU-passthrough VM, Ollama for LLM inference, Faster-Whisper for speech input and Piper TTS for speech output.
Algroveon-AI is the heart of the entire AI structure. Here, the central AI services for my Algroveon projects run completely locally on dedicated hardware within my home network.
The reasoning is clear: I didn't want a distributed solution made up of individual cloud services, but rather my own controllable foundation where language, text, embeddings, and image generation converge in one place. This is precisely what defines Algroveon-AI's role within the overall project.
Infrastructure Overview
The home server runs on Proxmox as a virtualization platform. The core is a dedicated VM with full PCIe GPU passthrough:
Proxmox Home Server
├── GPU-VM (Ubuntu 24.04, 32 GB RAM, RTX PRO 2000 Passthrough)
│ ├── Ollama – LLM Inference (Gemma-4 26B A4B)
│ ├── Faster-Whisper – Speech Recognition (STT)
│ ├── Piper TTS – Speech Output (TTS)
│ └── ComfyUI – Image Generation
├── Agent-VM – Algroveon-Agent FastAPI services
└── other services – Mail, Git, News, Document Management
The AI services run in their own VM with direct GPU access. Other services, such as Algroveon-Agent, the news service, or other parts of the infrastructure, access this instance but remain technically separated by design.
This separation is crucial for the project. It ensures that the actual AI runtime remains stable and clearly defined, while other services can be independently developed, tested, or replaced.
Proxmox as a Foundation
In this project, Proxmox is not just the hypervisor, but the actual basis for the clean separation of roles within the system. The AI services run in a dedicated VM with direct GPU passthrough, while other services are intentionally operated in separate VMs or containers.
This offers two advantages for me: Firstly, the AI runtime remains stable and clearly isolated. Secondly, the rest of the infrastructure can be maintained, restructured, or expanded independently without having to touch the central AI instance every time.
GPU Passthrough and Resources
The GPU is passed directly to the AI VM via PCIe passthrough. This is exactly what makes the setup practical for local inference, as the language model and other AI services do not have to access the graphics card through indirect workarounds. The VM is allocated 32 GB of RAM and exclusive GPU access for this purpose.
This allocation was a deliberate choice. It provides Algroveon-AI with enough overhead for the permanently loaded language model, speech recognition, and other AI services, without restricting the entire server to just this one project.
Hardware
| Component | Model |
|---|---|
| CPU | Intel Core Ultra 7 265 (20 cores) |
| GPU | NVIDIA RTX PRO 2000 Blackwell, 16 GB GDDR7 |
| RAM | 64 GB DDR5 |
| GPU Power Consumption | 70 W |
| Noise Level | max. 0.6 Sone |
Choosing the GPU is the most important individual decision for such a system. For my setup, the RTX PRO 2000 is a sensible middle ground: relatively power-efficient, quiet enough for continuous home operation, and simultaneously powerful enough to run a larger language model locally. More VRAM would have been desirable, of course, but given the current prices of an RTX PRO 4000 or 6000, it was not justifiable for private use. Therefore, 16 GB GDDR7 at 70 W and a maximum of 0.6 Sone represent a conscious compromise between performance, noise, power consumption, and cost.
This is a vital point, especially in a home environment. A system can be technically fascinating—but if it becomes too loud, too expensive, or too inefficient during continuous operation, it quickly loses its practical value.
Why This Hardware
The hardware is not designed for maximum benchmarks, but for sensible continuous operation within a home network. The CPU, RAM, and GPU were chosen so that Algroveon-AI can run reliably as a central instance without noise, power consumption, or acquisition costs spiraling out of control.
In a private setting, this is a central consideration for me. A local AI system must not only work technically but also remain practical over the long term. That is exactly why the setup is more balanced than extreme.
Running Models
| Service | Model | VRAM |
|---|---|---|
| LLM | Gemma-4 26B A4B IQ4_XS (text-only) | ~13.7 GB |
| STT | Faster-Whisper large-v3-turbo (int8_float16) | ~1 GB |
| Embeddings | nomic-embed-text | CPU |
The primary model is Gemma-4 26B A4B in a text-only variant. To put it simply, it does not work with the complete model at every single step, but only with the parts required at that moment. This is exactly what makes it interesting for a local setup of this kind. Since Algroveon-AI does not require direct image processing within the model during speech operation and the context window is deliberately limited, the model fits entirely within the VRAM without needing to fall back to the CPU.
The primary model remains permanently loaded in the VRAM. Pure inference typically takes about 0.3 to 0.8 seconds. For the entire speech chain—from the wake word to the response—the system operates roughly in the range of 0.6 to 1.2 seconds.
This is the decisive factor for the project: Algroveon-AI should not only run locally but also feel direct and responsive in everyday use. Only then does a mere demo become a usable foundation for a real assistant in the home network.