Author at Infinum

Vjekoslav Drakšić

DevOps Engineer

Gemma 4: Half the GPUs, Most of the Quality

Google's new open-weight model matches GPT-OSS 120B on chat at half the hardware cost. Here's what our benchmarks show, and where it falls short.

Self-Hosting AI Models: A Practical Guide to Building Your Own Stack

Infrastructure decisions, model selection tradeoffs, and performance optimization techniques we encountered while building a self-hosted multi-model inference stack.