Tool profile

defilantech/LLMKube

stable

Kubernetes operator for self-hosted LLM inference across a heterogeneous GPU fleet: NVIDIA (CUDA), AMD (Vulkan), and Apple Silicon (Metal). Supported runtimes: llama.cpp, vLLM, TGI, and mlx-server. Features include multi-GPU sharding, model caching, and OpenAI-compatible endpoints. Apache-2.0 licensed, used in homelab and on-prem fleets, and actively developed.

ai · apple-silicon · autoscaling · edge-computing · gguf · gpu · homelab · inference

Velocity score

0.72 / 10

[STARS]

162

[FORKS]

[CONTRIBUTORS]

[LAST_COMMIT]

today


Is defilantech/LLMKube still actively maintained?

Score breakdown

680 / 1000

inference · defilantech/LLMKube

Velocity 50%

Adoption 30%

Maintenance 15%

Community 5%

[CODE_GROWTH]

802

[INSTALL_VEL]

498

[ACTIVITY]

653

[COMMUNITY_SIGNAL]

632

Terminal score: 0–1000 raw, weighted across 4 dimensions. Public score: 0–10 normalized (shown in the 30-day stars chart above).
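
The weighting can be checked by hand. The sketch below assumes each bracketed sub-score pairs with a dimension in listing order ([CODE_GROWTH] with Velocity, [INSTALL_VEL] with Adoption, [ACTIVITY] with Maintenance, [COMMUNITY_SIGNAL] with Community); the page does not state this pairing explicitly, and the names `WEIGHTS_PCT`, `SUB_SCORES`, and `weighted_score` are illustrative. Under that assumption the weighted sum reproduces the 680 shown in the breakdown.

```python
# Weights from the score breakdown, in percent.
WEIGHTS_PCT = {"velocity": 50, "adoption": 30, "maintenance": 15, "community": 5}

# Sub-scores from the page (0-1000 each). The pairing of each bracketed
# label with a dimension is an assumption based on listing order.
SUB_SCORES = {
    "velocity": 802,     # [CODE_GROWTH]
    "adoption": 498,     # [INSTALL_VEL]
    "maintenance": 653,  # [ACTIVITY]
    "community": 632,    # [COMMUNITY_SIGNAL]
}


def weighted_score(sub_scores, weights_pct):
    """Return the weighted average of the sub-scores on the same 0-1000 scale."""
    if sum(weights_pct.values()) != 100:
        raise ValueError("weights must sum to 100 percent")
    return sum(sub_scores[name] * pct for name, pct in weights_pct.items()) / 100


raw = weighted_score(SUB_SCORES, WEIGHTS_PCT)
print(f"{raw:.2f}")  # 679.95
print(round(raw))    # 680
```

Term by term: 401.00 + 149.40 + 97.95 + 31.60 = 679.95, which rounds to the 680 / 1000 terminal score.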