On-prem Cloud Engineer
May 09, 2026
Job Duties
· Build, configure, and operate on-prem Kubernetes/OpenShift AI platforms for deploying and serving GenAI models and LLM inference workloads.
· Design and optimize high-performance inference stacks using vLLM, TensorRT-LLM, Triton Inference Server, SGLang, and advanced techniques (continuous batching, speculative decoding, KV caching).
· Manage GPU orchestration and capacity using Run:AI, MIG, CUDA/NCCL, …
Charlotte, NC, United States of America