Cloudiax

DevOps/Platform Expert (AI & Agentic Systems)

Remote
Flextime, full-time
1 month ago
Germany
Job description

About Cloudiax

Cloudiax is a leading provider of cloud technologies with more than 280 international partners and over 1,100 SAP customers in 90 countries. We enable small and medium-sized businesses worldwide to use applications such as SAP Business One and AI solutions in our cloud. As a global market leader, we offer a secure, fast, and reliable cloud platform that is made in Germany.

With data centers in Germany, Canada, and Singapore, we ensure the highest service quality around the clock.

To strengthen our team, we are looking for someone dedicated, technically strong, and ready to take responsibility. The position is remote and available immediately.

Responsibilities

Cloud-Native & Infrastructure

  • Kubernetes: Deep experience in cluster orchestration, GPU scheduling, device plugins, and tenant isolation in data centers.
  • Hardware Abstraction: Practical experience with Multi-Instance GPU (MIG) for efficient, secure partitioning of physical GPUs across different customer workloads.
  • Managed Backends: Secure operation of Managed Postgres and scaling of vector databases for performant retrieval architectures.
  • Interfaces & Security: Knowledge of Keycloak, Kong API Gateway, or comparable tools for secure access and precise billing.
  • DevOps & CI/CD: Experience with Git, CI/CD pipelines, and Infrastructure-as-Code for fast, reliable, and documented deployments.

AI Expertise & Inference Logic

  • Inference Optimization: Experience with KV caching, batching, quantization, and serving frameworks like vLLM or NVIDIA Triton.
  • Model Combination & Cost Management: Knowledge of how to combine small specialized models and large generalist models, both open and closed, to optimize cost and latency.
  • Quality Assurance: Techniques for reducing hallucinations, e.g., Retrieval-Augmented Generation (RAG) and providing valid data contexts at the infrastructure level.
  • Agents & Frameworks: Operationalization of LangChain, LangGraph, or AutoGen, as well as management of complex Deep Agents that autonomously execute multiple steps.

Monitoring & Scaling (AI-Native)

  • Observability: Tracing for agent decisions (e.g., OpenTelemetry, LangSmith) to make data center processes traceable.
  • AI-Specific Auto-Scaling: Scaling based on token throughput or model context utilization, not just CPU metrics.

Qualifications

You don't have to be a prompt engineer, but you understand how AI "works". It is important that you can quickly grasp new approaches (e.g., inference methods or agent structures) and integrate them into stable, multi-tenant data center infrastructures.

  • Experimentation: You enjoy working with systems that do not always behave deterministically.
  • Security & Safety: Awareness of AI security (sandboxing, protection against prompt injections) in every system.

Benefits

  • 100% remote workplace with highly flexible working hours.
  • Attractive annual salary, automatic KPI-based salary increases, and attractive annual bonuses.
  • 30+ days of vacation.
  • Fully equipped premium home office workspace.
  • Company (e)bike, supplementary company health insurance, and other corporate benefits.
  • Work in an international environment at one of the world's leading cloud providers in the SAP ecosystem.

Have we sparked your interest? Then upload your complete application documents here (CV, certificates, salary expectations, earliest possible start date).