Platform Engineer — Kubernetes & AI Infrastructure
Remote anywhere in Germany | HQ in NRW | Work from anywhere for up to 180 days per year
This is not a "keep the lights on" DevOps role.
You'll be part of the team responsible for running and operating a large-scale AI platform used in complex customer environments, including highly customised on-premise infrastructure deployments.
The challenge here isn't just Kubernetes.
It's making a highly distributed, containerised system reliably run in environments you don't fully control.
That means troubleshooting under pressure, improving deployment processes, working directly with customer-side infrastructure teams, and owning the operational reality of a production AI platform end-to-end.
You'll be working on systems running more than 1,000 containers in production across a large microservice architecture, helping improve everything from CI/CD pipelines and observability to runtime stability and deployment reliability.
This role is heavily focused on runtime operations, incident handling, and delivery infrastructure, not feature development.
The Engineering Muscle You Bring
- Experience with Kubernetes or OpenShift
- Strong understanding of distributed systems and microservice architectures
- Experience with CI/CD tooling such as Jenkins, GitHub Actions, Ansible, or ArgoCD
- Experience with monitoring and observability tooling such as Grafana, Loki, Prometheus, OpenTelemetry, Dynatrace, or Instana
- Knowledge of technologies like Redis, Postgres, MariaDB, Kafka, Elastic, or MinIO
- Strong troubleshooting and analytical skills
- Hands-on engineering mindset with a strong sense of ownership
- Fluent German and English communication skills
Why This Role Appeals to People Who Like Complexity
- Large-scale production systems with 1,000+ containers running live
- Complex Kubernetes and OpenShift environments
- Real operational ownership instead of pure maintenance work
- Challenging on-premise customer deployments
- Exposure to modern AI platforms and distributed architectures
- High-impact work with lots of technical depth and learning potential
Khalifa@thryvetalent.com