Senior Software Engineer - Grafana Databases, Managed Services | Germany | Remote

Key facts

Germany
Computer Software

Work model

Fully remote
Germany only
Posted 2 days ago
Job description

Grafana Labs, the company behind the open observability cloud, is founded on the principles of open source, open standards, open ecosystems, and open culture. Grafana Cloud, our fully managed observability platform, is flexible and built for scale. With Grafana Cloud's actually useful AI, organizations can see, understand, and act on all their disparate data to move at the speed of their ambitions. Today, more than 35 million users and 7,000+ customers -- including Anthropic, Bloomberg, NVIDIA, Microsoft, and Salesforce -- trust Grafana Labs to ensure reliability of their applications and systems, resolve incidents quickly, and optimize their telemetry to reduce noise and cost. We are a 100% remote company with 1,600+ team members across 40+ countries.

We're scaling fast and staying true to what makes us different: an open-source legacy, a global collaborative culture, and a passion for meaningful work. Our team thrives in an innovation-driven environment where transparency, autonomy, and trust fuel everything we do.

You may not meet every requirement, and that's okay. If this role excites you, we'd love you to raise your hand for what could be a truly career-defining opportunity.

This is a remote opportunity. At this time, we are only considering applicants living in the German time zone.

The Opportunity

The Managed Services team is a newly formed squad within the Databases department. It owns and operates shared, production-critical infrastructure that powers Grafana Cloud's next-generation database products (Mimir, Loki, and Tempo). Today, this includes operating 100+ WarpStream clusters across multiple cloud providers and regions, with continued growth anticipated. WarpStream acts as the streaming backbone for ingestion and read/write decoupling across databases. It sits directly on the hot path for metrics, logs, and traces, handling high-throughput, multi-consumer workloads at massive scale.

In addition to streaming infrastructure, the team works closely with high-volume analytical and storage systems that power query-heavy and aggregation-heavy workloads, where latency, compression behavior, storage layout, and scaling characteristics matter deeply.

What You'll Be Doing

As a Senior Engineer on Managed Services, you will take ownership of running these systems in production. This involves:

  • Operating and evolving 100+ multi-cloud streaming clusters and related database infrastructure
  • Diagnosing and eliminating cross-layer failure modes (e.g., object storage latency, noisy neighbors, control-plane bottlenecks, and query performance regressions)
  • Designing safe upgrade and rollout strategies at scale
  • Improving observability, automation, and operational ergonomics
  • Partnering closely with database and platform teams to ensure safe scaling, partitioning, consumer fan-out, and query performance
  • Working directly with distributed systems behavior, Kubernetes scheduling dynamics, storage engines, compression trade-offs, etc.
  • Serving as a primary escalation point and on-call for relevant incidents
  • Owning the relationship with all system vendors, including WarpStream Labs

Requirements

  • 6+ years of engineering experience, including meaningful time in SRE, platform engineering, production engineering, infrastructure engineering, or distributed systems roles.
  • Experience operating distributed systems in production (e.g., streaming systems, analytical databases, large-scale storage backends). Examples of these include Kafka, Redpanda, WarpStream, Postgres, ClickHouse, Snowflake, or Cassandra.
  • Strong Kubernetes experience in AWS, GCP, or Azure, and familiarity with infrastructure-as-code tooling (Helm, Terraform, Jsonnet, etc.).
  • Solid understanding of distributed systems design and large-scale system trade-offs.
  • Proficiency in at least one programming language (Go preferred, but not required).
  • Working knowledge of Linux internals, networking, cloud storage, and performance/scaling behavior.
  • Experience participating in blameless incident response and writing high-quality post-incident reviews.
  • Clear communicator who can collaborate across teams and work autonomously.
  • Curious, pragmatic, action-oriented, and kind.

Compensation & Rewards

In Germany, the base compensation range for this role is EUR 97,034 - EUR 116,44. Actual compensation may vary based on level, experience, and skillset as assessed in the interview process. Additional rewards include equity, a bonus (if applicable), and other benefits.

All of our roles include Restricted Stock Units (RSUs), giving every team member ownership in Grafana Labs' success.

Why You'll Thrive At Grafana Labs

  • 100% Remote, Global Culture
  • Scaling Organization
  • Transparent Communication
  • Innovation-Driven
  • Open Source Roots
  • Empowered Teams
  • Career Growth Pathways
  • Approachable Leadership
  • Passionate People
  • In-Person Onboarding
  • Balance is Key (30 days annual leave)

Equal Opportunity Employer

Grafana Labs is an equal opportunities employer. We welcome applications from everyone regardless of race, colour, nationality, origin, caste, sex, gender reassignment, gender identity or expression, sexual orientation, age, religion or belief, disability, veteran status, genetic information, pregnancy, maternity, marital, family or carer status, or any other characteristic which is protected by local law.