AI Engineer (f/m/d) - Remote in Germany

Key facts

Bremen

AI Training

Work model

Fully remote

Germany only

Posted 1 month ago

Job description

Most AI teams talk about evaluation. At Synera, you'd own it --- end-to-end, in production, for two real agentic products used by companies like BMW, Airbus, and NASA.

WHAT YOU WILL DO

You'll join the Agentic Ants team as our second native AI engineer. You'll build and extend our agentic systems alongside the rest of the team --- new tools, sub-agents, prompt iterations --- and co-own the evaluation framework that gates every change: golden datasets, LLM-as-judge pipelines, regression suites, production-trace mining. This isn't about adding scores to a notebook --- it's about shipping agents the team can trust, and proving it.

You'll work across both Synera MAS and Synera Assistant, picking up reliability concerns (error handling, retries) as they show up in your work. Partnering with Ruben and the wider engineering team, you'll also help the team go deeper on customer insights data.

👉 What a week at Synera could look like:

Monday: Kick off sprint planning with the Agentic Ants team, review Langfuse traces from the weekend, and flag any new failure modes worth triaging.
Tuesday: Work on the golden dataset for the supervisor routing surface --- curating examples, versioning the set, and writing evaluators with Ahmed.
Wednesday: Join a cross-team sync with QA and product to align on new eval coverage for an upcoming agent feature, then push a CI integration so eval regressions block the next PR.
Thursday: Pair with the AI and software engineers on extending our agentic system --- a new tool, a routing tweak, or a prompt iteration --- then write the eval that gates the change before it ships.
Friday: Review a calibrated LLM-as-judge output alongside human labels, refine the rubric, and share findings in the eng review.

⚡ In 6 months:

The evaluation framework is live with golden datasets across multiple agent surfaces, CI gates are blocking on regressions, and the team actually trusts the results.
You've shipped multiple meaningful changes to the agent graphs --- new tools, sub-agents, or routing improvements --- that are measurably better in production.

⚡ In 12 months:

Online evaluation is a habit --- production traces feed continuously back into datasets, and the improvement loop is running without manual intervention.
You co-own at least one agent surface with the team and have become the go-to person for evaluation methodology at Synera. The combination has measurably improved product quality.


🌱 WHAT'S IN IT FOR YOU?

We believe in transparent conversations about compensation from the start. For this role, our planned salary ranges are:

Experienced Level: 77,000 - 97,000 EUR annually

We determine the level we hire you for based on your experience, the scope of responsibilities you'll take on, and the impact you can drive. While we typically hire within these bands, we're open to some flexibility for candidates who bring exceptional value to the role.

What we offer beyond salary:

Flexible working: you decide when and where to work (as long as you reside in Germany).
Flexible public holidays: swap days off according to your values and beliefs!
Home office setup support and access to our office in Bremen.
Personal development budget of €2,000 for conferences, training courses, or books in an area of your choice.
We don't count your vacation days, as we trust all our team members to decide what's best for them and the company.
Prefer two wheels over four? 🚲 We've got you covered with JobRad.
To support your personal and professional well-being, we offer company fitness with Wellpass and access to the mental health platform nilo.
Regular team events, virtual coffee breaks, and spontaneous afterworks. We also get together as a whole company for 2-3 day off-sites twice a year! 🌈