Are you a senior engineer who can keep large, AI-augmented systems running reliably at Apple scale? Apple's Stability Engineering team is looking for a seasoned engineer to join our Core team in San Diego. We build and operate the platforms, services, and infrastructure that turn crash reports from Apple devices into actionable engineering insights. You'll work on systems where LLMs and agents are already part of the production fabric: evolving them, hardening them, and using AI tools to extend what a small team can deliver.
Our team owns the end-to-end platform behind stability analysis at Apple: symbolication of crash logs across the company's hardware portfolio, the data pipelines that aggregate and cluster crash logs, and the applications and services that engineers across Apple use every day to drive operating-system quality. This role is about keeping that platform healthy, extending it deliberately, and making the engineering team itself more effective by using AI tools well.

Day to day, you'll spend most of your time on the engineering work of running real systems: tuning evaluation infrastructure, tightening operational controls, improving auditability and debug trails, and scaling the workflows our analysts rely on. When new capabilities are needed, you'll prototype them and integrate them into the platform. You'll partner closely with stability analysts who are domain experts in OS reliability, and with the broader team responsible for symbolication, ETL, and service infrastructure. You'll also be expected to use AI-assisted development tools fluently to investigate issues, refactor at scale, and ship more with a small team.

We're looking for someone with the rigor of a seasoned production engineer who is also comfortable operating systems that include LLMs and agents as first-class components. If you enjoy taking responsibility for a complex, already-running platform and making it steadily better, we want to talk.
- 5+ years of professional software engineering experience building and operating production systems
- BS in Computer Science or a related field, or equivalent practical experience
- Fluent use of AI-assisted development tools (coding agents, code review assistants, etc.) to work effectively at scale
- Demonstrated experience designing and scaling distributed systems (load balancing, active-active topologies, capacity planning, throughput-bound services)
- Track record of maintaining and evolving production services: observability, operational controls, incident response, and steady iteration on existing systems
- Strong full-stack instincts; comfortable spanning data infrastructure, backend services, and the user-facing surfaces that consume them
- Proven ability to operate independently on ambiguous, open-ended problems where the right answer is not obvious
- Experience operating LLM- or agent-based features in production environments over time
- Experience building or maintaining evaluation harnesses, audit trails, or replay infrastructure for AI systems
- Background in developer tools, observability, crash/stability analysis, or other operating-system-quality domains
- Familiarity with one or more of Ruby on Rails, Node.js/TypeScript, or Python for production services
- Experience working in environments with significant deferred scalability work (capacity-constrained, long-lead-time infrastructure)
Employers have access to artificial intelligence language tools (“AI”) that help generate and enhance job descriptions and AI may have been used to create this description. The position description has been reviewed for accuracy and Dice believes it to correctly reflect the job opportunity.
- Dice Id: 90733111
- Position Id: 295eecb0bad83884c048d4a92f39af0e
- Posted 12 hours ago