Senior AWS Agentcore Platform Engineer

Reading, PA, US • Posted 5 days ago • Updated 2 hours ago

Full Time

On-site

Salary: As per market

Job Details

Summary

Job Title: Senior AWS Agentcore Platform Engineer

Location: Reading, PA (Hybrid, 2-3 days a week in the office)

Job Type: Full-time position

Interview process: Team Interview

Job Description

We are looking for a highly technical Lead Platform Engineer to architect the observability, cost governance, and security framework for our enterprise AI agent ecosystem. You will be responsible for ensuring that our agentic workflows (built on AWS Bedrock, AgentCore, and MCP servers) are scalable, observable, and cost-efficient.

The ideal candidate bridges the gap between traditional DevOps and the emerging world of LLMOps, with a deep focus on distributed tracing for non-deterministic AI workloads.

Requirements

Experience: 8+ years in Platform Engineering, DevOps, or Site Reliability Engineering (SRE).

Cloud Expertise: Deep proficiency in AWS (IAM, CloudWatch, Bedrock, Lambda).

Observability Tools: Proven experience with Dynatrace, Jaeger, or Honeycomb, and distributed tracing standards.

AI/LLM Interest: Familiarity with the LLM lifecycle, including prompt execution, token usage, and frameworks like LangChain or AgentCore.

Automation: Advanced experience with Terraform and CI/CD pipeline design.

Collaboration: Experience working in an Agile environment with integrated tools like Microsoft Teams and Confluence.

Job Responsibilities

Observability
Assess CloudWatch, X-Ray, Bedrock logging, and AgentCore traces against agentic workflow requirements; produce a gap analysis; set up observability in Dynatrace
Design post-deployment validation pipeline for agents & MCP servers (deployment health + tool registration checks)
Implement distributed tracing & structured logging: LLM decisions, tool selections, sub-agent calls, MCP interactions
Evaluate LangFuse / LiteLLM proxy vs. AWS-native; deliver target-state observability architecture recommendation
Cost Tracking & TCO
Extend tagging taxonomy to cover agent runtimes, MCP servers, vector DBs, Bedrock token consumption per namespace
Design cost visibility model: aggregate agent, MCP, vector DB, and Bedrock token costs per team/department
Build CloudWatch (or equivalent) dashboards for per-team spend; configure AWS Budgets with alerting thresholds
Automate cost reports delivered via email / Microsoft Teams; implement anomaly detection rules
Monitoring & Alerting
Define P1-P4 alerting rules: deployment failures, runtime errors, tool invocation failures, MCP connectivity issues
Integrate alert notifications to Microsoft Teams channels and email; route by resource ownership tags
Author runbooks linked to every alert; publish in Confluence for developer self-service resolution
Evaluate AWS-native vs. third-party monitoring stack; deliver recommendation aligned to observability architecture
Security & Access Control
Assess current IAM + tagging approach for multi-team isolation; identify scalability gaps and risks
Evaluate Cedar policy engine (AgentCore) for fine-grained tool access control; document enterprise-scale gaps
Design scalable ABAC-based identity model for multi-team isolation without IAM policy sprawl; deliver Terraform modules

Dice Id: 10421780
Position Id: 2026-63056
