You will join a globally distributed engineering organization of fewer than forty people as the tech lead and people manager for a scrum team. You'll be accountable for architecture and execution, write production code, and partner across Product, Security, and Platform/Reliability to ensure AI features are trustworthy in the real world (evaluation/monitoring, tenant isolation, permissioned tool use, cost controls, and incident-ready operations).
This is a hands-on technical leadership role. Expect to spend significant time designing and shipping production code (Python on AWS). If you are primarily a people manager and are not currently hands-on, this role will not be a fit.
What You’ll Do
Build & Run the AI Platform Layer (Hands-On)
· Lead architecture and delivery of our AI platform services in Python 3.12+ using proven service patterns & platforms (FastAPI, Uvicorn, Pydantic, SQLModel/SQLAlchemy) and production-grade API behavior.
· Own AWS runtime and deployment patterns for the platform: ECS Fargate (API + MCP services), Lambda (doc processing + knowledge ingestion), and event-driven integration via S3 Events and EventBridge.
· Establish “paved road” standards so teams ship safely: service templates, PR/review discipline, CI/CD and environment promotion using Terraform + GitHub Actions (OIDC to AWS), and Docker build practices (including multi-arch where required).
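To illustrate the "paved road" validation discipline the service templates standardize, here is a minimal, hypothetical sketch. The production stack uses Pydantic/SQLModel; this stands-alone version uses only the standard library, and the model and field names (ClaimRequest, tenant_id, amount_cents) are illustrative assumptions, not the actual API.

```python
from dataclasses import dataclass

# Hypothetical sketch of the validation discipline a paved-road
# service template bakes in: requests are validated at the edge and
# fail fast with clear errors, instead of letting bad data propagate
# into downstream services. Pydantic provides this in the real stack.

@dataclass(frozen=True)
class ClaimRequest:
    tenant_id: str    # every request is scoped to a tenant
    claim_id: str
    amount_cents: int

    def __post_init__(self) -> None:
        if not self.tenant_id:
            raise ValueError("tenant_id is required")
        if not self.claim_id:
            raise ValueError("claim_id is required")
        if self.amount_cents < 0:
            raise ValueError("amount_cents must be non-negative")
```

The value of templating this is consistency: every service rejects malformed input the same way, so reviewers and on-call engineers see one pattern everywhere.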
Document Intelligence Pipelines (Warranty Docs → Structured Data)
· Own the end-to-end document processing pipeline (Amazon Textract, Claude Vision).
· Improve extraction quality using deterministic parsing/normalization and custom extractors (e.g., VIN, dates, currency, codes), with strong validation, traceability, and clear failure modes.
· Engineer for reliability and reprocessing: idempotency, bounded retries/timeouts, durable error handling, and controlled replay of failed/changed documents.
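The reliability properties above (idempotency, bounded retries, controlled replay) can be sketched in a few lines. This is an assumption-laden illustration, not the pipeline's real code: the in-memory ledger stands in for a durable store, and process_once/extract are hypothetical names.

```python
import time
from typing import Callable

# Sketch: an idempotency ledger so a replayed document is processed
# exactly once, plus bounded retries with exponential backoff so
# transient failures never become unbounded work.
_processed: dict[str, str] = {}  # doc_id -> result (stand-in for a durable store)

def process_once(doc_id: str, extract: Callable[[str], str],
                 max_attempts: int = 3, base_delay: float = 0.01) -> str:
    if doc_id in _processed:
        return _processed[doc_id]        # idempotent replay: return prior result
    last_error: Exception | None = None
    for attempt in range(max_attempts):  # bounded, not infinite, retries
        try:
            result = extract(doc_id)
            _processed[doc_id] = result
            return result
        except Exception as exc:         # durable error handling would record this
            last_error = exc
            time.sleep(base_delay * 2 ** attempt)
    raise RuntimeError(f"{doc_id} failed after {max_attempts} attempts") from last_error
```

Controlled replay of changed documents then reduces to deliberately evicting a document's ledger entry before reprocessing it.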
Retrieval & Knowledge Base Engineering (RAG That’s Measurable)
· Own the full lifecycle of Amazon Bedrock Knowledge Bases (multiple KBs such as policies/TSBs/procedures/codes): ingestion strategy, change control, and safe promotion across environments.
· Build evaluation and regression testing for retrieval quality (golden sets, automated checks, drift detection) and enforce quality gates so KB changes don’t silently degrade outcomes.
· Implement cost-aware, AWS-native retrieval using Bedrock RetrieveAndGenerate with vector storage in S3 Vectors, and track unit economics (latency and cost per workflow/document/claim).
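One way to make the golden-set quality gate above concrete is a recall@k check run before any KB promotion. This is a hedged sketch under assumptions: the retrieve() callable stands in for the real Bedrock retrieval integration, and the 0.9 threshold is illustrative.

```python
from typing import Callable

# Sketch of a golden-set regression gate for retrieval quality.
# Each golden query maps to the set of KB document ids a correct
# retrieval must surface; recall@k below the threshold blocks
# promotion so KB changes can't silently degrade outcomes.

def recall_at_k(golden: dict[str, set[str]],
                retrieve: Callable[[str], list[str]], k: int = 5) -> float:
    hits, total = 0, 0
    for query, relevant in golden.items():
        top_k = set(retrieve(query)[:k])   # ids actually returned in the top k
        hits += len(top_k & relevant)
        total += len(relevant)
    return hits / total if total else 1.0

def quality_gate(score: float, threshold: float = 0.9) -> bool:
    # Promotion across environments proceeds only when the gate passes.
    return score >= threshold
```

Running this in CI against each environment's KB, and logging the score alongside latency and cost per query, is what turns "RAG that's measurable" into an enforced property rather than an aspiration.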
Agent Workflows & Guardrails (Production, Not Demos)
· Deliver agent orchestration on Bedrock AgentCore Runtime using LangGraph state machines (checkpointing, interrupts, human-in-the-loop steps) with predictable behavior and well-defined failure handling.
· Integrate tools/connectors via the MCP SDK (e.g., DMS connector, VIN decoder, OEM portal tools) with permissioned access, auditable tool calls, and strict boundaries.
· Standardize operational guardrails: CloudWatch logs/metrics (structured JSON logging), security-by-default (Cognito OIDC/PKCE, WAF, Secrets Manager, least-privilege IAM), and runtime discipline for ARM64 AgentCore containers (repeatable builds, including QEMU multi-arch in CI when needed).
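The structured JSON logging standard noted above can be sketched with the standard library alone. The context fields (tenant_id, tool_call) are hypothetical examples of the kind of per-request metadata worth attaching; the real services would define their own.

```python
import json
import logging

# Sketch: a formatter that emits every log line as one JSON object,
# so CloudWatch Logs Insights can filter on level, logger, message,
# and structured context fields passed via logging's `extra=`.

class JsonFormatter(logging.Formatter):
    CONTEXT_FIELDS = ("tenant_id", "tool_call")  # illustrative field names

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        for key in self.CONTEXT_FIELDS:
            if hasattr(record, key):
                payload[key] = getattr(record, key)
        return json.dumps(payload)
```

Attached to a stream handler, this makes auditable tool calls a query away (e.g., filter on `tool_call`) rather than a grep through free-form text.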
The Profile We’re Looking For
You bring strong technical depth and pragmatic leadership, and you can mentor and develop talent. You are comfortable staying hands-on, and you can scale standards, systems, and a team over time — balancing rapid iteration with production discipline and cost awareness.
· 8–12+ years building and operating backend/platform systems in B2B SaaS.
· Proven hands-on technical leadership of a small team (4–6 engineers); comfortable being accountable for architecture and delivery.
· Mastery of Python in production: FastAPI/services, async patterns, workflow orchestration, test discipline, and observability.
· Strong AWS depth (Lambda, ECS/Fargate, S3, IAM, RDS/Postgres, CloudWatch/EventBridge) plus IaC (Terraform preferred).
· Direct experience shipping AI-enabled systems on AWS (Bedrock/RAG, document intelligence such as Textract or multimodal extraction, and evaluation/quality monitoring).
· Experience building production agent workflows with guardrails (permissions, auditability, cost controls, failure modes).
Education
Bachelor’s degree in Computer Science, Engineering, or a related field — or equivalent professional experience delivering and operating enterprise-grade software platforms. An advanced degree is a plus but not required.