Client Context
- Reason for Need:
  - Recently closed a seed funding round, enabling budget for fractional support.
  - Approaching General Availability (GA) release, which requires more rigor in deployment, monitoring, and alerting.
- Current State:
  - The existing engineer handles DevOps tasks but is not an expert and relies on research and chatbots.
  - Risk of losing this engineer to core development work.
Role Details
- Commitment:
  - ~10 hours/week to start, with potential to scale as the company grows.
  - Workload may spike during releases.
- Scope Beyond Deployment:
  - Implement failure alerting and monitoring, and handle occasional advanced operational tasks.
  - Build cloud automation and improve reliability for GA readiness.
- Skills:
  - Azure expertise (aligns with JD).
  - Microsoft certifications (e.g., AZ-400) preferred.
  - Familiarity with multi-cloud (some services on AWS, optional).
  - Ability to work with SharePoint integrations (integral to product).
Candidate Profile
- Experience:
  - Must have taken a platform from nothing to a working cloud operations environment.
  - Strong ability to build from scratch and implement best practices.
- Soft Skills:
  - Ability to collaborate with engineering teams and explain decisions clearly.
  - Creative problem-solving: the company values thinking outside the box to modernize an antiquated system with thousands of rules.
- Interview Process:
  1. Introductory interview: assess soft skills, personality, cultural fit.
  2. High-level technical discussion.
  3. Deep technical interview: live coding task (candidate chooses language), starting from an empty editor.
About the Role/Full JD
We are looking for a DevOps/CloudOps Engineer to own and evolve the infrastructure and delivery pipelines for our SaaS platform. You will work closely with backend, frontend, and product teams to build reliable, scalable, and secure environments on Microsoft Azure, with a stack centered on:
- Frontend: React
- Backend: Python, FastAPI, SQLAlchemy
- Data: Azure-hosted PostgreSQL (multi-database / multi-tenant)
- Infra & CI/CD: Azure Container Apps (ACA), GitHub / GitHub Actions
- Integrations: SharePoint (via Microsoft Graph / APIs)
If you enjoy automating everything, designing clean delivery pipelines, and keeping complex systems running smoothly, this role is for you.
Key Responsibilities
Platform & Infrastructure
- Design, build, and maintain Azure infrastructure for a multi-instance SaaS platform using Azure Container Apps for containerized workloads, Azure Postgres for multi-database setups, and supporting services such as Key Vault, Application Insights, Log Analytics, and Storage.
- Implement scalable, highly available architectures across development, staging, and production environments.
- Manage networking and security, including VNets, private endpoints, network security groups (NSGs), RBAC, and managed identities.
CI/CD & Automation
- Own and improve CI/CD pipelines using GitHub Actions, including build/test workflows for React and FastAPI services.
- Configure Docker image builds and pushes to Azure Container Registry, and automated deployments to Azure Container Apps (including blue-green or canary strategies).
- Integrate database migrations (Alembic/SQLAlchemy) into release workflows to ensure consistent schemas across environments.
- Standardize and document branching, versioning, and environment promotion strategies.
Application Runtime & Operations
- Collaborate with engineers to containerize services and define runtime configuration, including environment variables, secrets, health checks, probes, and resource limits.
- Implement and tune connection pooling and database access for high-throughput FastAPI workloads.
- Set up monitoring, logging, and alerting for application health, infrastructure performance, and capacity, including scaling rules, throughput, latency, and error rates.
- Troubleshoot and resolve production incidents, perform root cause analysis, and implement preventive improvements.
SharePoint & Ecosystem Integration
- Support and enhance integrations with SharePoint, typically via Microsoft Graph API or SharePoint API.
- Configure and manage OAuth2 / Azure AD application registrations, permissions, and token flows for secure SharePoint access.
- Ensure secure and reliable data exchange between the SaaS platform and SharePoint for document management, file synchronization, and metadata handling.
Security, Reliability & Compliance
- Manage secrets and sensitive configuration with Azure Key Vault and GitHub Secrets.
- Implement security best practices in CI/CD, including scanning, least privilege access, protected branches, and deployment approvals.
- Contribute to backup, restore, and disaster recovery strategies for infrastructure and data.
- Support compliance and governance efforts, including auditability, data retention, and data residency.
Collaboration & Culture
- Work closely with engineering teams to promote DevOps practices and a "you build it, you run it" mindset.
- Produce clear documentation for environments, pipelines, and runbooks.
- Provide technical guidance to other engineers on cloud-native patterns, observability, and deployment best practices.
Required Qualifications
- 3–7+ years of experience in DevOps, Site Reliability Engineering, or Infrastructure Engineering roles, ideally in a SaaS environment.
- Strong hands-on experience with Microsoft Azure, including Azure Container Apps or AKS, Azure Postgres, Key Vault, Application Insights, Log Analytics, and related services.
- Solid experience designing and maintaining CI/CD pipelines, preferably with GitHub Actions.
- Proficiency with Docker and container-based deployment patterns.
- Experience deploying and operating Python/FastAPI services in production, including logging, metrics, and health checks.
- Practical knowledge of SQLAlchemy and database migrations (e.g., Alembic) in a multi-environment setup.
- Good understanding of PostgreSQL operations, including performance tuning, indexing, connection pooling, and backup/restore.
- Familiarity with front-end build and deployment workflows (React, npm/yarn, environment-specific builds).
- Experience integrating with SharePoint or other Microsoft 365 services using Graph API, OAuth2, and Azure AD is strongly preferred.
- Strong scripting skills (e.g., Python, Bash, or PowerShell) for automation.
- Comfort with infrastructure as code (e.g., Bicep, ARM templates, Terraform, or similar).
Nice-to-Have
- Experience with multi-instance SaaS architectures and multi-database strategies.
- Background with AKS, Azure Functions, or other Azure PaaS offerings.
- Familiarity with security and compliance frameworks such as SOC 2, ISO 27001, or GDPR.
- Experience with monitoring and observability stacks (e.g., Prometheus, OpenTelemetry).
- Previous work in a regulated or enterprise environment (e.g., healthcare, finance, manufacturing).
Soft Skills
- Strong communication skills and the ability to explain technical concepts to non-technical stakeholders.
- Ownership mindset, with a focus on reliability, performance, and developer experience.
- Comfort working in a collaborative, cross-functional team environment.
- Pragmatic, outcome-oriented, and able to balance speed versus safety.