Job Details
Senior Cloud Engineer
100% Remote
Hours: Eastern, Central and Mountain time zones
Security Clearance: Must be able to obtain Public Trust Clearance
NO C2C, W2 ONLY
ALTA IT Services is seeking a detail-oriented and proactive Sr. Cloud Engineer to design, implement, and manage observability solutions across our cloud infrastructure. In this role you'll be responsible for ensuring system reliability and visibility through best-in-class monitoring, logging and alerting practices across AWS. You'll work across operations and compliance teams to ensure our AWS workloads meet performance expectations while managing security, regulatory and cost-efficiency standards. This role is key to driving visibility, governance and financial accountability in our cloud environment.
Responsibilities:
Design and implement health checks and probes for cloud infrastructure and applications across AWS
Define and deploy readiness and liveness probes for containers running in EKS/ECS
Write custom scripts for CloudWatch custom metrics and alarms based on application-specific probes
Implement alerting and remediation automation based on probe outputs
Document monitoring strategies, probe configurations and operational playbooks
Define monitoring strategies for cloud resources, microservices and containerized workloads
Implement automated health checks and uptime monitoring
Continuously optimize and evolve the observability stack to improve reliability and reduce noise
Configure and manage monitoring tools (CloudWatch, Grafana, Datadog, Prometheus)
Set up monitoring thresholds, dashboards, and metrics for application and infrastructure
Perform root cause analysis and incident correlation using monitoring and performance analysis tools
Maintain a central inventory of all licensed software deployed in AWS environments (Windows, Oracle, Red Hat, SQL Server)
Ensure compliance with vendor-specific licensing terms
Monitor usage patterns and perform license audits and reconciliation
Identify and remediate latency issues, throughput bottlenecks and underutilized resources
Recommend and implement right-sizing of compute, memory and storage resources
Analyze and optimize the performance of AWS resources, including EC2, RDS, Lambda, S3, ECS and EKS
Conduct performance profiling and benchmarking for applications hosted on AWS
Contribute to capacity planning, disaster recovery strategies and performance testing initiatives
Create reports on system performance trends and opportunities, capacity planning and cost-performance trade-offs
Required Qualifications:
BA/BS in IT, Computer Science or related field (or equivalent work experience may be accepted in lieu of the degree)
3+ years of experience in cloud infrastructure with emphasis on AWS
Strong experience with CloudWatch (metrics, logs, alarms), CloudWatch Synthetics (canary scripting), Route 53 health checks and failover strategies
Proficient in scripting languages such as Python, Bash or Node.js
Hands-on experience with CI/CD tools (GitHub, GitLab, Kubernetes, DevOps)
Cloud certifications (AWS DevOps Engineer, Solutions Architect Associate)
Proficient with license management tools and cost optimization platforms
Solid understanding of cloud architecture principles, autoscaling strategies and load balancing
Strong written and verbal communication skills for technical and non-technical stakeholders
Excellent analytical and problem-solving skills
Must be able to obtain and maintain a Public Trust clearance
Preferred Qualifications:
Hands-on experience with observability stacks such as Grafana, OpenSearch and Datadog
Familiarity with FinOps practices and cost-performance trade-offs