Job Title: System Engineer (Application & Batch)
Location: Merrimack, NH (Onsite/Hybrid)
Duration: Long Term Contract
Job Overview
We are seeking an experienced System Engineer to join a Production & Site Reliability Engineering (SRE) team supporting large-scale, distributed systems. This role involves troubleshooting application and batch processing issues while ensuring system reliability, scalability, and performance in a cloud-based environment.
Shift & Support Expectations
- 12-hour team coverage model (US + offshore team)
- Standard shift: 8:00 AM – 5:30 PM EST
- Rotational extended shift (once every 5–6 weeks): 1:30 PM – 9:30 PM EST
- Weekend on-call rotation every 5–6 weeks (~5–6 total hours across the weekend)
Key Responsibilities
- Troubleshoot and resolve application issues (70%) and batch processing issues (30%)
- Support and maintain highly distributed, multi-tiered systems
- Monitor system health using modern observability tools
- Analyze code and logs to identify root causes and recommend long-term solutions
- Collaborate with engineering teams to improve system reliability and performance
- Support cloud-based infrastructure and assist with migration initiatives
Required Skills & Experience
- 10+ years of hands-on experience supporting or deploying distributed systems at scale
- 4+ years of experience in AWS cloud support and migration
- Strong experience with monitoring tools such as Datadog, Prometheus, or Splunk
- Expertise in Oracle database and PL/SQL
- Strong troubleshooting skills, with the ability to analyze PL/SQL, shell scripts, and Perl scripts
- Advanced experience in writing stored procedures, functions, and triggers
Preferred Qualifications
- Experience in SRE or production support environments
- Familiarity with automation and performance tuning
- Strong problem-solving and communication skills
Additional Notes
- This role requires participation in an on-call rotation
- Candidate should be comfortable working in a high-availability, fast-paced environment