Overview
On Site
USD 160,000.00 - 240,000.00 per year
Full Time
Skills
Interfaces
Database
Testing
Management
Managed Services
Data Centers
Unit Testing
Continuous Integration and Development
Performance Tuning
SLA
Disaster Recovery
Bloomberg
Python
TypeScript
Computer Science
Unix
Shell Scripting
TCP/IP
Computer Networking
OSI Model
Client/server
Continuous Integration
Continuous Delivery
Writing
Chaos Engineering
Splunk
Grafana
Reporting
GitHub
JIRA
Product Ownership
Training
Life Insurance
Job Details
Senior Software Engineer/SRE - Automated Disaster Recovery
Location
New York
Business Area
Engineering and CTO
Ref #
10045491
Description & Requirements
The Team: We are the Platform Database Services Disaster Recovery as a Service (DRaaS) SRE team, charged with administering end-to-end disaster recovery testing of Bloomberg's datacenters across the many services that support the applications making up Bloomberg's line of products. On any given day we're inventing, engineering, developing, building, coding, troubleshooting and maintaining a wide range of tools, monitors, frameworks, interfaces, protocols, solutions and best practices around Disaster Recovery. These components stitch together a robust suite of automated and self-healing systems that manage the services that the Platform Database Services SRE team provides to the rest of the firm.
What's in it for you:
You will be part of a team that works to help meet company- and regulator-defined Disaster Testing standards. You will manage and develop solutions that support various disaster recovery tools, building these applications so that the services they provide integrate into the Bloomberg operational environment as well as Bloomberg products. This in-house tooling suite is required to test our clusters and managed services that reside in our datacenters and nodesites in an automated, scalable and self-driven fashion, complete with the accompanying metrics and transparency tools required by internal and external clients. Tooling is expected to be written with end-to-end unit testing and continuous integration to provide the highest level of stability.
We have product ownership and "the classic SRE responsibilities" such as system tuning, performance analysis, and defining and following availability targets such as SLAs, SLOs and SLIs. We also have immediate access to the experts who design and code the Bloomberg-specific components, APIs and methods used by and supporting the disaster recovery infrastructure. You'll gain insight into, and access to, the lowest levels of how Bloomberg applications interact with each other and the runtime environments, for the purposes of both in-depth troubleshooting and enhancing stability, reliability, performance and feature set.
You'll need to have:
- 4+ years of experience in Python and/or TypeScript
- A degree in Computer Science, Engineering or similar field of study or equivalent work experience
- 5+ years of experience with Unix, Unix tools and shell scripting
- Experience designing stable, long-lasting APIs
- Deep understanding of TCP/IP networking and the OSI model
- Experience designing and automating repeatable processes in a client/server modeled environment
- Ability to build and maintain highly sophisticated, available, performant, scalable and critically important systems
- Experience building monitors and alarms for system performance, status and stability
- Experience with CI/CD systems and writing robust unit and system tests
- Basic knowledge of the Rapid framework
- Experience analyzing existing systems and identifying shortcomings with proven methods for improvement
- Experience with Chaos Engineering
- Experience with Splunk/Humio and Grafana or other metric-based reporting tools
- Experience with GitHub and JIRA
- Passion for product ownership
Salary Range = 160,000 - 240,000 USD Annually + Benefits + Bonus
The referenced salary range is based on the Company's good faith belief at the time of posting. Actual compensation may vary based on factors such as geographic location, work experience, market conditions, education/training and skill level.
We offer one of the most comprehensive and generous benefits plans available, along with a range of total rewards that may include merit increases, incentive compensation (exempt roles only), paid holidays, paid time off, medical, dental, vision, short- and long-term disability benefits, 401(k) + match, life insurance, and various wellness programs, among others. The Company does not provide benefits directly to contingent workers/contractors and interns.