Senior Software Engineer/SRE - Core Communications

  • Posted 15 hours ago | Updated 15 hours ago

Overview

On Site
USD 160,000.00 - 240,000.00 per year
Full Time

Skills

Messaging
Microsoft Exchange
Pricing
Backbone.js
FOCUS
Root Cause Analysis
User Experience
Software Development
SAFE
Management
Scalability
Bloomberg
InfiniBand
Finance
Decision-making
Infrastructure Architecture
Debugging
Dashboard
Provisioning
Incident Management
Network
Knowledge Sharing
Software Engineering
Python
C++
Collaboration
Communication
Computer Science
Grafana
Splunk
Apache Kafka
Java
CHAOS
Testing
Capacity Management
Open Source
Regulatory Compliance
System On A Chip
Reliability Engineering
Big Data
Apache Spark
Amazon S3
Training
Life Insurance

Job Details

Location
New York

Business Area
Engineering and CTO

Ref #
10045384

Description & Requirements

About Core Communications (CC):
We build the core messaging products that power Bloomberg's internal and client communication: IB (Instant Bloomberg), MSG (Message), and other collaboration platforms. These systems are used by the financial industry to exchange billions of messages daily, from trade ideas and pricing quotes to mission-critical communications. We're building the backbone of financial dialogue, operating at massive scale and high stakes.

About our Team:
The Core Communications SRE team are the guardians of reliability and stability for all CC products. Our focus is on enabling teams to build and operate resilient, observable, and scalable systems. We define standards, provide tools, and lead reliability-focused initiatives across all stages of the development lifecycle. Our scope spans infrastructure, application health, and incident response, working closely with over 100 developers and multiple product and platform teams.

We view our systems holistically, from application code and cluster provisioning to monitoring pipelines and reliability governance. As our platforms evolve and scale, we proactively identify architectural and operational risks, and partner with teams to mitigate them. This includes defining meaningful SLOs with Product, strengthening our observability stack, and developing cross-cutting tools that improve diagnosis and response.

We'll Trust You To:
  • Define and promote reliability-focused standards and best practices across observability, alerting, incident response, and provisioning
  • Build and maintain troubleshooting tools leveraging distributed tracing and health signals to accelerate root cause analysis
  • Partner with Product teams to define and measure meaningful SLOs aligned with user experience
  • Lead initiatives to identify and mitigate reliability risks across CC systems - spanning performance, capacity, and resiliency
  • Collaborate with developers to embed reliability into the software development lifecycle, from design through deployment
  • Contribute to the creation of a culture of reliability by advocating for failure-aware design and sharing best practices across teams
  • Develop automation to reduce manual operational effort and support scalable, safe growth of our infrastructure

What's in it for you:
You'll have a direct and visible impact on the stability, resilience, and scalability of Bloomberg's most fundamental and critical products - IB and MSG, which are relied upon daily by the global financial industry for essential decision-making and communication. The work you do will directly shape the reliability experience of our clients and internal users alike.

This role gives you the autonomy to drive reliability initiatives end-to-end, from infrastructure design and tooling to rollout and adoption across engineering teams. You'll play a key role in fostering a culture of reliability within Core Communications, influencing how systems are built, monitored, and maintained.

In your day-to-day, you'll help create tooling and frameworks to define and track reliability metrics that guide long-term stability efforts across our platforms. You'll collaborate with teams to implement distributed tracing and end-to-end health monitoring, enabling faster debugging and deeper visibility into system behavior. You'll contribute to the development of libraries, dashboards, and automation that bring consistency to alerting, provisioning, and incident response across the broader CC organization. And you'll help lead the adoption of chaos testing and failure injection practices to validate how our systems perform under real-world stress.

You'll work closely with engineers, product managers, and SREs across multiple teams and regions - building deep technical expertise and a strong cross-functional network. We also support ongoing learning through conference attendance, industry engagement, and knowledge-sharing, so you can continue to grow and bring fresh perspectives back into the team.

You'll need to have:
  • 4+ years of experience in software engineering, and experience working on an SRE team
  • Proficiency in Python and proven experience with C++
  • Strong understanding of distributed systems and system reliability
  • Familiarity with SLOs, SLIs, and SLAs, and how to relate system performance back to client impact
  • Strong collaboration and communication skills
  • A degree in Computer Science, Engineering, or equivalent practical experience

We'd love to see:
  • Hands-on experience with monitoring and alerting tools (e.g., Grafana, Splunk, distributed tracing)
  • Experience with Kafka and Java
  • Experience with chaos engineering, failure injection, or resilience testing frameworks
  • Exposure to capacity planning and scaling analysis
  • An interest in treating security as part of reliability
  • Contributions to open source or involvement in SRE communities
  • Awareness of industry compliance frameworks (e.g., DORA, SOC 2) and how they relate to system reliability
  • Experience with big data technologies such as Apache Spark and Amazon S3

Salary Range = 160,000 - 240,000 USD Annually + Benefits + Bonus

The referenced salary range is based on the Company's good faith belief at the time of posting. Actual compensation may vary based on factors such as geographic location, work experience, market conditions, education/training and skill level.

We offer one of the most comprehensive and generous benefits plans available, along with a range of total rewards that may include merit increases, incentive compensation (exempt roles only), paid holidays, paid time off, medical, dental, vision, short- and long-term disability benefits, 401(k) + match, life insurance, and various wellness programs, among others. The Company does not provide benefits directly to contingent workers/contractors and interns.
Employers have access to artificial intelligence language tools (“AI”) that help generate and enhance job descriptions and AI may have been used to create this description. The position description has been reviewed for accuracy and Dice believes it to correctly reflect the job opportunity.