Role: Secure Compute Access Platform Engineer
Location: New York, NY
Duration: 12+ months
Project Overview
The Compute Access Platform team is responsible for securing access to a wide array of Bloomberg's compute resources. This includes building and operating solutions for both interactive and non-interactive engineer access, as well as managing secure inter-service communication. These solutions span diverse environments such as Unix, Windows, appliances, and network routers.
This project encompasses maintenance, enhancement, support, validation, testing, continuous integration, monitoring, configuration, and bug-fix activities for the assigned application areas throughout the defined project period.
Key initiatives include:
- Hardware refresh, including refactoring of the services running on the affected hosts
- Refactoring monolithic applications into functional services deployed to dedicated clusters
- Internal audit remediation
- Enhanced restricted interactive shell containers
Project Scope
Secure SUDO Rules Delivery
Technologies: Python, Linux
- Secure the sudo rules endpoint with OAuth token authentication or an equivalent mechanism.
- Build monitoring and self-service capabilities for rule usage and deletion.
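As an illustration of the first item, here is a minimal sketch of a bearer-token check in front of a rules endpoint. It is not the team's implementation. The function names, the scope string "sudo-rules:read", and the validate_token callback (standing in for whatever token introspection or signature verification the OAuth provider offers) are all assumptions made for the example.

```python
from typing import Callable, Mapping, Optional, Set


def extract_bearer_token(headers: Mapping[str, str]) -> Optional[str]:
    """Return the token from an 'Authorization: Bearer <token>' header, or None.

    Assumes the caller passes header names in canonical case.
    """
    value = headers.get("Authorization", "")
    scheme, _, token = value.partition(" ")
    token = token.strip()
    if scheme.lower() != "bearer" or not token:
        return None
    return token


def authorize_rules_request(
    headers: Mapping[str, str],
    validate_token: Callable[[str], Optional[Set[str]]],  # hypothetical validator
    required_scope: str = "sudo-rules:read",  # hypothetical scope name
) -> int:
    """Return an HTTP status: 401 for a missing or invalid token,
    403 for a valid token lacking the scope, 200 otherwise."""
    token = extract_bearer_token(headers)
    if token is None:
        return 401
    scopes = validate_token(token)  # None means the token was rejected
    if scopes is None:
        return 401
    if required_scope not in scopes:
        return 403
    return 200
```

Keeping validation behind a callback lets the same check work with an "equivalent" mechanism, since only the validator would need to change.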
Timeframe: Q2 2026 – Q4 2026
SUDO Rule Migration and Recertification
Technologies: Python, Linux INIT
- Migrate existing rules into the target system with recertification.
Deliverable: Existing rules migrated to the target system.
Timeframe: Q1 2027 – Q2 2027
BSHELL Hardware Refresh
Technologies: Python, Load Balancer Concepts, REST Services, MySQL
- Refactor code and develop services.
- Build enhanced gateway shell for BSHELL.
- Set up new infrastructure.
- Open required connectivity and deploy using staged rollout via Chef.
Timeframe: Q2 2026 – Q4 2026
BAMGW Hardware Refresh and Gateway Shell Rearchitecture
BAMGW is used as a jump server by PRQS PW, CP, and CT to access the appliance fleet. It is tagged for ESX migration and serves as critical infrastructure for appliance access.
Activities:
- Infrastructure setup
- Connectivity establishment
- New gateway shell development
- Traffic enablement
Timeframe: Q2 2026 – Q2 2027
NMSGW Hardware Refresh and Rearchitecture (getrouterwin)
The NMSGW gateway is currently unmanaged and powers the getrouterwin functionality used by Network Engineering.
Activities:
- Infrastructure setup
- Connectivity establishment
- New gateway shell development
- Traffic enablement
Timeframe: Q1 2027 – Q3 2027
Sudo Rule Migration and Recertification (Service Rewrite)
Technologies: Python
- Rewrite the existing Python service that creates compute objects in Active Directory when clusters are created or modified.
Timeframe: Q4 2026 – Q2 2027
OP1 Containers
This approach involves using containers configured to block write access to the host file system. It also enables safe deployment of additional debugging tools that are restricted or unsafe in the current rbash OP1 environment.
Activities:
- Enhance the proof-of-concept into a production-grade restricted shell/container.
- Gradual rollout with monitoring and user feedback.
- Fleet-wide rollout.
Timeframe: Q3 2026 – Q2 2027
Internal Audit Remediation
Technologies: Python, Go, Chef (Ruby), INIT, Campaign
Objectives:
- Remove persistent access to production infrastructure and replace it with on-demand access.
- Design and build monitoring and certification for high-risk production access.
- Enforce default cluster restrictions for production Windows.
- Build reporting capabilities with clear certification paths.
Timeframe: Q2 2026 – Q4 2027
Internal Audit Remediation – PRQS PW Support for Windows
Technologies: Python, Teleport
Timeframe: Q3 2026 – Q1 2027
PRQS PW Migration to OPA
Technology: Go
Timeframe: Q3 2026 – Q2 2027
Teleport Expansion for Public Cloud Resources
Technologies: Python, Go
- Enable cloud compute resource access through Teleport.
- Begin with AWS and design for extensibility across other cloud providers.
Activities:
- Design
- Proof of Concept
- Implementation
Timeframe: Q1 2026 – Q4 2027
System Security Chef Recipe Refactoring
Technology: Chef (Ruby)
Activities:
- Remove obsolete code
- Move to MSE/applications cluster-specific configurations
- Enhance logging, monitoring, alerting, and dashboards for core Chef client services such as:
Timeframe: Q1 2026 – Q4 2026
INFR Integration for Post-Decommission Cleanup
Technologies: Python, Go, Unix, Kafka
When a machine or cluster is decommissioned, remove compute access artifacts including:
This prevents issues if another host is created later using the same name.
Activities:
- Design and POC
- Implementation
Timeframe: Q1 2027 – Q4 2027
SOR for System Security Data
The goal of this initiative is to push system security machine attributes into SOR.
Attributes include:
- AD domain membership status
- Active Directory domain
- SSSD or VASD version
- Crypto policy
- SSSD version
- OpenSSH version
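One possible shape for such a per-machine record is sketched below. It is purely illustrative: the field names, the hostname key, and the JSON serialization are assumptions, since the actual SOR schema and ingestion interface are not described here.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass(frozen=True)
class SecurityAttributes:
    """Hypothetical per-machine record of the attributes listed above."""
    hostname: str                       # assumed record key
    ad_joined: bool                     # AD domain membership status
    ad_domain: Optional[str]            # Active Directory domain, if joined
    auth_daemon: Optional[str]          # "sssd" or "vasd"
    auth_daemon_version: Optional[str]  # version of whichever daemon is in use
    crypto_policy: Optional[str]        # system-wide crypto policy name
    openssh_version: Optional[str]


def to_payload(attrs: SecurityAttributes) -> str:
    """Serialize the record as JSON with stable key order."""
    return json.dumps(asdict(attrs), sort_keys=True)
```

A record could then be built with placeholder values, for example SecurityAttributes("host1", True, "example.internal", "sssd", "2.9.1", "DEFAULT", "8.7p1"), and passed to to_payload before being handed to whatever transport SOR exposes.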
Activities:
- Design
- Implementation in partnership with SOR
Timeframe: Q2 2027 – Q4 2027
Teleport Resiliency and Stability Improvements
Develop automation to reduce complexity and improve the stability of Teleport deployments and configuration changes.
Activities:
- Build Teleport beta cluster
- Introduce additional labels to improve user experience
- Develop a client utility for end compute devices to enhance user experience
Implementation will occur in phases.
Timeframe: Q3 2026 – Q4 2027
Required Skills
The project requires the following technical skills:
- Middleware development (Go, Python, Nginx)
- Web services development (SOAP, REST)
- Programming across multiple stacks (Python, Go, Ruby preferred)
- Distributed systems
- Database systems (MySQL)
- Distributed caching (Redis, etcd)
- Automation, CI systems, and scripting
- Linux OS
- Troubleshooting, debugging, performance evaluation, and issue resolution
Prioritization
Project work will be prioritized and assigned to staff in 2-week or 4-week sprints.
Assignments will be based on:
- Bloomberg DRQS tickets
- JIRA cards
- Technical specifications
- Design specifications
Staff members are expected to submit effort estimates for each assignment.
Performance will be evaluated at the end of each sprint based on:
- Timeliness
- Code quality
- Completeness