Title:- Senior Software Engineer – Data Lake/Big Data/Spark
Position:- Full Time
Location:- Remote
We are currently looking for a keen Software Engineer to join our data platform team. Working on a greenfield project as part of a world-class data platform, this team does exciting work on highly scalable data platform services and a data lake for location-based analytics. You will build the automation and tooling to support retrieval and transformation of tens of thousands of externally sourced datasets. You will work with highly qualified and accomplished data and software engineers to build, enhance, and maintain the data platform that supports our best-in-class products.
What you will do and achieve:
- Develop and deliver Client's data platform, which supports ingestion of over ten thousand datasets and stores over a petabyte of spatial, relational and raster data. Support services serving over five billion API calls per month
- Work with an agile team to deliver solutions and services for the data platform to support delivery of Client’s large-scale data lake
- Participate in architecture, design and development reviews for data platform services leveraging the best available big data tools and technologies such as Presto, Spark, Alluxio, etc.
- Engage with cross-organizational teams collaborating on Data Ingestion Services and Data Engineering to develop a single, consistent series of services and solutions
- Work well with evolving requirements based on the results of the team's continuing investigation, development and customer feedback
- Adhere to best practices around source code versioning, automated testing and dependency management.
- Investigate and resolve technical and non-technical issues, resolving critical incidents in a timely manner and with a thorough root cause analysis.
- Contribute to Client’s overall technology strategy and roadmap as an active member of its architectural leadership team.
Who you are:
- B.S. in Computer Science (or equivalent)
- 7 or more years of experience in software engineering
- 3 or more years of experience with big data systems and cloud architecture
Knowledge & Skills
- Big data architecture and systems, including distributed data processing systems (such as Spark or Dask), distributed data storage systems and formats (such as HDFS or Parquet), low-latency data lake query architectures (such as Alluxio) and real-time streaming systems (such as Kafka)
- Data lake design strategies for metadata, ontology, governance, authorization, etc.
- Test automation for data quality, data flow, and API endpoints
- Data engineering techniques for big data, including data automation frameworks (such as Airflow or Prefect), metadata management (such as Amundsen) and process management strategies
- Infrastructure management and automation, such as Kubernetes, Terraform and Chef
- Cloud infrastructure management, ideally with experience in AWS, including both technical aspects, such as solutions architecture, and non-technical aspects, such as financial planning
- Modern practices around agile development, release management, continuous integration and system reliability
- Fundamentals of computer science and software engineering
- Execute on a data platform strategy in collaboration with team members, architects, product managers and other groups across the business
- Collaborate as a significant individual technical contributor to meet overall team objectives and goals
- Stay up to date on emerging technologies, standards, and protocols