Job Details
We are seeking an experienced Software Engineer (AI/ML Big Data) to optimize Python/PySpark jobs within a Hadoop ecosystem. The engineer will design and develop scalable, data-driven applications, manage large datasets, and collaborate across teams to deliver high-quality solutions. The ideal candidate will have strong expertise in distributed computing, cloud platforms, and modern data pipelines.
Key Responsibilities:
Develop and optimize Python/PySpark modules in the Hadoop ecosystem (Spark, HDFS, YARN, Hive, Oozie).
Design and develop cloud applications (AWS, OCI, or similar).
Work with large datasets and implement data aggregation, quality checks, and reporting.
Conduct unit and integration testing; troubleshoot and resolve technical issues.
Collaborate with cross-functional and global teams to translate business requirements into technical solutions.
Participate in all phases of the SDLC, with a focus on continuous improvement.
Mentor junior developers and provide technical guidance.
Required Skills:
5+ years of Python/PySpark development.
5+ years optimizing PySpark in the Hadoop ecosystem (Spark, HDFS, YARN, Hive, Oozie).
5+ years in cloud application development (AWS, OCI, or similar).
Strong experience with distributed/cluster computing concepts.
Expertise in relational databases (MS SQL Server or similar) and NoSQL (HBase preferred).
3+ years creating and consuming RESTful APIs.
Hands-on experience with multi-threaded applications, concurrency, performance tuning, and memory management.
Strong knowledge of shell scripting and file systems.
Preferred Skills:
Experience with CI/CD and build tooling (Git, Maven, Jenkins, Artifactory).
Knowledge of microservices, SOA, Docker, Kubernetes, OpenShift.
Familiarity with Agile/Scrum and SAFe practices.
Experience with project management tools such as JIRA.
Healthcare industry background (highly preferred but not required).
Soft Skills:
Strong analytical and problem-solving abilities.
Excellent communication and collaboration skills.
Ability to prioritize and multitask in a fast-paced environment.