Job Details
We are seeking a highly skilled System Engineer with strong experience in Java, Python, and PySpark to design, develop, and optimize large-scale data and application systems. The ideal candidate will have a solid background in system architecture, software development, and data engineering, along with hands-on experience integrating distributed systems and ensuring their performance and reliability.
Key Responsibilities:
Design, develop, and maintain system components and data pipelines using Java, Python, and PySpark.
Collaborate with cross-functional teams to implement scalable, resilient solutions in cloud or on-premises environments.
Develop and maintain ETL processes for data ingestion, transformation, and loading across multiple data sources.
Optimize and troubleshoot distributed applications for performance and reliability.
Implement system monitoring, logging, and alerting to ensure high availability and system integrity.
Automate deployment and configuration management using tools such as Ansible and Jenkins, and orchestrate workflows with Airflow.
Participate in code reviews, contribute to technical documentation, and follow DevOps and CI/CD best practices.
Work with Big Data ecosystems (Hadoop, Spark, Hive, Kafka, etc.) to handle large-scale data processing.
Analyze and resolve complex technical issues across software, infrastructure, and data layers.
Required Skills and Qualifications:
Bachelor's or Master's degree in Computer Science, Information Technology, or a related field.
10+ years of experience in system engineering, software development, or data engineering.
Strong programming experience in Java and Python.
Expertise in PySpark for distributed data processing and transformation.
Hands-on experience with Hadoop ecosystem components such as Spark, Hive, HDFS, and Kafka.
Solid understanding of Linux/Unix systems, shell scripting, and system-level debugging.
Experience with version control systems (Git, Bitbucket) and CI/CD pipelines (Jenkins, GitLab CI).
Familiarity with cloud platforms (AWS, Azure, or Google Cloud Platform) and data orchestration tools (Airflow, Oozie).
Strong analytical and problem-solving skills, with a focus on scalability and performance tuning.