Job Role: Amazon S3 Engineer
Location: Charlotte, NC/Plano, TX
Job Description:
Must Have Technical/Functional Skills
Primary Skill: Amazon Data Engineer
Secondary: AWS Data Engineer, Amazon S3, Shell Scripting, Autosys
Experience: Minimum 10 years
Key Responsibilities:
- Design, develop, and execute Data Pipelines and test cases to ensure data integrity and quality.
- Develop, implement, and optimize data pipelines that integrate Amazon S3 for scalable data storage, retrieval, and processing within ETL workflows.
- Leverage Amazon S3 for data storage, retrieval, and management within ETL workflows, including the ability to write scripts for data transfer between S3 and other systems.
- Utilize Amazon S3's advanced features such as versioning, lifecycle policies, access controls, and server-side encryption to ensure secure and efficient data management.
- Write, maintain, and troubleshoot scripts or code (using PySpark, Shell, or similar languages) to automate data movement between Amazon S3 and other platforms, ensuring high performance and reliability.
- Collaborate with cross-functional teams to troubleshoot and resolve data-related issues, utilizing Amazon S3 features such as versioning, lifecycle policies, and access management.
- Document ETL processes, maintain technical documentation, and ensure best practices are followed for data stored in Amazon S3 environments.
- Familiarity with Hadoop or Spark is preferred.
- Validate HiveQL, HDFS file structures, and data processing within the Hadoop cluster.
- Strong analytical and troubleshooting skills.
- Excellent communication for collaborating with developers and stakeholders.
- Knowledge of metadata-driven ETL processes and batch/job frameworks.
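As a concrete illustration of the lifecycle-policy responsibility above, the sketch below builds an S3 lifecycle configuration in pure Python. The bucket prefix, retention periods, and rule ID are illustrative assumptions, not part of the posting; in practice the resulting dict would be passed as the `LifecycleConfiguration` argument to boto3's `put_bucket_lifecycle_configuration`.

```python
def build_lifecycle_config(prefix, glacier_after_days, expire_after_days):
    """Return an S3 lifecycle configuration dict for one key prefix.

    Tiers objects to Glacier after `glacier_after_days`, expires them after
    `expire_after_days`, and cleans up noncurrent versions on versioned
    buckets. All values here are illustrative.
    """
    return {
        "Rules": [
            {
                "ID": f"tier-then-expire-{prefix.strip('/')}",
                "Filter": {"Prefix": prefix},
                "Status": "Enabled",
                # Move aging objects to cheaper long-term storage.
                "Transitions": [
                    {"Days": glacier_after_days, "StorageClass": "GLACIER"}
                ],
                # Delete current objects past the retention window.
                "Expiration": {"Days": expire_after_days},
                # On versioned buckets, also expire old object versions.
                "NoncurrentVersionExpiration": {"NoncurrentDays": 30},
            }
        ]
    }

config = build_lifecycle_config("raw/", 90, 365)
```

A configuration like this would typically be applied once per bucket and versioned alongside the ETL code, so retention behavior is reviewable rather than hand-set in the console.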
Tools & Skills:
- Amazon S3: data storage, retrieval, and management; scripting for ETL data transfer; advanced features (versioning, lifecycle policies, access controls, server-side encryption); automation of data movement using Python, Shell, or similar languages; troubleshooting and collaboration on data-related issues; documentation and best practices for ETL processes.
- PySpark
- SQL
- Oracle
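The automation skills listed above (scripted, reliable data movement between S3 and other platforms) can be sketched as two small helpers: a partitioned key layout and a retry wrapper. The key scheme, table name, and retry counts are illustrative assumptions; the actual upload call (e.g. boto3's `upload_file` or an `aws s3 cp` subprocess) is left as the injected `transfer_fn`.

```python
import time


def s3_key_for(table, run_date, filename):
    """Illustrative partitioned key layout: raw/<table>/dt=<YYYY-MM-DD>/<file>."""
    return f"raw/{table}/dt={run_date}/{filename}"


def transfer_with_retry(transfer_fn, retries=3, backoff_s=0.0):
    """Call transfer_fn(), retrying on failure for reliability.

    Re-raises the last exception once retries are exhausted, so a scheduler
    such as Autosys can mark the job failed.
    """
    for attempt in range(1, retries + 1):
        try:
            return transfer_fn()
        except Exception:
            if attempt == retries:
                raise
            time.sleep(backoff_s)


key = s3_key_for("trades", "2024-01-31", "part-0.csv")
```

Separating the key layout and the retry policy from the transfer call itself keeps the movement logic unit-testable without AWS credentials.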
Domain: Banking knowledge; Payments knowledge preferred.
Environment: Cloudera Platform
Concept: Cloud Storage, Amazon S3, AWS, Data Warehousing, Data Transformation, ETL/ELT, Data Quality.