Role: Lead position with primary skillsets in AWS services, with experience in EC2, S3, Redshift, RDS, AWS Glue/EMR, Python, PySpark, SQL, Airflow, visualization tools & Databricks.
Responsibilities:
Design and implement data modeling, data ingestion, and data processing for various datasets
Design, develop, and maintain the ETL framework for new data sources
Migrate existing Talend ETL workflows into the new ETL framework using AWS Glue/EMR and PySpark, and/or data pipelines written in Python
Build orchestration workflows using Airflow (an illustrative sketch follows this list)
Develop and execute ad hoc data ingestion to support business analytics
Proactively interact with vendors on any questions and report status accordingly
Explore and evaluate tools/services to support business requirements
Help create a data-driven culture and impactful data strategies
Aptitude for learning new technologies and solving complex problems
Connect with customers to gather requirements and ensure timely delivery
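For illustration, the sketch below shows the kind of orchestration described above: a minimal Airflow DAG that triggers an AWS Glue PySpark job and then an ad hoc Python ingestion step. This is a sketch only; the DAG id, Glue job name, script path, S3 bucket, and IAM role are hypothetical placeholders, and it assumes Airflow 2.x with the apache-airflow-providers-amazon package installed.

# Minimal sketch: orchestrate a Glue PySpark job plus an ad hoc ingestion step.
# All names and paths below are hypothetical placeholders.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator
from airflow.providers.amazon.aws.operators.glue import GlueJobOperator


def ingest_ad_hoc_dataset(**context):
    # Placeholder for an ad hoc ingestion step written in plain Python.
    print("Running ad hoc ingestion for", context["ds"])


with DAG(
    dag_id="example_glue_etl",               # hypothetical DAG id
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    run_glue_job = GlueJobOperator(
        task_id="run_glue_job",
        job_name="example-pyspark-job",      # hypothetical Glue job name
        script_location="s3://example-bucket/scripts/etl.py",  # hypothetical path
        iam_role_name="example-glue-role",   # hypothetical IAM role
        region_name="us-east-1",
    )

    ad_hoc_ingest = PythonOperator(
        task_id="ad_hoc_ingest",
        python_callable=ingest_ad_hoc_dataset,
    )

    # Run the Glue ETL first, then the ad hoc ingestion.
    run_glue_job >> ad_hoc_ingest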
Qualifications:
Minimum of a bachelor’s degree, preferably in Computer Science, Information Systems, or Information Technology.
Minimum of 8 years of experience on cloud platforms such as AWS, Azure, or GCP.
Minimum of 8 years of experience with Amazon Web Services such as VPC, S3, EC2, Redshift, RDS, EMR, Athena, IAM, Glue, DMS, Data Pipeline & API, Lambda, etc.
Minimum of 8 years of experience in ETL and data engineering using Python, AWS Glue, AWS EMR/PySpark, and Talend, with Airflow for orchestration.
Minimum of 8 years of experience with SQL, Python, and source control such as Bitbucket, with CI/CD for code deployment.
Experience in PostgreSQL, SQL Server, MySQL & Oracle databases.
Experience with MPP platforms such as AWS Redshift and EMR.
Experience in distributed programming with Python, Unix scripting, MPP, and RDBMS databases for data integration.
Experience building distributed, high-performance systems using Spark/PySpark and AWS Glue, and developing applications for loading/streaming data into databases such as Redshift.
Experience in Agile methodology
Proven ability to write technical specifications for data extraction and to deliver good-quality code.
Experience with big data processing techniques using Sqoop, Spark, and Hive is an additional plus.
Experience with analytics and visualization tools.
Design of data solutions on Databricks, including Delta Lake, data warehouses, data marts, and other data solutions, to support the organization's analytics needs (an illustrative sketch appears at the end of this posting).
Should be an individual contributor with experience in the above-mentioned technologies.
Should be able to lead the offshore team and ensure on-time delivery, code reviews, and work management among team members.
Should have experience in customer communication.
Databricks – Data Engineering
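As a further illustration of the Databricks/Delta Lake work referenced above, the sketch below shows a minimal PySpark job that ingests raw data and writes it to a Delta table. It is a sketch only; the bucket, schema, and table names are hypothetical placeholders, and it assumes a Databricks (or otherwise Delta-enabled Spark) environment.

# Minimal sketch: load raw CSV data and persist it as a Delta table.
# Paths and table names are hypothetical placeholders.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("example-delta-etl").getOrCreate()

# Ingest raw data from a hypothetical S3 location.
raw = (
    spark.read.option("header", "true")
    .csv("s3://example-bucket/raw/orders/")
)

# Light transformation: stamp each row with its ingestion date.
cleaned = raw.withColumn("ingest_date", F.current_date())

# Write to a Delta table that downstream marts and dashboards can query.
(
    cleaned.write.format("delta")
    .mode("overwrite")
    .saveAsTable("analytics.orders_bronze")  # hypothetical schema.table
)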