Key Responsibilities:
• Data Storage Architecture:
· Design and implement scalable data storage solutions, including databases, data lakes, and data warehouses, using AWS services (Amazon S3, Amazon RDS, Amazon Redshift, Amazon DynamoDB) and Databricks' Delta Lake (see the storage sketch below).
· Integrate Informatica IDMC for metadata management and data cataloguing.
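For illustration, a minimal sketch of landing raw data in a Delta Lake table on S3 using PySpark; the bucket name, paths, and the order_date column are hypothetical, and a Databricks/PySpark runtime with Delta Lake is assumed:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("delta-storage-sketch").getOrCreate()

    # Read raw CSV files from the lake's landing zone.
    raw = (spark.read
           .option("header", "true")
           .csv("s3://example-datalake/raw/orders/"))

    # Persist as a partitioned Delta table in the bronze layer.
    (raw.write
        .format("delta")
        .mode("overwrite")
        .partitionBy("order_date")
        .save("s3://example-datalake/bronze/orders/"))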
• Data Pipeline Development:
· Create, manage, and optimize data pipelines for ingestion, processing, and transformation using AWS Glue, AWS Data Pipeline, AWS Lambda, Databricks, and Informatica IDMC (see the Glue job sketch below).
· Ensure pipelines deliver data reliably, efficiently, and to defined quality standards.
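As one hedged example of such a pipeline, an AWS Glue job skeleton in PySpark; the catalog database, table, column names, and output path are hypothetical:

    import sys
    from awsglue.utils import getResolvedOptions
    from awsglue.context import GlueContext
    from awsglue.job import Job
    from pyspark.context import SparkContext

    args = getResolvedOptions(sys.argv, ["JOB_NAME"])
    glue_context = GlueContext(SparkContext())
    job = Job(glue_context)
    job.init(args["JOB_NAME"], args)

    # Ingest from the Glue Data Catalog.
    dyf = glue_context.create_dynamic_frame.from_catalog(
        database="sales_db", table_name="raw_orders")

    # Basic quality gate: drop rows missing the key, then write curated Parquet.
    clean = dyf.toDF().dropna(subset=["order_id"])
    (clean.write.mode("append")
          .partitionBy("region")
          .parquet("s3://example-datalake/curated/orders/"))

    job.commit()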
• Data Integration:
· Integrate internal and external data sources into AWS and Databricks environments, using Informatica IDMC to keep the data consistent and governed.
• ETL Processes:
· Develop ETL processes to cleanse, transform, and enrich data, leveraging Databricks' Apache Spark capabilities and Informatica IDMC tools (see the ETL sketch below).
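A minimal cleanse-transform-enrich sketch in PySpark over Delta tables; the paths and column names are hypothetical:

    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.appName("etl-sketch").getOrCreate()

    orders = spark.read.format("delta").load("s3://example-datalake/bronze/orders/")
    customers = spark.read.format("delta").load("s3://example-datalake/bronze/customers/")

    enriched = (orders
        .dropDuplicates(["order_id"])                         # cleanse
        .withColumn("order_ts", F.to_timestamp("order_ts"))   # transform
        .join(customers, "customer_id", "left"))              # enrich

    (enriched.write.format("delta")
             .mode("overwrite")
             .save("s3://example-datalake/silver/orders_enriched/"))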
• Performance Optimization:
· Monitor and optimize data processing and query performance within AWS and Databricks environments (see the tuning sketch below).
· Leverage Informatica IDMC for workflow and data quality optimization.
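A sketch of routine tuning work on Databricks; the table path and column are hypothetical, and OPTIMIZE/ZORDER assume a runtime with Delta Lake support:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("perf-sketch").getOrCreate()

    # Compact small files and co-locate a frequently filtered column.
    spark.sql("""
        OPTIMIZE delta.`s3://example-datalake/silver/orders_enriched/`
        ZORDER BY (customer_id)
    """)

    # Inspect a slow query's physical plan for full scans or wide shuffles.
    df = spark.read.format("delta").load("s3://example-datalake/silver/orders_enriched/")
    df.filter("customer_id = 42").explain()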
• Security and Compliance:
· Implement robust security practices, including data encryption and compliance with privacy regulations, across AWS and Databricks platforms (see the encryption example below).
· Use Informatica IDMC for data governance and compliance management.
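For example, enforcing SSE-KMS default encryption on a data-lake bucket with boto3; the bucket name and KMS key alias are hypothetical:

    import boto3

    s3 = boto3.client("s3")

    # Require server-side encryption with a customer-managed KMS key by default.
    s3.put_bucket_encryption(
        Bucket="example-datalake",
        ServerSideEncryptionConfiguration={
            "Rules": [{
                "ApplyServerSideEncryptionByDefault": {
                    "SSEAlgorithm": "aws:kms",
                    "KMSMasterKeyID": "alias/datalake-key",
                },
                "BucketKeyEnabled": True,
            }]
        },
    )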
• Automation:
· Automate routine data tasks and workflow orchestration using AWS Step Functions, AWS Lambda, Databricks Jobs, and Informatica IDMC (see the triggering sketch below).
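A small sketch of triggering an orchestrated workflow with boto3; the state machine ARN and input payload are hypothetical:

    import json
    import boto3

    sfn = boto3.client("stepfunctions")

    # Start a run of a nightly ingestion workflow.
    response = sfn.start_execution(
        stateMachineArn="arn:aws:states:us-east-1:123456789012:stateMachine:nightly-ingest",
        input=json.dumps({"run_date": "2024-01-01"}),
    )
    print(response["executionArn"])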
• Documentation:
· Maintain comprehensive documentation of data pipelines, infrastructure, and configurations, including metadata management through Informatica IDMC.
• Cost Optimization:
· Manage resource usage efficiently to balance performance and cost in AWS, Databricks, and Informatica environments.
• Continuous Improvement:
· Stay updated on the latest AWS, Databricks, and Informatica IDMC technologies and implement best practices.
Qualifications and Requirements:
• Education:
· Bachelor's or Master's degree in Computer Science, Data Engineering, or a related field.
• Experience:
· 7+ years of data engineering experience, with expertise in AWS services, Databricks, and/or Informatica IDMC.
• Technical Skills:
· Proficiency in Python, Java, or Scala for building data pipelines.
· Strong SQL and NoSQL database expertise.
· Knowledge of data modelling and schema design.
• Certifications:
· AWS certifications (e.g., AWS Certified Data Analytics - Specialty).
· Databricks and Informatica certifications are a plus.
• Additional Expertise:
· Experience evaluating technical solutions and optimizing complex data transformations.
Preferred Skills:
• Hands-on experience with performance assessment for long-running data processes.
• Familiarity with advanced data governance and metadata management strategies.