Databricks Engineer (Databricks, Python or PySpark)
12-Month Renewable Contract
The Databricks Engineer is responsible for migrating the existing on-premises data warehouse to Databricks and should have sound knowledge of Databricks. The role requires a good understanding of data architecture and cloud solutions, and experience building and maintaining a robust, integrated, and governed data infrastructure. It involves extracting valuable insights from data while ensuring data security, compliance, and high-quality data management.
Roles And Responsibilities:
- Lead the end-to-end data migration project from on-premises environments to Databricks with minimal downtime.
- Work with architects and lead solution design to meet functional and non-functional requirements.
- Apply hands-on Databricks experience to design and implement the solution on AWS.
- Configure Databricks clusters, write PySpark code, and build CI/CD pipelines for deployments.
- Apply strong experience in optimization techniques (Z-ordering, auto compaction, vacuuming).
- Process near-real-time data using Auto Loader and Delta Live Tables (DLT) pipelines.
- Must have a strong background in Python and be able to identify, communicate, and mitigate risks and issues.
- Identify and resolve data-related issues and provide support to ensure data availability and integrity.
- Optimize AWS and Databricks resource usage to control costs while meeting performance and scalability requirements.
- Stay up to date with AWS and Databricks services and data engineering best practices to recommend and implement new technologies and techniques.
- Proactively implement engineering methodologies, standards, and leading practices.
Requirements / Qualifications:
- Bachelor’s or Master’s degree in computer science, data engineering, or a related field.
- Minimum 5 years of experience in data engineering, with expertise in AWS or Azure services, Databricks, and/or Informatica IDMC.
- Proficiency in programming languages such as Python, Java, or Scala for building data pipelines.
- Ability to evaluate potential technical solutions and make recommendations to resolve data issues, especially performance problems in complex data transformations and long-running data processes.
- Strong knowledge of SQL and NoSQL databases.
- Familiarity with data modelling and schema design.
- Excellent problem-solving and analytical skills.
- Strong communication and collaboration skills.
- Databricks and Informatica certifications are a plus.
Preferred Skills:
- Experience with big data technologies like Apache Spark and Hadoop on Databricks.
- Experience with AWS services focused on data and architecture.
- Knowledge of containerization and orchestration tools like Docker and Kubernetes.
- Familiarity with data visualization tools like Tableau or Power BI.
- Understanding of DevOps principles for managing and deploying data pipelines.
- Experience with version control systems (e.g., Git) and CI/CD pipelines.
- Knowledge of data governance and data cataloguing tools, especially Informatica IDMC.