Responsibilities
• Work across the entire data production chain by implementing data ingestion pipelines (from multiple sources and in various formats), storage, and transformations, then provisioning the results: datamarts, cubes, reports, datasets feeding data science scoring models, and APIs, mainly using Dataiku and Google BigQuery (see the ingestion sketch after this list)
• Ensure that integration pipelines are designed consistently with the overall data framework, in collaboration with the HQ's data tech lead and according to best practices and the defined framework.
• Contribute to a continuous improvement approach by optimizing and reusing existing assets to streamline delivery.
• Take part in the data integration aspects of data quality controls, monitoring, alerting, and technical documentation, as well as data management (data models and mappings, data documentation, repositories, descriptions of the transformations applied, etc.)
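
For illustration, a minimal ingestion sketch loading a file from Cloud Storage into BigQuery with the google-cloud-bigquery Python client; the project, bucket, and table names are hypothetical placeholders, and credentials are assumed to be already configured:

    # Sketch: load a CSV file from Cloud Storage into a BigQuery staging table.
    # All resource names below are hypothetical examples.
    from google.cloud import bigquery

    client = bigquery.Client(project="example-project")

    job_config = bigquery.LoadJobConfig(
        source_format=bigquery.SourceFormat.CSV,
        skip_leading_rows=1,  # skip the header row
        autodetect=True,      # infer the schema from the file
        write_disposition=bigquery.WriteDisposition.WRITE_TRUNCATE,
    )

    load_job = client.load_table_from_uri(
        "gs://example-bucket/ingest/sales.csv",
        "example-project.staging.sales_raw",
        job_config=job_config,
    )
    load_job.result()  # wait for the load job to finish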
Skills/Requirements
• Minimum of 3 years' experience in IT development roles such as data integration on Google Cloud and Dataiku, or on similar cloud services
• Mastery of the data stack components of Google Cloud Platform (certifications appreciated), including but not limited to: Google BigQuery (nested fields, partitioning, MERGE SQL, authorized views, row-level security), Cloud Storage, Cloud Functions, Cloud Composer, Google Firestore, and Google Data Catalog, or the equivalent services of other cloud providers (a MERGE sketch appears after this list)
• Proficiency with Dataiku (on Google BigQuery): development of Dataiku flows, implementation of scenarios, scheduling, version management, production releases, administration, etc. (see the scenario sketch after this list)
• Mastery of complex SQL queries
• Good knowledge of Python is a plus
• Development practices with data exchange architectures: web services, APIs, streaming (a streaming-ingest sketch appears after this list)
• Development in an agile team and familiarity with CI/CD tooling (Git, Bash, CLI, Azure DevOps, Jira, Confluence)
• Knowledge of Microsoft Power BI, data catalog tools, data quality, and data management
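
As a rough illustration of the BigQuery skills above, a minimal sketch running a MERGE upsert against a date-partitioned table through the google-cloud-bigquery client; all table and column names are hypothetical:

    # Sketch: upsert staged records into a date-partitioned table with MERGE.
    # Table and column names are hypothetical examples.
    from google.cloud import bigquery

    client = bigquery.Client()

    merge_sql = """
    MERGE `example-project.marts.daily_sales` AS target
    USING `example-project.staging.sales_raw` AS source
    ON target.order_id = source.order_id
       AND target.order_date = source.order_date  -- partition column, helps pruning
    WHEN MATCHED THEN
      UPDATE SET target.amount = source.amount
    WHEN NOT MATCHED THEN
      INSERT (order_id, order_date, amount)
      VALUES (source.order_id, source.order_date, source.amount)
    """
    client.query(merge_sql).result()  # run the MERGE and wait for completion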
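
Similarly, a minimal sketch of driving a Dataiku flow from a custom Python scenario step using the dataiku API; this only runs inside a Dataiku scenario, and the dataset names are hypothetical:

    # Sketch: rebuild flow datasets from a custom Python scenario step.
    # Runs only inside a Dataiku scenario; dataset names are hypothetical.
    from dataiku.scenario import Scenario

    scenario = Scenario()
    # Build the cleaned dataset, then the downstream datamart; the scenario
    # fails automatically if either build step fails.
    scenario.build_dataset("sales_clean")
    scenario.build_dataset("sales_datamart")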
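
And for the data exchange point (web services, APIs, streaming), a sketch pulling records from a REST endpoint and streaming them into BigQuery; the endpoint URL and table name are hypothetical:

    # Sketch: fetch records from a REST API and stream them into BigQuery.
    # Endpoint URL and table name are hypothetical examples.
    import requests
    from google.cloud import bigquery

    client = bigquery.Client()

    resp = requests.get("https://api.example.com/v1/orders", timeout=30)
    resp.raise_for_status()
    rows = resp.json()  # expected: a list of JSON objects matching the table schema

    errors = client.insert_rows_json("example-project.staging.orders_stream", rows)
    if errors:
        raise RuntimeError(f"Streaming insert failed: {errors}")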