Data Engineer

Location: London, England
Job Type: Permanent
Specialisation: Information Technology
Salary: £80,000 - £90,000 per annum + Excellent Benefits
Reference: BBBH2115_1637063114
Contact: Ashley Hayward
Data Engineer required to join a leading healthcare client on a permanent basis.

The data engineer will be part of a multi-discipline team/squad responsible for the automation and optimisation of data transformation (i.e. ETL and pipeline architecture) in line with advanced data engineering practices, version control, and high-quality standards. The role focuses on the performance and movement of large-scale clinical and genomic data using appropriate tooling and methods. Our client currently holds over 50 petabytes of structured and unstructured data.

This role will also be responsible for generating key statistics and derivations to extract information from data and create new derived data products to support researchers, removing repeated manual activity, increasing quality, and integrating with core data catalogues and systems to help others derive meaning and insight.

Essential skills for Data Engineer
* Experience with tooling for data manipulation in programming languages, low-code ETL tooling and/or data visualisation software
* Strong programming skills (see languages) against cloud-based pipelines
* Experience in co-design of curated data products from raw data assets, working with business users to meet business needs
* Experience of data modelling and developing/reverse-engineering data sets and the necessary components for data model conformance
* Experience developing, optimising and automating data extract, transform and load routines to create coherent, high-quality, comprehensive curated data
* Experience with developing scheduled data flows using an integration/workflow engine, including message management and troubleshooting, log integration and complex data lineage
* Machine learning fundamentals and data partitioning

Tooling includes
* Cloud: AWS, Azure
* ETL: AWS Glue, Trifacta, KNIME, Alteryx, Talend, SAS, Azure Data Factory
* Metadata & master data management: White Rabbit, Collibra
* Data models: XML, JSON, HL7 FHIR, OMOP, CDISC, i2b2
* Databases: AWS S3 & Athena, AWS DynamoDB, AWS RDS, AWS Aurora (Postgres)