You will design and build future-proof databases, large-scale processing systems and APIs in collaboration with Bioinformatics, Machine Learning and modelling experts, by developing, constructing, testing and maintaining data acquisition and dissemination methods.
Deciding the best methods to acquire, curate, store and retrieve many primary and secondary data types, along with metadata pertaining to various data domains.
Analysing characteristics of data sets (-omics, imaging, structural) required by Bioinformatics, Machine Learning and Science team members, and using that understanding to discover and develop methods to make them available.
Developing and implementing optimal methods for regular extraction, curation, transformation, storage, retrieval and delivery of large and complex scientific datasets for Research and Product Development.
Recommending and implementing ways to improve data reliability, efficiency and quality through systems integration and the automation of acquisition and quality control/assurance processes.
Actively identifying patterns and anomalies in datasets using data surveillance tools as part of data performance reviews, and identifying methods to improve existing processing pipelines.
Bachelor's or Master's degree in computing science or equivalent experience
Prior experience of working as a software or data engineer
Experience writing Python/R scripts
Ability and eagerness to rapidly learn new languages/frameworks as required
Experience working in a Linux command-line environment
Experience with Git version control
Experience working with containerisation, e.g. Docker
Experience with Continuous Integration and R packages/Python modules
Knowledge of Data Management best practices
Demonstrable experience with processing and visualisation of biological datasets
Experience working with the Atlassian toolchain (Jira, Bitbucket, Confluence)
Experience working in an Agile/Scrum environment