Projects

Here are some of my professional projects. My primary role is usually that of a data scientist, but I often also work as a machine learning engineer and data engineer.

Design and implementation of an AutoML platform for the end-to-end ML lifecycle

Role: Project Lead, Machine Learning Engineer
Type: MLOps, Platform Engineering

The AutoML system allows users to upload a dataset, create analyses and data transformations, run AutoML using state-of-the-art machine learning algorithms, and deploy the resulting model into production. The solution empowers subject matter experts to iterate through different designs based on their deep understanding of the business and the data. The fast, iterative feedback loop improves both efficiency and ROI.
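The platform's internals are not described here, so the following is only a minimal sketch, under stated assumptions, of the model-selection step at the heart of an AutoML run: every candidate learner is scored with k-fold cross-validation and the lowest-error one is kept. The two toy candidates (`fit_mean`, `fit_linear`) and all function names are hypothetical stand-ins for the state-of-the-art algorithms the platform actually runs.

```python
from statistics import mean
from typing import Callable, Dict, List, Tuple

# A trainer takes training data and returns a fitted predictor (hypothetical interface).
Predictor = Callable[[float], float]
Trainer = Callable[[List[float], List[float]], Predictor]


def fit_mean(xs: List[float], ys: List[float]) -> Predictor:
    """Baseline candidate: always predict the training mean."""
    m = mean(ys)
    return lambda x: m


def fit_linear(xs: List[float], ys: List[float]) -> Predictor:
    """Candidate: single-feature ordinary least squares."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        return lambda x: my
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    intercept = my - slope * mx
    return lambda x: intercept + slope * x


def cv_mse(fit: Trainer, xs: List[float], ys: List[float], k: int = 3) -> float:
    """Mean squared error pooled over k contiguous validation folds."""
    n = len(xs)
    errors: List[float] = []
    for fold in range(k):
        lo, hi = fold * n // k, (fold + 1) * n // k
        # Train on everything outside the fold, validate on the fold itself.
        model = fit(xs[:lo] + xs[hi:], ys[:lo] + ys[hi:])
        errors.extend((model(x) - y) ** 2 for x, y in zip(xs[lo:hi], ys[lo:hi]))
    return mean(errors)


def auto_select(
    candidates: Dict[str, Trainer], xs: List[float], ys: List[float], k: int = 3
) -> Tuple[str, Dict[str, float]]:
    """Score every candidate and return the best name plus all scores."""
    scores = {name: cv_mse(fit, xs, ys, k) for name, fit in candidates.items()}
    best = min(scores, key=scores.get)
    return best, scores


xs = [float(i) for i in range(12)]
ys = [2.0 * x + 1.0 for x in xs]
best, scores = auto_select({"mean": fit_mean, "linear": fit_linear}, xs, ys)
print(best, scores)
```

Contiguous folds keep the example deterministic; a real platform would typically use shuffled splits for tabular data or forward-chaining splits for time series.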

USD/ZAR real-time spot price formation

Role: Data Scientist
Type: Data Science

Designed and implemented a USD/ZAR spot price formation use case in the Global Markets division of a major South African bank. The solution uses technical indicators to form USD/ZAR spot prices in real time, 10 seconds forward. A subsequent enhancement brings in additional internal bank data to improve inventory management. The solution is deployed on Microsoft Azure.
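Which technical indicators the solution uses is not stated. As an illustration only, the sketch below computes two common ones, a simple moving average (SMA) and an exponential moving average (EMA), over a stream of prices, assuming USD/ZAR mid-prices sampled at a fixed interval. The function names and parameters are hypothetical; the actual price-formation model is not shown.

```python
from collections import deque
from typing import Iterable, List, Optional


def sma(prices: Iterable[float], window: int) -> List[Optional[float]]:
    """Simple moving average; None until the window is full."""
    buf: deque = deque(maxlen=window)
    total = 0.0
    out: List[Optional[float]] = []
    for p in prices:
        if len(buf) == window:
            total -= buf[0]  # this value is evicted by the append below
        buf.append(p)
        total += p
        out.append(total / window if len(buf) == window else None)
    return out


def ema(prices: Iterable[float], span: int) -> List[float]:
    """Exponential moving average with smoothing factor alpha = 2 / (span + 1)."""
    alpha = 2.0 / (span + 1)
    out: List[float] = []
    value: Optional[float] = None
    for p in prices:
        value = p if value is None else alpha * p + (1 - alpha) * value
        out.append(value)
    return out


ticks = [18.42, 18.45, 18.44, 18.47, 18.50]  # illustrative values, not real quotes
print(sma(ticks, 3))
print(ema(ticks, 3))
```

Both indicators update in constant time per tick, which suits a streaming setting where features must be refreshed every few seconds.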

Profile Image Analyzer to assist with candidate CV screening

Role: Machine Learning Engineer
Type: Machine Learning

The profile image analyzer embeds a machine learning model into a CV-generation tool built by an HRTech start-up. It is part of an automation process that aims to empower talent advisors by reducing the time spent manually screening candidates' CVs. Transfer learning from state-of-the-art neural networks is used to combat the cold-start problem, achieving an accuracy of 92%. The project is implemented with Kubeflow, with Kubernetes ensuring the availability and scalability of the application.

Early warning distress system for corporate and investment banking clients

Role: Team Lead - Lead Data Scientist
Type: Machine Learning & Data Engineering

Designed and deployed a machine learning model to identify high-risk corporate and investment banking clients. The model draws on multiple internal and external data sources. The results are visualised on an interactive dashboard, which also shows the input data and an explanation of each prediction. The model is able to flag distressed clients 5 months before they default. Used by risk managers, the model reduces unexpected losses across credit products. The dashboard enables relationship managers to move from reactive to proactive risk management.
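The model type and explanation method are not specified. One simple way to pair a risk score with an explanation is a logistic model, where each feature's contribution to the logit is just its weight times its value. The sketch below assumes such a model; the feature names, weights and threshold are hypothetical.

```python
import math
from typing import Dict, List, Tuple


def score_client(
    features: Dict[str, float], weights: Dict[str, float], bias: float
) -> Tuple[float, List[Tuple[str, float]]]:
    """Return the distress probability and per-feature contributions, largest first."""
    # In a logistic model, each feature adds weight * value to the logit.
    contributions = {name: weights[name] * value for name, value in features.items()}
    logit = bias + sum(contributions.values())
    probability = 1.0 / (1.0 + math.exp(-logit))
    ranked = sorted(contributions.items(), key=lambda kv: abs(kv[1]), reverse=True)
    return probability, ranked


def flag_distressed(
    clients: Dict[str, Dict[str, float]],
    weights: Dict[str, float],
    bias: float,
    threshold: float = 0.5,
) -> List[str]:
    """Return the IDs of clients whose probability meets the alert threshold."""
    return [
        cid for cid, feats in clients.items()
        if score_client(feats, weights, bias)[0] >= threshold
    ]


# Hypothetical features and weights, for illustration only.
weights = {"facility_utilisation_change": 2.0, "days_past_due": 0.05, "negative_news": 0.6}
prob, reasons = score_client(
    {"facility_utilisation_change": 0.8, "days_past_due": 30.0, "negative_news": 2.0},
    weights,
    bias=-3.0,
)
print(round(prob, 3), reasons)
```

The ranked contributions are what a dashboard could display next to each score; for non-linear models, an attribution method such as SHAP would play the same role.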

Identifying merger and acquisition leads from news data

Role: Data Scientist
Type: Data Science, NLP & Data Engineering

Designed and implemented a near real-time system to predict business leads. The machine learning model notifies the relevant client manager when a news article indicates a business opportunity, and the results are tailored to each client manager's portfolio and clientele. This reduces the time spent manually curating news articles from multiple sources, freeing client managers to be more proactive in business development.
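The tailoring step can be pictured as routing: articles that the classifier scores above a threshold are matched against each manager's client list. The sketch below assumes the upstream model already attaches a `lead_score` in [0, 1] to each article; the data shapes, names and threshold are hypothetical, and plain substring matching stands in for proper entity recognition.

```python
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass
class Article:
    title: str
    body: str
    lead_score: float  # assumed output of an upstream lead classifier, in [0, 1]


def route_leads(
    articles: List[Article], portfolios: Dict[str, Set[str]], threshold: float = 0.7
) -> Dict[str, List[str]]:
    """Map each manager to the titles of high-scoring articles that mention their clients."""
    alerts: Dict[str, List[str]] = {manager: [] for manager in portfolios}
    for article in articles:
        if article.lead_score < threshold:
            continue  # not a likely M&A lead
        text = f"{article.title} {article.body}".lower()
        for manager, clients in portfolios.items():
            # Naive name matching; a real system would resolve entities properly.
            if any(client.lower() in text for client in clients):
                alerts[manager].append(article.title)
    return alerts


portfolios = {"alice": {"Acme Corp"}, "bob": {"Globex"}}
articles = [
    Article("Acme Corp explores sale of logistics unit", "Advisers have been appointed.", 0.9),
    Article("Globex posts quarterly update", "Results were in line.", 0.2),
]
print(route_leads(articles, portfolios))
```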

Retail and business banking rolling forecast for the budgeting process

Role: Lead Data Scientist
Type: Machine Learning, Data Science

Implemented a forecasting model to assist with budgeting and assigning sales targets. The model uses financial drivers to forecast the closing balance, interest generated and fees earned for the period.

This supports monthly budget planning and the subsequent alignment of sales targets. As a result, finance managers and regional managers can hold more productive performance discussions.
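The exact drivers are not listed. A common driver-based structure, assumed here purely for illustration, rolls the balance forward month by month (opening balance less run-off plus new business) and derives interest and fees from the average balance. The driver names and rates below are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Drivers:
    new_business: float  # new volume written in the month (hypothetical driver)
    runoff_rate: float   # fraction of the opening balance repaid or closed
    annual_rate: float   # annual interest rate applied to the average balance
    fee_rate: float      # monthly fee income as a fraction of the average balance


@dataclass
class MonthForecast:
    closing_balance: float
    interest: float
    fees: float


def roll_forward(opening: float, months: List[Drivers]) -> List[MonthForecast]:
    """Project closing balance, interest and fees for each month in turn."""
    results: List[MonthForecast] = []
    balance = opening
    for d in months:
        closing = balance * (1 - d.runoff_rate) + d.new_business
        average = (balance + closing) / 2
        results.append(
            MonthForecast(
                closing_balance=closing,
                interest=average * d.annual_rate / 12,
                fees=average * d.fee_rate,
            )
        )
        balance = closing  # this month's closing is next month's opening
    return results


plan = roll_forward(1000.0, [Drivers(100.0, 0.1, 0.12, 0.001), Drivers(0.0, 0.5, 0.12, 0.0)])
for month in plan:
    print(month)
```

Keeping the drivers explicit makes the forecast easy to discuss: a regional manager can see which assumption (new business, run-off or rate) moves the target.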

Augmented analytics to automate insight generation from finance data

Role: Lead Data Scientist
Type: Data Engineering, Data Science

Designed and implemented an augmented analytics solution within the finance dashboard used across the Group. Insights generated with advanced analytics are embedded directly in the dashboard.

The dashboard enables finance managers to easily spot anomalies and opportunities, and empowers them to better support their business partners.
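The analytics behind the embedded insights are not detailed. A minimal sketch of automated insight generation, assuming a simple z-score rule against each metric's history, turns outliers into one-line narratives that a dashboard could display. The metric names, threshold and wording are hypothetical.

```python
from statistics import mean, stdev
from typing import Dict, List, Optional


def insight(metric: str, history: List[float], current: float, threshold: float = 2.0) -> Optional[str]:
    """Return a one-line narrative when `current` is an outlier versus `history`, else None."""
    if len(history) < 2:
        return None  # stdev needs at least two points
    mu, sigma = mean(history), stdev(history)
    if sigma == 0:
        return None
    z = (current - mu) / sigma
    if abs(z) < threshold:
        return None
    direction = "above" if z > 0 else "below"
    return f"{metric} is {abs(z):.1f} standard deviations {direction} its historical average of {mu:,.0f}."


def generate_insights(latest: Dict[str, float], histories: Dict[str, List[float]]) -> List[str]:
    """Run `insight` over every metric and keep only the ones worth surfacing."""
    found = (insight(name, histories[name], value) for name, value in latest.items())
    return [text for text in found if text is not None]


histories = {
    "Fee income": [100.0, 102.0, 98.0, 101.0, 99.0],
    "Operating costs": [50.0, 51.0, 49.0, 50.0, 50.0],
}
latest = {"Fee income": 110.0, "Operating costs": 50.5}
for line in generate_insights(latest, histories):
    print(line)
```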

Advanced analytics to quantify patient-reported outcome measures

Role: Lead Data Scientist
Type: Data Science, Statistical Analysis

Patient-reported outcome measures (PROMs) are analysed to measure patients' need for care and their progress during admission. Three years of PROM data were collected across multiple hospitals. Using statistical techniques, outcomes are compared across hospitals, patient conditions and lengths of stay.

The results were published in the Group 2019 Clinical Outcome Results, and the approach has been proposed as a standard method going forward.
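The specific statistical techniques are not named. As one plausible example, the sketch below compares mean PROM improvement (assumed here to be discharge score minus admission score, with higher meaning better) between two hospitals using Welch's t-test, which does not assume equal variances. The scoring convention and function names are hypothetical; a p-value would come from the t distribution, for instance via `scipy.stats.ttest_ind(a, b, equal_var=False)`.

```python
from math import sqrt
from statistics import mean, variance
from typing import List, Sequence, Tuple


def improvements(pairs: Sequence[Tuple[float, float]]) -> List[float]:
    """Per-patient change: discharge score minus admission score (assumed convention)."""
    return [discharge - admission for admission, discharge in pairs]


def welch_t(a: Sequence[float], b: Sequence[float]) -> Tuple[float, float]:
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom for mean(a) - mean(b)."""
    va = variance(a) / len(a)  # squared standard error of mean(a)
    vb = variance(b) / len(b)  # squared standard error of mean(b)
    t = (mean(a) - mean(b)) / sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


hospital_a = improvements([(10, 14), (12, 13), (9, 15), (11, 14)])
hospital_b = improvements([(10, 12), (13, 14), (8, 11), (12, 12)])
print(welch_t(hospital_a, hospital_b))
```

The same comparison can be repeated across patient conditions or length-of-stay bands, with a multiple-comparison correction when many groups are tested.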

Data anomaly detection in a hybrid cloud

Role: Machine Learning Engineer
Type: Data Engineering, Data Analytics, Machine Learning

Designed and implemented a data anomaly detection pipeline. The pipeline combines multiple data sources to identify anomalous Global Markets client and trading data. The project adopts a hybrid-cloud approach, leveraging the Group's existing assets as well as the cloud's elasticity.

The pipeline detects anomalies and proposes remedial actions to preserve the integrity of reporting. It enables Global Markets business users to investigate and resolve anomalous data through self-service.
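The detection logic itself is not described. A minimal sketch of the "detect and propose a remedy" pattern, assuming simple rule-based checks over trade records, is shown below; the field names (`trade_id`, `client_id`, `notional`), rules and suggested actions are all hypothetical.

```python
from typing import Dict, List, Set, Tuple

Record = Dict[str, object]
Issue = Tuple[object, str, str]  # (trade_id, rule, suggested remedial action)


def check_records(records: List[Record]) -> List[Issue]:
    """Apply rule-based checks and attach a suggested remedial action to each finding."""
    issues: List[Issue] = []
    seen: Set[object] = set()
    for r in records:
        trade_id = r.get("trade_id")
        if trade_id in seen:
            issues.append((trade_id, "duplicate_trade_id", "Remove or merge the duplicate booking"))
        seen.add(trade_id)
        if not r.get("client_id"):
            issues.append((trade_id, "missing_client_id", "Map the trade to a client in the client master"))
        notional = r.get("notional")
        if not isinstance(notional, (int, float)) or notional <= 0:
            issues.append((trade_id, "invalid_notional", "Confirm the notional with the booking system"))
    return issues


records = [
    {"trade_id": "T1", "client_id": "C1", "notional": 1_000_000.0},
    {"trade_id": "T1", "client_id": "C1", "notional": 1_000_000.0},
    {"trade_id": "T2", "client_id": "", "notional": -5.0},
]
for issue in check_records(records):
    print(issue)
```

Returning the rule and action alongside each finding is what makes self-service possible: a business user sees not only that a record is wrong but what to do about it.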

Inbound email classification and sentiment analysis

Role: Data Scientist
Type: Data Science, NLP

Inbound customer requests from the call centre, social media and direct emails are ingested and analysed. Requests are classified by customer sentiment, and the customer relationship managers' responses are also ranked.

This helps the customer relationship department identify, amongst other things, hot topics among clients, outstanding relationship managers and response lag times.
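The classifier used in the project is not specified. The sketch below swaps in a deliberately simple lexicon-based sentiment rule, plus a ranking of relationship managers by median response lag, to show the shape of the two analyses. The word lists, data layout and function names are hypothetical; a production system would use a trained NLP model.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from statistics import median
from typing import Dict, Iterable, List, Tuple

# Hypothetical word lists standing in for a trained sentiment classifier.
POSITIVE = {"thanks", "thank", "great", "happy", "resolved", "excellent"}
NEGATIVE = {"complaint", "unhappy", "delay", "frustrated", "cancel", "poor"}


def sentiment(text: str) -> str:
    """Label a message positive, negative or neutral by counting lexicon hits."""
    words = [w.strip(".,!?;:") for w in text.lower().split()]
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def rank_by_response_lag(
    interactions: Iterable[Tuple[str, datetime, datetime]]
) -> List[Tuple[str, float]]:
    """Sort managers by median response lag in hours, fastest first."""
    lags: Dict[str, List[float]] = defaultdict(list)
    for manager, received_at, responded_at in interactions:
        lags[manager].append((responded_at - received_at).total_seconds() / 3600)
    return sorted(((m, median(v)) for m, v in lags.items()), key=lambda mv: mv[1])


t0 = datetime(2020, 3, 2, 9, 0)
log = [("alice", t0, t0 + timedelta(hours=1)), ("bob", t0, t0 + timedelta(hours=5))]
print(sentiment("I am unhappy with the delay!"))
print(rank_by_response_lag(log))
```

The median is used rather than the mean so that a single very late reply does not dominate a manager's ranking.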