Analytics Solutions for All

About me and my work

My passion and expertise lie in crafting technical strategies that help companies achieve their business objectives, with an emphasis on decision analytics. Decision analytics combines data with statistical modeling to improve decision-making. It helps organizations make better-informed decisions that can lead to improved efficiency, cost savings, competitive advantages, and better overall business outcomes, and it is a crucial instrument for tackling intricate challenges in today’s data-driven business landscape. Below, you’ll find a selected list of strategic solutions I have delivered to different organizations.

Disease Outbreak Surveillance System

Role: Key Responsibility; Lead

Background and Challenges: The client possessed a huge database of records collected from health facilities, but it was marred by data quality problems such as missing values, transcription errors, and unreliable data sources. Despite the wealth of data at their disposal, this messiness prevented the team from extracting valuable insights about ongoing public health emergencies.

Solutions: I developed, implemented, and optimized a suite of statistical and machine learning models on this large-scale dataset (more than 10 million health facility reports) to produce surveillance reports with visualizations. Missing values were addressed using a novel Poisson-structured multiple imputation method, and an ensemble of machine learning algorithms handled the remaining data quality concerns.
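
To illustrate the general idea behind multiple imputation for count data, here is a minimal sketch. It is not the method used in the project, whose details are not described here. It assumes a single Poisson rate for all reports and a Jeffreys prior on that rate; the function names and the example counts are hypothetical.

```python
import math
import random
from statistics import mean, variance


def draw_poisson(rate, rng):
    """Draw one Poisson variate with Knuth's method (suited to small rates)."""
    threshold = math.exp(-rate)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def impute_and_pool(counts, n_imputations=20, seed=0):
    """Multiply impute missing counts (None) and pool the estimated total.

    Assumption: all reports share one Poisson rate, and at least one
    count is observed. Returns the pooled total and the
    between-imputation variance of that total.
    """
    rng = random.Random(seed)
    observed = [c for c in counts if c is not None]
    n_missing = len(counts) - len(observed)
    # Posterior of the rate under a Jeffreys prior: Gamma(sum + 0.5, scale 1/n).
    shape = sum(observed) + 0.5
    scale = 1.0 / len(observed)
    totals = []
    for _ in range(n_imputations):
        rate = rng.gammavariate(shape, scale)  # reflects parameter uncertainty
        imputed = [draw_poisson(rate, rng) for _ in range(n_missing)]
        totals.append(sum(observed) + sum(imputed))
    return mean(totals), variance(totals)


# Hypothetical weekly case counts from one facility; None marks a missing report.
pooled_total, between_var = impute_and_pool([4, 6, None, 5, None, 5])
```

Drawing a fresh rate for each imputed dataset is what distinguishes multiple imputation from filling every gap with a single best guess: the spread across the imputed totals carries the uncertainty caused by the missing reports.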

Outcomes: This solution now allows the team to effectively identify signals of disease outbreaks and deliver actionable insights. The work has proved crucial in driving evidence-based decision-making and is currently being implemented across multiple countries in Sub-Saharan Africa to monitor the impact of COVID-19 and cholera outbreaks on Sexual and Gender-Based Violence (SGBV), maternal health, vaccination coverage, and mortality. This initiative stands as a testament to the tangible benefits of decision analytics in the realm of health data surveillance.

Keywords: Big Data; Data Quality Control; Data Science; Machine Learning; Statistical Models; Python; R

Healthcare Analytics Initiatives (Ongoing)

Role: Key Responsibility; Partner

Background and Challenges: While Health Information Systems (HIS) provide a wealth of data, they often lack the ability to explain the underlying reasons behind observed trends. Integrating HIS data with surveillance systems, events data, and qualitative insights can offer a more comprehensive understanding of these trends. This calls for international collaboration in devising a framework to better understand these observed trends and thus respond more effectively to public health challenges.

My Responsibilities: I collaborated with cross-departmental stakeholders to promote the utilization of health analytics in data-driven decision-making and the deployment of a cutting-edge health analytics framework tailored to the needs of integrated outbreak analyses, enabling smooth routine data collection with automated quality control, seamless modeling deployment, and timely epidemiological analyses.

Preliminary Outcomes: We drove the adoption of decision analytics in over 20 local and international projects, producing data-driven reports with engaging visualizations for high-level stakeholders such as the Ministry of Health, United Nations, World Bank, and British Embassy. I also prepare and deliver regular presentations, publications, and workshops to advocate for the importance of decision analytics in healthcare.

Keywords: Big Data; Decision Analytics; International Collaboration; Technical Consulting; Presentations; Visualization; Python; R

Reactivate Dormant Clients: an upselling and cross-selling model

Role: Key Responsibility

Background and Challenges: The client was an insurance provider with a century-long history and millions of customers. However, many of its existing customers had become dormant, defined as holding only one insurance policy and having no new interaction beyond paying annual premiums. These dormant clients represented an untapped revenue source: they had already demonstrated an interest in the company’s offerings but had since disengaged. Reconnecting with them required a data-driven, customer-centric approach to ensure successful re-engagement and conversion.

Solutions: I thoroughly profiled this customer base of millions. My solution applied clustering analysis and other machine learning models to pinpoint potential upselling and cross-selling opportunities within the dormant client pool, and the results were communicated with compelling visualizations.
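
As a rough illustration of how clustering can segment a customer base, the sketch below implements plain k-means (Lloyd's algorithm) in the standard library. The project's actual algorithm and features are not specified here; the two features (years as a customer, annual premium in thousands) and all values are hypothetical.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two feature tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, n_iter=50, seed=0):
    """Cluster feature tuples with Lloyd's algorithm; returns (labels, centroids)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k randomly chosen customers
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: attach each customer to the nearest centroid.
        labels = [
            min(range(k), key=lambda j: sq_dist(p, centroids[j])) for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append(
                    tuple(sum(col) / len(members) for col in zip(*members))
                )
            else:
                new_centroids.append(centroids[j])  # keep an empty cluster in place
        if new_centroids == centroids:
            break  # converged
        centroids = new_centroids
    return labels, centroids


# Hypothetical customers: (years as a customer, annual premium in thousands).
customers = [(1, 1), (1.2, 0.9), (0.8, 1.1), (10, 10), (10.2, 9.8), (9.9, 10.1)]
labels, centroids = kmeans(customers, k=2)
```

In practice, features on different scales would be standardized first so that no single feature dominates the distance, and each resulting segment would then be profiled to decide which products to offer.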

Outcomes: The project has successfully re-engaged dormant clients, fostering a renewed sense of trust and satisfaction. A remarkable increase in revenue has also been achieved by accurately targeting specific upselling and cross-selling products and services that aligned with client needs. The implementation of this data-driven approach equipped stakeholders with the tools to make informed decisions, which not only benefited this project but also set a precedent for future data-centric initiatives within the organization.

Keywords: Clustering Analysis; Customer Profiling; Data Science; Decision Analytics; Machine Learning; Technical Consulting; Visualization; Python; R; SAS

Assessing Risk in Disability Claims

Role: Key Responsibility; Co-Lead

Background and Challenges: This work stemmed from a business case competition. At the time, the company relied solely on manual processes to scrutinize the enormous volume of disability claims it received every day. A significant portion of these claims were either fraudulent or lacked crucial information, resulting in a substantial drain on human resources and company assets. This underscored the need for an automated workflow capable of precisely assessing the risk associated with each disability claim. The challenge lay in developing a risk assessment framework that could handle the diverse range of disabilities while upholding stringent client data privacy standards.

Solutions: I co-led a team of 4 data scientists to devise innovative models that assessed the risks associated with disability claims. To unravel the complexity of the insurance claims, our solution leveraged an ensemble of multiple machine learning algorithms such as gradient boosting, support vector machines, and random forests.
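
One common way to combine several classifiers is soft voting, in which each model's predicted probability is averaged before a decision threshold is applied. The sketch below shows only that combination step and assumes the per-model fraud probabilities have already been computed. How the competition ensemble was actually combined is not stated here, and the probabilities, weights, and threshold are hypothetical.

```python
def soft_vote(model_probs, weights=None, threshold=0.5):
    """Average per-model fraud probabilities and flag claims above a threshold.

    model_probs: one list of probabilities per model, aligned by claim.
    Returns (averaged probabilities, boolean fraud flags).
    """
    n_models = len(model_probs)
    weights = weights or [1.0] * n_models
    total_weight = sum(weights)
    n_claims = len(model_probs[0])
    averaged = [
        sum(w * probs[i] for w, probs in zip(weights, model_probs)) / total_weight
        for i in range(n_claims)
    ]
    flags = [p >= threshold for p in averaged]
    return averaged, flags


# Hypothetical probabilities for two claims from three models
# (for example gradient boosting, a support vector machine, a random forest).
averaged, flags = soft_vote([[0.9, 0.2], [0.7, 0.4], [0.8, 0.3]])
```

Averaging probabilities rather than counting hard votes lets a confident model outweigh two uncertain ones, and the optional weights allow stronger models to count for more.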

Outcomes: Our model delivered the best prediction of fraudulent insurance claims among international teams comprising more than 100 highly skilled data scientists, producing an overall accuracy of over 94%. The proposed workflow also enabled instant customized risk report generation, setting new industry standards.

Keywords: Big Data; Data Science; Machine Learning; Predictive Analytics; Risk Management; Python; R; SAS

A Timely and Low-Cost Data Collection Method During Acute Public Health Emergencies

Role: Key Responsibility

Background and Challenges: The Ebola outbreak in West Africa underscored the difficulties of health data collection in regions with weak healthcare structures, particularly during crises. The breakdown of healthcare systems during the outbreak had cascading effects, including the erosion of trust in institutions and reduced healthcare utilization.

Solutions and Outcomes: Leveraging mobile phone data, we employed propensity score matching to compare outcomes from an SMS-based mobile survey with traditional household surveys, providing real-time, high-frequency, and low-cost insights into healthcare utilization during and after the outbreak. These findings have implications for using mobile phone data to supplement traditional surveys in various healthcare monitoring contexts, particularly during emergencies or acute health crises.
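
The matching step of propensity score matching can be sketched as follows. This is a simplified illustration, not the study's analysis: it assumes the propensity scores (here, the estimated probability of appearing in the SMS sample given respondent characteristics) have already been fitted, and it uses greedy 1:1 nearest-neighbor matching with a caliper. The respondent records and the caliper value are hypothetical.

```python
def match_on_propensity(treated, controls, caliper=0.05):
    """Greedy 1:1 nearest-neighbor matching without replacement.

    treated, controls: lists of (unit_id, propensity_score, outcome).
    Returns the matched pairs and the mean outcome difference
    (treated minus control), or None if nothing could be matched.
    """
    available = list(controls)
    pairs = []
    for unit in sorted(treated, key=lambda u: u[1]):
        if not available:
            break
        nearest = min(available, key=lambda c: abs(c[1] - unit[1]))
        # Only accept a match whose score lies within the caliper.
        if abs(nearest[1] - unit[1]) <= caliper:
            pairs.append((unit, nearest))
            available.remove(nearest)
    if not pairs:
        return pairs, None
    mean_diff = sum(t[2] - c[2] for t, c in pairs) / len(pairs)
    return pairs, mean_diff


# Hypothetical respondents: (id, propensity score, used a health facility: 1 or 0).
sms_respondents = [("s1", 0.30, 1), ("s2", 0.60, 0), ("s3", 0.95, 1)]
household_respondents = [("h1", 0.32, 1), ("h2", 0.58, 1), ("h3", 0.10, 0)]
pairs, mean_diff = match_on_propensity(sms_respondents, household_respondents)
```

Respondents without a sufficiently similar counterpart (such as "s3" above) are left unmatched, which restricts the comparison to the region where the two survey populations overlap.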

Keywords: Big Data; Inferential Statistics; R; Stata

Reject Inference for Auto-Approval Loans

Role: Assistant; Primarily Responsible for Research

Background and Challenges: The client (a commercial bank) operated an automated approval system for personal loans that handled a substantial daily influx of applications, a large portion of which were rejected for various reasons. The need for Reject Inference arose from the valuable information hidden within these declined applications. Understanding why an applicant was denied a loan was vital not only for improving the auto-approval workflow but also for complying with regulatory requirements and ensuring fair lending practices.

Solutions and Outcomes: I conducted research to explore current best practices for reject inference in retail models. This research formed the basis for improving the existing scorecard model by considering a wider range of client factors, including population shifts and likelihood of financial delinquency.
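
For readers unfamiliar with reject inference, one widely described technique is fuzzy augmentation: a model trained on accepted applicants scores each rejected applicant, who is then added to the training data twice, once as "bad" and once as "good", weighted by the predicted probabilities. The sketch below illustrates that idea only. It is not necessarily the approach recommended in this research, and the scoring function and applicant records are hypothetical.

```python
def fuzzy_augment(accepted, rejected, predict_bad_prob):
    """Build a weighted training set that includes rejected applicants.

    accepted: list of (features, is_bad) with observed repayment outcomes.
    rejected: list of features with no observed outcome.
    predict_bad_prob: scorer fitted on accepted applicants only.
    Returns a list of (features, is_bad, weight) records.
    """
    augmented = [(features, is_bad, 1.0) for features, is_bad in accepted]
    for features in rejected:
        p_bad = predict_bad_prob(features)
        augmented.append((features, 1, p_bad))        # counted partly as "bad"
        augmented.append((features, 0, 1.0 - p_bad))  # and partly as "good"
    return augmented


def toy_scorer(features):
    """Hypothetical scorer: a higher debt-to-income ratio means higher risk."""
    return min(1.0, max(0.0, features["debt_to_income"]))


accepted = [({"debt_to_income": 0.2}, 0), ({"debt_to_income": 0.5}, 1)]
rejected = [{"debt_to_income": 0.8}]
training_set = fuzzy_augment(accepted, rejected, toy_scorer)
```

The augmented, weighted records would then be used to refit the scorecard so that it reflects the full applicant population rather than only the applicants who were approved.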

Keywords: Big Data; Decision Analytics; Scorecard Models; Risk Management; Python; SAS