How to validate AI & ML risk models

Financial institutions are increasingly relying on artificial intelligence and machine learning (AI/ML) for critical decision-making and efficient client services. However, validating these models presents unique challenges beyond traditional validation practices. This blog provides an overview of additional requirements for model validation in the context of AI/ML.
13 Oct, 2023

Bigger data, bigger problems

One of the main complexities in AI/ML lies in understanding and explaining the predictions made by these models. While they can outperform simpler expert rules or statistical inferences, they often operate as black boxes, making it difficult to grasp the decision-making processes behind them. Additionally, the use of financial big data introduces data quality and consistency challenges, including missing values, differing timestamps, and inconsistent metadata. Unsupervised AI/ML models, such as anomaly detection models used in internal fraud detection, can face additional difficulties when there is too little historical data to define performance metrics clearly. Practical considerations also arise from the extensive use of external package dependencies in AI/ML models: open-source libraries and transfer learning techniques require meticulous governance for performance, security, and legal compliance. Finally, regulatory compliance adds further requirements for AI/ML. Financial institutions must ensure that AI/ML models adhere to anti-money laundering (AML) and know-your-customer (KYC) requirements and avoid introducing biases or discrimination among different client types.

To validate AI/ML models effectively for financial firms, these specific challenges must be addressed to ensure ongoing accuracy and effectiveness.

Documentation provides the foundation

Validating any model starts with ensuring sufficient documentation and testing to evaluate its quality and appropriateness. In the case of AI/ML models, documentation should additionally cover the following aspects:

  • Selection process for the AI/ML model, including alternative options considered, benchmarking against other approaches, and overall model performance.
  • Implementation details, such as data flows, code structure, versioning of model artifacts and data for reproducibility, and management of open-source dependencies for security.
  • Privacy and security considerations when handling sensitive information, including the use of third-party APIs or platforms.
  • Feature engineering process, explaining factors and their selection rationale or dynamic feature determination.
  • Usage of the model’s predictions and its impact on stakeholders (internal, clients, public), including ethical and regulatory considerations.
  • Calibration frequency, monitoring processes, and steps to address AI/ML model degradation.
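
As an illustration of the versioning point above, the sketch below records content hashes of a model artifact and its calibration data alongside installed package versions, so that a validator can confirm a rerun uses the same inputs. The manifest layout and the names `fingerprint` and `build_manifest` are assumptions made for this example, not a prescribed standard.

```python
import hashlib
import json
import platform
from importlib import metadata


def fingerprint(data: bytes) -> str:
    """Return a SHA-256 hex digest identifying the exact bytes supplied."""
    return hashlib.sha256(data).hexdigest()


def build_manifest(model_bytes: bytes, data_bytes: bytes, packages: list) -> dict:
    """Collect what is needed to reproduce a calibration run (hypothetical layout)."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # dependency is not installed in this environment
    return {
        "model_sha256": fingerprint(model_bytes),
        "data_sha256": fingerprint(data_bytes),
        "python_version": platform.python_version(),
        "package_versions": versions,
    }


# Placeholder bytes stand in for a serialized model and a data extract.
manifest = build_manifest(b"serialized-model", b"id,value\n1,0.5\n", ["pip"])
print(json.dumps(manifest, indent=2))
```

Storing such a manifest with each calibration gives the validator a simple check: if either hash or any package version differs, the run is not a like-for-like reproduction.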

Feature engineering drives performance

Once documentation is assessed, evaluating the strength of model inputs becomes crucial. AI/ML models rely on statistical analysis and generalization from observed patterns in the calibration set to make future inferences. The feature generation process involves three steps:

  1. Saving data within a version-controlled institutional resource, such as a data lake.
  2. Processing the data into its final storage format, such as a data warehouse.
  3. Applying model-specific transformations, such as normalization, taking logs, or bucketing.

Validators should evaluate the model developers’ description of which input sources were considered for the model and why any other readily available sources were ruled out (e.g., on cost, processing complexity, privacy, or dependability grounds). Model developers may exclude inputs for a variety of pragmatic reasons, so validation should focus on ensuring that the effort spent investigating different data sources was commensurate with the business impact of the model and with overall development time and resourcing.

Similarly, when considering the processing steps and final transformation, the model documentation should explain how the processing steps were determined and account for any steps that could compromise the independence of validation. For example, normalizing by the minimum and maximum of the whole data set before splitting the validation data from the calibration set transmits implicit information about the distribution of values the variable took across the whole sample, including the validation data. Where model-specific annotation has been used, these processes should be justified for robustness, whether they rely on algorithmic determination or another classification process, such as manual tagging.
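
The normalization issue can be made concrete with a minimal sketch in plain Python. The data values and the function names `fit_min_max` and `apply_min_max` are illustrative assumptions; the point is only that scaling bounds are learned from the calibration set and then reused unchanged on the held-out data.

```python
def fit_min_max(values):
    """Learn scaling bounds from the supplied data only."""
    return min(values), max(values)


def apply_min_max(values, lo, hi):
    """Scale values with previously learned bounds; results may fall outside [0, 1]."""
    span = hi - lo
    if span == 0:
        raise ValueError("reference data has zero range")
    return [(v - lo) / span for v in values]


calibration = [10.0, 12.0, 15.0, 20.0]
validation = [8.0, 18.0, 25.0]  # held out before any statistics are computed

# Bounds come from the calibration set alone, then are applied to both sets.
lo, hi = fit_min_max(calibration)
scaled_calibration = apply_min_max(calibration, lo, hi)
scaled_validation = apply_min_max(validation, lo, hi)

# Leaky alternative, shown only for contrast: bounds taken from the pooled data.
leaky_lo, leaky_hi = fit_min_max(calibration + validation)

print(scaled_validation)
print((lo, hi), (leaky_lo, leaky_hi))
```

In the leak-free version, held-out values can land outside the unit interval, which is expected. In the leaky version the bounds themselves already reflect the validation extremes, so the calibration step has seen information it should not have.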

Core inference model selection

Whilst sound data sourcing and feature choice are prerequisites for model quality, the choice of core AI/ML inference model can make a material difference to the information extracted. This is especially true where transfer learning is used for critical information processing, e.g., to tokenize or convert features ahead of the main inference. If the documentation provides sufficient information to understand the operation of the core model and the feature engineering has been well justified, the validation process should check the sufficiency of the AI/ML model selection against the documented objectives and target prediction metric.

Performance evaluation: Assess the model’s performance by comparing it to alternative approaches. One approach is to run an Auto-ML analysis on the exact same feature set and compare the performance metrics. This helps determine if the existing inference model is still the best choice or if there are better alternatives available. Where the proposed model cannot be replicated independently by the validator, a simpler model with high correlation of decisions can be used to assess overall performance and highlight major discrepancies. The model chosen should be compared to both individual alternatives and ensemble/stacked models that combine the predictions of several simpler models to make inferences.
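
Where a simpler benchmark is used because the proposed model cannot be replicated, one way to compare the two is to measure how often their decisions agree and to list the cases where they differ. The sketch below assumes both models have already produced decisions on the same cases; the decision lists and the function name `decision_agreement` are hypothetical.

```python
def decision_agreement(model_decisions, benchmark_decisions):
    """Return the share of matching decisions and the indices where they differ."""
    if len(model_decisions) != len(benchmark_decisions):
        raise ValueError("decision lists must have the same length")
    if not model_decisions:
        raise ValueError("no decisions to compare")
    mismatches = [
        i
        for i, (m, b) in enumerate(zip(model_decisions, benchmark_decisions))
        if m != b
    ]
    rate = 1 - len(mismatches) / len(model_decisions)
    return rate, mismatches


# Illustrative approve (1) / decline (0) decisions on the same eight cases.
production_model = [1, 0, 1, 1, 0, 0, 1, 0]
simple_benchmark = [1, 0, 1, 0, 0, 0, 1, 1]

rate, mismatches = decision_agreement(production_model, simple_benchmark)
print(f"agreement: {rate:.0%}, cases to review: {mismatches}")
```

The mismatched cases are the natural starting point for a closer review, since they are where the proposed model adds or loses value relative to the benchmark.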

Data visualization: Visual inspection of the input feature sets, their correlations, and their distributions can distinguish independent observations from those with time dependence or common unmodelled factors. Model residuals in supervised learning predictions can be visualized in the same way to check that they are homoscedastic or isotropically distributed, and that model developers have noted any areas of materially increased variance and inaccuracy.
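
A numerical companion to the residual plot is to compare residual variance across ranges of the predicted value. The sketch below is a rough check rather than a formal statistical test; the bucket count, the sample data, and the function name `residual_variance_by_bucket` are assumptions for illustration.

```python
from statistics import pvariance


def residual_variance_by_bucket(predictions, actuals, n_buckets=3):
    """Sort by prediction, split into equal-count buckets, return residual variance per bucket."""
    pairs = sorted(zip(predictions, actuals))
    residuals = [a - p for p, a in pairs]
    size = len(residuals) // n_buckets
    if size < 2:
        raise ValueError("need at least two observations per bucket")
    buckets = [residuals[i * size:(i + 1) * size] for i in range(n_buckets - 1)]
    buckets.append(residuals[(n_buckets - 1) * size:])  # last bucket takes the remainder
    return [pvariance(b) for b in buckets]


# Illustrative data in which errors grow with the size of the prediction.
predictions = [1, 2, 3, 4, 5, 6, 7, 8, 9]
actuals = [1.1, 1.9, 3.0, 4.5, 4.4, 6.3, 9.0, 6.0, 10.5]

variances = residual_variance_by_bucket(predictions, actuals)
print([round(v, 3) for v in variances])
```

Roughly equal variances across buckets are consistent with homoscedastic residuals. A steady increase, as in this sample, points to a region of materially increased variance that the documentation should acknowledge.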

Interpretability of decisions: Diagnostic techniques like LIME (Local Interpretable Model-agnostic Explanations) or SHAP (SHapley Additive exPlanations) can be used to interpret the decisions made by the model. These methods provide insights into how the model arrives at its predictions, helping to identify any biases or inconsistencies in its reasoning. Analysis should focus on predictions with inferior performance or a large negative impact on stakeholders. For client-facing models, this should include determining whether the model’s predictions exhibit bias or disparate impact across demographic groups.
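
For the disparate impact check, a simple first screen is to compare the rate of favourable decisions across groups. This sketch does not use LIME or SHAP; it computes a plain ratio of group rates. The group labels, decisions, and the 0.8 review threshold (a commonly quoted rule of thumb) are assumptions for illustration, not regulatory guidance.

```python
from collections import defaultdict


def favourable_rates(groups, decisions):
    """Share of favourable decisions (coded as 1) per group label."""
    totals, favourable = defaultdict(int), defaultdict(int)
    for g, d in zip(groups, decisions):
        totals[g] += 1
        favourable[g] += d
    return {g: favourable[g] / totals[g] for g in totals}


def disparate_impact_ratio(groups, decisions):
    """Lowest group rate divided by highest group rate; 1.0 means parity."""
    rates = favourable_rates(groups, decisions)
    highest = max(rates.values())
    if highest == 0:
        raise ValueError("no favourable decisions in any group")
    return min(rates.values()) / highest


# Hypothetical model decisions for two client groups of five clients each.
groups = ["A"] * 5 + ["B"] * 5
decisions = [1, 1, 1, 1, 0, 1, 1, 0, 0, 0]

ratio = disparate_impact_ratio(groups, decisions)
REVIEW_THRESHOLD = 0.8  # assumed screening level, not a legal standard
print(f"ratio: {ratio:.2f}, needs review: {ratio < REVIEW_THRESHOLD}")
```

A low ratio does not by itself establish bias, since groups may differ on legitimate risk drivers, but it identifies where feature-level explanations deserve the closest scrutiny.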

Model stability: Check whether the model’s performance is stable over time and across subsets of data. If K-fold cross-validation was not used in the model selection process and testing evidence, it should be run during model validation. It is good practice to evaluate model performance on both recent and historical data windows, particularly market stress environments, and when conditioning on specific subsets of the data, such as geography. Sudden declines in accuracy or other performance metrics may indicate the need for more frequent model recalibration, distinct models for different populations, or similar update triggers. Sanity checks of model output under stress testing scenarios, at a minimum covering the market-implied range of possibilities in financial variables and the resultant features, help identify any need for recalibration or mitigating measures for market regime changes.
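
The time-window part of this check can be sketched as follows: accuracy is computed over consecutive windows of time-ordered data, and any window with a sharp fall relative to the previous one is flagged. This is a windowed comparison, not K-fold cross-validation. The window length, the drop threshold, and the sample data are assumptions chosen for illustration.

```python
def accuracy(predictions, actuals):
    """Fraction of predictions that match the realised outcomes."""
    return sum(p == a for p, a in zip(predictions, actuals)) / len(actuals)


def accuracy_by_window(predictions, actuals, window):
    """Accuracy over consecutive, non-overlapping windows of time-ordered data."""
    return [
        accuracy(predictions[i:i + window], actuals[i:i + window])
        for i in range(0, len(actuals) - window + 1, window)
    ]


def flag_declines(scores, max_drop=0.3):
    """Indices of windows whose score fell by more than max_drop versus the previous window."""
    return [i for i in range(1, len(scores)) if scores[i - 1] - scores[i] > max_drop]


# Twelve time-ordered observations, evaluated in windows of four.
predictions = [1, 0, 1, 1, 0, 1, 1, 0, 1, 1, 0, 0]
actuals = [1, 0, 1, 1, 0, 1, 0, 0, 0, 0, 0, 1]

scores = accuracy_by_window(predictions, actuals, window=4)
print(scores)                 # [1.0, 0.75, 0.25]
print(flag_declines(scores))  # [2]
```

The same functions can be applied to subsets selected by geography or to a stress period by passing only the relevant observations.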

Sufficiency of ongoing process: Evaluate the process for ongoing monitoring and maintenance of the production model, in particular how it handles the additional complexity of big data sources and privacy considerations. Given continuing rapid improvements in machine learning techniques, it is typically advisable to include a periodic comparison of ongoing performance against open-source alternatives or Auto-ML, together with clarity on how future updates to third-party packages will be incorporated.
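
One ingredient an ongoing monitoring process might include is a drift measure on model inputs or scores. The sketch below computes a population stability index (PSI) between a reference sample and a recent sample. PSI is a swapped-in example of a monitoring metric and is not named in the process described above; the bin edges, the floor applied to empty bins, and the sample values are assumptions.

```python
import math
from bisect import bisect_right


def bin_shares(values, edges, floor):
    """Share of values in each bin defined by interior cut points, floored to avoid log(0)."""
    counts = [0] * (len(edges) + 1)
    for v in values:
        counts[bisect_right(edges, v)] += 1
    return [max(c / len(values), floor) for c in counts]


def population_stability_index(reference, current, edges, floor=1e-6):
    """Sum over bins of (current share - reference share) * ln(current share / reference share)."""
    ref = bin_shares(reference, edges, floor)
    cur = bin_shares(current, edges, floor)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref, cur))


edges = [0.25, 0.5, 0.75]  # assumed cut points for a score between 0 and 1
reference = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]
recent = [0.5, 0.6, 0.6, 0.7, 0.7, 0.8, 0.9, 0.9]

print(population_stability_index(reference, reference, edges))  # 0.0 for identical samples
print(population_stability_index(reference, recent, edges))
```

A value of zero means the two distributions match on the chosen bins, and larger values indicate a larger shift. Any alert level would need to be set and justified by the institution.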

The above AI/ML adaptations should also be accompanied by normal model validation checks, such as sufficiency of unit test coverage for models with ongoing development and a description of linkages to upstream models. These upstream models are used to pre-process the inputs from which features are constructed, such as pricing algorithms, risk metrics, internal ratings, and market state constructions.

Enhancing model governance and leveraging third-party providers

In the context of model validation for AI/ML models, external providers of financial analytics and data science solutions, such as Quantifi, can offer valuable support. Their expertise can be leveraged in several key areas:

  • Feature set generation and validation: When inputs derive from market states or financial instruments, feature sets perform best and are most robust when they accurately represent market-implied expectations. This is particularly crucial when incorporating multiple sources of information, such as inferred curves or inputs from illiquid sources like high-yield bonds, whose values can vary significantly depending on the approach.
  • Model selection and optimization: Evaluate AI/ML model performance, assess the utilization of available information, and identify the most effective model for the task.
  • Market and portfolio simulation: Connect model outputs to capital and historical P&L assessments to comprehensively evaluate the business impact of model risk.
  • Independent calculation: Having third parties replicate key steps in the AI/ML model, potentially including upstream and downstream processing, can assist in validating the overall workflow. Complementing the analysis of test outputs and documentation, independent replication can catch discrepancies from the described methodology or implementation bugs, helping to ensure the accuracy and reliability of the model.

Quantifi’s integrated pre- and post-trade solutions allow market participants to better value, trade, and risk-manage their exposures, and respond more effectively to changing market conditions. Quantifi’s investment in the latest technology, including data science, machine learning, and APIs, provides clients with new levels of usability, flexibility, and integration. Quantifi can also assist you in optimising your AI/ML model validation process.
