Traditional Credit Risk Modeling and Modern Generative AI – The Twain Shall Meet?

Generative AI (Gen AI) refers to a subset of state-of-the-art artificial intelligence techniques and models that are designed to generate new data samples or outputs that are similar to those seen in the training data. Unlike traditional AI models that focus on tasks like classification or prediction, generative AI models are primarily concerned with creating new content, such as images, text, audio, or other types of data. The key characteristic of generative AI is its ability to learn and capture the underlying patterns and structures in the training data and then use this knowledge to produce novel and realistic outputs.

Some of the most prominent Gen AI tools are Generative Adversarial Networks (GANs) and Large Language Models (LLMs). GANs consist of two neural networks, a generator and a discriminator, that are trained together in a competitive setting. The generator tries to create realistic samples, while the discriminator tries to distinguish between real and generated samples. Through this adversarial training process, GANs can produce highly realistic outputs. LLMs are designed to generate human-like text based on the input they receive. Although they are applied across natural language processing (NLP) tasks, they are generative at their core, producing new text and creative content.

Credit risk modeling is a quantitative analysis performed by financial institutions to assess the likelihood of borrowers defaulting on their loans or failing to meet their credit obligations. It uses statistical models to evaluate the creditworthiness of individuals, businesses, or other entities seeking credit. So far, default risk has been modeled mainly with traditional techniques such as logistic regression and time series modeling. However, the ongoing AI revolution, which promises a transformative impact on the world around us, could make inroads into the traditional world of credit risk modeling as well.

Gen AI could improve model prediction by using unstructured data

Currently, very little unstructured data is used in standard credit risk modeling. Given the rise and acceptance of new and pervasive technology, it would not be too hard to convince a retail or wholesale borrower to provide ‘unstructured data’ for judging creditworthiness. For instance, the financial ratios of a wholesale borrower might look healthy in a structured format. However, images of the actual business location or sentiment analysis of employees’ reviews on social media may reveal more about the borrower’s current and future likelihood of default.

Gen AI models, which are well suited to training on images and performing sentiment analysis, can be used to generate additional features that might predict default probabilities more accurately. They are also suitable for speech analysis: a borrower’s interview can be analyzed to generate predictive features and assess creditworthiness. For instance, behavioral psychological interviews can be designed and customized to provide insights into an individual’s psychological functioning, attitudes, preferences, and patterns of behavior in relation to their creditworthiness. Gen AI models could then use speech or text analysis of those interviews to generate features that are correlated with default risk.

LLMs can analyze vast amounts of unstructured text data that conventional risk assessment models cannot handle. They can process and understand customer emails, call transcripts, or social media posts to glean insights into a potential borrower’s creditworthiness. For instance, repeated mentions of financial distress or job loss in a customer’s email or social media posts could signal an increased likelihood of default.
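
To make the idea of a text-derived feature concrete, here is a minimal sketch. It deliberately swaps the LLM for a plain phrase match, so it only illustrates the shape of the output (a numeric feature per borrower), not how an LLM would read the text. The phrase list, function names, and sample emails are all hypothetical.

```python
import re

# Hypothetical phrase list (an assumption for illustration); a real system
# would rely on a language model rather than hard-coded phrases.
DISTRESS_PHRASES = ("lost my job", "laid off", "missed payment", "overdue", "can't pay")


def distress_mentions(messages):
    """Count the messages that contain at least one distress phrase."""
    count = 0
    for text in messages:
        # Lowercase and collapse whitespace so matching is less brittle.
        normalized = re.sub(r"\s+", " ", text.lower())
        if any(phrase in normalized for phrase in DISTRESS_PHRASES):
            count += 1
    return count


def distress_rate(messages):
    """Share of messages with a distress mention (0.0 for an empty list)."""
    return distress_mentions(messages) / len(messages) if messages else 0.0


# Made-up customer emails, not real data.
emails = [
    "Hi, I was laid off last month and need to reschedule.",
    "Thanks for the statement.",
    "My account is overdue, sorry, I  lost my job.",
]
print(distress_mentions(emails), round(distress_rate(emails), 2))
```

The resulting count or rate could sit alongside the structured variables as one more candidate predictor in a default model.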

Gen AI could upscale model development using synthetic data

GANs belong to the family of generative models in machine learning: models that discover patterns in the input data and learn them well enough to generate new samples that retain the characteristics of the original dataset. GANs therefore have potential use cases in generating synthetic data during the credit modeling process. Synthetic data can be generated in the form of (y, X) samples, where y is the outcome (default vs. non-default) and X is the set of features. It can also be generated in the form of new features added to the X training dataset. Synthetic data can support lending to a new ‘market’ for which little or no previous credit risk assessment exists, such as a sub-prime demographic in an untapped geography. It could also be useful in modeling low-default portfolios by increasing the ‘bad rate’ synthetically.
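
As a rough illustration of raising the ‘bad rate’ with synthetic (y, X) samples, the sketch below uses a far simpler generator than a GAN: it fits an independent Gaussian to each feature of the observed defaulters and samples new defaulters from it. That simplification ignores correlations between features, which is precisely what a GAN would be expected to capture. The toy portfolio, feature meanings, and function names are assumptions made for the example.

```python
import random
import statistics


def synthesize_defaults(X, y, n_new, seed=0):
    """Sample n_new synthetic defaulters from per-feature Gaussians
    fitted to the observed defaulters (a simple stand-in for a GAN)."""
    rng = random.Random(seed)
    bads = [row for row, label in zip(X, y) if label == 1]
    if len(bads) < 2:
        raise ValueError("need at least two observed defaults to estimate a spread")
    columns = list(zip(*bads))
    means = [statistics.mean(col) for col in columns]
    stdevs = [statistics.stdev(col) for col in columns]
    return [[rng.gauss(m, s) for m, s in zip(means, stdevs)] for _ in range(n_new)]


def bad_rate(y):
    """Fraction of observations labelled as default (1)."""
    return sum(y) / len(y)


# Toy portfolio (made up): features are (debt-to-income, utilization); 1 = default.
X = [[0.20, 0.30], [0.25, 0.35], [0.30, 0.20], [0.22, 0.40],
     [0.28, 0.25], [0.35, 0.30], [0.70, 0.90], [0.80, 0.85]]
y = [0, 0, 0, 0, 0, 0, 1, 1]

X_new = synthesize_defaults(X, y, n_new=2)
X_aug = X + X_new
y_aug = y + [1] * len(X_new)
print(f"bad rate before: {bad_rate(y):.2f}, after: {bad_rate(y_aug):.2f}")
```

Any model trained on the augmented sample would need its predicted probabilities recalibrated to the true bad rate, since the synthetic defaults inflate it by construction.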

Whilst the most typical use case of synthetic data from Gen AI models could be augmentation for reject inference in credit risk, there could be many other use cases as well. Credit risk modeling sometimes requires the distribution of a statistic of interest. One example is a ‘long-run average’ probability of default (PD), estimated as the mean PD under unobserved long-run macroeconomic scenarios. Synthetic data from GANs can be used to make such estimates where the underlying long-run data doesn’t exist. Another instance where this would be helpful is the modeling of loss given default (LGD). Recoveries on defaulted loans are known to take a long time to be realized. With the power of Gen AI, analytics could be performed without waiting for an extended period for the LGDs to stabilize.
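
The long-run average PD calculation can be sketched as a simple average over scenarios. In the example below, the scenarios are drawn from a standard normal distribution purely as a placeholder for the output of a trained generator, and the logistic link between the macro index and PD, including its coefficients, is a hypothetical assumption rather than an estimated model.

```python
import math
import random


def pd_given_macro(z, intercept=-3.0, slope=-0.8):
    """Hypothetical logistic link: a lower macro index z (a weaker economy)
    gives a higher probability of default."""
    return 1.0 / (1.0 + math.exp(-(intercept + slope * z)))


def long_run_average_pd(scenarios):
    """Mean PD across a collection of macro scenarios."""
    return sum(pd_given_macro(z) for z in scenarios) / len(scenarios)


# Placeholder for generator output: 10,000 draws of a standardized macro index.
rng = random.Random(42)
scenarios = [rng.gauss(0.0, 1.0) for _ in range(10_000)]

lra_pd = long_run_average_pd(scenarios)
print(f"PD at the neutral scenario: {pd_given_macro(0.0):.4f}")
print(f"Long-run average PD:        {lra_pd:.4f}")
```

In this toy setup the average comes out above the PD at the neutral scenario, because the logistic curve is convex at low PD levels and adverse scenarios therefore add more than benign ones subtract. This is one reason the full distribution of scenarios matters, not just a central case.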

Gen AI could improve modeling process efficiency by automation

  • A lot of modeling time goes into cleaning the data to make it usable. Cleaning scripts and tasks can be ‘outsourced’ to Gen AI models, leaving more room for research and modeling.
  • Gen AI models can help in leading, guiding and summarizing methodology research and documentation, making the work faster and more efficient.
  • Regulators often ask for a ‘triangulation’ approach to modeling. Whilst the main focus is to develop a primary model, a benchmark model is usually requested to triangulate the results, creating another use case for Gen AI models.
  • Stress testing often requires modeling scenarios that have never been observed in the past. Gen AI can help generate such wide-ranging, adverse scenarios, and can also support modeling processes such as the propagation of transition matrices in stress testing.
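
The propagation of transition matrices mentioned in the last point can be sketched as repeated multiplication of a portfolio distribution by one matrix per period. The three states, the matrix values, and the two-year stress path below are hypothetical and chosen only to make the mechanics visible. In practice the stressed matrices would come from a scenario model, which is where generated scenarios could feed in.

```python
def mat_mul(A, B):
    """Plain-Python matrix product of A (n x m) and B (m x p)."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]


def propagate(initial, matrices):
    """Apply one transition matrix per period to a row vector of state shares."""
    dist = [initial]
    for P in matrices:
        dist = mat_mul(dist, P)
    return dist[0]


# Hypothetical annual matrices over states (Good, Watch, Default).
# Each row sums to 1 and Default is absorbing.
BASE = [[0.90, 0.08, 0.02],
        [0.20, 0.65, 0.15],
        [0.00, 0.00, 1.00]]
STRESS = [[0.80, 0.14, 0.06],
          [0.10, 0.60, 0.30],
          [0.00, 0.00, 1.00]]

start = [1.0, 0.0, 0.0]  # the whole portfolio starts in Good
base_path = propagate(start, [BASE, BASE, BASE])
stress_path = propagate(start, [STRESS, STRESS, BASE])  # two stressed years, then recovery

print(f"3-year cumulative default, base:   {base_path[2]:.4f}")
print(f"3-year cumulative default, stress: {stress_path[2]:.4f}")
```

Because Default is absorbing, the last entry of the propagated vector is the cumulative default share over the horizon, which makes the base and stressed paths directly comparable.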

It is possible that most of the innovative and creative use cases of Gen AI will first be tested and implemented by less regulated financial institutions such as fintechs. However, with the advent of the transformative AI revolution, even the larger banks are likely to invest in Gen AI research to assist, and possibly replace, traditional quantitative analysis methodologies.

Varun Nakra

About Varun Nakra

Varun Nakra is a risk analytics professional with over 15 years of experience developing statistical, machine learning and credit risk models to help businesses make more informed and profitable decisions. He has extensive knowledge of and experience with developing credit risk models such as Probability of Default, Loss Given Default and scorecards, and with using machine learning techniques such as Classification, Regression, Decision Trees, Gradient Boosted Trees, Random Forests, Neural Networks, Bayesian Analysis and Markov Chain Monte Carlo (MCMC).
