Probabilistic graphical model for student loan repayments

Adrien Papaioannou
Andres Oliva Denis
December 9, 2024
ABSTRACT

This report presents a probabilistic graphical model for analyzing and predicting student loan repayments, building on the Stochastic Earnings Path (StEP) model previously employed by the UK’s Department for Business, Innovation & Skills (BIS). The StEP model adds stochastic elements to earnings projections so that repayment behavior can be studied under varying economic conditions. Coupling the StEP model with a probabilistic graphical model aims to improve the accuracy and interpretability of student loan repayment forecasts and to provide a framework for scenario (what-if) analysis.

1 Introduction

Student loan repayment forecasting plays a critical role in public policy and financial planning. The BIS used the StEP model to project future graduate earnings and the associated repayment patterns. This report builds on that foundation, introducing a probabilistic graphical model that enhances predictive capabilities while maintaining transparency and adaptability.

1.1 Objectives

The proposed approach aims to:

  • Integrate stochastic earnings modeling with a graphical representation to capture the dependencies among key variables.
  • Provide policymakers with tools for scenario analysis under varying economic assumptions.
  • Enhance the accuracy of long-term student loan repayment projections.
  • Enable analysis of the full distributions of outputs and key metrics, rather than point estimates alone.
2 Methodology
2.1 StEP Model Overview

The Stochastic Earnings Path (StEP) model, specifically its third iteration (StEP3), forecasts the cost of student loans within an income-contingent repayment system. It combines historical data, macroeconomic projections, and borrower characteristics to estimate repayment behaviors and the Resource Accounting and Budgeting (RAB) charge—the portion of the loan value not expected to be repaid.

Key Components

Earnings Forecasting: Individual earnings paths are predicted using regression models that factor in historical data, career progression, and borrower characteristics such as age, gender, course subject, and institution.

Macroeconomic Inputs: Economic forecasts from the Office for Budget Responsibility (OBR), including earnings growth, inflation rates, and discount rates, are incorporated. Real earnings and repayment estimates are adjusted for inflation (RPI+2.2%).

Loan Repayment Parameters: These include repayment thresholds, interest rates, maximum repayment periods, obligatory and voluntary repayments, and overseas payments.

Borrower Characteristics: The model considers non-employment likelihoods, mortality, disability, and migration patterns, drawing on data from the International Passenger Survey (IPS).

Data Sources: Key datasets include Student Loans Company (SLC) administrative data, the British Household Panel Survey (BHPS), the Labour Force Survey (LFS), and others for demographic and loan take-up insights.
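The earnings forecasting and loan repayment components above can be illustrated with a minimal sketch. It does not reproduce the StEP regression models: it swaps in a simple random walk with drift in log earnings, and the function names, growth rate, volatility, and threshold are all hypothetical choices made for illustration.

```python
import math
import random


def simulate_earnings_path(start_earnings, years, growth=0.02, volatility=0.10, seed=None):
    """Simulate one earnings path with multiplicative log-normal shocks.

    growth and volatility are illustrative assumptions, not StEP parameters.
    """
    rng = random.Random(seed)
    path = [start_earnings]
    for _ in range(years - 1):
        shock = rng.gauss(0.0, volatility)
        # Random walk with drift in log earnings (assumed, not the StEP regression).
        path.append(path[-1] * math.exp(growth + shock))
    return path


def annual_repayment(earnings, threshold, rate):
    """Income-contingent rule: pay a fixed share of earnings above the threshold."""
    return rate * max(0.0, earnings - threshold)
```

For example, `[annual_repayment(e, 25_000, 0.09) for e in simulate_earnings_path(24_000, 30, seed=1)]` would give one possible stream of yearly repayments for a hypothetical borrower; years in which earnings fall below the threshold contribute nothing.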

The primary outputs include:

  • The RAB charge, expressed as a percentage of the loan value not expected to be repaid in Net Present Value (NPV) terms.
  • Earnings paths and repayment projections across borrower cohorts.
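As a rough illustration of the first output, the RAB charge can be computed from a stream of projected repayments. The sketch below assumes the loan outlay is taken at face value at time zero and that the first repayment falls one year later; both the timing convention and the function names are assumptions, not details taken from StEP.

```python
def net_present_value(cash_flows, discount_rate):
    """Discount yearly cash flows; cash_flows[0] is assumed to fall one year after outlay."""
    return sum(cf / (1.0 + discount_rate) ** (t + 1) for t, cf in enumerate(cash_flows))


def rab_charge(loan_outlay, repayments, discount_rate):
    """Share of the loan outlay not expected to be repaid, in NPV terms."""
    return 1.0 - net_present_value(repayments, discount_rate) / loan_outlay
```

A borrower who never repays has a RAB charge of 1, while a repayment stream whose NPV equals the outlay gives a RAB charge of 0.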

2.2 Probabilistic Graphical Model

Probabilistic graphical models (PGMs) are frameworks for representing and reasoning about complex systems using probability theory. They consist of a graph-based structure where nodes represent random variables and edges denote probabilistic dependencies. PGMs provide a structured way to handle uncertainty and dependencies in real-world problems, making them essential in various domains from healthcare to finance.

Bayesian Networks

Bayesian networks, a subset of probabilistic graphical models, are directed acyclic graphs (DAGs) that encode conditional dependencies between variables. These networks enable:

Representation of Uncertainty: Probabilities associated with different outcomes based on evidence.

Inference: Calculation of posterior probabilities given observed data, useful for understanding repayment likelihoods under various economic conditions.

Scenario Testing: Simulation of different policy changes and their effects on repayment probabilities. For instance, in the context of student loans, a Bayesian network can model how changes in interest rates or economic growth impact the probability of repayment success.
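To make the inference step concrete, the following sketch builds a three-node chain (economic growth, employment, full repayment) and answers queries by exhaustive enumeration. The structure and every probability in it are hypothetical and chosen only for illustration; they are not taken from the model described in this report.

```python
from itertools import product

# Hypothetical conditional probability tables (illustrative numbers only).
P_GROWTH = {"high": 0.6, "low": 0.4}
P_EMPLOYED = {  # P(employed | growth)
    "high": {True: 0.90, False: 0.10},
    "low": {True: 0.75, False: 0.25},
}
P_REPAYS = {  # P(repays in full | employed)
    True: {True: 0.50, False: 0.50},
    False: {True: 0.05, False: 0.95},
}


def joint(growth, employed, repays):
    """Joint probability from the DAG factorisation growth -> employed -> repays."""
    return P_GROWTH[growth] * P_EMPLOYED[growth][employed] * P_REPAYS[employed][repays]


def query(target, evidence):
    """Posterior distribution of `target` given `evidence`, by full enumeration."""
    totals = {}
    for growth, employed, repays in product(P_GROWTH, (True, False), (True, False)):
        world = {"growth": growth, "employed": employed, "repays": repays}
        if all(world[name] == value for name, value in evidence.items()):
            totals[world[target]] = totals.get(world[target], 0.0) + joint(growth, employed, repays)
    norm = sum(totals.values())
    return {value: p / norm for value, p in totals.items()}
```

Calling `query("repays", {"growth": "high"})` conditions on an economic scenario, while `query("growth", {"repays": True})` reasons backwards from an observed outcome. Enumeration is only practical for very small networks; a model of realistic size would need a dedicated inference method.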

3 Student Loan Repayment PGM

The student loan repayment PGM is structured as follows:

Figure 1: Student Loan Repayment Model


Globals: Variables that affect all other groups, such as the year and constants for macroeconomic and policy factors.

Figure 2: PGM nodes within ‘Globals’ group


Macroeconomic Factors: Includes variables that track inflation and earnings projections, such as the Retail Prices Index (RPI), earnings, and the nominal wage index.

Figure 3: PGM nodes within ‘Macroeconomic factors’ group


Policy Factors: Government and institutional policies affecting repayment terms, thresholds, and interest rates.

Figure 4: PGM nodes within ‘Policy factors’ group


Student Profile: Captures demographic attributes, including age, gender, parental income, graduate income decile, and repayment status.

Figure 5: PGM nodes within ‘Student profile’ group


Earnings Forecast: Focuses on income predictions, employment status, and variations by demographics to estimate repayment capacity.

Figure 6: PGM nodes within ‘Earnings forecast’ group


Repayment Forecast: Tracks repayment behaviors, including balances, due amounts, and written-off debts, showing repayment evolution over time.

Figure 7: PGM nodes within ‘Repayment forecast’ group


Frictions: Models obstacles like dropouts and administrative issues that affect repayment.

Figure 8: PGM nodes within ‘Frictions’ group


Outputs: Calculates final outcomes, including the NPV of repayments and the RAB charge.

Figure 9: PGM nodes within ‘Outputs’ group
3.1 Modeling Time Evolution

To incorporate time evolution, the proposed network structure is replicated in each time-step. Auxiliary or lagged nodes are included at each step to store values from previous time-steps, facilitating a temporal dependency structure and allowing the model to capture dynamic changes in variables over time.
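The role of the lagged nodes can be sketched with a small simulation in which each time-step reads the previous step's debt and earnings and produces the current values. This is an illustrative loop for one hypothetical borrower, not the network itself; the parameter values, the earnings process, and the order in which interest and repayments are applied are all assumptions.

```python
import math
import random


def simulate_borrower(initial_debt, start_earnings, years, threshold=25_000.0,
                      repayment_rate=0.09, interest=0.03, growth=0.02,
                      volatility=0.10, seed=None):
    """Unroll one borrower over `years` time-steps.

    Each step reads the lagged values (previous debt and earnings) and
    produces the current ones, mirroring lagged nodes in an unrolled network.
    Returns the per-year records and the debt left at the end of the term.
    """
    rng = random.Random(seed)
    lag_debt, lag_earnings = initial_debt, start_earnings
    history = []
    for year in range(years):
        earnings = lag_earnings * math.exp(growth + rng.gauss(0.0, volatility))
        debt_start = lag_debt * (1.0 + interest)       # interest accrues on lagged debt
        due = repayment_rate * max(0.0, earnings - threshold)
        repayment = min(due, debt_start)               # never repay more than is owed
        debt_end = debt_start - repayment
        history.append({"year": year, "earnings": earnings,
                        "debt_start": debt_start, "repayment": repayment})
        lag_debt, lag_earnings = debt_end, earnings    # pass values to the next step
    return history, lag_debt                           # remaining debt is written off
```

Running the function many times with different seeds gives a sample of paths, from which distributions of debt, repayments, and write-offs at each time-step can be estimated.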

4 Results

Figures 10 and 11 show the dynamic behavior and distribution of the variables modeled by the student loan repayment PGM. Figure 10 presents the time evolution of six key variables under three scenarios. Scenario 1 increases the maximum repayment term from 30 to 35 years, Scenario 2 increases the repayment rate from 9% to 12%, and Scenario 3 assumes higher inflation, with RPI rising from 3% to 5%. Figure 11 shows the distribution of four of these variables at three points in time: 2025, 2035, and 2045. The rows represent different variables, and the columns correspond to the selected years, providing a snapshot of how the distributions shift over time under the model’s projections. Together, the figures show both temporal trends and cross-sectional variability, giving policymakers a view of the expected path and the spread of outcomes when forecasting loan repayment dynamics.

Figure 10: Time evolution of the variables. a) Resource Accounting and Budgeting (RAB) charge, b) Debt at year start, c) Net present value of yearly repayments, d) Net present value of cumulative repayments, e) Employed earnings, f) Write-off cases.

Figure 11: Distribution of the variables at different time steps. Each column represents a different year: left: 2025, center: 2035, right: 2045. Each row represents a different variable. First row: Resource Accounting and Budgeting (RAB) charge, second row: Debt at year start, third row: Net present value of yearly repayments, last row: Employed earnings.
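The three scenarios described in this section can be expressed as parameter overrides on a baseline. The sketch below does this for a single deterministic, hypothetical borrower and is not the model that produced the figures. Several simplifications are assumed: interest is charged at RPI, the repayment threshold is frozen in nominal terms, earnings grow at RPI plus a fixed real rate, and repayments are discounted at RPI plus 2.2 percentage points.

```python
BASELINE = {"term_years": 30, "repayment_rate": 0.09, "rpi": 0.03}

# Overrides matching the three scenarios described in the text.
SCENARIOS = {
    "baseline": {},
    "scenario_1_longer_term": {"term_years": 35},
    "scenario_2_higher_rate": {"repayment_rate": 0.12},
    "scenario_3_higher_rpi": {"rpi": 0.05},
}


def rab_for(params, loan=30_000.0, earnings=24_000.0, threshold=25_000.0,
            real_growth=0.02, real_discount=0.022):
    """RAB charge for one deterministic, hypothetical borrower."""
    debt, npv = loan, 0.0
    for year in range(1, params["term_years"] + 1):
        earnings *= 1.0 + params["rpi"] + real_growth   # nominal earnings growth (assumed)
        debt *= 1.0 + params["rpi"]                     # interest charged at RPI (assumed)
        # Threshold stays fixed in nominal terms (assumed frozen).
        due = params["repayment_rate"] * max(0.0, earnings - threshold)
        repayment = min(debt, due)
        debt -= repayment
        npv += repayment / (1.0 + params["rpi"] + real_discount) ** year
    return 1.0 - npv / loan                             # unpaid share in NPV terms


results = {name: rab_for({**BASELINE, **override}) for name, override in SCENARIOS.items()}
```

Keeping each scenario as a small override dictionary makes it clear which assumption changes relative to the baseline. In the full model the same idea would apply to the corresponding policy and macroeconomic nodes, with distributions rather than single values as outputs.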
5 Conclusion

By integrating the StEP model with a probabilistic graphical framework, this approach provides a flexible tool for modeling student loan repayments. It is intended to improve predictive accuracy, and its full-distribution outputs and policy simulation capabilities address key challenges in student loan portfolio management, supporting informed decision-making in education financing.
