# 2.2 Methodology

This section outlines the methodological framework developed for this thesis. As this research operates at the intersection of financial technology, user experience (UX) design, and behavioral economics, a robust multi-disciplinary approach is required. The methodology is divided into two distinct components: first, the narrative review of existing literature used to establish the theoretical baseline; and second, the proposed experimental design for a longitudinal online controlled experiment. This dual structure ensures that the proposed experimental protocols are grounded in established academic rigor while addressing the specific gaps identified in the integration of machine learning (ML) credit scoring with frontend personalization.

## 2.2.1 Research Design and Narrative Review Protocol

This thesis presents a **narrative review** of the literature on digital banking personalization, credit risk assessment, and online controlled experiments. Academic sources were identified through searches of databases including Semantic Scholar and IEEE Xplore, supplemented by cross-referencing the citations of seminal works. The search focused primarily on publications from 2015 to 2025 to ensure relevance to the rapidly evolving FinTech landscape, with earlier foundational works included where necessary for theoretical context. A total of 34 key sources were selected based on topical relevance, academic rigor, and their contribution to the domains of ML-driven credit scoring and user experience design. This approach allows broad coverage of the disparate fields of computer science, finance, and psychology; however, the selection process did not follow a formal systematic review protocol (e.g., PRISMA).

The primary objective of this methodological phase was to synthesize existing frameworks into a coherent experimental design. The literature analysis revealed a bifurcation in current research: studies tend to focus either exclusively on the technical accuracy of credit scoring models {cite_023}{cite_030} or on the psychological impact of personalization on consumer spending {cite_002}{cite_034}. The research design proposed herein aims to bridge this gap by embedding the technical scoring mechanism directly within the user experience experiment.

### 2.2.1.1 Methodological Synthesis from Literature

To construct a valid experimental framework, methodologies from three primary domains were analyzed and synthesized. Table 1 summarizes the key methodological approaches identified in the literature that inform the proposed study design.

| Domain | Key Methodology | Source | Application in Proposed Study |
|--------|-----------------|--------|-------------------------------|
| Exp. Design | A/B Testing (OEC) | {cite_011} | Overall Evaluation Criterion |
| Credit Risk | ML/XGBoost/SHAP | {cite_023} | Real-time credit scoring model |
| Psychology | Longitudinal Tracking | {cite_034} | Time-inconsistent preferences |
| UX/UI | Client-side Personalization | {cite_040} | Interface adaptation |
| Security | KYC/Identity Verification | {cite_038} | User authentication protocol |

*Table 1: Synthesis of Methodological Approaches from Cited Literature.*

The synthesis of these approaches suggests that a single-point data collection method (e.g., a survey) is insufficient for assessing credit card application behavior. As noted by Frydman and Camerer {cite_034}, financial decisions are often influenced by immediate emotional states that diverge from long-term well-being. Therefore, the research design must adopt a longitudinal perspective, tracking users from the initial interface interaction through to credit repayment behaviors. This aligns with the "Trustworthy Online Controlled Experiments" framework advocated by Kohavi and Tang {cite_011}, which emphasizes the necessity of defining an Overall Evaluation Criterion (OEC) that captures long-term value rather than short-term vanity metrics.

## 2.2.2 Proposed Experimental Framework

Based on the gaps identified in Section 2.1, this thesis proposes a longitudinal online controlled experiment (A/B/n test). The experiment is designed to isolate the effect of "personalization options" on two distinct dependent variables: *Application Conversion Rate* (short-term) and *Early Default/Delinquency Rate* (long-term).

### 2.2.2.1 Experimental Conditions and Variables

The proposed experiment uses a between-subjects design in which users landing on the credit card application portal are randomly assigned to one of three experimental conditions. The randomization algorithm must ensure statistical independence between groups, a critical requirement for valid causal inference in online experiments {cite_011}.

**Independent Variable: Level of Personalization**
1.  **Control Group (A):** Standard static application form. Users see a generic "one-size-fits-all" interface with standard credit card offers, regardless of their demographic or behavioral profile.
2.  **Treatment Group 1 (B - Surface Personalization):** Users see an interface that adapts superficially based on basic browser data (e.g., location, device type). This aligns with client-side personalization techniques discussed by Asif and Krogstie {cite_040}, where the layout adjusts to the device but the core financial product remains static.
3.  **Treatment Group 2 (C - Deep/Risk-Aware Personalization):** Users interact with a dynamic interface driven by a real-time ML assessment. As proposed by Pathi {cite_023}, this condition utilizes a multi-agent framework where preliminary data is processed to offer personalized credit terms and explanations (XAI) during the application process.
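One common way to implement the random assignment described above is deterministic hash-based bucketing, which keeps a returning visitor in the same condition across sessions. The Python sketch below is illustrative only: the salt string `cc-personalization-v1`, the use of SHA-256, and the assumption that a stable pseudonymous user ID is available are not specified by the cited framework.

```python
import hashlib
from collections import Counter

# Hypothetical per-experiment salt; a new salt re-randomizes all users.
EXPERIMENT_SALT = "cc-personalization-v1"
# A = control, B = surface personalization, C = deep/risk-aware personalization
VARIANTS = ("A", "B", "C")


def assign_variant(user_id: str, salt: str = EXPERIMENT_SALT) -> str:
    """Deterministically map a user ID to one of the three conditions.

    The first 8 bytes of SHA-256("salt:user_id") are read as an integer that
    behaves like a uniform random number, so each condition receives (almost
    exactly) one third of users, and a returning user always gets the same one.
    """
    digest = hashlib.sha256(f"{salt}:{user_id}".encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % len(VARIANTS)
    return VARIANTS[bucket]


if __name__ == "__main__":
    # Balance check on synthetic IDs: each count should be close to 10,000.
    print(Counter(assign_variant(f"user-{i}") for i in range(30_000)))
```

Using a different salt for each experiment re-randomizes users, which helps keep the assignment in this test independent of assignments in any concurrent test.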

**Dependent Variables**
To measure the efficacy of these conditions, the study tracks specific metrics defined in the literature.

*   **Conversion Rate ($CR$):** Defined as the percentage of users who complete the application process.
    $$CR = \frac{N_{completed}}{N_{started}} \times 100$$
    Where $N_{started}$ is the number of unique visitors who initiate the form, and $N_{completed}$ is the number of successfully submitted applications. This metric addresses the "hidden conversion funnel" issues identified by Goldstein and Hajaj {cite_026}.
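A minimal sketch of the $CR$ computation, assuming a hypothetical event log with one `(user_id, variant, device, completed)` record per form start. Counting sets of user IDs rather than rows implements the "unique visitors" definition of $N_{started}$, and the optional `by_device` flag supports the device stratification described in Section 2.2.4.1.

```python
from collections import defaultdict


def conversion_rates(events, by_device=False):
    """CR = N_completed / N_started * 100, per variant (optionally per device).

    `events` is an iterable of (user_id, variant, device, completed) records;
    the schema is hypothetical. Sets ensure that each unique visitor is counted
    once even if they start the form several times.
    """
    started, completed = defaultdict(set), defaultdict(set)
    for user_id, variant, device, done in events:
        key = (variant, device) if by_device else variant
        started[key].add(user_id)
        if done:
            completed[key].add(user_id)
    return {k: 100.0 * len(completed[k]) / len(started[k]) for k in started}


if __name__ == "__main__":
    log = [
        ("u1", "A", "mobile", False),
        ("u1", "A", "mobile", True),   # same visitor returns and completes
        ("u2", "A", "desktop", False),
        ("u3", "B", "mobile", True),
    ]
    print(conversion_rates(log))
    print(conversion_rates(log, by_device=True))
```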

*   **Delinquency Probability ($P_d$):** A predicted measure of long-term risk. Since actual default takes months to manifest, the experiment proposes using the output of a validated credit risk model (e.g., XGBoost) as a proxy for future behavior, an approach supported by recent FinTech studies {cite_023}{cite_030}.

### 2.2.2.2 Participant Recruitment and Sampling Strategy

The proposed methodology assumes a deployment environment within a live banking or FinTech ecosystem to ensure ecological validity. Unlike laboratory studies, which often lack realism, an online controlled experiment allows for the observation of natural user behavior. The target population includes first-time visitors to the credit card application portal.

Sample size calculation is a critical component of the design. Following the guidelines of Kohavi and Tang {cite_011}, the required sample size ($N$) per variant is estimated from the minimum detectable effect (MDE). Assuming a baseline conversion rate of 10%, a desired power of 80% ($\beta = 0.2$), and a two-sided significance level of 5% ($\alpha = 0.05$), the calculation follows the standard power-analysis rule of thumb:

$$N \approx \frac{16\sigma^2}{\delta^2}$$

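Where $\sigma^2$ is the variance of the metric and $\delta$ is the minimum detectable effect, expressed as an absolute difference. For a binary metric such as conversion, $\sigma^2 = p(1-p)$ for baseline rate $p$, so the assumed 10% baseline gives $\sigma^2 = 0.09$. Because $N$ grows with $1/\delta^2$ and plausible effects of interface changes on financial conversion are small, the study design requires substantial traffic, in the range of thousands to tens of thousands of unique visitors per variant depending on the MDE, to achieve adequate statistical power. This requirement underscores the importance of the digital banking maturity context discussed by Deloitte {cite_015}, as only mature digital platforms can sustain the necessary traffic for such granular experimentation.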
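The rule of thumb above is a rounded form of the usual two-sample formula $N = 2(z_{1-\alpha/2} + z_{1-\beta})^2 \sigma^2 / \delta^2$; with $\alpha = 0.05$ and 80% power the constant is $2(1.96 + 0.84)^2 \approx 15.7$. The Python sketch below computes the per-variant sample size for a binary conversion metric, assuming $\sigma^2 = p(1-p)$ at the baseline rate and an MDE given as an absolute difference; the function name is illustrative.

```python
import math
from statistics import NormalDist


def sample_size_per_variant(baseline_rate: float, mde_abs: float,
                            alpha: float = 0.05, power: float = 0.80) -> int:
    """Per-variant N for a two-sided comparison of two conversion rates.

    N = 2 * (z_{1-alpha/2} + z_{power})**2 * sigma**2 / delta**2, with the
    binary-metric variance sigma**2 = p * (1 - p) taken at the baseline rate.
    """
    z = NormalDist().inv_cdf                                  # standard normal quantile
    variance = baseline_rate * (1.0 - baseline_rate)
    constant = 2.0 * (z(1.0 - alpha / 2.0) + z(power)) ** 2   # ~15.7 for the defaults
    return math.ceil(constant * variance / mde_abs ** 2)


if __name__ == "__main__":
    # 10% baseline, one-percentage-point absolute MDE
    print(sample_size_per_variant(0.10, 0.01))
```

For the assumed 10% baseline and an MDE of one percentage point, this returns roughly 14,100 visitors per variant (the $16\sigma^2/\delta^2$ shortcut gives 14,400); halving the MDE roughly quadruples the requirement. The sketch does not correct for multiple comparisons across the three variants.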

## 2.2.3 Technical Architecture and Implementation

To execute the proposed experiment, a robust technical architecture is required. The literature review identifies several key technologies that enable secure, personalized, and responsive web applications suitable for financial data collection.

### 2.2.3.1 Frontend and User Interface Technologies

The application interface serves as the primary stimulus in the experiment. To support the "Deep Personalization" condition, the frontend must be capable of dynamic rendering based on server-side logic. Litvinavicius {cite_041} highlights **Blazor Server-side** as a suitable framework for such applications. Blazor allows C# code to run on the server while updating the client UI over a SignalR connection, enabling complex logic (like real-time credit scoring adjustments) to be executed securely without exposing sensitive algorithms to the client browser.

Furthermore, the design must account for the prevalence of mobile access in financial services. As noted by Huang et al. {cite_007}, short-form and mobile-first interactions are becoming dominant. Therefore, the application must use responsive HTML5 design principles. However, security is paramount: Dora and Hluchý {cite_039} warn of "HTML Smuggling" attacks, in which HTML5 and JavaScript features are abused to conceal and assemble malicious payloads in the browser. The proposed architecture mitigates this by implementing strict Content Security Policies (CSP) and server-side validation, reducing the risk that the personalization scripts introduce vulnerabilities.
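As an illustration of the CSP mitigation, the sketch below serializes a strict policy into a `Content-Security-Policy` response header. The directives are standard CSP directives, but the specific policy is an assumption: a Blazor Server deployment may require additional sources (for example, for its SignalR connection or an external identity provider), which should be checked against the framework's own CSP guidance before deployment.

```python
# Hypothetical strict policy for the application portal. Each key is a standard
# CSP directive; the allowed sources are assumptions to be validated per deployment.
STRICT_POLICY: dict[str, list[str]] = {
    "default-src": ["'self'"],       # fallback: only the portal's own origin
    "script-src": ["'self'"],        # no inline or third-party scripts
    "connect-src": ["'self'"],       # network requests only to own origin
    "object-src": ["'none'"],        # no plugins or embedded objects
    "base-uri": ["'self'"],          # prevent <base> tag hijacking
    "frame-ancestors": ["'none'"],   # forbid framing (clickjacking)
    "form-action": ["'self'"],       # forms may only submit to own origin
}


def build_csp_header(policy: dict[str, list[str]]) -> tuple[str, str]:
    """Serialize a directive -> sources mapping into a (name, value) header pair."""
    value = "; ".join(f"{name} {' '.join(sources)}" for name, sources in policy.items())
    return ("Content-Security-Policy", value)


if __name__ == "__main__":
    name, value = build_csp_header(STRICT_POLICY)
    print(f"{name}: {value}")
```

Keeping the policy as data rather than a hand-written string makes it easier to review and to test that no permissive source such as `'unsafe-inline'` slips in.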

### 2.2.3.2 Machine Learning Integration (Backend)

The "Deep Personalization" condition relies on a backend Machine Learning (ML) engine to generate real-time recommendations. The methodology integrates the framework proposed by Pathi {cite_023}, which utilizes a multi-agent system combining Interpretable Machine Learning (IML) with Large Language Models (LLMs).

1.  **Risk Assessment Agent:** This component uses an XGBoost model trained on historical loan data (e.g., the Lending Club dataset) to predict applicant risk. The ensemble outputs the log-odds of default, which the logistic (sigmoid) function maps to the probability of default $P(y_i = 1 \mid x_i)$:
    $$\text{LogOdds}(x_i) = \sum_{k=1}^{K} f_k(x_i), \qquad P(y_i = 1 \mid x_i) = \frac{1}{1 + e^{-\text{LogOdds}(x_i)}}$$
    Where $f_k$ represents the $k$-th of the $K$ trees in the ensemble and $x_i$ is the feature vector of applicant $i$.

2.  **Explainability Agent:** To address the "Integration Gap" regarding trust, the system uses SHAP (SHapley Additive exPlanations) values to generate user-facing explanations. As Pathi {cite_023} demonstrates, providing reasons for credit decisions (e.g., "Your debt-to-income ratio is slightly high") can improve transparency. In the proposed experiment, these explanations are dynamically presented to users in Group C to test whether transparency increases completion rates.

3.  **Fraud Detection Layer:** Before any personalization logic runs, a fraud detection layer must filter out malicious bots. Sam et al. {cite_030} compare traditional and automated ML models for credit card fraud, suggesting that automated feature engineering can significantly improve detection rates. This layer helps ensure that the experiment's data are not contaminated by non-human traffic.
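To make the link between the tree ensemble and the SHAP explanations concrete, the sketch below uses three hypothetical decision stumps as the trees $f_k$, converts their summed log-odds into $P(y=1 \mid x)$ with the sigmoid, and computes exact Shapley values by brute force against a single reference applicant. This baseline-Shapley computation is a simplified, swapped-in technique, not the TreeSHAP algorithm that the SHAP library applies to XGBoost models; the feature names, split points, and leaf values are illustrative assumptions. It does show the additivity property the explanations rely on: the per-feature contributions sum to the difference between the applicant's log-odds and the reference applicant's.

```python
import math
from itertools import combinations

FEATURES = ("dti", "utilization", "delinquencies")  # hypothetical inputs

# Hypothetical ensemble of K = 3 decision stumps; each returns its log-odds
# contribution f_k(x). Real XGBoost trees are deeper and far more numerous.
TREES = [
    lambda x: 0.3 if x["dti"] > 0.40 else -0.8,
    lambda x: 0.1 if x["utilization"] > 0.75 else -0.7,
    lambda x: 0.6 if x["delinquencies"] >= 1 else -0.9,
]


def log_odds(x: dict) -> float:
    """LogOdds(x) = sum over k of f_k(x)."""
    return sum(tree(x) for tree in TREES)


def prob_default(x: dict) -> float:
    """P(y = 1 | x): sigmoid of the ensemble's log-odds."""
    return 1.0 / (1.0 + math.exp(-log_odds(x)))


def shapley_values(x: dict, reference: dict) -> dict:
    """Exact baseline Shapley values of log_odds for applicant x.

    A feature outside coalition S takes its value from `reference`, a single
    background applicant. Enumerating every coalition is exponential in the
    number of features, so this is only practical for toy models.
    """
    n = len(FEATURES)

    def value(coalition: set) -> float:
        mixed = {f: (x[f] if f in coalition else reference[f]) for f in FEATURES}
        return log_odds(mixed)

    phi = {}
    for f in FEATURES:
        others = [g for g in FEATURES if g != f]
        phi[f] = 0.0
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n!
            weight = math.factorial(size) * math.factorial(n - size - 1) / math.factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[f] += weight * (value(s | {f}) - value(s))
    return phi


if __name__ == "__main__":
    applicant = {"dti": 0.45, "utilization": 0.50, "delinquencies": 2}
    reference = {"dti": 0.20, "utilization": 0.30, "delinquencies": 0}
    print(round(prob_default(applicant), 3))
    print(shapley_values(applicant, reference))
```

Because the cost grows exponentially with the number of features, production explainers rely on tree-specific algorithms; the brute-force version is useful mainly for checking intuition on small examples.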

### 2.2.3.3 Security and Compliance Framework

Given the sensitive nature of financial data, the methodology must adhere to strict regulatory standards. The "Know Your Customer" (KYC) process is a legal requirement that can often introduce friction in the UX. Zentoni et al. {cite_038} discuss KYC models as strategic frameworks for preventing financial abuse. The proposed experiment integrates a streamlined KYC process that balances security with usability.

Additionally, to comply with PSD2 requirements for Strong Customer Authentication (SCA), the application flow incorporates multi-factor authentication steps. Sacaleanu and Tak {cite_020} argue that while SCA ensures security, it can negatively impact customer experience if not implemented thoughtfully. The experiment controls for this by keeping the authentication steps constant across all three groups, isolating the variable of *personalization options* rather than security friction.

For data integrity, Osborne {cite_037} explores the potential of blockchain and smart contracts for immutable KYC records. While full blockchain implementation may be beyond the scope of a standard A/B test, the conceptual framework of immutable audit logs is adopted to ensure that user consent and data handling comply with GDPR and local financial regulations.
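The immutable-audit-log concept can be approximated without a blockchain by a hash chain, in which every entry stores the SHA-256 hash of its predecessor. The sketch below is a hypothetical illustration rather than a design taken from Osborne {cite_037}: it makes after-the-fact edits detectable, but it is tamper-evident rather than tamper-proof unless the latest hash is also stored somewhere the operator cannot rewrite.

```python
import hashlib
import json
from dataclasses import dataclass, field

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def _entry_hash(prev_hash: str, event: dict) -> str:
    """Hash the previous hash together with a canonical JSON encoding of the event."""
    payload = json.dumps(event, sort_keys=True)
    return hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()


@dataclass
class AuditLog:
    """Append-only log where each entry commits to the hash of its predecessor."""
    entries: list = field(default_factory=list)

    def append(self, event: dict) -> str:
        prev_hash = self.entries[-1]["hash"] if self.entries else GENESIS
        entry_hash = _entry_hash(prev_hash, event)
        self.entries.append({"event": dict(event), "prev": prev_hash, "hash": entry_hash})
        return entry_hash

    def verify(self) -> bool:
        """Recompute the chain; any edited, removed, or reordered entry breaks it."""
        prev_hash = GENESIS
        for entry in self.entries:
            if entry["prev"] != prev_hash or entry["hash"] != _entry_hash(prev_hash, entry["event"]):
                return False
            prev_hash = entry["hash"]
        return True


if __name__ == "__main__":
    log = AuditLog()
    log.append({"user": "u1", "action": "consent_given", "scope": "experiment"})
    log.append({"user": "u1", "action": "variant_assigned", "variant": "C"})
    print(log.verify())
```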

## 2.2.4 Data Collection and Analysis Procedures

The data collection strategy involves capturing high-dimensional clickstream data alongside application form inputs. This allows for a granular analysis of user behavior beyond simple submission rates.

### 2.2.4.1 Metrics and Measurement

Table 2 outlines the key metrics proposed for the study, categorized by the phase of the user journey they measure.

| Metric Category | Specific Metric | Definition/Formula | Source |
|-----------------|-----------------|--------------------|--------|
| Acquisition | Form Completion Rate | Submissions / Form starts | {cite_026} |
| Engagement | Time on Page | Duration of active session | {cite_008} |
| Risk | Predicted Default Rate | Mean prob. of default (Model) | {cite_023} |
| Trust | Explanation Interaction | Clicks on "Why am I seeing this?" | {cite_023} |
| Tech Perf. | Latency | Server response time (ms) | {cite_041} |

*Table 2: Proposed Metrics for Experimental Evaluation.*

**Acquisition Metrics:** The primary metric for short-term success is the conversion rate. Goldstein and Hajaj {cite_026} emphasize the difference between mobile and desktop funnels; thus, data will be stratified by device type so that device does not confound comparisons between groups.

**Engagement Metrics:** Gada {cite_008} highlights the importance of user engagement and retention strategies in FinTech. Time-on-page and interaction depth (e.g., adjusting sliders, clicking information tooltips) will serve as proxies for user engagement.
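Raw time-on-page overstates engagement when a tab is simply left open, so one possible operationalization of "duration of active session" sums only the gaps between consecutive interaction events that are shorter than an inactivity threshold. The 30-second threshold and the event types named in the sketch below are assumptions, not values taken from Gada {cite_008}.

```python
def active_time_seconds(timestamps, idle_timeout=30.0):
    """Active session time from interaction-event timestamps (in seconds).

    Events are assumed to be clicks, keystrokes, slider moves, or tooltip opens.
    Gaps longer than `idle_timeout` are treated as inactivity and contribute
    nothing; the 30-second default is an assumption, not a standard.
    """
    ts = sorted(timestamps)
    return sum(later - earlier for earlier, later in zip(ts, ts[1:])
               if later - earlier <= idle_timeout)


if __name__ == "__main__":
    # Four quick interactions, then an 88-second pause, then one more event.
    print(active_time_seconds([0, 5, 12, 100, 104]))
```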

**Risk Metrics:** To assess the "quality" of the acquired customers, the experiment calculates the *Predicted Default Rate* of the successful applicants in each group. If Group C (Deep Personalization) yields a higher conversion rate but also a higher predicted default rate, the personalization strategy may be financially detrimental. This addresses the "Longitudinal Personalization Impact" gap.
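The quality-of-acquisition comparison can be computed per group as the conversion rate alongside the mean predicted probability of default among completed applications. The sketch below assumes a hypothetical record format `(variant, completed, p_default)`, where `p_default` is the risk model's score for a submitted application and `None` for an abandoned form.

```python
from statistics import mean


def risk_adjusted_summary(applications):
    """Per-variant conversion rate (%) and mean predicted default of converters.

    `applications` holds (variant, completed, p_default) tuples, one per form
    start; p_default comes from the risk model and is None when not submitted.
    """
    by_variant = {}
    for variant, completed, p_default in applications:
        row = by_variant.setdefault(variant, {"started": 0, "scores": []})
        row["started"] += 1
        if completed:
            row["scores"].append(p_default)
    return {
        v: {
            "conversion_rate": 100.0 * len(r["scores"]) / r["started"],
            "mean_p_default": mean(r["scores"]) if r["scores"] else float("nan"),
        }
        for v, r in by_variant.items()
    }
```

A group that shows both a higher `conversion_rate` and a higher `mean_p_default` exhibits the trade-off described above; whether it is acceptable depends on loss and revenue assumptions that this sketch does not model.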

### 2.2.4.2 Statistical Analysis Plan

The analysis of the experimental data will employ both descriptive and inferential statistics.

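1.  **Hypothesis Testing:** To compare conversion rates across groups (A, B, C), a chi-square test of independence ($\chi^2$) will be applied to the $3 \times 2$ contingency table of group by outcome (Converted/Not Converted), which has $(3-1)(2-1) = 2$ degrees of freedom.
    $$\chi^2 = \sum_{i} \frac{(O_i - E_i)^2}{E_i}$$
    Where the sum runs over the six cells of the table, $O_i$ is the observed frequency, and $E_i$ is the expected frequency under independence (row total × column total / grand total).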

2.  **Survival Analysis:** For the longitudinal aspect (time to default or time to churn), the methodology proposes using Kaplan-Meier survival estimates. Although the experiment may not run long enough to observe actual defaults for all users, survival analysis allows for the handling of censored data (users who have not yet defaulted by the end of the study).

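3.  **Feature Importance Analysis:** To understand *which* personalization features drive behavior, the study will use the SHAP value framework described by Pathi {cite_023}. This allows the output of a conversion model to be decomposed so that changes in predicted conversion probability are attributed to specific interface elements (e.g., "Seeing the interest rate explanation increased the predicted conversion probability by 5%"). These attributions describe the model's predictions rather than causal effects, so they complement, rather than replace, the randomized comparison between groups.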
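The chi-square test and the Kaplan-Meier estimator above can be sketched in plain Python. `chi_square_test` computes the Pearson statistic for a groups × outcomes table and obtains the p-value from the closed-form chi-square survival function, which exists only for even degrees of freedom (the three-group design has two). `kaplan_meier` implements the product-limit estimator $\hat{S}(t) = \prod_{t_j \le t} (1 - d_j / n_j)$, where $d_j$ is the number of defaults at time $t_j$ and $n_j$ the number of users still at risk, treating users without an observed default as right-censored. Input formats and time units are assumptions; in practice a maintained library such as SciPy (`scipy.stats.chi2_contingency`) or lifelines would be preferable.

```python
import math


def chi_square_test(table):
    """Pearson chi-square test of independence.

    `table` is a list of rows, e.g. [converted, not_converted] per group.
    Returns (statistic, degrees_of_freedom, p_value). The survival function
    used here, exp(-x/2) * sum_{k < df/2} (x/2)^k / k!, is exact for even df.
    """
    row_totals = [sum(r) for r in table]
    col_totals = [sum(c) for c in zip(*table)]
    grand = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(col_totals) - 1)
    if df % 2:
        raise ValueError("closed-form p-value here requires even degrees of freedom")
    half = stat / 2.0
    p_value = math.exp(-half) * sum(half ** k / math.factorial(k) for k in range(df // 2))
    return stat, df, p_value


def kaplan_meier(durations, observed):
    """Kaplan-Meier survival curve with right-censoring.

    durations[i]: time until default, or until the study ended, for user i.
    observed[i]: True if a default was observed, False if censored.
    Returns [(t_j, S(t_j))] at each distinct time with at least one default.
    """
    data = sorted(zip(durations, observed))
    at_risk, survival, curve, i = len(data), 1.0, [], 0
    while i < len(data):
        t, defaults, leaving = data[i][0], 0, 0
        while i < len(data) and data[i][0] == t:   # group tied times
            defaults += int(data[i][1])
            leaving += 1
            i += 1
        if defaults:
            survival *= 1.0 - defaults / at_risk
            curve.append((t, survival))
        at_risk -= leaving                         # censored users leave the risk set too
    return curve
```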

## 2.2.5 Ethical Considerations and Limitations

The proposed methodology involves the manipulation of financial product presentation, which raises significant ethical considerations. Sriram {cite_024} discusses the challenges of bias and ethics in algorithmic credit decisions, noting that AI can democratize credit but also reinforce inequalities.

**Algorithmic Bias:** The ML model used in Group C must be audited for bias before deployment. If the training data (e.g., Lending Club data) contain historical biases against certain demographics, the personalization engine might unfairly steer these groups towards sub-optimal products. To mitigate this, the methodology incorporates "Fairness Constraints" in the optimization objective, targeting equalized odds, i.e., equal true-positive and false-positive rates across groups defined by protected attributes (gender, age).
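A pre-deployment bias audit could begin by measuring how far the risk model is from equalized odds. The sketch below is a hypothetical helper rather than part of the cited framework: it reports the largest between-group gaps in true-positive and false-positive rates for binary predictions, both of which are zero when equalized odds holds exactly. It assumes every group contains both defaulters and non-defaulters, and it audits the criterion rather than enforcing it during training.

```python
from collections import defaultdict


def equalized_odds_gap(y_true, y_pred, groups):
    """Return (max TPR gap, max FPR gap) across protected-attribute groups.

    y_true / y_pred are 0/1 labels (1 = default); groups holds each
    applicant's protected-attribute value. Every group needs at least one
    positive and one negative label, otherwise a rate is undefined.
    """
    counts = defaultdict(lambda: {"tp": 0, "fn": 0, "fp": 0, "tn": 0})
    for y, p, g in zip(y_true, y_pred, groups):
        c = counts[g]
        if y == 1:
            c["tp" if p == 1 else "fn"] += 1
        else:
            c["fp" if p == 1 else "tn"] += 1
    tpr = {g: c["tp"] / (c["tp"] + c["fn"]) for g, c in counts.items()}
    fpr = {g: c["fp"] / (c["fp"] + c["tn"]) for g, c in counts.items()}
    return (max(tpr.values()) - min(tpr.values()),
            max(fpr.values()) - min(fpr.values()))
```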

**Informed Consent:** Users participating in online A/B tests are typically unaware of their participation. While this is standard industry practice, academic research ethics demand more. The proposed design includes a debriefing mechanism or a terms of service update that explicitly states that "interface features may vary for testing purposes," aligning with the transparency principles advocated by Kohavi and Tang {cite_011}.

**Limitations:**
1.  **Technological Maturity:** As noted by Vaidya {cite_013}, the implementation of such a sophisticated framework requires a high level of digital banking maturity. The results may not be generalizable to smaller institutions with legacy systems.
2.  **Short-term Proxy:** The use of *predicted* default rates (via ML) rather than *actual* default rates (which can take months or years to materialize) is a limitation. However, given the constraints of a master's thesis timeline, this proxy is treated in the literature as an acceptable interim measure {cite_035}.
3.  **Device Heterogeneity:** The impact of screen size on credit card application behavior is significant {cite_027}. While the experiment stratifies by device, the sheer variety of Android devices and screen resolutions may introduce noise into the UX metrics.

## 2.2.6 Conclusion of Methodology

This section has outlined a comprehensive methodological framework for assessing the impact of personalization in credit card applications. By synthesizing the rigorous experimental protocols of Kohavi and Tang {cite_011} with advanced ML frameworks from Pathi {cite_023} and behavioral insights from Frydman and Camerer {cite_034}, the proposed design is well-positioned to answer the research questions. The use of a longitudinal online controlled experiment addresses the critical need to measure not just *acquisition* (conversion), but *quality of acquisition* (risk), thereby filling the identified gap in the literature. The subsequent sections will detail the expected analysis and results derived from this experimental setup.