# Scout Output - Academic Citation Discovery

## Summary

**Total Valid Citations**: 45
**Success Rate**: 45.9%
**Failed Topics**: 73

### Sources Breakdown

- **Crossref**: 23 (51.1%)
- **Semantic Scholar**: 19 (42.2%)
- **Gemini Grounded**: 3 (6.7%)
- **Gemini LLM**: 0 (0.0%)
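
The percentages above can be re-derived from the raw counts. The following is a minimal sketch, not the scout tool's actual code: the `counts` dictionary and the `source_shares` helper are hypothetical names introduced for illustration, and the rounding to one decimal place is an assumption inferred from the figures shown.

```python
# Source counts copied from the breakdown above.
# The dict and helper names are illustrative assumptions, not part of the scout tool.
counts = {
    "Crossref": 23,
    "Semantic Scholar": 19,
    "Gemini Grounded": 3,
    "Gemini LLM": 0,
}


def source_shares(counts):
    """Return each source's share of all citations as a percentage (one decimal)."""
    total = sum(counts.values())
    if total == 0:
        # Avoid division by zero when no citations were found at all.
        return {name: 0.0 for name in counts}
    return {name: round(100 * n / total, 1) for name, n in counts.items()}


shares = source_shares(counts)
for name, n in counts.items():
    # Mirrors the bullet format used in the breakdown list.
    print(f"- **{name}**: {n} ({shares[name]:.1f}%)")
```

The counts sum to 45, matching the reported total of valid citations.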

---

## Citations Found

### From Crossref (23 citations)

#### 1. Exploring the Potential of Large Language Models in Automatic Pull Request Title Generation: An Empirical Study
**Authors**: Zuo, Lan, Liao
**Year**: 2024
**DOI**: 10.1109/apsec65559.2024.00030
**URL**: https://doi.org/10.1109/apsec65559.2024.00030

#### 2. Transforming Software Development with Generative AI: Empirical Insights on Collaboration and Workflow
**Authors**: Ulfsnes, Moe, Stray, Skarpen
**Year**: 2024
**DOI**: 10.1007/978-3-031-55642-5_10
**URL**: https://doi.org/10.1007/978-3-031-55642-5_10

#### 3. GitHub Copilot Chat in Developer Workflow
**Authors**: Reddy Vootukuri
**Year**: 2025
**DOI**: 10.1007/979-8-8688-2196-7_3
**URL**: https://doi.org/10.1007/979-8-8688-2196-7_3

#### 4. How AI Can Help Transform Developer Productivity Through Code Assistants
**Authors**: Arora
**Year**: 2025
**DOI**: 10.59350/wxbdd-nfr76
**URL**: https://doi.org/10.59350/wxbdd-nfr76

**Abstract**: Introduction In the ever-evolving landscape of software development, AI-powered code assistants have emerged as game-changing tools that are revolutionising how developers write, debug, and maintain code. As someone who has personally experienced this transformation, in this blog post, I will share how these intelligent assistants are reshaping developer productivity.

#### 5. When Medicaid Unwinding Meets AI: In the Matter of DeLoitte Consulting
**Authors**: Rosenbaum
**Year**: 2024
**DOI**: 10.1599/mqop.2024.0221
**URL**: https://doi.org/10.1599/mqop.2024.0221

#### 6. Redefining and Transforming Software Development with Generative AI
**Authors**: Lakshmi, Helen, Sambasivam
**Year**: 2025
**DOI**: 10.1515/9783111677798-008
**URL**: https://doi.org/10.1515/9783111677798-008

#### 7. Towards an Adoption Framework to Foster Trust in AI-Assisted Software Engineering
**Authors**: Barón
**Year**: 2025
**DOI**: 10.1109/cain66642.2025.00038
**URL**: https://doi.org/10.1109/cain66642.2025.00038

#### 8. Interactions with Generative AI: Wearables to Measure Developer Experience and Productivity Objectively
**Authors**: Brandebusemeyer
**Year**: 2025
**DOI**: 10.1109/icse-companion66252.2025.00043
**URL**: https://doi.org/10.1109/icse-companion66252.2025.00043

#### 9. Human-Centered Software Engineering: Software Engineering Architectures, Patterns, and Models for Human Computer Interaction
**Authors**: Seffah, Vanderdonckt, Desmarais
**Year**: 2009
**DOI**: 10.1007/978-1-84800-907-3_1
**URL**: https://doi.org/10.1007/978-1-84800-907-3_1

#### 10. UTBoost: Rigorous Evaluation of Coding Agents on SWE-Bench
**Authors**: Zhu, Kang
**Year**: 2025
**DOI**: 10.18653/v1/2025.acl-long.189
**URL**: https://doi.org/10.18653/v1/2025.acl-long.189

#### 11. Testing the Cooperation of Autonomous Robotic Agents
**Authors**: Lill, Saglietti
**Year**: 2014
**DOI**: 10.5220/0004990402870296
**URL**: https://doi.org/10.5220/0004990402870296

#### 12. Ethics by Agreement in Multi-agent Software Systems
**Authors**: Nallur, Collier
**Year**: 2019
**DOI**: 10.5220/0007958105290535
**URL**: https://doi.org/10.5220/0007958105290535

#### 13. The Green Data Dilemma: Measuring the Environmental Cost of AI Model Training Against its Sustainability Benefits
**Authors**: Peter Odhiambo
**Year**: 2025
**DOI**: 10.2139/ssrn.5618610
**URL**: https://doi.org/10.2139/ssrn.5618610

#### 14. Comparative Analysis of AI Models for Python Code Generation: A HumanEval Benchmark Study
**Authors**: Bayram, Menekse Dalveren, Derawi
**Year**: 2025
**DOI**: 10.3390/app15189907
**URL**: https://doi.org/10.3390/app15189907

**Abstract**: This study conducts a comprehensive comparative analysis of six contemporary artificial intelligence models for Python code generation using the HumanEval benchmark. The evaluated models include GPT-3.5 Turbo, GPT-4 Omni, Claude 3.5 Sonnet, Claude 3.7 Sonnet, Claude Sonnet 4, and Claude Opus 4. A total of 164 Python programming problems were utilized to assess model performance through a multi-faceted methodology incorporating automated functional correctness evaluation via the Pass@1 metric, cyclomatic complexity analysis, maintainability index calculations, and lines-of-code assessment. The results indicate that Claude Sonnet 4 achieved the highest performance with a success rate of 95.1%, followed closely by Claude Opus 4 at 94.5%. Across all metrics, models developed by Anthropic Claude consistently outperformed those developed by OpenAI GPT by margins exceeding 20%. Statistical analysis further confirmed the existence of significant differences between the model families (p < 0.001). Anthropic Claude models were observed to generate more sophisticated and maintainable solutions with superior syntactic accuracy. In contrast, OpenAI GPT models tended to adopt simpler strategies but exhibited notable limitations in terms of reliability. These findings offer evidence-based insights to guide the selection of AI-powered coding assistants in professional software development contexts.

#### 15. An Empirical Study: Leveraging Prompt Engineering with AI Coding Assistants to Develop Energy-Efficient Code (2025)
**Authors**: Podder, Date, Murthy
**Year**: 2025
**DOI**: 10.36227/techrxiv.175339126.69681777/v1
**URL**: https://doi.org/10.36227/techrxiv.175339126.69681777/v1

#### 16. Formalizing the ISO/IEC/IEEE 29119 Software Testing Standard
**Authors**: Ali, Yue
**Year**: 2015
**DOI**: 10.1109/models.2015.7338271
**URL**: https://doi.org/10.1109/models.2015.7338271

#### 17. ISO 42001 and the Law
**Authors**: Seet
**Year**: 2025
**DOI**: 10.1007/979-8-8688-2099-1_3
**URL**: https://doi.org/10.1007/979-8-8688-2099-1_3

#### 18. AI-Driven SBOM: Automated Software Bill of Materials Generation and Management
**Authors**: Shukla
**Year**: 2025
**DOI**: 10.64917/feaiml/volume02issue12-08
**URL**: https://doi.org/10.64917/feaiml/volume02issue12-08

**Abstract**: The openness of modern software development has increased the urgent demand and necessity to manage Software Bill of Materials (SBOM) comprehensively due to the increasing number of open-source elements and third-party dependencies. Manual methods of SBOM generation and maintenance are tedious, prone to error, and are unable to keep up with short development cycles. In this paper, a framework based on AI to generate SBOM, analyze it, and assess the vulnerability is introduced. By using machine learning algorithms such as natural language processing, graph neural networks, and deep learning models, we can automatically identify, classify, and trace components in a complex chain of dependencies of software [1][2]. Our multi-model system of architectural design that employs the methods of the static analysis and the AI-based pattern recognition allows us to reach the results of 94.7 percent component detection and 91.3 percent accuracy in vulnerability mapping. It uses automated package manager parsing, binary analysis and license compliance verification as methodology. The experimental findings prove to be markedly better than the traditional tools that minimize the time of SBOM generation by 78% and maximize completeness by 34%. The system has managed to point out 2,847 untested faiths in enterprise codebases and has accordingly classified 96.2 percent of software licenses. We find the results that AI-powered SBOM systems do not just improve the security posture but also facilitate compliance processes, so they must be part of present-day DevSecOps. This study is relevant to the developing body of AI-enhanced software supply chain security.

#### 19. Emerging Trends in Software Supply Chain Security
**Authors**: Syed
**Year**: 2024
**DOI**: 10.1007/979-8-8688-0799-2_10
**URL**: https://doi.org/10.1007/979-8-8688-0799-2_10

#### 20. Human–AI Collaborative Product Conceptual Design
**Authors**: Sun, Song, Chen
**Year**: 2025
**DOI**: 10.1201/9781003491781-4
**URL**: https://doi.org/10.1201/9781003491781-4

#### 21. Context-aware code review: integrating generative AI for automated pull request analysis
**Authors**: Balachandran, Fawzer
**Year**: 2025
**DOI**: 10.31705/adscai.2025.53
**URL**: https://doi.org/10.31705/adscai.2025.53

**Abstract**: Pull request reviews in software industry are vital for ensuring code quality. Traditional manual reviews offer valuable human insights but can be inefficient. They also struggle with the challenges posed by rapidly growing, complex codebases. On the other hand, many automated tools focus only on syntax and style. They do not account for the broader business context. This paper presents a context-aware PR review system that combines generative AI, transformer-based embeddings, vector databases, and git diff augmentation to bridge the gap between technical accuracy and business needs. The goal is to provide clear feedback on both code implementation and intent, addressing challenges in large domain-specific codebases.

#### 22. Software Engineering: The Future of a Profession
**Authors**: Musa
**Year**: 1985
**DOI**: 10.1109/ms.1985.230049
**URL**: https://doi.org/10.1109/ms.1985.230049

#### 23. When AI Feels Supportive: Psychological Safety, Satisfaction, and Turnover Among Healthcare Professionals
**Authors**: Cha
**Year**: 2025
**DOI**: 10.22541/au.176007254.45491764/v1
**URL**: https://doi.org/10.22541/au.176007254.45491764/v1

### From Semantic Scholar (19 citations)

#### 1. Exploring the Potential of Large Language Models in Automatic Pull Request Title Generation: An Empirical Study
**Authors**: Zuo, Lan, Liao
**Year**: 2024
**DOI**: 10.1109/APSEC65559.2024.00030
**URL**: https://doi.org/10.1109/APSEC65559.2024.00030

**Abstract**: Pull Requests (PRs) are a collaborative mechanism in GitHub, allowing developers to merge their code changes into another branch of the software repository. The PR title serves as a summary of the PR and needs to accurately and concisely describe the specific changes made, which is useful for reviewers and other developers to review and understand. There are many existing methods for automatically generating PR titles, most of which are based on pre-trained models. Although these methods are effective, pre-trained models often require extensive fine-tuning for specific tasks. Compared to pre-trained models, large language models (LLMs) possess superior semantic understanding capabilities. As a foundational model, they can solve most tasks directly without relying on fine-tuning, providing an alternative solution for PR title generation. However, the capabilities of LLMs in the automatic PR title generation have not been fully explored. To fill this gap, we conducted an empirical study to understand the capabilities of LLMs in PR title generation. Initially, the direct application of LLMs to generate PR titles did not yield satisfactory results. We found that using similar PRs from the dataset as auxiliary information can effectively enhance the title generation capability of LLMs. When the number of most similar PRs used as input increased from 0 to 5, the ROUGE-L F1 score of the titles generated by LLMs increased by an average of 23.48%, with improvements in other metrics as well. In further experiments, we discovered that setting a lower temperature for the LLMs can bring better performance. We then selected the best parameter configuration and compared it with the existing state-of-the-art methods. Our experimental results show that LLMs outperform the state-of-the-art methods in Precision, Recall, and METEOR metrics on the PRTiger dataset. Additionally, human evaluation results indicate that PR titles generated by LLMs receive higher scores in Correctness, Naturalness, and Comprehensibility.

#### 2. An Exploration of How Generative AI Affects Workflow and Collaboration in a Software Engineering Course
**Authors**: Salomon, Chin, Holmes, Fritz, Murphy
**Year**: 2025
**DOI**: 10.1145/3758317.3759680
**URL**: https://doi.org/10.1145/3758317.3759680

**Abstract**: How does Generative AI (GenAI) impact how students work and collaborate in a software engineering course? To explore this question, we conducted an exploratory study in a project-based course where students developed three versions of a system across agile sprints, with unrestricted access to GenAI tools. From survey responses of 349 students, we found that the technology was used extensively with 84% of students reporting use and 90% of them finding the technology useful. Through semi-structured interviews with 24 of the students, we delved deeper, learning that students used GenAI pervasively, not only to generate code but also to validate work retrospectively, such as checking alignment with requirements and design after implementation had begun. Students often turned to GenAI as their first point of contact, even before consulting teammates, which reduced direct interpersonal collaboration. These results suggest the need for new pedagogical strategies that address not just individual tool use, but also design reasoning and collaborative practices in GenAI-augmented teams.

#### 3. The Influence of Artificial Intelligence Tools on Learning Outcomes in Computer Programming: A Systematic Review and Meta-Analysis
**Authors**: Alanazi, Soh, Samra, Li
**Year**: 2025
**DOI**: 10.3390/computers14050185
**URL**: https://doi.org/10.3390/computers14050185

**Abstract**: This systematic review and meta-analysis investigates the impact of artificial intelligence (AI) tools, including ChatGPT 3.5 and GitHub Copilot, on learning outcomes in computer programming courses. A total of 35 controlled studies published between 2020 and 2024 were analysed to assess the effectiveness of AI-assisted learning. The results indicate that students using AI tools outperformed those without such aids. The meta-analysis findings revealed that AI-assisted learning significantly reduced task completion time (SMD = −0.69, 95% CI [−2.13, −0.74], I² = 95%, p = 0.34) and improved student performance scores (SMD = 0.86, 95% CI [0.36, 1.37], p = 0.0008, I² = 54%). However, AI tools did not provide a statistically significant advantage in learning success or ease of understanding (SMD = 0.16, 95% CI [−0.23, 0.55], p = 0.41, I² = 55%), with sensitivity analysis suggesting result variability. Student perceptions of AI tools were overwhelmingly positive, with a pooled estimate of 1.0 (95% CI [0.92, 1.00], I² = 0%). While AI tools enhance computer programming proficiency and efficiency, their effectiveness depends on factors such as tool functionality and course design. To maximise benefits and mitigate over-reliance, tailored pedagogical strategies are essential. This study underscores the transformative role of AI in computer programming education and provides evidence-based insights for optimising AI-assisted learning.

#### 4. The impact of GitHub Copilot on developer productivity from a software engineering body of knowledge perspective
**Authors**: Smit, Smuts, Louw, Pielmeier, Eidelloth
**Year**: 2024
**DOI**: 
**URL**: https://www.semanticscholar.org/paper/d42ed56a876c314bd4495942a4468c49720c68d0

#### 5. Detecting Adversarial Prompted AI-Generated Code on Stack Overflow: A Benchmark Dataset and an Enhanced Detection Approach
**Authors**: Swaraj, Agarwal, Joshi, Kumar
**Year**: 2025
**DOI**: 10.1109/ICSME64153.2025.00089
**URL**: https://doi.org/10.1109/ICSME64153.2025.00089

**Abstract**: AI-generated code has become an integral part of the mainstream developer workflow today. However, in community-driven platforms like Stack Overflow (SO), where trust, authorship, and credibility are important, it can lead to serious complications. While recent studies have focused on detecting AI-generated code, they have mostly worked with long code samples from repositories and assignments. In contrast, code snippets on SO are often small and context-specific, and thus may prove more challenging for detection. Moreover, another aspect overlooked in prior studies concerns recognizing adversarially prompted AI code deliberately crafted to resemble human-written code. To address these limitations, we have first introduced a large-scale dataset comprising 3500 pairs of SO and ChatGPT answers, along with a curated set of 4500 adversarially prompted AI responses. Next, we evaluate existing code language models over this newly curated dataset. Our evaluation shows that existing models perform well on standard AI answers but fail to detect adversarial ones. Finally, to improve detection, we propose an ensemble approach combining stylometric features of code along with the code embeddings. Our approach shows consistent improvements across multiple models and improves resistance to adversarial prompted code. Our overall findings open promising directions for future research into understanding the nuances of AI code detection with adversarial prompting and code stylometry.

#### 6. Benchmarking of Generative AI Tools in Software Engineering Education: Formative Insights for Curriculum Integration
**Authors**: Roy, Horielko, Omojokun
**Year**: 2025
**DOI**: 10.1145/3702653.3744328
**URL**: https://doi.org/10.1145/3702653.3744328

**Abstract**: Generative Artificial Intelligence (Gen-AI) has revolutionized software engineering (SE) by automating tasks across design, coding, and testing [1] [2]. Tools like ChatGPT and GitHub Copilot streamline code generation, architectural modeling, debugging, and test-case creation [3] [4]. Despite their rapid adoption in industry, the pedagogical implications of these tools in computing education have not been systematically examined. This study solves the existing gap by conducting a comprehensive benchmarking study of Gen-AI tools across four core SE phases— design documentation, feature implementation, debugging support, and testing — to address two research questions: RQ1: What strengths and limitations do Gen-AI tools exhibit in each phase? RQ2: How can insights from benchmarking inform effective integration of Gen-AI into SE curricula? To answer these questions, a diverse set of Gen-AI tools is evaluated, ranging from design-focused assistants such as Lucidchart, Mermaid.js and UIzard; implementation-oriented systems including GitHub Copilot, TabNine, Codeium and Supermaven; debugging supports like GPT-4 and Claude 3.5 Sonnet; and testing frameworks such as Testim, Mabl and Applitools—while also surveying emerging platforms (as of summer 2024) like Replit, Postman, Visily, Gemini, Eraser.io and others. For each tool and development phase, we applied phase-specific metrics: in design documentation, we assessed diagram accuracy, completeness, user effort, and IDE integration; in feature implementation, we measured pattern-based code generation quality, code-completion effectiveness, refactoring robustness, and UI/UX scaffolding; in debugging, we evaluated error-detection accuracy, hallucination rates, and clarity of explanatory feedback; and in testing, we examined test-case relevance and defect-detection coverage. Across all phases, we tracked prompt engineering complexity as a key mediating factor influencing tool performance. 
Our evaluation reveals speed-fidelity trade-offs: Code-completion assistants accelerate boilerplate generation but demand manual oversight to ensure cross-file consistency and manage higher-order abstractions; diagramming tools can produce precise UML models with minimal effort, but at the cost of iterative prompt refinement for complex cases; LLM debuggers deliver context-sensitive fixes yet suffer from nontrivial hallucination rates; testing generators exhibit wide variance in edge-case coverage. On average, tools needed 2.4 prompt iterations for usable diagrams and 1.5 prompts for bug fixes, underscoring the human effort in guiding AI. We recommend a scaffolded framework for integrating Gen-AI into SE education by: embedding AI tools into hands-on assignments, to explore tasks in a controlled context; by structuring small team projects in which one subgroup uses AI assistants while the other completes the same tasks manually (covering design, implementation, debugging and testing) to surface contrasts in workflow, tool strengths, and human reasoning; by requiring students to maintain a reflective journal documenting their AI usage and prompt-engineering strategies, fostering metacognitive insight into how tool inputs shape outputs; and by equipping learners with decision making criteria, teaching them to evaluate AI assistants according to task fit, preparing them to leverage AI responsibly across SE phases in its evolving landscape.

#### 7. Generative AI in Evidence-Based Software Engineering: A White Paper
**Authors**: Esposito, Janes, Taibi, Lenarduzzi
**Year**: 2024
**DOI**: 10.48550/arXiv.2407.17440
**URL**: https://doi.org/10.48550/arXiv.2407.17440

**Abstract**: Context. In less than a year practitioners and researchers witnessed a rapid and wide implementation of Generative Artificial Intelligence. The daily availability of new models proposed by practitioners and researchers has enabled quick adoption. Textual GAIs capabilities enable researchers worldwide to explore new generative scenarios simplifying and hastening all time-consuming text generation and analysis tasks. Motivation. The exponentially growing number of publications in our field with the increased accessibility to information due to digital libraries makes conducting systematic literature reviews and mapping studies an effort and time-insensitive task. Stemmed from this challenge we investigated and envisioned the role of GAIs in evidence-based software engineering. Future Directions. Based on our current investigation we will follow up the vision with the creation and empirical validation of a comprehensive suite of models to effectively support EBSE researchers.

#### 8. Navigating the Complexity of Generative AI Adoption in Software Engineering—RCR Report
**Authors**: Russo
**Year**: 2024
**DOI**: 10.1145/3680471
**URL**: https://doi.org/10.1145/3680471

**Abstract**: This Replicated Computational Results (RCR) report complements the study “Navigating the Complexity of Generative AI Adoption in Software Engineering,” which examines the factors influencing the integration of AI tools in software engineering practices. Employing a mixed-methods approach grounded in the Technology Acceptance Model, Diffusion of Innovation Theory, and Social Cognitive Theory, the study introduces the Human-AI Collaboration and Adaptation Framework (HACAF), validated through PLS-SEM analysis. The replication package detailed herein includes survey instruments, raw data, and analysis scripts essential for reproducing the study's findings. By providing these artifacts, the RCR report aims to support transparency, enable replication, and encourage further research on effective AI tool adoption strategies in software engineering.

#### 9. Interactions with Generative AI: Wearables to Measure Developer Experience and Productivity Objectively
**Authors**: Brandebusemeyer
**Year**: 2025
**DOI**: 10.1109/ICSE-Companion66252.2025.00043
**URL**: https://doi.org/10.1109/ICSE-Companion66252.2025.00043

**Abstract**: With recent advances in AI, many technology firms and developers are exploring whether Generative AI (GenAI) tools can help improve productivity, code quality and the overall developer experience. Current evaluations primarily depend on subjective questionnaire data or code quality metrics. A supplemental objective, continuous, real-time developer-centered measure is missing. Wearables with body sensors that measure the physiological activity of a person have the aforementioned benefits. With their help, impeding factors from GenAI tools on a developer's well-being and cognitive load may be detected. Implications for productivity may be deduced from these cognitive load measures. This research aims to use body sensor data to (1) assess the ability of wearables to measure developers' cognitive load in their everyday working context to then (2) evaluate developer experience, productivity and GenAI interactions in real-life work environments. A holistic mixed-method approach by combining subjective and objective measures in a developer-centered manner is taken.

#### 10. Agentless: Demystifying LLM-based Software Engineering Agents
**Authors**: Xia, Deng, Dunn, Zhang
**Year**: 2024
**DOI**: 10.48550/arXiv.2407.01489
**URL**: https://doi.org/10.48550/arXiv.2407.01489

**Abstract**: Recent advancements in large language models (LLMs) have significantly advanced the automation of software development tasks, including code synthesis, program repair, and test generation. More recently, researchers and industry practitioners have developed various autonomous LLM agents to perform end-to-end software development tasks. These agents are equipped with the ability to use tools, run commands, observe feedback from the environment, and plan for future actions. However, the complexity of these agent-based approaches, together with the limited abilities of current LLMs, raises the following question: Do we really have to employ complex autonomous software agents? To attempt to answer this question, we build Agentless -- an agentless approach to automatically solve software development problems. Compared to the verbose and complex setup of agent-based approaches, Agentless employs a simplistic three-phase process of localization, repair, and patch validation, without letting the LLM decide future actions or operate with complex tools. Our results on the popular SWE-bench Lite benchmark show that surprisingly the simplistic Agentless is able to achieve both the highest performance (32.00%, 96 correct fixes) and low cost ($0.70) compared with all existing open-source software agents! Furthermore, we manually classified the problems in SWE-bench Lite and found problems with exact ground truth patch or insufficient/misleading issue descriptions. As such, we construct SWE-bench Lite-S by excluding such problematic issues to perform more rigorous evaluation and comparison. Our work highlights the current overlooked potential of a simple, interpretable technique in autonomous software development. We hope Agentless will help reset the baseline, starting point, and horizon for autonomous software agents, and inspire future work along this crucial direction.

#### 11. A Framework for Intelligent Cloud Systems: Enabling Secure, Policy-Driven, and Sustainable AI at Scale
**Authors**: Jamili, Peta, Taware, Krishnan, Perla
**Year**: 2025
**DOI**: 10.1109/I2ITCON65200.2025.11210587
**URL**: https://doi.org/10.1109/I2ITCON65200.2025.11210587

**Abstract**: The speed of the spread of cloud-hosted artificial intelligence (AI) services requires a model that has the capability to promote secure, policy-based, and environmentally sustainable AI at scale. This paper presents a new framework of intelligent cloud systems that contains adaptive AI orchestration and automated policy enforcement and sustainable resource management. The architecture suggested will use dynamic policy-as-code approaches to provide dynamic support of security, privacy regulations and compliance requirements and control distributed AI workloads in a fine-grained manner. Moreover, energy-efficient scheduling and predictive auto scaling enable better sustainability by reducing the carbon emissions of model training and inference on large-scale systems. On a wide range of workloads, experimental analysis shows that the framework uses resources more efficiently (up to 28 percent), is more secure (because it operates continuously to validate policies), and can easily scale to any workload, without impacting performance or sustainability objectives. The outcomes indicate the promise of this framework to allow responsible and scalable AI to be deployed throughout cloud environments.

#### 12. Sensory Experience-Driven CMF Design for Smart Cabins: A Generative AI Collaborative Approach with the MINI Aceman Ocean Wave Green as a Case Study
**Authors**: Wang
**Year**: 2025
**DOI**: 10.17548/ksaf.2025.06.30.347
**URL**: https://doi.org/10.17548/ksaf.2025.06.30.347

#### 13. Exploring the Impact of ISO/IEC 42001:2023 AI Management Standard on Organizational Practices
**Authors**: Biroğul, Şahin, Əsgərli
**Year**: 2025
**DOI**: 10.54569/aair.1709628
**URL**: https://doi.org/10.54569/aair.1709628

**Abstract**: This paper examines the technical, operational, and strategic impacts of implementing the ISO/IEC 42001:2023 Artificial Intelligence (AI) Management System standard, which is a critical factor for companies adapting to the transformative effects of AI technologies in the business world. Aimed at ensuring the ethical and reliable governance of AI systems, this standard assists organizations in developing transparent, unbiased, fair and sustainable AI solutions. The framework provided by ISO/IEC 42001:2023 is also discussed in terms of its benefits in critical areas such as data security, operational efficiency, regulatory compliance and competitive advantage. In this context, it is emphasized that companies can adopt AI applications not only as a technical innovation but also as a strategic management element. The integration processes between ISO/IEC 42001:2023 AI Management System and ISO/IEC 27001:2022 Information Security Management System are presented, highlighting how these two standards complement each other. An analysis is provided on how principles of information security, risk management, and transparency can be effectively implemented within AI systems. In conclusion, the adoption of the ISO/IEC 42001:2023 AI management system enables companies to manage AI applications within a secure and ethical framework while achieving a sustainable competitive advantage in their digital transformation processes.

#### 14. AI-Driven SBOM: Automated Software Bill of Materials Generation and Management
**Authors**: Shukla
**Year**: 2025
**DOI**: 10.64917/feaiml/volume02issue12-08
**URL**: https://doi.org/10.64917/feaiml/volume02issue12-08

**Abstract**: The openness of modern software development has increased the urgent demand and necessity to manage Software Bill of Materials (SBOM) comprehensively due to the increasing number of open-source elements and third-party dependencies. Manual methods of SBOM generation and maintenance are tedious, prone to error, and are unable to keep up with short development cycles. In this paper, a framework based on AI to generate SBOM, analyze it, and assess the vulnerability is introduced. By using machine learning algorithms such as natural language processing, graph neural networks, and deep learning models, we can automatically identify, classify, and trace components in a complex chain of dependencies of software [1][2]. Our multi-model system of architectural design that employs the methods of the static analysis and the AI-based pattern recognition allows us to reach the results of 94.7 percent component detection and 91.3 percent accuracy in vulnerability mapping. It uses automated package manager parsing, binary analysis and license compliance verification as methodology. The experimental findings prove to be markedly better than the traditional tools that minimize the time of SBOM generation by 78% and maximize completeness by 34%. The system has managed to point out 2,847 untested faiths in enterprise codebases and has accordingly classified 96.2 percent of software licenses. We find the results that AI-powered SBOM systems do not just improve the security posture but also facilitate compliance processes, so they must be part of present-day DevSecOps. This study is relevant to the developing body of AI-enhanced software supply chain security.

#### 15. Advancing Automotive Software Supply Chain Security: A Blockchain-Reproducible Build Approach
**Authors**: Aideyan, Pesé, Brooks
**Year**: 2025
**DOI**: 10.4271/2025-01-0456
**URL**: https://doi.org/10.4271/2025-01-0456

**Abstract**: The automotive industry’s systems and over-the-air (OTA) updates have vulnerabilities in its software supply chain (SSC). Although frameworks like Uptane have improved OTA security, gaps remain in ensuring software integrity and provenance. In this paper, we examine challenges securing the automotive SSC and introduce a framework, GUIXCHAIN, that integrates version control, reproducible builds, blockchain technology, and software bills of materials (SBoMs) for transparency, auditability, and resilience. Reproducible builds guarantee identical resulting binaries when compiling the same source code in different environments, as any deviation in the final output indicates a potential compromise in the build process, such as malware injection. Our preliminary study shows Guixchain’s use of reproducible builds ensures consistent and integrity-secured software across various build environments. The blockchain provides forensic capabilities, offering a history of the what, who and where of discrepancies within the SSC process. SBoMs provide an inventory of the software components used. Our preliminary study demonstrates that Guixchain effectively mitigates risks such as ransomware, unauthorized modifications, and build server compromises, reinforcing the system’s integrity and resilience throughout the software life cycle. Future work will focus on the full implementation of Guixchain and a comprehensive evaluation of its performance in real-world automotive software supply chain scenarios.

#### 16. Prototyping with Prompts: Emerging Approaches and Challenges in Generative AI Design for Collaborative Software Teams
**Authors**: Subramonyam, Thakkar, Ku, Dieber, Sinha
**Year**: 2024
**DOI**: 10.1145/3706598.3713166
**URL**: https://doi.org/10.1145/3706598.3713166

**Abstract**: Generative AI models are increasingly being integrated into human task workflows, enabling the production of expressive content across a wide range of contexts. Unlike traditional human-AI design methods, the new approach to designing generative capabilities focuses heavily on prompt engineering strategies. This shift requires a deeper understanding of how collaborative software teams establish and apply design guidelines, iteratively prototype prompts, and evaluate them to achieve specific outcomes. To explore these dynamics, we conducted design studies with 39 industry professionals, including UX designers, AI engineers, and product managers. Our findings highlight emerging practices and role shifts in AI system prototyping among multistakeholder teams. We observe various prompting and prototyping strategies, highlighting the pivotal role of to-be-generated content characteristics in enabling rapid, iterative prototyping with generative AI. By identifying associated challenges, such as the limited model interpretability and overfitting the design to specific example content, we outline considerations for generative AI prototyping.

#### 17. Automated Code Review in Practice
**Authors**: Cihan, Haratian, Içöz, Gül, Devran, Bayendur, Uçar, Tüzün
**Year**: 2025
**DOI**: 10.1109/ICSE-SEIP66354.2025.00043
**URL**: https://doi.org/10.1109/ICSE-SEIP66354.2025.00043

**Abstract**: Context: Code review is a widespread practice among practitioners to improve software quality and transfer knowledge. It is often perceived as time-consuming due to the need for manual effort and potential delays in the development process. Several AI-assisted code review tools (Qodo, GitHub Copilot, Coderabbit, etc.) provide automated code reviews using large language models (LLMs). The overall effects of such tools in the industry setting are yet to be examined. Objective: This study examines the impact of LLM-based automated code review tools in an industry setting. Method: The study was conducted within an industrial software development environment that adopted an AI-assisted code review tool (based on open-source Qodo PR Agent). 238 practitioners across ten projects had access to the tool. We focused our analysis on three projects, which included 4,335 pull requests, 1,568 of which underwent automated reviews. Our data collection comprised three sources: (1) a quantitative analysis of pull request data, including comment labels indicating whether developers acted on the automated comments, (2) surveys sent to developers regarding their experience with the reviews on individual pull requests, and (3) a broader survey of 22 practitioners capturing their general opinions on automated code reviews. Results: 73.8% of automated code review comments were labeled as resolved. However, the overall average pull request closure duration increased from five hours 52 minutes to eight hours 20 minutes, with varying trends observed across different projects. According to survey responses, most practitioners observed a minor improvement in code quality as a result of automated code reviews. Conclusion: The LLM-based automated code review tool proved useful in software development, enhancing bug detection, increasing awareness of code quality, and promoting best practices.
However, it also led to longer pull request closure times and introduced drawbacks such as faulty reviews, unnecessary corrections, and irrelevant comments. Based on these findings, we discussed how practitioners can more effectively utilize automated code review technologies.

#### 18. With Great Power Comes Great Responsibility: The Role of Software Engineers
**Authors**: Betz, Penzenstadler
**Year**: 2024
**DOI**: 10.1145/3715112
**URL**: https://doi.org/10.1145/3715112

**Abstract**: The landscape of Software Engineering evolves rapidly amidst digital transformation and the ascendancy of AI, leading to profound shifts in the role and responsibilities of Software Engineers. This evolution encompasses both immediate changes, such as the adoption of Large Language Model-based approaches to coding, and deeper shifts driven by the profound societal and environmental impacts of technology. Despite the urgency, there persists a lag in adapting to these evolving roles. This roadmap article proposes 10 research challenges to develop a new generation of Software Engineers equipped to navigate the technical and social complexities as well as ethical considerations inherent in their evolving profession. Furthermore, the challenges target role definition, integration of AI, education transformation, standards evolution, and impact assessment to equip future Software Engineers to skillfully and responsibly handle the obstacles within their transforming discipline.

#### 19. Human Digital Twin in Industry 5.0: A Holistic Approach to Worker Safety and Well-Being through Advanced AI and Emotional Analytics
**Authors**: Davila-Gonzalez, Martín
**Year**: 2024
**DOI**: 10.3390/s24020655
**URL**: https://doi.org/10.3390/s24020655

**Abstract**: This research introduces a conceptual framework designed to enhance worker safety and well-being in industrial environments, such as oil and gas construction plants, by leveraging Human Digital Twin (HDT) cutting-edge technologies and advanced artificial intelligence (AI) techniques. At its core, this study is in the developmental phase, aiming to create an integrated system that could enable real-time monitoring and analysis of the physical, mental, and emotional states of workers. It provides valuable insights into the impact of Digital Twins (DT) technology and its role in Industry 5.0. With the development of a chatbot trained as an empathic evaluator that analyses emotions expressed in written conversations using natural language processing (NLP); video logs capable of extracting emotions through facial expressions and speech analysis; and personality tests, this research intends to obtain a deeper understanding of workers’ psychological characteristics and stress levels. This innovative approach might enable the identification of stress, anxiety, or other emotional factors that may affect worker safety. Whilst this study does not encompass a case study or an application in a real-world setting, it lays the groundwork for the future implementation of these technologies. The insights derived from this research are intended to inform the development of practical applications aimed at creating safer work environments.

### From Gemini Grounded (3 citations)

#### 1. deloitte.com
**Authors**: Deloitte
**Year**: 2026
**DOI**: 
**URL**: https://www.deloitte.com/us/en/insights/industry/technology/how-can-organizations-develop-quality-software-in-age-of-gen-ai.html

#### 2. mit.edu
**Authors**: MIT
**Year**: 2026
**DOI**: 
**URL**: https://dspace.mit.edu/bitstream/handle/1721.1/157323/3643795.3648394.pdf?sequence=1&isAllowed=y

#### 3. mit.edu
**Authors**: MIT
**Year**: 2025
**DOI**: 
**URL**: https://news.mit.edu/2025/explained-generative-ai-environmental-impact-0117

---

## Failed Topics

The following topics did not return valid citations:

- human-AI pair programming effectiveness
- empirical analysis of ChatGPT in software maintenance
- security vulnerabilities in AI-generated code
- title:"prompt engineering" for software developers
- ethnographic study of AI tools in industry
- Stack Overflow developer survey 2023 AI usage
- Forrester report AI coding assistants
- GitHub Octoverse 2023 AI adoption
- JetBrains state of developer ecosystem AI
- IEEE software generative AI standards
- ACM TOSEM large language models
- ICSE 2024 generative AI papers
- FSE 2023 AI-assisted software development
- ASE conference large language models
- IEEE TSE empirical study copilot
- CHI conference developer experience AI
- NIST AI risk management framework software
- OWASP top 10 for large language models
- ISO/IEC standards for AI in software engineering
- European Commission AI act software development
- OECD framework for AI in the workplace
- impact of LLMs on junior vs senior developers
- cognitive load in AI-assisted programming
- AI tools for software testing and QA
- automated unit test generation using LLMs
- using generative AI for code documentation
- refactoring code with large language models
- technical debt in AI-generated code
- legal implications of AI-generated code copyright
- corporate governance of generative AI in coding
- title:"Copilot" empirical evaluation
- title:"GPT-4" software engineering capabilities
- Google DeepMind coding agents research
- title:"CodeLlama" performance analysis
- Microsoft Research developer productivity AI
- Meta research code generation LLMs
- systematic mapping study generative AI SE
- qualitative study software engineers AI adoption
- survey of practitioners on generative AI
- barriers to adopting generative AI in software industry
- integration of LLMs into IDEs
- security risks of LLM code suggestions
- generative AI for software architecture design
- AI-driven requirements engineering
- debugging with large language models
- mitigating hallucinations in code generation
- Capgemini research institute generative AI engineering
- Accenture technology vision generative AI
- PwC emerging technology software development
- IDC marketscape AI code assistants
- KPMG generative AI software lifecycle
- Brookings Institution AI workforce impact
- software engineering ethics generative AI
- RAND Corporation AI software safety
- World Economic Forum jobs of tomorrow large language models
- impact of AI on software engineering education vs industry
- skill degradation due to AI coding assistants
- onboarding developers with generative AI
- measuring DevEx with AI tools
- SPACE framework and generative AI
- DORA metrics impact of generative AI
- title:"DevOps" and generative AI
- SQL query generation using LLMs
- infrastructure as code generation LLM
- continuous integration with AI agents
- API documentation generation AI
- legacy code modernization with generative AI
- translating programming languages with LLMs
- fine-tuning LLMs for proprietary codebases
- context window limitations in code generation
- gender bias in AI code generation
- detecting AI-generated code plagiarism
- impact of AI on code review process