Paperless Presentation Accepted (Full Paper declined) Systems Engineering Test & Evaluation Conference 2024

Enhancing GenAI: Leveraging Systems Engineering for Rigorous Test & Evaluation (20743)

Tanya Dixit 1
  1. Google, Haymarket, NSW, Australia
  • Overview: This paper examines how systems engineering practices can be built into the Test and Evaluation (T&E) process for generative AI (GenAI) applications so that they perform better, operate more reliably, and comply with relevant standards. The reader will come away with an understanding of techniques and best practices for testing GenAI systems.

  • Context: Generative AI has considerable potential across industries, enabling the automation and optimisation of many tasks. However, deploying GenAI in real-world, resource-constrained, and real-time critical applications makes it challenging to ensure performance, reliability, and safety. Existing knowledge points to the need for strong testing frameworks that address these challenges.

  • Purpose: This study aims to understand how systems engineering practices can be adopted when performing T&E of GenAI systems. It seeks to review and establish effective practices and tooling that improve the reliability and functionality of GenAI applications in practical settings where resources are constrained and high reliability and safety are paramount.

  • Approach: To this end, we review well-known systems engineering methodologies with respect to their relevance to T&E processes. Case studies, industry reports, and academic literature were examined. The emphasis was on adapting these methods to GenAI applications and on using them to establish rigorous testing and evaluation methodologies that support the successful application of generative AI.

  • Insights: The investigation shows that GenAI T&E processes can be significantly improved by integrating systems engineering practices into them. Key findings include the importance of a thorough understanding of requirements, of combining automated and manual testing approaches, and of continuous monitoring and feedback mechanisms. These findings will guide the development of dependable and efficient GenAI applications suited to complex real-world use cases.
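
As an illustration only, the three findings above (explicit requirements, automated testing, and continuous monitoring with feedback) might be sketched in Python as follows. This is not the tooling reviewed in the paper: `stub_model`, the requirement IDs, the checks, the window size, and the threshold are all hypothetical placeholders chosen for the example, and a real evaluation would call an actual GenAI system and use domain-specific checks alongside manual review.

```python
from collections import deque
from dataclasses import dataclass
from typing import Callable, Deque, Dict, List


@dataclass
class Requirement:
    """A testable requirement: an ID, a description, and an automated check."""
    req_id: str
    description: str
    check: Callable[[str, str], bool]  # (prompt, output) -> pass/fail


def stub_model(prompt: str) -> str:
    """Hypothetical stand-in for a GenAI system under test."""
    return f"Summary: {prompt[:40]}"


# Hypothetical requirements; real ones would come from requirements analysis.
REQUIREMENTS: List[Requirement] = [
    Requirement("REQ-001", "Output is non-empty",
                lambda prompt, output: len(output.strip()) > 0),
    Requirement("REQ-002", "Output is at most 200 characters",
                lambda prompt, output: len(output) <= 200),
]


def evaluate(model: Callable[[str], str],
             prompts: List[str],
             requirements: List[Requirement]) -> Dict[str, float]:
    """Automated test pass: return the pass rate per requirement."""
    results: Dict[str, float] = {}
    for req in requirements:
        passed = sum(req.check(p, model(p)) for p in prompts)
        results[req.req_id] = passed / len(prompts)
    return results


class PassRateMonitor:
    """Continuous monitoring: rolling pass rate over the last `window` outcomes."""

    def __init__(self, window: int, threshold: float) -> None:
        self.outcomes: Deque[bool] = deque(maxlen=window)
        self.threshold = threshold

    def record(self, passed: bool) -> None:
        self.outcomes.append(passed)

    def pass_rate(self) -> float:
        # With no data yet, report 1.0 so an empty monitor does not raise a flag.
        if not self.outcomes:
            return 1.0
        return sum(self.outcomes) / len(self.outcomes)

    def needs_review(self) -> bool:
        # Feedback hook: flag for manual review when the rate drops too low.
        return self.pass_rate() < self.threshold
```

In this sketch, `evaluate` would run before deployment against a fixed prompt set, while `PassRateMonitor` would be fed outcomes from production traffic; a `needs_review()` result of `True` is the point at which manual testing and requirement revision would re-enter the loop.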