Overview: This paper examines how systems engineering practices can be built into the Test and Evaluation (T&E) process for generative AI (GenAI) applications to improve their performance, reliability, and compliance with standards. Readers will gain an understanding of techniques and best practices for testing GenAI systems.
Context: Generative AI has considerable potential across industries, enabling the automation and optimisation of many tasks. However, deploying GenAI in real-world applications that are resource-constrained and real-time critical makes it difficult to ensure performance, reliability, and safety. The existing literature points to the need for robust testing frameworks that address these challenges.
Purpose: This study aims to understand how systems engineering practices can be adopted in the T&E of GenAI systems. It seeks to review and establish effective practices and tooling that improve the reliability and functionality of GenAI applications in practical settings where resources are constrained and high reliability and safety are paramount.
Approach: To this end, we review well-known systems engineering methodologies and assess their relevance to T&E processes. We also examine case studies, industry reports, and academic literature. The emphasis is on adapting these methods to GenAI applications and on using them to establish rigorous testing and evaluation methodologies that support the successful application of generative AI.