Below are some of my notes and insights from Prof Simon Burton’s presentation, Safety Under Uncertainty (youtube, slides). This presentation was part of the Safety-Critical Systems Club seminar titled Safe Autonomous Transport – the Good, the Bad and the Ugly.
Why We Need ISO PAS 8800
Traditional safety standards, such as ISO 26262 (Automotive Functional Safety) and ISO 21448 (SOTIF), were designed for deterministic systems with predictable behaviors. ISO 26262 focuses on hardware and software faults, where faults can be traced, diagnosed, and mitigated through rigorous testing. SOTIF, on the other hand, addresses risks arising when systems perform as intended but fall short under certain conditions or reach their performance limits, often due to insufficient requirements or design limitations.
Robotics systems are becoming increasingly complex as they move from deterministic, rules-based architectures to machine learning (ML)-driven designs capable of handling unstructured and unpredictable environments. However, modern AI-enabled systems introduce probabilistic behaviors and performance uncertainties that these older frameworks cannot fully address. ISO PAS 8800, Road Vehicles: Safety and Artificial Intelligence, bridges this gap by evaluating AI components’ contributions to system safety. It provides a structured process to integrate AI safely into larger systems by accounting for model performance, data quality, and operational uncertainty.
There is no safe AI, only safe systems. The new standard, ISO PAS 8800, is intended to connect safety arguments at the system level with the performance of individual AI-based components. It defines a structured approach to evaluating AI-enabled systems, ensuring they meet safety requirements through iterative testing and validation cycles.
ISO PAS 8800’s Three Primary Iteration Cycles
The new standard provides a framework of three primary iteration cycles for ensuring AI-based components meet the desired safety requirements.
- Requirements Evaluation: The first cycle focuses on assessing whether AI components meet initial safety requirements. If performance falls short, either the model must improve, or the requirements need adjustment. Engineers must also characterize the AI component’s limitations to enable system-level compensations (a minimal sketch of this loop follows the list).
- Assurance Arguments: This cycle emphasizes providing an expression of confidence that the system meets safety requirements, akin to Safety Integrity Levels. Engineers must address:
- The adequacy of requirements and input space definitions.
- Dataset completeness and accuracy.
- Performance metrics of the AI model.
- Validation Through Field Operations: The final iteration loop occurs once the system is deployed in the field, where additional deficiencies can be identified or changes in the operating environment detected.
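To make the first cycle more concrete, below is a minimal Python sketch of how such an evaluate-improve loop could be structured. All names, helpers, and thresholds are my own illustration; the standard defines a process, not an API.

```python
from dataclasses import dataclass
from typing import Any, Callable, Tuple

@dataclass
class SafetyRequirement:
    # Hypothetical requirement record; fields are illustrative,
    # not taken from ISO PAS 8800 itself.
    name: str
    min_recall: float  # acceptance threshold, e.g. 0.98

def requirements_evaluation_cycle(
    model: Any,
    requirement: SafetyRequirement,
    evaluate: Callable[[Any], float],  # evaluate(model) -> measured recall
    improve: Callable[[Any], Any],     # improve(model) -> updated model (retraining, more data, ...)
    max_iterations: int = 5,
) -> Tuple[Any, float]:
    """Sketch of the first iteration cycle: evaluate the AI component
    against its safety requirement; on a shortfall, improve the model
    or escalate so the requirement (or system design) is adjusted."""
    for _ in range(max_iterations):
        recall = evaluate(model)
        if recall >= requirement.min_recall:
            return model, recall  # requirement met
        model = improve(model)
    # Persistent shortfall: characterize the limitation so the surrounding
    # system can compensate, or renegotiate the requirement itself.
    raise RuntimeError(f"{requirement.name}: residual performance gap; "
                       "escalate to system-level safety analysis")
```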
When discussing the assurance argument, a specific example is provided of a requirement for the system to recognize construction signs. To meet this requirement, several criteria must be satisfied, including:
- Acceptance criteria: recall rate of 98% on construction signs
- Assurance confidence: the recall rate should be estimated with 95% confidence and a second-order uncertainty of < 1%
Engineers can provide an initial estimate of the model’s recall rate from its performance metrics, but metrics alone aren’t enough. We must then account for uncertainty in the dataset, such as gaps in the coverage of the design domain or label accuracy errors. Synthetic datasets, with their own sources of uncertainty, could provide additional independent evidence. Combining these multiple forms of evidence could further reduce uncertainty and strengthen the assurance argument.
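As a rough illustration of what “estimated with 95% confidence” can mean in practice, the sketch below computes a one-sided Clopper-Pearson lower confidence bound on recall from test counts. This is a standard statistical technique I’ve chosen for illustration, not a method prescribed by ISO PAS 8800, and the counts are invented.

```python
from scipy.stats import beta

def recall_lower_bound(true_positives: int, false_negatives: int,
                       confidence: float = 0.95) -> float:
    """One-sided Clopper-Pearson lower confidence bound on recall.

    Treats each positive ground-truth instance (here, a construction
    sign) as a Bernoulli trial: detected or missed.
    """
    if true_positives == 0:
        return 0.0
    # Exact binomial lower bound via the beta distribution.
    return beta.ppf(1.0 - confidence, true_positives, false_negatives + 1)

# Invented example: 4,920 of 5,000 labelled construction signs detected.
lb = recall_lower_bound(4920, 80)
print(f"point estimate = {4920 / 5000:.3f}, 95% lower bound = {lb:.3f}")
# point estimate = 0.984, lower bound ~0.981: the 98% target is met with
# the required confidence only while the bound stays above 0.98.
```

The point estimate clears the 98% acceptance criterion here, but the lower confidence bound is what the assurance argument should rest on, which is exactly why dataset uncertainty and independent evidence matter.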
Frameworks for Complexity and Uncertainty
The standard defines uncertainty as “any deviation from the unachievable ideal of completely deterministic knowledge of the relevant system”. There is therefore a relationship between the complexity of a system and our ability to make definitive statements about its safety. Prof Burton presents a three-layer model to describe the sources of uncertainty:
- Environmental Uncertainty: The inherent unpredictability of the real world makes it impossible to model every scenario.
- Observational Uncertainty: Sensors capture limited and noisy data, which constrains the quality of training and validation datasets.
- Model Uncertainty: Even with perfect data, AI models may underperform due to architectural limitations, parameter constraints, or flawed optimization functions.
Prof Burton makes an interesting point that these layers all compound on each other:

> If we don’t understand the world we’re operating in, how are we going to come up with the right set of training data or test data? If we don’t have the right set of training data, how are we going to expect to have the right model, or how are we going to be able to test and demonstrate that the model is correct?
This points to the strategic importance of being able to characterize the operating domain via log-mining and tagging infrastructure.
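As a toy example of what such infrastructure enables, the sketch below mines tagged scenarios from drive logs and flags under-represented operating conditions. The tag names and the count threshold are invented for illustration.

```python
from collections import Counter
from typing import Dict, Iterable, Set, Tuple

def coverage_report(scenario_tags: Iterable[Set[str]],
                    required_tags: Set[str],
                    min_count: int = 50) -> Tuple[Counter, Dict[str, int]]:
    """Count how often each operating-domain tag appears across logged
    scenarios, and flag required conditions seen fewer than min_count times."""
    counts = Counter(tag for tags in scenario_tags for tag in tags)
    gaps = {t: counts[t] for t in required_tags if counts[t] < min_count}
    return counts, gaps

# Invented log excerpt; in practice these tags would be produced by a
# log-mining / auto-tagging pipeline over fleet recordings.
logs = [
    {"night", "rain", "urban"},
    {"day", "highway"},
    {"day", "construction_zone", "urban"},
]
required = {"night", "rain", "snow", "construction_zone", "tunnel"}
counts, gaps = coverage_report(logs, required, min_count=2)
print("under-represented conditions:", gaps)
# e.g. {'snow': 0, 'tunnel': 0, 'night': 1, 'rain': 1, 'construction_zone': 1}
```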
Data as a First Class Citizen
A critical aspect of ISO PAS 8800 is treating the dataset as a “first-class citizen” of the safety engineering process. This includes developing a clear definition of safety requirements, maintaining traceability, and performing verification and safety analysis on the dataset itself (a sketch of two such checks follows the list below). The slides list some common dataset errors:
- Lack of coverage of the input space
- Lack of representation of safety-relevant edge cases
- Distribution does not match the target input space
- Dependencies on the data acquisition method (e.g., camera type, geographic, temporal dependencies)
- Data fidelity (e.g., sensor noise, accuracy of synthetic data)
- Errors in the meta-data / labelling
- Lack of independence between training and verification datasets
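As a minimal sketch of the dataset verification mentioned above, the code below implements two of these checks: a goodness-of-fit test that the dataset’s scenario mix matches the target input space, and a leakage guard between training and verification splits. The categories, proportions, and thresholds are invented for illustration.

```python
import numpy as np
from scipy.stats import chisquare

def check_distribution_match(dataset_counts, target_props, alpha=0.01):
    """Chi-squared goodness-of-fit: does the dataset's distribution over
    scenario categories match the target input space? Returns True when
    no significant mismatch is detected at level alpha."""
    observed = np.asarray(dataset_counts, dtype=float)
    expected = observed.sum() * np.asarray(target_props, dtype=float)
    _, p_value = chisquare(observed, expected)
    return p_value >= alpha

def check_split_independence(train_ids, val_ids):
    """Guard against leakage: training and verification sets must not share
    recordings (e.g. frames drawn from the same drive sequence)."""
    overlap = set(train_ids) & set(val_ids)
    return len(overlap) == 0, overlap

# Invented example: target mix of 70/20/8/2% day/night/rain/snow scenarios.
print(check_distribution_match([7050, 1980, 790, 180],
                               [0.70, 0.20, 0.08, 0.02]))  # True: close match
print(check_split_independence(["drive_001", "drive_002"],
                               ["drive_003", "drive_002"]))  # (False, {'drive_002'})
```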
This is an area I’d like to learn more about once the standard becomes more widely available.
Moving Forward with AI Safety
The ISO PAS 8800 framework represents a significant step toward addressing the complexities of AI safety in robotics. By emphasizing iterative testing, assurance arguments, and data integrity, it provides a roadmap for systematically evaluating AI components and mitigating risks. However, its implementation requires a cultural shift in the industry: moving from deterministic validation to probabilistic assurance backed by confidence metrics.
For robotics engineers, this means adopting new tools and methodologies for characterizing uncertainty, refining datasets, and integrating diverse forms of evidence to strengthen safety cases. While challenges remain, the insights from ISO PAS 8800 highlight a promising direction for building safer AI-enabled robotics systems.
