How to Build Trusted AI Agents for Platform Engineers

Speaker Aaron Yang, AI Product Manager at Stackgen

Channel Platform Engineering

Event PlatformCon 2025

Date June 25, 2025

Duration 16:35

Platform Engineering Trust AI Agents PlatformCon

TL;DR

Aaron Yang presents a practical framework for building enterprise-ready AI agents that platform engineers can trust. The talk covers four pillars of trust - execution control, AI decision reliability, data security, and compliance - along with concrete mechanisms like RBAC integration, levels of autonomy, LLM gateways, and pre-production evaluation using mathematical metrics and LLM-as-a-judge techniques.

Key Takeaways

Four Pillars of Trust: Execution control (RBAC, human approvals), AI decision reliability (deterministic output, explainability), data security (no training on customer data, PII redaction), and compliance/audit (policy-aware remediations)
Levels of Autonomy: Agents can operate at three risk levels - Level 1 requires human approval at every step, Level 2 auto-executes low-risk actions, Level 3 auto-executes medium-risk actions but requires approval for infrastructure changes
Knowledge Plane Architecture: All agent actions must flow through a knowledge plane that enforces authorization, RBAC, and MCP server integrations - acting as the "limbs" of the agent
Identity Assumption for RBAC: When an on-call engineer uses the agent, it assumes their identity - preventing privilege escalation attacks
LLM Gateway Protection: Input/output screening using LLM-as-a-judge to detect prompt injection and filter sensitive data
Pre-Production Evaluation: Uses Jaccard similarity for determinism, TF-IDF for semantic understanding, and needle-in-haystack tests for retrieval capabilities
No Training on Customer Data: Prevents data leakage through model fine-tuning that could expose customer information via prompt manipulation

Summary

Context: Stackular - An Infrastructure-Aware Incident Agent

Aaron Yang introduces Stackular, an AI copilot for on-call engineers that performs root cause analysis and suggests compliant remediation steps. The agent ingests data from Kubernetes, Terraform, cloud topology, logs, APM, and unstructured sources like runbooks and tickets.

The Core Challenge

When building AI agents for infrastructure, you must answer critical questions: How do you ensure the agent won't go haywire with a 0% (not 0.1%) chance of catastrophic changes? How do you include human-in-the-loop gates? How do you enforce least privilege? How do you test before production?

Two-Plane Architecture

Knowledge Plane: Acts as both the data ingestion layer and execution control plane. Includes MCP servers, RBAC enforcement, access control rules, and identity management. This plane is the only way the agent can interact with external systems - SSH into servers, access CloudWatch, query DataDog, or any other external tool.

Agent Control Plane: Contains the intelligence layer - LLMs, memory, personalization, MCP clients, and session data. Must route all external access through the knowledge plane with proper authorization.

Four Pillars of Trust

1. Execution Control: RBAC integration, explicit human approval gates, one-click rollback capability, and whitelisted actions with least privilege enforcement. High-risk commands like rm are automatically categorized for additional approval.

2. AI Decision Reliability: Deterministic output verification, explainability, read-only enforcement modes, LLM gateway for input/output inspection, and behavior learning from history.

3. Data Security & Privacy: Customer data is never used for model training (prevents extraction via crafted prompts). PII is redacted before reaching models.

4. Compliance & Audit: Remediation suggestions are policy-aware (GDPR, NIST). Full audit trail integration and change management integration ensure all actions are traceable.

Trust and Safety Mechanics

RBAC with Identity Assumption: When an engineer uses Stackular, the agent assumes their identity for all actions. This prevents the agent from being used as a privilege escalation tool and ensures audit trails reflect the actual operator.

Levels of Autonomy:

Level 1: Read-only/low-risk actions only, human approval at every step
Level 2: Autonomous low-risk actions, approval required for medium/high-risk
Level 3: Autonomous low/medium-risk, approval only for infrastructure changes

LLM Gateway: All inputs are screened by a smaller model to detect prompt injection attacks and validate relevance. PII and sensitive data are redacted. Outputs are similarly filtered before presentation.

Pre-Production Evaluation

The talk presents concrete evaluation metrics combining code-based mathematical evaluation with LLM-as-a-judge:

Jaccard Similarity: Mathematical comparison of response consistency for determinism testing
TF-IDF Vector Space: Evaluates semantic understanding by reconstructing output vector space
Needle-in-Haystack Tests: Measures information retrieval capabilities

Notable Quotes

"How do we ensure that the agent won't go haywire and that there's not a 0.1% chance but a 0% chance that the agent will execute some sort of catastrophic change?"

"The knowledge plane essentially is the limbs of the agent - whatever the agent wants to do, whether that's SSH into the server to read logs or go to CloudWatch, it has to go through the knowledge plane."

"If an on-call engineer is actively using Stackular, it assumes the identity of that operator when it conducts any action on their behalf."

"There's a lot of work that has to be done on the trust and safety side, and there's some level of math as well to determine if you're building the right thing."

References

Products & Technologies

Stackular/Stackgen: The product and company Aaron works for
MCP (Model Context Protocol): Used for tool integration and server communication
Kubernetes, Terraform, CloudWatch, DataDog: Infrastructure and observability tools

Compliance Frameworks

GDPR: European data protection regulation
NIST: Security and compliance framework

Evaluation Techniques

LLM-as-a-judge: Using a smaller LLM to evaluate another model's outputs
Jaccard Similarity: Mathematical technique for measuring output consistency
TF-IDF: Term Frequency-Inverse Document Frequency for semantic analysis
Needle-in-Haystack Test: Evaluation method for information retrieval