https://app.arize.com
+-- Manger_Agents = Alyx
+-- Observe
+-- Evaluate
+-- Improve
Lessons
+-- Stay on Task
+-- Planning = Lives outside Chat
+-- Tools
+-- todo_read
+-- todo_write
+-- todo_update
+-- States
+-- blocked
+-- pending
+-- in_progress
+-- completed
Lessons
+-- Context Management
+-- LargeJson
+-- Context in case needed later
+-- Keep Structure
+-- Compress value
+-- Tools
+-- Small
+-- Composable
+-- Examples
+-- 1. jq = query conditional
+-- .experiments[0].rows[:5]
+-- [.rows[] | select(.eval_score < 0.5) ]
+-- [.rows[].latency_ms] | add / length
+-- 2. grep_json = search patterns
+-- grep_json pattern="error"
+-- Budget
+-- Hard Budget on Token / tool output
+-- Exceptions
+-- Recoverable
+-- Tool response contains customer data
Lessons
+-- Crystallizing Good Behavior
+-- Make a change, will it break?
+-- Vibe Checking = Bad
+-- Does not scale
+-- Production Traces = Good
Tool-Chat-Tool-Chat
+-- Testing
+-- Level 1: Decision-point Tests
contains_any = [
["2000ms", "2.0 seconds", "two seconds"],
["OpenAIChat.invoke", "LLM span"]
]
+-- Level 2: Trajectory Tests
+-- Evaluation
+-- Output: Score + Explanation
+-- Judge_LLM
+-- Input: Dataset + Template
+-- Level 3: CI + Prompt Evaluation
+-- Arize, dogfood
Alyx
+-- Plan_Current
+-- Sort_LLM_spans_by_Latency
+-- Identify_bottlenacks
+-- Suggest_improvements
+-- Call todo_update() upon completion
+-- The finish gate
+-- All to-do's
+-- Complete
+-- Mark as blocked
+-- Human
+-- Best practices
+-- Prompt --> Enforce in Code
+-- Examples --> Clarity
+-- Plan != todo_write
+-- Teach agent to plan
The Evolution of Software Engineering
Who consumes telemetry data?
Phase 1: Software 1.0
App --[telemetry]--> Obs_Platform --[reads]--> Dev_Human
--[write_code]--> IDE --[deploy]--> App
Phase 2: Software 2.0
App --[telemetry]--> Obs_Platform --[reads]--> Dev_Human
--[prompt]--> IDE_Agentic
+-- Agent_Coding (Claude, Cursor)
--[write_code]--> IDE --[deploy]--> App
--[deploy]--> App
Phase 2 to Phase 3:
Obs_Platform evolution:
+-- Fr: Human Dashboard
+-- To: Programmatic interfaces that agents can consume
Phase 3: Autonomous: Agent consumes telemetry data
App
--[telemetry]--> Obs_Platform
--[reads]------> Agent_Coding (Claude, Cursor) --[notify]--> Dev_Mgr_Human
--[deploy]-----> App
Iterate
--[telemetry]--> Obs_Platform
--[reads]------> Agent_Coding (Claude, Cursor) --[notify]--> Dev_Mgr_Human
--[deploy]-----> App
App_Agentic
+-- Arize_AX = [Traces, Eval, Feedback]
+-- Skills_Arize_AX
--[Query, Correlate, Reason]------------------> Agent_Coding
Agent_Coding
--[dev_chg, PR = Pull (me) Req, App Restart]--> Code_Base
--> Re-run workload
--> Test
--> User_Journey
--> Agent_Coding
Skills_Arize_AX
arize-instrumentation
arize-trace
arize-dataset
arize-experiment
arize-evaluator
arize-ai-provider-integration
arize-annotation
arize-prompt-optimization
arize-link
AI Engineer Loop Agent: Data_Observe : Traces, Spans : Arize_AX Agent: Data_Annotate : Annotations, Notes : Arize_AX Agent: Eval_Build : Annotations, Notes : Arize_AX Agent: Hypothesize : Trace_Analysis : Arize_AX Agent: Experiment_Design : Experiments, Data : Arize_AX Agent: Experiment_Run : Experiments, Data : Arize_AX Agent: Outcome_Measure : Eval Results : Arize_AX Agent: Errors_Analyze : Errors : Arize_AX Agent: Apply_or_Iterate : Evals_Online : Arize_AX
Lessons +-- Debug agent
https://www.philschmid.de/building-agents-interactions-api https://github.com/philschmid/gemini-samples/blob/main/examples/interactions-build-agents.ipynb
Code
Objective
Control Flow
Static
Predefined
Agents
Objective
Control Flow
Dynamic
LLM = Large Language Model
Agents
+-- Model = Brain = Reason, Plan
+-- Context = Memory = Remember
+-- Tools = Eyes, Hands = Read, Write, Search, Call
+-- Loop = Life = Repeat
+-- Observe
+-- Think
+-- Act
Workflow
User Prompt
+-- Context
+-- Model: Think, Decide, Select Tool
+-- Client: Execute Tool
+-- Tool: Result ------------------> Feedback to Model (Repeat)
+-- Model: Respond

