https://app.arize.com
+-- Manger_Agents = Alyx
    +-- Observe
    +-- Evaluate
    +-- Improve
  
  

  
    Lessons
+-- Stay on Task
    +-- Planning = Lives outside Chat
        +-- Tools
            +-- todo_read
            +-- todo_write
            +-- todo_update
        +-- States
            +-- blocked
            +-- pending
            +-- in_progress
            +-- completed
    

  
  

  
    Lessons
+-- Context Management
    +-- LargeJson
        +-- Context in case needed later
        +-- Keep Structure
            +-- Compress value
    +-- Tools
        +-- Small
        +-- Composable
        +-- Examples
            +-- 1. jq = query conditional
                +-- .experiments[0].rows[:5]
                +-- [.rows[] | select(.eval_score < 0.5) ]
                +-- [.rows[].latency_ms] | add / length
            +-- 2. grep_json = search patterns
                +-- grep_json pattern="error"
    +-- Budget 
        +-- Hard Budget on Token / tool output
    +-- Exceptions
        +-- Recoverable
    +-- Tool response contains customer data
        

  
  

  
    Lessons
+-- Crystallizing Good Behavior
    +-- Make a change, will it break?
        +-- Vibe Checking = Bad
            +-- Does not scale
        +-- Production Traces = Good
            Tool-Chat-Tool-Chat
    +-- Testing
        +-- Level 1: Decision-point Tests
            contains_any = [
               ["2000ms", "2.0 seconds", "two seconds"],
               ["OpenAIChat.invoke", "LLM span"]
            ]
        +-- Level 2: Trajectory Tests
            +-- Evaluation
                +-- Output: Score + Explanation
                    +-- Judge_LLM
                        +-- Input: Dataset + Template
        +-- Level 3: CI + Prompt Evaluation
            +-- Arize, dogfood
  
  

  
    Alyx
+-- Plan_Current
    +-- Sort_LLM_spans_by_Latency
    +-- Identify_bottlenacks
    +-- Suggest_improvements
    +-- Call todo_update() upon completion
        +-- The finish gate
            +-- All to-do's
                +-- Complete 
                +-- Mark as blocked
                    +-- Human
+-- Best practices
    +-- Prompt --> Enforce in Code
    +-- Examples --> Clarity
    +-- Plan != todo_write
    +-- Teach agent to plan
  
  

  
    The Evolution of Software Engineering
Who consumes telemetry data?

Phase 1: Software 1.0
   App --[telemetry]--> Obs_Platform --[reads]--> Dev_Human 
   --[write_code]--> IDE --[deploy]--> App

Phase 2: Software 2.0
   App --[telemetry]--> Obs_Platform --[reads]--> Dev_Human 
   --[prompt]--> IDE_Agentic
                 +-- Agent_Coding (Claude, Cursor)
   --[write_code]--> IDE --[deploy]--> App
   --[deploy]--> App

Phase 2 to Phase 3:
   Obs_Platform evolution: 
   +-- Fr: Human Dashboard
   +-- To: Programmatic interfaces that agents can consume

Phase 3: Autonomous: Agent consumes telemetry data
   App 
   --[telemetry]--> Obs_Platform 
   --[reads]------> Agent_Coding (Claude, Cursor)      --[notify]--> Dev_Mgr_Human
   --[deploy]-----> App
                    Iterate
   --[telemetry]--> Obs_Platform 
   --[reads]------> Agent_Coding (Claude, Cursor)      --[notify]--> Dev_Mgr_Human
   --[deploy]-----> App

  
  

  
    App_Agentic
+-- Arize_AX = [Traces, Eval, Feedback]
    +-- Skills_Arize_AX
        --[Query, Correlate, Reason]------------------> Agent_Coding

        Agent_Coding
        --[dev_chg, PR = Pull (me) Req, App Restart]--> Code_Base
        --> Re-run workload
        --> Test
        --> User_Journey 
        --> Agent_Coding

  
    Skills_Arize_AX
   arize-instrumentation
   arize-trace
   arize-dataset
   arize-experiment
   arize-evaluator
   arize-ai-provider-integration
   arize-annotation
   arize-prompt-optimization
   arize-link

  
    AI Engineer Loop

Agent: Data_Observe      : Traces, Spans      : Arize_AX
Agent: Data_Annotate     : Annotations, Notes : Arize_AX
Agent: Eval_Build        : Annotations, Notes : Arize_AX
Agent: Hypothesize       : Trace_Analysis     : Arize_AX
Agent: Experiment_Design : Experiments, Data  : Arize_AX 
Agent: Experiment_Run    : Experiments, Data  : Arize_AX 
Agent: Outcome_Measure   : Eval Results       : Arize_AX 
Agent: Errors_Analyze    : Errors             : Arize_AX 
Agent: Apply_or_Iterate  : Evals_Online       : Arize_AX

  
    Lessons
+-- Debug agent

  
    https://www.philschmid.de/building-agents-interactions-api
https://github.com/philschmid/gemini-samples/blob/main/examples/interactions-build-agents.ipynb

  
    Code

Objective
   Control Flow   
      Static
         Predefined

  
    Agents

Objective
   Control Flow
      Dynamic
         LLM = Large Language Model

  
    Agents
+-- Model   = Brain       = Reason, Plan
+-- Context = Memory      = Remember
+-- Tools   = Eyes, Hands = Read, Write, Search, Call
+-- Loop    = Life        = Repeat
                            +-- Observe
                            +-- Think
                            +-- Act
                            
  
  

  
    Workflow

User Prompt
+-- Context
    +-- Model: Think, Decide, Select Tool
        +-- Client: Execute Tool
            +-- Tool: Result ------------------> Feedback to Model (Repeat)
    +-- Model: Respond