Use isolated database environments to test and track AI agent behavior and performance. Create separate database schemas for testing different agent configurations and compare results using natural language queries.
How it works
GibsonAI provides database environments where you can test different agent configurations, track their performance, and analyze behavior patterns. Create isolated databases for each test scenario and use natural language queries to analyze results.
MCP Integration
Connect through the MCP server for agent testing
Database Management
Explore database management features
CLI Tools
Use the CLI for environment setup
Key Features
Isolated Testing Environments
- Separate Databases: Create isolated databases for each agent test
- Independent Schemas: Independent database schemas for different experiments
- Safe Testing: Test agent behavior without affecting production data
- Environment Comparison: Compare results across different test environments
Agent Performance Tracking
- Behavior Logging: Track agent actions and decisions in structured format
- Performance Metrics: Store and analyze agent performance data
- Response Tracking: Log agent responses and their effectiveness
- Error Monitoring: Track errors and failure patterns
Natural Language Analysis
- Query Testing Results: Use natural language to analyze test results (see the query sketch after this list)
- Performance Comparison: Compare agent performance across different scenarios
- Behavior Analysis: Analyze agent behavior patterns and trends
- Results Reporting: Generate reports on agent testing outcomes
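For a sense of what this looks like in practice, the sketch below sends one natural-language question to the project's data API. It is a minimal illustration that assumes the same base URL and Bearer-token authentication used in the implementation examples later on this page; the API key placeholder and query text are illustrative.

import requests

# Minimal natural-language query against logged test data.
# The base URL and auth header mirror the analyzer examples further down;
# API_KEY is a placeholder you would replace with your own key.
BASE_URL = "https://api.gibsonai.com/v1/-"
API_KEY = "your-api-key"

response = requests.post(
    f"{BASE_URL}/query",
    json={"query": "Show average response_time per test_name across all agent tests"},
    headers={"Authorization": f"Bearer {API_KEY}"},
)
if response.status_code == 200:
    for row in response.json():
        print(row)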
Use Cases
Agent Development
Perfect for:
- Testing new agent features and capabilities
- Validating agent behavior in different scenarios
- Comparing different agent configurations
- Debugging agent issues and problems
Performance Optimization
Enables:
- Identifying performance bottlenecks
- Testing different optimization strategies
- Measuring impact of configuration changes
- Validating performance improvements
Behavior Validation
Supports:
- Ensuring agent responses are appropriate
- Testing edge cases and error handling
- Validating decision-making logic
- Confirming compliance with requirements
Implementation Examples
Setting Up an Agent Testing Environment
# Using Gibson CLI to create agent testing database

# Create agent testing tables
gibson modify agent_tests "Create agent_tests table with id, test_name, agent_config, environment, created_at"
gibson modify agent_actions "Create agent_actions table with id, test_id, action_type, input_data, output_data, timestamp, duration"
gibson modify agent_metrics "Create agent_metrics table with id, test_id, metric_name, value, timestamp"
gibson modify test_results "Create test_results table with id, test_id, result_type, data, success, error_message"

# Generate models and apply changes
gibson code models
gibson merge
Agent Testing Framework
import requests
import json
from datetime import datetime
import time

class AgentTester:
    def __init__(self, api_key, environment="test"):
        self.api_key = api_key
        self.environment = environment
        self.base_url = "https://api.gibsonai.com/v1/-"
        self.headers = {"Authorization": f"Bearer {api_key}"}

    def create_test(self, test_name, agent_config):
        """Create a new agent test"""
        test_data = {
            "test_name": test_name,
            "agent_config": agent_config,
            "environment": self.environment,
            "created_at": datetime.now().isoformat()
        }
        response = requests.post(
            f"{self.base_url}/agent-tests",
            json=test_data,
            headers=self.headers
        )
        if response.status_code == 201:
            test_record = response.json()
            print(f"Created test: {test_name}")
            return test_record["id"]
        else:
            print(f"Failed to create test: {response.status_code}")
            return None

    def log_agent_action(self, test_id, action_type, input_data, output_data, duration):
        """Log an agent action during testing"""
        action_data = {
            "test_id": test_id,
            "action_type": action_type,
            "input_data": input_data,
            "output_data": output_data,
            "timestamp": datetime.now().isoformat(),
            "duration": duration
        }
        response = requests.post(
            f"{self.base_url}/agent-actions",
            json=action_data,
            headers=self.headers
        )
        if response.status_code == 201:
            return response.json()
        else:
            print(f"Failed to log action: {response.status_code}")
            return None

    def record_metric(self, test_id, metric_name, value):
        """Record a performance metric"""
        metric_data = {
            "test_id": test_id,
            "metric_name": metric_name,
            "value": value,
            "timestamp": datetime.now().isoformat()
        }
        response = requests.post(
            f"{self.base_url}/agent-metrics",
            json=metric_data,
            headers=self.headers
        )
        if response.status_code == 201:
            return response.json()
        else:
            print(f"Failed to record metric: {response.status_code}")
            return None

    def log_test_result(self, test_id, result_type, data, success, error_message=None):
        """Log test result"""
        result_data = {
            "test_id": test_id,
            "result_type": result_type,
            "data": data,
            "success": success,
            "error_message": error_message
        }
        response = requests.post(
            f"{self.base_url}/test-results",
            json=result_data,
            headers=self.headers
        )
        if response.status_code == 201:
            return response.json()
        else:
            print(f"Failed to log result: {response.status_code}")
            return None
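A minimal usage sketch for the class above, assuming the API key is read from a GIBSONAI_API_KEY environment variable (the variable name and configuration values are illustrative):

import os

# Point the tester at the isolated "test" environment
# (GIBSONAI_API_KEY is an assumed variable name, not a required one)
tester = AgentTester(api_key=os.environ["GIBSONAI_API_KEY"], environment="test")

# Register a test run for a hypothetical agent configuration
test_id = tester.create_test(
    "Baseline Support Agent",
    {"response_style": "conservative", "confidence_threshold": 0.8}
)

if test_id is not None:
    # Log one interaction, its timing metric, and the outcome
    tester.log_agent_action(
        test_id, "user_interaction",
        "Where is my order?", {"action": "order_lookup"}, 0.42
    )
    tester.record_metric(test_id, "response_time", 0.42)
    tester.log_test_result(
        test_id, "smoke_test", {"note": "manual check"}, success=True
    )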
Testing Different Agent Configurations
class AgentBehaviorTester:
    def __init__(self, api_key):
        self.tester = AgentTester(api_key)

    def test_response_configurations(self):
        """Test different agent response configurations"""
        # Test Configuration A: Conservative responses
        config_a = {
            "response_style": "conservative",
            "confidence_threshold": 0.8,
            "escalation_enabled": True
        }
        test_a_id = self.tester.create_test("Conservative Response Test", config_a)

        # Test Configuration B: Assertive responses
        config_b = {
            "response_style": "assertive",
            "confidence_threshold": 0.6,
            "escalation_enabled": False
        }
        test_b_id = self.tester.create_test("Assertive Response Test", config_b)

        # Run tests with same scenarios
        test_scenarios = [
            {"user_input": "I need help with my order", "expected_action": "order_lookup"},
            {"user_input": "I want to cancel my subscription", "expected_action": "cancellation_process"},
            {"user_input": "This product is defective", "expected_action": "refund_process"}
        ]
        for scenario in test_scenarios:
            # Test Configuration A
            self.run_test_scenario(test_a_id, scenario, config_a)
            # Test Configuration B
            self.run_test_scenario(test_b_id, scenario, config_b)

    def run_test_scenario(self, test_id, scenario, config):
        """Run a single test scenario"""
        start_time = time.time()
        # Simulate agent processing
        try:
            # Mock agent response based on configuration
            if config["response_style"] == "conservative":
                response = self.generate_conservative_response(scenario["user_input"])
            else:
                response = self.generate_assertive_response(scenario["user_input"])
            duration = time.time() - start_time

            # Log the action
            self.tester.log_agent_action(
                test_id,
                "user_interaction",
                scenario["user_input"],
                response,
                duration
            )

            # Record metrics
            self.tester.record_metric(test_id, "response_time", duration)
            self.tester.record_metric(test_id, "confidence_score", response.get("confidence", 0))

            # Log result
            success = response.get("action") == scenario["expected_action"]
            self.tester.log_test_result(
                test_id,
                "scenario_test",
                {"scenario": scenario, "response": response},
                success
            )
        except Exception as e:
            # Log error
            self.tester.log_test_result(
                test_id,
                "scenario_test",
                {"scenario": scenario, "error": str(e)},
                False,
                str(e)
            )

    def generate_conservative_response(self, user_input):
        """Generate conservative agent response"""
        # Mock conservative response logic
        return {
            "response": "I'd be happy to help you with that. Let me connect you with a specialist.",
            "action": "escalate",
            "confidence": 0.9
        }

    def generate_assertive_response(self, user_input):
        """Generate assertive agent response"""
        # Mock assertive response logic
        return {
            "response": "I can help you with that right away. Let me process your request.",
            "action": "direct_action",
            "confidence": 0.7
        }
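Running the comparison is then a two-line affair; a sketch, again assuming a GIBSONAI_API_KEY environment variable:

import os

# Execute the conservative vs. assertive comparison defined above
behavior_tester = AgentBehaviorTester(os.environ["GIBSONAI_API_KEY"])
behavior_tester.test_response_configurations()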
Analyzing Test Results
class TestResultAnalyzer:
    def __init__(self, api_key):
        self.api_key = api_key
        self.base_url = "https://api.gibsonai.com/v1/-"
        self.headers = {"Authorization": f"Bearer {api_key}"}

    def compare_test_performance(self, test_a_name, test_b_name):
        """Compare performance between two tests"""
        query_request = {
            "query": f"Compare average response time and success rate between tests named '{test_a_name}' and '{test_b_name}'"
        }
        response = requests.post(
            f"{self.base_url}/query",
            json=query_request,
            headers=self.headers
        )
        if response.status_code == 200:
            results = response.json()
            print("Test Performance Comparison:")
            for result in results:
                print(f"  {result}")
            return results
        else:
            print(f"Analysis failed: {response.status_code}")
            return None

    def analyze_agent_behavior_patterns(self, test_id):
        """Analyze agent behavior patterns in a test"""
        query_request = {
            "query": f"Analyze action types and response patterns for test ID {test_id}"
        }
        response = requests.post(
            f"{self.base_url}/query",
            json=query_request,
            headers=self.headers
        )
        if response.status_code == 200:
            results = response.json()
            print(f"Behavior Analysis for Test {test_id}:")
            for result in results:
                print(f"  {result}")
            return results
        else:
            print(f"Analysis failed: {response.status_code}")
            return None

    def get_error_analysis(self, test_id):
        """Get error analysis for a test"""
        query_request = {
            "query": f"Show all errors and failure patterns for test ID {test_id}"
        }
        response = requests.post(
            f"{self.base_url}/query",
            json=query_request,
            headers=self.headers
        )
        if response.status_code == 200:
            results = response.json()
            print(f"Error Analysis for Test {test_id}:")
            for result in results:
                print(f"  {result}")
            return results
        else:
            print(f"Analysis failed: {response.status_code}")
            return None

    def generate_test_report(self, test_name):
        """Generate comprehensive test report"""
        query_request = {
            "query": f"Generate a comprehensive report for test '{test_name}' including performance metrics, success rates, and error analysis"
        }
        response = requests.post(
            f"{self.base_url}/query",
            json=query_request,
            headers=self.headers
        )
        if response.status_code == 200:
            results = response.json()
            print(f"Test Report for {test_name}:")
            for result in results:
                print(f"  {result}")
            return results
        else:
            print(f"Report generation failed: {response.status_code}")
            return None
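Once a few test runs have been logged, the analyzer can be pointed at them directly. A short usage sketch (the test names match those created in the behavior tester above; the environment variable name is illustrative):

import os

analyzer = TestResultAnalyzer(os.environ["GIBSONAI_API_KEY"])

# Compare the two configurations logged earlier, then pull a full report
analyzer.compare_test_performance("Conservative Response Test", "Assertive Response Test")
analyzer.generate_test_report("Conservative Response Test")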
A/B Testing Example
class ABTestingFramework:
    def __init__(self, api_key):
        self.tester = AgentTester(api_key)
        self.analyzer = TestResultAnalyzer(api_key)

    def run_ab_test(self, test_name, config_a, config_b, scenarios):
        """Run A/B test with two configurations"""
        # Create tests for both configurations
        test_a_id = self.tester.create_test(f"{test_name}_A", config_a)
        test_b_id = self.tester.create_test(f"{test_name}_B", config_b)

        # Run scenarios for both configurations
        for scenario in scenarios:
            # Test Configuration A
            self.run_scenario_test(test_a_id, scenario, config_a)
            # Test Configuration B
            self.run_scenario_test(test_b_id, scenario, config_b)

        # Analyze results
        print(f"\nA/B Test Results for {test_name}:")
        self.analyzer.compare_test_performance(f"{test_name}_A", f"{test_name}_B")
        return test_a_id, test_b_id

    def run_scenario_test(self, test_id, scenario, config):
        """Run a single scenario test"""
        start_time = time.time()
        try:
            # Simulate agent processing based on configuration
            response = self.simulate_agent_response(scenario, config)
            duration = time.time() - start_time

            # Log action
            self.tester.log_agent_action(
                test_id,
                "scenario_test",
                scenario,
                response,
                duration
            )

            # Record metrics
            self.tester.record_metric(test_id, "response_time", duration)
            self.tester.record_metric(test_id, "confidence_score", response.get("confidence", 0))

            # Determine success
            success = response.get("error") is None

            # Log result
            self.tester.log_test_result(
                test_id,
                "ab_test_scenario",
                {"scenario": scenario, "response": response},
                success,
                response.get("error")
            )
        except Exception as e:
            # Log error
            self.tester.log_test_result(
                test_id,
                "ab_test_scenario",
                {"scenario": scenario, "error": str(e)},
                False,
                str(e)
            )

    def simulate_agent_response(self, scenario, config):
        """Simulate agent response based on configuration"""
        # Mock agent response logic
        if config.get("response_style") == "detailed":
            return {
                "response": "I'll provide detailed help with your request...",
                "confidence": 0.85,
                "action": "detailed_response"
            }
        else:
            return {
                "response": "I can help with that.",
                "confidence": 0.75,
                "action": "brief_response"
            }
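A usage sketch for the A/B framework; the configurations, the max_tokens field, and the scenario inputs are illustrative placeholders rather than required settings:

import os

ab = ABTestingFramework(os.environ["GIBSONAI_API_KEY"])

# Hypothetical detailed-vs-brief experiment
config_a = {"response_style": "detailed", "max_tokens": 512}
config_b = {"response_style": "brief", "max_tokens": 128}
scenarios = [
    {"user_input": "I need help with my order"},
    {"user_input": "This product is defective"}
]

test_a_id, test_b_id = ab.run_ab_test("response_style_test", config_a, config_b, scenarios)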
Benefits for AI Agent Testing
Comprehensive Testing
- Isolated Environments: Test different configurations without interference
- Structured Data: Organized test data for easy analysis
- Natural Language Analysis: Query test results using natural language
- Performance Tracking: Track agent performance over time
Data-Driven Insights
- Behavior Analysis: Analyze agent behavior patterns and trends
- Performance Comparison: Compare different agent configurations
- Error Identification: Identify and analyze error patterns
- Optimization Guidance: Data-driven insights for agent improvement
Scalable Testing
- Multiple Environments: Test multiple configurations simultaneously
- Flexible Schema: Adapt database schema to different testing needs
- API Integration: Easy integration with existing testing workflows
- Automated Analysis: Automated analysis and reporting capabilities
Best Practices
Test Design
- Clear Objectives: Define clear testing objectives and success criteria
- Realistic Scenarios: Use realistic test scenarios that match production usage
- Controlled Variables: Control variables to isolate the impact of changes
- Comprehensive Coverage: Test edge cases and error scenarios
Data Management
- Consistent Logging: Log all relevant data consistently across tests
- Data Quality: Ensure high-quality test data for accurate analysis
- Version Control: Track changes to test configurations and scenarios
- Data Retention: Implement appropriate data retention policies
Analysis and Reporting
- Regular Analysis: Regularly analyze test results for insights
- Comparative Analysis: Compare results across different configurations
- Trend Analysis: Track performance trends over time
- Actionable Insights: Focus on actionable insights for improvement
Getting Started
- Design Test Schema: Define your agent testing database schema
- Create Test Environment: Set up isolated database for testing
- Implement Testing Framework: Create framework for logging test data
- Run Tests: Execute tests with different agent configurations
- Analyze Results: Use natural language queries to analyze results
Gibson CLI Commands
# Create agent testing schema
gibson modify table_name "description of testing table"
gibson code models
gibson merge
# Generate models for testing integration
gibson code models
gibson code schemas
# Reset testing environment
gibson forget last
gibson build datastore
Ready to set up database environments for testing AI agent behavior? Get started with GibsonAI.