Use isolated database environments to test and track AI agent behavior and performance. Create separate database schemas for testing different agent configurations and compare results using natural language queries.

How it works

GibsonAI provides database environments where you can test different agent configurations, track their performance, and analyze behavior patterns. Create isolated databases for each test scenario and use natural language queries to analyze results.
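
For example, once test data lives in an isolated environment, a single natural language query against the data API returns the analysis. The sketch below is illustrative: the /query endpoint and Bearer-token header mirror the examples later on this page, and the API key and question are placeholders.

import requests

# Minimal sketch: ask a natural language question against an isolated test database.
# The endpoint and auth mirror the examples later on this page; values are placeholders.
API_KEY = "your-test-environment-api-key"
BASE_URL = "https://api.gibsonai.com/v1/-"

response = requests.post(
    f"{BASE_URL}/query",
    json={"query": "Show average response time per agent configuration for the last 10 tests"},
    headers={"Authorization": f"Bearer {API_KEY}"},
)
print(response.json())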

Key Features

Isolated Testing Environments

  • Separate Databases: Create isolated databases for each agent test
  • Independent Schemas: Give each experiment its own database schema
  • Safe Testing: Test agent behavior without affecting production data
  • Environment Comparison: Compare results across different test environments

Agent Performance Tracking

  • Behavior Logging: Track agent actions and decisions in a structured format
  • Performance Metrics: Store and analyze agent performance data
  • Response Tracking: Log agent responses and their effectiveness
  • Error Monitoring: Track errors and failure patterns

Natural Language Analysis

  • Query Testing Results: Use natural language to analyze test results
  • Performance Comparison: Compare agent performance across different scenarios
  • Behavior Analysis: Analyze agent behavior patterns and trends
  • Results Reporting: Generate reports on agent testing outcomes

Use Cases

Agent Development

Perfect for:

  • Testing new agent features and capabilities
  • Validating agent behavior in different scenarios
  • Comparing different agent configurations
  • Debugging agent issues

Performance Optimization

Enables:

  • Identifying performance bottlenecks
  • Testing different optimization strategies
  • Measuring impact of configuration changes
  • Validating performance improvements

Behavior Validation

Supports:

  • Ensuring agent responses are appropriate
  • Testing edge cases and error handling
  • Validating decision-making logic
  • Confirming compliance with requirements

Implementation Examples

Setting Up Agent Testing Environment

# Using the Gibson CLI to create the agent testing database

# Create the agent testing tables
gibson modify agent_tests "Create agent_tests table with id, test_name, agent_config, environment, created_at"
gibson modify agent_actions "Create agent_actions table with id, test_id, action_type, input_data, output_data, timestamp, duration"
gibson modify agent_metrics "Create agent_metrics table with id, test_id, metric_name, value, timestamp"
gibson modify test_results "Create test_results table with id, test_id, result_type, data, success, error_message"

# Generate models and apply the changes
gibson code models
gibson merge
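
For reference, the rest of this guide writes records shaped roughly like this into those tables. Field names follow the gibson modify descriptions above; the exact generated schema may differ.

# Illustrative record shapes for the tables created above (not the generated models themselves)
agent_test = {
    "test_name": "Conservative Response Test",
    "agent_config": {"response_style": "conservative", "confidence_threshold": 0.8},
    "environment": "test",
    "created_at": "2024-01-01T12:00:00",
}

agent_action = {
    "test_id": 1,
    "action_type": "user_interaction",
    "input_data": "I need help with my order",
    "output_data": {"response": "Let me look that up for you.", "action": "order_lookup"},
    "timestamp": "2024-01-01T12:00:01",
    "duration": 0.42,
}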

Agent Testing Framework

import requests
import json
from datetime import datetime
import time

class AgentTester:
    def __init__(self, api_key, environment="test"):
        self.api_key = api_key
        self.environment = environment
        self.base_url = "https://api.gibsonai.com/v1/-"
        self.headers = {"Authorization": f"Bearer {api_key}"}

    def create_test(self, test_name, agent_config):
        """Create a new agent test"""
        test_data = {
            "test_name": test_name,
            "agent_config": agent_config,
            "environment": self.environment,
            "created_at": datetime.now().isoformat()
        }

        response = requests.post(
            f"{self.base_url}/agent-tests",
            json=test_data,
            headers=self.headers
        )

        if response.status_code == 201:
            test_record = response.json()
            print(f"Created test: {test_name}")
            return test_record["id"]
        else:
            print(f"Failed to create test: {response.status_code}")
            return None

    def log_agent_action(self, test_id, action_type, input_data, output_data, duration):
        """Log an agent action during testing"""
        action_data = {
            "test_id": test_id,
            "action_type": action_type,
            "input_data": input_data,
            "output_data": output_data,
            "timestamp": datetime.now().isoformat(),
            "duration": duration
        }

        response = requests.post(
            f"{self.base_url}/agent-actions",
            json=action_data,
            headers=self.headers
        )

        if response.status_code == 201:
            return response.json()
        else:
            print(f"Failed to log action: {response.status_code}")
            return None

    def record_metric(self, test_id, metric_name, value):
        """Record a performance metric"""
        metric_data = {
            "test_id": test_id,
            "metric_name": metric_name,
            "value": value,
            "timestamp": datetime.now().isoformat()
        }

        response = requests.post(
            f"{self.base_url}/agent-metrics",
            json=metric_data,
            headers=self.headers
        )

        if response.status_code == 201:
            return response.json()
        else:
            print(f"Failed to record metric: {response.status_code}")
            return None

    def log_test_result(self, test_id, result_type, data, success, error_message=None):
        """Log test result"""
        result_data = {
            "test_id": test_id,
            "result_type": result_type,
            "data": data,
            "success": success,
            "error_message": error_message
        }

        response = requests.post(
            f"{self.base_url}/test-results",
            json=result_data,
            headers=self.headers
        )

        if response.status_code == 201:
            return response.json()
        else:
            print(f"Failed to log result: {response.status_code}")
            return None
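
A quick usage sketch for the class above. The API key is a placeholder, and the IDs returned depend on your environment.

# Example usage of AgentTester (placeholder API key; IDs come from your environment)
tester = AgentTester(api_key="your-api-key", environment="test")

test_id = tester.create_test(
    "Baseline Response Test",
    {"response_style": "conservative", "confidence_threshold": 0.8},
)

if test_id:
    tester.log_agent_action(
        test_id,
        "user_interaction",
        "I need help with my order",
        {"response": "Let me look that up for you.", "action": "order_lookup"},
        0.42,
    )
    tester.record_metric(test_id, "response_time", 0.42)
    tester.log_test_result(test_id, "smoke_test", {"note": "initial check"}, True)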

Testing Different Agent Configurations

class AgentBehaviorTester:
    def __init__(self, api_key):
        self.tester = AgentTester(api_key)

    def test_response_configurations(self):
        """Test different agent response configurations"""

        # Test Configuration A: Conservative responses
        config_a = {
            "response_style": "conservative",
            "confidence_threshold": 0.8,
            "escalation_enabled": True
        }

        test_a_id = self.tester.create_test("Conservative Response Test", config_a)

        # Test Configuration B: Assertive responses
        config_b = {
            "response_style": "assertive",
            "confidence_threshold": 0.6,
            "escalation_enabled": False
        }

        test_b_id = self.tester.create_test("Assertive Response Test", config_b)

        # Run tests with same scenarios
        test_scenarios = [
            {"user_input": "I need help with my order", "expected_action": "order_lookup"},
            {"user_input": "I want to cancel my subscription", "expected_action": "cancellation_process"},
            {"user_input": "This product is defective", "expected_action": "refund_process"}
        ]

        for scenario in test_scenarios:
            # Test Configuration A
            self.run_test_scenario(test_a_id, scenario, config_a)

            # Test Configuration B
            self.run_test_scenario(test_b_id, scenario, config_b)

    def run_test_scenario(self, test_id, scenario, config):
        """Run a single test scenario"""
        start_time = time.time()

        # Simulate agent processing
        try:
            # Mock agent response based on configuration
            if config["response_style"] == "conservative":
                response = self.generate_conservative_response(scenario["user_input"])
            else:
                response = self.generate_assertive_response(scenario["user_input"])

            duration = time.time() - start_time

            # Log the action
            self.tester.log_agent_action(
                test_id,
                "user_interaction",
                scenario["user_input"],
                response,
                duration
            )

            # Record metrics
            self.tester.record_metric(test_id, "response_time", duration)
            self.tester.record_metric(test_id, "confidence_score", response.get("confidence", 0))

            # Log result
            success = response.get("action") == scenario["expected_action"]
            self.tester.log_test_result(
                test_id,
                "scenario_test",
                {"scenario": scenario, "response": response},
                success
            )

        except Exception as e:
            # Log error
            self.tester.log_test_result(
                test_id,
                "scenario_test",
                {"scenario": scenario, "error": str(e)},
                False,
                str(e)
            )

    def generate_conservative_response(self, user_input):
        """Generate conservative agent response"""
        # Mock conservative response logic
        return {
            "response": "I'd be happy to help you with that. Let me connect you with a specialist.",
            "action": "escalate",
            "confidence": 0.9
        }

    def generate_assertive_response(self, user_input):
        """Generate assertive agent response"""
        # Mock assertive response logic
        return {
            "response": "I can help you with that right away. Let me process your request.",
            "action": "direct_action",
            "confidence": 0.7
        }
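
Running the comparison is then a single call; each scenario is logged under both tests so the results can be compared with the analyzer shown next. The API key is a placeholder.

# Run the conservative vs. assertive comparison (placeholder API key)
behavior_tester = AgentBehaviorTester(api_key="your-api-key")
behavior_tester.test_response_configurations()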

Analyzing Test Results

class TestResultAnalyzer:
    def __init__(self, api_key):
        self.api_key = api_key
        self.base_url = "https://api.gibsonai.com/v1/-"
        self.headers = {"Authorization": f"Bearer {api_key}"}

    def compare_test_performance(self, test_a_name, test_b_name):
        """Compare performance between two tests"""
        query_request = {
            "query": f"Compare average response time and success rate between tests named '{test_a_name}' and '{test_b_name}'"
        }

        response = requests.post(
            f"{self.base_url}/query",
            json=query_request,
            headers=self.headers
        )

        if response.status_code == 200:
            results = response.json()
            print("Test Performance Comparison:")
            for result in results:
                print(f"  {result}")
            return results
        else:
            print(f"Analysis failed: {response.status_code}")
            return None

    def analyze_agent_behavior_patterns(self, test_id):
        """Analyze agent behavior patterns in a test"""
        query_request = {
            "query": f"Analyze action types and response patterns for test ID {test_id}"
        }

        response = requests.post(
            f"{self.base_url}/query",
            json=query_request,
            headers=self.headers
        )

        if response.status_code == 200:
            results = response.json()
            print(f"Behavior Analysis for Test {test_id}:")
            for result in results:
                print(f"  {result}")
            return results
        else:
            print(f"Analysis failed: {response.status_code}")
            return None

    def get_error_analysis(self, test_id):
        """Get error analysis for a test"""
        query_request = {
            "query": f"Show all errors and failure patterns for test ID {test_id}"
        }

        response = requests.post(
            f"{self.base_url}/query",
            json=query_request,
            headers=self.headers
        )

        if response.status_code == 200:
            results = response.json()
            print(f"Error Analysis for Test {test_id}:")
            for result in results:
                print(f"  {result}")
            return results
        else:
            print(f"Analysis failed: {response.status_code}")
            return None

    def generate_test_report(self, test_name):
        """Generate comprehensive test report"""
        query_request = {
            "query": f"Generate a comprehensive report for test '{test_name}' including performance metrics, success rates, and error analysis"
        }

        response = requests.post(
            f"{self.base_url}/query",
            json=query_request,
            headers=self.headers
        )

        if response.status_code == 200:
            results = response.json()
            print(f"Test Report for {test_name}:")
            for result in results:
                print(f"  {result}")
            return results
        else:
            print(f"Report generation failed: {response.status_code}")
            return None
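
For example, after the behavior tests above have run, the analyzer can be pointed at them by name or ID. The API key and the test ID are placeholders.

# Analyze the tests created earlier (placeholder API key and test ID)
analyzer = TestResultAnalyzer(api_key="your-api-key")

analyzer.compare_test_performance("Conservative Response Test", "Assertive Response Test")
analyzer.analyze_agent_behavior_patterns(test_id=1)  # use an ID returned by create_test
analyzer.get_error_analysis(test_id=1)
analyzer.generate_test_report("Conservative Response Test")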

A/B Testing Example

class ABTestingFramework:
    def __init__(self, api_key):
        self.tester = AgentTester(api_key)
        self.analyzer = TestResultAnalyzer(api_key)

    def run_ab_test(self, test_name, config_a, config_b, scenarios):
        """Run A/B test with two configurations"""

        # Create tests for both configurations
        test_a_id = self.tester.create_test(f"{test_name}_A", config_a)
        test_b_id = self.tester.create_test(f"{test_name}_B", config_b)

        # Run scenarios for both configurations
        for scenario in scenarios:
            # Test Configuration A
            self.run_scenario_test(test_a_id, scenario, config_a)

            # Test Configuration B
            self.run_scenario_test(test_b_id, scenario, config_b)

        # Analyze results
        print(f"\nA/B Test Results for {test_name}:")
        self.analyzer.compare_test_performance(f"{test_name}_A", f"{test_name}_B")

        return test_a_id, test_b_id

    def run_scenario_test(self, test_id, scenario, config):
        """Run a single scenario test"""
        start_time = time.time()

        try:
            # Simulate agent processing based on configuration
            response = self.simulate_agent_response(scenario, config)
            duration = time.time() - start_time

            # Log action
            self.tester.log_agent_action(
                test_id,
                "scenario_test",
                scenario,
                response,
                duration
            )

            # Record metrics
            self.tester.record_metric(test_id, "response_time", duration)
            self.tester.record_metric(test_id, "confidence_score", response.get("confidence", 0))

            # Determine success
            success = response.get("error") is None

            # Log result
            self.tester.log_test_result(
                test_id,
                "ab_test_scenario",
                {"scenario": scenario, "response": response},
                success,
                response.get("error")
            )

        except Exception as e:
            # Log error
            self.tester.log_test_result(
                test_id,
                "ab_test_scenario",
                {"scenario": scenario, "error": str(e)},
                False,
                str(e)
            )

    def simulate_agent_response(self, scenario, config):
        """Simulate agent response based on configuration"""
        # Mock agent response logic
        if config.get("response_style") == "detailed":
            return {
                "response": "I'll provide detailed help with your request...",
                "confidence": 0.85,
                "action": "detailed_response"
            }
        else:
            return {
                "response": "I can help with that.",
                "confidence": 0.75,
                "action": "brief_response"
            }
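
Putting it together, an A/B run might look like the sketch below. The configurations and scenarios are illustrative, and the API key is a placeholder.

# Example A/B run (illustrative configurations and scenarios, placeholder API key)
framework = ABTestingFramework(api_key="your-api-key")

config_a = {"response_style": "detailed", "confidence_threshold": 0.8}
config_b = {"response_style": "brief", "confidence_threshold": 0.6}

scenarios = [
    {"user_input": "I need help with my order"},
    {"user_input": "I want to cancel my subscription"},
]

test_a_id, test_b_id = framework.run_ab_test("response_style_test", config_a, config_b, scenarios)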

Benefits for AI Agent Testing

Comprehensive Testing

  • Isolated Environments: Test different configurations without interference
  • Structured Data: Organized test data for easy analysis
  • Natural Language Analysis: Query test results using natural language
  • Performance Tracking: Track agent performance over time

Data-Driven Insights

  • Behavior Analysis: Analyze agent behavior patterns and trends
  • Performance Comparison: Compare different agent configurations
  • Error Identification: Identify and analyze error patterns
  • Optimization Guidance: Data-driven insights for agent improvement

Scalable Testing

  • Multiple Environments: Test multiple configurations simultaneously
  • Flexible Schema: Adapt database schema to different testing needs
  • API Integration: Easy integration with existing testing workflows
  • Automated Analysis: Automated analysis and reporting capabilities

Best Practices

Test Design

  • Clear Objectives: Define clear testing objectives and success criteria
  • Realistic Scenarios: Use realistic test scenarios that match production usage
  • Controlled Variables: Control variables to isolate the impact of changes
  • Comprehensive Coverage: Test edge cases and error scenarios

Data Management

  • Consistent Logging: Log all relevant data consistently across tests
  • Data Quality: Ensure high-quality test data for accurate analysis
  • Version Control: Track changes to test configurations and scenarios
  • Data Retention: Implement appropriate data retention policies

Analysis and Reporting

  • Regular Analysis: Regularly analyze test results for insights
  • Comparative Analysis: Compare results across different configurations
  • Trend Analysis: Track performance trends over time
  • Actionable Insights: Focus on actionable insights for improvement

Getting Started

  1. Design Test Schema: Define your agent testing database schema
  2. Create Test Environment: Set up isolated database for testing
  3. Implement Testing Framework: Create framework for logging test data
  4. Run Tests: Execute tests with different agent configurations
  5. Analyze Results: Use natural language queries to analyze results

Gibson CLI Commands

# Create agent testing schema
gibson modify table_name "description of testing table"
gibson code models
gibson merge

# Generate models for testing integration
gibson code models
gibson code schemas

# Reset testing environment
gibson forget last
gibson build datastore

Ready to set up database environments for testing AI agent behavior? Get started with GibsonAI.