Agent Evaluation Harness Creator

Trigger Phrase

Use the Agent Evaluation Harness Creator skill

Prompt


# QUALITY GATE: Agent Evaluation Harness Creator

## Purpose
Create deterministic test cases that prove whether an agent skill improves real task completion instead of just sounding clever.

## When to Use
Use this when the user needs to measure whether an agent skill improves real task completion compared with a no-skill baseline, rather than only making the output sound clever. It is designed for repeatable agent or automation work, not one-off, loosely specified prompting.

## Inputs Required
- Skill or agent being tested
- Target task
- Good output example
- Bad output example
- Pass/fail rules
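This is a minimal Python sketch of one way to capture the five required inputs and check them before writing any test cases, so the skill can report missing inputs as its test command asks. The class and method names (`SkillEvalInputs`, `missing_inputs`) are hypothetical. They are not part of any Prompt Hub or Agent Skills API.

```python
from dataclasses import dataclass, fields


@dataclass
class SkillEvalInputs:
    """The five inputs this skill asks for; field names are hypothetical."""
    skill_under_test: str = ""
    target_task: str = ""
    good_output_example: str = ""
    bad_output_example: str = ""
    pass_fail_rules: str = ""

    def missing_inputs(self) -> list[str]:
        """Return the names of inputs that are empty or whitespace-only."""
        return [f.name for f in fields(self) if not getattr(self, f.name).strip()]


inputs = SkillEvalInputs(skill_under_test="invoice-extractor", target_task="Extract totals from PDFs")
print(inputs.missing_inputs())  # → ['good_output_example', 'bad_output_example', 'pass_fail_rules']
```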

## Workflow
1. Define 5 representative tasks and 3 edge cases
2. Create objective scoring criteria
3. Specify a no-skill baseline by running the same cases without the skill
4. Record pass rate and failure patterns
5. Recommend changes only when evidence supports them
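The workflow above can be sketched in Python as a small deterministic scorer. The sketch assumes that each case is judged by required and forbidden phrases, and that the skill run and the no-skill run have already produced text outputs keyed by case id. The names (`EvalCase`, `failed_checks`, `run_suite`, `recommend`) and the 0.2 minimum gain are illustrative assumptions. Substitute whatever objective checks and threshold your pass/fail rules define.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class EvalCase:
    """One test case; phrase checks are a hypothetical stand-in for objective criteria."""
    case_id: str
    kind: str  # "representative" or "edge"
    must_include: list[str] = field(default_factory=list)
    must_not_include: list[str] = field(default_factory=list)


def failed_checks(case: EvalCase, output: str) -> list[str]:
    """Return a label for each check the output fails; an empty list means pass."""
    text = output.lower()
    failures = [f"missing:{p}" for p in case.must_include if p.lower() not in text]
    failures += [f"forbidden:{p}" for p in case.must_not_include if p.lower() in text]
    return failures


def run_suite(cases: list[EvalCase], outputs: dict[str, str]) -> tuple[float, Counter]:
    """Score one run (skill or baseline) and count recurring failure patterns."""
    patterns: Counter = Counter()
    passed = 0
    for case in cases:
        failures = failed_checks(case, outputs.get(case.case_id, ""))
        patterns.update(failures)
        if not failures:
            passed += 1
    return passed / len(cases), patterns


def recommend(skill_rate: float, baseline_rate: float, min_gain: float = 0.2) -> str:
    """Recommend keeping the skill only if it beats the baseline by min_gain (assumed threshold)."""
    if skill_rate - baseline_rate >= min_gain:
        return "keep skill"
    return "insufficient evidence"


cases = [
    EvalCase("t1", "representative", must_include=["total"], must_not_include=["TODO"]),
    EvalCase("e1", "edge", must_include=["missing input"]),
]
skill_outputs = {"t1": "Total: 42", "e1": "Missing input: bad output example"}
baseline_outputs = {"t1": "Total: 42 TODO", "e1": "Here is your result"}

skill_rate, skill_failures = run_suite(cases, skill_outputs)
base_rate, base_failures = run_suite(cases, baseline_outputs)
print(skill_rate, base_rate, recommend(skill_rate, base_rate))  # → 1.0 0.0 keep skill
print(base_failures)
```

Because the scorer depends only on the recorded outputs, re-scoring the same outputs always gives the same result. Any remaining variance comes from the agent runs themselves, so the outputs are worth saving alongside the scores.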

## Output Format
Evaluation pack with test cases, scoring rubric, baseline method, and improvement backlog.
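One hedged way to make the evaluation pack automation-ready is to assemble its four declared sections into a JSON-serializable dict. The same step can reject suites that do not match the workflow's 5 representative tasks plus 3 edge cases. `build_evaluation_pack`, the dict keys, and the placeholder case data are hypothetical.

```python
import json


def build_evaluation_pack(
    test_cases: list[dict],
    scoring_rubric: dict,
    baseline_method: str,
    improvement_backlog: list[str],
) -> dict:
    """Bundle the four declared sections; enforce the 5 + 3 suite shape from the workflow."""
    kinds = [case["kind"] for case in test_cases]
    if kinds.count("representative") != 5 or kinds.count("edge") != 3:
        raise ValueError("expected 5 representative tasks and 3 edge cases")
    return {
        "test_cases": test_cases,
        "scoring_rubric": scoring_rubric,
        "baseline_method": baseline_method,
        "improvement_backlog": improvement_backlog,
    }


# Hypothetical placeholder cases; real ones come from the target task.
cases = [{"id": f"t{i}", "kind": "representative"} for i in range(1, 6)]
cases += [{"id": f"e{i}", "kind": "edge"} for i in range(1, 4)]

pack = build_evaluation_pack(
    cases,
    scoring_rubric={"pass": "all required checks met and no forbidden content"},
    baseline_method="same cases and model with the skill not loaded",
    improvement_backlog=[],  # filled only when results support a change
)
serialized = json.dumps(pack, indent=2)  # machine-readable copy for automation
print(sorted(json.loads(serialized)))  # → ['baseline_method', 'improvement_backlog', 'scoring_rubric', 'test_cases']
```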

## Quality Rules
- Ground every claim in supplied inputs or clearly mark it as an assumption.
- Prefer specific fields, examples, and decision points over generic advice.
- Include a test command and pass criteria so the skill can be evaluated.
- Keep the output usable by a human first and automation-ready second.

## Guardrails
- Do not send emails, publish posts, contact leads, change calendars, spend money, delete data, or alter production systems without explicit human approval.
- Do not invent facts, private details, legal claims, prices, or external data.
- For web or profile research, use current sources and separate evidence from inference.
- For connected tools, use the minimum permission needed and log the action taken.
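This is a minimal sketch of the approval guardrail in code. It assumes a wrapper sits between the agent and its tools. Guarded actions raise an error unless a named human has approved them, and every attempt is logged. The action labels, the approver name, and `run_action` are hypothetical placeholders, not a real tool-calling API.

```python
import logging
from typing import Optional

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("skill-actions")

# Hypothetical action labels mirroring the guardrail list; map them to your real tools.
REQUIRES_APPROVAL = {
    "send_email", "publish_post", "contact_lead", "change_calendar",
    "spend_money", "delete_data", "alter_production",
}


def run_action(action: str, approved_by: Optional[str] = None) -> str:
    """Refuse guarded actions without a named human approver; log every attempt."""
    if action in REQUIRES_APPROVAL and not approved_by:
        log.warning("blocked %s: human approval required", action)
        raise PermissionError(f"{action} requires explicit human approval")
    log.info("ran %s (approved_by=%s)", action, approved_by)
    return f"ran {action}"  # placeholder for the real, least-privilege tool call


print(run_action("read_file"))  # → ran read_file
print(run_action("send_email", approved_by="ops-lead"))  # → ran send_email
try:
    run_action("delete_data")
except PermissionError as err:
    print(err)  # → delete_data requires explicit human approval
```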

## Test Command
Run this skill on a simple example and return the expected output structure plus a list of any missing inputs.

Before & After

❌ Without this prompt

Make me an automation for agent evaluation harness creator.

✅ With this prompt

Use the Agent Evaluation Harness Creator skill. Inputs: the skill or agent being tested, target task, good output example, bad output example, and pass/fail rules. Return the test cases, scoring rubric, baseline method, and improvement backlog.

Install Instructions

Copy the prompt body into Prompt Hub as a quality gate. For Agent Skills, save it as SKILL.md or paste it into the target agent or project. For n8n, Make, or Voiceflow, use it as the build blueprint before importing any third-party JSON.

Test It

Test command:

Run Agent Evaluation Harness Creator with a tiny dummy case and verify it returns the declared output format, missing-input warnings, guardrails, and pass criteria.

Expected output:

Evaluation pack with test cases, scoring rubric, baseline method, and improvement backlog.

Pass criteria:

Passes when the output is specific, complete, safe, testable, approval-aware where needed, and usable after only light editing.
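The structural part of this check can be automated with a short Python sketch. It assumes the skill emits each section as a Markdown heading. The labels in `REQUIRED_SECTIONS` are assumptions drawn from the expected output and test command. Rename them to match the headings your skill actually produces.

```python
import re

# Assumed section labels, taken from the expected output and test command.
REQUIRED_SECTIONS = [
    "test cases", "scoring rubric", "baseline method", "improvement backlog",
    "missing inputs", "guardrails", "pass criteria",
]


def missing_sections(output: str) -> list[str]:
    """Return required sections that do not appear as a Markdown heading."""
    headings = {
        match.group(1).strip().lower()
        for match in re.finditer(r"^#+\s*(.+)$", output, flags=re.MULTILINE)
    }
    return [section for section in REQUIRED_SECTIONS if section not in headings]


sample = """# Test cases
- t1: extract the invoice total
## Scoring rubric
## Baseline method
## Improvement backlog
## Guardrails
"""
print(missing_sections(sample))  # → ['missing inputs', 'pass criteria']
```

An empty list means only that the output has the declared structure. Whether the content is specific, safe, and usable still has to be judged against the pass criteria.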

⚠️ Guardrails

Human approval is required before external sends, publishing, destructive changes, spending money, calendar booking, CRM updates that change customer status, or production system changes. Use least-privilege tool access.

📁 Context File Tip

Source context: Sabrina Ramonov describes agents.sabrina.dev as a free library of AI agents and automations, including n8n and Make templates. Related source: https://arxiv.org/abs/2602.12670. Agent-skill structure context: https://agentskills.io/home. Security context: https://arxiv.org/abs/2510.26328.

⚡ Automation

Agent Skills

🔌 MCP-compatible

Next step: Run the Agent Skill Reviewer, then test with dummy data before connecting real tools or importing automation JSON.

👤 By HackTheSim / Bruce · v1.0.0
