Getting Started
First Test
Write and run your first LLM assertion in under 3 minutes
This guide takes you from zero to a passing LLM assertion test in 4 steps. No configuration files needed.
Prerequisites: Install @llmassert/playwright and set your OPENAI_API_KEY environment variable.
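If you want a missing key to surface immediately rather than partway through a run, a small guard can check the environment before any test starts. The sketch below is only an illustration: `requireEnv` is a hypothetical helper, not part of @llmassert/playwright, and where you call it (for example a setup script) is up to you.

```typescript
// Hypothetical helper for illustration; not part of @llmassert/playwright.
// Returns the value of an environment variable, or throws if it is unset or blank.
function requireEnv(
  name: string,
  env: Record<string, string | undefined>,
): string {
  const value = env[name];
  if (value === undefined || value.trim() === "") {
    throw new Error(`Missing required environment variable: ${name}`);
  }
  return value;
}

// Example call (commented out so the snippet runs without a real key):
// requireEnv("OPENAI_API_KEY", process.env);
```

Passing the environment in as an argument, instead of reading `process.env` inside the function, keeps the helper easy to exercise with a plain object.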
1. Create a test file
Create tests/llm.spec.ts:
import { test, expect } from "@llmassert/playwright";

test("response is grounded in source docs", async () => {
  const response = "Our return window is 30 days from purchase.";
  const context = "Returns accepted within 30 days. No restocking fee.";
  await expect(response).toBeGroundedIn(context);
});

test("response contains no PII", async () => {
  const response = "Your order #12345 has shipped.";
  await expect(response).toBeFreeOfPII();
});

Import test and expect from @llmassert/playwright, not from @playwright/test. This gives you the LLM assertion matchers. Your playwright.config.ts still uses defineConfig from @playwright/test as normal.
2. Run the test
pnpm exec playwright test tests/llm.spec.ts

You should see output like:
Running 2 tests using 1 worker
✓ response is grounded in source docs (2.1s)
✓ response contains no PII (1.8s)
2 passed (4.2s)

That's it. You just ran LLM-powered assertions using the same expect() API you already know from Playwright.
What just happened?
- Your test strings were sent to GPT-5.4-mini (the judge model) for evaluation
- The judge returned a score between 0.0 and 1.0 with reasoning
- The score was compared against the default threshold (0.7) to determine pass/fail
- If the judge was unavailable, the result would be inconclusive and the test would pass; provider outages never fail your CI
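The decision flow in the list above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the library's implementation: the type and function names are hypothetical, and treating a score exactly equal to the threshold as a pass is an assumption this guide does not confirm.

```typescript
// Hypothetical types and names for illustration; not the @llmassert/playwright API.
type JudgeResult =
  | { kind: "scored"; score: number; reasoning: string } // score between 0.0 and 1.0
  | { kind: "unavailable" }; // the judge could not be reached

type Verdict = "pass" | "fail" | "inconclusive";

// Default threshold as stated in this guide.
const DEFAULT_THRESHOLD = 0.7;

function toVerdict(
  result: JudgeResult,
  threshold: number = DEFAULT_THRESHOLD,
): Verdict {
  // An unavailable judge yields an inconclusive result, never a failure.
  if (result.kind === "unavailable") return "inconclusive";
  // Assumption: a score equal to the threshold counts as a pass.
  return result.score >= threshold ? "pass" : "fail";
}

// Only an explicit "fail" fails the test; "inconclusive" is treated as passing.
function testPasses(verdict: Verdict): boolean {
  return verdict !== "fail";
}
```

Modeling the unavailable judge as its own case, separate from a low score, is what lets a provider outage stay distinct from a genuine assertion failure.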
Try more matchers
LLMAssert provides 5 assertion matchers:
// Check tone/sentiment
await expect(response).toMatchTone("professional and helpful");

// Validate output format
await expect(response).toBeFormatCompliant(
  "JSON object with fields: id (number), name (string)",
);

// Compare semantic meaning
await expect(summary).toSemanticMatch(expectedSummary);

See the Matchers section for full documentation on all 5 matchers.
Next steps
- Set up the dashboard reporter — track results over time
- Explore all matchers — groundedness, PII, tone, format, semantic
- Configure the judge — timeouts, fallback chain, models