First Test

This guide takes you from zero to a passing LLM assertion test in two steps. No configuration files needed.

Prerequisites: Install @llmassert/playwright and set the OPENAI_API_KEY environment variable.

1. Create a test file

Create tests/llm.spec.ts:

import { test, expect } from "@llmassert/playwright";

test("response is grounded in source docs", async () => {
  const response = "Our return window is 30 days from purchase.";
  const context = "Returns accepted within 30 days. No restocking fee.";

  await expect(response).toBeGroundedIn(context);
});

test("response contains no PII", async () => {
  const response = "Your order #12345 has shipped.";

  await expect(response).toBeFreeOfPII();
});

Import test and expect from @llmassert/playwright, not from @playwright/test. This gives you the LLM assertion matchers. Your playwright.config.ts still uses defineConfig from @playwright/test as normal.
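
This guide needs no config file, but if your project already has a playwright.config.ts, it can stay as it is. A minimal sketch of such a file follows; the testDir value is an assumption based on the tests/llm.spec.ts path used above, so adjust it to your project layout.

```typescript
// playwright.config.ts
// Minimal sketch: the config keeps importing from @playwright/test,
// while test files import test and expect from @llmassert/playwright.
import { defineConfig } from "@playwright/test";

export default defineConfig({
  // Assumption: test files live in ./tests, as in tests/llm.spec.ts.
  testDir: "./tests",
});
```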

2. Run the test

pnpm exec playwright test tests/llm.spec.ts

You should see output like:

Running 2 tests using 1 worker

  ✓ response is grounded in source docs (2.1s)
  ✓ response contains no PII (1.8s)

  2 passed (4.2s)

That's it. You just ran LLM-powered assertions using the same expect() API you already know from Playwright.

What just happened?

Your test strings were sent to GPT-5.4-mini (the judge model) for evaluation.
The judge returned a score between 0.0 and 1.0, along with its reasoning.
The score was compared against the default threshold (0.7) to determine pass/fail.
If the judge had been unavailable, the result would have been inconclusive and the test would still have passed: provider outages never fail your CI.
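
The pass/fail decision described above can be sketched roughly as follows. This is an illustration only, not LLMAssert's actual implementation: the JudgeResult and Verdict types and the verdictFor and testPasses functions are hypothetical names, and treating a score exactly equal to the threshold as a pass is an assumption.

```typescript
// Hypothetical types for illustration; not LLMAssert's real internals.
type JudgeResult =
  | { available: true; score: number; reasoning: string }
  | { available: false };

type Verdict = "pass" | "fail" | "inconclusive";

// Default threshold stated in this guide.
const DEFAULT_THRESHOLD = 0.7;

function verdictFor(
  result: JudgeResult,
  threshold: number = DEFAULT_THRESHOLD,
): Verdict {
  // Judge unavailable (for example, a provider outage): no score to compare.
  if (!result.available) return "inconclusive";
  // Assumption: a score equal to the threshold counts as a pass.
  return result.score >= threshold ? "pass" : "fail";
}

function testPasses(verdict: Verdict): boolean {
  // Only an explicit "fail" fails the test; "inconclusive" does not.
  return verdict !== "fail";
}
```

Under these assumptions, a score of 0.9 yields "pass", a score of 0.4 yields "fail", and an unavailable judge yields "inconclusive", which still lets the test pass.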

Try more matchers

LLMAssert provides 5 assertion matchers. In addition to toBeGroundedIn and toBeFreeOfPII used above, the other three are:

// Check tone/sentiment
await expect(response).toMatchTone("professional and helpful");

// Validate output format
await expect(response).toBeFormatCompliant(
  "JSON object with fields: id (number), name (string)",
);

// Compare semantic meaning
await expect(summary).toSemanticMatch(expectedSummary);

See the Matchers section for full documentation on all 5 matchers.

Next steps

Set up the dashboard reporter — track results over time
Explore all matchers — groundedness, PII, tone, format, semantic
Configure the judge — timeouts, fallback chain, models
