diff --git a/.agents/skills/frontend-testing/SKILL.md b/.agents/skills/frontend-testing/SKILL.md
index 105c979c58..7f5fb4f50a 100644
--- a/.agents/skills/frontend-testing/SKILL.md
+++ b/.agents/skills/frontend-testing/SKILL.md
@@ -43,16 +43,16 @@ Apply this skill when the user:
 
 ```bash
 # Run all tests
-pnpm test
+vp run test
 
 # Watch mode
-pnpm test:watch
+vp run test --watch
 
 # Run specific file
-pnpm test path/to/file.spec.tsx
+vp run test path/to/file.spec.tsx
 
 # Generate coverage report
-pnpm test:coverage
+vp run test --coverage
 
 # Analyze component complexity
 pnpm analyze-component <path>
@@ -159,7 +159,7 @@ describe('ComponentName', () => {
 For each file:
   ┌────────────────────────────────────────┐
   │ 1. Write test                          │
-  │ 2. Run: pnpm test <file>.spec.tsx      │
+  │ 2. Run: vp run test <file>.spec.tsx    │
   │ 3. PASS? → Mark complete, next file    │
   │    FAIL? → Fix first, then continue    │
   └────────────────────────────────────────┘
@@ -228,7 +228,10 @@ Every test should clearly separate:
 ### 2. Black-Box Testing
 
 - Test observable behavior, not implementation details
-- Use semantic queries (getByRole, getByLabelText)
+- Use semantic queries (`getByRole` with accessible `name`, `getByLabelText`, `getByPlaceholderText`, `getByText`, and scoped `within(...)`)
+- Treat `getByTestId` as a last resort. If a control cannot be found by role/name, label, landmark, or dialog scope, fix the component accessibility first instead of adding or relying on `data-testid`.
+- Remove production `data-testid` attributes when semantic selectors can cover the behavior. Keep them only for non-visual mocked boundaries, editor/browser shims such as Monaco, canvas/chart output, or third-party widgets with no accessible DOM in the test environment.
+- Do not assert decorative icons by test id. Assert the named control that contains them, or mark decorative icons `aria-hidden`.
 - Avoid testing internal state directly
 - **Prefer pattern matching over hardcoded strings** in assertions:
 
diff --git a/web/docs/test.md b/web/docs/test.md
index b7c6a5f5a3..809cba4b5d 100644
--- a/web/docs/test.md
+++ b/web/docs/test.md
@@ -16,16 +16,16 @@ When I ask you to write/refactor/fix tests, follow these rules by default.
 
 ```bash
 # Run all tests
-pnpm test
+vp run test
 
 # Watch mode
-pnpm test:watch
+vp run test --watch
 
 # Generate coverage report
-pnpm test:coverage
+vp run test --coverage
 
 # Run specific file
-pnpm test path/to/file.spec.tsx
+vp run test path/to/file.spec.tsx
 ```
 
 ## Project Test Setup
@@ -45,6 +45,8 @@ pnpm test path/to/file.spec.tsx
 
 - **Single behavior per test**: Each test verifies one user-observable behavior.
 - **Black-box first**: Assert external behavior and observable outputs, avoid internal implementation details. Prefer role-based queries (`getByRole`) and pattern matching (`/text/i`) over hardcoded string assertions.
+- **Accessibility selectors first**: Prefer `getByRole` with accessible `name`, then `getByLabelText`, `getByPlaceholderText`, `getByText`, and scoped `within(...)` queries. Avoid `getByTestId` unless the target has no user-observable semantics, such as a mocked non-visual integration boundary, virtualized/canvas output, or a third-party widget shim.
+- **Fix markup before selectors**: If a test cannot find a control by role, name, label, landmark, or dialog scope, treat that as a component accessibility problem first. Add semantic HTML, an accessible name, or an associated label instead of adding `data-testid`.
 - **Semantic naming**: Use `should <behavior> when <condition>` and group related cases with `describe(<subject or scenario>)`.
 - **AAA / Given–When–Then**: Separate Arrange, Act, and Assert clearly with code blocks or comments.
 - **Minimal but sufficient assertions**: Keep only the expectations that express the essence of the behavior.
@@ -83,6 +85,16 @@ Use `pnpm analyze-component <path>` to analyze component complexity and adopt di
 - ✅ TypeScript: No `any` types
 - ✅ **Cleanup**: `vi.clearAllMocks()` should be in `beforeEach()`, not `afterEach()`. This ensures mock call history is reset before each test, preventing test pollution when using assertions like `toHaveBeenCalledWith()` or `toHaveBeenCalledTimes()`.
 
+### Test ID Policy
+
+`data-testid` is a last resort because users and assistive technology cannot perceive it. It can hide broken semantics, such as a clickable `div` that should have been a button or an input without a label.
+
+- Prefer `screen.getByRole('button', { name: /save/i })`, `screen.getByRole('textbox', { name: /email/i })`, and scoped queries like `within(screen.getByRole('dialog', { name: /settings/i }))`.
+- Prefer `userEvent` for interactions that should match real user behavior.
+- Remove production `data-testid` attributes when they exist only to support tests and a semantic selector is available.
+- Keep `data-testid` only for justified boundaries that cannot expose stable semantics in the test environment: mocked non-visual children, browser/editor shims such as Monaco, canvas/chart output, or third-party widgets. In those cases, keep assertions focused on the boundary contract rather than internal DOM shape.
+- Do not assert decorative icons by test id. If the icon conveys meaning, give the control an accessible name and assert the control. If the icon is decorative, keep it `aria-hidden`.
+
 **⚠️ Mock components must accurately reflect actual component behavior**, especially conditional rendering based on props or state.
 
 **Rules**: