Assessing external test risk
Use this skill to decide whether a branch or PR should include external e2e coverage using @pytest.mark.external_test .
This helps protect deployments that commonly involve proxies, embedded iframe contexts, CSP constraints, and other browser security boundaries.
This skill is for risk assessment and recommendation. It does not auto-mark tests unless explicitly requested.
Decision rule
Use an any-hit policy:
-
If any checklist category is hit, output Recommend external_test: Yes
-
If no categories are hit, output Recommend external_test: No
Inputs to review
-
Branch or PR diff against its base branch
-
Changed files and related tests
-
PR description (if available)
Assessment workflow
-
Gather the changed files and full diff against the base branch.
-
Evaluate each checklist category below as hit or not hit.
-
Record concrete evidence from file paths and diff snippets.
-
Produce a recommendation and specific external-test focus areas.
Checklist categories
Evaluate all categories. A single hit is enough to recommend external coverage.
Routing and URL behavior
-
Hit when changes introduce or modify Tornado or Starlette routes, server.baseUrlPath , catch-alls, request methods, URL resolution, redirects, or status codes.
Auth, cookies, CSRF, and identity binding
-
Hit when changes touch login/logout or OAuth flows, _streamlit_user , _streamlit_xsrf , CSRF/XSRF handling, server.trustedUserHeaders , or session-to-identity binding.
Websocket handshake and session transport
-
Hit when changes affect websocket handshake or subprotocols, session affinity, reconnect behavior, ping or timeout behavior, message size limits, or fragmentation.
Embedding and iframe boundary
-
Hit when changes modify host-to-guest communication (postMessage ), iframe sizing or resize behavior, iframe sandbox or allow attributes, or permissions policy behavior in embedded contexts.
Static and component asset serving
-
Hit when changes alter asset handlers, cache headers, size limits, base paths (including server.customComponentBaseUrlPath ), or proxying rules for static/component assets.
Service worker, uploads, and downloads
-
Hit when changes modify service worker registration, scope, or caching strategy; upload/download endpoints; JWT or CSRF wrapping; or download attribute behavior.
Cross-origin behavior and external networking
-
Hit when changes alter CORS allowlists, crossOrigin usage, external-origin fetches or external networks behavior, or backend URL discovery via window.__streamlit.* .
Cross-origin theming and resource discovery
-
Hit when changes introduce or modify theme/resource loading across origins (fonts, images, theme globals), CSS isolation with host pages, or manifest/asset discovery when HTML is not served by Tornado or Starlette.
SiS and Snowflake runtime dependencies
-
Hit when changes rely on or modify SiS/Snowflake runtime behavior, including running_in_sis() , get_active_session() , Snowflake connection/session semantics, or SiS-specific environment flags.
Client storage behavior
-
Hit when changes introduce or modify cookies, localStorage , or sessionStorage usage that may differ in embedded or third-party contexts.
Security headers and browser policies
- Hit when changes adjust CSP, Referrer-Policy, Permissions-Policy, or related headers that can impact embedding or resource loading.
Output format
Use this exact structure:
External test recommendation
- Recommend external_test: [Yes/No]
- Triggered categories: [List category numbers and names, or "None"]
- Evidence:
<path>: [short reason from diff]<path>: [short reason from diff]
- Suggested external_test focus areas:
- [Concrete scenario to validate externally]
- [Concrete scenario to validate externally]
- Confidence: [High/Medium/Low]
- Assumptions and gaps: [Unknowns, missing context, or why confidence is reduced]
Interpretation guidance
-
Prefer evidence over intuition. Tie each hit to concrete diff details.
-
When in doubt, err toward Yes if externally hosted or embedded behavior could diverge from local runs.
-
Keep focus areas specific and testable (route, auth handshake, iframe boundary, asset loading, SiS runtime behavior).
Examples
Example yes recommendation
Diff includes:
-
lib/streamlit/web/server/routes.py route changes
-
Cookie/XSRF handling updates in request auth middleware
-
Frontend embed code changing iframe allow attributes
Expected output:
-
Recommend external_test: Yes
-
Triggered categories include routing, auth/cookies/CSRF, and embedding boundary
-
Focus areas include external host iframe embedding + auth/session continuity checks
Example no recommendation
Diff includes:
-
Pure refactor in internal utility functions with no network, auth, embedding, storage, or runtime integration impact
-
Docs and test name cleanup only
Expected output:
-
Recommend external_test: No
-
Triggered categories: None
-
Confidence is high if no indirect integration points are touched