Upload Image to PR
Upload local images to a GitHub PR and embed them in the description or comments using browser automation tools.
How It Works
Since the GitHub API does not support direct image uploads, this skill uses the PR comment textarea as a staging area for GitHub's image hosting — uploading files there to obtain persistent user-attachments/assets/ URLs, then updating the PR description or posting a comment via the gh CLI.
Step 0: Resolve PR context
If the user didn't specify a PR number or URL, auto-detect it:
# Get PR number from the current branch
gh pr view --json number,url -q '"\(.number) \(.url)"'
If multiple repos or branches are involved, confirm with the user which PR to target.
Also, normalize the image paths to absolute paths. If a path contains special characters (e.g., Unicode narrow spaces from CleanShot X), copy the file to /tmp/ first:
# e.g., to handle glob-matched paths with special chars
cp /path/to/CleanShot*keyword*.png /tmp/screenshot.png
Tool Detection and Selection
Priority Order
- Playwright MCP (MCP connection,
mcp__playwright__*) — connects to existing browser, login state preserved - Chrome DevTools MCP (MCP connection,
mcp__chrome-devtools__*) — connects to existing browser, login state preserved - agent-browser (CLI via Bash — fallback, login state preserved with
--profile)
MCP-based tools connect to an already-running browser instance, so GitHub login state is automatically preserved. agent-browser can persist login state using --profile ~/.agent-browser-github.
Detection
# 1. Search for MCP-based browser tools (preferred)
ToolSearch: "browser navigate upload"
# 2. Fall back to agent-browser only if no MCP tools found
Bash: agent-browser --version
Tool Compatibility Matrix
| Operation | Playwright MCP | Chrome DevTools MCP | agent-browser (CLI/Bash) |
|---|---|---|---|
| Navigate | browser_navigate | navigate_page | agent-browser --headed open {url} |
| Snapshot | browser_snapshot | take_snapshot | agent-browser snapshot |
| Screenshot | browser_take_screenshot | take_screenshot | agent-browser screenshot {path} |
| Click | browser_click (ref) | click (uid) | agent-browser click {ref} |
| File Upload | browser_file_upload (paths) | upload_file (uid, filePath) | agent-browser upload {ref} {path} |
| JS Eval | browser_evaluate (function) | evaluate_script (function) | agent-browser eval '{js}' |
| Login State | Preserved | Preserved | Preserved with --profile |
Steps
Step 1: Navigate to PR page and check login state
Navigate to the PR page and immediately take a snapshot to verify login state.
// Playwright MCP
browser_navigate({ url: "https://github.com/{owner}/{repo}/pull/{number}" })
// Chrome DevTools MCP
navigate_page({ url: "https://github.com/{owner}/{repo}/pull/{number}", type: "url" })
// agent-browser (use --profile to persist login state)
agent-browser --headed --profile ~/.agent-browser-github open "https://github.com/{owner}/{repo}/pull/{number}"
If SSO authentication screen appears: Take a snapshot, locate the "Continue" button, and click it.
If NOT logged in (agent-browser only):
- Navigate to
https://github.com/login - Ask the user to log in manually in the headed browser window.
- Wait for user confirmation, then navigate back to the PR page.
Step 2: Locate the file upload input
Take a snapshot/screenshot and scroll to the bottom to find the comment area.
GitHub renders a file upload input in the comment form. Try these selectors in order (GitHub's UI can change — if one fails, try the next):
// Shared JS for MCP-based tools — tries multiple known selectors
() => {
const selectors = [
'input[type="file"][id*="comment"]',
'input[type="file"][id="fc-new_comment_field"]',
'#new_comment_field',
'input[type="file"]'
];
for (const sel of selectors) {
const el = document.querySelector(sel);
if (el) return { found: true, id: el.id, selector: sel };
}
return { found: false };
}
For Chrome DevTools MCP, you can also take a snapshot to find the uid of the file upload element directly.
Step 3: Upload images one by one
Upload each image file using the detected tool. Wait 2–3 seconds between uploads to allow GitHub to process each file.
For multiple images, upload them all to the same comment textarea before extracting URLs — this is more efficient than navigating between uploads.
// Chrome DevTools MCP: upload_file requires the uid of the input element
// Playwright MCP: browser_file_upload takes the element ref and file path(s) array
// agent-browser: agent-browser upload {ref} {absolute_path}
Important: Always use absolute file paths.
Step 4: Retrieve uploaded image URLs
Wait 3–5 seconds after the last upload, then read the textarea value. GitHub injects markdown image syntax like  into the textarea:
// Shared JS — tries both known textarea IDs
() => {
const ta = document.getElementById('new_comment_field')
|| document.querySelector('textarea[id*="comment"]');
return ta ? ta.value : 'textarea not found';
}
# agent-browser
agent-browser eval 'document.getElementById("new_comment_field")?.value || document.querySelector("textarea[id*=comment]")?.value || "not found"'
The response contains URLs in the format:

Extract all image URLs/markdown from the textarea value before clearing it.
Step 5: Clear the textarea (do not submit the comment)
// MCP-based tools
() => {
const ta = document.getElementById('new_comment_field')
|| document.querySelector('textarea[id*="comment"]');
if (ta) { ta.value = ""; return "cleared"; }
return "textarea not found";
}
# agent-browser
agent-browser eval 'const ta = document.getElementById("new_comment_field") || document.querySelector("textarea[id*=comment]"); if(ta){ta.value=""} "cleared"'
Step 6: Embed images in the PR
Option A — Update PR description (append images to existing body):
EXISTING_BODY=$(gh pr view {PR_NUMBER} --json body -q .body)
gh pr edit {PR_NUMBER} --body "$(printf '%s\n\n## Screenshots\n\n%s' "$EXISTING_BODY" "")"
Option B — Post as a new comment:
gh pr comment {PR_NUMBER} --body "## Screenshots
"
Use Option A by default unless the user explicitly asks for a comment, or if the PR description is already long and a comment would be cleaner.
Step 7: Verify the result
Reload the page and take a screenshot to confirm the images are displayed correctly.
Tips
- Image sizing: Control display size via HTML
<img>tags:<img width="800" alt="description" src="..." /> - Multiple images: Upload all images in one session to the same textarea; extract all URLs before clearing
- Prefer MCP tools: Always prefer Playwright or Chrome DevTools MCP over agent-browser for simpler setup
- agent-browser login persistence: Use
--profile ~/.agent-browser-githubto persist GitHub login across sessions
Troubleshooting
| Issue | Solution |
|---|---|
| Not logged in (MCP tools) | SSO screen may appear — take snapshot, find "Continue" button, click it |
| Not logged in (agent-browser) | Use --headed mode, navigate to login page, ask user to log in manually |
| Browser window not visible | For agent-browser, ensure --headed flag is used |
| File path with special characters (e.g., Unicode narrow spaces from CleanShot) | Copy file to /tmp/ with a simple name: cp /path/CleanShot*keyword*.png /tmp/screenshot.png |
| File upload fails | Ensure the file path is absolute |
| Textarea doesn't contain URLs yet | Wait 3–5 seconds after upload before running JS eval; retry once if needed |
| Textarea selector not found | GitHub UI changes occasionally — use the multi-selector JS in Step 2 to find the current element |
| Chrome DevTools MCP disconnected | Reconnect via /mcp command |
| agent-browser not found | npm install -g agent-browser && agent-browser install |
| No browser tools found | Use ToolSearch to search for available browser tools |
| PR not found / 404 | Private repos return 404 for unauthenticated users — check login state |
Notes
- GitHub
user-attachments/assets/URLs are persistent — images remain accessible even without submitting the comment - Editing the description directly in the browser UI is fragile due to GitHub UI structure changes — updating via
gh pr editis strongly preferred - Multiple images can be uploaded in a single session before extracting URLs
- MCP-based tools connect to existing browser instances, preserving cookies and login sessions