opencode: add info about android app UI interaction
@@ -50,6 +50,54 @@ in
## Nix
When running `nix build`, append `-L` to print the full build logs.
If a build fails because a file can't be found, always `git add` the file before trying other troubleshooting steps; when building a flake, Nix only sees files that git tracks.

## Android UI Interaction Workflow Summary
1. Taking Screenshots
adb exec-out screencap -p > /tmp/screen.png
Captures the current screen state as a PNG image.

2. Analyzing Screenshots
Delegate screenshot analysis to an explore agent rather than analyzing images directly:
mcp_task(subagent_type="explore", prompt="Analyze /tmp/screen.png. What screen is this? What elements are visible?")
The agent describes the UI, identifies the visible elements, and estimates their approximate Y coordinates.

3. Getting Precise Element Coordinates
A UI Automator dump extracts the full UI hierarchy as XML:
adb shell uiautomator dump /sdcard/ui.xml && adb pull /sdcard/ui.xml /tmp/ui.xml
Then grep for specific elements (`-P` requires GNU grep):
# Find by text
grep -oP 'text="Login".*?bounds="[^"]*"' /tmp/ui.xml
# Find by class
grep -oP 'class="android\.widget\.EditText".*?bounds="[^"]*"' /tmp/ui.xml
Bounds format: [left,top][right,bottom] → tap center: ((left+right)/2, (top+bottom)/2)

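The center calculation from a bounds string can be scripted. The following is a minimal sketch, assuming a POSIX shell on the host and non-negative coordinates; `bounds_center` is a hypothetical helper name, not part of adb.

```shell
# Hypothetical helper: convert a bounds string into tap coordinates.
# Input:  "[left,top][right,bottom]" as found in the UI dump
# Output: "X Y", the integer center of the rectangle
bounds_center() {
  # Replace every non-digit with a space, then split the result into $1..$4.
  set -- $(printf '%s' "$1" | tr -c '0-9' ' ')
  echo "$(( ($1 + $3) / 2 )) $(( ($2 + $4) / 2 ))"
}
```

For example, `adb shell input tap $(bounds_center "[100,200][300,400]")` would tap at (200, 300).
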
4. Tapping Elements
adb shell input tap X Y
X and Y are the pixel coordinates of the tap point, normally the center computed from the element's bounds.

5. Text Input
adb shell input text "some_text"
Note: special characters must be escaped for the device shell (\!, \;, etc.), and a space is entered as %s.

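Escaping by hand is error-prone, so it can help to wrap it in a function. This is a sketch under assumptions: `escape_input_text` is a hypothetical helper, it handles single-line input only, and it treats every character outside a small safe set as needing a backslash. A literal `%s` in the input will still be typed as a space.

```shell
# Hypothetical helper: prepare a string for `adb shell input text`.
# 1) Backslash-escape every character outside a conservative safe set,
#    so the device shell passes it through unchanged.
# 2) Turn each escaped space into %s, which `input text` types as a space.
escape_input_text() {
  printf '%s' "$1" \
    | sed -e 's|[^A-Za-z0-9_.,@:/+-]|\\&|g' -e 's|\\ |%s|g'
}
```

Usage: `adb shell input text "$(escape_input_text 'hi there!')"` sends `hi%sthere\!` to the device.
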
6. Other Gestures
# Swipe/scroll
adb shell input swipe startX startY endX endY duration_ms
# Key events
adb shell input keyevent KEYCODE_BACK
adb shell input keyevent KEYCODE_ENTER

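Swipe coordinates depend on the screen size, which `adb shell wm size` reports as WIDTHxHEIGHT. The sketch below derives arguments for a scroll gesture from that string; `scroll_args` is a hypothetical helper, and the 75% to 25% drag range and 300 ms duration are arbitrary assumptions to tune per app.

```shell
# Hypothetical helper: build `input swipe` arguments that scroll content up
# by dragging from 75% to 25% of the screen height along the vertical midline.
# $1 is a string containing WIDTHxHEIGHT, e.g. "Physical size: 1080x2400".
scroll_args() {
  # Keep only the numbers: $1 becomes the width, $2 the height.
  set -- $(printf '%s' "$1" | tr -c '0-9' ' ')
  echo "$(( $1 / 2 )) $(( $2 * 3 / 4 )) $(( $1 / 2 )) $(( $2 / 4 )) 300"
}
```

Usage: `adb shell input swipe $(scroll_args "$(adb shell wm size | tail -n 1)")`.
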
7. WebView Limitations
- UI Automator can see WebView content if accessibility is enabled
- Touch events on iframe content (like Cloudflare Turnstile) often fail due to cross-origin isolation
- Form fields in WebViews work if you get exact bounds from the UI dump

Typical Flow
1. Take screenshot → analyze with explore agent (get rough layout)
2. Dump UI hierarchy → grep for exact element bounds
3. Calculate center coordinates from bounds
4. Tap/interact
5. Wait → screenshot → verify result
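
Steps 2 to 5 of the flow can be sketched as a pair of shell functions. Both names (`find_bounds_by_text`, `tap_text`) are hypothetical, the one-second wait is an assumption, and the lookup only returns the first node whose text matches exactly.

```shell
# Hypothetical helper: print the bounds of the first node whose text equals $1.
# $2 is the path to a pulled UI dump (defaults to /tmp/ui.xml).
find_bounds_by_text() {
  file="$2"; [ -n "$file" ] || file=/tmp/ui.xml
  # The dump is usually one long line, so put each tag on its own line first.
  tr '>' '\n' < "$file" \
    | grep -F " text=\"$1\"" \
    | sed -n 's/.*bounds="\([^"]*\)".*/\1/p' \
    | head -n 1
}

# Hypothetical helper: dump the UI, tap the element with text $1, re-screenshot.
tap_text() {
  adb shell uiautomator dump /sdcard/ui.xml >/dev/null &&
  adb pull /sdcard/ui.xml /tmp/ui.xml >/dev/null || return 1
  bounds=$(find_bounds_by_text "$1")
  [ -n "$bounds" ] || { echo "no element with text: $1" >&2; return 1; }
  # Split "[left,top][right,bottom]" into $1..$4 and tap the center.
  set -- $(printf '%s' "$bounds" | tr -c '0-9' ' ')
  adb shell input tap "$(( ($1 + $3) / 2 ))" "$(( ($2 + $4) / 2 ))"
  sleep 1  # assumed settle time before verifying
  adb exec-out screencap -p > /tmp/screen.png
}
```

After `tap_text Login`, the fresh /tmp/screen.png can be handed to the explore agent to verify the result.
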
'';
settings = {
theme = "opencode";