opencode: add info about android app UI interaction
@@ -50,6 +50,54 @@ in
## Nix

When using `nix build`, append `-L` (short for `--print-build-logs`) to see the full build logs instead of only the last few log lines on failure.

If you get an error that a file can't be found, always try to `git add` the file before other troubleshooting steps: when a flake lives in a Git repository, Nix only sees files that Git tracks.

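Before reaching for `git add`, a quick way to see which files a flake in this repository would currently ignore is to list the files Git does not track. This is a minimal sketch, and the function name `untracked_files` is hypothetical:

```bash
# Hypothetical helper (sketch): list working-tree files that Git does not track,
# excluding ignored files. A flake evaluated from this Git checkout cannot see
# these files until they are `git add`ed (staging is enough, no commit needed).
untracked_files() {
  git ls-files --others --exclude-standard
}
```

Run it from the repository root and `git add` anything the build needs.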
## Android UI Interaction Workflow Summary

### 1. Taking Screenshots

```bash
adb exec-out screencap -p > /tmp/screen.png
```

Captures the current screen state as a PNG image.

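The shell redirect creates `/tmp/screen.png` even when the capture fails, for example when no device is connected. An existence check is therefore not enough. The sketch below checks adb's exit status and the 8-byte PNG signature before the file is handed to an agent. The helper names `is_png` and `capture_screen` are hypothetical:

```bash
# Hypothetical helper (sketch): succeed only if the file starts with the
# 8-byte PNG signature 89 50 4E 47 0D 0A 1A 0A.
is_png() {
  [ "$(head -c 8 "$1" | od -An -tx1 | tr -d ' \n')" = "89504e470d0a1a0a" ]
}

# Hypothetical helper (sketch): capture the screen to the given path
# (default /tmp/screen.png) and fail if adb fails or the file is not a PNG.
capture_screen() {
  local out=/tmp/screen.png
  if [ $# -gt 0 ]; then out=$1; fi
  adb exec-out screencap -p > "$out" && is_png "$out"
}
```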
### 2. Analyzing Screenshots

I delegate screenshot analysis to an explore agent rather than analyzing images directly:

```
mcp_task(subagent_type="explore", prompt="Analyze /tmp/screen.png. What screen is this? What elements are visible?")
```

The agent describes the UI, identifies elements, and estimates approximate Y coordinates. Exact positions come from the UI dump in the next step.

### 3. Getting Precise Element Coordinates

Dump the full UI hierarchy as XML with UI Automator and pull it to the host:

```bash
adb shell uiautomator dump /sdcard/ui.xml && adb pull /sdcard/ui.xml /tmp/ui.xml
```

Then grep for specific elements (`grep -P` requires GNU grep; the stock macOS grep lacks it):

```bash
# Find by text
grep -oP 'text="Login".*?bounds="[^"]*"' /tmp/ui.xml

# Find by class
grep -oP 'class="android.widget.EditText".*?bounds="[^"]*"' /tmp/ui.xml
```

Bounds format: `[left,top][right,bottom]`. Tap the center: `((left+right)/2, (top+bottom)/2)`.

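The two grep commands generalize to any attribute. The hypothetical helper below prints only the bounds of every node whose attribute equals a literal value. It uses `\Q...\E` so that regex metacharacters in the value, such as the dots in a class name, match literally. Like the commands above, it assumes GNU grep with `-P`. It also assumes that `bounds` follows the matched attribute within the same node, which matches the attribute order of UI Automator dumps:

```bash
# Hypothetical helper (sketch): print the bounds of every node whose ATTR
# equals VALUE exactly, one per line.
# Usage: find_bounds FILE ATTR VALUE   (e.g. find_bounds /tmp/ui.xml text Login)
find_bounds() {
  local file=$1 attr=$2 value=$3
  # \Q...\E quotes VALUE literally; \K drops everything before the bounds value.
  grep -oP "$attr=\"\\Q$value\\E\".*?bounds=\"\\K[^\"]*" "$file"
}
```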
### 4. Tapping Elements

```bash
adb shell input tap X Y
```

`X` and `Y` are the pixel coordinates of the element's center, computed from its bounds.

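The center formula and `input tap` can be combined into one step. The sketch below takes a bounds string from the dump and taps its center, using integer division. The names `bounds_center` and `tap_bounds` are hypothetical:

```bash
# Hypothetical helper (sketch): print the center "X Y" of a bounds string
# "[left,top][right,bottom]", using integer division.
bounds_center() {
  local re='^\[([0-9]+),([0-9]+)\]\[([0-9]+),([0-9]+)\]$'
  [[ $1 =~ $re ]] || { echo "bad bounds: $1" >&2; return 1; }
  echo "$(( (BASH_REMATCH[1] + BASH_REMATCH[3]) / 2 )) $(( (BASH_REMATCH[2] + BASH_REMATCH[4]) / 2 ))"
}

# Hypothetical helper (sketch): tap the center of the given bounds.
tap_bounds() {
  local center
  center=$(bounds_center "$1") || return 1
  # Left unquoted on purpose so "X Y" splits into two arguments.
  adb shell input tap $center
}
```

For example, `tap_bounds '[100,200][300,260]'` runs `adb shell input tap 200 230`.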
### 5. Text Input

```bash
adb shell input text "some_text"
```

Note: special shell characters need escaping (`\!`, `\;`, etc.), and spaces are entered as `%s`.

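The sketch below escapes text before passing it to `input text`. The set of escaped characters is an assumption: it covers common shell metacharacters but is not a complete list. A literal `%s` in the input would still be typed as a space. The names `adb_text_escape` and `type_text` are hypothetical:

```bash
# Hypothetical helper (sketch): backslash-escape common shell metacharacters
# (assumed set, not exhaustive), escape single quotes, then turn spaces into %s.
adb_text_escape() {
  printf '%s' "$1" |
    sed -e 's/[\\()<>|;&*~"`$!#?]/\\&/g' \
        -e "s/'/\\\\'/g" \
        -e 's/ /%s/g'
}

# Hypothetical helper (sketch): type TEXT into the focused field.
type_text() {
  adb shell input text "$(adb_text_escape "$1")"
}
```

For example, `adb_text_escape 'a;b!'` yields `a\;b\!`, and `hello world` becomes `hello%sworld`.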
### 6. Other Gestures

```bash
# Swipe/scroll
adb shell input swipe startX startY endX endY duration_ms

# Key events
adb shell input keyevent KEYCODE_BACK
adb shell input keyevent KEYCODE_ENTER
```

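Swipe coordinates are easier to get right when computed from the screen size, which `adb shell wm size` reports (e.g. `Physical size: 1080x2400`). The sketch below scrolls content down by swiping upward across the middle half of the screen. The 75%/25% positions and the 300 ms duration are arbitrary choices. The sketch also assumes that an `Override size:` line, when present, is printed last, and it uses that line in that case. The names `parse_wm_size` and `scroll_down` are hypothetical:

```bash
# Hypothetical helper (sketch): read `wm size` output on stdin and print
# "WIDTH HEIGHT" taken from its last line.
parse_wm_size() {
  tail -n 1 | sed -n 's/.*size: \([0-9]*\)x\([0-9]*\).*/\1 \2/p'
}

# Hypothetical helper (sketch): swipe from 75% to 25% of the screen height
# at mid-width over 300 ms, which scrolls the content down.
scroll_down() {
  local w h
  read -r w h <<< "$(adb shell wm size | parse_wm_size)"
  [ -n "$h" ] || return 1
  adb shell input swipe $((w / 2)) $((h * 3 / 4)) $((w / 2)) $((h / 4)) 300
}
```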
### 7. WebView Limitations

- UI Automator can see WebView content if accessibility is enabled.
- Touch events on iframe content (like Cloudflare Turnstile) often fail due to cross-origin isolation.
- Form fields in WebViews work if you use their exact bounds from the UI dump.

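Before applying these caveats, you can check whether the current dump contains a WebView at all. This is a minimal sketch. The name `has_webview` is hypothetical, and the class name `android.webkit.WebView` assumes the standard platform WebView:

```bash
# Hypothetical helper (sketch): succeed if the UI Automator dump contains
# a node whose class is android.webkit.WebView.
has_webview() {
  grep -q 'class="android.webkit.WebView"' "$1"
}
```

For example, `has_webview /tmp/ui.xml && echo "WebView on screen"`.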
### Typical Flow

1. Take screenshot → analyze with explore agent (get rough layout)
2. Dump UI hierarchy → grep for exact element bounds
3. Calculate center coordinates from bounds
4. Tap/interact
5. Wait → screenshot → verify result

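Steps 2 to 5 of this flow can be sketched as one hypothetical `tap_text` helper. It taps the first element whose text matches exactly, then captures a screenshot for verification. It reuses the paths from the steps above and assumes GNU grep. The fixed 1-second wait is an arbitrary choice that may need tuning, and the screenshot analysis itself is still left to the explore agent:

```bash
# Hypothetical helper (sketch): dump the UI, find the first node whose text
# equals TEXT, tap its center, wait, and save a screenshot for verification.
tap_text() {
  local text=$1 bounds re='^\[([0-9]+),([0-9]+)\]\[([0-9]+),([0-9]+)\]$'
  # Step 2: dump the hierarchy and pull it to the host.
  adb shell uiautomator dump /sdcard/ui.xml >/dev/null &&
    adb pull /sdcard/ui.xml /tmp/ui.xml >/dev/null || return 1
  bounds=$(grep -oP "text=\"\\Q$text\\E\".*?bounds=\"\\K[^\"]*" /tmp/ui.xml | head -n 1)
  [[ $bounds =~ $re ]] || { echo "no element with text: $text" >&2; return 1; }
  # Steps 3-4: tap the center of the bounds.
  adb shell input tap $(( (BASH_REMATCH[1] + BASH_REMATCH[3]) / 2 )) $(( (BASH_REMATCH[2] + BASH_REMATCH[4]) / 2 ))
  # Step 5: let the UI settle, then capture a screenshot to verify the result.
  sleep 1
  adb exec-out screencap -p > /tmp/screen.png
}
```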
'';
settings = {
theme = "opencode";