diff --git a/home-manager/progs/opencode.nix b/home-manager/progs/opencode.nix index 30c2c06..feb7896 100644 --- a/home-manager/progs/opencode.nix +++ b/home-manager/progs/opencode.nix @@ -50,6 +50,54 @@ in ## Nix For using `nix build` append `-L` to get better visibility into the logs. If you get an error that a file can't be found, always try to `git add` the file before trying other troubleshooting steps. + + + ## Android UI Interaction Workflow Summary + 1. Taking Screenshots + adb exec-out screencap -p > /tmp/screen.png + Captures the current screen state as a PNG image. + + 2. Analyzing Screenshots + I delegate screenshot analysis to an explore agent rather than analyzing images directly: + mcp_task(subagent_type="explore", prompt="Analyze /tmp/screen.png. What screen is this? What elements are visible?") + The agent describes the UI, identifies elements, and estimates Y coordinates. + + 3. Getting Precise Element Coordinates + UI Automator dump - extracts the full UI hierarchy as XML: + adb shell uiautomator dump /sdcard/ui.xml && adb pull /sdcard/ui.xml /tmp/ui.xml + Then grep for specific elements: + # Find by text + grep -oP 'text="Login".*?bounds="[^"]*"' /tmp/ui.xml + # Find by class + grep -oP 'class="android.widget.EditText".*?bounds="[^"]*"' /tmp/ui.xml + Bounds format: [left,top][right,bottom] → tap center: ((left+right)/2, (top+bottom)/2) + + 4. Tapping Elements + adb shell input tap X Y + Where X, Y are pixel coordinates from the bounds. + + 5. Text Input + adb shell input text "some_text" + Note: Special characters need escaping (\!, \;, etc.) + + 6. Other Gestures + # Swipe/scroll + adb shell input swipe startX startY endX endY duration_ms + # Key events + adb shell input keyevent KEYCODE_BACK + adb shell input keyevent KEYCODE_ENTER + + 7. WebView Limitation + - UI Automator can see WebView content if accessibility is enabled + - Touch events on iframe content (like Cloudflare Turnstile) often fail due to cross-origin isolation + - Form fields in WebViews work if you get exact bounds from the UI dump + + Typical Flow + 1. Take screenshot → analyze with explore agent (get rough layout) + 2. Dump UI hierarchy → grep for exact element bounds + 3. Calculate center coordinates from bounds + 4. Tap/interact + 5. Wait → screenshot → verify result ''; settings = { theme = "opencode";