The whole attack in one paragraph
The attacker stands up a real browser inside a container — a headless Chromium running under Xvfb in a Docker image, with --remote-debugging-port exposed so the operator can drive it via CDP. They point that browser at accounts.google.com and stream the rendered display to the victim over noVNC, a JavaScript VNC client that runs in the victim's own browser. The victim clicks a phishing lure, lands on what looks like a normal "shared file" page or a Google Docs invitation, and is funneled into what looks like a Google sign-in. Because it is a Google sign-in. The HTML is real Google HTML. The TLS certificate is Google's. The CAPTCHA is Google's. The 2SV push is Google's. Everything the victim sees is rendered by Google's own JavaScript inside the attacker's browser, then streamed back to the victim as a pixel feed. The victim types their email, types their password, approves the 2SV push on their phone, and Google authenticates the session — inside the attacker's browser. The attacker now holds an authenticated Gmail tab. They open the operator panel, dump the 29 session cookies, walk the victim's account through a headless OAuth grant flow that mints a refresh token scoped to mail.google.com, and they are done. They have three things: a session cookie jar, a live noVNC link to the authenticated browser, and an OAuth refresh token. The first two have a finite lifetime. The third one survives password rotation. The third one is the prize.
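The cookie-dump step is plain DevTools protocol. A minimal sketch, assuming a Chrome exposed via --remote-debugging-port; the helper names are ours, the WebSocket transport is omitted, and only the CDP message plus the filtering of its reply are shown:

```python
import json

# Six of the session cookies named in this post; the real jar holds ~29.
CORE_SESSION_COOKIES = {"SID", "HSID", "SSID", "__Secure-3PSID", "OSID", "NID"}

def get_all_cookies_message(msg_id: int = 1) -> str:
    # CDP JSON-RPC request, sent over the DevTools WebSocket that
    # --remote-debugging-port exposes. Network.getAllCookies returns
    # every cookie in the browser's jar regardless of origin.
    return json.dumps({"id": msg_id, "method": "Network.getAllCookies"})

def extract_session_jar(cdp_response: str) -> dict:
    # Keep only the high-value session cookies from the CDP reply.
    cookies = json.loads(cdp_response)["result"]["cookies"]
    return {c["name"]: c["value"] for c in cookies if c["name"] in CORE_SESSION_COOKIES}
```

The same two calls work from the defender's side: if you seize the attacker's VPS, the debugging port gives you the live jar as evidence.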
That is the attack. It is shorter than the AiTM kill chain and significantly nastier in two specific ways: the page the victim interacts with is the genuine page (so URL and certificate inspection do not help), and the FIDO2/passkey defense that breaks AiTM does not break this — the credential ceremony fires inside the attacker's browser, against Google's real origin, and Google approves it.
Why your AiTM playbook does not catch this
If you have hardened against AiTM phishing, almost none of it transfers. The reasons:
- Domain-mismatch detection does not help. The page is real Google. The attacker's domain is the wrapper the victim clicked, not what is rendered. By the time the victim sees the Google login, they are looking at pixels from accounts.google.com rendered by Google's JavaScript. URL-bar inspection on the lure page might catch it; URL-bar inspection on the rendered Google page is meaningless because that page is not loaded in the victim's browser at all.
- TLS-cert pinning, HSTS, certificate transparency. None of it applies. The TLS connection to Google is from the attacker's browser, not from the victim's. The victim's connection is to the lure domain over its own valid TLS, carrying noVNC frames.
- Phishkit-signature scanners. They look for cloned HTML matching known phishing-kit fingerprints. There is no clone. Nothing to fingerprint.
- FIDO2 and passkeys. The standard intuition is "FIDO2 binds to origin, so the credential is useless on a phishing domain." That is true for AiTM, where the attacker's reverse proxy is the origin the credential sees. In BitM, the credential ceremony fires inside the attacker's browser against the real Google origin. The authenticator signs the challenge, Google accepts it, and the session is established. The attacker now holds the authenticated browser. FIDO2 is not bypassed in any cryptographic sense — it is satisfied normally, against the real relying party, on a session the attacker controls.
- Push-2FA awareness ("only approve what you initiated"). The victim did initiate it. They typed their email into what they believe is a real Google login. They are expecting the push. They approve. Google completes the sign-in. From the user's mental model, nothing was wrong.
This is the gap. The detections in this bundle are built to catch the one thing BitM cannot hide — the noVNC stream itself — plus the OAuth artifacts the attacker leaves behind on the Google side.
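The noVNC stream is detectable because underneath the JavaScript client it is the RFB protocol (RFC 6143), and every RFB session opens with a fixed 12-byte ProtocolVersion handshake. A minimal sketch of the byte check a network sensor could key on (the function name is ours):

```python
def looks_like_rfb_handshake(payload: bytes) -> bool:
    # RFB ProtocolVersion message: exactly b"RFB xxx.yyy\n", where the
    # version fields are ASCII digits — e.g. b"RFB 003.008\n" for RFB 3.8,
    # the version noVNC speaks.
    return (
        len(payload) >= 12
        and payload[:4] == b"RFB "
        and payload[7:8] == b"."
        and payload[11:12] == b"\n"
        and payload[4:7].isdigit()
        and payload[8:11].isdigit()
    )
```

In a BitM capture this handshake rides inside a WebSocket from the lure domain, so the sensor has to inspect WebSocket payloads rather than raw TCP banners.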
The three artifacts — and why one matters more than the other two
A successful BitM capture yields three artifacts of escalating value:
1. Session cookie jar. The full set of Gmail cookies — SID, HSID, SSID, __Secure-3PSID, OSID, NID, and roughly 23 others. Lifetime depends on Google's session policy, typically minutes to hours of active replay value. On consumer accounts with 2SV-by-push (the default), Google does not bind these cookies to IP — a captured jar replayed from a fresh VPS IP succeeds without re-challenge. Enterprise Workspace accounts with Conditional Access or BeyondCorp device trust block the replay, which makes them materially more resistant.
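Whether your own tenant falls in the "blocks the replay" category is cheap to verify in a lab. A sketch of serializing a captured jar into a Cookie header for a controlled replay test against an account you own (the helper name is ours):

```python
def cookie_header(jar: dict) -> str:
    # Serialize name -> value pairs into a single Cookie request header,
    # e.g. for a lab replay from a fresh egress IP to confirm whether
    # Conditional Access / device trust actually re-challenges the session.
    return "; ".join(f"{name}={value}" for name, value in jar.items())
```

If the replay from a VPS IP succeeds without a re-challenge, your tenant behaves like consumer Gmail for this artifact.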
2. Live noVNC link. The operator can hijack the live authenticated browser session within the configured window (60 minutes default in the framework we built). This survives 2SV pushes, FIDO2 challenges, and passkeys, because the operator is the user as far as the IdP is concerned — they are sitting at the keyboard of the same browser the victim used. They can click "Approve" on push prompts, complete passkey ceremonies, and operate the account interactively.
3. OAuth refresh token. This is the one that matters. Once the framework completes the headless OAuth exchange (which it does automatically after credential capture), the operator holds a refresh token scoped to mail.google.com. It is independent of the user's password, independent of the user's session, independent of the user's IP, and independent of the user's device. The access token derived from it lasts 3600 seconds and is silently re-issued on demand. From the victim's perspective, there is no sign-in event after the initial capture — no notification, no security alert, no login log entry. The attacker reads mail invisibly, indefinitely, until the user explicitly revokes the grant at myaccount.google.com/permissions.
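For defender context, the silent re-issue is a single form POST to Google's OAuth token endpoint. A sketch that builds only the request body (the parameter names are standard OAuth 2.0; the client values here are placeholders):

```python
from urllib.parse import urlencode

TOKEN_ENDPOINT = "https://oauth2.googleapis.com/token"

def refresh_exchange_body(client_id: str, client_secret: str, refresh_token: str) -> str:
    # grant_type=refresh_token: each POST yields a ~3600-second access
    # token with no sign-in event, no notification, and no login log entry
    # on the victim's account.
    return urlencode({
        "client_id": client_id,
        "client_secret": client_secret,
        "refresh_token": refresh_token,
        "grant_type": "refresh_token",
    })
```

The asymmetry is the point: one stolen refresh token plus two client credentials equals indefinite, silent access-token minting.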
We tested this on our own account. Rotating the password did not revoke the refresh token. Signing out all sessions did not revoke the refresh token. The only thing that revoked it was an explicit visit to the OAuth permissions page and a manual click on Remove Access. Most IR runbooks do not include this step. That is the under-told story this bundle exists to tell.
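The missing IR step can be scripted as well as clicked. A sketch of building the call to Google's token revocation endpoint (only request construction is shown; sending it is one urlopen):

```python
import urllib.request
from urllib.parse import urlencode

REVOKE_ENDPOINT = "https://oauth2.googleapis.com/revoke"

def build_revoke_request(token: str) -> urllib.request.Request:
    # Accepts either an access token or a refresh token; revoking a
    # refresh token kills the grant that password rotation and
    # sign-out-all-sessions leave alive.
    return urllib.request.Request(
        REVOKE_ENDPOINT,
        data=urlencode({"token": token}).encode(),
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )
```

Workspace admins can do the equivalent tenant-wide through the Admin console's app access controls; for consumer accounts, this endpoint or the permissions page are the only levers.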
Why Gmail specifically
The technique works against any IdP that allows OAuth refresh tokens — that is to say, almost all of them. We focus on Gmail because:
- Consumer Gmail is the most common target — billions of accounts, broad social-engineering vectors (Drive shares, Docs invitations, Calendar invites)
- Workspace tenants have variable security posture — some have Conditional Access and BeyondCorp, most do not, and the latter behave like consumer Gmail for replay purposes
- The Gmail OAuth scope mail.google.com gives the attacker full mailbox access via the Gmail API, including the batch endpoint that lets them pull metadata for 100 messages per request — fast, quiet, and a detection signal we cover in the detection post
- Gmail does not currently bind session cookies to IP on consumer accounts, so cookie replay from a fresh IP succeeds with no challenge
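The batch endpoint's economics are what make the exfil quiet — a 50,000-message mailbox is only ~500 requests — but the same compression makes bursts stand out in proxy or egress logs. A hedged sketch; the log format ("minute ip path") is hypothetical and the threshold is a starting point, not a tuned value:

```python
from collections import Counter

BATCH_PATH = "/batch/gmail/v1"  # Gmail API global batch endpoint

def flag_batch_bursts(log_lines, per_minute_threshold: int = 10):
    # Bucket batch-endpoint hits by (client_ip, minute) and flag hot
    # buckets: at 100 messages per call, 10 calls/min is already a
    # 1,000-message-per-minute pull rate.
    buckets = Counter()
    for line in log_lines:
        minute, ip, path = line.split()
        if path.startswith(BATCH_PATH):
            buckets[(ip, minute)] += 1
    return sorted(k for k, v in buckets.items() if v >= per_minute_threshold)
```

Legitimate mail clients rarely touch the batch endpoint at all, which keeps the false-positive surface small.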
The same defensive primitives apply to BitM against Microsoft accounts, GitHub, Okta, Auth0, or anything else that issues OAuth refresh tokens after a sign-in. The signals are the same. The IR moves are the same. The OAuth grant audit happens at a different URL — that is the only meaningful difference.
Mapped to ATT&CK
- T1566.002 — Phishing: Spearphishing Link. The lure URL delivered to the victim. Standard phishing technique.
- T1539 — Steal Web Session Cookie. The session cookie jar captured from the authenticated browser.
- T1550.001 — Use Alternate Authentication Material: Application Access Token. The OAuth refresh token issued to the attacker's "Desktop app" client.
- T1098.001 — Account Manipulation: Additional Cloud Credentials. The OAuth grant itself functions as a long-lived cloud credential the attacker added to the account.
- T1078.004 — Valid Accounts: Cloud Accounts. Subsequent access using the captured artifacts.
- T1114.002 — Email Collection: Remote Email Collection. Mailbox content pulled via the Gmail API.
The detection post fires on T1539 (network/cookie signals), T1550.001 (OAuth-side signals), and T1078.004 (post-compromise sign-in anomalies). The mitigations close the path before the capture happens.
What the attack does in practice (for defender context)
A brief inventory of attacker capabilities observed in our research lab, because precision about the threat model produces better defenses. Everything below is from direct lab observation, not theory.
- Container infrastructure. Each victim gets a fresh Chrome container (Xvfb + openbox + Chrome + socat, supervisord-managed). CDP ports 9222/9223 for the visible Chrome, 9224/9225 for a separate headless Chrome that runs the OAuth exchange. Container image carries the same Chrome version for all victims, which means the container UA is constant across captures — useful for IR if the attacker's VPS is seized.
- Lure path. The framework uses path patterns like /f/<slug> for the wrapper URL. Detection signal: short randomized slug under a single path segment, on a domain that resolves to a single VPS IP rather than a CDN.
- Operator panel. Once a victim is captured, the operator gets a fully functional Gmail clone in their own browser populated with the victim's live mailbox. They can read every folder, compose and send mail as the victim, and pull full MIME bodies including attachments. This is persistent authenticated identity, not "we got a cookie."
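The wrapper-URL shape is narrow enough to pattern-match in proxy logs. A sketch; the slug alphabet and the 4-16 length bounds are our assumptions about "short randomized slug," so tune them to what you actually observe, and pair the match with the single-VPS-IP (non-CDN) resolution check:

```python
import re

# Single path segment under /f/ with a short randomized slug.
# Charset and length bounds are assumed, not framework-confirmed.
LURE_PATH = re.compile(r"^/f/[A-Za-z0-9_-]{4,16}$")

def is_suspicious_lure_path(path: str) -> bool:
    # True for wrapper-shaped paths like /f/aB3xQ9; false for deeper
    # paths or ordinary file URLs.
    return bool(LURE_PATH.match(path))
```

On its own this pattern is weak (URL shorteners look similar); combined with the non-CDN resolution signal it becomes a usable triage filter.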
- Headless OAuth exchange. After credential capture, the framework spins up a separate headless Chrome (the 9224/9225 instance), drives it through Google's OAuth consent flow programmatically, accepts the consent on behalf of the victim (they are already authenticated in the original session), and exchanges the resulting authorization code for a refresh token. This happens silently within seconds of the password being typed. The user sees nothing.
The whole flow — credential capture, cookie capture, OAuth grant, refresh-token issuance — completes in under 10 seconds from the moment the victim clicks Sign in. By the time any traditional security alert could fire, the attacker already holds the refresh token: access that survives every remediation short of explicit grant revocation.
Read the detection post next — the signals you can fire on at each layer, plus the lab observations that ground the network-level detection in real artifacts.