Environment: ChromeDriver 138.0.7204.168, Chrome 138.0.7204.184, Selenium 4.34.2, Windows 11

I published a spreadsheet as an iframe and embedded it in a web page, a.html:

<html>
  <head>
  </head>
  <body>
    <iframe src="https://docs.google.com/spreadsheets/d/e/2PACX-1vQaJwXqci0KQzDjzs8kvy--p80OZY1n30t4NWRh2qkU3pJqAdB4ZJEc79ohh4OkuifHWOHBRi0Z0yKS/pubhtml?gid=0&amp;single=true&amp;widget=true&amp;headers=false"></iframe>
  </body>
</html>

It looks like this:

[Screenshot: the page]

I wrote some code to collect all the domain names that a page accesses:

from urllib.parse import urlparse
import json
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service
import time


chrome_options = Options()

chrome_options.set_capability('goog:loggingPrefs',
                              {'performance': 'ALL', })

service = Service(executable_path='C:\\Users\\qi\\PycharmProjects\\PythonLearn\\DomainCollector\\chromedriver.exe')
driver = webdriver.Chrome(service=service, options=chrome_options)

test_url = "file:///C:/Users/qi/Desktop/a.html"
driver.get(test_url)
time.sleep(5)

dms = set()
logs = driver.get_log('performance')

for entry in logs:
    msg = entry['message']
    try:
        data = json.loads(msg)

        method = data['message']['method']
        if method == 'Network.requestWillBeSent':
            url = data['message']['params']['request']['url']
            domain = urlparse(url).netloc
            if domain:
                dms.add(domain)

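    except (KeyError, TypeError, ValueError):
        # Skip entries that are not valid JSON or lack the expected fields.
        continue

driver.quit()
print(f"domains:{dms}")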

Result:

domains:{'docs.google.com'}

However, DevTools shows that the page accessed other domain names as well:

[Screenshot: result in DevTools]

Why can't it get all the domain names?

1 Answer

ChromeDriver's "performance" log does not reliably include network traffic from cross-origin iframes, which Chrome runs as out-of-process iframes (OOPIFs). Setting goog:loggingPrefs to {'performance': 'ALL'} only controls the log level; it does not make ChromeDriver capture events from multiple targets such as iframes. Even with perfLoggingPrefs.enableNetwork=true, ChromeDriver's performance log is mostly limited to the top-level target.
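One way to check what the performance log actually contains is to group the requested hosts by the frameId carried in each Network.requestWillBeSent event. The following is only a minimal sketch: hosts_by_frame and _entry are hypothetical helper names, and the sample entries and frame ids are hand-written for illustration, not captured from a real run.

```python
import json
from collections import defaultdict
from urllib.parse import urlparse


def hosts_by_frame(entries):
    """Group requested host names by the frameId of each request event."""
    grouped = defaultdict(set)
    for entry in entries:
        try:
            event = json.loads(entry["message"])["message"]
            if event["method"] != "Network.requestWillBeSent":
                continue
            params = event["params"]
            host = urlparse(params["request"]["url"]).hostname
        except (KeyError, TypeError, ValueError):
            continue  # not valid JSON, or not shaped like a request event
        if host:
            # frameId is optional in the protocol, so fall back to a label.
            grouped[params.get("frameId", "(no frameId)")].add(host)
    return dict(grouped)


def _entry(method, params):
    """Build one entry shaped like an item of driver.get_log('performance')."""
    return {"message": json.dumps({"message": {"method": method, "params": params}})}


# Hand-written sample entries (assumption: made-up frame ids and URLs).
sample = [
    _entry("Network.requestWillBeSent",
           {"frameId": "MAIN", "request": {"url": "file:///C:/Users/qi/Desktop/a.html"}}),
    _entry("Network.requestWillBeSent",
           {"frameId": "CHILD", "request": {"url": "https://docs.google.com/spreadsheets/"}}),
    _entry("Network.requestWillBeSent",
           {"frameId": "CHILD", "request": {"url": "https://fonts.gstatic.com/s/font.woff2"}}),
    _entry("Page.loadEventFired", {"timestamp": 1.0}),
    {"message": "not json"},
]

for frame_id, hosts in hosts_by_frame(sample).items():
    print(frame_id, sorted(hosts))
```

With Selenium, the same function could be called as hosts_by_frame(driver.get_log('performance')). If the only host that shows up is the iframe's own document host, that would be consistent with the sub-resource requests made inside the iframe not reaching the log.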

Alternative: use Playwright

import asyncio
from urllib.parse import urlparse
from playwright.async_api import async_playwright

dms = set()
def get_request_sent(request):
    domain = urlparse(request.url).netloc
    if domain:
        dms.add(domain)

async def main():
    async with async_playwright() as p:
        browser = await p.chromium.launch(executable_path='..\\DomainCollector\\chrome-win64\\chrome.exe',
                                          headless=False)
        page = await browser.new_page()
        page.on("request", get_request_sent)

        await page.goto("file:///C:/Users/qi/Desktop/a.html")
        await page.wait_for_load_state("networkidle")
        await browser.close()


asyncio.run(main())

print(f"domains:{dms}")

Result:

domains:{'ssl.gstatic.com', 'fonts.gstatic.com', 'fonts.googleapis.com', 'docs.google.com'}
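
A side note on the domain extraction used in both scripts: urlparse(url).netloc is empty for file: and data: URLs, which is why the local a.html never appears in either result, and it keeps any explicit port. A small sketch of the difference between netloc and hostname follows; host_of is a hypothetical helper name, and the example URLs other than the a.html path are made up for illustration.

```python
from urllib.parse import urlparse


def host_of(url):
    """Return the lower-cased host name of a URL, or None if it has none."""
    return urlparse(url).hostname


examples = [
    "https://docs.google.com/spreadsheets/",       # made-up path
    "https://fonts.gstatic.com:443/s/font.woff2",  # made-up URL with a port
    "file:///C:/Users/qi/Desktop/a.html",
    "data:text/plain;base64,aGVsbG8=",
]

for url in examples:
    # netloc may carry a port; hostname never does.
    print(urlparse(url).netloc or "(empty)", "|", host_of(url))
```

Using hostname instead of netloc avoids counting the same host twice when some requests spell out the port and others do not.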

1 Comment

You are wrong: Selenium can do this through its DevTools client, which uses the same CDP (Chrome DevTools Protocol) that Playwright uses.
