Environment: ChromeDriver 138.0.7204.168, Chrome 138.0.7204.184, Selenium 4.34.2, Windows 11
I published a Google spreadsheet as an iframe and embedded it in a web page, a.html:
<html>
<head>
</head>
<body>
<iframe src="https://docs.google.com/spreadsheets/d/e/2PACX-1vQaJwXqci0KQzDjzs8kvy--p80OZY1n30t4NWRh2qkU3pJqAdB4ZJEc79ohh4OkuifHWOHBRi0Z0yKS/pubhtml?gid=0&single=true&widget=true&headers=false"></iframe>
</body>
</html>
It looks like this:
I wrote a script to collect all the domain names the page accesses:
from urllib.parse import urlparse
import json
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service
import time
chrome_options = Options()
chrome_options.set_capability('goog:loggingPrefs',
{'performance': 'ALL', })
service = Service(executable_path='C:\\Users\\qi\\PycharmProjects\\PythonLearn\\DomainCollector\\chromedriver.exe')
driver = webdriver.Chrome(service=service, options=chrome_options)
test_url = "file:///C:/Users/qi/Desktop/a.html"
driver.get(test_url)
time.sleep(5)
dms = set()
logs = driver.get_log('performance')
for entry in logs:
    msg = entry['message']
    try:
        data = json.loads(msg)
        method = data['message']['method']
        if method == 'Network.requestWillBeSent':
            url = data['message']['params']['request']['url']
            domain = urlparse(url).netloc
            if domain:
                dms.add(domain)
    except Exception:
        continue
print(f"domains:{dms}")
Result:
domains:{'docs.google.com'}
But the page also accessed other domain names:
Why doesn't the script capture all of the domain names?
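To rule out the parsing loop itself, here is a minimal sketch of the same logic as a standalone function, run against hand-made entries instead of a live browser. The names `extract_domains`, `make_entry`, and `sample_entries` are hypothetical, and the sample entries are made up to mimic the shape of the `driver.get_log('performance')` output; they are not real log data.

```python
import json
from urllib.parse import urlparse


def extract_domains(entries):
    """Collect hostnames from Network.requestWillBeSent performance-log entries."""
    domains = set()
    for entry in entries:
        try:
            message = json.loads(entry["message"])["message"]
            if message["method"] != "Network.requestWillBeSent":
                continue
            url = message["params"]["request"]["url"]
        except (KeyError, TypeError, json.JSONDecodeError):
            # Skip entries that do not have the expected shape.
            continue
        domain = urlparse(url).netloc
        if domain:  # file:// URLs have an empty netloc and are skipped
            domains.add(domain)
    return domains


def make_entry(method, url):
    # Hypothetical helper: builds an entry shaped like a performance-log record.
    payload = {"message": {"method": method, "params": {"request": {"url": url}}}}
    return {"message": json.dumps(payload)}


# Made-up sample data, not real log output.
sample_entries = [
    make_entry("Network.requestWillBeSent", "https://docs.google.com/spreadsheets/d/e/example"),
    make_entry("Network.requestWillBeSent", "https://example.com/script.js"),
    make_entry("Network.responseReceived", "https://ignored.example.org/"),
    make_entry("Network.requestWillBeSent", "file:///C:/Users/qi/Desktop/a.html"),
    {"message": "not json"},
]

print(extract_domains(sample_entries))
```

With these entries the function returns every host that appears in a `Network.requestWillBeSent` record, so the loop seems to work for whatever entries it is given. That suggests the missing domains never show up in the log that `driver.get_log('performance')` returns, rather than being dropped during parsing.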

