Steps to obtain browser network logs with Python Selenium

Question

How can I retrieve the browser's network logs through Selenium so that I can troubleshoot requests and responses? Any guidance would be greatly appreciated.

I am using Selenium 3.14.0 with the latest version of Chrome.

python selenium selenium-webdriver selenium-chromedriver

Answer №1

Using Python with Selenium and Firefox

Avoid setting up a proxy unless absolutely necessary. To retrieve outbound API requests, I implemented a solution similar to this one, but in python:

test = driver.execute_script("var performance = window.performance || window.mozPerformance || window.msPerformance || window.webkitPerformance || {}; var network = performance.getEntries() || {}; return network;")

for item in test:
  print(item)

An array of dictionaries is returned.

This method allows me to monitor all network requests made. It helps me extract a parameter from one of the requests, which I can then utilize for making my own requests to the API.
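As a rough sketch of the "extract a parameter from one of the requests" step: the entries returned above are Resource Timing records, where `name` holds the request URL and `initiatorType` says what triggered it. The helper name `find_query_param`, the parameter name `session`, and the sample entries are made up for illustration; they are not part of the original answer.

```python
from urllib.parse import urlparse, parse_qs


def find_query_param(entries, param, initiator_types=("xmlhttprequest", "fetch")):
    """Return (url, value) pairs for API-style entries whose URL carries `param`."""
    matches = []
    for entry in entries:
        # Skip scripts, images, stylesheets and other non-API resources
        if entry.get("initiatorType") not in initiator_types:
            continue
        url = entry.get("name", "")
        values = parse_qs(urlparse(url).query).get(param)
        if values:
            matches.append((url, values[0]))
    return matches


# Hypothetical data shaped like the dictionaries returned by execute_script
sample_entries = [
    {"name": "https://example.com/static/app.js", "initiatorType": "script"},
    {"name": "https://example.com/api/items?session=abc123&page=2", "initiatorType": "fetch"},
]

for url, value in find_query_param(sample_entries, "session"):
    print(value, "found in", url)
```

In a real run you would pass the `test` list from the snippet above instead of `sample_entries`.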

Using Python with Selenium and Chrome

UPDATE: This approach garnered significant attention. Here's how I currently implement it with Chrome (adapted from undetected-chromedriver code):

import json
from selenium import webdriver

chrome_options = webdriver.ChromeOptions()
# Ask ChromeDriver to record performance (network) and browser console logs
chrome_options.set_capability(
    "goog:loggingPrefs", {"performance": "ALL", "browser": "ALL"}
)
driver = webdriver.Chrome(options=chrome_options)

# Visit your website, log in, etc., then:
log_entries = driver.get_log("performance")

bearer_token = None
for entry in log_entries:
    # Each entry wraps a DevTools event serialized as a JSON string
    obj = json.loads(entry.get("message"))
    message = obj.get("message")
    method = message.get("method")
    if method in ("Network.requestWillBeSentExtraInfo", "Network.requestWillBeSent"):
        try:
            for c in message["params"]["associatedCookies"]:
                if c["cookie"]["name"] == "authToken":
                    bearer_token = c["cookie"]["value"]
        except KeyError:
            # Not every request event carries associatedCookies
            pass
    print(type(message), method)
    print("--------------------------------------")

With this method you can extract tokens, API keys, and other information that your browser sends to the server.
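The cookie lookup can also be wrapped in a small function, which makes it possible to try the parsing logic without a browser. This is only a sketch: `find_cookie` is a hypothetical helper, and the sample entry is hand-written to mimic the shape of a `Network.requestWillBeSentExtraInfo` event.

```python
import json

REQUEST_EVENTS = ("Network.requestWillBeSent", "Network.requestWillBeSentExtraInfo")


def find_cookie(log_entries, cookie_name):
    """Return the value of the first cookie named `cookie_name`, or None."""
    for entry in log_entries:
        message = json.loads(entry["message"])["message"]
        if message.get("method") not in REQUEST_EVENTS:
            continue
        # associatedCookies is absent on many events, so default to an empty list
        for item in message.get("params", {}).get("associatedCookies", []):
            cookie = item.get("cookie", {})
            if cookie.get("name") == cookie_name:
                return cookie.get("value")
    return None


# Hypothetical entry shaped like the output of driver.get_log("performance")
sample_log = [
    {"message": json.dumps({"message": {
        "method": "Network.requestWillBeSentExtraInfo",
        "params": {"associatedCookies": [
            {"cookie": {"name": "authToken", "value": "secret-token"}},
        ]},
    }})},
]

print(find_cookie(sample_log, "authToken"))
```

With a live session you would call `find_cookie(driver.get_log("performance"), "authToken")` instead.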

Answer №2

Utilizing Python and ChromeDriver for Network Logs

To capture network logs through a proxy, install the browsermob-proxy Python package alongside Selenium:

pip install browsermob-proxy

Next, download the browsermobproxy zip file from

Unzip the contents into a designated folder (e.g., path/to/extracted_folder). This folder contains the necessary binary files that need to be referenced when calling Server() in your python code.

Start the proxy server and pass its address to Chrome through the driver options:

from browsermobproxy import Server
from selenium import webdriver

server = Server("path/to/extracted_folder/bin/browsermob-proxy")
server.start()
proxy = server.create_proxy()

# Configure the browser proxy in Chrome options
chrome_options = webdriver.ChromeOptions()
chrome_options.add_argument("--proxy-server={0}".format(proxy.proxy))
# `options=` replaces the deprecated `chrome_options=` keyword
browser = webdriver.Chrome(options=chrome_options)

# Tag the HAR (network log) with a name
proxy.new_har("google")

Proceed by navigating to a page using selenium

browser.get("http://www.google.co.in")

After navigating, retrieve the network logs (HAR) from the proxy:

print(proxy.har)  # network log in HAR format, returned as a Python dict

Finally, stop the proxy server and quit the browser:

server.stop()
browser.quit()
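Printing the whole HAR is usually too noisy to read. A minimal sketch of reducing it to one line per request follows, assuming the standard HAR layout (`log.entries[].request` and `.response`). The helper `summarize_har` and the sample data are illustrative only.

```python
def summarize_har(har):
    """Return a (method, url, status) tuple for every entry in a HAR dict."""
    rows = []
    for entry in har.get("log", {}).get("entries", []):
        request = entry.get("request", {})
        response = entry.get("response", {})
        rows.append((request.get("method"), request.get("url"), response.get("status")))
    return rows


# Hypothetical HAR fragment; a real one would come from proxy.har
sample_har = {"log": {"entries": [
    {"request": {"method": "GET", "url": "http://www.google.co.in/"},
     "response": {"status": 200}},
    {"request": {"method": "POST", "url": "http://www.google.co.in/log"},
     "response": {"status": 204}},
]}}

for method, url, status in summarize_har(sample_har):
    print(status, method, url)
```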

Answer №3

Consider using selenium-wire as it offers a more effective solution, including the use of undetected-chromedriver to help circumvent bot detection.

Answer №4

If you're currently using selenium version 4.11, the code below might be of assistance.

import json
from selenium import webdriver

# Initializing Chrome WebDriver with performance logging enabled
chrome_options = webdriver.ChromeOptions()
chrome_options.add_argument('--enable-logging')
chrome_options.add_argument('--log-level=0')
chrome_options.set_capability('goog:loggingPrefs', {'performance': 'ALL'})
driver = webdriver.Chrome(options=chrome_options)

# Navigating to the specified website
driver.get("https://your-website.com")

# Gathering network log entries
log_entries = driver.get_log("performance")

# Setting up variables to hold the last known URL
last_known_url = None

# Setting up lists for request and response headers
request_headers_data = []
response_headers_data = []

for entry in log_entries:
    try:
        obj_serialized = entry.get("message")
        obj = json.loads(obj_serialized)
        message = obj.get("message")
        method = message.get("method")
        url = message.get("params", {}).get("documentURL")

        # Updating last known URL if available
        if url:
            last_known_url = url

        if method == 'Network.requestWillBeSentExtraInfo' or method == 'Network.requestWillBeSent':
            params = message.get('params', {})
            # requestWillBeSent nests headers under 'request'; the ExtraInfo event keeps them at the top level
            request_headers = params.get('request', {}).get('headers') or params.get('headers', {})
            # Storing request headers and last known URL in request_headers_data
            request_headers_data.append({"url": last_known_url, "headers": request_headers})

        if method == 'Network.responseReceivedExtraInfo' or method == 'Network.responseReceived':
            params = message.get('params', {})
            # responseReceived nests headers under 'response'; the ExtraInfo event keeps them at the top level
            response_headers = params.get('response', {}).get('headers') or params.get('headers', {})
            # Storing response headers and last known URL in response_headers_data
            response_headers_data.append({"url": last_known_url, "headers": response_headers})

        if method == 'Network.loadingFinished':
            # Network request has finished, now you can access request_headers_data and response_headers_data
            print("Request Headers:")
            for request_data in request_headers_data:
                print("URL:", request_data["url"])
                print(request_data["headers"])
            print("Response Headers:")
            for response_data in response_headers_data:
                print("URL:", response_data["url"])
                print(response_data["headers"])
            print('--------------------------------------')
    except Exception as e:
        raise e from None

# Closing the WebDriver
driver.quit()

Answer №5

If you prefer to view requests and responses in sequence, you can follow this approach.

import json
from selenium import webdriver

# Set up Chrome WebDriver with performance logging enabled
chrome_options = webdriver.ChromeOptions()
chrome_options.add_argument('--enable-logging')
chrome_options.add_argument('--log-level=0')
chrome_options.set_capability('goog:loggingPrefs', {'performance': 'ALL'})
driver = webdriver.Chrome(options=chrome_options)

# Access the desired website
driver.get("https://your-website.com")

# Gather network log entries
log_entries = driver.get_log("performance")

# Initialize lists for storing request and response headers
requests_data = []
responses_data = []

for entry in log_entries:
    try:
        obj_serialized = entry.get("message")
        obj = json.loads(obj_serialized)
        message = obj.get("message")
        method = message.get("method")
        params = message.get("params", {})
        # documentURL is the page that issued the request; it is used only as a fallback
        document_url = params.get("documentURL")

        if method == 'Network.requestWillBeSentExtraInfo' or method == 'Network.requestWillBeSent':
            request_payload = params.get('request', {})
            # The ExtraInfo event has no 'request' object; its headers sit at the top level of params
            request_headers = request_payload.get('headers') or params.get('headers', {})
            url = request_payload.get('url') or document_url
            # Store request headers and URL in requests_data
            requests_data.append({"url": url, "headers": request_headers})

        if method == 'Network.responseReceivedExtraInfo' or method == 'Network.responseReceived':
            response_payload = params.get('response', {})
            # The ExtraInfo event has no 'response' object; its headers sit at the top level of params
            response_headers = response_payload.get('headers') or params.get('headers', {})
            url = response_payload.get('url') or document_url
            # Store response headers and URL in responses_data
            responses_data.append({"url": url, "headers": response_headers})

    except Exception as e:
        raise e from None

# Print out the headers sequentially
for request_data, response_data in zip(requests_data, responses_data):
    print("Request URL:", request_data["url"])
    print("Request Headers:", request_data["headers"])
    print("Response URL:", response_data["url"])
    print("Response Headers:", response_data["headers"])
    print('--------------------------------------')

# Close the WebDriver
driver.quit()
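Pairing by position with zip assumes that every request has exactly one response and that both arrive in the same order, which may not hold when requests overlap, fail, or are served from cache. A more robust sketch is to group events by the `requestId` field that the DevTools Network events share. The helper names `pair_by_request_id` and `_entry`, and the sample events, are hypothetical.

```python
import json


def pair_by_request_id(log_entries):
    """Group request and response details from performance log entries by requestId."""
    exchanges = {}
    for entry in log_entries:
        message = json.loads(entry["message"])["message"]
        method = message.get("method")
        params = message.get("params", {})
        request_id = params.get("requestId")
        if request_id is None:
            continue
        if method == "Network.requestWillBeSent":
            request = params.get("request", {})
            record = exchanges.setdefault(request_id, {})
            record["url"] = request.get("url")
            record["request_headers"] = request.get("headers", {})
        elif method == "Network.responseReceived":
            response = params.get("response", {})
            record = exchanges.setdefault(request_id, {})
            record["status"] = response.get("status")
            record["response_headers"] = response.get("headers", {})
    return exchanges


def _entry(method, params):
    """Build a fake log entry shaped like the output of driver.get_log('performance')."""
    return {"message": json.dumps({"message": {"method": method, "params": params}})}


# Hypothetical events where the responses arrive in a different order than the requests
sample_log = [
    _entry("Network.requestWillBeSent",
           {"requestId": "1", "request": {"url": "https://example.com/a", "headers": {"Accept": "*/*"}}}),
    _entry("Network.requestWillBeSent",
           {"requestId": "2", "request": {"url": "https://example.com/b", "headers": {}}}),
    _entry("Network.responseReceived",
           {"requestId": "2", "response": {"status": 404, "headers": {}}}),
    _entry("Network.responseReceived",
           {"requestId": "1", "response": {"status": 200, "headers": {"Content-Type": "text/html"}}}),
]

for request_id, record in pair_by_request_id(sample_log).items():
    print(request_id, record.get("status"), record.get("url"))
```

Requests that never receive a response simply have no `status` key, so they remain visible instead of shifting every later pair.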

Answer №6

In Python Selenium 4.1.0 (the latest release when this answer was written), the usage examples for the webdriver.get_log(self, log_type) function show only four log types:

driver.get_log('browser')
driver.get_log('driver')
driver.get_log('client')
driver.get_log('server')

Performance logs are not in that list. With Chrome, driver.get_log('performance') appears to work only when performance logging has been enabled through the goog:loggingPrefs capability, as the other answers here do.

Answer №7

If you're looking to capture network logs only up until the page finishes loading without including any AJAX or asynchronous network activity during regular usage, you can utilize the Performance Log feature. Find more information on how to implement this in ChromeDriver here: http://chromedriver.chromium.org/logging/performance-log

To enable performance logging for ChromeDriver, set the logging preferences as follows (Java):

DesiredCapabilities cap = DesiredCapabilities.chrome(); 
LoggingPreferences logPrefs = new LoggingPreferences(); 
logPrefs.enable(LogType.PERFORMANCE, Level.ALL); 
cap.setCapability(CapabilityType.LOGGING_PREFS, logPrefs); 
RemoteWebDriver driver = new RemoteWebDriver(new URL("http://127.0.0.1:9515"), cap);

You can also refer to this detailed example on the chromium performance-log page which contains Java and Python code snippets for accessing Performance Logs: https://gist.github.com/klepikov/5457750

Note that this method will only fetch network requests up to the point of page completion. Any subsequent performance logs will be retained until the page reloads.

If your goal is to capture network logs asynchronously while using the web page, consider utilizing BrowserMobProxy as a proxy server for your Selenium driver. This will allow you to monitor and retrieve all network requests through BrowserMobProxy's generated HAR file: https://github.com/lightbody/browsermob-proxy#using-with-selenium

// Start the proxy
BrowserMobProxy proxy = new BrowserMobProxyServer();
proxy.start(0);

// Get the Selenium proxy object
Proxy seleniumProxy = ClientUtil.createSeleniumProxy(proxy);

// Configure it as a desired capability
DesiredCapabilities capabilities = new DesiredCapabilities();
capabilities.setCapability(CapabilityType.PROXY, seleniumProxy);

// Launch the browser
WebDriver driver = new FirefoxDriver(capabilities);

// Enable comprehensive HAR capture if necessary (refer to CaptureType for the complete list)
proxy.enableHarCaptureTypes(CaptureType.REQUEST_CONTENT, CaptureType.RESPONSE_CONTENT);

// Create a new HAR labeled "yahoo.com"
proxy.newHar("yahoo.com");

// Navigate to yahoo.com
driver.get("http://yahoo.com");

// Retrieve the captured HAR data
Har har = proxy.getHar();

Upon obtaining the HAR file, you'll have access to a JSON-like compilation of network events for further analysis.
