Steps to obtain browser network logs with Python Selenium

I'm looking for a way to retrieve the browser's network logs through Selenium so that I can troubleshoot requests and responses. Any guidance would be greatly appreciated.

I'm currently using Selenium 3.14.0 with the most recent version of the Chrome browser.

Answer №1

Using Python with Selenium and Firefox

Avoid setting up a proxy unless absolutely necessary. To retrieve outbound API requests, I implemented a solution similar to this one, but in python:

test = driver.execute_script("var performance = window.performance || window.mozPerformance || window.msPerformance || window.webkitPerformance || {}; var network = performance.getEntries() || {}; return network;")

for item in test:
  print(item)

An array of dictionaries is returned.

This method allows me to monitor all network requests made. It helps me extract a parameter from one of the requests, which I can then utilize for making my own requests to the API.
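
As a rough illustration of the "extract a parameter from one of the requests" step, here is a minimal sketch. It assumes each returned dictionary carries the request URL under the "name" key, as Resource Timing entries do; the helper name find_query_param and the sample data are hypothetical and not part of the original answer.

```python
from urllib.parse import urlparse, parse_qs


def find_query_param(entries, url_fragment, param):
    """Return the first value of `param` in an entry URL containing `url_fragment`."""
    for entry in entries:
        # Resource Timing entries store the request URL under "name"
        url = entry.get("name", "")
        if url_fragment not in url:
            continue
        values = parse_qs(urlparse(url).query).get(param)
        if values:
            return values[0]
    return None


# Hypothetical sample shaped like the dictionaries returned by execute_script
sample_entries = [
    {"name": "https://example.com/", "entryType": "navigation"},
    {
        "name": "https://example.com/api/items?session=abc123&page=2",
        "entryType": "resource",
        "initiatorType": "xmlhttprequest",
    },
]

print(find_query_param(sample_entries, "/api/", "session"))  # prints abc123
```

With a live driver, the list named test in the snippet above could be passed in place of sample_entries.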

Using Python with Selenium and Chrome

UPDATE: This approach garnered significant attention. Here's how I currently implement it with Chrome (adapted from undetected-chromedriver code):

import json
from selenium import webdriver

chrome_options = webdriver.ChromeOptions()
chrome_options.set_capability(
    "goog:loggingPrefs", {"performance": "ALL", "browser": "ALL"}
)
driver = webdriver.Chrome(options=chrome_options)

# Visit your website, log in, etc., then:
log_entries = driver.get_log("performance")

bearer_token = None
for entry in log_entries:
    # Each entry's "message" field is a JSON string wrapping a DevTools event
    obj = json.loads(entry.get("message"))
    message = obj.get("message")
    method = message.get("method")
    # Use a tuple of both names; `'a' or 'b'` inside a list only ever tests 'a'
    if method in ('Network.requestWillBeSentExtraInfo', 'Network.requestWillBeSent'):
        # Events without associatedCookies simply yield an empty list here
        for c in message.get('params', {}).get('associatedCookies', []):
            if c['cookie']['name'] == 'authToken':
                bearer_token = c['cookie']['value']
    print(type(message), method)
    print('--------------------------------------')

With this method you can extract tokens, API keys, and other information that your browser sends to the server.
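
Tokens are often sent as request headers rather than cookies. The sketch below shows one way to pull a header value out of the same performance log, assuming the DevTools event layout where Network.requestWillBeSent carries headers under params.request.headers and Network.requestWillBeSentExtraInfo carries them under params.headers. The helper name find_request_header and the sample log are hypothetical.

```python
import json


def find_request_header(log_entries, header_name):
    """Return the values of `header_name` seen in outgoing requests, in log order."""
    wanted = header_name.lower()
    found = []
    for entry in log_entries:
        message = json.loads(entry["message"])["message"]
        method = message.get("method")
        params = message.get("params", {})
        if method == "Network.requestWillBeSent":
            headers = params.get("request", {}).get("headers", {})
        elif method == "Network.requestWillBeSentExtraInfo":
            headers = params.get("headers", {})
        else:
            continue
        # Header names are case-insensitive, so compare in lower case
        for name, value in headers.items():
            if name.lower() == wanted:
                found.append(value)
    return found


# Hypothetical entries shaped like the output of driver.get_log("performance")
sample_log = [
    {"message": json.dumps({"message": {
        "method": "Network.requestWillBeSent",
        "params": {"requestId": "1", "request": {
            "url": "https://example.com/api",
            "headers": {"Authorization": "Bearer abc"}}}}})},
    {"message": json.dumps({"message": {
        "method": "Network.responseReceived",
        "params": {"requestId": "1", "response": {"status": 200, "headers": {}}}}})},
]

print(find_request_header(sample_log, "authorization"))  # prints ['Bearer abc']
```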

Answer №2

Using Python and ChromeDriver for Network Logs

To access network logs this way, install BrowserMob Proxy in addition to Selenium in Python:

pip install browsermob-proxy

Next, download the browsermobproxy zip file from

Unzip the contents into a designated folder (e.g., path/to/extracted_folder). This folder contains the necessary binary files that need to be referenced when calling Server() in your python code.

Start the browser proxy and configure it in the Chrome driver options:

from browsermobproxy import Server
from selenium import webdriver

server = Server("path/to/extracted_folder/bin/browsermob-proxy")
server.start()
proxy = server.create_proxy()

# Configure the browser proxy in Chrome options
chrome_options = webdriver.ChromeOptions()
chrome_options.add_argument("--proxy-server={0}".format(proxy.proxy))
# Pass the options with `options=`; the older `chrome_options=` keyword was deprecated and later removed
browser = webdriver.Chrome(options=chrome_options)

# Tag the HAR (network logs) with a name
proxy.new_har("google")

Then navigate to a page with Selenium:

browser.get("http://www.google.co.in")

After navigation, retrieve the network logs in JSON format from the proxy:

print(proxy.har)  # returns the network logs (HAR) as JSON

At the end, remember to stop the proxy server before quitting the driver:

server.stop()
browser.quit()
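
Printing the whole HAR is noisy, so a compact summary can help. This is a minimal sketch that assumes proxy.har yields a dictionary in the standard HAR layout (log, then entries, each with request and response); the helper name summarize_har and the sample data are hypothetical.

```python
def summarize_har(har):
    """Return (method, url, status) for every entry in a HAR dictionary."""
    rows = []
    for entry in har.get("log", {}).get("entries", []):
        request = entry.get("request", {})
        response = entry.get("response", {})
        rows.append((request.get("method"), request.get("url"), response.get("status")))
    return rows


# Hypothetical HAR fragment standing in for proxy.har
sample_har = {"log": {"entries": [
    {"request": {"method": "GET", "url": "http://www.google.co.in/"},
     "response": {"status": 200}},
    {"request": {"method": "POST", "url": "http://www.google.co.in/log"},
     "response": {"status": 204}},
]}}

for method, url, status in summarize_har(sample_har):
    print(method, status, url)
```

In the script above this would be called as summarize_har(proxy.har) before server.stop().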

Answer №3

Consider using selenium-wire as it offers a more effective solution, including the use of undetected-chromedriver to help circumvent bot detection.
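
As a rough sketch of what that looks like: selenium-wire exposes captured traffic as driver.requests, where each request has method, url, and a response that is None until one arrives. The helper below only relies on those attributes, so it is demonstrated here with stand-in objects; summarize_requests and the sample data are hypothetical names for illustration.

```python
from types import SimpleNamespace


def summarize_requests(requests):
    """Return (method, url, status) tuples; status is None while a response is pending."""
    rows = []
    for request in requests:
        response = getattr(request, "response", None)
        status = response.status_code if response is not None else None
        rows.append((request.method, request.url, status))
    return rows


# Stand-ins for selenium-wire request objects, so the sketch runs without a browser
sample_requests = [
    SimpleNamespace(method="GET", url="https://example.com/",
                    response=SimpleNamespace(status_code=200)),
    SimpleNamespace(method="GET", url="https://example.com/pending", response=None),
]

print(summarize_requests(sample_requests))

# With selenium-wire installed, the same helper would be used roughly like this:
#   from seleniumwire import webdriver
#   driver = webdriver.Chrome()
#   driver.get("https://example.com")
#   print(summarize_requests(driver.requests))
```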

Answer №4

If you're currently using selenium version 4.11, the code below might be of assistance.

import json
from selenium import webdriver

# Initializing Chrome WebDriver with performance logging enabled
chrome_options = webdriver.ChromeOptions()
chrome_options.add_argument('--enable-logging')
chrome_options.add_argument('--log-level=0')
chrome_options.set_capability('goog:loggingPrefs', {'performance': 'ALL'})
driver = webdriver.Chrome(options=chrome_options)

# Navigating to the specified website
driver.get("https://your-website.com")

# Gathering network log entries
log_entries = driver.get_log("performance")

# Setting up variables to hold the last known URL
last_known_url = None

# Setting up lists for request and response headers
request_headers_data = []
response_headers_data = []

for entry in log_entries:
    try:
        obj_serialized = entry.get("message")
        obj = json.loads(obj_serialized)
        message = obj.get("message")
        method = message.get("method")
        url = message.get("params", {}).get("documentURL")

        # Updating last known URL if available
        if url:
            last_known_url = url

        if method == 'Network.requestWillBeSentExtraInfo' or method == 'Network.requestWillBeSent':
            try:
                request_payload = message['params'].get('request', {})
                request_headers = request_payload.get('headers', {})
                # Storing request headers and last known URL in request_headers_data
                request_headers_data.append({"url": last_known_url, "headers": request_headers})
            except KeyError:
                pass

        if method == 'Network.responseReceivedExtraInfo' or method == 'Network.responseReceived':
            try:
                response_payload = message['params'].get('response', {})
                response_headers = response_payload.get('headers', {})
                # Storing response headers and last known URL in response_headers_data
                response_headers_data.append({"url": last_known_url, "headers": response_headers})
            except KeyError:
                pass

        if method == 'Network.loadingFinished':
            # Network request has finished, now you can access request_headers_data and response_headers_data
            print("Request Headers:")
            for request_data in request_headers_data:
                print("URL:", request_data["url"])
                print(request_data["headers"])
            print("Response Headers:")
            for response_data in response_headers_data:
                print("URL:", response_data["url"])
                print(response_data["headers"])
            print('--------------------------------------')
    except Exception as e:
        raise e from None

# Closing the WebDriver
driver.quit()

Answer №5

If you prefer to view requests and responses in sequence, you can follow this approach.

import json
from selenium import webdriver

# Set up Chrome WebDriver with performance logging enabled
chrome_options = webdriver.ChromeOptions()
chrome_options.add_argument('--enable-logging')
chrome_options.add_argument('--log-level=0')
chrome_options.set_capability('goog:loggingPrefs', {'performance': 'ALL'})
driver = webdriver.Chrome(options=chrome_options)

# Access the desired website
driver.get("https://your-website.com")

# Gather network log entries
log_entries = driver.get_log("performance")

# Initialize lists for storing request and response headers
requests_data = []
responses_data = []
last_known_url = None  # Keep track of the most recent URL processed

for entry in log_entries:
    try:
        obj_serialized = entry.get("message")
        obj = json.loads(obj_serialized)
        message = obj.get("message")
        method = message.get("method")
        url = message.get("params", {}).get("documentURL")

        if method == 'Network.requestWillBeSentExtraInfo' or method == 'Network.requestWillBeSent':
            try:
                request_payload = message['params'].get('request', {})
                request_headers = request_payload.get('headers', {})
                # Store request headers and latest URL in requests_data
                requests_data.append({"url": url, "headers": request_headers})
                last_known_url = url
            except KeyError:
                pass

        if method == 'Network.responseReceivedExtraInfo' or method == 'Network.responseReceived':
            try:
                response_payload = message['params'].get('response', {})
                response_headers = response_payload.get('headers', {})
                # Store response headers and latest URL in responses_data
                responses_data.append({"url": url, "headers": response_headers})
                last_known_url = url
            except KeyError:
                pass

    except Exception as e:
        raise e from None

# Print out the headers sequentially
for request_data, response_data in zip(requests_data, responses_data):
    print("Request URL:", request_data["url"])
    print("Request Headers:", request_data["headers"])
    print("Response URL:", response_data["url"])
    print("Response Headers:", response_data["headers"])
    print('--------------------------------------')

# Close the WebDriver
driver.quit()
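
Pairing the two lists by position works only if requests and responses arrive in matching order. A possible alternative, sketched below, groups events by the requestId field that DevTools network events share. It assumes the same performance-log entry format as above; the helper name pair_by_request_id and the sample events are hypothetical.

```python
import json


def pair_by_request_id(log_entries):
    """Group request and response details under the requestId they share."""
    paired = {}
    for entry in log_entries:
        message = json.loads(entry["message"])["message"]
        method = message.get("method")
        params = message.get("params", {})
        request_id = params.get("requestId")
        if request_id is None:
            continue
        if method == "Network.requestWillBeSent":
            request = params.get("request", {})
            record = paired.setdefault(request_id, {})
            record["url"] = request.get("url")
            record["request_headers"] = request.get("headers", {})
        elif method == "Network.responseReceived":
            response = params.get("response", {})
            record = paired.setdefault(request_id, {})
            record["status"] = response.get("status")
            record["response_headers"] = response.get("headers", {})
    return paired


# Hypothetical events in which the responses arrive out of order
events = [
    ("Network.requestWillBeSent", {"requestId": "A", "request": {
        "url": "https://example.com/a", "headers": {"Accept": "*/*"}}}),
    ("Network.requestWillBeSent", {"requestId": "B", "request": {
        "url": "https://example.com/b", "headers": {"Accept": "*/*"}}}),
    ("Network.responseReceived", {"requestId": "B", "response": {
        "status": 404, "headers": {}}}),
    ("Network.responseReceived", {"requestId": "A", "response": {
        "status": 200, "headers": {"Content-Type": "text/html"}}}),
]
sample_log = [{"message": json.dumps({"message": {"method": m, "params": p}})}
              for m, p in events]

for request_id, record in pair_by_request_id(sample_log).items():
    print(request_id, record["url"], record.get("status"))
```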

Answer №6

In Python Selenium 4.1.0, the most recent version at the time of writing, the documentation for webdriver.get_log(self, log_type) lists only four log types:

driver.get_log('browser')
driver.get_log('driver')
driver.get_log('client')
driver.get_log('server')

The performance log is not among them, so driver.get_log('performance') is not available by default. The Chrome-based answers on this page enable it first through the goog:loggingPrefs capability.

Answer №7

If you're looking to capture network logs only up until the page finishes loading without including any AJAX or asynchronous network activity during regular usage, you can utilize the Performance Log feature. Find more information on how to implement this in ChromeDriver here: http://chromedriver.chromium.org/logging/performance-log

To enable Performance Logging for ChromeDriver, you can follow these steps:

DesiredCapabilities cap = DesiredCapabilities.chrome(); 
LoggingPreferences logPrefs = new LoggingPreferences(); 
logPrefs.enable(LogType.PERFORMANCE, Level.ALL); 
cap.setCapability(CapabilityType.LOGGING_PREFS, logPrefs); 
RemoteWebDriver driver = new RemoteWebDriver(new URL("http://127.0.0.1:9515"), cap);

You can also refer to this detailed example on the chromium performance-log page which contains Java and Python code snippets for accessing Performance Logs: https://gist.github.com/klepikov/5457750

Note that this method will only fetch network requests up to the point of page completion. Any subsequent performance logs will be retained until the page reloads.
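
If later activity matters, one option is to poll the log several times and accumulate the results. The sketch below is a Python illustration under the assumption that each call to the log-fetching function returns only the entries buffered since the previous call. The helper collect_logs and its parameters are hypothetical; the fetch function is passed in so the example runs without a browser.

```python
import time


def collect_logs(fetch, polls=5, interval=1.0):
    """Call `fetch` repeatedly and concatenate whatever entries each call returns."""
    collected = []
    for _ in range(polls):
        collected.extend(fetch())
        time.sleep(interval)
    return collected


# Fake batches standing in for successive driver.get_log("performance") results
fake_batches = iter([["entry-1", "entry-2"], [], ["entry-3"]])
print(collect_logs(lambda: next(fake_batches), polls=3, interval=0))
# prints ['entry-1', 'entry-2', 'entry-3']
```

With a live Python driver this might be called as collect_logs(lambda: driver.get_log("performance")).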


If your goal is to capture network logs asynchronously while using the web page, consider utilizing BrowserMobProxy as a proxy server for your Selenium driver. This will allow you to monitor and retrieve all network requests through BrowserMobProxy's generated HAR file: https://github.com/lightbody/browsermob-proxy#using-with-selenium

// Start the proxy
BrowserMobProxy proxy = new BrowserMobProxyServer();
proxy.start(0);

// Get the Selenium proxy object
Proxy seleniumProxy = ClientUtil.createSeleniumProxy(proxy);

// Configure it as a desired capability
DesiredCapabilities capabilities = new DesiredCapabilities();
capabilities.setCapability(CapabilityType.PROXY, seleniumProxy);

// Launch the browser
WebDriver driver = new FirefoxDriver(capabilities);

// Enable comprehensive HAR capture if necessary (refer to CaptureType for the complete list)
proxy.enableHarCaptureTypes(CaptureType.REQUEST_CONTENT, CaptureType.RESPONSE_CONTENT);

// Create a new HAR labeled "yahoo.com"
proxy.newHar("yahoo.com");

// Navigate to yahoo.com
driver.get("http://yahoo.com");

// Retrieve the captured HAR data
Har har = proxy.getHar();

Upon obtaining the HAR file, you'll have access to a JSON-like compilation of network events for further analysis.
