At what point should I end the session with the webdriver?

My web scraping process involves using both Scrapy and Selenium. When I run my spider, I notice the instance of my webdriver in htop.

I am unsure about when exactly I should close the webdriver in my code. Would it be best to do so after processing each link or at the end of the script?

def parse(self, response):
    # All links are already collected in self.array_links
    for link in self.array_links:
        self.driver.get(link)

        # Here I parse the products
        item = MyTestItem()
        item['test1'] = "test"
        yield item

I considered adding this code snippet:

def __del__(self):
    self.driver.quit()

to automatically close the webdriver at the end of the script. However, I still question if closing the webdriver after processing each link would be more efficient.

Any advice on this matter would be greatly appreciated. Thank you.

Answer №1

When to close the driver depends on where you open it.

When your webdriver exists within the spider's context, the best approach is to open it upon spider initiation and shut it down upon spider termination.

This can be accomplished by connecting handlers to the spider_opened and spider_closed signals:

from scrapy import Spider, signals
from selenium import webdriver


class MySpider(Spider):
    name = "spideroo"

    @classmethod
    def from_crawler(cls, crawler, *args, **kwargs):
        spider = super().from_crawler(crawler, *args, **kwargs)
        # Tie the driver's lifetime to the spider's lifetime.
        crawler.signals.connect(spider.spider_opened, signal=signals.spider_opened)
        crawler.signals.connect(spider.spider_closed, signal=signals.spider_closed)
        return spider

    def spider_opened(self, spider):
        self.driver = webdriver.Chrome()  # or whatever your driver class is

    def spider_closed(self, spider):
        # quit() ends the session and the driver process;
        # close() would only close the current browser window.
        self.driver.quit()
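
As for closing the driver after every link: that means launching a new browser for every link, which is usually the slow part. The sketch below illustrates the difference without needing Selenium installed. FakeDriver is a hypothetical stand-in for a real WebDriver (it is not part of Selenium), and both helper functions are made up for this illustration. The try/finally shows one way to make sure quit() still runs if a page load raises.

```python
class FakeDriver:
    """Hypothetical stand-in for a Selenium WebDriver; counts browser launches."""

    launches = 0

    def __init__(self):
        FakeDriver.launches += 1
        self.is_open = True

    def get(self, url):
        if not self.is_open:
            raise RuntimeError("driver has already been quit")
        return f"<html>{url}</html>"

    def quit(self):
        self.is_open = False


def scrape_with_one_driver(links, driver_factory=FakeDriver):
    """Start one driver, reuse it for every link, and always quit it at the end."""
    driver = driver_factory()
    try:
        return [driver.get(link) for link in links]
    finally:
        driver.quit()


def scrape_with_driver_per_link(links, driver_factory=FakeDriver):
    """Start and quit a fresh driver for each link (one browser launch per link)."""
    pages = []
    for link in links:
        driver = driver_factory()
        try:
            pages.append(driver.get(link))
        finally:
            driver.quit()
    return pages
```

With three links, the first function launches the stand-in driver once and the second launches it three times. With a real browser, each of those launches costs startup time, which is why keeping one driver for the whole spider run tends to be the better default.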
