Our job site Selenium web scraper fails at a particular step during its run. How can I pinpoint the root cause and resolve it?

Several months ago, I created a web scraper to track job listings for a football club. Everything was running smoothly until about a week ago when the program started experiencing multiple issues.

Despite my efforts to troubleshoot and make changes to the code (primarily moving away from CSS selectors that were becoming unreliable over time), I now find myself facing a major problem with no clear solution in sight.

The intended operation of the scraper is straightforward: it should navigate to the website, locate job postings, open them in separate tabs, extract the relevant data, apply some formatting, and then move on to the next listing until every job is documented.

However, the problem arises during the scraping process. The program successfully opens the first link and extracts its information (as shown by the output in the terminal), but it then halts at line 61, where the lookup of the "linkTitle" link fails.

I went back to an older version of the code that had worked flawlessly until recently. The sudden change in behavior is what prompted me to switch from CSS selectors to XPath selectors and to adjust the find() call, which had stopped doing its job.

Why is this error occurring within the code? What is causing the breakdown on line 61?

Modifications have been made to address these issues.

You can view the revised code here:

line 61, in <module>
    link = jobitem.find("a", class_="linkTitle")
           ^^^^^^^^^^^^^
AttributeError: 'WebElement' object has no attribute 'find'

The specific reason behind the failure at line 61 remains unclear to me. While observing the program in action, it appears to successfully identify and access links before encountering a halt in operation just prior to returning to the main tab containing job listings.

Answer №1

It looks like you're mixing Selenium WebElement methods with BeautifulSoup parsing methods.

jobitem.find("a", class_="linkTitle")
follows BeautifulSoup syntax. However, the error shows that jobitem is a Selenium WebElement, which has no find() method. In other words, you're calling a BeautifulSoup method on a Selenium WebElement object.

Running your code finds no job listings at all, most likely because the page is rendered dynamically and the class

class="item-jkL6m item-AunLv"
does not exist in the HTML, so there is little point in debugging the selectors further.

Another reason not to dig deeper is that the site has an API. Using this API gives you the data in a stable, reliable JSON format that is unaffected by the HTML structure changes that can break a scraping script.

Consider the following example (adjust parameters accordingly):

[API Example Code]

Output: Displaying the first 5 out of 31 rows returned

[Sample Output Data]
