Selenium is only able to retrieve a single result at a time, disregarding any other connected outcomes

Question

I recently started using Selenium and ran into a problem while scraping a website that shows 10 results per page as list items (li tags). Every item has the same structure, and when certain conditions are met I navigate to a related page to extract more content. However, after the first item has been processed, my code can no longer find the same elements in the remaining items. Here is the relevant code:

p_url = "https://www.linkedin.com/vsearch/f?keywords=BARCO%2BNV%2Bkortrijk&pt=people&page_num=5"             
driver.get(p_url)

time.sleep(5)

results = driver.find_element_by_id("results-container")
employees = results.find_elements_by_tag_name('li')

for emp in employees:
    try:

        main_emp = emp.find_element_by_css_selector("a.title.main-headline")
        name = emp.find_element_by_css_selector("a.title.main-headline").text
        href = main_emp.get_attribute("href")

        if name != "LinkedIn Member":
            location = emp.find_element_by_class_name("demographic").text
            href = main_emp.get_attribute("href")
            print(href)
            print(location)

            driver.get(href)
            exp = driver.find_element_by_id("background-experience")

            amkk = exp.find_elements_by_class_name("editable-item")

            for amk in amkk:
                him = amk.find_element_by_tag_name("header").text
                him2 = amk.find_element_by_class_name("experience-date-locale").text

                if '\n' in him:
                    a = him.split('\n')
                    print(a[0])
                    print(a[1])

                print(him2)

    except Exception as exc:
        print(exc)
        continue

The line

main_emp = emp.find_element_by_css_selector("a.title.main-headline")

in this code stops functioning after the first iteration, resulting in a

Message: stale element reference: element is not attached to the page document

error.

After researching on Stack Overflow, I found that some users suggest the content may have been removed from the DOM, while others recommend storing the results in a list. I tried copying the elements into a list like this:

emp_list = []
for i in range(len(employees)):
    emp_list.append(employees[i])

However, this did not resolve the issue.

What steps can I take to overcome this obstacle?

python-3.x selenium

Answer 1

Your selector is too broad. Getting the container by the results-container id works, but the problem is in how you collect elements from it: find_elements_by_tag_name('li') returns every li nested anywhere inside the container, so you end up with more elements than just the employees.

To fix this, use a single selector that targets only the employee items and nothing else.

employees = results.find_elements_by_css_selector("ol[id='results']>li")

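Edit: Calling driver.get(href) inside the loop loads a new page, so the elements collected from the results page are no longer attached to the current document, which is what the stale element reference error reports. To keep the list usable, consider opening each employee in a new tab, doing the work there, and closing the tab afterwards.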

Example:

for emp in employees:
    try:
        main_emp = emp.find_element_by_css_selector("a.title.main-headline")
        # Do necessary actions...

        # Open the employee link in a new tab (make sure Keys is imported:
        # from selenium.webdriver.common.keys import Keys).
        # Ctrl+Enter on a link opens its target in a new tab in most desktop
        # browsers; Ctrl+T would only open a blank tab.
        main_emp.send_keys(Keys.CONTROL + Keys.RETURN)

        # Switch focus to the new tab
        driver.switch_to.window(driver.window_handles[1])

        # Perform actions within the employee page

        # Close the opened tab
        driver.close()

        # Return to the original tab
        driver.switch_to.window(driver.window_handles[0])
    except Exception as exc:
        print(exc)

Note: On macOS, use

main_emp.send_keys(Keys.COMMAND + Keys.RETURN)
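
Another option, in line with the "store the results" suggestion mentioned in the question, is to read everything you need from the results page as plain strings before navigating anywhere. Copying the elements themselves into a list does not help, because they still point into the old page; strings such as the href and the location text stay valid after driver.get(). What follows is only a minimal sketch, assuming the selectors from the question and the answer above still match the page. collect_profile_links is a hypothetical helper name, and the find_elements_by_* calls follow the Selenium 3 style used in the question (Selenium 4 replaces them with find_elements(By.CSS_SELECTOR, ...)).

```python
def collect_profile_links(driver, skip_name="LinkedIn Member"):
    """Read name, href and location of each result as plain strings.

    Hypothetical helper: the selectors are taken from the question and
    are assumed to match the results page.
    """
    rows = driver.find_elements_by_css_selector("ol[id='results']>li")
    profiles = []
    for row in rows:
        # find_elements_* returns an empty list instead of raising,
        # so rows without a headline link are skipped quietly.
        links = row.find_elements_by_css_selector("a.title.main-headline")
        if not links:
            continue
        name = links[0].text
        if name == skip_name:
            continue  # anonymous profiles cannot be opened
        places = row.find_elements_by_class_name("demographic")
        profiles.append({
            "name": name,
            "href": links[0].get_attribute("href"),
            "location": places[0].text if places else "",
        })
    return profiles
```

Once the list is built, you can loop over it and call driver.get(profile["href"]) for each entry. Nothing in that second loop touches an element from the results page, so there is nothing left to go stale.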
