Locate the text of an element that is not visible on the screen while the code is running

Question

I am learning web scraping with Selenium. For practice, I decided to extract the promotions shown on this site:

https://i.stack.imgur.com/zKEDM.jpg

This is the code I have been working on:

from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By

def get_promotion():
    '''
    Scraping data from Smiles promotions
    '''
    promotions = []
    chrome_options = Options()
    chrome_options.add_argument("--disable-notifications")
    driver = webdriver.Chrome(options=chrome_options)
    driver.implicitly_wait(10)
    driver.get('https://www.smiles.com.br/home')
    site_promotion = driver.find_elements(By.CLASS_NAME, 'swiper-slide')
    for promotion in site_promotion:
        promotions.append(
            {
                'destination': promotion.find_element(By.XPATH, './a/div/div/h3').text,
                'origin': promotion.find_element(By.XPATH, './a/div/div/h4/span[2]').text,
                'diamont_value': promotion.find_element(By.XPATH, './a/div/div/div/span[2]/p[3]').text,
                'normal_value': promotion.find_element(By.XPATH, './a/div/div/div/span[2]/p[2]').text,
            }
        )
    return promotions

The problem is that the text of the last 3 cards is not extracted, as shown below:

[
  {'destination': 'São Paulo (GRU)', 'origin': 'Maceió (MCZ)', 'diamont_value': '17.700', 'normal_value': '19.000'}, 
  {'destination': 'Rio de Janeiro (GIG)', 'origin': 'Recife (REC)', 'diamont_value': '19.500', 'normal_value': '21.000'}, 
  {'destination': 'Brasília (BSB)', 'origin': 'Recife (REC)', 'diamont_value': '17.700', 'normal_value': '19.000'}, 
  {'destination': 'Porto Seguro (BPS)', 'origin': 'Belo Horizonte (CNF)', 'diamont_value': '13.300', 'normal_value': '14.500'}, 
  {'destination': 'Goiânia (GYN)', 'origin': 'Palmas (PMW)', 'diamont_value': '11.500', 'normal_value': '12.500'}, 
  {'destination': '', 'origin': '', 'diamont_value': '', 'normal_value': ''}, 
  {'destination': '', 'origin': '', 'diamont_value': '', 'normal_value': ''}, 
  {'destination': '', 'origin': '', 'diamont_value': '', 'normal_value': ''}
]

Looking closer, I noticed that when the browser is launched by Selenium, the last 3 cards are not displayed in the window:

https://i.stack.imgur.com/5t5gx.png

The problem does not seem to be that the elements are missing. While debugging, I confirmed that the last 3 elements are present in site_promotion.

Could it be that Selenium is getting confused because the last 3 cards are not visible on the screen? If so, how can this issue be resolved?

Is there a way to retrieve text from these elements even if they are not in the current view?

I tried adding

options.add_argument("--start-maximized")

but that only resulted in an empty list of promotions.

python selenium selenium-webdriver web-scraping selenium-chromedriver

Answer 1

Try using .get_attribute('textContent') instead of .text

This changes the snippet from:

promotions.append(
    {
        'destination': promotion.find_element(By.XPATH, './a/div/div/h3').text,
        'origin': promotion.find_element(By.XPATH, './a/div/div/h4/span[2]').text,
        'diamont_value': promotion.find_element(By.XPATH, './a/div/div/div/span[2]/p[3]').text,
        'normal_value': promotion.find_element(By.XPATH, './a/div/div/div/span[2]/p[2]').text,
    }
)

to:

promotions.append(
    {
        'destination': promotion.find_element(By.XPATH, './a/div/div/h3').get_attribute('textContent'),
        'origin': promotion.find_element(By.XPATH, './a/div/div/h4/span[2]').get_attribute('textContent'),
        'diamont_value': promotion.find_element(By.XPATH, './a/div/div/div/span[2]/p[3]').get_attribute('textContent'),
        'normal_value': promotion.find_element(By.XPATH, './a/div/div/div/span[2]/p[2]').get_attribute('textContent'),
    }
)

The linked source explains the difference between the two methods. getText() (.text in Python) returns only the text that is visibly rendered, while getAttribute() (get_attribute() in Python) returns the value of a named attribute or property of the element.

Here the last cards are present in the HTML source but are off-screen, so Selenium apparently treats them as not displayed and .text returns an empty string for them. get_attribute('textContent') reads the DOM textContent property, which does not depend on visibility, so it returns the text of those cards as well.

I ran into a similar issue on a website whose data was included in the HTML but visually hidden.
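To avoid repeating the get_attribute('textContent') call for every field, the lookup could be wrapped in a small helper. The sketch below is only an illustration: element_text is a hypothetical name, not part of Selenium, and FakeElement is a stand-in invented here so the example runs without a browser. It assumes that textContent may contain the raw whitespace of the markup, which the helper collapses.

```python
def element_text(element):
    """Return an element's text whether or not it is rendered on screen.

    Hypothetical helper: prefers the DOM textContent property, which does
    not depend on visibility, and falls back to .text if it is missing.
    """
    raw = element.get_attribute('textContent')
    if raw is None:
        raw = element.text
    # textContent keeps the markup's raw whitespace, so collapse it.
    return ' '.join(raw.split())


class FakeElement:
    """Stand-in for a Selenium WebElement (assumption, for demonstration only)."""

    def __init__(self, text_content, visible):
        self._text_content = text_content
        self._visible = visible

    @property
    def text(self):
        # Mimics the behaviour described above: no text for hidden elements.
        return self._text_content.strip() if self._visible else ''

    def get_attribute(self, name):
        return self._text_content if name == 'textContent' else None


hidden_card = FakeElement('\n   Palmas (PMW)\n  ', visible=False)
print(repr(hidden_card.text))     # ''
print(element_text(hidden_card))  # Palmas (PMW)
```

In the scraper, each field would then read, for example, element_text(promotion.find_element(By.XPATH, './a/div/div/h3')).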
