Locate the text of an element that is not visible on the screen while the code is running

I am learning web scraping with Selenium. For practice, I decided to extract the promotions shown on this site:

https://i.stack.imgur.com/zKEDM.jpg

This is the code I have been working on:

from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By

def get_promotion():
    '''
    Scraping data from Smiles promotions
    '''
    promotions = []
    chrome_options = Options()
    chrome_options.add_argument("--disable-notifications")
    driver = webdriver.Chrome(options=chrome_options)
    driver.implicitly_wait(10)
    try:
        driver.get('https://www.smiles.com.br/home')
        # Each promotion card is one slide of the carousel
        site_promotion = driver.find_elements(By.CLASS_NAME, 'swiper-slide')
        for promotion in site_promotion:
            promotions.append(
                {
                    'destination': promotion.find_element(By.XPATH, './a/div/div/h3').text,
                    'origin': promotion.find_element(By.XPATH, './a/div/div/h4/span[2]').text,
                    'diamont_value': promotion.find_element(By.XPATH, './a/div/div/div/span[2]/p[3]').text,
                    'normal_value': promotion.find_element(By.XPATH, './a/div/div/div/span[2]/p[2]').text,
                }
            )
    finally:
        # Close the browser even if a lookup fails
        driver.quit()
    return promotions

The problem is that the data for the last 3 cards was not extracted, as shown below:

[
  {'destination': 'São Paulo (GRU)', 'origin': 'Maceió (MCZ)', 'diamont_value': '17.700', 'normal_value': '19.000'}, 
  {'destination': 'Rio de Janeiro (GIG)', 'origin': 'Recife (REC)', 'diamont_value': '19.500', 'normal_value': '21.000'}, 
  {'destination': 'Brasília (BSB)', 'origin': 'Recife (REC)', 'diamont_value': '17.700', 'normal_value': '19.000'}, 
  {'destination': 'Porto Seguro (BPS)', 'origin': 'Belo Horizonte (CNF)', 'diamont_value': '13.300', 'normal_value': '14.500'}, 
  {'destination': 'Goiânia (GYN)', 'origin': 'Palmas (PMW)', 'diamont_value': '11.500', 'normal_value': '12.500'}, 
  {'destination': '', 'origin': '', 'diamont_value': '', 'normal_value': ''}, 
  {'destination': '', 'origin': '', 'diamont_value': '', 'normal_value': ''}, 
  {'destination': '', 'origin': '', 'diamont_value': '', 'normal_value': ''}
]

Looking closer, I noticed that the browser launched by Selenium does not display the last 3 cards:

https://i.stack.imgur.com/5t5gx.png

The problem does not seem to be that the elements are missing: while debugging, I confirmed that the last 3 elements are present in site_promotion.

Could Selenium be failing because the last 3 cards are not visible on the screen? If so, how can this be resolved?

Is there a way to retrieve the text of these elements even when they are not in the current view?


I tried adding

chrome_options.add_argument("--start-maximized")

but that only resulted in an empty list of promotions.

Answer №1

Try using .get_attribute('textContent') instead of .text

This changes the code snippet from:

promotions.append(
    {
        'destination': promotion.find_element(By.XPATH, './a/div/div/h3').text,
        'origin': promotion.find_element(By.XPATH, './a/div/div/h4/span[2]').text,
        'diamont_value': promotion.find_element(By.XPATH, './a/div/div/div/span[2]/p[3]').text,
        'normal_value': promotion.find_element(By.XPATH, './a/div/div/div/span[2]/p[2]').text,
    }
)

to:

promotions.append(
    {
        'destination': promotion.find_element(By.XPATH, './a/div/div/h3').get_attribute('textContent'),
        'origin': promotion.find_element(By.XPATH, './a/div/div/h4/span[2]').get_attribute('textContent'),
        'diamont_value': promotion.find_element(By.XPATH, './a/div/div/div/span[2]/p[3]').get_attribute('textContent'),
        'normal_value': promotion.find_element(By.XPATH, './a/div/div/div/span[2]/p[2]').get_attribute('textContent'),
    }
)

The linked source covers this topic in more detail.

It explains that getText() and getAttribute() (in Python: .text and .get_attribute()) are both used to extract information from an HTML element. .text returns only the text that is rendered as visible, whereas get_attribute() reads a named attribute or property of the element. textContent is the DOM property that holds the element's full text regardless of visibility.

When content is present in the HTML source but not displayed, as with the off-screen carousel cards here, .text can return an empty string. Use .get_attribute('textContent') in such cases, and consider calling .strip() on the result, because textContent keeps the whitespace from the markup.

I ran into the same issue on a website whose data was present in the HTML structure but visually hidden.
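
To avoid repeating the same call for every field, the textContent approach can be wrapped in a small helper. The following is only a sketch: read_text and FakeElement are hypothetical names that do not come from Selenium or from the question. FakeElement is a stand-in that mimics the two WebElement members the helper uses (.text and .get_attribute()), so the example runs without a browser.

```python
def read_text(element):
    """Return an element's text even when it is not rendered on screen.

    `element` is expected to behave like a Selenium WebElement. The visible
    text (`.text`) is preferred; `textContent` is the fallback when `.text`
    comes back empty.
    """
    visible = element.text
    if visible:
        return visible.strip()
    raw = element.get_attribute("textContent")
    # get_attribute() returns None when nothing is found under that name.
    return raw.strip() if raw else ""


class FakeElement:
    """Hypothetical stand-in for a WebElement, used only for this demo."""

    def __init__(self, text, text_content):
        self.text = text
        self._text_content = text_content

    def get_attribute(self, name):
        return self._text_content if name == "textContent" else None


# A card that is not displayed: .text is empty, textContent still has data.
hidden_card = FakeElement(text="", text_content="\n   Goiânia (GYN)  \n")
print(read_text(hidden_card))
```

In the scraper from the question, each dictionary value would then become, for example, read_text(promotion.find_element(By.XPATH, './a/div/div/h3')). Preferring .text keeps the output unchanged for cards that are already visible, and the fallback only applies to the empty ones.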
