Code to retrieve text from the project issue description using Selenium

Question

I am encountering an issue with extracting content from Gitee using Selenium in Python. Whenever I attempt to extract the text, it returns blank results. Here is the element inspection:

https://i.stack.imgur.com/5aulV.png

My goal is to retrieve all the text within the div with the classes 'git-issue-description markdown-body'.

However, my current code does not seem to work when I use the following snippet:

Issue_description = driver.find_element(By.CLASS_NAME,'git-issue-description markdown-body').text

What steps should I take to successfully fetch the content inside this specified div class? The website link that I am attempting to extract text from can be found here -

python python-3.x selenium-webdriver

Answer №1

To select the element, use By.CSS_SELECTOR with the selector "div.git-issue-description.markdown-body":

print(driver.find_element("css selector", "div.git-issue-description.markdown-body").text)

Below is a complete SeleniumBase script example for this:

from seleniumbase import Driver

driver = Driver()
driver.get("https://gitee.com/openharmony/arkui_ace_engine/issues/I92R3M?from=project-issue")
print(driver.find_element("css selector", "div.git-issue-description.markdown-body").text)
driver.quit()

Keep in mind that By.CLASS_NAME accepts only a single class name, with no spaces. Your original locator passed multiple classes separated by a space ('git-issue-description markdown-body').
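The rule above (one dot per class, no spaces) can be sketched as a small helper that turns the class attribute copied from the inspector into a compound CSS selector. The name class_attr_to_css and its tag parameter are hypothetical, not part of Selenium or SeleniumBase, and the sketch assumes plain class names that need no CSS escaping.

```python
def class_attr_to_css(class_attr: str, tag: str = "") -> str:
    """Build a compound CSS selector from a space-separated class attribute.

    Hypothetical helper for illustration; assumes the class names contain
    no characters that would require CSS escaping.
    """
    names = class_attr.split()  # split on any run of whitespace
    if not names:
        raise ValueError("expected at least one class name")
    # Prefix every class with a dot and join them without spaces, so the
    # selector matches a single element that carries all of the classes.
    return tag + "".join("." + name for name in names)


selector = class_attr_to_css("git-issue-description markdown-body", tag="div")
print(selector)
```

The resulting string is the same selector used in the script above, so it can be passed as driver.find_element("css selector", selector).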

Answer №2

Revise your code as follows:

Issue_description = driver.find_element(By.CLASS_NAME,'git-issue-description.markdown-body').text

In Selenium's Python bindings, a By.CLASS_NAME locator is converted internally to a By.CSS_SELECTOR locator by adding a dot before the value. That is why joining the class names with a dot instead of a space works here: the resulting CSS selector, .git-issue-description.markdown-body, targets an element that has both classes.

For clearer code, consider using By.CSS_SELECTOR explicitly:

issue_description = driver.find_element(By.CSS_SELECTOR,'.git-issue-description.markdown-body')
text_content = issue_description.text # Extract only the text
html_content = issue_description.get_attribute('innerHTML') # Retrieve all the HTML content

