Code to retrieve text from the project issue description using Selenium

I am encountering an issue with extracting content from Gitee using Selenium in Python. Whenever I attempt to extract the text, it returns blank results. Here is the element inspection:

https://i.stack.imgur.com/5aulV.png

My goal is to retrieve all the text within the div whose class attribute is 'git-issue-description markdown-body'.

However, the following snippet does not work:

Issue_description = driver.find_element(By.CLASS_NAME,'git-issue-description markdown-body').text

What steps should I take to successfully fetch the content inside this specified div class? The website link that I am attempting to extract text from can be found here -

Answer №1

To select the element, you can use By.CSS_SELECTOR and set the selector to "div.git-issue-description.markdown-body":

print(driver.find_element("css selector", "div.git-issue-description.markdown-body").text)

Below is a complete SeleniumBase script example for this:

from seleniumbase import Driver

driver = Driver()
driver.get("https://gitee.com/openharmony/arkui_ace_engine/issues/I92R3M?from=project-issue")
print(driver.find_element("css selector", "div.git-issue-description.markdown-body").text)
driver.quit()

Keep in mind that By.CLASS_NAME accepts only a single class name with no spaces. Your original locator passed two class names separated by a space ('git-issue-description markdown-body').
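The rule above can be sketched as a small helper that turns the space-separated class attribute copied from the inspector into a compound CSS selector. The names `classes_to_css` and `find_by_classes` are hypothetical and are not part of Selenium or SeleniumBase. The sketch assumes the class names need no CSS escaping and that `driver` is a Selenium WebDriver.

```python
def classes_to_css(class_attr: str, tag: str = "") -> str:
    """Turn a space-separated class attribute into a compound CSS selector.

    Hypothetical helper: assumes the class names contain no characters
    that would need escaping in a CSS selector.
    """
    names = class_attr.split()
    if not names:
        raise ValueError("class_attr must contain at least one class name")
    # Each class name becomes ".name"; concatenating them means "has all of these".
    return tag + "".join(f".{name}" for name in names)


def find_by_classes(driver, class_attr: str, tag: str = ""):
    """Locate the first element that carries every class in class_attr.

    `driver` is assumed to be a Selenium WebDriver. "css selector" is the
    same locator string used in the answer above.
    """
    return driver.find_element("css selector", classes_to_css(class_attr, tag))


print(classes_to_css("git-issue-description markdown-body", tag="div"))
# prints: div.git-issue-description.markdown-body
```

With a live driver, `find_by_classes(driver, "git-issue-description markdown-body", tag="div").text` would then be equivalent to the one-line lookup shown earlier.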

Answer №2

Revise your code as follows:

Issue_description = driver.find_element(By.CLASS_NAME,'git-issue-description.markdown-body').text

This works because Selenium 4's Python bindings translate a By.CLASS_NAME locator into a CSS selector by adding a dot before the value, so 'git-issue-description.markdown-body' is sent as '.git-issue-description.markdown-body'. Separating the class names with dots therefore targets an element that has both classes. This relies on an implementation detail of the bindings, so an explicit CSS selector is the safer choice.

For clearer code, use By.CSS_SELECTOR explicitly:

issue_description = driver.find_element(By.CSS_SELECTOR,'.git-issue-description.markdown-body')
text_content = issue_description.text # Extract only the text
html_content = issue_description.get_attribute('innerHTML') # Retrieve all the HTML content
