Extract reviews and ratings from IMDB by utilizing Selenium

I am in the process of extracting reviews and rating data for specific movies on IMDB. Below is the snippet of code I am using to scrape the rating:

 try:
     rating = review.find_element_by_css_selector('[class = "rating-other-user-rating"]')
     star_rating.append(rating.text)
 except:
     rating = None

This is the corresponding HTML:

<span class="rating-other-user-rating">
        <svg class="ipl-icon ipl-star-icon  " xmlns="http://www.w3.org/2000/svg" fill="#000000" height="24" viewBox="0 0 24 24" width="24">
            <path d="M0 0h24v24H0z" fill="none"></path>
            <path d="M12 17.27L18.18 21l-1.64-7.03L22 9.24l-7.19-.61L12 2 9.19 8.63 2 9.24l5.46 4.73L5.82 21z"></path>
            <path d="M0 0h24v24H0z" fill="none"></path>
        </svg>
            <span>7</span><span class="point-scale">/10</span>
        </span>

Queries:

  1. I am looking to extract the value "7" from the provided HTML content. What modifications should be made to the code to successfully retrieve it? It seems that the rating exists within a span tag without any specific classes or IDs, making it challenging to access. Your guidance would be greatly appreciated. Thank you.

  2. Is there a way to scrape a specific number of reviews from IMDB, such as extracting only 50 reviews? I attempted using the code below, but it does not halt the program after reaching 50 reviews:

      nextbutton = WebDriverWait(driver,5).until(EC.presence_of_element_located((By.CLASS_NAME,'ipl-load-more__button')))
    
      if len(movie_title) == 50: # movie_title indicates the number of reviews titles extracted so far. The target is 50
         break
    
      nextbutton.click()
    

Answer №1

Almost there! The score 7 sits in a <span> that has no class of its own. It is the second child of the parent <span> (after the <svg>) and the first <span> among its siblings:

<span class="rating-other-user-rating">
    <svg class="ipl-icon ipl-star-icon  " xmlns="http://www.w3.org/2000/svg" fill="#000000" height="24" viewBox="0 0 24 24" width="24">
        <path d="M0 0h24v24H0z" fill="none"></path>
        <path d="M12 17.27L18.18 21l-1.64-7.03L22 9.24l-7.19-.61L12 2 9.19 8.63 2 9.24l5.46 4.73L5.82 21z"></path>
        <path d="M0 0h24v24H0z" fill="none"></path>
    </svg>
    <span>7</span>
    <span class="point-scale">/10</span>
</span>

Solution

To extract the text 7, you can use WebDriverWait with the visibility_of_element_located() method and either of these locator strategies:

  • Using CSS_SELECTOR and the text attribute:

    print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.CSS_SELECTOR, "span.rating-other-user-rating span:first-of-type"))).text)
    
  • Using XPATH and get_attribute("innerHTML"):

    print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//span[@class='rating-other-user-rating']//span[not(@class)]"))).get_attribute("innerHTML"))
    
  • Note: Make sure to import the following modules:

    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
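
The original selector from the question can also be kept. A minimal sketch, assuming that .text on the outer span returns a combined string such as "7/10" (the <svg> contributes no text); parse_rating is a hypothetical helper name, not a Selenium API:

```python
def parse_rating(text):
    """Return the score before the '/' as an int, or None if it is missing."""
    if not text:
        return None
    # "7/10" -> "7"; surrounding whitespace is ignored
    score = text.strip().split("/", 1)[0].strip()
    return int(score) if score.isdigit() else None
```

In the loop from the question this would be used as star_rating.append(parse_rating(rating.text)). Appending None in the except branch as well keeps star_rating aligned with the other per-review lists.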
    

For further insights, refer to How to retrieve the text of a WebElement using Selenium - Python
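
Regarding the second query, which the answer above does not cover: one possible approach is to compare the number of loaded reviews with the target using >= semantics before each click, and to wait for new reviews to appear after each click. The sketch below has not been run against the live site. load_until and selenium_hooks are hypothetical names, div.review-container is an assumed selector for a single review, and ipl-load-more__button is the class name from the question.

```python
def load_until(target, count_loaded, click_load_more, max_clicks=100):
    """Click "load more" until `target` items are loaded or nothing more loads.

    Returns the number of items to process (never more than `target`).
    """
    clicks = 0
    while count_loaded() < target and clicks < max_clicks:
        if not click_load_more():  # button missing or no new items appeared
            break
        clicks += 1
    return min(count_loaded(), target)


def selenium_hooks(driver, timeout=5):
    """Build the two callbacks for a Selenium driver (selectors are assumptions)."""
    # Imported here so load_until stays usable without Selenium installed.
    from selenium.common.exceptions import TimeoutException
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    def count_loaded():
        # Assumed selector for a single review block.
        return len(driver.find_elements(By.CSS_SELECTOR, "div.review-container"))

    def click_load_more():
        before = count_loaded()
        try:
            button = WebDriverWait(driver, timeout).until(
                EC.element_to_be_clickable((By.CLASS_NAME, "ipl-load-more__button"))
            )
            button.click()
            # Wait for the page to add reviews before counting again.
            WebDriverWait(driver, timeout).until(lambda d: count_loaded() > before)
        except TimeoutException:
            return False
        return True

    return count_loaded, click_load_more
```

With a live driver this would be called as n = load_until(50, *selenium_hooks(driver)), after which only the first n review elements are scraped. Because each click can add more than one review, the count may jump past the target, which is why the comparison is not a strict equality check.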
