Retrieving text from a collection of Web Elements can be time-consuming

I wrote code to extract data from a web table: each row is read as text and added to a list, which is then passed to a method that writes it to an Excel file. However, reading approximately 200 rows and copying the data into the new list is quite slow. Is there a more efficient way to do this, or is this performance expected?

Here is my code:

package mypackage;

import java.io.IOException;
import java.time.Duration;
import java.util.ArrayList;
import java.util.List;

import org.openqa.selenium.By;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.WebElement;
import org.openqa.selenium.chrome.ChromeDriver;

import com.seleniumpractice.utilities.XLUtils;

import io.github.bonigarcia.wdm.WebDriverManager;

public class CovidWebTable {
    static WebDriver driver;
    static XLUtils xl;
    static List<WebElement> header;
    static List<WebElement> rows;
    static List<ArrayList<String>> rowsXL;

    public static void main(String[] args) throws IOException {
        WebDriverManager.chromedriver().setup();
        driver = new ChromeDriver();
        driver.get("https://www.worldometers.info/coronavirus");
        driver.manage().window().maximize();
        driver.manage().timeouts().implicitlyWait(Duration.ofSeconds(10));
        
        WebElement table = driver.findElement(By.xpath("//table[@id='main_table_countries_today']"));
        rows = table.findElements(By.xpath(".//tr[@role='row']"));
        System.out.println("Total rows: "+rows.size());
        
        xl = new XLUtils(".\\datafiles\\covid.xls");
        //xl.setCellData(null, rows, rows, null);
        
        rowsXL = new ArrayList<ArrayList<String>>();
        
        //Add header
        header = table.findElements(By.xpath(".//thead//th"));
        System.out.println("Header cols: "+ header.size());
        
        ArrayList<String> headerXL = new ArrayList<String>();
        
        for(int col=1; col<header.size()-1; col++) {
            //xl.setCellData("Covid Data", 0, col-1, header.get(col).getText());
            headerXL.add(header.get(col).getText());
        }
        
        rowsXL.add(headerXL);
        
        int xlRow = 1;
        int skippedRows = 0;
                
        for(int r=1; r<rows.size(); r++) {
            WebElement row = rows.get(r);

            //skip empty rows (read the row text once; every getText() is a call to the browser)
            if(row.getText().isEmpty()) {
                skippedRows++;
                continue;
            }
            System.out.println("Reading row "+r);

            ArrayList<String> cols = new ArrayList<String>();

            for(int c=1; c<header.size(); c++) {
                String data = row.findElement(By.xpath(".//td["+(c+1)+"]")).getText();
                //xl.setCellData("Covid Data", xlRow, c-1, rows.get(r).findElement(By.xpath(".//td["+(c+1)+"]")).getText());
                cols.add(data);
            }
            rowsXL.add(cols);
            xlRow++;
        }
        xl.setCellDataFromList(rowsXL, "Orders");
        System.out.println("Scraped Rows: "+ rowsXL.size());
        System.out.println("Skipped Rows: "+skippedRows);
        System.out.println("Complete.");
        
        //quit() ends the session and stops the driver process; close() only closes the window
        driver.quit();
    }

}

Answer №1

extracting the information and organizing it into a structured format
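
One likely reason for the slowness is that every findElement() and getText() call is a separate request to the browser driver, so reading the table cell by cell issues at least two requests per cell. A possible alternative, sketched below under stated assumptions, is to fetch the text of the whole table in a single call and split it into rows and cells in Java. TableTextParser and parseRows are hypothetical names, not part of Selenium or XLUtils. The sketch assumes the browser separates cells with tabs and rows with newlines in the element's innerText, and that no cell contains a line break of its own (a header cell with a line break would be split across rows).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not part of Selenium or XLUtils): splits table text
// that was fetched in one call into rows and cells.
final class TableTextParser {

    // Assumes rows are separated by line breaks and cells by tab characters.
    static List<List<String>> parseRows(String tableText) {
        List<List<String>> rows = new ArrayList<>();
        for (String line : tableText.split("\\R")) {
            if (line.trim().isEmpty()) {
                continue; // skip empty rows, as the original loop does
            }
            List<String> cells = new ArrayList<>();
            // limit -1 keeps trailing empty cells instead of dropping them
            for (String cell : line.split("\t", -1)) {
                cells.add(cell.trim());
            }
            rows.add(cells);
        }
        return rows;
    }

    public static void main(String[] args) {
        // In the real script the text would come from a single driver call, e.g.
        // String text = (String) ((JavascriptExecutor) driver)
        //         .executeScript("return arguments[0].innerText;", table);
        // Here a small sample string stands in for it.
        String text = "Country\tTotal Cases\n\nUSA\t100\nIndia\t90\n";
        for (List<String> row : parseRows(text)) {
            System.out.println(row);
        }
    }
}
```

The result is a List<List<String>>, so it may need converting before it is passed to setCellDataFromList, which takes a List<ArrayList<String>> in the question's code. Whether this is faster in practice depends on the page, so it is worth timing both versions.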
