Questions tagged [web-scraping]

Web scraping is the technique of extracting targeted data from websites that do not offer an API or any other means of automated data retrieval. Questions about "Getting Started with Scraping" (for example, with Excel VBA) should be researched thoroughly before posting, since numerous working code examples already exist. Approaches to web scraping include third-party tools, custom software, and even manual data collection carried out in a standardized manner.

JS Executing functions in a pop-up window

Recently, I have been immersing myself in learning JS and experimenting with webpage interactions. It started with scraping data, but now I am also venturing into performing actions on specific webpages. For example, there is a webpage that features a butt ...

Python script using Selenium to only print elements in an HTML page that have a specific class value and print only certain elements based on

I am currently developing a script to automatically test all the free proxies available on a website. The website contains a comprehensive list of all available proxies, and I have successfully made my script extract and display them. However, I only want to display ...

Utilize Selenium to Fetch Elements by Xpath: Retrieve only the most recent 60 elements displayed on the page

I am currently faced with a challenge in figuring out how to extract the last 60 elements on a webpage. posts = driver.find_elements_by_xpath("""(//div[@class='hotProductDetails'])""") for post in posts: print(post.text) The code above retrieves all ...
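Since the excerpt's `find_elements_by_xpath` call returns an ordinary Python list, one way to keep only the final 60 matches is a negative slice. The sketch below shows the idea on a stand-in document parsed with the standard library, so it runs without a browser; the generated markup and the `last_n` helper are illustrative assumptions, not part of the original question. With Selenium, the same slice would apply to `posts`, and an XPath 1.0 predicate such as `(//div[@class='hotProductDetails'])[position() > last() - 60]` is another possible route.

```python
import xml.etree.ElementTree as ET


def last_n(elements, n=60):
    """Return the final n items of a list, or all of them if fewer exist."""
    return elements[-n:] if n > 0 else []


# Stand-in page with 100 matching <div> elements (hypothetical data).
markup = "<body>" + "".join(
    f"<div class='hotProductDetails'>item {i}</div>" for i in range(1, 101)
) + "</body>"
root = ET.fromstring(markup)

# Same class filter as in the question, using ElementTree's XPath subset.
posts = root.findall(".//div[@class='hotProductDetails']")
recent = last_n(posts, 60)

for post in recent:
    print(post.text)
```

Slicing after the lookup keeps the locator unchanged, which may be easier to debug than a more elaborate XPath expression.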

What is the method to perform a "Copy" operation with Selenium WebDriver?

How can I click the "Copy" button located at this URL: https://i.stack.imgur.com/XyfYi.png The element I need to interact with is labeled as "Copy." Despite attempting various "find element by" methods, I have been unsuccessful due to encountering error ...

Using Selenium for web scraping, certain instances involve scanning data, while others may not have this capability

Currently, I am developing a Python script utilizing the Selenium library to scrape hotel information from the e-dreams platform. The main purpose of this script is to gather specific data such as the title and current price, organize them into lists, and ...

Ways to extract information from a table cell located next to a table cell with colspan

As a beginner in web scraping, I am gradually making progress. However, I'm facing a tough challenge with this particular task. My goal is to extract data from the ESPN NBA boxscore website: I aim to scrape the names of players labeled as "DNP" (Did Not ...

Having difficulty extracting table data with selenium and beautiful soup

I've hit a wall in trying to extract data from a table on Yahoo's daily fantasy webpage. Despite scouring StackOverflow for solutions, I can't seem to get the desired information. The table either appears empty or the elements within it are elusive. NOTE: ...

Tips for locating an element B that is positioned beneath another element A

Imagine you have the following elements on a webpage: Element A #A is found here ... #Some code Element B #B is located here Even though A and B do not have a parent-child relationship, they share the same locator. There are no element ...

Developing a unified driver that can navigate through various platforms with the help of proxy servers

After writing a Python script using Selenium and proxies to extract titles from various sites, I discovered that my current approach involves creating separate browser instances for each site. However, my primary objective is to utilize the same browser wi ...

Selenium keeps displaying the error "Error: element is not clickable" while attempting to input text

I have been trying to automate interactions with elements on Soundcloud's website using Selenium, but I am facing difficulties when attempting to interact with the input tags. Whenever I try to write in the input tag with the class "headerSearch__input" ...

Ways to retrieve the data from the table on this webpage

I am attempting to extract data from a table displayed on the screen in order to select rows with a status other than accepted. The code I'm using is as follows: thead = driver.find_element_by_tag_name('thead') columns = [th.text for th in thead.find_eleme ...

Error message: IndexError: list index out of range while attempting to trigger a click event using selenium

There are a total of 41 category checkboxes on the page, with only 12 initially visible while the rest are hidden. To reveal the hidden checkboxes, one must click on "show more." This simple code accomplishes that: [step 1] Loop through all checkboxes >> ...

Navigating a Frame and Clicking a Link with Puppeteer: A Step-by-Step Guide

I am facing an issue with clicking on an anchor link within a page that is supposed to open a new tab for exporting a PDF. The problem arises because this link is located inside a frame within a frameset structure as shown below: https://i.stack.imgur.com ...

Boost the efficiency of my code by implementing multithreading/multiprocessing to speed up the scraping process

Is there a way to optimize my scrapy code using multithreading or multiprocessing? I'm not well-versed in threading with Python and would appreciate any guidance on how to implement it. import scrapy import logging domain = 'https://www.spdigit ...

Not all HREF links are being captured by BeautifulSoup while scraping this website... no results are being returned

My goal is to extract all the links from a specific website in order to compile a comprehensive repository of its associated products. import requests from bs4 import BeautifulSoup import pandas as pd baseurl = "https://www.examplewebsite.com/" ...

Automation tool encountering difficulty in selecting pagination button on website

I'm having trouble using selenium and geckodriver with Firefox to scrape eBay while running on Ubuntu 16.04. All I want to do is click the next button, but my code doesn't seem to be working correctly. The two instances of button assignment I tr ...

What is the best way to obtain just the name and phone number information?

Looking to extract the name and contact number from a div that can contain one, two, or three spans. The requirements are: Extract name and contact number only when available. If contact number is present but name is missing, assign 'N/A' to the name var ...

Triggering specialized Timeouts when waiting for certain elements

This question is a continuation of my previous inquiry regarding inconsistencies in scraping through divs using Selenium. Currently, I am working on extracting Air Jordan Data from grailed.com's selection of high-top sneakers by Jordan Brand. My objec ...

What is the best way to extract content between <span> and the following <p> tag?

I am currently working on scraping information from a webpage using Selenium. I am looking to extract the id value (text) from the <span id='text'> tag, as well as extract the <p> element within the same div. This is what my attempt ...

Is it possible to adjust environment variables during runtime in a live production environment for a Rails application?

I am currently developing a web application that focuses on automating CRUD tasks on Amazon. This includes the ability for users to delete and add addresses on their Amazon account. To automate these tasks, I am utilizing Selenium WebDriver with a Mozilla ...

Puppeteer encountered an issue: The execution context was destroyed, possibly due to a page navigation

I have a coding challenge with submitting a form that requires a selector (await page.waitForSelector("selector")) to determine if the submission was successful. The code successfully fills out all the necessary fields in the form and submits it ...

Click on the desired choice using the right-click menu in Selenium on a Power BI page

I have written the following Python code to extract data from a PowerBI website: driver = webdriver.Firefox() driver.get("https://app.powerbi.com/view?r=eyJrIjoiZGYxNjYzNmUtOTlmZS00ODAxLWE1YTEtMjA0NjZhMzlmN2JmIiwidCI6IjljOWEzMGRlLWQ4ZDctNGFhNC05NjAwLTRiZT ...

What is the best method for clicking the 'show more' button when scraping data from multiple pages?

I have a script that simultaneously scrapes data from 10 different pages. #hyperlink_list is the list of the pages options = webdriver.ChromeOptions() driver = webdriver.Chrome(ChromeDriverManager().install(),options=options) for i in range(0,10): url ...

When I attempted to extract data with selenium through web scraping, the data stored in my csv file appeared unusual. Instead of the expected content, all I found were j

from selenium import webdriver from selenium.webdriver.chrome.service import Service from webdriver_manager.chrome import ChromeDriverManager from selenium.webdriver.common.keys import Keys from selenium.webdriver.common.by import By import time import cs ...

Using Beautiful Soup, extract various elements from a webpage in a repeated sequence

I'm trying to scrape a table that contains a loop, but I'm running into issues with extracting certain elements. <ul> <li class="cell036 tal arrow"><a href=" y/">ALdCTL</a></li> <li class="cell009">5,71</li> <l ...

Python Selenium: Execute web scraping script only once all lazy-loading components have been fully loaded

Just started using Selenium and I still have a question even after searching for solutions. I am attempting to retrieve all the links on this website. The links are loaded in a "lazy-load" manner, gradually appearing as the user scrolls down the scree ...

XPath allows for the selection of both links and anchors on a webpage

Currently, I am using Screaming Frog and seeking to utilize XPath for a specific task. My objective is to extract all links and anchors that contain a certain class within the main body content, excluding any links found within div.list. I have attempted ...

Attempting to utilize a Python 3 web scraping tool

Having trouble creating a web scraper for the first time. The script is failing to run and displaying an error code. The tutorial I followed can be found at: Despite following all steps and watching other YouTube tutorials for additional guidance, none o ...

Searching for an element using Xpath and need to remove unwanted elements within the Xpath

I am currently using Selenium to scrape a website. However, I have encountered an issue when trying to retrieve the coins' names because there are 2 elements inside each 'td'. How can I eliminate the unwanted element or only select the first ...

Getting a SyntaxError when attempting to use WebDriverWait to automate clicking a button in Selenium with

Issue at hand: My code is running smoothly until I introduce a segment that involves clicking an arrow on the product square/profile. Main concern: While the code as a whole functions properly, the dataset it retrieves is distorted. Upon further investiga ...

Cheerio - Ensure accurate text retrieval for selectors that produce multiple results

Visit this link for more information https://i.stack.imgur.com/FfYeg.png I am trying to extract specific market data from the given webpage. Specifically, I need to retrieve "Sábado, 14 de Abril de 2018" and "16:00". Here is how I did it using Kotlin an ...

Is there a way to completely scrape YouTube comments by automating the process of clicking the "Read more" button for each comment using Selenium and Python?

Attempting to scrape YouTube comments has been a challenge for me. While I have succeeded in scraping one-liner comments, the longer comments with a "Read more" button are causing issues. I am unable to interact with these buttons using Selenium and Python ...

Guide on clicking a label element with Python and Selenium

I have been working on a web scraping bot using Python and Selenium, but I've encountered an issue. The website I'm trying to scrape has a fieldset HTML tag with 4 label tags inside it. All these labels have the same class name and I need to click on one o ...

Can HTML comments prevent Selenium from running?

Currently, I am attempting to scrape a website with the following layout: <div> <!----> <!----> <!----> <div class="block"> <h3 class="subtitle is-4"></h3> ... ... </d ...

Utilize Python's Selenium webdriver to operate Firefox in the background

Currently, I am involved in a website scraping endeavor utilizing Selenium with Python. One question that has crossed my mind is whether it's feasible to initiate the Firefox browser in the background or launch Firefox on a separate workspace within U ...

Is it possible to extract data from tables that have the 'ngcontent' structure using Selenium with Python?

Scraping basic tables with Selenium is a simple task. However, I've been encountering difficulties when trying to scrape tables that contain "_ngcontent" notations (as seen at "https://material.angular.io/components/table/overview"). My goal is to con ...

Setting up User Agent (UA) and Headless mode for Selenium on Safari

Desired Solution for Selenium Safari Setup I am seeking a way to configure the User Agent and Headless settings for Selenium when using Safari, similar to ChromeOptions. My specific requirement for Safari is due to needing access to a website that is only ...

What is the method to extract div text using Python and Selenium?

Is there a way to extract the text "950" from a div that does not have an ID or Class using Python Selenium? <div class="player-hover-box" style="display: none;"> <div class="ps-price-hover"> <div> ...

Retrieve search results from Bing using Python

I am currently working on a project to develop a Python-based chatbot that can retrieve search results from Bing. However, my efforts have been hindered by the outdated Python 2 code and reliance on Google API in most available resources online. The catch ...

Encountering a problem with the Python requests library

Hello there, Lately, I've been experiencing timeouts when making requests to the eBay website. Here is the simple code that I am using: import requests headers = { "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; ...

obtain the text content from HTML response in Node.js

In my current situation, I am facing a challenge in extracting the values from the given HTML text and storing them in separate variables. I have experimented with Cheerio library, but unfortunately, it did not yield the desired results. The provided HTML ...

What is the best way to use selenium to extract all p-tags and their corresponding h2-tags?

I am looking to retrieve the title and content from an article: <h2><span>Title1</span></h2> <p>text I want</p> <p>text I want</p> <h2><span>Title2</span></h2> <p>text I want< ...
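One possible way to pair each title with the paragraphs that follow it is to walk the markup in document order and start a new group whenever an `<h2>` closes. The sketch below does this with the standard library's `html.parser`; the `SectionParser` class and the sample markup are assumptions made for illustration. With Selenium, the string passed to `feed` could be `driver.page_source`.

```python
from html.parser import HTMLParser


class SectionParser(HTMLParser):
    """Collect (title, [paragraphs]) pairs from flat <h2>/<p> markup."""

    def __init__(self):
        super().__init__()
        self.sections = []   # list of (title, [paragraph, ...])
        self._tag = None     # the <h2> or <p> currently being read
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag in ("h2", "p"):
            self._tag = tag
            self._buffer = []

    def handle_data(self, data):
        if self._tag:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag != self._tag:
            return  # ignore nested tags such as </span>
        text = "".join(self._buffer).strip()
        if tag == "h2":
            self.sections.append((text, []))
        elif self.sections:
            # Paragraphs that appear before the first <h2> are skipped.
            self.sections[-1][1].append(text)
        self._tag = None


# Hypothetical sample shaped like the markup in the question.
html = (
    "<h2><span>Title1</span></h2><p>first</p><p>second</p>"
    "<h2><span>Title2</span></h2><p>third</p>"
)
parser = SectionParser()
parser.feed(html)

for title, paragraphs in parser.sections:
    print(title, paragraphs)
```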

An issue with Selenium web scraping arises: WebDriverException occurs after the initial iteration of the loop

Currently, I am executing a Selenium web-scraping loop on Chrome using MacOS arm64. The goal is to iterate through a list of keywords as inputs in an input box, search for each one, and retrieve the text of an attribute from the output. A few months ago, I ...

Before I open a new browser window, I receive an error message saying "Element not found"

When trying to interact with the element labeled as "mat-select-arrow-wrapper.ng-tns-c73-7" on the provided website, I encounter an error stating that the element cannot be located. Strangely, this issue is resolved after clicking the new browser window on ...

You are unable to extract data from an HTML table

I have been encountering an issue while trying to extract data from an HTML table. Every time I attempt to do so, I am faced with an ERROR Message 13 "Type Mismatch". I suspect that the problem lies in my use of incorrect HTML tags. Despite spending severa ...

Scraping websites efficiently using Selenium for pagination features

Recently, I've been delving into the world of creating web scrapers using Selenium. One particular challenge I'm encountering involves scraping pages with pagination. I put together a script with high hopes of successfully scraping every page. fr ...

Preventing Selenium from immediately exiting and addressing issues with keys not being typed

Is anyone else experiencing the issue where the site opens for a split second and exits, and it doesn't type what it's supposed to? How can this be fixed? I've tried multiple methods to locate the element, and I believe my approach is correct. Please corr ...

Unable to extract information from empty <td> using python and selenium

Currently, I am facing an issue while trying to fetch values from a <tr> using Python Selenium. Specifically, I need these values to be ordered and also identified based on whether they contain the word "PICK" or not. My goal is to determine the exa ...

Unable to harvest data from Google AdSense

I am having trouble scraping a website to extract URLs and images from Google AdSense. Unfortunately, I am not receiving any details from Google AdSense. Here is what I need: When we search for "refrigerator" on Google, we see ads that I want to retrie ...

What is the process for retrieving a player's data from the Statistics webpage using HTML code?

I am currently using Selenium to scrape data from a website. The specific information I am trying to extract is located under the player's 'statistics' section. My current code opens the ...

The Selenium driver's execute_script() function is not functioning as expected

When I attempt to execute a JavaScript using driver.execute_script, nothing happens and the system just moves on to the next line of Python code. Any ideas? I've been web scraping on a webpage, using JavaScript in the Console to extract data. The JavaScri ...

Struggling to retrieve content from a webpage, encountering an unexpected error

I am having trouble understanding why I keep encountering an error. My goal is to extract the description and price of the first five search results from a specific webpage. The code successfully performs tasks like searching for terms in a CSV file, openi ...

Node.JS Logic for Scraping and Extracting Time from Text

Currently, I am working on developing a web scraper to gather information about local events from various sites. One of my challenges is extracting event times as they are inputted in different formats by different sources. I'm seeking advice on how to ide ...

Encountered an issue during the data extraction process utilizing BeautifulSoup

My goal is to extract the membership years data from the IMDb Users page. On this page, there are multiple badges and one badge that is common for all users is the last one. This is the code I am using: def getYear(review_url): respons ...

BeautifulSoup does not recognize circular HTML pages

Encountered an issue where the page parsing code consistently checks the same page every time, despite using it alongside selenium. Selenium has no problem opening new links, but the parsing only occurs on the initial page. The frustrating part is that si ...

Acquire session cookies using scrapy

I've been utilizing scrapy for web scraping on sites that require login, but I'm unsure about the specific fields needed to save and load in order to maintain the session. With selenium, I am saving the cookies like so: import pickle import sele ...
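As a rough sketch, the cookies that Selenium's `driver.get_cookies()` returns are a list of dictionaries with `name` and `value` keys (plus attributes such as `domain` and `path`), and Scrapy's `Request` accepts a `cookies` argument in dictionary form. The example below reduces a pickled cookie list to that form. The cookie names and values are made up, and whether name and value alone are enough to keep a session alive depends on the site, so treat that as an assumption.

```python
import pickle


def cookies_to_dict(selenium_cookies):
    """Reduce Selenium-style cookie dicts to a {name: value} mapping."""
    return {cookie["name"]: cookie["value"] for cookie in selenium_cookies}


# Hypothetical cookies in the shape returned by driver.get_cookies().
saved = [
    {"name": "sessionid", "value": "abc123", "domain": "example.com", "path": "/"},
    {"name": "csrftoken", "value": "xyz789", "domain": "example.com", "path": "/"},
]

# Round-trip through pickle, as in the question's save/load approach.
restored = pickle.loads(pickle.dumps(saved))
session_cookies = cookies_to_dict(restored)
print(session_cookies)
```

The resulting mapping could then be passed along the lines of `scrapy.Request(url, cookies=session_cookies)`.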

Scrapy fails to retrieve closing prices from Yahoo! Finance

When attempting to extract closing prices and percentage changes for three tickers from Yahoo! Finance using Scrapy, I am encountering an issue where no data is being retrieved. I have verified that my XPaths are correct and successfully navigate to the de ...

BS4 is having trouble selecting the right 'span' element

When attempting to scrape pricing information from a specific website, I encountered the following portion of HTML code: </div> </div> <div class="right custom"> <div class="description custom"> <aside> < ...

Leveraging Selenium to dismiss a browser pop-up

While scraping data from Investing.com, I encountered a pop-up on the website. Despite searching for a clickable button within the elements, I couldn't locate anything suitable. On the element page, all I could find related to the 'X' to cl ...

Getting an UnhandledPromiseRejectionWarning while attempting to navigate through Google Maps using Node.js Puppeteer

(node:15348) UnhandledPromiseRejectionWarning: Error: Execution context was destroyed due to a potential navigation issue. const browser = await puppeteer.launch({headless: false}); const page = await browser.newPage(); page.goto("https://www.google. ...

Is it possible to modify the value on the src img tag prior to scraping the data

Hey everyone, check out this piece of code I have: foreach ($image as $snimka){ $url = $snimka->src; $ch = curl_init($snimka->src); $fp = fopen('images/' . basename($url), 'wb'); curl_setopt($ch, CURLOPT_FILE, $fp); curl_setopt($ ...

Tips for extracting numerical data from Google search results

import requests from bs4 import BeautifulSoup # Fetch data from Google Search response = requests.get("https://www.google.com/search?q=book") soup = BeautifulSoup(response.content, 'html.parser') phrase_extract = soup.find_all(id="result-stats") ...
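Once the text of the `result-stats` element is available, pulling the number out is a small string-processing step. The sketch below assumes the text looks like "About 1,230,000,000 results (0.52 seconds)", which is a hypothetical sample. It is also worth checking that the element is present at all, since the HTML returned to a plain `requests` call may differ from what a browser shows.

```python
import re

# A number with thousands separators (1,230,000 or 1.230.000) or a plain integer.
COUNT_RE = re.compile(r"\d{1,3}(?:[,.]\d{3})+|\d+")


def parse_result_count(stats_text):
    """Return the first count found in the text as an int, or None."""
    match = COUNT_RE.search(stats_text)
    if match is None:
        return None
    # Drop the separators before converting to an integer.
    return int(re.sub(r"\D", "", match.group()))


sample = "About 1,230,000,000 results (0.52 seconds)"  # hypothetical text
count = parse_result_count(sample)
print(count)
```

Taking only the first match is a deliberate simplification; it would pick up the wrong number if the text started with something like a page index.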

Guide to harvesting popular searches on Google

My challenge involves scraping Google Hot Trends. Initially, I attempted to utilize Chrome developer tools to capture all requests, however, no requests were being made. Therefore, I turned to using selenium, but encountered difficulties in fetching the ...

Utilize Python, BeautifulSoup, and Selenium to extract real-time information from a dynamic table

My goal is to extract all the URL links connected to the soccer matches displayed in the table on this specific website. Here's the python code snippet for this task: from selenium import webdriver from bs4 import BeautifulSoup driver = webdriver.Firefo ...

Is it possible to save and utilize cookies in a program without relying on the selenium driver.add_cookie

In the midst of a project, I find myself faced with the task of extracting URLs for all products on a given page and utilizing Scrapy to sift through each URL for product data. The challenge arises when a pop-up emerges 3-5 seconds after loading every URL, ...

Access a webpage using a Python web-scraping tool to log in

Currently, I am employing Selenium WebDriver within Python to carry out a web scraping endeavor. My aim is to log in by inputting the necessary login credentials and subsequently clicking on the submit button. Although successful in entering the Username ...

The table content is not displayed by Selenium when using ::before selector

I'm currently attempting to extract data from the table of Brazilian debentures using a combination of BeautifulSoup and Selenium. Interestingly, when I utilized only BeautifulSoup for scraping, it retrieved even less information compared to when used in c ...

Notification: The specific HTML element identified as <div id="tabber_obj_0_div_3" class="ptr"> cannot be smoothly brought into view by scrolling

I am currently working on extracting book reviews from a specific URL using web scraping techniques. My goal is to gather the book name and each review in separate rows. Here's the code snippet I have written so far, which utilizes both selenium and bs4 fo ...

Unsuccessful endeavor at web scraping using Selenium

Having faced challenges using just BeautifulSoup in a previous attempt, I decided to switch to Selenium for this task. The goal of the script is to retrieve subtitles for a specific TV show or movie as defined. Upon reviewing the code, you'll notice severa ...

Encountering issues with the .exe file after the conversion from .py, specifically receiving the error message: "ModuleNotFoundError: no module named 'selenium'"

Can somebody lend a hand? I'm fairly new to Python and it seems like I might have made a mistake somewhere in my code. This is the error message I'm getting: Traceback (most recent call last): File "webScrapingTool.py", line 1, in <module> ...

I am looking for assistance with a Python web-scraping issue. I need help with scraping URLs from a webpage that has partially hidden pagination numbers. Can you lend a hand?

Looking to extract the URLs of each page from the pagination on a specific site. The challenge lies in the fact that not all page URLs are displayed simultaneously. When attempting to scrape using the provided link, only page-1, page-14, page- ...
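When a pager shows only some page numbers but the URLs follow a predictable `page-N` pattern, one workaround is to read the highest visible number and generate the missing URLs instead of scraping them. The sketch below assumes that pattern holds for every page and uses made-up `example.com` URLs and a hypothetical `template` string.

```python
import re

PAGE_RE = re.compile(r"page-(\d+)")


def all_page_urls(visible_urls, template):
    """Build URLs for pages 1..max, where max is the highest visible page number."""
    numbers = [int(n) for url in visible_urls for n in PAGE_RE.findall(url)]
    if not numbers:
        return []
    return [template.format(n) for n in range(1, max(numbers) + 1)]


# Hypothetical links scraped from a partially collapsed pager.
visible = [
    "https://example.com/list/page-1",
    "https://example.com/list/page-2",
    "https://example.com/list/page-14",
]
urls = all_page_urls(visible, "https://example.com/list/page-{}")
print(len(urls), urls[0], urls[-1])
```

If the last visible number is not actually the final page, the generated list will be incomplete, so the pager's "last" link is the safer source for the maximum.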

Tips for efficiently scraping multiple hrefs within a webtable using Selenium

I am currently faced with the challenge of web scraping a website using Python and Selenium. The information I need is spread across different pages linked in the 'Application number' column. How can I programmatically click on each link, navigat ...

Automating website navigation using dynamic waiting with Selenium and Scrapy

Recently, I successfully developed a web scraper using Python and Selenium. Initially, I relied on fixed time-outs to load the page while making use of Ajax calls to retrieve data. However, I discovered that Selenium offers a built-in function called WebDr ...

Guide to scraping a website using node.js, ASP, and AJAX

I am currently facing an issue where I need to perform web scraping on this specific webpage form. This webpage is dedicated to vehicle technical reviews, and you can try inputting the car license CDSR70 for testing purposes. As mentioned earlier, I am u ...

What is the best way to extract information from an online discussion thread?

Currently, I am delving into the world of Curl/PHP and finding it quite fascinating. However, I have hit a roadblock that has been hindering my progress for a few days now, and I am in need of some assistance. I have come across some unique data that requ ...