Questions tagged [web-scraping]

Web scraping is the technique of extracting targeted data from websites that do not offer an API or any other means of automated data retrieval. Questions about getting started with scraping (for example, with Excel VBA) should be thoroughly researched before posting, since numerous working code examples already exist. Approaches to web scraping include third-party tools, custom software, and even manual data collection in a standardized manner.

Python automation tool to extract reviews from website sidebars using Selenium

Recently, I've been exploring the world of review scraping on booking.com. One method I tried was randomly selecting a hotel and using both Selenium and BeautifulSoup to extract reviews. However, my attempts have not been successful as I am not gettin ...

Using the OR operator in a lambda function for Python web scraping

Following the example in "How to extract html links with a matching word from a website using python", I developed a web scraping script to search for specific keywords in both recent and cached versions of a local newspaper. from bs4 ...

Navigating one level up or down from the tag that contains a specified value with Scrapy

To extract the price text from within the custom-control / label / font style, I must use the data-number attribute data-number="025.00286R". This unique identifier differentiates between control section divs based on the letter at the end. <d ...

Certain HTML code on a webpage is currently inaccessible to me

I am struggling to scrape a webpage because part of its code is inaccessible. That part can only be reached through an anchor tag with the following HTML: <a class="MTLink" href="#d192633539-47" title="example"> ...

How can I retrieve the specific date of a LinkedIn post using Selenium for site inspection?

Currently, I am utilizing the Selenium Chrome driver to scrape profiles on LinkedIn. I am conducting an analysis for my blog post. I am looking for a method to extract precise dates from posts on LinkedIn in the format "dd.mm.yyyy" rather than "1 month ag ...

Evade Incapsula's JStest

Seeking assistance with updating my status using cURL on a website secured by Incapsula. I am encountering difficulty accessing the main page due to their JS test security measures, despite cloning headers, useragent, and IP. Can anyone suggest a solution ...

What is the process for extracting the download button URL and parsing a CSV file in Python?

In my Python Google Colab project, I am attempting to access a CSV file from the following link: After scrolling down slightly on the page, there is a download button visible. My goal is to extract the link using Selenium or BeautifulSoup in order to read ...

Selecting an image by XPath when it has no class name, using Selenium in Python

Is there a way to select an image xpath without using a class name in HTML code? The specific code looks like this: <img alt="" class src="https://images.craigslist.org/00J0J_i9BI6mN6rKP_300x300.jpg"> When I try to copy the xpath by right-clicking, ...

URLs not being gathered by the web scraping tool

Currently involved in scraping data from this specific website to compile a database. Previously, I had functioning Python code available on GitHub that successfully accomplished this task. However, due to a major overhaul in the HTML structure of the site ...

Should I employ Scrapy Selenium to scrape the initial request page?

I have successfully implemented a solution using scrapy_selenium to scrape a website that uses JavaScript for loading content. In the code snippet provided below, you can see that I am using SeleniumRequest when yielding detailPage with parseDetails. Howe ...

Extracting data from Highcharts with Python

I have been attempting to extract information from the chart located at . I made an effort to gather the data by utilizing the corresponding XPaths for the data in the sections, but unfortunately, it was not successful. I experimented with using Scrapy: d ...

Error: Element not found - The specified CSS selector [id="loginUsername"] cannot be located in the document

Struggling to master the art of Web Scraping with Python through Selenium, my current testing grounds are Reddit.com. Unfortunately, I’ve hit a roadblock during script execution which halts at the login page, throwing this error message: (selenium.common ...

Automatically use JavaScript to send an email to my email address

I have a question I'm trying to solve: Is there a way to send myself an email notification (using a different address) when a specific event occurs in Javascript or Node.js? For example: if (100 > 90) { emailto="xxxxx.gmail.com" subject="It happened" ...

Tips for extracting dynamically loaded content from a website using Node.js and Selenium?

I'm having trouble scraping a website that uses React for parts of its content, and I'm not sure why I can't extract the data. Below is the HTML structure of the website: ...

Print the countdown of elements on the Python page by subtracting a specified number from the total count of

I'm looking to implement a page count down feature in my Python script for each page it navigates to. Below are my attempts so far. How can I achieve the desired result? In order to easily keep track of my script's progress, I have used the (len(elem_href ...
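A minimal sketch of the countdown idea, assuming `elem_href` is a plain list of URLs collected earlier (the question's own code is truncated, so the list contents and the helper name `remaining_after` are hypothetical). It prints how many pages are still left after the one currently being visited:

```python
# Hypothetical list of links; in the question it would be collected with Selenium.
elem_href = [
    "https://example.com/page/1",
    "https://example.com/page/2",
    "https://example.com/page/3",
]


def remaining_after(index, total):
    """Number of pages still to visit after the page at 0-based `index`."""
    return total - index - 1


progress = []
for index, href in enumerate(elem_href):
    # In a real script, driver.get(href) and the scraping logic would go here.
    left = remaining_after(index, len(elem_href))
    progress.append(left)
    print(f"Visiting {href} ({left} pages left)")
```

Using `enumerate` keeps the counter tied to the loop itself, so the countdown cannot drift if pages are skipped or retried inside the loop body.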

Interacting with Discord Using Selenium in Python

Greetings, I am encountering a problem with the code snippet below. The specific issue is that after opening a submenu and attempting to send down arrow keys, it does not work properly unless I use a multiplier like *100. Any assistance on this matter wo ...

Unable to interact with element on the following page using Selenium in Python

Attempting to extract data from multiple pages of a URL using Selenium in Python, but encountering an error after the first page. The code successfully navigates to the subsequent pages but fails to scrape, throwing an error message: "Element ... is not ...

Extracting all XPaths from any website with Python

Is there a method to extract product listings from various websites without needing to manually loop through each XPATH? Some sites like Amazon and Alibaba display up to 10 products per page, while others may have 20. I am looking for a way to retrieve a ...

Using Selenium in Python to extract information from a dynamic table with various dropdown menus

Recently, I've delved into the world of web scraping and am currently in the process of extracting data on various water utilities from a website. This site offers options to select different regions, and my goal is to output this information into a c ...

Having trouble extracting a specific field within a JSON on a webpage using VBA

My current project involves extracting property data from this specified link, which returns a JSON response. I've utilized a combination of JSON and VBA converter tools for this task. However, upon executing the script provided below, I consistently encou ...

Extracting data from numerous pages using a fixed link

After previously seeking help on navigating multiple pages with static URLs at , I appreciate the assistance received so far! However, my current goal is to extract ethnicity information for every character listed by clicking on each name. Although I am ab ...

Tips for extracting particular information from a website and displaying it within my application

While I have searched for a solution to my problem, I couldn't find an answer that fits my specific needs. I am attempting to extract specific information (highlighted in red) from this website: . I understand that JSON parsing can retrieve this data, bu ...

selenium.common.exceptions.TimeoutException raised while waiting for an element

https://i.stack.imgur.com/telGx.png Experiencing the following issue with the line pst_hldr. Identified the error: File "/home/PycharmProjects/reditt/redit1.py", line 44, in get_links pst_hldr = wait.until(cond.visibility_of_element_locate ...

Preventing a specific element from appearing with Selenium

Currently, I am utilizing the Selenium Firefox web driver to extract data from a specific webpage. This webpage consists of multiple subpages (ranging from 1 to 100 pages), and I am continuously navigating through them to scrape the necessary data. Howev ...

Tips for guaranteeing that you receive a 200 response code for numerous requests during web scraping operations

After experimenting with different scraping methods, I've found that using a cloud scraper allows me to bypass Cloudflare protection for only a limited number of requests before receiving a 403 response. Even when I tried modifying the user-agent string in ...

Our job site Selenium web scraper encounters a glitch at a particular step during its operation. How can we pinpoint the root cause and resolve it?

Several months ago, I created a web scraper to track job listings for a football club. Everything was running smoothly until about a week ago when the program started experiencing multiple issues. Despite my efforts to troubleshoot and make changes to the ...

What is the best way to locate an element on a website that consistently changes its label ID buried deep within nested DIVs?

I'm facing an issue with my website tracking system, where I am trying to extract shipment details using Selenium in Python and save them to an Excel file. However, I am encountering difficulties with getting Selenium to function properly. It keeps sh ...

Error encountered when executing Selenium WebDriver

Currently in the middle of an online Python course and making progress, but encountering a challenge with pulling HTML data. The TypeError that occurs every time the process finishes has me puzzled. I've followed the instructor's steps closely, y ...

Website data not loaded even after rendering with requests-html

I've been testing requests-html on different websites, and I'm having trouble extracting the stock price from this specific site: My approach uses requests-html and calls html.render to execute the JavaScript code. However, the data isn' ...

Is it possible to access a hidden JavaScript variable in Selenium?

Is there a way to extract the array "o" that contains data used for drawing a polygon? Simply using driver.execute("return o") or console.log doesn't seem to work. Any suggestions on how to achieve this? const zt = function(e, t, n, r) { c ...

The dropdown menu in Selenium is experiencing issues with option selection

I am currently working on scraping data from a specific website which can be found at the following link: My task involves navigating to the terminal code column and selecting 'General Cargo' Below is the HTML code snippet: <select name="terminal" id ...

Encountering problems with the Python and Selenium code I used to create my Twitter scraper

I have developed a Python script that extracts information like name, tweets, followers, and following from the profiles available in the "view all" section of my Twitter profile page. The script is currently functioning as intended. However, I have encoun ...

Selenium in Python encounters difficulty locating web element

I've been attempting to extract posts from a forum I found at this specific URL: The main content I'm trying to pull is located within: <div class="post-content"> However, no matter if I use get element to search by XPATH or CLASS_NAME, I ...

The dynamic dropdown on https://www.nseindia.com/ does not display auto-suggestions when Selenium and Python are used to pass values

driver = webdriver.Chrome('C:/Workspace/Development/chromedriver.exe') driver.get('https://www.nseindia.com/companies-listing/corporate-filings-actions') inputbox = driver.find_element_by_xpath('/html/body/div[7]/div[1]/div/section ...

Automating Checkbox Selections with Selenium in Python

I'm having trouble clicking on a checkbox. Here is the HTML code: <div class="mb-1 p-3 termsCheck"> <input class="form-check-input float-end" type="checkbox" value="" id="flexCheckDefault" ...

Python program that runs a loop through a file, sends API requests using the `requests` library, and saves the results

Currently working on a program to loop through movie titles in a text file and make API calls to store the responses. The text file contains a single title on each line, for example: Titanic Avatar A Star Is Born The API being used is from a website ca ...

Python's BeautifulSoup is throwing a KeyError for 'href' in the current scenario

Utilizing bs4 for designing a web scraper to gather funding news data. The initial section of the code extracts the title, link, summary, and date of each article from n number of pages. The subsequent part of the code iterates through the link column and ...

Discover the solution by utilizing XPath

I am struggling to extract data from an HTML table: <div class="parameters"> <div class="property">property 1</div> <div class="value">value</div> </div> <div class="paramete ...
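One way to pair each `property` with its `value` is sketched below, using the limited XPath support in the standard library's `xml.etree.ElementTree`. The sample markup is hypothetical and only modeled on the fragment in the question, and the `<root>` wrapper and the function name `extract_parameters` are assumptions for illustration. ElementTree requires well-formed markup, so real-world HTML would more likely need a tolerant parser such as `lxml.html`.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment modeled on the markup in the question.
SAMPLE = """
<root>
  <div class="parameters">
    <div class="property">property 1</div>
    <div class="value">value 1</div>
  </div>
  <div class="parameters">
    <div class="property">property 2</div>
    <div class="value">value 2</div>
  </div>
</root>
"""


def extract_parameters(markup):
    """Map each property name to its value, one pair per 'parameters' block."""
    root = ET.fromstring(markup)
    result = {}
    # Select every <div class="parameters"> anywhere below the root.
    for block in root.findall(".//div[@class='parameters']"):
        prop = block.find("div[@class='property']")
        value = block.find("div[@class='value']")
        if prop is not None and value is not None:
            result[(prop.text or "").strip()] = (value.text or "").strip()
    return result


pairs = extract_parameters(SAMPLE)
print(pairs)
```

Looking up both children relative to the same parent block keeps each property matched with its own value, even if some blocks are missing one of the two.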

Challenges encountered while executing a loop using Selenium in Python code

I need to scrape data from the following website . My task involves selecting "Rio de Janeiro" in the 'Estado' field. Entering an empty value in the 'Nome' field. Performing a search. Once the table results are displayed, cli ...

What is preventing selenium from locating the chrome driver?

As I was following a tutorial on creating a web scraper for Twitter using Selenium and Python, I encountered an issue. File "C:\Python34\lib\site-packages\selenium\webdriver\chrome\webdriver.py", line 62, in __init__ self.service.start() Fi ...

Discovering the parent element of a find_element_by_partial_link_text: a step-by-step guide

Using the find_element_by_partial_link_text selector to locate the "next" button for crawling continuity has proven problematic for me. The presence of the word "next" in various other links on the page often disrupts the script execution. Despite attemp ...

Utilizing Scrapy for Extracting Size Information obscured by Ajax Requests

I am currently trying to extract information about the sizes available for a specific product from this URL: However, I am encountering difficulty in locating the details hidden within the Select Size Dropdown on the webpage (e.g., 7 - In Stock, 7.5 - In ...

Unable to connect to the geckodriver service on Mac using Python

Recently, I encountered a new issue with one of my web scrapers on my Mac. After leaving the scraper idle for about a month, it has mysteriously stopped working! I suspect something may have become outdated, but I'm unable to pinpoint the exact cause. ...

Obtaining a data table from a website using various levels of tag hierarchy

I attempted to extract data from the table located at using Python's lxml library. However, when I used code snippets similar to those in the question "How to extract tables from websites in Python", I encountered issues with <a>-tags and image ...

Looking for a Solution to Secure Files Using Selenium Webdriver During Threading?

I am using multiple threads to initialize PhantomJS or Chromedriver with the following code: Driver= webdriver.PhantomJS('C:\phantomjs.exe',desired_capabilities=dcap, service_args=service_args) or Driver= webdriver.Chrome(executable_path='C:/chromedr ...

Executing a program using Selenium to gather information

Current progress: I successfully developed a Python script that uses Selenium to open a Firefox browser and extract data to an Excel file. I have also converted this script into an executable file without encountering any errors thanks to Pyins ...

What steps can I take to prevent BeautifulSoup from interpreting commas as tab characters?

My recent project involved creating a web scraping code to extract information from a local news website. However, I have encountered two issues with the current code. One problem is that when the code retrieves paragraph data and saves it to a CSV file ...
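The rest of the question is truncated, so the following is only a sketch under one assumption: that commas inside the scraped paragraph text are splitting a single paragraph across several CSV columns. In that case the standard `csv` module handles the problem by quoting any field that contains the delimiter. The sample paragraphs are hypothetical.

```python
import csv
import io

# Hypothetical scraped paragraphs that contain commas.
paragraphs = [
    "First sentence, with a comma.",
    "Second one, also, with commas.",
]

# An in-memory buffer stands in for the output file.
buffer = io.StringIO()
writer = csv.writer(buffer)  # default quoting wraps fields containing commas
writer.writerow(["paragraph"])
for text in paragraphs:
    writer.writerow([text])

# Read the data back to confirm each paragraph stayed in a single column.
buffer.seek(0)
rows = list(csv.reader(buffer))
print(rows)
```

When writing to a real file instead of a buffer, opening it with `newline=""` is the documented way to avoid blank lines between rows on Windows.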

Scrapy is adept at gathering visible content that may appear intermittently

Currently, I am extracting information from zappos.com, specifically targeting the section on the product details page that showcases what other customers who viewed the same item have also looked at. One of the item listings I am focusing on is found her ...

What is the best way to utilize Selenium with Python for choosing an item from a dropdown menu when the choices are contained within non-interactable <div> elements?

I am currently working on a program that navigates through the well-known website using selenium automation. However, I have encountered an obstacle on the second page where there is a dropdown menu prompting users to select a top-level domain. The dropdo ...

Python web scraping with Selenium

Looking to extract all href contents from the "news" class (URL provided in the code), I attempted this script but encountered issues... Here is the code snippet: from bs4 import BeautifulSoup from selenium import webdriver Base_url = "http://www.thehin ...

Selecting a web element using XPath with Python and Selenium

Struggling with web scraping on a site featuring this specific structure. <div> <div class = "class1" > <div class = "class2" > <div class = "class3" > <div style = "clear: both; " > </div> ...

Locate child elements within a parent class that includes the term "hijack" using

Check out this page: Extract data from: <tr> <td class="event_type">Possible Hijack</td> <td class="country"> </td> <td class="asn">; <i>Expected Origin AS:</i ...

Waiting for Elements to be Added to Parent in a Lazy Loading Website with Selenium and Python

When working with Selenium's expected conditions for wait, such as those mentioned here: https://selenium-python.readthedocs.io/api.html#module-selenium.webdriver.support.expected_conditions, I find myself unsure about which one to use or if that is e ...

Extracting the number of likes from each post on a specific profile using Python scrapy

Here is the code snippet that I am currently working with: #IMPORT THESE PACKAGES import requests import selenium from selenium import webdriver import pandas as pd #OPTIONAL PACKAGE, BUT MAYBE NEEDED from webdriver_manager.chrome import ChromeDriverManage ...

Utilize Regex to isolate text between specified string markers

I am attempting to scrape a website and I need to extract the JSON data from the data variable in the JavaScript code below using Python Regex. <script type="text/javascript"> P.when('A').register("ImageBlockATF", function(A){ var data = { ...
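A regular expression alone cannot reliably match nested braces, so one possible approach is to use the regex only to locate the assignment and let `json.JSONDecoder.raw_decode` parse the object from that position. The sketch below assumes the literal assigned to `data` is valid JSON, which is not guaranteed for arbitrary JavaScript (single quotes or unquoted keys would fail). The sample script body and the function name `extract_data_object` are hypothetical, since the real one is truncated in the question.

```python
import json
import re

# Hypothetical page fragment; the real script body is truncated in the question.
SAMPLE_HTML = """
<script type="text/javascript">
P.when('A').register("ImageBlockATF", function(A){
    var data = {"colorImages": {"initial": [{"hiRes": "a.jpg"}]}, "count": 1};
    return data;
});
</script>
"""


def extract_data_object(html):
    """Return the object literal assigned to `var data`, parsed as JSON."""
    # The lookahead leaves match.end() pointing at the opening brace.
    match = re.search(r"var\s+data\s*=\s*(?=\{)", html)
    if match is None:
        raise ValueError("no `var data = {` assignment found")
    # raw_decode parses one JSON value starting at the given index and
    # ignores whatever follows it, so nested braces need no regex handling.
    obj, _end = json.JSONDecoder().raw_decode(html, match.end())
    return obj


data = extract_data_object(SAMPLE_HTML)
print(data["count"])
```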

Ways to iterate through child elements using the Selenium xpath method

I'm currently facing an issue while trying to extract names from reviews of a specific product. I am struggling with looping over each review block to retrieve the names nested within them. This is the code I have written so far: from bs4 import Beau ...

Experimenting with Selenium Webdriver to locate an input element by its name and input a specific value

I'm having trouble inputting a value in the final box (Chassis Number). Here's what I've attempted: from selenium import webdriver from selenium.webdriver.support import expected_conditions as EC from selenium.webdriver.common.by imp ...

A guide on extracting links from a webpage using selenium

I've been attempting to extract the links that end with 20012019.csv from a specific webpage using the provided script, but I keep encountering a timeout exception. I believe I have followed the correct approach for this task. Nevertheless, I would g ...

Extracting the text content of a specific tag while ignoring the text within other tags nested inside the initial one

I am trying to extract only the text inside the <a> tags from the first <td> element of each <tr>. I have provided examples of the necessary text as "yyy" and examples of unnecessary text as "zzz". <table> <tbody> <tr ...
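A minimal sketch of this selection, using the standard library's `xml.etree.ElementTree`: an element's `.text` holds only the text directly inside it, so reading `.text` on each `<a>` skips the surrounding "zzz" text in the cell. The sample table is hypothetical (the question's markup is truncated) and must be well-formed for ElementTree; the function name `first_cell_link_texts` is an assumption for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical table: "yyy" marks wanted text, "zzz" marks unwanted text.
SAMPLE = """
<table>
  <tbody>
    <tr>
      <td><a href="#">yyy</a> zzz <span>zzz</span></td>
      <td><a href="#">zzz</a></td>
    </tr>
    <tr>
      <td>zzz <a href="#">yyy</a></td>
      <td>zzz</td>
    </tr>
  </tbody>
</table>
"""


def first_cell_link_texts(markup):
    """Collect the text of every <a> inside the first <td> of each <tr>."""
    table = ET.fromstring(markup)
    texts = []
    for row in table.iter("tr"):
        first_cell = row.find("td")  # find() returns the first matching child
        if first_cell is None:
            continue
        for link in first_cell.findall(".//a"):
            # .text is the text directly inside <a>, not the text after it.
            texts.append((link.text or "").strip())
    return texts


link_texts = first_cell_link_texts(SAMPLE)
print(link_texts)
```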

Learn how to execute JavaScript code in Selenium without launching a web browser

I am currently using the Selenium API for web scraping on pages that contain JavaScript code. Is there a way to retrieve the code without having to open a web browser? I am still learning how to use this API. Is this even feasible? ...

Extracting information from several span elements within the main span class using BeautifulSoup

I'm currently attempting to extract data from the following HTML code: <span class="double-line-ellipsis"> <span> ₹ 2800 for 2 (approx) </span> <span> | </span> <a data-w-onclick="stopClickPropagation|w1-restara ...

What is the best way to access the statistics for each game on a live score platform?

I am looking to retrieve the statistics for each game on livescore and gather all the stats of all games in a day simultaneously. driver.get('https://www.livescore.com/en/football/2022-12-01/') time.sleep(2) scroll_pause_time = 1 # You can adjust ...

Selenium Scrolling: Improving Web Scraping Efficiency with Incomplete Data Extraction

I have been attempting to extract product data from a website that utilizes JavaScript to dynamically render HTML content. Despite using Selenium, implementing scrolling functionality to reach the end of the page, and allowing time for the page to reload, ...

I am attempting to select a checkbox with Selenium WebDriver in Python, but encountering an error message: "MoveTargetOutOfBoundsException: move target out of bounds"

As I was attempting to interact with a checkbox on this particular webpage , I encountered some challenges. Initially, I tried using the ActionChains library to click on the checkbox by locating its input tag via xpath or css selector and then using the ...

Python - Extracting text with Beautiful Soup, but missing certain parts

I'm attempting to scrape the top 100 job listings in the United States from this source. However, when I execute the following code: import urllib.request from bs4 import BeautifulSoup url = 'https://www.ranker.com/list/most-common-jobs-in-america/america ...

Gather information that is dynamic upon the selection of a "li" option using Python Selenium

I need to extract data from this website (disregard the Hebrew text). To begin, I must choose one of the options from the initial dropdown menu below: https://i.stack.imgur.com/qvIyN.png Next, click the designated button: https://i.stack.imgur.com/THb ...

Python: flexibly filtering Google search results, and tips on implementing it

Currently, I am attempting to gather data from certain web pages by accessing them through Google Search. To ensure accuracy in my results, I need to implement a list of restricted words. Let's imagine that the top 4 search results for Python on Goog ...

Scraping with Selenium by 'CLASS_NAME' while excluding certain elements

In Python, when using Selenium for web scraping, is there a way to locate an element by its CLASS_NAME but only return the elements under the class name 'xxxx' and not those under 'xxxxyy'? The following code retrieves all elements with the CLASS_NAME of ...

Selenium ceases to function intermittently

I am attempting to extract data from a website that has a total of 9000 pages. However, the extraction process stops after retrieving approximately 1700 pages and then restarts from the beginning and continues for about 1000 more pages. Is there a way to s ...

Tips for ensuring a page has fully loaded before extracting data using requests.get in Python without relying on an API

Currently, I am using Python along with the requests library for web-scraping. I've encountered an issue regarding the loading of a page; I would like to implement a delay before receiving the result from requests.get(). I have come across some indiv ...

Having trouble locating element using xpath within a div containing ::before pseudo-element

When working with the web driver object, I often need to obtain a list of web elements. findElements(By.xpath("")); Usually, I use xpath like //*[@class="providers-list clearfix"] to retrieve the list. However, I encounter an error when trying to acces ...

Having trouble continuously clicking the 'more' button to access all the complete reviews

I have developed a Python script using Selenium to extract all the reviews from a specific page on Google Maps. This page contains numerous reviews that are only visible when scrolling down. My script successfully retrieves all of them. However, I am curr ...

Using Selenium in Python to scrape data from multiple href links

Click this link to test it out: I've been able to extract the links for all product detail pages, but I'm only getting one result at the end. It should be going through all the links and extracting the names and image URLs. What am I missing her ...

Understanding AJAX requests with cryptic strings appended to URLs

My goal is to keep track of cricket scores on scorespro/cricket by utilizing browser AJAX requests. By analyzing the network traffic in Google Chrome, I have noticed that my browser sends out requests in this format: Whenever I click on the response withi ...

What is the best way to extract data from a nested web table with rows within rows?

Currently, I am attempting to extract data from a table where each row contains multiple rows of information within a single column. My goal is to scrape each individual row inside the main row and create a data frame with it. Additionally, I want to use ...