Questions tagged [web-scraping]

Web scraping is the technique of extracting targeted data from websites that do not offer an API or any other means of automated data extraction. Questions about "Getting Started with Scraping" (for example, with Excel VBA) should be thoroughly researched first, since numerous working code examples already exist. Approaches to web scraping include using third-party tools, writing custom software, or even collecting data manually in a standardized way.

Troubleshooting API password issues when fetching JSON from an API using VBA in Excel

I'm facing challenges with an API that provides natural gas data. The documentation for this API can be found at . It allows me to access JSON-formatted data by entering a URL into my internet browser. However, in order to download the JSON data, I need t ...

Scraping a website with Python that contains redirection to another website

I'm struggling with scraping the contents of a specific web page. Here is an example of my Python code: response = requests.post('http://a836-acris.nyc.gov/bblsearch/bblsearch.asp?borough=1&block=733&lot=66',{'User-Agent' ...

Locate several elements based on their unique identifiers

I am currently attempting to locate numerous elements using their individual ids. The elements have the following names: dcm-reservation-limit-multiple-input-generic-X where X represents the number of elements, such as: dcm-reservation-limit-multiple-in ...
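One possible way to match a family of ids like these is a CSS attribute prefix selector. The following is a minimal sketch, assuming the ids differ only in a trailing running number; `prefix_selector` and `numbered_ids` are hypothetical helper names, and the resulting selector string could be handed to a lookup such as Selenium's `find_elements(By.CSS_SELECTOR, ...)`.

```python
# Prefix taken from the excerpt; everything after it is assumed to be a number.
ID_PREFIX = "dcm-reservation-limit-multiple-input-generic-"


def prefix_selector(prefix: str) -> str:
    """Build a CSS selector matching every element whose id starts with prefix."""
    return f'[id^="{prefix}"]'


def numbered_ids(prefix: str, count: int, start: int = 1) -> list[str]:
    """List explicit ids, assuming the suffix is a running number from start."""
    return [f"{prefix}{n}" for n in range(start, start + count)]


selector = prefix_selector(ID_PREFIX)
ids = numbered_ids(ID_PREFIX, 3)
```

The prefix selector avoids knowing the count in advance; the explicit list is only useful when each element must be addressed individually.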

What is the most effective XPath for retrieving text from <td> elements when there is text present in both?

I have the following XML that needs to be extracted: <div class="tab_product_details"> <table> <tbody> <tr>...</tr> <tr>...</tr> <tr>...</tr> <tr> ...

Disabling Notifications using Selenium for Microsoft Edge Browser Driver

My Selenium script, which opens the Edge WebDriver, is encountering notification popups that interfere with clicking buttons on websites. I suspect these notifications are overlaying the website content, making it difficult for the code to locate ...

Retrieve the class name of a reusable component in HTML

I have been assigned a web scraping project and I am required to scrape data from the following website for testing: The HTML: <div class="quote" itemscope itemtype="http://schema.org/CreativeWork"> <span class="text& ...

Save the information in a PostgreSQL database

I'm struggling with saving the scraped data in a PostgreSQL database. I attempted to use psycopg2 without success, so now I'm considering using Django models instead. The scraper needs to collect data from every blog post on each page and store it in the ...

Is there a different option to use instead of time.sleep() when web scraping with Selenium in Python?

I am currently working on a project that involves scraping prices for specific food items based on different locations across the country. One of the features allows users to enter the name of a city in an input text box and then view a list of available i ...
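The usual alternative to a fixed `time.sleep()` is an explicit wait that polls a condition until it holds or a timeout expires. Selenium ships this as `WebDriverWait` together with `expected_conditions`. The sketch below only illustrates the idea using the standard library; `wait_until` is a hypothetical helper, not part of Selenium.

```python
import time


def wait_until(condition, timeout=10.0, poll=0.5):
    """Call condition() repeatedly until it returns a truthy value.

    Returns that value, or raises TimeoutError once `timeout` seconds pass.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        time.sleep(poll)  # short pause between checks instead of one long sleep
```

Unlike a fixed sleep, the call returns as soon as the condition is met, so fast pages are not slowed down and slow pages still get the full timeout.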

Retrieving URLs for CrawlSpider using Scrapy

I have developed a script to systematically catalog all URLs on a website. Currently, I am utilizing CrawlSpider with a rules handler to manage the scraped URLs. The "filter_links" function checks an existing table for each URL and writes a new entry if i ...

Encountering an ElementClickInterceptedException while trying to click the button on the following website: https://health.usnews.com/doctors

I am currently in the process of extracting href data from doctors' profiles on the website . To achieve this, my code employs Selenium to launch a browser that accesses the website and retrieves the URLs, handling the heavy lifting for web scraping. Wh ...

Python and Selenium: Mastering the Art of Drop-Down Menus

I am currently on the hunt for a dropdown element in order to select an option from it. I know that Selenium has a built-in class specifically for handling select drop-downs, but I'm having trouble locating the actual element. Could someone point out ...

Is there a way to extract information from a <span> element with restricted access?

I'm currently utilizing bs4 and urllib2 to extract data from a website. Check out the webpage here. The goal is to retrieve the remaining digits of the telephone number 3610...... but beforehand, it's necessary to click on a button to reveal th ...

Scraping social media followers: the list runs to hundreds of thousands and Selenium crashes due to memory overload

After using Selenium in Chrome to gather usernames from a social media profile, I encountered an issue with the limited loading of the page and Chrome crashing due to running out of memory. The list of followers is extensive, reaching hundreds of thousands ...

List index out of range error occurs in the if-else block when the condition is met

This script retrieves data from Oddsortal website: import pandas as pd from bs4 import BeautifulSoup as bs from selenium import webdriver import threading from multiprocessing.pool import ThreadPool import os import re from math import nan class Driver: ...

Compiling a directory of video URLs - hindered by email encryption

I am relatively new to Python and coding, so please bear with me as I explain my current project. Basically, what I'm trying to accomplish is creating a script that opens my monthly fire department training page, navigates to the video section where differ ...

Querying HTML wrapped in a JSON response using Scrapy: a step-by-step guide

I'm currently in the process of scraping a website that relies on dynamically loaded content through JavaScript. In my attempts to request the data source, I received a JSON response where a key 'results_html' holds all the HTML necessary for querying an ...
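A response of this shape can be handled in two steps: decode the JSON, then parse the HTML string stored under the key. The sketch below uses only the standard library and collects link targets as an example; the sample payload and the `LinkCollector` and `links_from_json` names are assumptions for illustration. Inside a Scrapy spider, the extracted string could instead be wrapped in `scrapy.Selector(text=...)` to query it with CSS or XPath.

```python
import json
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def links_from_json(body: str, key: str = "results_html") -> list[str]:
    """Decode a JSON body and parse the HTML fragment stored under `key`."""
    payload = json.loads(body)
    parser = LinkCollector()
    parser.feed(payload[key])
    parser.close()
    return parser.links


# Made-up payload standing in for the real response body.
sample_body = json.dumps(
    {"results_html": '<div><a href="/item/1">One</a><a href="/item/2">Two</a></div>'}
)
found = links_from_json(sample_body)
```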

Leveraging Selenium for extracting data from a webpage containing JavaScript

I am trying to extract data from a Google Scholar page that has a 'show more' button. After researching, I found out that this page is not in HTML format but rather in JavaScript. There are different methods to scrape such pages and I attempted to use Sele ...

Obtaining the first element with Selenium's CSS Selector: Tips for accessing multiple elements

https://i.stack.imgur.com/BBk53.png https://i.stack.imgur.com/WEa6i.png Trying to extract multiple reviews using cssSelector from a div element. public void getFacebookData() { driver.get("https://www.facebook.com/?stype=lo&jlou=AffEX_j6PH-b ...

Troubleshooting Datepicker Problems with Python Selenium

When attempting to retrieve the availability and price for each day on , I navigate through the calendar by checking which days are booked or not, and then clicking the "next" button to move to the next month. In addition, I click on the arrival date and ...

How to use Selenium in Excel VBA to wait for a specific value to appear

Having an element with the following HTML: <span id="ContentPlaceHolder1_Label2" designtimedragdrop="1319" style="display:inline-block;color:Firebrick;font-size:Medium;font-weight:bold;width:510px;"></span> Upon clicking the Save button on ...

Is there a way to locate the content enclosed within a span tag without any specific class or ID attribute using Selenium with Python?

I have been attempting to utilize Selenium in order to locate the text within a span element that lacks any specific attributes such as class or id for identification. Here is the HTML structure: HTML snippet obtained from inspecting the element in Chrome ...

What is the best way to extract data from a website that shuffles its media files every time it is refreshed?

Trying to extract media files from a specific website with notes has been quite the challenge. Despite easily downloading the files, they are not in the correct order. It seems that the website makes an Ajax call after scrolling to page 30 and then loads ...

TikTok pages are failing to load with Selenium

I'm currently working on a TikTok crawler project that uses both Selenium and Scrapy: start_urls = ['https://www.tiktok.com/trending'] .... def parse(self, response): options = webdriver.ChromeOptions() from fake_useragent import UserAgent ua = ...

Extracting table information from a webpage with PowerShell

As a newcomer to both PowerShell and HTML, I am trying to extract table data from a webpage using PowerShell and the Selenium WebDriver. My approach involves automating the process of launching a specific webpag ...

What is the best method for extracting content from <li> tags that have the class "active" or "selected"?

I am currently facing a challenge while trying to extract a list from a website. The website has two separate lists, with the second one only loading after selecting an option from the first list. Unfortunately, I am having trouble selecting the first opti ...

Retrieving data from intricate JSON structures

Currently, I am engaged in web scraping to extract the "id" of all locations from complex JSON content. Click here for the JSON link. I attempted using the dict.items method, but it only extracted 2 values at the start of the dictionary followed by a li ...
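When a key such as "id" can appear at any depth, `dict.items` on the top level only shows the first layer. A recursive walk over dicts and lists is one common way to reach the rest. This is a minimal sketch; the sample data and the `collect_values` name are made up for illustration, since the real JSON is not shown.

```python
def collect_values(node, key):
    """Return every value stored under `key` anywhere in nested dicts/lists."""
    found = []
    if isinstance(node, dict):
        for k, v in node.items():
            if k == key:
                found.append(v)
            # Keep descending: matching keys may be nested deeper.
            found.extend(collect_values(v, key))
    elif isinstance(node, list):
        for item in node:
            found.extend(collect_values(item, key))
    return found


# Hypothetical structure standing in for the real response.
sample = {
    "meta": {"count": 3},
    "locations": [
        {"id": 1, "children": [{"id": 2}]},
        {"id": 3},
    ],
}
location_ids = collect_values(sample, "id")
```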

Mastering the art of inputting values on a webpage with the use of ...

Consider the following HTML structure: <div class="divSearchContainer"><input type="search" class="FL H100P" placeholder="Select"><div class="divSearchIconConatiner H100P CP FL" title="S ...

Utilizing Python with Selenium for Web Scraping

https://i.stack.imgur.com/klhCL.png I need to retrieve numbers like 14,401. I have attempted the following code: WebDriverWait(driver, 10).until(EC.frame_to_be_available_and_switch_to_it((By.XPATH,"//iframe[@id='wiz-iframe-intent']"))) WebDriverWait(drive ...

Utilize Beautiful Soup, Selenium, and Pandas to extract price information by scraping the web for values stored within specified div class

My goal is to retrieve the price of a product based on its size, as prices tend to change daily. While I succeeded in extracting data from a website that uses "a class," I am facing difficulties with websites that use div and span classes. Link: Price: $ ...

Is the content of the website altered by using the requests.get method? (Web scraping)

While attempting to extract data from a particular website using the requests.get method, I encountered an issue. The information retrieved from the website seems to be inconsistent and does not align with the actual data displayed on the site. For instan ...

Utilize Python Selenium to extract the complete table data from a React.js application

I attempted to extract information from a table on wyscout.com, which appears to be constructed with Reactjs. Once logged in, the script selects the country (e.g. England), League (e.g. Premier League), and Team (e.g. Arsenal). From there, it navigates to ...

Extracting YouTube Videos Using Python and Selenium

My friend asked me to scrape all the videos from 'TVFilthyFrank'. I have access to all the links for each video. I want to determine the size of each video in MB and then proceed with downloading them. However, using driver.get(VIDEO_URL) and ext ...

A guide on using Python to interact with webpage elements

I need assistance with coding a function that can interact with the ellipsis icon on a specific webpage. I have provided details about its location. Due to security measures, I am unable to share the exact page where the ellipsis is loc ...

Is Amazon altering the names of their CSS selectors and HTML elements on the fly?

For my Amazon.es web scraper built with Selenium, I am using a CSS selector to determine the total number of pages it will iterate through. However, the selector name seems to change dynamically and I must update it daily. As someone not well-versed in H ...

Uncovering content across multiple pages using BeautifulSoup for web scraping

Currently, I am working on a project for one of my courses that involves web scraping data from Bodybuilding.com. My main objective is to collect information regarding the members of the website. Initially, I was able to successfully scrape data from the 1 ...

Scraping data from a dropdown menu to retrieve both the selected option and its corresponding outcomes

My experience with web scraping is limited to the basics, so this task is a bit out of my comfort zone. What I'm hoping to achieve is a comprehensive list of farmers along with the markets they sell at. The website features a table where you can select ...

Uncover concealed email addresses on a webpage

On this website , I am trying to extract the email. I attempted using requests and BeautifulSoup without success. I also wrote this code utilizing Selenium, but it did not work: from selenium import webdriver url = "https://aiwa.ae/company/arad-bui ...

Extracting information from a dynamically changing and sporadically updated website

Is there a way to scrape a web application and retrieve the values from a table as soon as new values are added? If not, what is the best method to scrape the website? Visit the website here. The current code I have only allows manual scraping which result ...

When a function is called within a 'For' loop, it may result in a NameError

I've encountered an issue while attempting to call a function from within a FOR loop. The error message I receive is: test() NameError: name 'test' is not defined Below is the code in question: from selenium import webdriver from selenium.common.excep ...

When using Selenium WebDriver, the function find_elements_by_X sometimes results in an empty list being ...

My objective is to compile a list of the names of all newly posted items on within a 24-hour period. After some research, I've discovered that Selenium is the ideal tool for this task as the website I am scraping is dynamic and loads more content as the ...

Determining when the scroll bar has reached the end using Selenium in Python

I'm working on implementing a while loop in Selenium, and I want to set a condition for the loop to stop when the scroll bar reaches the end of the page. How would I go about coding this type of condition within the while loop? Right now, my loop is set to ...
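A common stopping condition for this kind of loop is that the page height no longer grows after scrolling. The sketch below assumes `driver` is a Selenium WebDriver (only its `execute_script` method is used); the `scroll_to_bottom` name, the `pause` default and the `max_rounds` safety cap are assumptions for illustration.

```python
import time


def scroll_to_bottom(driver, pause=1.0, max_rounds=100):
    """Scroll until document height stops changing; return rounds performed."""
    last_height = driver.execute_script("return document.body.scrollHeight")
    for rounds in range(1, max_rounds + 1):
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        time.sleep(pause)  # give lazily loaded content time to arrive
        new_height = driver.execute_script("return document.body.scrollHeight")
        if new_height == last_height:
            return rounds  # nothing new was loaded: the end was reached
        last_height = new_height
    return max_rounds  # safety cap hit; the page may scroll indefinitely
```

The fixed pause is the weak point here: if content loads slower than `pause`, the loop can stop early, so a larger value or an explicit wait on a loading indicator may be needed.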

While conducting my web-scraping, I encountered instances where certain divs were not visible on the page

Attempting to extract data from an HTML page using the following code: driver = webdriver.Chrome() driver.get(url) try: element = WebDriverWait(driver, 20).until( EC.presence_of_element_located((By.CLASS_NAME, & ...

Using Python and Selenium to Retrieve and Load All Website Comments

I am trying to extract around 7000 comments from this link. The challenge is that the website only displays 10 comments at a time, so I am using Selenium in Python to load all comments and then parse them with BeautifulSoup. Here is the HTML segment of th ...

Encountering difficulties extracting audio from the webpage

Attempting to extract audio (under "experience the sound") from 'https://www.akrapovic.com/en/car/product/16722/Ferrari/488-GTB-488-Spider/Slip-On-Line-Titanium?brandId=20&modelId=785&yearId=5447'. The code I have written is resulting in an ...

Is there a way to automate the process of navigating through multiple pages in order to extract and download Excel files using ...

I am currently developing a web scraping tool that is designed to navigate through website pages in order to extract Excel files from a dropdown menu located at the bottom of each page. Unfortunately, the webpages only allow me to download the 50 location ...

Is there a method to review all the details of a button on a website once Selenium has identified that specific element?

Could I potentially analyze the attributes of a button element that I have selected using selenium? I am currently utilizing selenium to navigate through complex JavaScript-based web pages. My goal is to download certain files from these pages, but before ...

How can the jQuery click() method be utilized?

Currently working on a web scraping project, I have managed to gather some valuable data. However, I am now faced with the challenge of looping through multiple pages. Update: I am using Node.js for this project. Knowing that there are 10 pages in total, I atte ...

Using Node.js to retrieve table data from a URL

I am just getting started with Node.js and Express, building a website that serves static files. My research suggested that Node.js with Express suits this purpose. While I have successfully served some static HT ...

CSV file displaying incorrect data due to XPath expression issue

I have written a code to extract data for the values "Exam Code," "Exam Name," and "Total Question." However, I am encountering an issue where the "Exam Code" column in the CSV file is populating with the same value as "Exam Name" instead of the correct ...

Automate Data Extraction from Tables Using Python Selenium and Convert them into a DataFrame

I am just starting to learn Python and Selenium, and I'm facing a challenge that I need help with. Currently, I am attempting to extract data from a particular website: "" The goal is to convert the table on this website into a dataframe similar to ...

Receiving text with embedded newline characters

As I scrape data from a website, the output I am receiving is as follows: ['1 tablespoon vegetable or coconut oil 1 tablespoon peeled and minced fresh ginger (from a 1-inch piece) 2 cloves garlic, minced 3 tablespoons vegan Thai red curry paste, su ...

Combining cURL with multiple URLs for efficient result parsing

I am currently developing a PHP web scraper with the following objectives: Retrieve content from less than 10 URLs using cURL, Add the HTML content of each URL to a DOMDocument, Search the DOM document for <a> elements that link to PDF files, ...

Using R for web scraping, I encountered a unique situation where the 'next page' button was only programmed to navigate to the top of the page

I am attempting to extract the headline snippets from a local newspaper's articles by utilizing rvest and RSelenium. To access pages beyond the initial one, I need to click on the 'next page' button. Strangely, when I perform this action through RSelenium, ...

I encountered a problem extracting title URLs with Python during web scraping

I have encountered an issue while trying to scrape title URLs using my code. Can someone please help me troubleshoot it? Here is the code snippet: import requests from bs4 import BeautifulSoup # import pandas as pd # import pandas as pd import csv def ...

Python - Retrieving Required Text from a <td class = "text">Grab This Information</td>

Being a beginner in using Selenium and Python, my main objective is to retrieve the revenue value for a specific company from the Hoovers website. Here's my current code: company = 'Trelleborg' page = 'https://hoovers.com/company-information/cs.html?term ...

Having trouble with XPath in Scrapy?

Looking to extract data from the XPath provided below: /html/body/div[2]/div[2]/div/div/div[4]/ul[2]/li/div Currently testing this with Scrapy Shell using the following commands: scrapy shell "https://www.rentler.com/listing/520583" and running: hxs.s ...

Encountering a JSONDecodeError with the message "Expecting value: line 1 column 1 (char 0)"

Looking for a solution to the JSONDecodeError: Expecting value: line 1 column 1 (char 0) error? Check out the code snippet provided below: from urllib.request import urlopen api_url = "https://samples.openweathermap.org/data/2.5/weatherq=Lon ...
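This particular message generally means the parser found no JSON at the very first character, which happens when the response body is empty or is something else entirely, such as an HTML error page. A minimal sketch for surfacing what was actually received before parsing; `parse_json_body` is a hypothetical helper and does not diagnose the specific request above.

```python
import json


def parse_json_body(body: str):
    """Parse a response body as JSON, reporting a preview of it on failure."""
    try:
        return json.loads(body)
    except json.JSONDecodeError as exc:
        preview = body[:80]  # enough to tell an empty body from an HTML page
        raise ValueError(
            f"Response is not JSON ({exc.msg} at char {exc.pos}); "
            f"body starts with: {preview!r}"
        ) from exc
```

Printing the status code and the first part of the body in this way usually shows whether the URL, parameters or credentials need attention rather than the parsing code.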

Is there a way to locate classes containing underscores within them using Selenium?

Having trouble scraping a webpage with a class name containing an underscore, specifically this element: <span class="s-item__time-left">30m</span> == $0 I attempted to locate it by class name: time = driver.find_elements_class_name("s-item_ ...

How to Enhance HTML Requests with a Variety of Search Terms

I recently came across a helpful post on how to use R to search for news articles on Google. The post provides a link to Scraping Google News with Rvest for Keywords. The example in the post demonstrates searching for a single term, such as: keyword <- ...

Utilizing Selenium to extract information from a data table

I am facing a challenge with extracting data from a paginated table using Selenium. Despite having code that can successfully retrieve data, it only grabs the first 50 results out of the entire table. I believe utilizing Selenium to iterate through all p ...

The best way to pause execution for a specific manual action with Selenium in Python

Is there a way to make my script wait for a manual click on a submit button similar to the website below? driver.get("http://www.propertyguru.com.sg/singapore-property-listing?listing_type=sale&search_type=district&property_id=&interest=&d ...

What is the best method for extracting information from the about section of a Facebook page?

Is it possible to extract information from the Facebook About section using tools like the Facebook Graph API or Python web-scraping libraries such as Scrapy and Beautiful Soup? ...

Could this site be inhibiting my scraping efforts using BeautifulSoup?

For the past few years, I've been utilizing BeautifulSoup to extract TopCashBack website links. However, when I attempt to change the URL to a Screwfix link, I am not able to retrieve any data. s = requests.get("https://www.screwfix.com/p/128hf&q ...

Python and Selenium are having trouble locating the search bar

My attempt to locate and interact with the first search box on the following website has been unsuccessful: This is the code I've used: for ii in testList2: varTitel = ii searchBox = driver.find_element_by_id('MainContent_SuchworteField') ...

Learn the steps to automate clicking on the "next" button using Selenium or Scrapy in Python

While attempting to gather data from flipkart.com using Scrapy, I successfully collected everything except for navigating to the next page. Initially, I attempted to use Scrapy followed by Selenium. Interestingly, a class contains two links - one for the p ...

What could be causing the promises in Promise.all to remain in a pending state?

After restructuring my code to correctly utilize promises, I encountered a challenge with ensuring that the lastStep function can access both the HTML and URL of each page. To overcome this issue, I'm attempting to return an object in nextStep(). Alt ...

Is there a way to extract the complete table from a website and import it into an excel spreadsheet?

I am attempting to extract the complete table data from the following website: Note that upon clicking the link, a public login button will need to be clicked first. I have already set up a bot to handle the login process and navigate through the site, so ...

Inspecting contents of browser using Python for web scraping

Currently, I am utilizing Python along with Selenium and Chrome web drivers to conduct web scraping within Visual Studio Code. Upon sending a GET request like this: driver.get('https://my_test_website/customerRest/show/?id=123') I am curious ab ...

What is the process for obtaining the most recent stock price through Fidelity's screening tool?

Trying to extract the latest stock price using Fidelity's screener. For instance, the current market value of AAPL stands at $165.02, accessible via this link. Upon inspecting the webpage, the price is displayed within this tag: <div _ngcontent-cx ...

Tips for extracting text content that is not within an HTML element

Looking to extract data from this particular webpage: The information I'm interested in scraping includes Product Sku, Price, and List Price. I've successfully scraped the Price but I'm encountering issues with the other two, particularly t ...

Discovering the audio file URL hidden within javascript code

Is it possible to programmatically locate a link to an audio pronunciation clip on a website? I am in the process of creating a personalized language learning Anki deck. The specific site I am referring to is: When clicking on "Framburður," the audio cli ...

Unable to cycle through various categories by clicking in order to navigate to the desired page

After creating a Python script using Selenium to click through various categories on a website and reach the target page, I encountered an issue. The script works once but throws a 'stale element' error when trying to repeat the process. How can I address ...

Using BeautifulSoup to scrape a URL and generate a list of link addresses

Seeking insights on website similarity, I aim to extract data from the following link: Focusing on class='site', my goal is to retrieve information like: <a href="/siteinfo/ebay.com" class="truncation">ebay.com</a> ...

Tips for locating an element beyond the page source with Puppeteer

My goal is to extract specific information from a webpage by utilizing this code snippet to target an element and retrieve certain values within it: const puppeteer = require('puppeteer'); function run (numberOfPages) { return new Promise(async (reso ...

Execute multiple PHP scripts from a central script at various time intervals

I have a set of custom PHP scripts that I use in my browser to scrape data from URLs and display it either as a table or download it as an Excel file. However, when I try to process more than 3 URLs at once, I keep encountering a network connection error ( ...