Questions tagged [screen-scraping]

Screen scraping, also referred to as web scraping or data scraping, is the extraction and interpretation of data from user interfaces. For questions about scraping data from websites or web APIs, use the [web-scraping] tag.

The HtmlUnit Ajax call in Java does not appear to render properly on the HtmlPage

My goal is to scan a webpage using HtmlUnit 2.31 by simply obtaining an HtmlPage via URL. The issue arises when the page triggers AJAX calls (without user interaction). I need to wait for these calls to complete and see the resulting values. Below is my co ...

Navigating redirects in HTML parsing with Python

I am currently experimenting with submitting multiple forms from a Python script using the mechanize library. The purpose behind this is to set up a temporary API. The issue I am encountering is that upon form submission, a blank page appea ...

Sometimes, requests in Python may yield an empty list as the result

Recently, I've been attempting to extract the "2005 - 2013" from the text "Drink Between 2005 2013". Initially, this code was functioning properly for me. However, now it only returns empty lists even though my requests still receive a status code of ...

What is the best approach for handling empty list items during web data scraping?

I need help with scraping data into a CSV file from a website that lists contact information for professionals in my field. Everything works smoothly until I encounter a page where certain entries are missing specific details. For instance: I am gatherin ...
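
One common way to cope with entries that lack certain details is to collect each entry as a dictionary and let the CSV writer fill the gaps. The following is only a minimal sketch under assumptions: the field names (`name`, `phone`, `email`) and the sample records are hypothetical, since the excerpt does not show the actual page or code.

```python
import csv
import io

# Hypothetical scraped records; some entries lack certain fields.
records = [
    {"name": "Ann", "phone": "555-0100"},
    {"name": "Bob", "email": "bob@example.com"},
]
fieldnames = ["name", "phone", "email"]

# restval="" writes an empty cell for any field missing from a record,
# so every row keeps the same number of columns.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=fieldnames, restval="")
writer.writeheader()
writer.writerows(records)

rows = buffer.getvalue().splitlines()
```

Writing to an in-memory buffer keeps the sketch self-contained; a real script would pass an open file instead.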

Ways to extract information from a table cell located next to a table cell with colspan

As a beginner in web scraping, I am gradually making progress. However, I'm facing a tough challenge with this particular task. My goal is to extract data from the ESPN NBA boxscore website: I aim to scrape the names of players labeled as "DNP" (Did Not ...

Accessing a Selenium server through a web interface

Is there a simple way to send commands to a Selenium server via a web interface? I need to automate filling out multiple online forms that require login credentials, but I want to do it remotely so my team can also access it. Currently, we are manually ent ...

Attempting to unveil concealed download URLs

Trying to extract download links from a website, but the format is as follows: <form action="" method="post" name="addondownload" id="addondownload" > <input type="hidden" name="addonid" id="addonid" value="2109" /> <input class="re ...

Error in Selenium: Unable to interact with submit button

For the past day, I have been trying to scrape a website, but I am unable to make the submit button work when clicked. Here is the code snippet: button = driver.find_element_by_id('ctl00_PlaceHolderMain_g_6c89d4ad_107f_437d_bd54_8fda17b556bf_ctl00_btnSear ...

Exploring the possibilities of web scraping using phantomJS and NodeJS

Currently, I'm working through a tutorial found at the following link: However, when I execute the code provided in the tutorial: var host = 'http://www.shoutcast.com/?action=sub&cat=Hindi#134'; var phantom = require('phantom'); phantom.create(f ...

Utilizing getElementsByClassName for web scraping: encountering inaccurate outcomes

I am attempting to extract the inner text from all elements with className = "disabled" within a provided snippet of HTML. To achieve this using MS Access (VBA), I have implemented the following code: Set IE = C ...

Scraping data using Selenium with paragraph tags

I've been scouring the internet for examples similar to the one in question, but with no luck. The challenge at hand revolves around extracting text from a webpage where only one of two p tags contains important information. How can data be extracted from ...

Utilizing Selenium to extract engagement data, such as likes and comments, from a photo on Facebook

I am trying to obtain the content described in the title. I have successfully figured out how to log in and retrieve photos from any profile I search for. However, I am facing an issue when trying to access comments or likes on selected photos. Desp ...

Is it possible to webscrape a jTable that contains hidden columns?

I'm currently working on setting up a Python web scraper for this webpage: specifically targeting the 'team-players jTable' I've successfully scraped the visible table using BeautifulSoup and selenium, but I'm facing difficulties scraping the hidden colu ...

Extracting float numbers from a scraped string with the 're' library in Python 2.7

I just scraped this content import re import urllib from BeautifulSoup import BeautifulSoup After scraping, I have results like this (printed numbers_in_mill.text): 9.27[7] 9.25[8] 10.17[9] 10.72[10] How can I modify these results to get: 9. ...
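
The desired output is cut off in the excerpt, so the following is only a sketch under the assumption that the goal is to keep the leading number and drop the bracketed footnote marker (for example, turning `9.27[7]` into `9.27`). The helper name `leading_float` is hypothetical.

```python
import re

def leading_float(text):
    """Return the number at the start of text as a float, or None if absent."""
    match = re.match(r"\s*(\d+(?:\.\d+)?)", text)
    return float(match.group(1)) if match else None

# Sample strings taken from the excerpt above.
samples = ["9.27[7]", "9.25[8]", "10.17[9]", "10.72[10]"]
values = [leading_float(s) for s in samples]
```

Anchoring the pattern at the start of the string avoids picking up the digits inside the brackets.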

The JSON data was retrieved in a different order compared to the text data during scraping

Attempting to retrieve data from www.crunchbase.com through their API, I created a basic Python script for fetching responses. However, when writing the json_data to a file, I noticed that the order of the keys does not match the response order obtained di ...
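
If the aim is to keep keys in the order they appear in the response text, one option is to parse into an ordered mapping. This is a minimal sketch under assumptions: the sample payload and its keys are hypothetical and do not come from the Crunchbase API.

```python
import json
from collections import OrderedDict

# Hypothetical response body; the key order here is what we want to keep.
raw = '{"name": "Example", "permalink": "example", "category": "web"}'

# object_pairs_hook receives the key/value pairs in document order,
# so an OrderedDict preserves that order.
data = json.loads(raw, object_pairs_hook=OrderedDict)
keys = list(data.keys())

# json.dumps does not sort keys unless sort_keys=True is passed.
out = json.dumps(data)
```

On older interpreters a plain `dict` does not guarantee insertion order, which is one possible cause of the mismatch described above.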

Python Web Scraping: Issues with Duplication and Displaying Outputs

I have encountered a problem in my code that is causing issues with the output of loops and inserting data into my database. Despite attempting to troubleshoot, I am unable to pinpoint the exact source of the problem. What I am striving for is to have each ...

Downloading a file utilizing Selenium through the window.open method

I am having trouble extracting data from a webpage that triggers a new window to open when a link is clicked, resulting in an immediate download of a CSV file. The URL format is a challenge as it involves complex JavaScript functions called via the onClick ...

Slowly scrolling down using Selenium

Struggling with performing dynamic web scraping on a JavaScript-rendered webpage using Python. 1) Encountering an issue where elements load only when scrolling down the page slowly. Tried methods such as: driver.execute_script("window.scrollTo(0, Y)") ...

Exploring HTML Data with Python String Manipulation

My goal is to use Python to dynamically extract data from an HTML page that is constantly changing. I have identified that the specific data I am interested in is located between a tag that resembles 'abcd>' and another tag. For example: abcd>MyData. ...
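
A regular expression is one way to capture the text that follows such a marker. This is only a sketch under assumptions: it supposes the data ends at the next `<` character, and both the helper name `text_after_marker` and the sample markup are hypothetical.

```python
import re

def text_after_marker(html, marker="abcd>"):
    """Return the text between marker and the next '<', or None if not found."""
    pattern = re.escape(marker) + r"([^<]*)"
    match = re.search(pattern, html)
    return match.group(1).strip() if match else None

# Hypothetical markup in the shape described in the excerpt.
sample = "<td abcd>MyData</td>"
value = text_after_marker(sample)
```

For markup that changes often, an HTML parser is usually more robust than string matching, but the pattern above covers the simple case described.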

Automated Web Testing: Mastering the Selection of Images on a Page That Keeps Changing Dynamically

My current project involves creating a bot that automates scrolling through Instagram's explore page to like the first 100 pictures for a specific hashtag. I am utilizing Selenium, Python, and Chrome for this task. The issue I'm facing is that as I scrol ...

Scraping data from dropdown menus with Selenium in web development

When attempting Selenium web scraping, I encountered the following issues: the dropdown list did not change as expected; I was unable to scrape the desired results; and I was unsuccessful in resolving the problems. Here is a snippet of the code used: ''' from selenium import webdr ...

Having Issues Getting Results When Scraping with a cURL POST Request

I am trying to scrape data from the website marcanet.impi.gob.mx/marcanet/controler/RegistroBusca using the code below, but I am unable to reach the result page. $form_url = "http://marcanet.impi.gob.mx/marcanet/controler/RegistroLista"; $data_to_post = a ...

Struggling to extract dynamic table information from the Human Microbiome Project (HMP) with Python's Beautiful Soup and Selenium

Currently, I am trying to extract the dynamic table data from the 'File UUID' column on the HMP website using Python (Beautiful Soup and Selenium). However, despite successfully extracting other columns, I am facing difficulty in retrieving the specific co ...

Gather information that is dynamic upon the selection of a "li" option using Python Selenium

I need to extract data from this website (disregard the Hebrew text). To begin, I must choose one of the options from the initial dropdown menu below: https://i.stack.imgur.com/qvIyN.png Next, click the designated button: https://i.stack.imgur.com/THb ...

Discovering the location of a button on a website: Tips and tricks

Currently, I am extracting data from the following webpage: My task is to click a button in order to access the reviews. When using Google Chrome, this is the XPath that I obtain: //*[@id="myspanpos"] However, upon attempting to execute the script ...

Gathering the real-time activity of Xbox Live users

I am currently exploring methods to determine the presence of an Xbox Live member (e.g. Not Online, Online playing _). The only known method is to log into Xbox.com, navigate to the player's page, and extract certain text from a specific div. The pro ...

Extracting data from websites using Python's Selenium module, focusing on dynamic links generated through JavaScript

Currently, I am developing a web crawler using Selenium and Python. However, I have encountered an issue that needs to be addressed. The crawler functions by identifying all links with ListlinkerHref = self.browser.find_elements_by_xpath( ...

How to Extract Information from a Table Enclosed in a Div Using HTML Parsing?

I'm new to HTML parsing and scraping, looking for some guidance. I want to specify the URL of a page (http://www.epgpweb.com/guild/us/Caelestrasz/Crimson/) to grab data from. Specifically, I'm interested in extracting the table with class=listing within th ...

Discovering the parent element of a find_element_by_partial_link_text: a step-by-step guide

Using the find_element_by_partial_link_text selector to locate the "next" button for crawling continuity has proven problematic for me. The presence of the word "next" in various other links on the page often disrupts the script execution. Despite attemp ...

Extracting data from a continually updating DataTable spanning multiple pages with a consistent URL

Currently, I have experience working with C and am in the process of learning Python as a hobby. My latest project involves scraping data from a dynamically generated table on https://www.justetf.com/it/find-etf.html?groupField=index&from=search&/i ...