I am trying to extract only the text inside the <a> tags from the first <td> element of each <tr>. I have provided examples of the necessary text as "yyy" and examples of unnecessary text as "zzz". <table> <tbody> <tr ...
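A minimal sketch of one approach, assuming the rows use plain `<tr>`/`<td>` markup. The sample HTML is hypothetical because the original table is cut off. `row.find("td")` returns only the first cell, so links in later cells are never visited:

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup: "yyy" marks wanted text, "zzz" unwanted text
SAMPLE_HTML = """
<table><tbody>
  <tr><td><a href="#1">yyy1</a></td><td><a href="#x">zzz</a></td></tr>
  <tr><td><a href="#2">yyy2</a> <a href="#3">yyy3</a></td><td>zzz</td></tr>
</tbody></table>
"""


def first_cell_link_texts(html: str) -> list:
    """Return the text of every <a> inside the first <td> of each <tr>."""
    soup = BeautifulSoup(html, "html.parser")
    texts = []
    for row in soup.find_all("tr"):
        first_cell = row.find("td")  # find() returns only the first matching <td>
        if first_cell is None:  # e.g. header rows that only contain <th>
            continue
        texts.extend(a.get_text(strip=True) for a in first_cell.find_all("a"))
    return texts


print(first_cell_link_texts(SAMPLE_HTML))  # ['yyy1', 'yyy2', 'yyy3']
```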
I'm encountering an issue with this error message: urllib.error.HTTPError: HTTP Error 400: Bad Request. It seems to be related to the links I am using, as I always get the same error when I substitute them for the '{}' placeholders. However, I'm unsure ...
Hello everyone! I'm a beginner in Python and currently using Python 3.6.4 (64-bit). I recently installed pandas and matplotlib successfully, but I'm facing difficulties importing bs4. Can someone please provide guidance on how to resolve this is ...
<span> I Enjoy <span class='not needed'> slapping </span> your back </span> How can I get "I Enjoy your back" instead of "I Enjoy slapping your back"? This is what I attempted: result = soup.find_all('span') for ite ...
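One way to do this, sketched here as a suggestion rather than the only fix, is to delete the nested span with `decompose()` before reading the outer span's text. Because `class='not needed'` is parsed as two classes, matching either one (`class_="needed"` below) finds the tag:

```python
from bs4 import BeautifulSoup


def text_without_nested_spans(html: str) -> str:
    soup = BeautifulSoup(html, "html.parser")
    outer = soup.find("span")
    # class='not needed' is parsed as the two classes "not" and "needed";
    # class_="needed" matches any tag that has that class among its classes.
    for unwanted in outer.find_all("span", class_="needed"):
        unwanted.decompose()  # remove the nested tag and its text from the tree
    return " ".join(outer.get_text().split())  # collapse the leftover whitespace


html = "<span> I Enjoy <span class='not needed'> slapping </span> your back </span>"
print(text_without_nested_spans(html))  # I Enjoy your back
```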
Looking to extract the name and contact number from a div that can contain one, two, or three spans. The requirements are: Extract name and contact number only when available. If contact number is present but name is missing, assign 'N/A' to the name var ...
I am struggling to extract the data from the first table on a website. Despite attempting various solutions found here, I have been unsuccessful in locating the table and consequently retrieving the data within it. The methods I have tried are as follows: ...
I am currently working on extracting specific URLs from a webpage that contains a list of various links. My goal is to only retrieve the URLs that match certain strings in a predetermined list. These strings are a subset of the text found within the links ...
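A minimal sketch of the filtering step, assuming the match should be made against each link's visible text. The page markup and the `wanted` strings are hypothetical placeholders:

```python
from bs4 import BeautifulSoup

# Hypothetical page and list of wanted substrings
PAGE = """
<ul>
  <li><a href="/reports/annual-2020">Annual report 2020</a></li>
  <li><a href="/reports/q1-2021">Q1 report 2021</a></li>
  <li><a href="/about">About us</a></li>
</ul>
"""


def links_matching(html: str, wanted: list) -> list:
    soup = BeautifulSoup(html, "html.parser")
    return [
        a["href"]
        for a in soup.find_all("a", href=True)  # only anchors that have an href
        if any(word in a.get_text() for word in wanted)  # text contains a wanted string
    ]


print(links_matching(PAGE, ["Annual", "Q1"]))  # ['/reports/annual-2020', '/reports/q1-2021']
```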
I am facing an issue with this specific piece of code: import re from lxml import html from bs4 import BeautifulSoup as BS from selenium import webdriver from selenium.webdriver.firefox.firefox_binary import FirefoxBinary import requests import sys import ...
Looking to extract the URLs of each page from the pagination on a specific site located at . The challenge lies in the fact that not all page URLs are displayed simultaneously. When attempting to scrape using the provided link, only page-1, page-14, page- ...
Hey, I've encountered something really strange - I'm receiving an error that says "Exception has occurred: JavascriptException / Message: javascript error: this.each is not a function" on a specific line in my code: waiting.until(EC.visibility_of_element_l ...
Hello everyone! I am reaching out for help once more. While I am comfortable scraping simple websites with tags, I recently came across a more complex website that includes JavaScript. Specifically, I am looking to extract all the estimates located at the ...
Currently, I am diving into BS4 to enhance my skills and expertise. My goal is to scrape various tables, lists, and other elements from well-known websites in order to grasp the syntax. However, I am encountering difficulties when it comes to formatting a ...
I am currently utilizing BeautifulSoup with Python to scrape web data. My current goal is to determine the size of a downloadable file directly from a webpage. As an example, consider this particular page which contains a link to download a text file (acc ...
Currently, I am delving into the world of web scraping using selenium along with parsing the page_source utilizing "html.parser" of BS4 soup. I have successfully identified all the Tags that include the h2 tag and a specific class name, however, ...
I attempted to scrape data from this webpage using the python-requests library. import requests from lxml import etree,html url = 'http://www.amazon.in/b/ref=sa_menu_mobile_elec_all?ie=UTF8&node=976419031' r = requests.get(url) tree = etree.HTML(r.te ...
I've hit a roadblock with my current project. I'm attempting to extract player names and projections from this website: The plan involves running a script that loops through various PID values, but that part is not causing any issues. The real challenge a ...
I have managed to scrape the website below using BeautifulSoup, but I am encountering an issue where the list of products displayed changes depending on the user's location. How can I add a location tag or cookie to ensure that I only extract the products ...
This script retrieves data from the OddsPortal website: import pandas as pd from bs4 import BeautifulSoup as bs from selenium import webdriver import threading from multiprocessing.pool import ThreadPool import os import re from math import nan class Driver: ...
Hey there! I'm currently working on scraping a website and initially used bs4 to extract certain elements like sector and name. However, I'm having difficulty retrieving the financial data with it. In the page source below, the "-" should be ...
Currently facing a challenge here! I am trying to pick an item from the 'All reviews' drop down but it does not behave like a typical menu where I can select each item individually and then click on it. Instead, the drop down functions as an ele ...
Is it possible to extract the value of "id" from the variable 'meta' using beautifulsoup and python? I am facing difficulty in locating the specific 'script' tag that contains the 'meta' variable as it doesn't have a uniq ...
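Since the script tag has no unique attribute, one option is to match it by its text and then pull the object out with a regular expression. This is a sketch under two assumptions: the variable is declared as `var meta = {...};`, and the object literal is valid JSON. The page content below is hypothetical:

```python
import json
import re

from bs4 import BeautifulSoup

# Hypothetical page; the real script's contents are an assumption
PAGE = """
<script>window.dataLayer = [];</script>
<script>var meta = {"id": 12345, "page": "product"};</script>
"""


def extract_meta_id(html: str):
    soup = BeautifulSoup(html, "html.parser")
    # No unique attribute, so find the <script> whose text declares `meta`
    script = soup.find("script", string=re.compile(r"var\s+meta\s*="))
    if script is None:
        return None
    match = re.search(r"var\s+meta\s*=\s*(\{.*?\})\s*;", script.string, re.S)
    if match is None:
        return None
    return json.loads(match.group(1))["id"]  # fails if the literal is not strict JSON


print(extract_meta_id(PAGE))  # 12345
```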
I managed to extract data successfully from the given website. I created an Excel file with the information for one product. However, when trying to scrape data for a second product, I ran into issues adding another sheet to the existing Excel file. ...
Looking to extract all href contents from the "news" class (URL provided in the code), I attempted this script but encountered issues... Here is the code snippet: from bs4 import BeautifulSoup from selenium import webdriver Base_url = "http://www.thehin ...
I have recently started coding and decided to scrape the Indeed website for job listings. Using selenium's find_element function, I successfully scraped job titles, company names, and locations. Now, I'm seeking guidance on how to store these ind ...
I've been experiencing an issue where the data from my code is getting overwritten when I try to write it to a CSV file. The output file only shows the last set of data scraped from the website. from bs4 import BeautifulSoup import urllib2 import csv impo ...
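The full code is cut off, so this assumes the usual cause: the CSV file is opened with mode "w" inside the scraping loop, which truncates it on every iteration. A minimal Python 3 sketch (the question's code uses Python 2's `urllib2`) that opens the file once; `write_all`, `save`, and `PAGES` are hypothetical names:

```python
import csv

# Hypothetical scraped data: one list of (name, price) rows per page
PAGES = [
    [("Widget", "9.99")],
    [("Gadget", "19.99"), ("Gizmo", "4.50")],
]


def write_all(fh, pages):
    writer = csv.writer(fh)
    writer.writerow(["name", "price"])  # header is written exactly once
    for rows in pages:  # one batch of rows per scraped page
        writer.writerows(rows)


def save(path, pages):
    # Open the file ONCE, outside the loop. Re-opening it with mode "w" on
    # every iteration truncates it, leaving only the last page's data.
    with open(path, "w", newline="", encoding="utf-8") as fh:
        write_all(fh, pages)
```

Usage would be `save("products.csv", PAGES)`. If the file must stay open across separate runs instead, mode "a" appends rather than truncating.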
Why does it not work when there is a <br> tag in the text? Instead of the expected content, I get an empty string. opener = urllib2.build_opener() opener.addheaders = [('User-agent', 'Mozilla/5.0')] address = 'http://ww ...
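The code is cut off, so this is a guess at the cause. A common culprit is reading `.string`, which returns `None` as soon as a tag has more than one child, and a `<br>` splits the text into two children. `get_text()` collects every text node instead. The markup below is hypothetical:

```python
from bs4 import BeautifulSoup

html = "<td>First line<br>Second line</td>"
cell = BeautifulSoup(html, "html.parser").td

print(repr(cell.string))  # None: the <td> has three children (text, <br>, text)
print(cell.get_text(separator="\n"))  # every text node, joined with newlines
```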
Seeking to extract table data from the following website: stock = 'ALCAR' page = requests.get(f"https://www.isyatirim.com.tr/tr-tr/analiz/hisse/Sayfalar/sirket-karti.aspx?hisse={stock}") soup = BeautifulSoup(page.content, 'html.parser') table ...
Currently, I am working on a project for one of my courses that involves web scraping data from Bodybuilding.com. My main objective is to collect information regarding the members of the website. Initially, I was able to successfully scrape data from the 1 ...
I am attempting to retrieve information from the following URL. Below is the code I have written. import requests from bs4 import BeautifulSoup url_str = 'https://99airdrops.com/page/1/' page = requests.get(url_str, headers={' ...
Currently facing an issue with scraping a specific webpage. The link to the page is provided here. Within this webpage, there is a crucial Cross Reference section that I am trying to scrape. However, when attempting to collect the content using Python requ ...
I have an HTML document that contains two different types of 'tr' tags. <tr bgcolor="lightgrey"> <tr> Each 'tr' tag includes 3-4 lines of code with embedded 'tags' within them. However, when I try to access the attributes, I don't obtain 'lig ...
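A minimal sketch, assuming the goal is to tell the two row types apart. `Tag.get()` returns `None` rather than raising when the attribute is missing, so it works for both kinds of `<tr>`. The table content is hypothetical:

```python
from bs4 import BeautifulSoup

# Hypothetical table mixing the two kinds of <tr>
TABLE = """
<table>
  <tr bgcolor="lightgrey"><td>group A</td></tr>
  <tr><td>a1</td></tr>
  <tr bgcolor="lightgrey"><td>group B</td></tr>
  <tr><td>b1</td></tr>
</table>
"""


def classify_rows(html: str) -> list:
    soup = BeautifulSoup(html, "html.parser")
    result = []
    for tr in soup.find_all("tr"):
        # .get() returns None instead of raising KeyError when bgcolor is absent
        kind = "header" if tr.get("bgcolor") == "lightgrey" else "data"
        result.append((kind, tr.get_text(strip=True)))
    return result


print(classify_rows(TABLE))
```

To fetch only the grey rows, `soup.find_all("tr", bgcolor="lightgrey")` filters on the attribute directly.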
I'm working on a script that interacts with by sending a string and receiving one or all of the "fancy text" variations provided by the site. I am struggling to identify the input area within the HTML structure, especially since I aim to use requests for ...
While analyzing the information of products on the Myntra website, such as Title, Discount, and Price, I utilized the same tags I observed while inspecting the page in the Chrome browser and incorporated them into my code. Please see the following code: i ...
When I hover over a product on the e-commerce webpage, the color name is displayed. I was able to find the new line in the HTML code that appears when hovering, but I'm unsure how to extract the text ('NAVY'). <div class="ui top left popu ...
I'm currently utilizing bs4 and urllib2 to extract data from a website. Check out the webpage here. The goal is to retrieve the remaining digits of the telephone number 3610...... but beforehand, it's necessary to click on a button to reveal th ...
Is there a method to extract product listings from various websites without needing to manually loop through each XPATH? Some sites like Amazon and Alibaba display up to 10 products per page, while others may have 20. I am looking for a way to retrieve a ...
I have encountered a problem in my code that is causing issues with the output of loops and inserting data into my database. Despite attempting to troubleshoot, I am unable to pinpoint the exact source of the problem. What I am striving for is to have each ...
My current project involves automating the extraction and printing of infobox data from Wikipedia pages. For example, I am currently working on scraping the Star Trek Wikipedia page (https://en.wikipedia.org/wiki/Star_Trek) to extract the infobox section d ...
Seeking insights on website similarity, I aim to extract data from the following link: Focusing on class='site', my goal is to retrieve information like: <a href="/siteinfo/ebay.com" class="truncation">ebay.com</a> ...
Currently attempting to extract census data from a website that changes dynamically based on the county selected from a drop-down menu. The HTML structure looks like this: <select id="cat_id_select_GEO" onchange="changeHeaderSelection('GEO'); ...
My goal is to extract the membership years data from the IMDB Users page. On this page, there are multiple badges, and the one badge that all users have is the last one. This is the code I am using: def getYear(review_url): respons ...
I have discovered a way to locate Google's "quick answer box" by searching for the "_XWk" HTML element using the code below: from bs4 import BeautifulSoup # Beautiful Soup is part of the bs4 package import requests URL = 'https://www.google.com/ ...
Seeking assistance with scraping a sequence of pages similar to the following: . The structure of the URLs is straightforward -- increment the number after "precinctreport" to navigate to subsequent pages. Specifically, I am interested in extracting only ...
I have been attempting to extract product data from a website that utilizes JavaScript to dynamically render HTML content. Despite using Selenium, implementing scrolling functionality to reach the end of the page, and allowing time for the page to reload, ...
I am currently working on a project to create a program that can scan through a website and censor inappropriate language. While I have been able to manually edit the text using Chrome Dev tools, I am unsure of how to automate this process with code. I ha ...
Below is the code I am currently working with: url="https://www.betexplorer.com/soccer/poland/ekstraklasa/lks-lodz-lechia-gdansk/fgQY4hAD/" browser = webdriver.Chrome() browser.get(url) time.sleep(1.5) trs = browser.find_elements_by_xpath(".//div[@id='o ...
Hello everyone, I am trying to retrieve accurate daily temperatures from www.wunderground.com and encountering the 'NoneType' error occasionally. For instance, the code below attempts to fetch the temperature data from March 1 to March 5, 2020. S ...
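The code is cut off, so this sketch assumes the error comes from calling a method on the result of `find()` when the element is missing from a particular page. Why it sometimes goes missing (for example, content that has not finished rendering) is a guess. Checking for `None` turns the crash into a skippable value. The selector `span.wu-value` is hypothetical:

```python
from bs4 import BeautifulSoup


def read_temperature(html: str):
    soup = BeautifulSoup(html, "html.parser")
    node = soup.find("span", class_="wu-value")  # hypothetical selector
    if node is None:
        # find() returns None when nothing matches; calling .text on that
        # None is what raises "'NoneType' object has no attribute ..."
        return None
    return float(node.get_text(strip=True))
```

A loop over dates can then record `None` for a missing day, or retry that day, instead of stopping.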
I am currently learning how to use beautiful soup by watching videos and trying examples. However, I am facing a challenge where the examples have well-structured HTML layouts and do not search for specific words anywhere. What I want to achieve is to prin ...
My current project involves scraping NFL passing data from the years 1971 to 2019. I was successful in extracting data from the first page of each year using the following code: # Here is the code that works: passingData = [] # initializing an empty list ...
I am facing a challenge in retrieving or selecting data from two different tables that share the same class. My attempts to access this information using 'soup.find_all' have proven to be difficult due to the formatting of the data. There are multiple ta ...
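Here `find_all` returns every table with that class as a list in document order, so one table can be selected by index. A minimal sketch; the class name `stats` and the markup are hypothetical:

```python
from bs4 import BeautifulSoup

# Hypothetical page with two tables that share a class
PAGE = """
<table class="stats"><tr><td>1</td><td>2</td></tr></table>
<table class="stats"><tr><td>3</td><td>4</td></tr></table>
"""


def table_rows(html: str, class_name: str, index: int) -> list:
    soup = BeautifulSoup(html, "html.parser")
    tables = soup.find_all("table", class_=class_name)  # all matches, in document order
    table = tables[index]  # pick one table by its position
    return [[td.get_text(strip=True) for td in tr.find_all("td")] for tr in table.find_all("tr")]


print(table_rows(PAGE, "stats", 1))  # [['3', '4']]
```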
I am having trouble extracting data from the table located at . The code snippet I am using seems to be unable to access it for some reason. Can anyone suggest why the table scraping is not working? from bs4 import BeautifulSoup from selenium import webd ...
Currently, I am facing an issue when trying to extract the "href" link from the following HTML code: https://i.stack.imgur.com/Gtzf4.png This is the code that I'm using: from selenium import webdriver from splinter import Browser from bs4 import Beautif ...
I need assistance in scraping a web table using Selenium and BeautifulSoup. In the table, there are 10 instances of 'resultMainRow' and 4 instances of 'resultMainCell'. Each 4th resultMainCell contains 8 spans, each with an img src attr ...
For my current project working with Python 3.6.3, bs4, and Selenium 3.8 on Win10, I am faced with the task of scraping pages that contain dynamic content. Specifically, I need to extract numbers and text from websites like . It seems using requests+beautif ...
While working with html files, I often come across partial files that have unbalanced html tags. For instance, there might be a missing <title> tag in the first line of this partial html file. Despite this issue, I wonder if Beautiful Soup can still ...
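In general, yes. A small sketch with a hypothetical fragment shows how the built-in "html.parser" handles it: a closing tag with no opener is ignored, and unclosed tags are closed at the end of the input. Other parsers (lxml, html5lib) repair broken markup differently, so the resulting tree can vary by parser:

```python
from bs4 import BeautifulSoup

# Hypothetical partial file: the opening <title> is missing and <p> is never closed
fragment = "Quarterly results</title>\n<p>Revenue rose"

soup = BeautifulSoup(fragment, "html.parser")
print(soup.find("title"))  # None: the stray </title> has no opener, so it is dropped
print(soup.p.get_text())  # Revenue rose (the unclosed <p> is closed at end of input)
```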
Currently, I am utilizing the selenium webdriver to interact with a javascript onclick element. After successfully clicking on this element, my goal is to extract and parse the content of the clicked element using BeautifulSoup. Below is the code snippet s ...
Trying to extract text from a specific webpage, I encountered an issue. Despite using the following code snippet: url = "http://www.koeri.boun.edu.tr/sismo/2/latest-earthquakes/list-of-latest-events/" response = requests.get(url) html = response ...
import requests from pprint import pprint headers = { 'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.141 Safari/537.36', } params = ( ('LeagueID', '00'), ('Season', '2017-18'), ...
Looking to extract text from a medical document webpage for a Natural Language Processing project using BeautifulSoup, but encountering challenges. The specific website in question is located here: The goal is to capture the entire text body by utilizing ...
I am having trouble importing the BeautifulSoup module and am encountering an error. Can anyone explain why this is happening or how to fix it? Microsoft Windows [Version 6.1.7600] Copyright (c) 2009 Microsoft Corporation. All rights res ...
I am looking to extract the store locations from this specific URL. Currently, my approach is as follows: def getAllStoreLocation(): session = requests.Session() url = "https://www.walmart.com/store/finder?location=Pennsylvani&distance=100" ...
I would like to retrieve a table from an HTML file. Below is the code snippet I have written to extract the first table: import urllib2 import os import time import traceback from bs4 import BeautifulSoup #find('table',{'class' ...
Initially, I had the intention of visiting each hotel for: https://i.stack.imgur.com/HrrtX.png Unfortunately, there seems to be a JavaScript process required to open this subpage, and my script is unable to recognize its presence. Even with the correct U ...
Currently, I am working on a Python script to generate a daily COVID-19 dashboard for my country and state. Unfortunately, I have hit a roadblock in downloading one of the required files. To download this file, I need to visit the website and click on ...
I encountered an issue while trying to scrape data that resulted in the error message UnboundLocalError: local variable 'd3' referenced before assignment. Can anyone provide a solution to resolve this error? I have searched extensively for a so ...
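The scraping code is not shown, but Python raises this error when a local variable is assigned only inside a branch or loop that never runs and is read afterwards. Binding the name before the loop fixes that. A minimal sketch with a hypothetical helper and data:

```python
def extract_d3(cells):
    """Hypothetical helper: return the value of the first cell labelled 'd3'."""
    d3 = None  # bind the name first; otherwise a page with no matching cell
               # raises UnboundLocalError at the return statement
    for label, value in cells:
        if label == "d3":
            d3 = value
            break
    return d3


print(extract_d3([("d1", "a"), ("d3", "x")]))  # x
print(extract_d3([]))  # None
```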
My goal is to retrieve the price of a product based on its size, as prices tend to change daily. While I succeeded in extracting data from a website that uses "a class," I am facing difficulties with websites that use div and span classes. Link: Price: $ ...
Recently, I came across this URL: All I want to do is extract the zestimate and include it in a list. The specific class where it's located is: class="Text-c11n-8-65-2__sc-aiai24-0 eUxMDw". I attempted to target it at a higher level in the HTM ...
During a recent project, I encountered the challenge of web scraping a specific website for a test case using this link. When entering a certain value in the search box, the table displayed multiple values. My goal was to automatically click on the link in ...
Using Python libraries like requests and BeautifulSoup, I am attempting to scrape the tables from the following Wikipedia page: https://en.wikipedia.org/wiki/Mobile_country_code. While I am able to retrieve all the data within the tables, my goal now is to ...
I have successfully used Beautiful Soup to extract CDATA from an HTML page, but now I need to parse the contents and save them in a CSV file. Here is the code I am using: from bs4 import BeautifulSoup from urllib.request import urlopen import re import c ...
Currently, I am attempting to scrape the tables available on a certain webpage after specifying a particular date range (such as January 2015 to February 2022). You can find the page I'm referring to here: During my initial Selenium attempts, I encountere ...
Looking to extract specific href links from a web application that are labeled as "Show on diagram." Below is an image of the HTML I want to retrieve (image: web application code). This is my Python script: import webbrowser from bs4 import BeautifulSoup import ...
I have encountered an issue while trying to scrape title URLs using my code. Can someone please help me troubleshoot it? Here is the code snippet: import requests from bs4 import BeautifulSoup # import pandas as pd # import pandas as pd import csv def ...
One thing that I am struggling with is determining the number of "levels" of child elements an element contains. Take, for instance: <div id="first"> <div id="second"> <div id="third"> <div id="fourth"> <div id="fifth" ...
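One way to count the levels is a recursive function that ignores text nodes and returns one more than the deepest child. A minimal sketch; the closing tags of the example are filled in hypothetically because the original snippet is cut off after "fifth":

```python
from bs4 import BeautifulSoup
from bs4.element import Tag


def nesting_depth(tag: Tag) -> int:
    """Number of levels of element children below `tag` (0 if it has none)."""
    child_tags = [c for c in tag.children if isinstance(c, Tag)]  # skip text nodes
    if not child_tags:
        return 0
    return 1 + max(nesting_depth(c) for c in child_tags)


html = (
    '<div id="first"><div id="second"><div id="third">'
    '<div id="fourth"><div id="fifth"></div></div></div></div></div>'
)
soup = BeautifulSoup(html, "html.parser")
print(nesting_depth(soup.find(id="first")))  # 4
```

Recursion is fine at typical HTML depths; for very deeply nested documents an explicit stack would avoid Python's recursion limit.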
I am currently working on extracting book reviews from a specific URL using web scraping techniques. My goal is to gather the book name and each review in separate rows. Here's the code snippet I have written so far, which utilizes both selenium and bs4 fo ...