Questions tagged [beautifulsoup]

Beautiful Soup is a Python package for parsing HTML and XML documents. Its current major version, version 4, is commonly referred to as bs4, which is also the name of the module you import.

Extract the text of a tag while ignoring the text of tags nested inside it

I am trying to extract only the text inside the <a> tags in the first <td> element of each <tr>. I have marked the text I need as "yyy" and the text I don't need as "zzz". <table> <tbody> <tr ...
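
A minimal sketch, assuming markup shaped like the truncated example (the table below is hypothetical): `find("td")` returns only the first cell of each row, and `find_all("a")` inside that cell ignores any text that sits outside the links.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: "yyy" is wanted, "zzz" is not.
html = """
<table><tbody>
  <tr><td><a href="#">yyy</a> zzz <span>zzz</span></td><td><a href="#">zzz</a></td></tr>
  <tr><td>zzz <a href="#">yyy</a></td><td>zzz</td></tr>
</tbody></table>
"""

def first_cell_link_texts(markup):
    """Return the text of every <a> inside the first <td> of each <tr>."""
    soup = BeautifulSoup(markup, "html.parser")
    texts = []
    for row in soup.find_all("tr"):
        first_cell = row.find("td")  # find() returns the first match, or None
        if first_cell is None:
            continue
        texts.extend(a.get_text(strip=True) for a in first_cell.find_all("a"))
    return texts

print(first_cell_link_texts(html))
```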

HTTP Error 400 when searching for images through Google Search

I'm getting this error: urllib.error.HTTPError: HTTP Error 400: Bad Request. It seems to be related to the links I'm using, because I always get the same error when I fill in the '{}' placeholders. However, I'm unsure ...
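
One common cause of a 400 response, assumed here because the full code is truncated, is inserting raw text such as spaces into the '{}' placeholders. This sketch escapes the query with the standard library's `quote_plus` and `urlencode`; the template URL and the query are hypothetical, and no request is actually sent.

```python
from urllib.parse import quote_plus, urlencode

# Hypothetical URL template; the question's real links are not shown.
template = "https://www.google.com/search?q={}&tbm=isch"
query = "red panda"

unsafe = template.format(query)            # contains a raw space
safe = template.format(quote_plus(query))  # the space is encoded as '+'

# Alternatively, build the whole query string from a dict
url = "https://www.google.com/search?" + urlencode({"q": query, "tbm": "isch"})
print(safe)
print(url)
```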

ModuleNotFoundError: No module named 'bs4'

Hello everyone! I'm a beginner in Python and currently using Python 3.6.4 (64-bit). I recently installed pandas and matplotlib successfully, but I'm facing difficulties importing bs4. Can someone please provide guidance on how to resolve this is ...
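
A quick diagnostic sketch, assuming the usual cause: beautifulsoup4 (the PyPI name of the package that provides bs4) was installed for a different interpreter than the one running the script. It checks whether bs4 is importable from the current interpreter and prints an install command aimed at that same interpreter.

```python
import importlib.util
import sys

def bs4_available():
    """True if the interpreter running this script can import bs4."""
    return importlib.util.find_spec("bs4") is not None

if not bs4_available():
    print("bs4 is not installed for:", sys.executable)
    # Running pip via "python -m pip" ties it to this exact interpreter
    print(f'Try: "{sys.executable}" -m pip install beautifulsoup4')
```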

Exclude unwanted nested tags when extracting text with BeautifulSoup in Python

<span> I Enjoy <span class='not needed'> slapping </span> your back </span> How can I get "I Enjoy your back" instead of "I Enjoy slapping your back"? This is what I tried: result = soup.find_all('span') for ite ...
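
Two possible approaches, sketched on the markup from the question: keep only the strings that are direct children of the outer <span> (`recursive=False`), or delete the nested <span> with `decompose()` and read the remaining text.

```python
from bs4 import BeautifulSoup

html = "<span> I Enjoy <span class='not needed'> slapping </span> your back </span>"

def text_without_nested(markup):
    soup = BeautifulSoup(markup, "html.parser")
    outer = soup.find("span")  # the first <span> in document order is the outer one
    # Approach 1: only strings that are direct children of the outer span
    direct = outer.find_all(string=True, recursive=False)
    return " ".join(s.strip() for s in direct if s.strip())

def text_after_removing(markup):
    soup = BeautifulSoup(markup, "html.parser")
    outer = soup.find("span")
    # Approach 2: remove every <span> nested inside the outer one, then read the rest
    for nested in outer.find_all("span"):
        nested.decompose()
    return " ".join(outer.get_text().split())  # collapse leftover whitespace

print(text_without_nested(html))
print(text_after_removing(html))
```

The first approach leaves the tree untouched; the second modifies the soup, which is simpler when the unwanted tags sit several levels deep.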

What is the best way to obtain just the name and phone number information?

Looking to extract the name and contact number from a div that can contain one, two, or three spans. The requirements are: Extract name and contact number only when available. If contact number is present but name is missing, assign 'N/A' to the name var ...

BeautifulSoup fails to find the tables on a webpage

I am struggling to extract the data from the first table on a website. Despite attempting various solutions found here, I have been unsuccessful in locating the table and consequently retrieving the data within it. The methods I have tried are as follows: ...

Extract href links whose text matches any string in a given list using Beautiful Soup

I am currently working on extracting specific URLs from a webpage that contains a list of various links. My goal is to only retrieve the URLs that match certain strings in a predetermined list. These strings are a subset of the text found within the links ...
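
A minimal sketch, assuming the target strings are substrings of the link text; the markup and the `wanted` list are hypothetical. `href=True` restricts the search to <a> tags that actually have an href.

```python
from bs4 import BeautifulSoup

html = """
<ul>
  <li><a href="/reports/annual-2020.pdf">Annual report 2020</a></li>
  <li><a href="/reports/q1-2021.pdf">Q1 report 2021</a></li>
  <li><a href="/about">About us</a></li>
</ul>
"""
wanted = ["Annual report", "About"]  # hypothetical list of target substrings

def matching_links(markup, targets):
    soup = BeautifulSoup(markup, "html.parser")
    urls = []
    for a in soup.find_all("a", href=True):  # skip <a> tags without an href
        text = a.get_text(strip=True)
        if any(t in text for t in targets):
            urls.append(a["href"])
    return urls

print(matching_links(html, wanted))
```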

Selenium error when retrieving a section id from a web page

I am facing an issue with this specific piece of code: import re from lxml import html from bs4 import BeautifulSoup as BS from selenium import webdriver from selenium.webdriver.firefox.firefox_binary import FirefoxBinary import requests import sys import ...

Python web scraping: collecting page URLs from pagination where some page numbers are hidden

I'm looking to extract the URL of every page in a specific site's pagination. The problem is that not all page URLs are displayed at once: when I scrape the provided link, only page-1, page-14, page- ...
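
A sketch of one workaround, assuming the pages are numbered consecutively and the visible links follow the "page-N" pattern mentioned above: read the highest page number from the links that are shown, then generate the hidden URLs from the same pattern. The markup and domain are hypothetical.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical pagination: only the first and last pages are linked
html = """
<div class="pagination">
  <a href="https://example.com/list/page-1/">1</a>
  <span>...</span>
  <a href="https://example.com/list/page-14/">14</a>
</div>
"""

def all_page_urls(markup):
    soup = BeautifulSoup(markup, "html.parser")
    pages = {}
    for a in soup.find_all("a", href=True):
        m = re.search(r"page-(\d+)", a["href"])
        if m:
            pages[int(m.group(1))] = a["href"]
    if not pages:
        return []
    last = max(pages)
    # Turn the last page's URL into a template by replacing its number
    template = re.sub(r"page-\d+", "page-{}", pages[last], count=1)
    return [template.format(n) for n in range(1, last + 1)]

urls = all_page_urls(html)
print(len(urls), urls[0], urls[-1])
```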

Intermittent "JavascriptException" error when using Selenium

Hey, I've run into something really strange: I'm getting an error that says "Exception has occurred: JavascriptException / Message: javascript error: this.each is not a function" on a specific line in my code: waiting.until(EC.visibility_of_element_l ...

Using BeautifulSoup to extract data from a webpage containing JavaScript

Hello everyone! I am reaching out for help once more. While I am comfortable scraping simple websites with tags, I recently came across a more complex website that includes JavaScript. Specifically, I am looking to extract all the estimates located at the ...

Extracting information with BeautifulSoup

I'm learning BS4 to build my skills, scraping tables, lists, and other elements from well-known websites to get to grips with the syntax. However, I'm having difficulty formatting a ...

Getting the size of a downloadable file linked from a webpage using BeautifulSoup

I'm using BeautifulSoup with Python to scrape web data. Now I want to determine the size of a downloadable file directly from a webpage. For example, consider a page that contains a link to download a text file (acc ...
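
BeautifulSoup only parses the page's HTML, so it cannot tell how large a linked file is. One approach, sketched with the standard library's urllib, is a HEAD request to the file's URL, which returns headers without downloading the body. This assumes the server answers HEAD requests and reports Content-Length; some servers do neither, in which case the function returns None or raises.

```python
from urllib.request import Request, urlopen

def content_length(headers):
    """Parse the Content-Length header into an int, or None if it is absent."""
    value = headers.get("Content-Length")
    return int(value) if value is not None else None

def remote_file_size(url, timeout=10):
    """Ask the server for the file's headers only (HEAD request), not its body."""
    request = Request(url, method="HEAD")
    with urlopen(request, timeout=timeout) as response:
        return content_length(response.headers)
```

To use it, take the link's href from the soup, make it absolute with `urllib.parse.urljoin(page_url, href)`, and pass the result to `remote_file_size`.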

Getting the content between two h2 elements with BeautifulSoup

I'm learning web scraping with Selenium and parsing the page_source with BS4's "html.parser". I have found all the h2 tags with a specific class name; however, ...
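
A minimal sketch, assuming the content sits in sibling elements between the h2 headings (the markup and the class name "section" are hypothetical): starting from each h2, walk forward through its siblings and stop at the next h2.

```python
from bs4 import BeautifulSoup

html = """
<h2 class="section">First</h2>
<p>alpha</p>
<p>beta</p>
<h2 class="section">Second</h2>
<p>gamma</p>
"""

def content_between_h2(markup, class_name):
    soup = BeautifulSoup(markup, "html.parser")
    sections = {}
    for h2 in soup.find_all("h2", class_=class_name):
        parts = []
        # find_next_siblings() yields only sibling tags, in document order
        for sib in h2.find_next_siblings():
            if sib.name == "h2":  # the next heading starts a new section
                break
            parts.append(sib.get_text(strip=True))
        sections[h2.get_text(strip=True)] = parts
    return sections

print(content_between_h2(html, "section"))
```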

Python Requests error: "Exceeded 30 redirects"

I attempted to scrape data from this webpage using the python-requests library. import requests from lxml import etree,html url = 'http://www.amazon.in/b/ref=sa_menu_mobile_elec_all?ie=UTF8&node=976419031' r = requests.get(url) tree = etree.HTML(r.te ...

Scraping hidden information with Beautiful Soup 4

I've hit a roadblock in my current project. I'm trying to extract player names and projections from a website. The plan is to run a script that loops through various PID values, but that part works fine. The real challenge a ...

Scraping country-specific product pages with BeautifulSoup

I have managed to scrape the website below using BeautifulSoup, but I am encountering an issue where the list of products displayed changes depending on the user's location. How can I add a location tag or cookie to ensure that I only extract the products ...

List index out of range error occurs in the if-else block when the condition is met

This script retrieves data from the Oddsportal website: import pandas as pd from bs4 import BeautifulSoup as bs from selenium import webdriver import threading from multiprocessing.pool import ThreadPool import os import re from math import nan class Driver: ...

Automated data collection with Selenium

Hey there! I'm scraping a website and first used bs4 to extract elements like sector and name. However, I can't retrieve the financial data with it. In the page source below, the "-" should be ...

Using Selenium in Python to choose an option from a dropdown menu within a <div> element

I'm stuck here! I'm trying to pick an item from the 'All reviews' drop-down, but it doesn't behave like a typical menu where I can select an item and click it. Instead, the drop-down functions as an ele ...

Extracting a JavaScript variable from a script tag with Python and BeautifulSoup

Is it possible to extract the value of "id" from the JavaScript variable 'meta' using BeautifulSoup and Python? I'm having difficulty locating the 'script' tag that contains the 'meta' variable, because it doesn't have a uniq ...
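
A sketch under two assumptions, since the real script is not shown: the variable is declared as `var meta = {...};`, and its value is valid JSON. Passing a compiled regex as `string=` lets BeautifulSoup locate the <script> by its contents instead of by an attribute.

```python
import json
import re
from bs4 import BeautifulSoup

# Hypothetical page; the question's actual script contents are not shown.
html = """
<script>window.dataLayer = [];</script>
<script>var meta = {"id": 12345, "page": "product"}; var other = 1;</script>
"""

# Captures the object literal assigned to `meta`, up to the closing "};"
META_RE = re.compile(r"var\s+meta\s*=\s*(\{.*?\});", re.DOTALL)

def meta_id(markup):
    soup = BeautifulSoup(markup, "html.parser")
    # Match the <script> whose text contains the meta declaration
    script = soup.find("script", string=META_RE)
    if script is None:
        return None
    data = json.loads(META_RE.search(script.string).group(1))
    return data.get("id")

print(meta_id(html))
```

If the object uses JavaScript-only syntax (unquoted keys, single quotes), `json.loads` will fail and a different parsing step is needed.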

Adding sheets to an existing Excel file after scraping data from the web

I managed to extract data from a website and created an Excel file with the information for one product. However, when scraping data for a second product, I ran into problems adding another sheet to the existing Excel file. ...

Python web scraping with Selenium

I'm trying to extract all href values from the "news" class (the URL is in the code), but this script doesn't work. Here is the code: from bs4 import BeautifulSoup from selenium import webdriver Base_url = "http://www.thehin ...

Adding scraped data to an empty list

I have recently started coding and decided to scrape the Indeed website for job listings. Using selenium's find_element function, I successfully scraped job titles, company names, and locations. Now, I'm seeking guidance on how to store these ind ...

Having difficulty exporting data to a CSV file using Beautiful Soup 4 in Python

The data from my code keeps getting overwritten when I write it to a CSV file: the output file only contains the last set of data scraped from the website. from bs4 import BeautifulSoup import urllib2 import csv impo ...
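
Since the code is truncated, this assumes the usual cause: the file is reopened with mode "w" inside the scraping loop, which truncates it each time. A Python 3 sketch (the question uses Python 2's urllib2) that opens the file once and writes every page's rows; the scraped rows are hypothetical.

```python
import csv
import os
import tempfile

# Hypothetical scraped rows, one list per page
pages = [
    [("Widget", "9.99")],
    [("Gadget", "19.99"), ("Gizmo", "4.50")],
]

def write_all(path, pages):
    # Open the file ONCE, outside the loop. Reopening with mode "w" on every
    # iteration truncates the file, leaving only the last page's rows.
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["name", "price"])
        for rows in pages:
            writer.writerows(rows)

path = os.path.join(tempfile.gettempdir(), "scraped.csv")
write_all(path, pages)
```

If the file must be reopened per page, mode "a" (append) also works, as long as the header is written only once.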

Extracted text is empty when the tag contains a <br>

Why doesn't this work when the text contains a <br> tag? Instead of the expected content, I get empty text. opener = urllib2.build_opener() opener.addheaders = [('User-agent', 'Mozilla/5.0')] address = 'http://ww ...
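
Assuming the truncated code reads the text through `.string` (a common cause of this symptom), the difference is sketched below: `.string` is None once a tag has more than one child, which a <br> causes, while `get_text()` joins all the strings inside the tag. The markup is hypothetical.

```python
from bs4 import BeautifulSoup

html = "<td>Line one<br>Line two</td>"
soup = BeautifulSoup(html, "html.parser")
cell = soup.find("td")

# .string is None: <td> now has three children (text, <br/>, text)
print(cell.string)
# get_text() concatenates every string inside the tag; the separator marks the break
print(cell.get_text(separator=" | ", strip=True))
```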

Using BeautifulSoup for extracting data from tables

Seeking to extract table data from the following website: stock = 'ALCAR' page = requests.get(f"https://www.isyatirim.com.tr/tr-tr/analiz/hisse/Sayfalar/sirket-karti.aspx?hisse={stock}") soup = BeautifulSoup(page.content, 'html.parser') table ...

Scraping content across multiple pages with BeautifulSoup

For one of my courses, I'm working on a project that involves scraping data from Bodybuilding.com. My main objective is to collect information about the site's members. At first, I was able to scrape data from the 1 ...

Why is BeautifulSoup's findAll not returning all the elements on the web page?

I'm trying to retrieve information from the URL used in the code below. import requests from bs4 import BeautifulSoup url_str = 'https://99airdrops.com/page/1/' page = requests.get(url_str, headers={' ...

Python fails to scrape dynamically loaded elements from a webpage

I'm having an issue scraping a specific webpage. It contains a Cross Reference section that I need to scrape. However, when I try to collect the content using Python requ ...

How can I modify, add, or remove an attribute of a tag?

I have an HTML document with two different kinds of 'tr' tags: <tr bgcolor="lightgrey"> <tr> Each 'tr' tag contains 3-4 lines of nested tags. However, when I try to access the attributes, I don't obtain 'lig ...
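
BeautifulSoup exposes a tag's attributes like a dictionary. This sketch, on hypothetical rows modeled on the two 'tr' variants above, reads an attribute with `.get()` (which returns None instead of raising when it is missing), sets it by assignment, and removes it with `del`.

```python
from bs4 import BeautifulSoup

html = '<table><tr bgcolor="lightgrey"><td>a</td></tr><tr><td>b</td></tr></table>'
soup = BeautifulSoup(html, "html.parser")
grey_row, plain_row = soup.find_all("tr")

# Read: .get() returns None for a missing attribute; tag["bgcolor"] would raise KeyError
print(grey_row.get("bgcolor"), plain_row.get("bgcolor"))

# Modify an existing attribute, or add a new one, by assignment
grey_row["bgcolor"] = "white"
plain_row["bgcolor"] = "lightgrey"

# Remove an attribute
del grey_row["bgcolor"]

print(soup)
```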

How can I use Python requests to enter text in a textarea and then scrape the resulting HTML with bs4 when the input field is hidden?

I'm working on a script that sends a string to a website and receives one or all of the "fancy text" variations the site provides. I'm struggling to identify the input area in the HTML, especially since I want to use requests for ...

Text inside dynamically generated HTML tags is missing when scraping

To scrape product information from the Myntra website, such as Title, Discount, and Price, I used the same tags I saw when inspecting the page in Chrome. Please see the following code: i ...

Is there a way to extract the text that is displayed when I hover over a specific element?

When I hover over a product on an e-commerce webpage, the color name is displayed. I found the new element that appears in the HTML when hovering, but I'm unsure how to extract its text ('NAVY'). <div class="ui top left popu ...

Is there a way to extract information from a <span> that is only revealed after clicking a button?

I'm using bs4 and urllib2 to extract data from a website. The goal is to retrieve the remaining digits of the telephone number 3610......, but first a button has to be clicked to reveal th ...

Extracting product listings from any website with Python without hard-coding XPaths

Is there a way to extract product listings from various websites without manually looping through each XPath? Some sites, like Amazon and Alibaba, display up to 10 products per page, while others may show 20. I am looking for a way to retrieve a ...

Python Web Scraping: Issues with Duplication and Displaying Outputs

A problem in my code is affecting the output of my loops and the data inserted into my database. Despite troubleshooting, I can't pinpoint the exact source of the problem. What I am striving for is to have each ...

How can I automatically scrape and display the infobox of any Wikipedia page with Python?

My current project involves automating the extraction and printing of infobox data from Wikipedia pages. For example, I am currently working on scraping the Star Trek Wikipedia page (https://en.wikipedia.org/wiki/Star_Trek) to extract the infobox section d ...
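
A minimal sketch, assuming the infobox is a <table> whose class list includes "infobox" and whose data rows pair a <th> label with a <td> value. The markup below is a simplified, hypothetical stand-in for the real Star Trek page, which has much more markup. Rows without a label-value pair (title, images) are skipped.

```python
from bs4 import BeautifulSoup

# Simplified stand-in for a Wikipedia infobox
html = """
<table class="infobox">
  <tr><th colspan="2">Star Trek</th></tr>
  <tr><th scope="row">Created by</th><td>Gene Roddenberry</td></tr>
  <tr><th scope="row">Original work</th><td><i>Star Trek: The Original Series</i></td></tr>
</table>
"""

def parse_infobox(markup):
    soup = BeautifulSoup(markup, "html.parser")
    box = soup.find("table", class_="infobox")  # also matches e.g. class="infobox vevent"
    data = {}
    if box is None:
        return data
    for row in box.find_all("tr"):
        label, value = row.find("th"), row.find("td")
        if label is not None and value is not None:
            data[label.get_text(" ", strip=True)] = value.get_text(" ", strip=True)
    return data

for key, value in parse_infobox(html).items():
    print(f"{key}: {value}")
```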

Using BeautifulSoup to scrape a URL and generate a list of link addresses

I want to extract website-similarity data from a page. Within the elements with class='site', my goal is to retrieve entries like: <a href="/siteinfo/ebay.com" class="truncation">ebay.com</a> ...
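
A sketch using a CSS selector, modeled on the <a class="truncation"> markup quoted above; the surrounding <div class="site"> wrappers and the second site are assumptions. `select()` returns every match in document order.

```python
from bs4 import BeautifulSoup

# Hypothetical excerpt modeled on the markup in the question
html = """
<div class="site"><a href="/siteinfo/ebay.com" class="truncation">ebay.com</a></div>
<div class="site"><a href="/siteinfo/etsy.com" class="truncation">etsy.com</a></div>
<div class="other"><a href="/login">Log in</a></div>
"""

soup = BeautifulSoup(html, "html.parser")
# <a class="truncation"> anywhere inside an element with class "site"
links = soup.select(".site a.truncation")
hrefs = [a["href"] for a in links]
names = [a.get_text(strip=True) for a in links]
print(list(zip(names, hrefs)))
```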

Extracting data from a website whose content changes in real time when an option is selected from a dynamic drop-down menu

I'm trying to extract census data from a website that changes dynamically based on the county selected in a drop-down menu. The HTML structure looks like this: <select id="cat_id_select_GEO" onchange="changeHeaderSelection('GEO'); ...

Problem extracting data with BeautifulSoup

My goal is to extract the membership years from an IMDb user page. The page has multiple badges, and the one badge all users have in common is the last one. This is the code I am using: def getYear(review_url): respons ...

Finding the class _XWk with BeautifulSoup

I have found a way to locate Google's "quick answer box" by searching for the "_XWk" HTML element, using the code below: from bs4 import BeautifulSoup # Beautiful Soup is part of the bs4 package import requests URL = 'https://www.google.com/ ...

What is the best way to extract data from multiple pages with varying xpaths in Selenium?

I need help scraping a sequence of similar pages. The URL structure is straightforward: increment the number after "precinctreport" to get the next page. Specifically, I am interested in extracting only ...

Selenium scrolling: incomplete data when scraping a dynamically rendered page

I have been attempting to extract product data from a website that utilizes JavaScript to dynamically render HTML content. Despite using Selenium, implementing scrolling functionality to reach the end of the page, and allowing time for the page to reload, ...

Is there a way to manipulate a website's HTML on my local machine using code?

I am currently working on a project to create a program that can scan through a website and censor inappropriate language. While I have been able to manually edit the text using Chrome Dev tools, I am unsure of how to automate this process with code. I ha ...
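
A sketch of the automated edit, with an important caveat: this rewrites a downloaded copy of the HTML, not the page open in the browser (changing the live page would need browser automation or an extension). The word list is hypothetical. Each matching text node is replaced with a censored copy, and text inside <script>/<style> is left alone.

```python
import re
from bs4 import BeautifulSoup

BANNED = ["darn", "heck"]  # hypothetical word list
PATTERN = re.compile(r"\b(" + "|".join(map(re.escape, BANNED)) + r")\b", re.IGNORECASE)

def censor_html(markup):
    soup = BeautifulSoup(markup, "html.parser")
    # find_all(string=regex) returns the text nodes containing a banned word
    for text in soup.find_all(string=PATTERN):
        if text.parent.name in ("script", "style"):
            continue  # do not alter code or stylesheets
        censored = PATTERN.sub(lambda m: "*" * len(m.group()), text)
        text.replace_with(censored)
    return str(soup)

print(censor_html("<p>Oh heck, that <b>Darn</b> cat</p>"))
```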

Selenium's get_attribute method is not providing a value

Below is the code I am currently working with: url="https://www.betexplorer.com/soccer/poland/ekstraklasa/lks-lodz-lechia-gdansk/fgQY4hAD/" browser = webdriver.Chrome() browser.get(url) time.sleep(1.5) trs = browser.find_elements_by_xpath(".//div[@id='o ...

BeautifulSoup sometimes does not find the table element, occasionally resulting in a 'NoneType' error

Hello everyone, I am trying to retrieve accurate daily temperatures from www.wunderground.com and occasionally get a 'NoneType' error. For instance, the code below fetches the temperature data from March 1 to March 5, 2020. S ...
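
`find()` returns None when no element matches, and calling a method on that None is what raises the 'NoneType' error. A defensive sketch, assuming the table is only missing sometimes (for example, when an incomplete page comes back): check for None and retry a few times. `fetch_html` is a hypothetical hook, not a BeautifulSoup API; in practice it could wrap `requests.get(url).text`.

```python
import time
from bs4 import BeautifulSoup

def find_table_with_retry(fetch_html, attempts=3, delay=2.0):
    """Call fetch_html() until the returned page contains a <table>.

    fetch_html is any zero-argument callable that returns HTML text.
    Returns the <table> Tag, or None if every attempt failed.
    """
    for attempt in range(attempts):
        soup = BeautifulSoup(fetch_html(), "html.parser")
        table = soup.find("table")
        if table is not None:  # find() returns None instead of raising
            return table
        if attempt < attempts - 1:
            time.sleep(delay)  # wait before asking again
    return None
```

The caller should still handle a final None, for example by logging the date and moving on instead of crashing.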

Is it possible to scrape using Python Beautiful Soup only when the text matches?

I am learning Beautiful Soup by watching videos and trying examples. However, the examples all have well-structured HTML layouts and never search for specific words anywhere. What I want to achieve is to prin ...
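
A minimal sketch for matching on text regardless of page layout: `find_all(string=regex)` returns every text node containing the word, wherever it is, and `.parent` gives the tag that directly holds it. The markup and the search word are hypothetical.

```python
import re
from bs4 import BeautifulSoup

html = """
<div><p>Shipping is free.</p><p>Price: 20 EUR</p></div>
<ul><li>Nothing here</li><li>Discount price: 15 EUR</li></ul>
"""

def elements_containing(markup, word):
    soup = BeautifulSoup(markup, "html.parser")
    pattern = re.compile(re.escape(word), re.IGNORECASE)
    # Each match is a text node; its parent is the enclosing tag
    return [s.parent for s in soup.find_all(string=pattern)]

for tag in elements_containing(html, "price"):
    print(tag.name, tag.get_text(strip=True))
```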

Extracting information from every subpage with BeautifulSoup when the URLs are long and varied

My current project involves scraping NFL passing data from the years 1971 to 2019. I was successful in extracting data from the first page of each year using the following code: # Here is the code that works: passingData = [] # initializing an empty list ...

What is the best method to retrieve information from two tables on a webpage that have identical classes?

I'm having trouble selecting data from two different tables that share the same class. Accessing it with 'soup.find_all' has been difficult because of how the data is formatted. There are multiple ta ...
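
When tables share a class, `find_all` returns all of them in document order, so they can be told apart by position or by nearby context. This sketch assumes, hypothetically, that each table is preceded by an <h3> heading, and uses `find_previous` to label each table; the class name "data" and the markup are illustrative.

```python
from bs4 import BeautifulSoup

html = """
<h3>Income</h3>
<table class="data"><tr><th>Year</th><th>Value</th></tr><tr><td>2020</td><td>10</td></tr></table>
<h3>Expenses</h3>
<table class="data"><tr><th>Year</th><th>Value</th></tr><tr><td>2020</td><td>7</td></tr></table>
"""

def tables_by_heading(markup, class_name):
    soup = BeautifulSoup(markup, "html.parser")
    result = {}
    for index, table in enumerate(soup.find_all("table", class_=class_name)):
        heading = table.find_previous("h3")  # nearest heading before this table
        key = heading.get_text(strip=True) if heading is not None else f"table{index}"
        result[key] = [
            [cell.get_text(strip=True) for cell in tr.find_all(["th", "td"])]
            for tr in table.find_all("tr")
        ]
    return result

print(tables_by_heading(html, "data"))
```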

Selenium has trouble extracting data from a table

I'm having trouble extracting data from a table on a webpage; my code seems unable to access it. Can anyone suggest why the table scraping is not working? from bs4 import BeautifulSoup from selenium import webd ...

What is causing find_by_css to return nothing when using nth-child?

I'm trying to extract the "href" link from the following HTML (screenshot): https://i.stack.imgur.com/Gtzf4.png This is the code I'm using: from selenium import webdriver from splinter import Browser from bs4 import Beautif ...

What's the best way to loop through a complete web table using Beautiful Soup?

I need assistance in scraping a web table using Selenium and BeautifulSoup. In the table, there are 10 instances of 'resultMainRow' and 4 instances of 'resultMainCell'. Each 4th resultMainCell contains 8 spans, each with an img src attr ...

For web scraping, is it better to use Selenium alone or to combine it with BeautifulSoup?

For my current project (Python 3.6.3, bs4, and Selenium 3.8 on Win10), I need to scrape pages that contain dynamic content, specifically numbers and text from certain websites. It seems using requests+beautif ...

Scraping an unbalanced HTML document using Beautiful Soup 4

When working with HTML files, I often come across partial files with unbalanced tags. For instance, the first line of a partial file might be missing a <title> tag. I wonder whether Beautiful Soup can still ...
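
A small sketch, on a hypothetical fragment, showing that Beautiful Soup does not require balanced markup: with the built-in "html.parser", a stray end tag is ignored and unclosed tags are closed when an enclosing tag ends. The "lxml" and "html5lib" parsers (if installed) also accept broken markup but may repair the tree differently, so results can vary by parser.

```python
from bs4 import BeautifulSoup

# Hypothetical partial file: stray </title>, unclosed <b> and no </body>
partial = "Page title</title>\n<body><p>Hello <b>world</p>"

soup = BeautifulSoup(partial, "html.parser")
print(soup.find("p").get_text())       # the open <b> is closed when </p> arrives
print("Page title" in soup.get_text())  # text before the stray tag is kept
```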

Extracting data from a webpage after triggering an onclick event

I'm using Selenium WebDriver to click a JavaScript onclick element. After clicking it, I want to extract and parse the content of the clicked element with BeautifulSoup. Below is the code snippet s ...

BeautifulSoup cannot retrieve the <pre> tag

I'm trying to extract text from a specific webpage but ran into an issue. I'm using the following code snippet: url = "http://www.koeri.boun.edu.tr/sismo/2/latest-earthquakes/list-of-latest-events/" response = requests.get(url) html = response ...

Collecting information with Python's requests module

import requests from pprint import pprint headers = { 'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.141 Safari/537.36', } params = ( ('LeagueID', '00'), ('Season', '2017-18'), ...

Is there a way to extract all the content from a webpage's body using BeautifulSoup?

I'm trying to extract the text of a medical document webpage with BeautifulSoup for a Natural Language Processing project, but I'm running into problems. The goal is to capture the entire text body by utilizing ...
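
A minimal sketch for collecting the visible body text, assuming the page's text is in ordinary HTML (not loaded by JavaScript). It removes <script>, <style> and <noscript> elements first, then joins the remaining strings with newlines so words from adjacent blocks don't run together. The document below is hypothetical.

```python
from bs4 import BeautifulSoup

html = """
<html><head><title>Doc</title><style>p {color: red}</style></head>
<body>
  <h1>Discharge summary</h1>
  <script>var tracking = 1;</script>
  <p>Patient was admitted with fever.</p>
  <p>Treated and discharged.</p>
</body></html>
"""

def body_text(markup):
    soup = BeautifulSoup(markup, "html.parser")
    body = soup.body or soup  # fall back to the whole document if there is no <body>
    # Drop non-visible content before collecting text
    for tag in body.find_all(["script", "style", "noscript"]):
        tag.decompose()
    # strip=True trims each string and drops whitespace-only ones
    return body.get_text(separator="\n", strip=True)

print(body_text(html))
```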

Having trouble importing `beautifulSoup` in Python 2.7 with Selenium

I am having trouble importing the beautifulSoup module and encountering an error. Can anyone explain why this is happening or provide guidance on how to fix it? Microsoft Windows [Version 6.1.7600] Copyright (c) 2009 Microsoft Corporation. All rights res ...

Retrieving website data that is loaded via Ajax

I am looking to extract the store locations from this URL. Currently, my approach is as follows: def getAllStoreLocation(): session = requests.Session() url = "https://www.walmart.com/store/finder?location=Pennsylvani&distance=100" ...

Utilize Python to extract a table from an HTML document

I would like to retrieve a table from an HTML file. Below is the code snippet I wrote to extract the first table: import urllib2 import os import time import traceback from bs4 import BeautifulSoup #find('table',{'class' ...

Problems web scraping booking.com

Initially, I intended to visit each hotel's page (screenshot: https://i.stack.imgur.com/HrrtX.png). Unfortunately, opening this subpage seems to require JavaScript, and my script can't detect it. Even with the correct U ...

Using Python to click a website button and download an .xlsx file

I'm writing a Python script to generate a daily COVID-19 dashboard for my country and state, but I'm stuck downloading one of the required files. To download it, I need to visit the website and click on ...

UnboundLocalError (local variable referenced before assignment) when using Beautiful Soup and Scrapy

While trying to scrape data, I get the error UnboundLocalError: local variable 'd3' referenced before assignment. Can anyone help me resolve it? I have searched extensively for a so ...
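
The full code is truncated, so this assumes the typical pattern behind this error: 'd3' is assigned only inside a condition that does not always run, and then read afterwards. The sketch reproduces the failure and shows the usual fix of binding a default value first; the cell strings are hypothetical.

```python
def extract_price_broken(cells):
    for cell in cells:
        if cell.startswith("Price:"):
            d3 = cell.split(":", 1)[1].strip()
    return d3  # UnboundLocalError when no cell matched

def extract_price(cells, default=None):
    d3 = default  # bind the name before any conditional assignment
    for cell in cells:
        if cell.startswith("Price:"):
            d3 = cell.split(":", 1)[1].strip()
            break
    return d3

print(extract_price(["Name: A", "Price: 5"]))
print(extract_price(["Name: B"]))
```

Returning a default (None, "N/A", and so on) lets the scraper keep going when a page lacks the field, instead of crashing.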

Using Beautiful Soup, Selenium, and pandas to scrape prices stored in a specific div class

My goal is to retrieve the price of a product for each size, since prices change daily. I succeeded in extracting data from a website that uses classes on <a> tags, but I'm having difficulties with websites that use classes on div and span elements. Price: $ ...

Finding a deeply nested div or class with Beautiful Soup

I want to extract the Zestimate from a page and add it to a list. It is located in the element with class="Text-c11n-8-65-2__sc-aiai24-0 eUxMDw". I tried targeting it at a higher level in the HTM ...

Clicking the link in a table row that matches data in a different column (Python web scraping)

In a recent project, I had to scrape a specific website for a test case. Entering a certain value in the search box displays a table with multiple values. My goal was to automatically click on the link in ...

Is it possible to extract data from several tables on a Wikipedia page, including their headers, using Python's requests and BeautifulSoup libraries?

Using Python libraries like requests and BeautifulSoup, I am attempting to scrape the tables from the following Wikipedia page: https://en.wikipedia.org/wiki/Mobile_country_code. While I am able to retrieve all the data within the tables, my goal now is to ...

Parsing the contents of a CDATA section with Python

I have successfully used Beautiful Soup to extract CDATA from an HTML page, but now I need to parse the contents and save them in a CSV file. Here is the code I am using: from bs4 import BeautifulSoup from urllib.request import urlopen import re import c ...

Using Python to scrape tables for a selected date range

I'm trying to scrape the tables on a webpage after specifying a date range (such as January 2015 to February 2022). During my initial Selenium attempts, I encountere ...

Getting the href of a table cell that contains a specific word

I want to extract the href links labeled "Show on diagram" from a web application (the relevant HTML was shown in a screenshot). This is my Python script: import webbrowser from bs4 import BeautifulSoup import ...

Problem extracting title URLs when web scraping with Python

I have encountered an issue while trying to scrape title URLs using my code. Can someone please help me troubleshoot it? Here is the code snippet: import requests from bs4 import BeautifulSoup # import pandas as pd # import pandas as pd import csv def ...

Finding how many levels of nested child elements an element has, using Beautiful Soup

I am struggling to determine how many "levels" of child elements an element contains. For instance: <div id="first"> <div id="second"> <div id="third"> <div id="fourth"> <div id="fifth" ...
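
A recursive sketch: an element's depth is 0 if it has no child tags, otherwise 1 plus the deepest child's depth. Whitespace text nodes are skipped by checking for `Tag`. The markup mirrors the ids in the question, with the truncated part completed hypothetically.

```python
from bs4 import BeautifulSoup
from bs4.element import Tag

html = """
<div id="first"><div id="second"><div id="third">
  <div id="fourth"><div id="fifth"></div></div>
</div></div></div>
"""

def nesting_depth(tag):
    """Number of levels of descendant tags below `tag` (0 if it has no child tags)."""
    child_depths = [nesting_depth(c) for c in tag.children if isinstance(c, Tag)]
    return 1 + max(child_depths) if child_depths else 0

soup = BeautifulSoup(html, "html.parser")
print(nesting_depth(soup.find("div", id="first")))
```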

Error: element <div id="tabber_obj_0_div_3" class="ptr"> could not be scrolled into view

I'm scraping book reviews from a specific URL. My goal is to collect the book name and each review in separate rows. Here's the code I have written so far, which uses both Selenium and bs4 fo ...