Scroll down to view all the links

Out of 3821 links, only 103 are returned. I tried scrolling the page with `window.scrollTo` to load the remaining links, but unfortunately it did not work as expected.

    from selenium.webdriver.common.by import By
    from selenium.webdriver.common.keys import Keys
    from selenium.webdriver.common.action_chains import ActionChains
    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.support import expected_conditions as EC
    import time
    import ssl
    import undetected_chromedriver as uc
    import requests
    from bs4 import BeautifulSoup
    import re
    
    ssl._create_default_https_context = ssl._create_unverified_context
    
    # ... Rest of your code ...
    
    options = uc.ChromeOptions()
    driver = uc.Chrome(options=options)
    driver.get("http://www.servicealberta.gov.ab.ca/find-if-business-is-licenced.cfm")
    
    # Click the button to load initial content
    click_on_button = driver.find_element(By.CSS_SELECTOR, "td:nth-child(1) input:nth-child(1)")
    click_on_button.click()
    
    
    time.sleep(2)
    
    base_url = "http://www.servicealberta.gov.ab.ca/"
    
    
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        
    data = BeautifulSoup(driver.page_source, "html.parser")
    links = data.select("tbody td[colspan='4'] a")
    for link in links:
        url = base_url + link['href']
        print(url)
    print(len(url))
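
A single scroll rarely helps when the results table appends rows incrementally as you scroll. A minimal sketch of an alternative (reusing the `driver` and `base_url` from the snippet above, and assuming the link count stops growing once everything has loaded) is to keep scrolling until no new links appear:

    import time
    from bs4 import BeautifulSoup
    
    previous_count = 0
    while True:
        # Scroll to the bottom and give the page time to append more rows
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        time.sleep(2)
    
        soup = BeautifulSoup(driver.page_source, "html.parser")
        links = soup.select("tbody td[colspan='4'] a")
    
        # Stop once scrolling no longer adds new links
        if len(links) == previous_count:
            break
        previous_count = len(links)
    
    urls = [base_url + link["href"] for link in links]
    print(len(urls))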

Answer №1

If you're looking to extract all links from a webpage and store them in a pandas DataFrame, you can follow this code example:

    import pandas as pd
    import requests
    from bs4 import BeautifulSoup
    
    url = "http://www.examplewebsite.com"
    response = requests.get(url)
    soup = BeautifulSoup(response.content, "html.parser")
    
    # Collect the text and href of every anchor tag on the page
    all_links = []
    for link in soup.find_all('a'):
        link_dict = {"link_text": link.text, "href": link.get('href')}
        all_links.append(link_dict)
    
    df = pd.DataFrame(all_links)
    print(df.head())

This code prints the first few rows of a DataFrame containing the text and `href` of every link found on the page.
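
If the page uses relative `href` values (as in the question above), one option is to resolve them against a base URL with `urllib.parse.urljoin`. This is a small sketch built on the `df` from the example above; the base URL here is the same placeholder, not a real site:

    from urllib.parse import urljoin
    
    base = "http://www.examplewebsite.com"  # placeholder base URL from the example above
    # Resolve each href against the base, skipping anchors with no href at all
    df["absolute_url"] = df["href"].apply(lambda href: urljoin(base, href) if href else None)
    print(df[["link_text", "absolute_url"]].head())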
