Scroll down to view all the links

Out of 3821 links, only 103 are returned. I tried scrolling the page with `window.scrollTo` to load the remaining links, but unfortunately it did not work as expected.

    from selenium.webdriver.common.by import By
    from selenium.webdriver.common.keys import Keys
    from selenium.webdriver.common.action_chains import ActionChains
    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.support import expected_conditions as EC
    import time
    import ssl
    import undetected_chromedriver as uc
    import requests
    from bs4 import BeautifulSoup
    import re
    
    ssl._create_default_https_context = ssl._create_unverified_context
    
    # ... Rest of your code ...
    
    options = uc.ChromeOptions()
    driver = uc.Chrome(options=options)
    driver.get("http://www.servicealberta.gov.ab.ca/find-if-business-is-licenced.cfm")
    
    # Click the button to load initial content
    click_on_button = driver.find_element(By.CSS_SELECTOR, "td:nth-child(1) input:nth-child(1)")
    click_on_button.click()
    
    
    time.sleep(2)
    
    base_url = "http://www.servicealberta.gov.ab.ca/"
    
    
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        
    data = BeautifulSoup(driver.page_source, "html.parser")
    links = data.select("tbody td[colspan='4'] a")
    for link in links:
        url = base_url + link['href']
        print(url)
    print(len(url))
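
A single scroll rarely helps when the results table appends rows incrementally as you scroll. A minimal sketch of an alternative (reusing the `driver` and `base_url` from the snippet above, and assuming the link count stops growing once everything has loaded) is to keep scrolling until no new links appear:

    import time
    from bs4 import BeautifulSoup
    
    previous_count = 0
    while True:
        # Scroll to the bottom and give the page time to append more rows
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        time.sleep(2)
    
        soup = BeautifulSoup(driver.page_source, "html.parser")
        links = soup.select("tbody td[colspan='4'] a")
    
        # Stop once scrolling no longer adds new links
        if len(links) == previous_count:
            break
        previous_count = len(links)
    
    urls = [base_url + link["href"] for link in links]
    print(len(urls))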

Answer №1

If you're looking to extract all links from a webpage and store them in a pandas DataFrame, you can follow this code example:

    import pandas as pd
    import requests
    from bs4 import BeautifulSoup
    
    url = "http://www.examplewebsite.com"
    response = requests.get(url)
    soup = BeautifulSoup(response.content, "html.parser")
    
    # Collect the text and href of every anchor tag on the page
    all_links = []
    for link in soup.find_all('a'):
        link_dict = {"link_text": link.text, "href": link.get('href')}
        all_links.append(link_dict)
    
    df = pd.DataFrame(all_links)
    print(df.head())

This code prints the first few rows of a DataFrame containing the text and `href` of every link found on the page.
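
If the page uses relative `href` values (as in the question above), one option is to resolve them against a base URL with `urllib.parse.urljoin`. This is a small sketch built on the `df` from the example above; the base URL here is the same placeholder, not a real site:

    from urllib.parse import urljoin
    
    base = "http://www.examplewebsite.com"  # placeholder base URL from the example above
    # Resolve each href against the base, skipping anchors with no href at all
    df["absolute_url"] = df["href"].apply(lambda href: urljoin(base, href) if href else None)
    print(df[["link_text", "absolute_url"]].head())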
