Shuffling a list of URLs and visiting each one with Selenium WebDriver

I want to extract the href links from a website, shuffle them, and then iterate through the shuffled list to scrape each webpage using Selenium in Python. I have found information on how to do this by reading lines from a text file, but I am looking for guidance on working with lists specifically.

Here is my code:

from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
import csv
import requests
import time

driver = webdriver.Chrome(executable_path=r'C:\Brother\chromedriver.exe')
driver.set_window_size(1024, 600)
driver.maximize_window()

driver.get('https://www.bookmaker.com.au/sports/soccer/')

SCROLL_PAUSE_TIME = 0.5

last_height = driver.execute_script("return document.body.scrollHeight")

while True:
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
    time.sleep(SCROLL_PAUSE_TIME)

    new_height = driver.execute_script("return document.body.scrollHeight")
    if new_height == last_height:
        break
    last_height = new_height

time.sleep(5)

# Extract href list [Working]
elem = driver.find_elements_by_css_selector(".market-group a")
elem_href = []
for elem in elem:
    print(elem.get_attribute("href"))
    elem_href.append(elem.get_attribute("href"))

# Shuffle HREF (not working)
from random import shuffle
list1 = [link for link in elem_href]    
shuffle(list1)
print(list1)

# Iterate through list... (Need help here)

# Driver.get…. (Read from..)
Driver.get(LINE FROM NOTEPAD HERE)

sections = driver.find_elements_by_css_selector(".fullbox")

with open('I AM HERE12345.csv', 'w') as file:
    writer = csv.writer(file)
    for section in sections:
        link = section.find_element_by_css_selector("h3 a").get_attribute("href")
        team_name = section.find_element_by_css_selector("tr.row[data-teamname]").get_attribute("data-teamname")
        bet = section.find_element_by_css_selector("a.odds.quickbet").text

        writer.writerow((bet, team_name, link))

# Looping.. (Not yet functioning)
driver.back()

Answer №1

You may encounter several issues in your current code:

  1. for elem in elem:
          ^       ^
          |       |
         identical variable names
    

    Prior to entering the loop, elem represents a list of elements. However, upon exiting the loop, elem only holds the final element in the list.

  2. list1 = (elem.get_attribute("href"))
    

    In this line, list1 contains just one value (the href of the current elem), leading to an unsupported use of shuffle().

  3. Driver.get(LINE FROM NOTEPAD HERE)
    

    The name Driver is undefined, because Python names are case-sensitive and the object was created as driver. The correct call should be driver.get(LINE FROM NOTEPAD HERE).
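The first two points can be reproduced without a browser. This is a minimal sketch, assuming plain strings stand in for the values Selenium would return; the example.com URLs and the names last_value and shuffle_failed are made up for illustration.

```python
from random import shuffle

# Hypothetical stand-ins for the hrefs collected from the page.
elem = ["https://example.com/a", "https://example.com/b", "https://example.com/c"]

# Point 1: reusing the name rebinds it on every iteration.
for elem in elem:
    pass

# After the loop, elem is the last item rather than the list.
last_value = elem

# Point 2: a single string is not a list, so shuffle() cannot reorder it.
try:
    shuffle(last_value)
    shuffle_failed = False
except TypeError:
    # shuffle() swaps items in place, and str does not support item assignment.
    shuffle_failed = True

print(last_value)
print(shuffle_failed)
```

Giving the list and the loop variable different names avoids both problems, since the list is then still available to pass to shuffle().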

To address these issues, consider implementing the following revised code snippet:

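from random import shuffle

# Keep the element list under its own name so the loop variable cannot overwrite it.
# (Selenium 4 spells this call driver.find_elements(By.CSS_SELECTOR, ".market-group a").)
elements = driver.find_elements_by_css_selector(".market-group a")
elem_href = [element.get_attribute("href") for element in elements]
shuffle(elem_href)  # shuffles the list in place and returns None
for link in elem_href:
    driver.get(link)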
    ...
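The same pattern can be tried without launching a browser. The sketch below is not part of the original answer: visit_in_random_order is a hypothetical helper, the example.com URLs are invented, and a recording function stands in for driver.get.

```python
import random
from typing import Callable, Iterable, List, Optional


def visit_in_random_order(
    links: Iterable[Optional[str]], visit: Callable[[str], None]
) -> List[str]:
    """Shuffle a copy of links, call visit on each one, and return the order used."""
    # Skip empty values, which can appear when an anchor has no href attribute.
    order = [link for link in links if link]
    # Shuffle the copy in place; the caller's collection is left untouched.
    random.shuffle(order)
    for link in order:
        visit(link)
    return order


# Usage: visited.append records each URL in place of a real driver.get call.
visited: List[str] = []
hrefs = ["https://example.com/a", "https://example.com/b", None, "https://example.com/c"]
order = visit_in_random_order(hrefs, visited.append)
print(order)
```

With a live session, driver.get could be passed as visit, or a wrapper that calls driver.get(link) and then runs the scraping code. The hrefs are collected as plain strings before any navigation because element references from the first page become stale once the driver loads another page.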
