Extracting the URL of a newly opened page using Selenium and Scrapy

I'm working on a web-scraping project to extract data from "Startup India," a platform that facilitates connections with startups. I apply filters on the search page and then click each startup in the results to open its detail page. However, the detail-page URLs I need for scraping never appear in the console.

Below is the code snippet:

import scrapy
from selenium import webdriver
import os
import logging

class ProductSpider(scrapy.Spider):
    name = "product_spider"
    allowed_domains = ['startupindia.gov.in']  # domain names only, not full URLs
    start_urls = ['https://www.startupindia.gov.in/content/sih/en/search.html?industries=sih:industry/advertising&states=sih:location/india/andhra-pradesh&stages=Prototype&roles=Startup&page=0']

    def __init__(self):
        cwd = os.getcwd()
        self.driver = webdriver.Chrome("C:/Users/RAJ/PycharmProjects/WebCrawler/WebCrawler/WebCrawler/spiders/chromedriver.exe")

    def parse(self, response):
        self.driver.get(response.url)
        
        next = self.driver.find_elements_by_css_selector('div#persona-results a')
        logging.info(next)
        
        for i in next:
            try:
                logging.info(i.click())
                logging.info(response.url)
                
                # Extract and store the required data using scrapy items
                
            except:
                print("Yolo")

Answer №1

Upon inspection, it appears that the website opens each startup's page in a new tab, so after clicking you need to switch the driver to that new tab before reading its URL from self.driver.current_url:

self.driver.switch_to.window(self.driver.window_handles[1])
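
To make the tab switch concrete, here is a minimal sketch of the click-and-switch loop. The helper name collect_detail_urls is hypothetical, and the sketch assumes every click opens exactly one new tab right away; on the real site you may need an explicit wait before the new handle appears.

```python
def collect_detail_urls(driver, links):
    """Click each link, switch to the tab it opens, and record that tab's URL.

    `driver` is a Selenium WebDriver and `links` are the anchor elements
    found on the results page. Hypothetical helper, not part of Selenium.
    """
    original = driver.current_window_handle
    urls = []
    for link in links:
        known = set(driver.window_handles)
        link.click()
        # Assumption: the new tab is already registered at this point.
        new_handles = [h for h in driver.window_handles if h not in known]
        if not new_handles:
            continue  # nothing opened in a new tab, skip this link
        driver.switch_to.window(new_handles[0])
        urls.append(driver.current_url)  # URL of the detail page
        driver.close()                   # close the detail tab
        driver.switch_to.window(original)  # return to the results tab
    return urls
```

Inside parse you would call it with self.driver and the list returned by your CSS selector, then log or scrape the returned URLs.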

Alternatively, you can skip the clicking altogether. Select the result links with XPath

"//*[@id='persona-results']//a[@class='img-wrap']"

then read each link's href attribute and open that URL directly, which is faster.
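
A rough sketch of the XPath approach, assuming the matched anchors carry the detail-page address in their href attribute. The helper name extract_hrefs is hypothetical; the string "xpath" is the value of Selenium's By.XPATH constant, used here so the snippet has no imports.

```python
RESULT_LINKS_XPATH = "//*[@id='persona-results']//a[@class='img-wrap']"

def extract_hrefs(driver, xpath=RESULT_LINKS_XPATH):
    """Return the unique href values of the anchors matched by `xpath`.

    `driver` is a Selenium WebDriver that has already loaded the results
    page. Hypothetical helper, not part of Selenium or Scrapy.
    """
    anchors = driver.find_elements("xpath", xpath)  # same as By.XPATH
    hrefs = []
    for anchor in anchors:
        href = anchor.get_attribute("href")
        # Skip anchors without an href and keep first-seen order.
        if href and href not in hrefs:
            hrefs.append(href)
    return hrefs
```

Each returned URL could then be opened with self.driver.get(url) or yielded from parse as a scrapy.Request, so no tab switching is involved.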
