Scrapy Selector returns no output when used with Selenium

My goal is to extract rank numbers from a website like

I am using Python with Selenium and Scrapy, but the following code produces no output. What could be the reason?

sel=Selector(response)
rank=sel.xpath('//span[@class="number"]/text()').extract()
print(rank)

Answer №1

If you are not set on using Scrapy, an alternative is to use Selenium together with BeautifulSoup.

from selenium import webdriver
from bs4 import BeautifulSoup
import time

driver = webdriver.Chrome()
driver.maximize_window()
baseurl = "https://www.shazam.com/charts/top-100/united-states"
driver.get(baseurl)
time.sleep(2)  # crude wait so the JavaScript-rendered chart can load
# page_source is the HTML after the browser has rendered the page
content = driver.page_source.encode('utf-8').strip()
soup = BeautifulSoup(content, "html.parser")
rank = soup.find_all("span", {"class": "number"})
title = soup.find_all("a", {"class": "ellip"})
d = [x.text for x in rank]
t = [y.text for y in title]
# zip pairs the two lists by position and stops at the shorter one
for c, v in zip(d, t):
    print(c, v)

driver.quit()

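The output looks like this. Note that the "ellip" class appears to match artist names as well as track titles, so the number printed next to each line is not necessarily that track's chart rank: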

01 Black Beatles
02 Rae Sremmurd Feat. Gucci Mane
03 Starboy
04 The Weeknd Feat. Daft Punk
05 Don't Wanna Know
06 Maroon 5 Feat. Kendrick Lamar
07 Bad Things
08 Closer
09 The Chainsmokers Feat. Halsey
10 Love On The Brain
11 Rihanna
12 i hate u, i love u
13 Scars To Your Beautiful
14 Alessia Cara
15 24k Magic
16 Bruno Mars
17 Fake Love
18 Drake
19 Caroline
20 Aminé
21 All Time Low
22 Jon Bellion
23 Let Me Love You
24 DJ Snake Feat. Justin Bieber
25 Gold
26 Kiiara
27 This Town
28 Niall Horan
29 Unsteady
30 X Ambassadors
31 Side To Side
32 Ariana Grande Feat. Nicki Minaj
33 Heathens
34 Twenty One Pilots
35 Starving
36 Hailee Steinfeld & Grey Feat. Zedd
37 In The Name Of Love
38 Martin Garrix Feat. Bebe Rexha
39 The Greatest
40 Sia
41 Do You Mind
42 DJ Khaled
43 Chill Bill
44 Rob $tone Feat. J. Davi$ & Spooks
45 Broccoli
46 D.R.A.M. Feat. Lil Yachty
47 Bounce Back
48 Big Sean
49 Ooouuu
50 Young M.A.
51 Fade
52 Kanye West
53 No Problem
54 Juju On The Beat (TZ Anthem)
55 Love Me Now
56 John Legend
57 What They Want
58 Russ
59 Blue Ain't Your Color
60 Keith Urban
61 Cheap Thrills
62 Sia
63 Pick Up The Phone
64 Young Thug & Travis Scott Feat. Quavo
65 Come And See Me
66 PARTYNEXTDOOR Feat. Drake
67 Mercy
68 Shawn Mendes
69 Bad And Boujee
70 Migos
71 Capsize
72 FRENSHIP & Emily Warren
73 Ain't My Fault
74 Zara Larsson
75 Bailar
76 Deorro Feat. Elvis Crespo
77 Luv
78 Tory Lanez
79 You Was Right
80 Lil Uzi Vert
81 Girlfriend
82 Kap G
83 Fresh Eyes
84 Andy Grammer
85 Way Down We Go
86 Kaleo
87 Key To The Streets
88 YFN Lucci Feat. Migos & Trouble
89 X
90 21 Savage & Metro Boomin Feat. Future
91 All Eyez
92 Better Man
93 Little Big Town
94 Play That Song
95 Train
96 Now And Later
97 Sage The Gemini
98 Used To This
99 Safari
100 Otw

Answer №2

The data is loaded by JavaScript, but you do not need Selenium to get it.

If you inspect the "Network" tab of your browser's dev tools, you will find a request to a JSON endpoint (the URL fetched in the session below, or a similar one with a different country code). The response is JSON and contains all the necessary data. Here's an example using scrapy shell:

$ scrapy shell https://www.shazam.com/charts/top-100/united-states -s USER_AGENT='Mozilla'
2016-11-25 16:33:51 [scrapy] INFO: Scrapy 1.2.1 started (bot: scrapybot)
(...)
2016-11-25 16:33:51 [scrapy] DEBUG: Crawled (200) <GET https://www.shazam.com/charts/top-100/united-states> (referer: None)
(...)
>>> fetch('https://www.shazam.com/shazam/v2/en/FR/web/-/tracks/web_chart_us')
2016-11-25 16:33:59 [scrapy] DEBUG: Crawled (200) <GET https://www.shazam.com/shazam/v2/en/FR/web/-/tracks/web_chart_us> (referer: None)

>>> import json
>>> from pprint import pprint

>>> data = json.loads(response.text)
>>> len(data['chart'])
100
>>> pprint(data['chart'])
[{u'alias': u'black-beatles',
  u'artists': [{u'alias': u'rae-sremmurd',
                u'follow': {u'followkey': u'A_43974610'},
                u'id': u'43974610'}],
  ...
  },
...
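
Once the JSON is parsed, the rank numbers can be derived from it directly. The following is only a minimal sketch under some assumptions: it assumes that the position of an entry in data['chart'] corresponds to its chart rank, and it uses only the 'chart', 'alias' and 'artists' keys visible in the output above. The function name ranked_tracks and the second sample entry are hypothetical placeholders, not part of the original answer or the real response.

```python
import json

# Hypothetical sample shaped like the response shown above. Only the
# 'chart', 'alias', 'artists' and 'id' keys come from the original output;
# the second entry is a made-up placeholder.
SAMPLE = '''
{"chart": [
  {"alias": "black-beatles",
   "artists": [{"alias": "rae-sremmurd", "id": "43974610"}]},
  {"alias": "example-track",
   "artists": [{"alias": "example-artist", "id": "0"}]}
]}
'''


def ranked_tracks(data):
    """Return a list of (rank, track alias, [artist aliases]) tuples.

    Assumption: the order of entries in data['chart'] is the chart order,
    so the rank is simply the 1-based position in the list.
    """
    rows = []
    for rank, track in enumerate(data.get("chart", []), start=1):
        artists = [artist.get("alias") for artist in track.get("artists", [])]
        rows.append((rank, track.get("alias"), artists))
    return rows


if __name__ == "__main__":
    for rank, alias, artists in ranked_tracks(json.loads(SAMPLE)):
        print("{:02d} {} ({})".format(rank, alias, ", ".join(artists)))
```

In the scrapy shell session above, the same function could be called as ranked_tracks(data) after data = json.loads(response.text).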
  
