Utilizing Scrapy for Extracting Size Information obscured by Ajax Requests

I am currently trying to extract information about the sizes available for a specific product from this URL:

However, I am encountering difficulty in locating the details hidden within the Select Size Dropdown on the webpage (e.g., 7 - In Stock, 7.5 - In Stock 5+, etc.).

Even when inspecting the code in debugger mode, my attempts to scrape this information using the provided Xpath have been unsuccessful:

item["Sizes"] = sel.xpath("//select[@name='siz']/option/text()").extract()

I suspect that these details may be obscured by Ajax. Any assistance would be greatly appreciated.

Answer №1

The issue arises from the fact that the sizes are contained within an iframe that is loaded from a separate URL. To resolve this, you first need to retrieve the URL from the src attribute of the iframe, then send a request to that URL in order to extract the sizes.

Here's a demonstration (using scrapy shell):

$ scrapy shell http://www.tennisexpress.com/k-swiss-mens-ultra-express-tennis-shoes-black-fade-and-electric-blue-38191
>>> from urlparse import urljoin
>>> url = 'http://www.tennisexpress.com/'
>>> path = response.xpath('//div[@id="prodPurchasing"]/iframe/@src').extract()[0]
>>> url = urljoin(url, path)
>>> fetch(url)
>>> response.xpath("//select[@name='siz']/option/text()").re(r'[\.0-9]+')
[u'7', u'7.5', u'8', u'8.5', u'9', u'9.5', u'10', u'10.5', u'11', u'11.5', u'12', u'13']

As a note, I am using re() to filter out legitimate sizes from the select options, as demonstrated below:

The pattern [\.0-9]+ will match one or more digits or dots.

Similar questions

If you have not found the answer to your question or you are interested in this topic, then look at other similar questions below or use the search

Tips for detecting and eliminating outliers in a dataset containing a mixture of numerical and categorical data

I am working with a dataset and have identified outliers that are 3 standard deviations away from the mean in each numerical column. I need to remove these outliers and drop the rows that contain them. ...

What could be causing the parameters in my function to malfunction?

In my function, I am aiming to generate the initials of each name. For instance, LeBron James would produce LJ. def get_initials(first_name, last_name): first_initial = first_name[0] second_initial = last_name[0] print(first_initial + second_ ...

Alter the URL for Jquery AJAX post by utilizing radio buttons

I have a product search box that allows users to search by Cartridge Name or Printer Name. These are different SQL queries, so I have created separate PHP files for each query. However, I am facing challenges in using jQuery to check the value of the radio ...

Unable to convert JSON data into a datatable

My code looks like this: $.ajax({ type : 'POST', url : 'classfication_of_productjson.html', dataType : 'json', data : { "search_bunrui_code ...

What are the steps to modify the URL and template page post-login?

Currently, I am developing a web application using JSF, Hibernate, and ICEfaces. In my project, I have implemented two Facelets templates - one for the login page and another for the rest of the website. However, I am facing an issue where after clicking ...

Retrieve another resource from the Flask API

I have developed a basic Flask API that returns a list of strings. Here is the code: from flask import Flask, request from flask_restful import Resource, Api from json import dumps app = Flask(__name__) api = Api(app) class Product(Resource): def ge ...

Best practices for handling AJAX GET and POST requests in EmberJS

EmberJS has captured my interest and I am thoroughly enjoying diving into it. Although there is a learning curve, I truly believe in the meaningful principles it stands for. I am curious about the process of making GET and POST calls in Ember JS. While I ...

Using Django and mypy for type hinting with the ValuesQuerySet

What would be the appropriate type hint to use for a function that returns a queryset like the example below? def _get_cars_for_validation(filter_: dict) -> QuerySet: return ( Car.objects.filter(**filter_) .values("id", "brand", "en ...

Guide on how to dynamically add AJAX JSON array response to an HTML table

Hey! I need some advice on how to dynamically append a JSON Array response to an HTML table after submitting a form using AJAX. Here's the scenario: This is my form : <form id="myForm" method="POST"> <input type=" ...

The PartialView is currently displaying the entire page instead of just rendering the form

I'm attempting to refresh the partialview exclusively within my modal, following the ajax request. However, upon receiving the response, everything gets refreshed and only the partialview is displayed. Here is the PartialView: @{ var bidModel = ...

What is the best way to perform vectorized calculations on a pandas dataframe, replacing values with the data from the previous row if a certain condition is not satisfied

My current approach involves using a for loop with a conditional statement that triggers a calculation and updates the column in a dataframe when true. However, if the condition is not met, the data from the previous row is carried over to the new row. He ...

Tips for extracting particular information from a website and displaying it within my application

While I have searched for a solution to my problem, I couldn't find an answer that fits my specific needs. I am attempting to extract specific information (highlighted in red) from this website: . I understand that JSON parsing can retrieve this data, ...

"Create a notification pop-up in CSS that appears when a link is clicked, similar to

I am looking to create a model page that functions similarly to the inbox popup on Stack Overflow. When we click on the inbox icon, a small box appears with a tiny loader indicating messages or comments. The box then expands depending on the content inside ...

Exploring Python's list comprehension assignments

I am trying to find a way to streamline assignments using list comprehensions. I want to refactor a code snippet like the one below into a list comprehension. Here is the function: import time def f(x): time.sleep(1) return x + 1 And here is the ...

Updating array with user input using Ajax

For the past few days, I have been struggling to extract data from a PHP page (array) and store this data in a jQuery array. The issue arises when trying to send data to the PHP page using the following code: $.ajax({ type: "POST", url: 'test ...

Updating password in Django with AJAX

During the development of my website, I successfully added the functionality for users to change their passwords. This was achieved by utilizing the pre-existing view in django.contrib.auth. However, the next step is to enhance this feature by implementin ...

I am encountering an issue with implementing the (EC.presence_of_element_located(By.class, "")) function

I recently encountered a problem while working on my Python Selenium project. It seems that Python is having trouble recognizing the code snippet EC.presence_of_element_located. I am not sure why this error is occurring. Below is the segment of code where ...

"Enhance Your Form with Ajax Submission using NicEdit

Currently, I am utilizing nicEditor for a project and aiming to submit the content using jQuery from the plugin. Below is the code snippet: <script type="text/javascript"> bkLib.onDomLoaded(function() { new nicEditor().panelInstance('txt1' ...

Divergent behavior of Selenium xpath when used with python versus ruby

I'm currently facing an unusual problem with selenium. The following ruby code snippet: page.all(:xpath, "//table[3]//tr[last()]//td") Successfully retrieves all the cells of the last row from the third table on a webpage. In contrast, this python ...

Issue with IE7: jQuery.getJSON callback not triggering

A legitimate JSON file, returned with the correct HTTP headers: Content-Type:application/json; charset= Displays properly in Chrome/FF, but IE7 is having trouble parsing it. Where should I start investigating? $.getJSON(url, null, function(data){ aler ...