Extracting information from an API

Hey there, good morning!

I'm currently working on gathering car data from this website:

My process involves sending a request through the search bar on the homepage for a specific location and date. This generates a page like this:

From there, I use my browser's developer tools to locate the JSON data behind the page and scrape it. The problem is that the JSON changes every time I make a new location request, yet it is always served from the same URL ().

Does anyone have any suggestions on how I can create a spider that will automatically send requests, retrieve the JSON file, and scrape it? Or perhaps advice on how to manipulate the API data directly to access information from different locations?

Thanks so much in advance!

Answer №1

Scrapy has a duplicate filter that drops requests to URLs it has already visited, in order to prevent crawl loops. This means that if the resource you are trying to access always uses the same URL, every request after the first looks like a duplicate and Scrapy filters it out.
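To make the filtering idea concrete, here is a toy model of a duplicate filter written with the standard library only. It is a simplified illustration, not Scrapy's actual implementation; the names fingerprint and ToyDupeFilter are made up for this sketch, and the URL is a placeholder.

```python
import hashlib


def fingerprint(method, url, body=b""):
    """Hash the parts of a request that identify it (simplified model)."""
    digest = hashlib.sha1()
    digest.update(method.encode())
    digest.update(url.encode())
    digest.update(body)
    return digest.hexdigest()


class ToyDupeFilter:
    """Hypothetical stand-in for a crawler's duplicate filter."""

    def __init__(self):
        self.seen = set()

    def should_schedule(self, method, url, body=b""):
        fp = fingerprint(method, url, body)
        if fp in self.seen:
            return False  # identical request already seen: drop it
        self.seen.add(fp)
        return True


dupes = ToyDupeFilter()
url = "https://www.uniquewebsite.com/page"  # placeholder URL
first = dupes.should_schedule("GET", url)
second = dupes.should_schedule("GET", url)
print(first, second)  # True False
```

The second request is rejected because nothing that goes into the fingerprint has changed, which matches the situation described in the question: a new location, but the same URL.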

However, you can override this filtering by including dont_filter=True in your request. For example:

yield scrapy.Request(
    url='https://www.uniquewebsite.com/page',
    dont_filter=True,
    callback=self.parse_item,
)
