Python script to extract product information from Walmart search results

Attempting to gather search results from Walmart has been a challenge for me.

For instance, let's navigate to the website ""

Then try to extract only the text of the element with the class name search-product-result, using Python.

I tried Selenium, but ran into identity verification prompts. With requests, Walmart returned a forbidden page. I tried other libraries as well, without success. Any suggestions?

Answer №1

The content on this page is loaded dynamically with JavaScript, so searching the raw HTML for the rendered product elements with BeautifulSoup will not work here.

However, the information displayed on the page is available as a JSON string inside the <script> tag with id=searchContent in the HTML source.

I isolated this <script> element from the HTML, took its text content, and parsed it as JSON. From there you can extract any data you need from the JSON structure.

The code below prints the product IDs found in the search results:

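import json

import requests
from bs4 import BeautifulSoup

# Send a browser-like User-Agent header with the request.
headers = {"user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/92.0.4515.131 Safari/537.36"}
url = 'https://www.walmart.com/search?query=coffee%20machine'

r = requests.get(url, headers=headers)
soup = BeautifulSoup(r.text, 'lxml')

# The search results are embedded as JSON in <script id="searchContent">.
tag = soup.find('script', {'id': 'searchContent'})
if tag is None:
    raise SystemExit('searchContent script not found (request blocked or page layout changed)')

# Read the tag's contents directly. str.strip() removes a set of characters,
# not a prefix or suffix, so it is a fragile way to peel off the <script> tags.
j = json.loads(tag.string)
items = j['searchContent']['preso']['items']

for item in items:
    print(item['productId'])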

This will output the respective product IDs:

2RYLQXVZ80E8
7EYUEQ82RMBP
7A3VDQNS5R36
22GRP3PGSY4A
238DLP3R0M3W
52NMIX2M8SC5
1R4H630LRNSE
.
.
.
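If you want to try the parsing step without sending a request (which may be blocked), here is a minimal offline sketch. It assumes the same JSON layout as above (searchContent, then preso, then items) and uses the standard library's html.parser instead of BeautifulSoup, so it runs without extra packages. The sample HTML, the title field, and the names ScriptJSONExtractor and extract_items are made up for illustration and are not part of Walmart's page or any library.

```python
import json
from html.parser import HTMLParser


class ScriptJSONExtractor(HTMLParser):
    """Collect the text inside the <script> tag that has a given id."""

    def __init__(self, script_id):
        super().__init__()
        self.script_id = script_id
        self._inside = False
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("id") == self.script_id:
            self._inside = True

    def handle_endtag(self, tag):
        if tag == "script":
            self._inside = False

    def handle_data(self, data):
        # Script contents may arrive in several pieces, so collect them all.
        if self._inside:
            self.chunks.append(data)


def extract_items(html, script_id="searchContent"):
    """Return the list of items from the embedded JSON, or [] if it is absent."""
    parser = ScriptJSONExtractor(script_id)
    parser.feed(html)
    parser.close()
    raw = "".join(parser.chunks).strip()
    if not raw:
        return []
    node = json.loads(raw)
    # Walk the assumed key path defensively in case the layout differs.
    for key in ("searchContent", "preso", "items"):
        if not isinstance(node, dict) or key not in node:
            return []
        node = node[key]
    return node if isinstance(node, list) else []


# Hypothetical sample page; only productId is a field confirmed by the answer.
SAMPLE_HTML = """
<html><body>
<script id="searchContent" type="application/json">
{"searchContent": {"preso": {"items": [
  {"productId": "2RYLQXVZ80E8", "title": "Example coffee machine A"},
  {"productId": "7EYUEQ82RMBP", "title": "Example coffee machine B"}
]}}}
</script>
</body></html>
"""

items = extract_items(SAMPLE_HTML)
ids = [item["productId"] for item in items]
print(ids)
```

Returning an empty list when the script tag or a key is missing makes it easy to tell a blocked or changed page apart from a parsing bug. Against a live page you would pass r.text to extract_items instead of the sample string.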
