Utilize Regex to isolate text between specified string markers

I am attempting to scrape a website and I need to extract the JSON data from the data variable in the JavaScript code below using Python Regex.

<script type="text/javascript">
P.when('A').register("ImageBlockATF", function(A){
    var data = {
                'colorImages': { 'initial': [{"hiRes":"https://images-na.ssl-images-amazon.com/images/I/81Oo79kGp2L._SL1500_.jpg","thumb":"https://images-na.ssl-images-amazon.com/images/I/41SnVVzKChL._SS40_.jpg","large":"https://images-na.ssl-images-amazon.com/images/I/41SnVVzKChL.jpg","main":{"https://images-na.ssl-images-amazon.com/images/I/81Oo79kGp2L._SY355_.jpg":[355,270],"https://images-na.ssl-images-amazon.com/images/I/81Oo79kGp2L._SY450_.jpg":[450...

The regex I have been trying is not producing the desired results. Here is what I have attempted so far:

(var\s+data\s+=).*^[A.trigger('P.AboveTheFold')]$

In essence, I am seeking a regex pattern that will capture the text situated between var data = and A.trigger('P.AboveTheFold').

Answer №1

If your JSON data contains no ; characters, you can use the following pattern:

var data\s*=\s*([^;]*});

This approach is not very robust, and a parsing library is the more reliable choice. The JSON data is captured in group 1.
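A minimal sketch of how the capture-group pattern could be applied in Python, using the variable name from the question (var data). The html string is a shortened, hypothetical stand-in for the real page source, and the names pattern, match, and payload are illustrative only.

```python
import re

# Hypothetical, shortened stand-in for the script block in the question.
html = """
P.when('A').register("ImageBlockATF", function(A){
    var data = {
        'colorImages': { 'initial': [{"hiRes": "https://example.com/a.jpg"}] },
        'heroImage': {}
    };
    A.trigger('P.AboveTheFold');
    return data;
});
"""

# [^;]* is a negated character class, so it matches newlines as well;
# this pattern therefore works without re.DOTALL.
pattern = re.compile(r"var data\s*=\s*([^;]*});")

match = pattern.search(html)
# Group 1 holds everything from the opening brace to the closing brace.
payload = match.group(1) if match else None
print(payload)
```

Because the pattern stops at the first semicolon, it only works while the object literal itself contains none, which is the condition stated above.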

If you are certain that your data falls between var data = and A.trigger('P.AboveTheFold'), you can use the following instead:

(?<=var data = ).*(?=A\.trigger\('P\.AboveTheFold'\))

Thanks to the positive lookbehind and lookahead, only the text between the two markers is matched. This method is not foolproof either: the lookbehind expects exactly one space on each side of =, so any variation in spacing between data and = will break the match. Also pass the re.DOTALL flag in Python so that . matches newlines, because the data spans several lines.
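A minimal sketch of the lookaround approach in Python, using the markers from the question (var data = and A.trigger('P.AboveTheFold')). The html string is a shortened, hypothetical stand-in for the real page, and it assumes the object literal ends with }; before the A.trigger call.

```python
import re

# Hypothetical, shortened stand-in for the script block in the question.
html = """
P.when('A').register("ImageBlockATF", function(A){
    var data = {
        'colorImages': { 'initial': [{"hiRes": "https://example.com/a.jpg"}] },
        'heroImage': {}
    };
    A.trigger('P.AboveTheFold');
    return data;
});
"""

# The lookbehind and lookahead assert the markers without consuming them.
# re.DOTALL lets . cross line breaks; without it this pattern finds nothing here.
pattern = re.compile(
    r"(?<=var data = ).*(?=A\.trigger\('P\.AboveTheFold'\))",
    re.DOTALL,
)

match = pattern.search(html)
raw = match.group(0) if match else ""

# The raw match still carries the trailing semicolon and whitespace
# that sit before A.trigger, so trim them off.
payload = raw.strip().rstrip(";").rstrip()
print(payload)
```

Python's re module only accepts fixed-width lookbehinds, which is why the spacing around = has to be spelled out literally rather than written as \s*.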
