Using regular expressions or BeautifulSoup to find the word or number that follows a given keyword

My goal is to extract data from Bloomberg in a structured format to create concise lists. The data structure on Bloomberg's website looks like this:

    <script type="text/javascript">
var ClientApp = require('app/ClientApp');
var clientApp = new ClientApp();
clientApp.start({
environmentConfig: {"appRoot":"","assetManifest":{"public/images/marketdata-quoteshare-image.png":"//assets.bwbx.io/markets/public/images/marketdata-quoteshare-image.31c2f976.png","public/javascripts/application.js":"//assets.bwbx.io/markets/public/javascripts/application.72f7c0c6.js",...*truncated for brevity*...

I am looking for a way to use BeautifulSoup or the json module to extract specific fields from the 'bootstrappedData' section and present them in a structured form, such as:

primaryExchange: NASDAQ GS
price: 40.6 ...

The keys, such as primaryExchange and price, stay the same; I need to retrieve the values after the colon, which differ for each company.

Below is my current approach using Python:

import re
import requests
from bs4 import BeautifulSoup

def scrape():
    ticker = input("Enter Ticker Symbol: ")
    url = "http://www.bloomberg.com/quote/" + ticker + ":US"
    # Download the quote page as text
    htmltext = requests.get(url).text
    soup = BeautifulSoup(htmltext, 'html.parser')

    # Add code here to parse and extract desired information
    extracted_data = None  # placeholder until the parsing step is written

    return extracted_data

print(scrape())

I would appreciate any guidance on utilizing regex or Beautiful Soup effectively for data extraction in this context.

Thank you for your assistance.

Answer №1

The data appears to be JSON, so it is better to process it with a JSON parser (for example, Python's json module) than with a regex.
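A minimal sketch of that idea, using only the standard library. The page source in the question is truncated, so the `script_text` sample below, including the `quote` nesting, is a hypothetical stand-in and not Bloomberg's real payload; `extract_json_after` and `find_key` are illustrative helper names. The sketch finds the `bootstrappedData:` marker, decodes the JSON value that follows it, and then searches the decoded structure for the wanted keys.

```python
import json

# Hypothetical stand-in for the <script> text; the real payload is truncated
# in the question, so its exact nesting is an assumption.
script_text = '''
clientApp.start({
environmentConfig: {"appRoot":""},
bootstrappedData: {"quote":{"primaryExchange":"NASDAQ GS","price":40.6}}
});
'''

def extract_json_after(text, key):
    """Decode the first JSON value that follows `key:` in `text`."""
    start = text.index(key + ":") + len(key) + 1
    # raw_decode does not skip leading whitespace, so step over it first.
    while text[start].isspace():
        start += 1
    value, _end = json.JSONDecoder().raw_decode(text, start)
    return value

def find_key(node, wanted):
    """Depth-first search for `wanted` in nested dicts and lists."""
    if isinstance(node, dict):
        if wanted in node:
            return node[wanted]
        children = list(node.values())
    elif isinstance(node, list):
        children = node
    else:
        return None
    for child in children:
        found = find_key(child, wanted)
        if found is not None:
            return found
    return None

data = extract_json_after(script_text, "bootstrappedData")
for field in ("primaryExchange", "price"):
    print(f"{field}: {find_key(data, field)}")
```

With the real page, `script_text` would be the text of the matching script tag. `raw_decode` stops at the end of the first complete JSON value, so the JavaScript that follows the object does not need to be stripped first.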

Answer №2

If you do need to use regex, here is one way to get what you want. Store the page text as a string in the variable data.

import re
data = ''  # the page text as a string
# Match from the quoted exchange name through the price that follows it
x = re.findall(r'"NASDAQ\s\w+\S+\.\d,', data)

Result:

['"NASDAQ GS","price":40.6,']
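Since the exchange name changes from company to company, a variant that anchors on the key rather than on the value may be more useful. The following is a sketch, assuming the fields appear as `"key":value` pairs, as the result above suggests; the `data` string is a hypothetical fragment and `value_after` is an illustrative helper name.

```python
import re

# Hypothetical fragment shaped like the result above.
data = '"primaryExchange":"NASDAQ GS","price":40.6,'

def value_after(key, text):
    """Return the string or number that follows "key": in text, else None."""
    pattern = (
        r'"' + re.escape(key) + r'"\s*:\s*'
        r'(?:"([^"]*)"|(-?\d+(?:\.\d+)?))'  # a quoted string or a plain number
    )
    match = re.search(pattern, text)
    if match is None:
        return None
    string_value, number_value = match.groups()
    if string_value is not None:
        return string_value
    return float(number_value)

for key in ("primaryExchange", "price"):
    print(f"{key}: {value_after(key, data)}")
```

This only handles flat string and number values; nested objects or strings with escaped quotes are better left to a JSON parser.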
