What is the best way to access the statistics for each game on a live score platform?

I want to retrieve the statistics for each game on LiveScore and gather the stats for all games on a given day in one go.

import time
from urllib.parse import urljoin

from bs4 import BeautifulSoup
from selenium import webdriver

driver = webdriver.Chrome()  # assumed: the original snippet did not show how the driver was created
driver.get('https://www.livescore.com/en/football/2022-12-01/')
time.sleep(2)
scroll_pause_time = 1  # adjust the pause to suit your system speed
screen_height = driver.execute_script("return window.screen.height;")  # screen height in pixels
base_url = 'https://www.livescore.com'
i = 0

while True:
    # scroll one screen height at a time
    driver.execute_script(f"window.scrollTo(0, {screen_height * i});")
    i += 1
    time.sleep(scroll_pause_time)
    # re-read the scroll height after the page has scrolled
    scroll_height = driver.execute_script("return document.body.scrollHeight;")
    soup = BeautifulSoup(driver.page_source, 'lxml')
    divs = soup.find_all('div', class_='wk Ak')
    for p in divs:
        url = p.find('a', class_='bi')
        full_url = urljoin(base_url, url['href'])
        print(full_url)
        # This is where I am stuck: full_url is a plain string, so this loop does not work
        for a in full_url:
            a.click()

    # stop once we have scrolled past the total scroll height
    if screen_height * i > scroll_height:
        break


After getting the URLs, I am stuck on how to open each one, extract the information for that game, repeat this for all games, and display the results.

Answer №1

The page loads its data from an API endpoint, which you can find in the browser's Dev tools -> Network tab -> XHR calls. Requesting that endpoint directly avoids scrolling and parsing the page with Selenium.

Here is one way to get the data with Python:

import requests
import pandas as pd

pd.set_option('display.max_columns', None)
pd.set_option('display.max_colwidth', None)

headers = {
    'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/108.0.0.0 Safari/537.36'
}
url = 'https://prod-public-api.livescore.com/v1/api/app/date/soccer/20221201/0?MD=1'

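r = requests.get(url, headers=headers)
r.raise_for_status()  # fail early on an HTTP error instead of parsing an error page
# one row per game: flatten the 'Events' list found under each entry of 'Stages'
df = pd.json_normalize(r.json()['Stages'], record_path=['Events'])
print(df)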

Terminal Output:

   Eid Tr1 Tr2 Trh1  Trh2  Tr1OR Tr2OR   T1  T2  Eps Esid  Epr Ecov ErnInf  Ewt Et  Esd EO Spid     Pid  Pids.8  Pids.12 Media.32 Media.12
0 663917   2   4   0   1       2      4 [{'Nm': 'Costa Rica', 'ID': '6189', 'Img': 'enet/6705.png', 'NewsTag': '/team/costa-rica-...', ...]

...rows continue with various columns...

You can dig further into the JSON response to filter out the information you need. See the pandas documentation for json_normalize for details.
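As a rough sketch of that refinement, the snippet below reduces the nested response to one row per game and builds the endpoint URL for an arbitrary day. It rests on several assumptions: the helper names build_url and events_to_frame are hypothetical, the URL pattern is simply copied from the request above with only the date segment changed, Tr1/Tr2 are taken to be the two teams' scores, and the sample payload is made up to match the shape of the output shown above (it is not real API data).

```python
from datetime import date

import pandas as pd

# Assumed pattern: copied from the request above, only the YYYYMMDD segment varies.
API_TEMPLATE = "https://prod-public-api.livescore.com/v1/api/app/date/soccer/{day}/0?MD=1"


def build_url(day: date) -> str:
    """Return the endpoint URL for one day (hypothetical helper)."""
    return API_TEMPLATE.format(day=day.strftime("%Y%m%d"))


def events_to_frame(payload: dict) -> pd.DataFrame:
    """Reduce the nested response to one row per game (hypothetical helper)."""
    rows = []
    for stage in payload.get("Stages", []):
        for event in stage.get("Events", []):
            rows.append({
                "event_id": event.get("Eid"),
                # T1 and T2 are lists of team dicts; take the first entry's name.
                "team1": event["T1"][0]["Nm"],
                "team2": event["T2"][0]["Nm"],
                # Assumption: Tr1/Tr2 hold the scores of team 1 and team 2.
                "score1": event.get("Tr1"),
                "score2": event.get("Tr2"),
            })
    return pd.DataFrame(rows)


# Made-up sample shaped like the output above; "Team B" is a placeholder.
sample = {
    "Stages": [
        {
            "Events": [
                {
                    "Eid": "663917",
                    "Tr1": "2",
                    "Tr2": "4",
                    "T1": [{"Nm": "Costa Rica", "ID": "6189"}],
                    "T2": [{"Nm": "Team B"}],
                }
            ]
        }
    ]
}

print(build_url(date(2022, 12, 1)))
print(events_to_frame(sample))
```

With a live response, you would pass r.json() to events_to_frame instead of the sample. The event_id column is the natural key to carry forward if you later find a per-game endpoint in the Network tab.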
