What is the best way to access the statistics for each game on a live score platform?

Question

I want to retrieve the statistics for each game on LiveScore and collect the stats for all games on a given day in one go.

import time
from urllib.parse import urljoin

from bs4 import BeautifulSoup
from selenium import webdriver

driver = webdriver.Chrome()  # assumed setup; the original snippet did not show how driver was created
driver.get('https://www.livescore.com/en/football/2022-12-01/')
time.sleep(2)
scroll_pause_time = 1  # adjust the pause to suit your system speed
screen_height = driver.execute_script("return window.screen.height;")  # height of the screen
base_url = 'https://www.livescore.com'
i = 0

while True:
    # scroll one screen height at a time
    driver.execute_script(f"window.scrollTo(0, {screen_height * i});")
    i += 1
    time.sleep(scroll_pause_time)
    # re-read the scroll height after the page has scrolled
    scroll_height = driver.execute_script("return document.body.scrollHeight;")
    soup = BeautifulSoup(driver.page_source, 'lxml')
    divs = soup.find_all('div', class_='wk Ak')
    for p in divs:
        url = p.find('a', class_='bi')
        urls = url['href']
        full_url = urljoin(base_url, urls)
        print(full_url)
        # fails: full_url is a str, so this loops over its characters
        for a in full_url:
            a.click()

    # stop once we have scrolled past the total scroll height
    if screen_height * i > scroll_height:
        break

After getting each URL, I am stuck on how to open it, extract the information for that game, repeat this for all games, and display the results.

python selenium-webdriver web-scraping beautifulsoup

Answer 1

The page loads its data from an API endpoint, which you can find in the browser's Dev tools -> Network tab -> XHR calls. Requesting that endpoint directly avoids scrolling and clicking through the page with Selenium.

Here is one way to fetch this data with Python:

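import requests
import pandas as pd

# show every column and full cell contents when printing
pd.set_option('display.max_columns', None)
pd.set_option('display.max_colwidth', None)

headers = {
    'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/108.0.0.0 Safari/537.36'
}
# the date in the URL is written as YYYYMMDD
url = 'https://prod-public-api.livescore.com/v1/api/app/date/soccer/20221201/0?MD=1'

r = requests.get(url, headers=headers)
r.raise_for_status()  # stop early on an HTTP error instead of parsing an error page
# one row per event, gathered from every stage in the response
df = pd.json_normalize(r.json()['Stages'], record_path=['Events'])
print(df)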

Terminal Output:

   Eid Tr1 Tr2 Trh1  Trh2  Tr1OR Tr2OR   T1  T2  Eps Esid  Epr Ecov ErnInf  Ewt Et  Esd EO Spid     Pid  Pids.8  Pids.12 Media.32 Media.12
0 663917   2   4   0   1       2      4 [{'Nm': 'Costa Rica', 'ID': '6189', 'Img': 'enet/6705.png', 'NewsTag': '/team/costa-rica-...', ...]

...rows continue with various columns...

You can drill further into the JSON response to filter out just the fields you need. See the pandas documentation for json_normalize for details.
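As a rough sketch of that filtering step, the snippet below reduces the events to one compact row per game. It runs on a small made-up payload rather than the live API: the team names, IDs, and scores in `sample` are hypothetical, and `summarize_events` is an illustrative helper, not part of any library. The field names (`Eid`, `T1`, `T2`, `Nm`, `Tr1`, `Tr2`) are taken from the output above; reading `T1` as the home side, `T2` as the away side, and `Tr1`/`Tr2` as their scores is an assumption.

```python
import pandas as pd

# Hypothetical payload shaped like the response above (values are made up).
sample = {
    "Stages": [
        {"Events": [
            {"Eid": "1001", "Tr1": "2", "Tr2": "4",
             "T1": [{"Nm": "Home FC", "ID": "1"}],
             "T2": [{"Nm": "Away FC", "ID": "2"}]},
        ]},
        {"Events": [
            {"Eid": "1002", "Tr1": "0", "Tr2": "0",
             "T1": [{"Nm": "Alpha", "ID": "3"}],
             "T2": [{"Nm": "Beta", "ID": "4"}]},
        ]},
    ]
}


def summarize_events(payload):
    """Return one row per event with team names and scores."""
    # Flatten the events of every stage into a single frame.
    df = pd.json_normalize(payload["Stages"], record_path=["Events"])
    return pd.DataFrame({
        "event_id": df["Eid"],
        # T1/T2 hold a list of team dicts; take the first team's name.
        "home": df["T1"].map(lambda teams: teams[0]["Nm"]),
        "away": df["T2"].map(lambda teams: teams[0]["Nm"]),
        # Scores arrive as strings; anything non-numeric becomes NaN.
        "home_score": pd.to_numeric(df["Tr1"], errors="coerce"),
        "away_score": pd.to_numeric(df["Tr2"], errors="coerce"),
    })


print(summarize_events(sample))
```

With the real response, passing `r.json()` instead of `sample` should work the same way, provided every event carries these keys; events that have not started may lack the score fields, in which case the column lookups would need a fallback.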
