Converting a table into a collection of dictionaries using BeautifulSoup

Question

I am having trouble scraping a specific table and converting it into a list of dictionaries. The table I want to scrape is on this page. My objective is to scrape only the "Batting" table.

Below is the code I have been working with:

from bs4 import BeautifulSoup
from selenium import webdriver

driver = webdriver.Chrome()
driver.get("https://www.baseball-reference.com/leagues/MLB/2013-finalyear.shtml")
doc = BeautifulSoup(driver.page_source, "html.parser")

careers = []
for i in doc.find_all("table")[9:10]:
    dictionary = {}
    player = i.find_all(attrs = {"data-stat": "player"})
    if player:
        for n in player[1:]:  
            dictionary["Names"] = n.text.strip()
            print(n.text.strip())
    experience = i.find_all(attrs = {"data-stat": "experience"})
    if experience:
        for r in experience[1:]:
            dictionary["Years"] = r.text.strip()
    year_min = i.find_all(attrs = {"data-stat": "year_min"})
    if year_min:
        for From in year_min[1:]:
            dictionary["From"] = From.text.strip() 
    year_max = i.find_all(attrs = {"data-stat": "year_max"})
    if year_max:
        for To in year_max[1:]:
            dictionary["To"] = To.text.strip() 
    WAR = i.find_all(attrs = {"data-stat": "WAR_bat"})
    if WAR:
        for bat in WAR[1:]:
            dictionary["WAR"] = bat.text.strip() 
    G = i.find_all(attrs = {"data-stat": "G"})
    if G:
        for g in G[1:]:
            dictionary["Games"] = g.text.strip() 
    PA = i.find_all(attrs = {"data-stat": "PA"})
    if PA:
        for p in PA[1:]:
            dictionary["PA"] = p.text.strip() 
    AB = i.find_all(attrs = {"data-stat": "AB"})
    if AB:
        for ab in AB[1:]:
            dictionary["AB"] = ab.text.strip() 
    R = i.find_all(attrs = {"data-stat": "R"})
    if R:
        for r in R[1:]:
            dictionary["R"] = r.text.strip() 
    age = i.find_all(attrs = {"data-stat": "age"})
    if age:
        for age_1 in age[1:]:
            dictionary["age"] = age_1.text.strip() 
    HR = i.find_all(attrs = {"data-stat": "HR"})
    if HR:
        for hr in HR[1:]:
            dictionary["HR"] = hr.text.strip() 
    H = i.find_all(attrs = {"data-stat": "H"})
    if H:
        for h in H[1:]:
            dictionary["H"] = h.text.strip() 
    SB = i.find_all(attrs = {"data-stat": "SB"})
    if SB:
        for sb in SB[1:]:
            dictionary["SB"] = sb.text.strip() 
    BB = i.find_all(attrs = {"data-stat": "BB"})
    if BB:
        for bb in BB[1:]:
            dictionary["BB"] = bb.text.strip()    
    SO = i.find_all(attrs = {"data-stat": "SO"})
    if SO:
        for so in SO[1:]:
            dictionary["SO"] = so.text.strip()   
    OPS = i.find_all(attrs = {"data-stat": "onbase_plus_slugging"})
    if OPS:
        for ops in OPS[1:]:
            dictionary["OPS"] = ops.text.strip()   
    careers.append(dictionary)

Upon printing the 'careers' variable, I get the following output:

[{'AB': '0',
 'BB': '0',
 'From': '2007',
 'Games': '82',
 'H': '0',
 'HR': '0',
 'Names': 'Mike Zagurski',
 'OPS': '',
 'PA': '0',
 'R': '0',
 'SB': '0',
 'SO': '0',
 'To': '2013',
 'WAR': '0.0',
 'Years': '5',
 'age': '30.231'}]

I am puzzled as to why only the last row of the table is extracted rather than every row. Any insights would be greatly appreciated. Thank you.

python selenium beautifulsoup

Answer 1

Here is a pattern you can use as a base; extend it with the specific stats you need:

# The table has an id, which makes it easy to target directly.
batting = doc.find(id='misc_batting')

careers = []
# Skip the first row, which is the column header.
for row in batting.find_all('tr')[1:]:
    dictionary = {}  # a new dictionary for each row
    dictionary['names'] = row.find(attrs={"data-stat": "player"}).text.strip()
    dictionary['experience'] = row.find(attrs={"data-stat": "experience"}).text.strip()
    careers.append(dictionary)

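Looping over each row (the tr tags) gives you a reference to that row, so you can query each column by its data-stat attribute, just as you already do. Your original code returns only the last row because it creates a single dictionary for the whole table and every inner loop overwrites the same key, so only the value from the final row survives.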
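To extend the row-by-row pattern to all the columns without repeating one line per stat, a mapping from output key to data-stat value can drive the lookup. The following is only a sketch: SAMPLE_HTML is a made-up stand-in for the real page, and COLUMNS and table_to_dicts are hypothetical names introduced for illustration. It also assumes that header rows contain only th cells, which may not match the live page exactly.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the real page: a tiny table with a similar data-stat layout.
SAMPLE_HTML = """
<table id="misc_batting">
  <thead>
    <tr><th data-stat="player">Name</th><th data-stat="experience">Yrs</th><th data-stat="G">G</th></tr>
  </thead>
  <tbody>
    <tr><td data-stat="player">Player A</td><td data-stat="experience">3</td><td data-stat="G">120</td></tr>
    <tr class="thead"><th data-stat="player">Name</th><th data-stat="experience">Yrs</th><th data-stat="G">G</th></tr>
    <tr><td data-stat="player">Player B</td><td data-stat="experience">5</td><td data-stat="G">82</td></tr>
  </tbody>
</table>
"""

# Output key -> data-stat value; extend with the other stats from the question.
COLUMNS = {"Names": "player", "Years": "experience", "Games": "G"}


def table_to_dicts(table, columns):
    """Return one dictionary per data row of the given table."""
    records = []
    for tr in table.find_all("tr"):
        # Assumption: header rows hold only <th> cells, so a row without <td> is skipped.
        if tr.find("td") is None:
            continue
        record = {}  # a fresh dictionary for every row, so nothing is overwritten
        for key, stat in columns.items():
            cell = tr.find(attrs={"data-stat": stat})
            # A missing cell becomes an empty string instead of raising an error.
            record[key] = cell.get_text(strip=True) if cell is not None else ""
        records.append(record)
    return records


doc = BeautifulSoup(SAMPLE_HTML, "html.parser")
careers = table_to_dicts(doc.find(id="misc_batting"), COLUMNS)
print(careers)
```

With the real page, doc would come from driver.page_source as in the question, and COLUMNS would list the remaining data-stat values (WAR_bat, PA, AB, and so on).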

