Converting a table into a collection of dictionaries using BeautifulSoup

I am having trouble scraping a specific table and converting it into a list of dictionaries. The table I am interested in can be found on this page. My objective is to scrape only the "Batting" table.

Below is the code I have been working with:

from selenium import webdriver
driver = webdriver.Chrome()
driver.get("https://www.baseball-reference.com/leagues/MLB/2013-finalyear.shtml")
from bs4 import BeautifulSoup
doc = BeautifulSoup(driver.page_source, "html.parser")

careers = []
for i in doc.find_all("table")[9:10]:
    dictionary = {}
    player = i.find_all(attrs = {"data-stat": "player"})
    if player:
        for n in player[1:]:  
            dictionary["Names"] = n.text.strip()
            print(n.text.strip())
    experience = i.find_all(attrs = {"data-stat": "experience"})
    if experience:
        for r in experience[1:]:
            dictionary["Years"] = r.text.strip()
    year_min = i.find_all(attrs = {"data-stat": "year_min"})
    if year_min:
        for From in year_min[1:]:
            dictionary["From"] = From.text.strip() 
    year_max = i.find_all(attrs = {"data-stat": "year_max"})
    if year_max:
        for To in year_max[1:]:
            dictionary["To"] = To.text.strip() 
    WAR = i.find_all(attrs = {"data-stat": "WAR_bat"})
    if WAR:
        for bat in WAR[1:]:
            dictionary["WAR"] = bat.text.strip() 
    G = i.find_all(attrs = {"data-stat": "G"})
    if G:
        for g in G[1:]:
            dictionary["Games"] = g.text.strip() 
    PA = i.find_all(attrs = {"data-stat": "PA"})
    if PA:
        for p in PA[1:]:
            dictionary["PA"] = p.text.strip() 
    AB = i.find_all(attrs = {"data-stat": "AB"})
    if AB:
        for ab in AB[1:]:
            dictionary["AB"] = ab.text.strip() 
    R = i.find_all(attrs = {"data-stat": "R"})
    if R:
        for r in R[1:]:
            dictionary["R"] = r.text.strip() 
    age = i.find_all(attrs = {"data-stat": "age"})
    if age:
        for age_1 in age[1:]:
            dictionary["age"] = age_1.text.strip() 
    HR = i.find_all(attrs = {"data-stat": "HR"})
    if HR:
        for hr in HR[1:]:
            dictionary["HR"] = hr.text.strip() 
    H = i.find_all(attrs = {"data-stat": "H"})
    if H:
        for h in H[1:]:
            dictionary["H"] = h.text.strip() 
    SB = i.find_all(attrs = {"data-stat": "SB"})
    if SB:
        for sb in SB[1:]:
            dictionary["SB"] = sb.text.strip() 
    BB = i.find_all(attrs = {"data-stat": "BB"})
    if BB:
        for bb in BB[1:]:
            dictionary["BB"] = bb.text.strip()    
    SO = i.find_all(attrs = {"data-stat": "SO"})
    if SO:
        for so in SO[1:]:
            dictionary["SO"] = so.text.strip()   
    OPS = i.find_all(attrs = {"data-stat": "onbase_plus_slugging"})
    if OPS:
        for ops in OPS[1:]:
            dictionary["OPS"] = ops.text.strip()   
    careers.append(dictionary)

Upon printing the 'careers' variable, I get the following output:

[{'AB': '0',
 'BB': '0',
 'From': '2007',
 'Games': '82',
 'H': '0',
 'HR': '0',
 'Names': 'Mike Zagurski',
 'OPS': '',
 'PA': '0',
 'R': '0',
 'SB': '0',
 'SO': '0',
 'To': '2013',
 'WAR': '0.0',
 'Years': '5',
 'age': '30.231'}]

I am puzzled as to why only the last row of the table is being extracted rather than every row. Any insights would be greatly appreciated. Thank you.

Answer №1

Here is a helpful pattern to use as a base, and you can customize it by adding the specific stats you need:

# The table has an id, which makes it easy to target directly
batting = doc.find(id='misc_batting')

careers = []
# Skip the first row, which holds the column headers
for row in batting.find_all('tr')[1:]:
    # Build a fresh dictionary for each row so earlier rows are not overwritten
    dictionary = {}
    dictionary['names'] = row.find(attrs={"data-stat": "player"}).text.strip()
    dictionary['experience'] = row.find(attrs={"data-stat": "experience"}).text.strip()
    careers.append(dictionary)

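Your code builds a single dictionary for the whole table, so each loop over a column overwrites the same key again and again, and only the value from the last row survives. Looping over the rows (the tr tags) instead gives you a reference to each row, which you can then query for each column by its data-stat attribute, just as you were already doing.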
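If you want every stat without writing one line per column, the same row-by-row idea can be driven by a mapping from output key to data-stat value. The following is only a minimal sketch under some assumptions: SAMPLE_HTML is made-up markup standing in for driver.page_source, COLUMNS and table_to_dicts are hypothetical names, and the real page may structure its header rows differently.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup, standing in for driver.page_source.
SAMPLE_HTML = """
<table id="misc_batting">
  <thead>
    <tr><th data-stat="player">Name</th><th data-stat="experience">Yrs</th><th data-stat="G">G</th></tr>
  </thead>
  <tbody>
    <tr><td data-stat="player">Player One</td><td data-stat="experience">3</td><td data-stat="G">120</td></tr>
    <tr class="thead"><th data-stat="player">Name</th><th data-stat="experience">Yrs</th><th data-stat="G">G</th></tr>
    <tr><td data-stat="player">Player Two</td><td data-stat="experience">5</td><td data-stat="G">82</td></tr>
  </tbody>
</table>
"""

# Output key -> data-stat attribute; extend with the other stats as needed.
COLUMNS = {"Names": "player", "Years": "experience", "Games": "G"}


def table_to_dicts(table, columns):
    """Return one dictionary per data row of the table."""
    rows = []
    for tr in table.find_all("tr"):
        # Header rows contain only <th> cells, so skip any row without a <td>.
        if tr.find("td") is None:
            continue
        record = {}
        for key, stat in columns.items():
            cell = tr.find(attrs={"data-stat": stat})
            # A missing cell becomes an empty string instead of raising.
            record[key] = cell.get_text(strip=True) if cell else ""
        rows.append(record)
    return rows


doc = BeautifulSoup(SAMPLE_HTML, "html.parser")
careers = table_to_dicts(doc.find(id="misc_batting"), COLUMNS)
print(careers)
```

Checking for a td cell rather than slicing off the first row also skips any header rows that are repeated partway through the table, which the [1:] slice alone would not catch.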
