Using BeautifulSoup for extracting data from tables

I am trying to extract table data from the following website:

import requests
from bs4 import BeautifulSoup
import pandas as pd

stock = 'ALCAR'
page = requests.get(f"https://www.isyatirim.com.tr/tr-tr/analiz/hisse/Sayfalar/sirket-karti.aspx?hisse={stock}")

soup = BeautifulSoup(page.content, 'html.parser')

table = soup.find('tbody', id="tbodyMTablo")
print(table)


for j in table.find_all('tr'):
    row_data = j.find_all('td')
    row = [i.text for i in row_data]

    #print(row)
    df = pd.DataFrame(row).transpose()
    df.to_csv('xxx.csv')
    print(df)

Answer №1

Your current code overwrites the CSV file on every iteration of the loop, so only the last row survives. Instead, collect all rows first, build a single pandas DataFrame, and save it to CSV once at the end.

import requests
from bs4 import BeautifulSoup
import pandas as pd

stock = 'ALCAR'
page = requests.get(f"https://www.isyatirim.com.tr/tr-tr/analiz/hisse/Sayfalar/sirket-karti.aspx?hisse={stock}")

soup = BeautifulSoup(page.content, 'html.parser')

table = soup.find('tbody', id="tbodyMTablo")

rows = []  # Collect all rows first
for j in table.find_all('tr'):
    row_data = j.find_all('td')
    rows.append([i.text for i in row_data])

# Build the DataFrame once from the list of rows.
# (DataFrame.append was deprecated and removed in pandas 2.0.)
df = pd.DataFrame(rows)

df.to_csv('xxx.csv')  # Save the final dataframe to a CSV file
print(df)
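The same collect-then-build pattern can be shown on a self-contained example. The HTML fragment below is made up for illustration; it only mimics the general shape of the page's `tbody` and is not the live site's actual markup:

```python
from bs4 import BeautifulSoup
import pandas as pd

# Illustrative stand-in for the live page's table body
html = """
<tbody id="tbodyMTablo">
  <tr><td>Revenue</td><td>100</td><td>120</td></tr>
  <tr><td>Profit</td><td>10</td><td>15</td></tr>
</tbody>
"""

soup = BeautifulSoup(html, "html.parser")
table = soup.find("tbody", id="tbodyMTablo")

# Gather every row as a list of cell texts, then build one DataFrame
rows = [[td.text for td in tr.find_all("td")] for tr in table.find_all("tr")]
df = pd.DataFrame(rows)
print(df.shape)  # (2, 3)
```

Because the DataFrame is constructed in a single call, this avoids both the per-iteration overwrite and the repeated-append antipattern.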

Answer №2

Quick tip: since you are already using pandas, you can let pandas.read_html build the DataFrame for you.

import pandas as pd

stock = 'ALCAR'
df = pd.read_html(f'https://www.isyatirim.com.tr/tr-tr/analiz/hisse/Sayfalar/sirket-karti.aspx?hisse={stock}', attrs = {'class':'excelexport'})[0]

df.to_csv('filename.csv')

Alternatively, if you don't need the header row and index:

df.to_csv('filename.csv', header=False, index=False)
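The `attrs` filter can also be demonstrated offline. The HTML below is a made-up fragment (not the live page): `read_html` keeps only tables whose attributes match, and recent pandas versions expect literal HTML to be wrapped in `StringIO`:

```python
from io import StringIO
import pandas as pd

# Two illustrative tables; only the one with class="excelexport" should match
html = """
<table class="excelexport">
  <tr><th>Item</th><th>2022</th></tr>
  <tr><td>Revenue</td><td>100</td></tr>
</table>
<table class="other">
  <tr><td>ignored</td></tr>
</table>
"""

# read_html returns a list of matching tables; take the first one
df = pd.read_html(StringIO(html), attrs={"class": "excelexport"})[0]
print(df.shape)  # (1, 2)
```

Note that `read_html` needs an HTML parser backend installed (lxml, or html5lib with BeautifulSoup).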
