Using BeautifulSoup for extracting data from tables

I am trying to extract table data from the following website:

import requests
from bs4 import BeautifulSoup
import pandas as pd

stock = 'ALCAR'
page = requests.get(f"https://www.isyatirim.com.tr/tr-tr/analiz/hisse/Sayfalar/sirket-karti.aspx?hisse={stock}")

soup = BeautifulSoup(page.content, 'html.parser')

table = soup.find('tbody', id="tbodyMTablo")
print(table)


for j in table.find_all('tr'):
    row_data = j.find_all('td')
    row = [i.text for i in row_data]

    #print(row)
    df = pd.DataFrame(row).transpose()
    df.to_csv('xxx.csv')
    print(df)

Answer №1

Your current code overwrites the CSV file on every iteration of the loop, so only the last row survives. Instead, collect all rows first, build a single pandas DataFrame, and save it to CSV once at the end.

import requests
from bs4 import BeautifulSoup
import pandas as pd

stock = 'ALCAR'
page = requests.get(f"https://www.isyatirim.com.tr/tr-tr/analiz/hisse/Sayfalar/sirket-karti.aspx?hisse={stock}")

soup = BeautifulSoup(page.content, 'html.parser')

table = soup.find('tbody', id="tbodyMTablo")

rows = []  # Collect all rows first
for j in table.find_all('tr'):
    row_data = j.find_all('td')
    rows.append([i.text for i in row_data])

# Build the DataFrame once from the list of rows.
# (DataFrame.append was deprecated and removed in pandas 2.0.)
df = pd.DataFrame(rows)

df.to_csv('xxx.csv')  # Save the final dataframe to a CSV file
print(df)
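The same collect-then-build pattern can be shown on a self-contained example. The HTML fragment below is made up for illustration; it only mimics the general shape of the page's `tbody` and is not the live site's actual markup:

```python
from bs4 import BeautifulSoup
import pandas as pd

# Illustrative stand-in for the live page's table body
html = """
<tbody id="tbodyMTablo">
  <tr><td>Revenue</td><td>100</td><td>120</td></tr>
  <tr><td>Profit</td><td>10</td><td>15</td></tr>
</tbody>
"""

soup = BeautifulSoup(html, "html.parser")
table = soup.find("tbody", id="tbodyMTablo")

# Gather every row as a list of cell texts, then build one DataFrame
rows = [[td.text for td in tr.find_all("td")] for tr in table.find_all("tr")]
df = pd.DataFrame(rows)
print(df.shape)  # (2, 3)
```

Because the DataFrame is constructed in a single call, this avoids both the per-iteration overwrite and the repeated-append antipattern.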

Answer №2

Quick tip: since you are already using pandas, you can let pandas.read_html build the DataFrame for you.

import pandas as pd

stock = 'ALCAR'
df = pd.read_html(f'https://www.isyatirim.com.tr/tr-tr/analiz/hisse/Sayfalar/sirket-karti.aspx?hisse={stock}', attrs = {'class':'excelexport'})[0]

df.to_csv('filename.csv')

Alternatively, if you don't need the header row and index:

df.to_csv('filename.csv', header=False, index=False)
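The `attrs` filter can also be demonstrated offline. The HTML below is a made-up fragment (not the live page): `read_html` keeps only tables whose attributes match, and recent pandas versions expect literal HTML to be wrapped in `StringIO`:

```python
from io import StringIO
import pandas as pd

# Two illustrative tables; only the one with class="excelexport" should match
html = """
<table class="excelexport">
  <tr><th>Item</th><th>2022</th></tr>
  <tr><td>Revenue</td><td>100</td></tr>
</table>
<table class="other">
  <tr><td>ignored</td></tr>
</table>
"""

# read_html returns a list of matching tables; take the first one
df = pd.read_html(StringIO(html), attrs={"class": "excelexport"})[0]
print(df.shape)  # (1, 2)
```

Note that `read_html` needs an HTML parser backend installed (lxml, or html5lib with BeautifulSoup).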
