Expanding Excel Spreadsheets Following Data Extraction from the Internet

I managed to extract data successfully from the given website . I created an excel file with the information for one product. However, when trying to scrape data for a second product, I encountered issues adding another sheet to the existing excel file. Any guidance on this matter would be greatly appreciated. Thank you in advance. Here is my code snippet: -

from selenium import webdriver 
import time, re
from selenium.webdriver.support.ui import Select
from bs4 import BeautifulSoup
import pandas as pd
from selenium import webdriver
import time

chrome_path = r"C:\Users\user\Desktop\chromedriver_win32\chromedriver.exe"
driver = webdriver.Chrome(chrome_path)

driver.get("https://fcainfoweb.nic.in/Reports/Report_Menu_Web.aspx")

html_source = driver.page_source
results=[]

driver.find_element_by_xpath("""//*[@id="ctl00_MainContent_Rbl_Rpt_type_1"]""").click()
element_variation = driver.find_element_by_id ("ctl00_MainContent_Ddl_Rpt_Option1")
drp_variation = Select(element_variation)
drp_variation.select_by_visible_text("Daily Variation")

driver.find_element_by_id("ctl00_MainContent_Txt_FrmDate").send_keys("01/05/2020")
driver.find_element_by_id("ctl00_MainContent_Txt_ToDate").send_keys("27/05/2020")

element_commodity = driver.find_element_by_id ("ctl00_MainContent_Lst_Commodity")
drp_commodity = Select(element_commodity)
drp_commodity.select_by_visible_text("Rice")

driver.find_element_by_xpath("""//*[@id="ctl00_MainContent_btn_getdata1"]""").click()

soup = BeautifulSoup(driver.page_source, 'html.parser')
table = pd.read_html(driver.page_source)[2] #second table is the one that we want
print(len(table))
print(table)

results.append(table)
driver.back()
time.sleep(1)
with pd.ExcelWriter(r'C:\Users\user\Desktop\python.xlsx') as writer:
 table.to_excel(writer, sheet_name = "rice", index=False) # Rice results on sheet named rice
 writer.save() 

driver.find_element_by_xpath("""//*[@id="btn_back"]""").click()
driver.find_element_by_xpath("""//*[@id="ctl00_MainContent_Rbl_Rpt_type_1"]""").click()
element_variation = driver.find_element_by_id ("ctl00_MainContent_Ddl_Rpt_Option1")
drp_variation = Select(element_variation)
drp_variation.select_by_visible_text("Daily Variation")

driver.find_element_by_id("ctl00_MainContent_Txt_FrmDate").send_keys("01/05/2020")
driver.find_element_by_id("ctl00_MainContent_Txt_ToDate").send_keys("27/05/2020")

element_commodity = driver.find_element_by_id ("ctl00_MainContent_Lst_Commodity")
drp_commodity = Select(element_commodity)
drp_commodity.select_by_visible_text("Wheat")

driver.find_element_by_xpath("""//*[@id="ctl00_MainContent_btn_getdata1"]""").click()

soup = BeautifulSoup(driver.page_source, 'html.parser')
table = pd.read_html(driver.page_source)[2] #second table is the one that we want
print(len(table))
print(table)

results.append(table)
driver.back()
time.sleep(1)

with pd.ExcelWriter(r'C:\Users\user\Desktop\python.xlsx') as writer:
 table.to_excel(writer, sheet_name = "wheat", index=False) # Wheat results on sheet named wheat
 writer.save()

Answer №1

When dealing with certain types of files, it may be necessary to read all data into memory, add new data, and then save all the data back to the file. Other files may require using the "append" mode.

For more information, refer to the documentation for ExcelWriter, which includes an option mode="a" for appending to an existing file.

with pd.ExcelWriter(r'C:\Users\user\Desktop\python.xlsx') as writer:
    table.to_excel(writer, sheet_name="rice", index=False)
    #writer.save() 

with pd.ExcelWriter(r'C:\Users\user\Desktop\python.xlsx', mode='a') as writer:
    table.to_excel(writer, sheet_name="wheat", index=False)
    #writer.save() 

Alternatively, you can achieve this without using the append mode in a single with block.

with pd.ExcelWriter(r'C:\Users\user\Desktop\python.xlsx') as writer:
    table.to_excel(writer, sheet_name="rice", index=False)
    table.to_excel(writer, sheet_name="wheat", index=False)
    #writer.save() 

By the way: I discovered that the append mode does not work with the xlsxwriter engine, so I had to switch to using the openpyxl engine (which also requires installing the openpyxl module with pip).

with pd.ExcelWriter(r'python.xlsx', engine='openpyxl', mode='a') as writer:

I found a list of available engines in response to this question: Engines available for to_excel function in pandas


Here is the full working code:

from selenium import webdriver
from selenium.webdriver.support.ui import Select
import pandas as pd
import time

# --- functions ---

def get_data(start_date, end_date, product):

    # Select `Variation Report`
    driver.find_element_by_id('ctl00_MainContent_Rbl_Rpt_type_1').click()

    # Select `Daily Variant`
    element_variation = driver.find_element_by_id('ctl00_MainContent_Ddl_Rpt_Option1')
    
    ...

Similar questions

If you have not found the answer to your question or you are interested in this topic, then look at other similar questions below or use the search

Basic Python-based Discord chatbot designed to function as a versatile dictionary utilizing a variety of data sources

Creating a chatbot to assist the translator community on Discord with a comprehensive vocabulary database is my goal. However, due to the massive size of the wordpool, I plan to divide the data into multiple files just like how printed dictionaries organiz ...

Tips for improving the speed of loading infinite scroll pages

I am currently working on scraping over 100k rows from the provided URL. The data goes back approximately a month, loading in batches of 7-8 rows at a time. My current approach involves using a macro to scroll down the page slowly, which is effective but ...

Click on a regular key on the Selenium/Python website

Is there a way to send keystrokes like "Q", "W", "E", or "R" without targeting a specific element on the website? I attempted using send_keys on a random element, but it didn't work and Selenium reported that it couldn't interact with the element ...

The transition from using Selenium to sending requests

I am currently exploring the requests module as an alternative to Selenium for web scraping. Below is the code snippet I have been working on that extracts a table from a webpage. I'm struggling to optimize this code using requests in a more efficie ...

Tips for optimizing the Headless Chrome window in Robot Framework

I am having trouble launching Headless Chrome with a maximized window. I attempted two different solutions, but neither of them are working for me. Here is the first solution: Open Browser ${LOGIN_URL} headlesschrome Maximize Browser Window And her ...

"Exploring the World of Floating and Integer Numbers in Python

Can someone help me with a Python question? I am trying to check the type of a number in Python. Here's an example: x=4 y=2 type(x/y) == type(int) --> False In this case, I expected it to be True, but Python interpreted 4/2 as 2.0. 2.0 is consid ...

Waiting for Elements to be Added to Parent in a Lazy Loading Website with Selenium and Python

When working with Selenium's expected conditions for wait, such as those mentioned here: https://selenium-python.readthedocs.io/api.html#module-selenium.webdriver.support.expected_conditions, I find myself unsure about which one to use or if that is e ...

Python encountering errors while attempting to load JSON file

I have a text file containing the following json: { "data sources" : [ "http://www.gcmap.com/" ] , "metros" : [ { "code" : "SCL" , "name" : "Santiago" , "country" : "CL" , "continent" : "South America" , "timezone" : -4 , "coordinates" : {"S" : 33, "W" : ...

Mastering the process of running selenium automation scripts (written in Java) with Safari Technology Preview

Seeking assistance with running automation scripts on Safari. Currently utilizing Selenium Webdriver scripts on Mac OS (High Sierra) and Safari 11.1.2 I've added the WebDriver Extension to the Safari browser and enabled 'Allow Remote Automation ...

When utilizing the JSON Wire Protocol, Selenium fails to employ the webdriver.firefox.profile feature

After launching the profile manager, I created a new profile named "foo" and set it as the default profile for Firefox. I then launched Firefox and closed it. Next, I started Selenium using the argument -Dwebdriver.firefox.profile=foo. The server's o ...

Showing an input box in Iron PythonorHow to implement

I am attempting to create a custom message box in Iron Python with options for both continuing and closing. Currently, the code I am using is: WinForms.MessageBox.Show("Hello, world!", "MessageBoxHelloWorld", WinForms.MessageBoxButtons.OKCancel, WinForms. ...

Storing list values into variables in Selenium Webdriver: A step-by-step guide

https://i.stack.imgur.com/DJXfO.pngI am just starting out with selenium webdriver and I'm trying to figure out how to extract the value from a specific Ul class and then store it in a variable. Unfortunately, I haven't been successful so far. Th ...

There seems to be an issue with the CSV file, possibly indicating an error or the file may not be an SYLYK file when

After developing a node.js script to convert an array object into CSV format using the "objects-to-csv" library from NPM, I encountered an issue when opening the generated CSV file in WPS and Microsoft Office. The warning suggested that there was either an ...

Scraping data using Selenium with paragraph tags

I've been scouring the internet for examples similar to the one in question, but with no luck. The challenge at hand revolves around extracting text from a webpage where only one of two p tags contains important information. How can data be extracted ...

Recommendation for a web service Python framework

In the process of developing a web service that will facilitate file upload/download, user management, and permissions, I am seeking guidance on which web framework to utilize for this task. This service will essentially function as remote storage for med ...

Tips for combining cells partially in a vertical direction within the pandas library

Here is the dataframe I am working with: index Flag Data 0 1 aaaa 1 0 bbbb 2 0 cccc 3 0 dddd 4 1 eeee 5 0 ffff 6 1 gggg 7 1 hhhh 8 1 iiii I want to obtain a merged vertical data where it's divided by Flag 1. index Flag Dat ...

Having trouble successfully logging in to a website using Python and Selenium through Chrome

After updating Chrome from Version 83 to Version 89, the following code stopped working and always fails. The failure causes the password field to move to the username location, resulting in it being appended to the username (usernamepassword). from se ...

How can I remove an empty row from a list of dictionaries using a for loop in Python?

I need assistance with a Python code snippet that involves manipulating dictionaries in a list. Specifically, I have a list of dictionaries named "rows" where some dictionaries contain empty values indicated as "None." How can I create a for loop to iterat ...

Steps for precisely locating an element within a table using Selenium

Is there a reliable method for finding the element "1988" (the fourth line) in the table below? <table border="0" width="820" cellpadding="2" cellspacing="0"> <tbody> <tr valign="top"> <td class="default" width="100%">R ...

A guide on how to automate the clicking of all retrieved links using Selenium with Ruby

I am facing an issue with my code that is intended to click on all links within a fetched page using Selenium Web Driver API. However, the script only clicks on one link and gets stuck there. Here is the code snippet I have written: require 'rubygems ...