Iterating over a range of rows with ws.iter_rows in openpyxl's optimized reader

I need to read an xlsx file that contains 10 x 5324 cells.

This is the gist of what I was trying to do:

from openpyxl import load_workbook
filename = 'file_path'

wb = load_workbook(filename)
ws = wb.get_sheet_by_name('LOG')

col = {'Time':0 ...}

for i in ws.columns[col['Time']][1:]:
    print i.value.hour

The code ran far slower than expected (I was doing more than just printing), and after a while I lost patience and cancelled it.

How can I do this with the optimized reader? I need to iterate over a specific range of rows, not over all rows. This is what I tried, but it is wrong:

wb = load_workbook(filename, use_iterators = True)
ws = wb.get_sheet_by_name('LOG')
for i in ws.iter_rows[1:]:
    print i[col['Time']].value.hour

Is there a way to achieve this without using the range function?

One approach I considered is:

for i in ws.iter_rows[1:]:
    if i.row == startrow:
        continue
    print i[col['Time']].value.hour
    if i.row == endrow:
        break

Is there a more elegant solution? (This one doesn't work either.)

Answer №1

The simplest solution, if you only need a lower bound, is something like this:

# Your code:
from openpyxl import load_workbook
filename = 'file_path'
wb = load_workbook(filename, use_iterators=True)
ws = wb.get_sheet_by_name('LOG')

# Solution 1:
for row in ws.iter_rows(row_offset=1):
    pass  # code to execute per row...

Here is an alternative method to achieve the same outcome using the enumerate function:

# Solution 2:
start, stop = 1, 100    # lower and upper limits (both exclusive)
for index, row in enumerate(ws.iter_rows()):
    if start < index < stop:
        pass  # code to execute per row...

The index variable tracks the position of the current row (enumerate counts from zero, so it is one less than the sheet's row number), which lets it stand in for range or xrange. This method is straightforward and, unlike range or slicing, works with iterators. You can also use just the lower bound if that is all you need. Happy coding!
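As a side note, the same windowing idea can be sketched with itertools.islice from the standard library, which also stops pulling rows once the upper bound is reached. This is only an illustration: rows_between and fake_rows are hypothetical names, and the generator stands in for ws.iter_rows() so the snippet runs without a workbook. Note that islice includes the start index and excludes the stop index, which differs slightly from the start < index < stop test above.

```python
from itertools import islice


def rows_between(rows, start, stop):
    """Lazily yield the items at zero-based positions start..stop-1 of any iterable."""
    return islice(rows, start, stop)


# Stand-in for ws.iter_rows(): a generator of one-cell "rows" (made-up data).
fake_rows = (("row %d" % n,) for n in range(1, 11))

# Positions 2, 3 and 4 correspond to the third, fourth and fifth rows.
selected = list(rows_between(fake_rows, 2, 5))
```

With a real worksheet the call would look like rows_between(ws.iter_rows(), start, stop). The rows before start are still read and discarded, since an iterator cannot jump ahead.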

Answer №2

In the documentation, it is mentioned:

Keep in mind: A worksheet created in memory starts out empty, with no cells until they are accessed for the first time. This approach helps minimize memory usage by only creating objects that are actually needed.

Be careful: Because of this, scrolling through cells instead of accessing them directly creates all of them in memory, even if they are never assigned a value. For example:

>>> for i in xrange(0,100):
...     for j in xrange(0,100):
...         ws.cell(row = i, column = j)

This creates 100x100 cells in memory for nothing.

However, there are methods available to clean up these excess cells, which we will explore later on.

It's important to note that accessing the columns or rows of a worksheet may load numerous additional cells into memory. It is recommended to access only the specific cells you require.

For instance:

col_name = 'A'
start_row = 1
end_row = 99

range_expr = "{col}{start_row}:{col}{end_row}".format(
    col=col_name, start_row=start_row, end_row=end_row)

for (time_cell,) in ws.iter_rows(range_string=range_expr):
    print time_cell.value.hour
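For readers on a more recent openpyxl, here is a rough sketch of the same idea using the min_row, max_row, min_col and max_col keyword arguments of iter_rows, which newer releases accept in place of range_string. It assumes your installed version also supports values_only=True; hours_in_column is a hypothetical helper name, and skipping empty cells is my own addition rather than something from the answer above.

```python
def hours_in_column(ws, column, start_row, end_row):
    """Return the .hour of each time value in one column.

    column, start_row and end_row are 1-based; both row bounds are inclusive.
    Assumes ws.iter_rows accepts min_row/max_row/min_col/max_col/values_only,
    as in recent openpyxl releases.
    """
    rows = ws.iter_rows(min_row=start_row, max_row=end_row,
                        min_col=column, max_col=column,
                        values_only=True)
    # Only one column was requested, so each row is a 1-tuple.
    # Empty cells come back as None and are skipped (an added assumption).
    return [value.hour for (value,) in rows if value is not None]
```

A typical call would open the file with load_workbook(filename, read_only=True), take ws = wb['LOG'], and then use hours_in_column(ws, 1, 2, 100) to read column A from row 2 to row 100.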
