Troublesome behavior exhibited by Numpy's einsum. How to catch it?

When numpy einsum throws the error shown below, what is usually the underlying issue?

Traceback (most recent call last):
  File "rmse_iter.py", line 30, in <module>
    rmse_out = np.sqrt(np.einsum('ij,ij->i',diffs,diffs)/3.0)
TypeError: invalid data type for einsum

The numpy array diffs is the result of subtracting one block of columns taken from a pandas dataframe from another. As far as I can tell, it contains only np.float32 numbers: no strings, NaN values, infinities, or other unusual data types. Given that, what should I look at as the likely cause of this einsum failure?

This snippet outlines the process used to load and manipulate the dataframe:

df = pd.read_pickle(fn)
df.replace([np.inf, -np.inf], np.nan, inplace=True)
df.dropna(inplace=True)
a = df.values
diffs = a[:,2:27] - a[:,27:]
rmse_out = np.sqrt(np.einsum('ij,ij->i',diffs,diffs)/3.0)

Apologies for the vague question, and thanks to Divakar for sharing the einsum wizardry.

Edit:

Here is some example data in tabular form:

        rna     cnv     1_a     2_a     3_a     4_a     5_a  
5641095 AP1G1   CCL8    3.588543653488159       10.119391441345215      32.92853546142578       6.307891368865967    
...
     -6.431360721588135      -0.43901631236076355    3244183414374400.0      15.554900169372559

Answer №1

It turns out that df.values returns a single array for the whole dataframe. Because the first two columns (rna and cnv) hold strings, the only dtype that fits every column is object. Slicing that array, as in a[:,2:27], keeps the object dtype even though the selected elements are all floats, and einsum rejected the object array with the TypeError above.
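A minimal sketch of that behaviour, using a made-up two-row frame (the values and the short column list are assumptions for illustration, not the original data):

```python
import numpy as np
import pandas as pd

# Hypothetical miniature of the question's frame: two string columns, then float32 columns.
df = pd.DataFrame({
    "rna": ["AP1G1", "AP1G1"],
    "cnv": ["CCL8", "CCL8"],
    "1_a": np.array([3.5, 1.0], dtype=np.float32),
    "2_a": np.array([10.1, 2.0], dtype=np.float32),
})

a = df.values                        # mixed string/float columns -> one object array
sliced = a[:, 2:]                    # slicing does not change the dtype
numeric = df[df.columns[2:]].values  # only float32 columns -> float32 array

print(a.dtype)        # object
print(sliced.dtype)   # object
print(numeric.dtype)  # float32
```

Printing the dtype right before the einsum call is usually enough to spot the problem. Note that newer NumPy releases may accept object arrays in einsum, so the error itself may not reproduce everywhere.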

To resolve this issue, I specifically selected numeric columns from the original dataframe and converted them into a matrix:

a = df[df.columns[2:]].to_numpy()  # originally .as_matrix(), which was removed in pandas 1.0

I then adjusted the indexes in the subtraction operation, as the column indexes had shifted back by two:

diffs = a[:,:25] - a[:,25:]

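The lesson learned here: when einsum fails with "invalid data type", check the dtype of the array itself. An object array, such as the one df.values produces for a dataframe with string columns, triggers the error even when every element you actually use is a float.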
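One way to guard against this is to check the dtype and cast before calling einsum. The sketch below is only an illustration: the helper name row_rms and its divisor parameter are hypothetical, and casting to float64 is an assumption about what the data should be.

```python
import numpy as np

def row_rms(diffs, divisor):
    """Row-wise sqrt(sum(d**2) / divisor), casting object arrays to float first."""
    diffs = np.asarray(diffs)
    if diffs.dtype == object:
        # astype raises ValueError if an element cannot be converted to a float,
        # which points directly at stray strings in the data.
        diffs = diffs.astype(np.float64)
    return np.sqrt(np.einsum('ij,ij->i', diffs, diffs) / divisor)

# Object array whose elements are all plain floats, like a slice of df.values.
obj = np.array([[3.0, 4.0], [6.0, 8.0]], dtype=object)
out = row_rms(obj, 1.0)
print(out)  # [ 5. 10.]
```

Selecting only the numeric columns before converting, as in the fix above, avoids the cast entirely and keeps the original float32 dtype.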
