Python: Identifying the highest value across various columns in a Pandas Dataframe

I'm new to Python. I have a pandas DataFrame with multiple columns representing months, and I want to compare these columns over a period of x months and flag any row that has ever had a value of 2 or more.

Here is the code snippet I used to generate my sample dataframe:

import numpy as np
import pandas as pd

# 100 rows x 26 monthly columns (mth_0 ... mth_25) of random integers from 0 to 4
arr_random = np.random.randint(low=0, high=5, size=(100, 26))
col_names = ['mth_' + str(i) for i in range(26)]
rand_df = pd.DataFrame(arr_random, columns=col_names)
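np.random.randint only returns integers, so this sample never contains NaN and the -1 (missing data) case can't be exercised. A minimal sketch that blanks out a random share of cells; the seed and the 5% missing rate are arbitrary assumptions for illustration, not part of the original setup:

```python
import numpy as np
import pandas as pd

np.random.seed(0)  # fixed seed so the sample is reproducible (arbitrary choice)
arr_random = np.random.randint(low=0, high=5, size=(100, 26))
col_names = [f'mth_{i}' for i in range(26)]
rand_df = pd.DataFrame(arr_random, columns=col_names)

# Assumption for illustration: roughly 5% of cells are missing.
# DataFrame.mask replaces values with NaN wherever the condition is True
# (int64 cannot hold NaN, so affected columns are upcast to float64).
missing = np.random.rand(*rand_df.shape) < 0.05
rand_df = rand_df.mask(missing)
```

Any of the flagging approaches can then be run on this frame to check that rows with missing months are handled the way you expect.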

I want to flag each row as follows: 1 if it has a value of 2 or more, 0 if all its values are below 2, and -1 for missing data (I treat NaN values as -1). This is the code I'm using:

review_months = [12, 18, 24]
for x in review_months:
    rand_df['TWOPLUS_'+str(x)+'M'] = -1
    for i in range(x):
        rand_df['TWOPLUS_'+str(x)+'M'] = rand_df[['TWOPLUS_'+str(x)+'M', 'mth_'+str(i+1)]].max(axis = 1)
        conditions  = [ rand_df['TWOPLUS_'+str(x)+'M'] >= 2, rand_df['TWOPLUS_'+str(x)+'M'] < 2, rand_df['mth_'+str(i)] == -1 ]
        choices     = [ 1 , 0, -1 ]
        rand_df['TWOPLUS_'+str(x)+'M'] = np.select(conditions, choices, default=np.nan)

The problem is that my flag only reflects whether the month checked in the latest iteration is 2 or more, not whether the row has EVER had a value of 2 or more at some point during the period.
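The loop loses the "ever" status because each iteration overwrites the TWOPLUS column with the 1/0 flag from np.select, and the next max() compares that flag, not the earlier raw maximum, with the next month. For example, a row already flagged 1 followed by a month with value 0 gives max(1, 0) = 1, which is < 2, so the flag drops back to 0. The -1 condition is also never reached: np.select picks the first matching condition, and every non-NaN value already satisfies >= 2 or < 2. A minimal sketch that keeps the loop structure but carries the running maximum of the raw values and converts it to 1/0/-1 only once per window (the seeded sample and the use of np.fmax are choices made for this sketch):

```python
import numpy as np
import pandas as pd

# Same kind of sample as in the question (seeded so it is reproducible)
np.random.seed(0)
rand_df = pd.DataFrame(np.random.randint(0, 5, size=(100, 26)),
                       columns=[f'mth_{i}' for i in range(26)])

review_months = [12, 18, 24]
for x in review_months:
    # Running maximum of the raw monthly values; NaN means "no data seen yet"
    running_max = pd.Series(np.nan, index=rand_df.index)
    for i in range(x):
        # Same columns as the original loop: mth_1 .. mth_x.
        # np.fmax ignores NaN unless both sides are NaN, so a missing month
        # never lowers or resets the running maximum.
        running_max = np.fmax(running_max, rand_df[f'mth_{i + 1}'])
    # Convert to the flag only once, after the whole window has been seen:
    # 1 = ever 2+, 0 = never 2+, -1 = every month in the window is missing
    rand_df[f'TWOPLUS_{x}M'] = np.select(
        [running_max >= 2, running_max < 2], [1, 0], default=-1
    )
```

The key change is that the value carried from one month to the next is the raw maximum, so once it reaches 2 it can never fall back below it.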

Answer №1

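To flag whether each row has ever contained a value of 2 or higher within the window, you can use the following:

for month in (12, 18, 24):
    # mth_0 .. mth_{month}, selected by name so earlier TWOPLUS_* columns are never included
    window = rand_df.loc[:, [f'mth_{i}' for i in range(month + 1)]]
    flag = (window >= 2).any(axis=1).astype(int)
    # any() treats NaN as False, so rows with no data at all in the window are set to -1 here
    flag[window.isna().all(axis=1)] = -1
    rand_df[f'TWOPLUS_{month}M'] = flag

rand_df

This selects the columns mth_0 through mth_{month} and checks whether each value is at least 2. any(axis=1) returns True if at least one value in the row passes, and astype(int) turns True into 1 and False into 0. Because NaN >= 2 evaluates to False, the result of any() never contains nulls, so filling nulls with -1 afterwards would have no effect; instead, rows whose window is entirely NaN are set to -1 explicitly.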
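The question title asks about the highest value across columns, and the same flag can be phrased that way: take the row-wise maximum over the window and compare it once. DataFrame.max(axis=1) skips NaN by default, so its result is NaN only when every month in the window is missing, which maps naturally to -1. A sketch, where twoplus_flag is a hypothetical helper name and the window is mth_0 through mth_{month} as in the answer:

```python
import numpy as np
import pandas as pd

def twoplus_flag(df: pd.DataFrame, month: int) -> pd.Series:
    """1 if any of mth_0..mth_{month} is >= 2, 0 if none is, -1 if all are NaN."""
    window = df[[f'mth_{i}' for i in range(month + 1)]]
    row_max = window.max(axis=1)  # skips NaN; NaN only if the whole window is NaN
    flag = np.select([row_max >= 2, row_max < 2], [1, 0], default=-1)
    return pd.Series(flag, index=df.index, name=f'TWOPLUS_{month}M')

# Tiny hand-made example with three months (mth_0 .. mth_2)
demo = pd.DataFrame({
    'mth_0': [0, 3, np.nan],
    'mth_1': [1, 0, np.nan],
    'mth_2': [1, np.nan, np.nan],
})
print(twoplus_flag(demo, 2).tolist())  # [0, 1, -1]
```

On the question's frame this would be used as rand_df['TWOPLUS_12M'] = twoplus_flag(rand_df, 12), and so on for 18 and 24.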

For more information, see the pandas documentation for the any method (DataFrame.any) and for selecting with .loc (DataFrame.loc).

Output table (first two rows shown):

   mth_0  mth_1  mth_2  mth_3  mth_4  mth_5  mth_6  mth_7  mth_8  mth_9  ...  mth_19  mth_20  mth_21  mth_22  mth_23  mth_24  mth_25  TWOPLUS_12M  TWOPLUS_18M  TWOPLUS_24M
0      1      0      3      1      4      3      4      0      0      2  ...       4       4       0       1       0       1       2            1            1            1
1      0      0      3      4      4      1      0      1      4      2  ...       0       2       2       0       3       3       1            1            1            1
...(remaining rows)...
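With np.random.randint(0, 5), each cell is 2 or more with probability 3/5, so a 13-column window (mth_0 to mth_12) contains no such value with probability 0.4^13 ≈ 7e-6; with this sample you should expect virtually every row to be flagged 1. A sketch of a more discriminating test, assuming a hypothetical sample in which values of 2 are rare (the probabilities passed to rng.choice are arbitrary). Because the windows are nested, a row flagged in TWOPLUS_12M must also be flagged in TWOPLUS_18M and TWOPLUS_24M when there is no missing data, which gives a cheap sanity check:

```python
import numpy as np
import pandas as pd

# Hypothetical sample: values of 2 appear in only ~2% of cells
rng = np.random.default_rng(1)
rand_df = pd.DataFrame(rng.choice([0, 1, 2], size=(100, 26), p=[0.6, 0.38, 0.02]),
                       columns=[f'mth_{i}' for i in range(26)])

# Same windows as the answer: mth_0 .. mth_{month}
for month in (12, 18, 24):
    window = rand_df[[f'mth_{i}' for i in range(month + 1)]]
    rand_df[f'TWOPLUS_{month}M'] = (window >= 2).any(axis=1).astype(int)

# Nested windows: a flag can only stay the same or go from 0 to 1 as the window grows
flags = rand_df[['TWOPLUS_12M', 'TWOPLUS_18M', 'TWOPLUS_24M']]
assert (flags['TWOPLUS_12M'] <= flags['TWOPLUS_18M']).all()
assert (flags['TWOPLUS_18M'] <= flags['TWOPLUS_24M']).all()
print(flags.mean())  # share of rows flagged 1 for each window
```

With a rare "2+" value, both 0 and 1 flags appear, so a mistake such as the question's overwritten running flag would show up as a violation of the nesting check.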
