Determine the total count of distinct combinations within a Pandas data frame

I seem to be facing some difficulties (mental block) when it comes to creating basic summary statistics for my dataset.

What I am trying to accomplish is counting the instances of co-occurring "code" values across all "id"s. The data is structured as follows:

id    code  
1      A
2      A
2      B
3      A
3      B
4      A
5      A
5      C
6      A
6      B
6      C

The desired output would resemble the following table, or potentially by introducing a factorized column called "combo-id" for each unique combination in the raw data.

Combo    Count    combo-id
(A)      2        1
(A,B)    2        2
(A,C)    1        3
(A,B,C)  1        4

You can find a similar question and answer here, focusing on unique pairs only

Answer №1

To begin, form grouped tuples and then calculate the frequency using GroupBy.size:

s = df.groupby('id')['code'].apply(tuple).rename('Combo')
#if duplicates aren't a concern, thanks to @cripcate
#s = df.groupby('id')['code'].apply(set).rename('Combo')
df1 = s.groupby(s).size().reset_index(name='Count')
print (df1)
       Combo  Count
0       (A,)      2
1     (A, B)      2
2  (A, B, C)      1
3     (A, C)      1

Answer №2

Consider using .unique() method

Retrieve unique values with Series.unique()[source]

Get a list of unique values in a Series object.

The unique values are listed in the order they appear, as this function does not sort them.

Learn more here.

Similar questions

If you have not found the answer to your question or you are interested in this topic, then look at other similar questions below or use the search

Python - when there is a lengthy loop running, how can a thread be used to periodically perform an action within the same scope?

Imagine a scenario where I am running a loop that involves reading a lengthy file. How can I ensure that at regular intervals, let's say every x milliseconds, a specific "some code" is executed? while inFile.readable(): line = inFile.readline() ...

Error encountered while attempting to use the DocArrayInMemorySearch feature in Langchain: The docarray Python package could not be successfully

Here is the complete code that runs smoothly on notebook. However, I encounter an error when running it on my local machine related to: ImportError: Could not import docarray python package I attempted reinstallation and force installation of langchain ...

Having trouble sending an array from Flask to a JavaScript function

As a newcomer to web development and JavaScript, I'm struggling to pass an array from a Flask function into a JavaScript function. Here's what my JS function looks like: function up(deptcity) { console.log('hi'); $.aja ...

How can you retrieve all the values associated with a "base" key within a nested dictionary?

I have been searching for a solution to my problem but have not been able to find one. If you know of any, please guide me in the right direction! Here is the dictionary in question: https://i.stack.imgur.com/w9F7T.png The data is loaded using json.load ...

How to calculate the total number of uppercase letters in a file using Python?

I have successfully counted the number of capital letters in a text file that contains textual data. However, I am now facing the task of counting the unique capital letters and could use some assistance with this challenge. This is my current approach: ...

Having trouble retrieving the key using Kafka Python Consumer

My implementation involves using Kafka to produce messages in key-value format within a topic: from kafka import KafkaProducer from kafka.errors import KafkaError import json producer = KafkaProducer(bootstrap_servers=['localhost:9092']) # pro ...

What is the best method to assign np.nan values to a series based on multiple conditions?

If I have a data set like this: A B C D E F 0 x R i R nan h 1 z g j x a nan 2 z h nan y nan nan 3 x g nan nan nan nan 4 x x h x s f I am looking to update specific cells in the data by following th ...

Using Python and Selenium to interact with dropdown menus in the browser

Being new to this, I've reviewed some of the examples provided here but despite their simplicity, I'm still struggling to make it work. The website I am trying to navigate is: www.webauto.de Below is my code for selecting a car make, model, and ...

Unable to access Docker Flask App connected to Docker DB in browser

I currently have a Flask App running in one Docker container and a Postgres database in another Docker container. I am attempting to build and run these containers using 'docker-compose up --build'. However, when I attempt to open the 'Runni ...

Utilizing Python logging filters

I implemented a logging filter to handle my error messages and it involves setting an environment variable. However, once I integrate the filter with my logger, the error messages cease to get printed on the terminal or written to the logging file. The fi ...

What are some strategies to boost the efficiency of a sluggish for loop in Python when using pandas?

As I work on optimizing my code for better performance, I have encountered a bottleneck in the data preparation phase. The structure of my data is quite specific, leading me to iterate through two for loops to process it. Initially, this method worked well ...

Error: You can only use integers, slices (:) or ellipsis (...) in this context

After thoroughly examining numerous answers on this subject, I have yet to find a satisfactory solution that fits my requirements. Despite realizing that the error may seem trivial, I am unable to resolve it myself. My goal is to extract an element from nu ...

Quick explanation of the concept of "matrix multiplication" using Python

I am looking to redefine matrix multiplication by having each constant represented as another array, which will be convolved together instead of simply multiplying. Check out the image I created to better illustrate my concept: https://i.stack.imgur.com/p ...

Exploring JSON using Python for retrieving selective outcomes

I'm currently trying to solve the problem of traversing JSON in Python using the pre-installed json package and selectively returning the data. Here's an excerpt from the JSON: { "1605855600000": [ { "i ...

Unable to retrieve the element from the website using Selenium

Having trouble accessing the element to change pages one by one. Ongoing attempts have been unsuccessful. Reference image: http://prntscr.com/o0f4mx. Would greatly appreciate any assistance. XPath = //*[@id="___gcse_0"]/div/div/div/div[5]/div[2]/div/div/d ...

How can I simultaneously search for two distinct patterns in a string using regex in Python?

I am facing a challenge in extracting specific digits from strings that can be in two different formats. How can I create a regex pattern using re.search to search for both patterns within the same string? For example, ## extract 65.45 from this string st ...

Attempting to develop a custom API with flask for YOLOV4 object detection, but encountering a TensorFlow-related issue

Currently, I am delving into the realm of deep learning and honing my skills in object detection using YOLO. Recently, I made some tweaks to the code and managed to successfully detect humans exclusively with YOLOV4 tensorflow GPU sourced from this reposit ...

Utilizing Python's list comprehension with multiple conditions

I have a list of strings str_list = ['a', 'b', 'c'] and I want to attach a suffix suffix = '_ok' specifically when the string is 'a'. The solution below achieves this: new_str_list = [] for item in str_l ...

Issue with Objects in a Lexicon in Ursina Programming Language

Looking to create a grid of various entities without having to write an excessive amount of code, I've opted to utilize a dictionary to automatically generate them. This saves me from manually coding 1521 lines for each individual entity. To handle i ...

Is there a way to send all the results of a Flask database query to a template in a way that jQuery can also access

I am currently exploring how to retrieve all data passed to a template from a jQuery function by accessing Flask's DB query. I have a database table with customer names and phone numbers, which I pass to the template using Flask's view method "db ...