Questions tagged [group-by]

GROUP BY is a clause defined in the SQL standard; pandas offers an equivalent through its groupby method. It collapses rows that share the same values in the grouping fields into a single row per group, which makes related data easier to analyze. Aggregate functions such as SUM() or AVG() can then be applied to the other fields within each group to summarize them.
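As a rough illustration of the description above, the sketch below runs a GROUP BY query with SUM() and AVG() against an in-memory SQLite database from Python. The `orders` table, its columns, and the sample rows are hypothetical and exist only for this example.

```python
import sqlite3


def summarize_orders():
    """Return one (customer, total, average) row per customer."""
    # Hypothetical table and data, used only to illustrate GROUP BY.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (customer TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?)",
        [
            ("alice", 10.0),
            ("bob", 5.0),
            ("alice", 30.0),
            ("bob", 15.0),
            ("carol", 7.0),
        ],
    )
    # Rows with the same customer collapse into one row per group;
    # SUM() and AVG() condense the amounts within each group.
    rows = conn.execute(
        "SELECT customer, SUM(amount), AVG(amount) "
        "FROM orders GROUP BY customer ORDER BY customer"
    ).fetchall()
    conn.close()
    return rows


summary = summarize_orders()
for customer, total, average in summary:
    print(customer, total, average)
```

Five input rows become three output rows, one per distinct customer, each carrying the summed and averaged amount for that group.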

Can value_counts() be applied to two columns simultaneously?

I am working with a pandas dataframe that contains texts, each of which can be categorized into multiple categories and belong to one genre. The categories are represented in the dataframe using one-hot encoding. For example: df = pd.DataFrame({'text ...

Disregarding the ORDER BY clause in MySQL GROUP BY

Having two queries with the only difference being the GROUP BY clause always leaves me puzzled. SELECT * FROM `packages_sorted_YHZ` WHERE `hotel_city` = 'Montego Bay' ORDER BY `deal_score` DESC LIMIT 0,3; SELECT * FROM `packages_sorted_YHZ` W ...

Finding values in one column based on the minimum value of another column within each group using ...

I am struggling with a dataframe that looks like the following: https://i.stack.imgur.com/Ays3S.png My goal is to create a new column that holds the quota of the minimum scale_qty for each group formed by plant, material. Here is the desired outcome: ht ...

Organizing array entries by key and key values with jq

Here is some sample JSON data: { "hits": [ { "country": "PT", "level": "H2", "id": "id1" }, { "country": "CZ", "level" ...

group and apply operations to all remaining keys

If I have a pandas dataframe called df, I can find the average reading ability for each age by using the code df.groupby('Age').apply(lambda x: x['ReadingAbility'].mean()). But what if I want to find the average reading ability for all ages except one, sa ...

Extracting distinct values within individual windows of a PySpark dataframe

I am working with a spark dataframe and have the following data: from pyspark.sql import SparkSession spark = SparkSession.builder.appName('').getOrCreate() df = spark.createDataFrame([(1, "a", "2"), (2, "b", "2"),(3, "c", "2"), (4, "d", "2"), ...

Grouping a DataFrame to count the distinct values within each group

I've attempted the following code: df.groupby(['Machine','SLOTID'])['COMPONENT_ID'].unique() The resulting output is as follows: Machine COMPONENT_ID LM5 11S02CY382YH1934472901 [N3CP1.CP] 11S02C ...

A guide to combining and categorizing values with MySQL

I'm seeking advice on how to sum and group values in MySQL. It seems like a straightforward task, but I've encountered a unique situation. My table records the number of cigarettes smoked by users each day, and I'm attempting to calculate the total sum fo ...

Group pandas by count and then modify with a new string value before saving it back to the original column

In my Pandas Dataframe, there are approximately 30,000 records. I am interested in finding all the entries in a specific column where the total count is less than 10. This column contains diseases related to clinical trials. Since some diseases occur frequ ...

Substitute the values in a data table with their ongoing consecutive sequence

I have been successfully replacing all the numbers in my dataframe with their current positive streak number. However, I find my code to be quite messy as I am doing it column by column and manually mentioning the column names each time. Can anyone suggest ...

Count the total sum by date and group them while using PHP

Within my MySQL table, I have the following data: user | open | date --------------------------------- User1 | 1 | 2017-05-19 User2 | 1 | 2017-05-19 User3 | 1 | 2017-05-19 User4 | 1 | 2017-05 ...

Tips for querying with Group by and finding the MAX(id) in Sequelize.js

Hello! I'm looking for advice on how to group data by a specific ID and find the maximum value associated with that ID. Let's use the Table Student Quiz as an example: id student_id score 1 2 200 2 2 100 3 ...

Using Pandas in Python to filter and group data based on specific criteria

Consider a dataset that contains both categorical and numerical columns, such as a salary dataset. The columns can be categorized as follows: ['job', 'country_origin', 'age', 'salary', 'degree','marital_status'] There are four categorical columns and two ...

Creating a new column in a Pandas dataframe that contains a list of values based on the repetition of rows in another

I am currently dealing with a dataframe that looks like the following: ID Cluster Product 1 4 'b' 1 4 'f' 1 4 'w' 2 7 'u' 2 7 'b' 3 ...

Determine the total count of distinct combinations within a Pandas data frame

I seem to be facing some difficulties (mental block) when it comes to creating basic summary statistics for my dataset. What I am trying to accomplish is counting the instances of co-occurring "code" values across all "id"s. The data is structured as foll ...

Arranging elements in an array based on two different properties

Trying to organize an array of elements (order details). https://i.stack.imgur.com/T2DQe.png [{"id":"myid","base":{"brands":["KI", "SA"],"country":"BG","status":" ...

Grouping pandas dataframes and appending values to distinct columns

Currently, I am working with a pandas dataframe which is displayed as follows: https://i.stack.imgur.com/K3XoT.png I am looking to get the output in the format shown here: https://i.stack.imgur.com/WyH19.png Your assistance on this matter would be high ...

Aggregate and group data by distinct rows in order to calculate sums based on unique values

My dataset is structured as follows: store itemId numberOfItemsSold Berlin 1 78 Amsterdam 3 12 Berlin 2 31 Amsterdam 1 12 Berlin 1 90 I am seeking to generate a dataset or dic ...
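The desired output is cut off in the excerpt above, so the following is only a minimal sketch under an assumption: that the goal is the total of numberOfItemsSold for each distinct (store, itemId) pair. The rows are the five shown in the question; the function name `total_sold_by_store_and_item` is made up for illustration.

```python
from collections import defaultdict

# The five rows shown in the question: (store, itemId, numberOfItemsSold).
rows = [
    ("Berlin", 1, 78),
    ("Amsterdam", 3, 12),
    ("Berlin", 2, 31),
    ("Amsterdam", 1, 12),
    ("Berlin", 1, 90),
]


def total_sold_by_store_and_item(records):
    """Sum the sold counts for each distinct (store, itemId) pair.

    Assumption: grouping by both columns is what the asker wants;
    the original question text is truncated before it says so.
    """
    totals = defaultdict(int)
    for store, item_id, sold in records:
        totals[(store, item_id)] += sold
    return dict(totals)


totals = total_sold_by_store_and_item(rows)
for (store, item_id), sold in sorted(totals.items()):
    print(store, item_id, sold)
```

The two Berlin rows for item 1 merge into a single entry, while every other pair appears once with its original count.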

Generating a fresh variable through aggregation in Python 2

I have collected data on births that is structured like so: Date Country Sex 1.1.20 USA M 1.1.20 USA M 1.1.20 Italy F 1.1.20 England M 2.1.20 Italy F 2.1.20 Italy M 3.1.20 USA F 3.1.20 USA F My goal is to transfor ...

Using Pandas to refine data on a grouped and summarized dataset

I am working with a dataframe that is generated from an excel file. The dataframe consists of multiple columns and rows, each with a unique identifier. My goal is to visualize the data using a PyQT interface where users can select specific criteria (checkb ...

Calculating the aggregate value of MySQL based on the keys in a JSON group

Is there a way to aggregate values from a JSON table grouped by keys in MySQL version 5.7.12? MYSQL version: 5.7.12 table - +------+--------------------------------------+ | col1 | col2 | +------+------------------------ ...

In PHP and MySQL, the GROUP BY clause does not function properly with datetime data types

I have a variety of datetimes stored in my MySQL database, listed as follows: 2016-11-15 10:00:00 2016-11-16 10:00:00 2016-11-17 10:00:00 2016-11-17 12:00:00 2016-11-17 19:30:00 2016-11-20 10:00:00 2016-12-15 10:00:00 2017-11-15 10:22:00 I need to displa ...
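The excerpt above is truncated before it states the goal, so this is a sketch under an assumption: that the rows should be grouped by calendar day rather than by the full datetime value. Grouping on a raw datetime column puts every distinct timestamp in its own group, which is a common reason GROUP BY appears not to work. The example uses SQLite's date() from Python as a stand-in; in MySQL the corresponding function is DATE(). The `events` table and `starts_at` column are hypothetical names.

```python
import sqlite3

# The datetimes listed in the question.
timestamps = [
    "2016-11-15 10:00:00",
    "2016-11-16 10:00:00",
    "2016-11-17 10:00:00",
    "2016-11-17 12:00:00",
    "2016-11-17 19:30:00",
    "2016-11-20 10:00:00",
    "2016-12-15 10:00:00",
    "2017-11-15 10:22:00",
]


def count_per_day(values):
    """Return (day, count) rows, grouping on the date part only."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical table and column names, for illustration only.
    conn.execute("CREATE TABLE events (starts_at TEXT)")
    conn.executemany("INSERT INTO events VALUES (?)", [(v,) for v in values])
    # date() strips the time component, so same-day rows share a group.
    rows = conn.execute(
        "SELECT date(starts_at), COUNT(*) FROM events "
        "GROUP BY date(starts_at) ORDER BY date(starts_at)"
    ).fetchall()
    conn.close()
    return rows


per_day = count_per_day(timestamps)
for day, count in per_day:
    print(day, count)
```

With the time component removed, the three entries on 2016-11-17 fall into one group and the eight rows reduce to six days.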

Calculate the mean value of several columns using pandas

What is the best way to calculate the average of multiple columns? Gender Age Salary Yr_exp cup_coffee_daily Male 28 45000.0 6.0 2.0 Female 40 70000.0 15.0 10.0 Female 23 40000.0 ...

Organizing JSON data fetched from a curl request by using jq's group_by and count functions to generate a sorted and aggregated

Seeking assistance with a jq script using only jq. Can someone help in creating a script to extract data from the following command output: `curl --silent "https://api.surfshark.com/v3/server/clusters" | jq` The objective is to display the numbe ...