My data includes a mixture of floating-point numbers and numpy datetime64 values in different rows within a pandas dataframe. df2 = pd.DataFrame( [[np.datetime64('2021-01-01'), np.datetime64('2021-01-01')], [2, 3]], columns=['A', 'B']) After attempting ...
I am working with a dataset: a b val1_b1 val1_b2 val2_b1 val2_v2 1 2 5 9 4 6 My goal is to find the maximum value for each column group, resulting in the transformed dataset: a b val1 val2 1 2 9 6 Alternatively, I am also ...
I am just starting to learn Python and selenium, and I'm facing a challenge that I need help with. Currently, I am attempting to extract data from a particular website: "" The goal is to convert the table on this website into a dataframe similar to ...
My dataset, df, resembles this: print(df) x outlier_flag 10 1 NaN 1 30 1 543 -1 50 1 I want to replace values flagged with outlier_flag==-1 by interpolating between row['A'][i-1] and row['A'][i+1]. In other words, I need to correct the erron ...
I am currently working with multiple CSV files containing dataframes for COVID cases. An example of the data looks like this: Region active Date 2020-03-20 Tabuk 1 2020-03-21 Tabuk 1 2020-03-22 Tabuk 1 2020-03-23 Tabuk 1 2020-03-24 ...
I am working with a dataframe that has a column called "Utterances" containing strings, such as the first row which states "I wanna have a beer". My goal is to create a new data frame that will display the position of each letter in the alphabet for every ...
Here is the dataframe I'm dealing with: V Out[58]: P1 P2 P3 V1 a b c V2 f g h V3 k l m I am looking to store all values in a list L as follows: L=[a,b,c,f,g,h,k,l,m] I need a way to iterate from one row to another. Does anyone ...
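One possible approach to the question above, as a minimal sketch: the frame below is rebuilt by hand from the printed output, so the string values and labels are assumptions. Flattening the underlying array in row order gives the list directly, and the explicit loop shows the row-by-row iteration the question asks about.

```python
import pandas as pd

# Hypothetical reconstruction of the frame shown in the question.
V = pd.DataFrame(
    [["a", "b", "c"], ["f", "g", "h"], ["k", "l", "m"]],
    index=["V1", "V2", "V3"],
    columns=["P1", "P2", "P3"],
)

# Flatten the values row by row into a plain Python list.
L = V.to_numpy().ravel().tolist()

# Equivalent explicit iteration, one row at a time.
L_loop = [value for row in V.itertuples(index=False) for value in row]

print(L)
```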
Today I've been working on merging and editing data frames and have hit a roadblock with a specific issue. In my dataset, there's a column containing the names of fruits and corresponding persons: Fruit Person Banana Jake Banana Paul Carrot Nan ...
I am new to automating tasks in Python involving Excel. I need assistance with extracting specific numbers that are surrounded by different characters within columns. Actual DATA Column A kDGK~202287653976 ~LD ~ 8904567 SIP~1233 ...
I have a dataset that looks like this: data = pd.DataFrame( { "Name": [ [ " Verbundmörtel ", " Compound Mortar ", " Malta per stucchi e per incoll ...
I've been trying to find a resolution for my current issue, but I seem to be stuck. I'm really hoping that you can assist me. The Issue: My goal is to determine the number of tweets per minute. Data Set: time sentiment 0 201 ...
I am struggling with a dataframe that looks like the following: https://i.stack.imgur.com/Ays3S.png My goal is to create a new column that holds the quota of the minimum scale_qty for each group formed by plant, material. Here is the desired outcome: ht ...
I am looking to generate a pandas DataFrame that takes the dictionary a provided below and extends the dates by days_split, resulting in two tables. For instance, adding 10 days to the initial date value of 2/4/2022 1:33:40 PM would create a range for Tabl ...
I am working with a dataset and have identified outliers that are 3 standard deviations away from the mean in each numerical column. I need to remove these outliers and drop the rows that contain them. ...
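A minimal sketch of one way to do this, assuming "3 standard deviations" means a per-column z-score computed with the sample standard deviation. The function name `drop_outlier_rows` and the sample data are hypothetical, not taken from the question.

```python
import pandas as pd


def drop_outlier_rows(df, n_std=3.0):
    """Drop rows where any numeric column lies more than n_std
    standard deviations from that column's mean."""
    numeric = df.select_dtypes(include="number")
    z = (numeric - numeric.mean()) / numeric.std()
    # Missing z-scores (NaN input or zero variance) are not treated as outliers.
    keep = (z.abs() <= n_std) | z.isna()
    return df[keep.all(axis=1)]


# Hypothetical data: 20 ordinary rows plus one extreme value in column "x".
data = pd.DataFrame(
    {
        "x": [10.0] * 20 + [1000.0],
        "y": [float(i) for i in range(21)],
        "label": ["row"] * 21,
    }
)

cleaned = drop_outlier_rows(data)
print(len(data), len(cleaned))
```

Non-numeric columns are ignored when computing the mask but are kept in the result.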
I'm working on a Python code and I need to convert the following dataFrame: Original Dataframe: https://i.stack.imgur.com/eSTr3.png Into this new DataFrame: https://i.stack.imgur.com/qwR5f.png I attempted to pivot the table using this command: pd.piv ...
In my dataset, I have a dataframe called temp_df. line sheet comments 1 apple this is a fruit 2 orange this is fruit [1,3] onion this is a vegetable The goal is to sort the temp_df based on both the sheet and line columns. However, since the l ...
Looking at the structure of my data, it appears as follows: my_data = [{'description': 'description', 'network_element': 'network-elem1', 'data_json': {'2018-01-31 00:00:00': 10860, '2018-02-28 00:00:00': 11530, '2018-03-31 00:00:00': 11530, ...
In the table below, you will find a Data Frame (df) containing information about different shops and their corresponding date times from January to August. | datetime | shop | val | |------------------|---------|-----| | 04-07-2020 13:32 | AS ...
I have a CSV file that I needed to split on line breaks because of its file type. After splitting this data frame into two separate data frames, I am left with rows that are structured like the following: 27 Block "Column" "Row" " ...
I am facing a situation where I need to increment the timestamp of a particular column in my dataframe. Within the dataframe, there is a column that contains a series of area IDs along with a "waterDuration" column. My goal is to progressively add this d ...
I have a dataset that needs to be cleaned by removing weekend data and after-hours data on weekdays. Once the cleaning is done, I want to use it in a plot without any gaps. It should show as completed and continue seamlessly in the plot. Is there a way to ...
When working with multiple datafiles, I use the following code to load them: df = pd.concat((pd.read_csv(f[:-4]+'.txt', delimiter=r'\s+', header=8) for f in files)) The resulting DataFrame looks like this: ...
In my extensive tab-delimited file, each line contains multiple key-value pairs separated by semicolons in the 8th column. I need to extract entire lines based on specific key-values. Criteria for including non-zero key-value pairs for the following: 1. ...
I've been attempting to convert a JSON file into a pandas DataFrame. Despite trying various solutions like pd.json_normalize(data.json), none have proven successful. It seems that the file is more intricate and contains nested JSON data. How can I flatten ...
I am encountering an issue with my dataframe, which is structured like the image in the link below. https://i.stack.imgur.com/nKfiO.png My goal is to calculate the mean of the 'polarity' field, but I keep running into errors. grouped = df.groupby("s ...
I have a python script that keeps overwriting the file with new data each time it runs. Can someone please advise me on how to prevent this from happening? Here's an example of what is currently happening: DF1 Table Count case ...
I am looking to merge two simple dataframes. If a cell value is present in df_history but not in df_now, I want it added with a prefix. See the example image below: https://i.stack.imgur.com/eZFqV.png My approach so far: Convert both datafra ...
I am looking for a way to filter rows in a dataframe based on a sum condition of one of the columns. Specifically, I need the indexes of the first rows where the sum of column B is less than 3: df = pd.DataFrame({'A':['z', 'y', 'x', 'w'], 'B':[1, 1, 1, 1]}) The curren ...
Upon reviewing my dataset, I discovered key-value pairs stored in a CSV file that resembles the following structure: "1, {""key"": ""construction_year"", ""value"": 1900}, {""key" ...
I have been successfully replacing all the numbers in my dataframe with their current positive streak number. However, I find my code to be quite messy as I am doing it column by column and manually mentioning the column names each time. Can anyone suggest ...
I'm currently working with a DataFrame of Stocks where the columns are labeled as 'SMA100915', 'SMA500915', and so forth... The column df['SMA100915'] represents the Simple Moving Average value of the stock at 09:15 AM. I ...
My JSON data structure is as follows: { "a": "a_1", "b": "b_1", "c": [{ "d": "d_1", "e": "e_1", "f": [], "g": "g_1", "h": "h_1" }, { "d": "d_2", "e": "e_2", "f": [], " ...
Working with a multi-dimensional array, I need to add it as a new column in a DataFrame. import pandas as pd import numpy as np x = np.array([[1, 2, 3], [4, 5, 6]], np.int32) df = pd.DataFrame(['A', 'B'], columns=['First']) Initial DataFrame: First ...
I need to extract the day and time from a datetime index of a dataframe. Here's what I have: df.index = DatetimeIndex(['2020-07-07 19:03:38', '2020-07-08 18:50:40', '2020-07-24 4:20:13', '2020-07-25 ...
If there is a DataFrame with dimensions (4000,13) and the column dataframe["str_labels"] may contain the value "|", how can you filter the pandas DataFrame by removing any rows (all 13 columns) that have the string value "|" in them? For example: list(data ...
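A minimal sketch for the question above. It is ambiguous whether a row should be dropped when the label contains "|" or only when it equals "|", so both variants are shown. The small frame and the `other` column are hypothetical stand-ins for the (4000, 13) data.

```python
import pandas as pd

# Hypothetical stand-in for the real frame.
dataframe = pd.DataFrame(
    {"str_labels": ["a", "b|c", "|", "d"], "other": [1, 2, 3, 4]}
)

# Variant 1: drop rows whose label contains "|" anywhere.
# regex=False matters because "|" is a regex metacharacter.
contains_pipe = dataframe["str_labels"].astype(str).str.contains("|", regex=False)
without_any_pipe = dataframe[~contains_pipe]

# Variant 2: drop only rows whose label is exactly "|".
without_exact_pipe = dataframe[dataframe["str_labels"] != "|"]

print(without_any_pipe)
print(without_exact_pipe)
```

Boolean indexing removes whole rows, so all columns of a matching row are dropped together.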
I need to filter my dataset based on 3-hour intervals, starting at 0000hr, 0300hr, 0600hr, and so on. An example of the dataset: Time A 2019-05-25 03:54:00 1 2019-05-25 03:57:00 2 2019-05-25 04:00:00 3 ... 2020-05-25 03:54:00 ...
I needed to determine the number of series contained within a specific dataset. The count of time-series information was required for analysis. https://i.stack.imgur.com/VHQvw.png Within this context, I wanted users to select how they wished to analyze ...
I have a pair of tables in the form of Pandas DataFrames. The first table looks like this: name val name1 0 name2 1 The second table is structured as follows: name tag name1 tg1 name1 tg2 name1 tg3 name1 tg3 name2 kg1 name2 kg1 ...
I am currently working on a project that involves creating datasets using the columns of a dataframe. The columns I have to work with are ['NAME1', 'EMAIL1', 'NAME2', 'EMAIL2', 'NAME3', 'EMAIL3', etc]. ...
I obtained a dataframe for the entire month, excluding weekends (Saturday and Sunday), with data logged every minute. v1 v2 2017-04-03 09:15:00 35.7 35.4 2017-04-03 09:16:00 28.7 28.5 ... ...
I am looking to transform the owid Covid-19 JSON data found here into a dataframe. This JSON contains daily records in the data column, and I aim to merge this with the country index to create the desired dataframe. {"AFG":{"continent": ...
I am currently in the process of exporting a dataFrame into a nested JSON format for D3.js. I found a helpful solution that works well for only one level (parent, children) Any assistance with this task would be greatly appreciated as I am new to Python. ...
It seems like I need to merge multiple rows into a single row in the animal column. However, this should only happen if they are in sequential order and contain lowercase alphabet characters. Once that condition is met, the index should restart to maintain ...
Below is the DataFrame that I am working with: X Y Z 0 xxx NaN 333 1 NaN yyy 444 2 xxx NaN 333 3 NaN yyy 444 I'm attempting to merge rows based on the values in the Z column, resulting in the following output: X Y Z ...
After populating a list with data from text files, I am now faced with the task of processing the information within the DataFrame matrix. This may involve interpolation or possibly removing a column. If anyone has suggestions on how to go about implement ...
There is a dataframe presented below: +-------+-------------------------------- |__key__|______value____________________| | 1 | {"name":"John", "age": 34} | | 2 | {"name":"Rose", "age" ...
After successfully creating a dataframe with two columns using the pd.DataFrame method, I am curious if it is possible to modify the method to accommodate three columns instead. quantities = dict() quotes = dict() for index, row in df.iterrows(): # ...
I want to duplicate my current df to another pandas dataframe. If I specify columns to copy, I can do so like this: df_copy = df[['col_A', 'col_B', 'col_C']].copy() Is there a way to copy all columns except for the ones specified using this method? I att ...
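A minimal sketch of two ways to copy everything except a given set of columns. The sample frame, the extra column `col_D`, and the `excluded` list are hypothetical.

```python
import pandas as pd

# Hypothetical frame; only the col_A/col_B/col_C names come from the question.
df = pd.DataFrame(
    {"col_A": [1, 2], "col_B": [3, 4], "col_C": [5, 6], "col_D": [7, 8]}
)

excluded = ["col_A", "col_B"]

# Option 1: drop the unwanted columns, then copy.
df_copy = df.drop(columns=excluded).copy()

# Option 2: select by a boolean mask over the column labels.
df_copy_alt = df.loc[:, ~df.columns.isin(excluded)].copy()

print(df_copy.columns.tolist())
```

`drop` returns a new frame and leaves `df` unchanged. Option 2 silently ignores names in `excluded` that are not actual columns, whereas `drop` raises an error for them by default.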
Recently, I developed a function to compute the log returns of a given dataset. The function accepts the file name in CSV format as an argument and is expected to output a dataframe containing the log returns from the dataset. The CSV file has already been ...
I currently have a dataset structured like this: +-------+-------+-------+-------+ | Index |values_in_dicts | +-------+-------+-------+-------+ | 1 |[{"a":4, "b":5}, | | |{"a":7, "b":9}] | +----- ...
I'm working with a dataframe structured like this: df = pd.DataFrame({'isin': ['a', 'a', 'c', 'd','c', 'e', 'd','f','s','d','c',' ...
My program processes the output of an OCR scan of a table and generates a dataframe. However, sometimes the rows get merged, resulting in compressed cells that include content intended for the cell below, thus shortening the column. I need to change this c ...
Can you help me with a Python problem I'm having? I need to split a column into multiple rows, like this: A B ABC|XYZ|PQR 123 And turn it into: A B ABC 123 XYZ ...
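One common way to approach this, as a minimal sketch: split the string into a list and let `explode` turn each list element into its own row. The expected output in the question is cut off, so it is an assumption here that the value in B is repeated for every new row.

```python
import pandas as pd

df = pd.DataFrame({"A": ["ABC|XYZ|PQR"], "B": [123]})

# Split on the literal "|" character, then expand each list element to a row.
out = (
    df.assign(A=df["A"].str.split("|"))
    .explode("A")
    .reset_index(drop=True)
)

print(out)
```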
I have a DataFrame with the following data: C1 C2 A 2:3:1:7 B 2:1:4:3 C 2:1:1:1 My task is to sort the integers in column C2, while keeping the colons intact. The desired output should be as follows: C1 C2 A 1:2:3:7 B 1:2:3:4 C 1:1: ...
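A minimal sketch for the question above: split each string on the colon, sort the parts numerically, and join them back. The helper name `sort_colon_ints` is hypothetical.

```python
import pandas as pd

df = pd.DataFrame(
    {"C1": ["A", "B", "C"], "C2": ["2:3:1:7", "2:1:4:3", "2:1:1:1"]}
)


def sort_colon_ints(text):
    """Sort the colon-separated integers in text numerically."""
    numbers = sorted(int(part) for part in text.split(":"))
    return ":".join(str(n) for n in numbers)


df["C2"] = df["C2"].map(sort_colon_ints)

print(df)
```

Converting to `int` before sorting matters once values have more than one digit, since a plain string sort would place "10" before "2".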
Here is a sample of my data frame: ID Number of Days Off First Day Off A01 3 16/03/2021 B01 10 24/03/2021 C02 3 31/03/2021 D03 2 02/04/2021 I am looking for a way to calculate the "First Day Back from Time Off" column. I attempted to use it ...
I currently have a pandas dataframe containing the following data: source ACCESS CREATED TERMS SIGNED BUREAU Facebook 12 8 6 Google 160 136 121 Email 29 26 25 While this is just a snippet of the dataframe, it showcases the various rows and col ...
My goal is to conditionally fill missing values and update the information from another dataframe. I need to update the data in the values column of the smalldf dataframe by filling in missing values based on conditions. The condition specifies that if t ...
I am facing difficulties incorporating both the Name and Dataframe variables into my HTML code. Here's the code snippet I have: Name = "Tom" Body = Email_Body_Data_Frame html = """\ <html> <head> </he ...
I am working with a dataframe that is generated from an excel file. The dataframe consists of multiple columns and rows, each with a unique identifier. My goal is to visualize the data using a PyQT interface where users can select specific criteria (checkb ...
I'm currently exploring how to transform JSON data into a Pandas dataframe using Python. Each time I execute df = pd.json_normalize(data) The result shows 1 row and 285750 columns. My ultimate goal is to create a data ...
Currently in the process of developing a stock screener that focuses on fundamental metrics using the yahoofinancials module. The code provided generates output in multidimensional dictionary format, which I'm finding challenging to convert into a da ...
I am presented with the following JSON setup: { "products": [ { "id": 12121, "product": "hair", "tag":"now, later", "types": [ { "pro ...
Below is my dataframe named df, days NaN 70 29 I want to add a new column called 'short_days' based on the conditions, df['short_days'] = np.where(df.days < 30, 'Yes', 'No') However, when the value is NaN, I want the entry in 'short_days' to be 'Not ...
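A minimal sketch using `np.select`, which checks conditions in order and so handles the NaN case before the numeric comparison. The label for missing values is cut off in the question, so `na_label` below is a placeholder and not the asker's actual text.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"days": [np.nan, 70, 29]})

# Placeholder: the real label is truncated in the question.
na_label = "Not available"

conditions = [df["days"].isna(), df["days"] < 30]
choices = [na_label, "Yes"]

# The first matching condition wins; rows matching none get the default.
df["short_days"] = np.select(conditions, choices, default="No")

print(df)
```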
How can I organize the data retrieved from an API query into a table with column names and cell values? wea_data = [{'observation_time': '2023-05-09T15:55:00.000000+00:00', 'station': 'KCOF', 'weather_results': {'@id': 'https://api.weathe ...
I currently have a dataframe structured like this: Input df.head(3) groupId Gourpname totalItemslocations 7494732 A {'code': 'DEHAM', 'position': {'lat': 53.551085, 'lon': 9.993682}} 7494733 B {'code': 'DEHAM', 'position': { ...
Attempting to create a data frame by converting multiple dictionaries within a list. dictlist This is the output of the list: [{'subject': projectmanagementplan, 'link': [provides], 'object': areas}, {'subject': highlevelprojectdescriptions, 'link': [h ...
Is there a way to update the contents of an excel file using values from a Python dictionary? I've been trying to use the .loc function, but it seems to be working inconsistently. Sometimes it writes the correct values, and other times it writes the column ...
I recently obtained a collection of JSON files from the YELP public data challenge. These files can be found at this link: The files are in NDJSON format, and I have successfully read them using the following code: library(jsonlite) df <- stream_in(fi ...
After obtaining a Pandas Dataframe with the following information: Rank % Renewable Country China 1 19.754910 Japan 3 10.232820 Canada 6 61.945430 Germany 7 17.901530 India 8 14.969080 France 9 17.020280 Italy 11 33.6672 ...
Here is the content of my csv file: 0 |0.1|0.2|0.4| 0.1|0 |0.5|0.6| 0.2|0.5|0 |0.9| 0.4|0.6|0.9|0 | I am attempting to read it row by row, excluding the diagonal values, and converting it into a single long column: 0.1 0.2 0.4 0.1 0.5 0.6 0.2 0.5 0.9 ...
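A minimal sketch for the question above. It assumes the file holds a square matrix with one pipe-delimited row per line and a trailing pipe; the text is inlined in the hypothetical `raw_text` variable rather than read from disk.

```python
import numpy as np
import pandas as pd

# Assumed file layout: four rows, four values each, trailing "|".
raw_text = """0  |0.1|0.2|0.4|
0.1|0  |0.5|0.6|
0.2|0.5|0  |0.9|
0.4|0.6|0.9|0  |
"""

# Parse each non-empty line into a list of floats.
rows = [
    [float(cell) for cell in line.strip().strip("|").split("|")]
    for line in raw_text.splitlines()
    if line.strip()
]
matrix = np.array(rows)

# Boolean mask that is False on the diagonal and True everywhere else.
off_diagonal = ~np.eye(matrix.shape[0], dtype=bool)

# Boolean indexing walks the matrix row by row, skipping the diagonal.
result = pd.Series(matrix[off_diagonal])

print(result.tolist())
```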
I have a single data frame with multiple rows and I am looking to identify common elements within each row as well as determine the minimum and maximum values within that row. Unfortunately, I haven't been able to locate any built-in function that can help ...
How can I identify and resolve a potential infinite loop in my code? This is the code snippet in question: new_exit_date, new_exit_price = [] , [] high_price_series = df_prices.High['GTT'] entry_date = df_entry.loc['GTT','entry_date'] window_price_series ...
I am working with a dataframe that consists of two columns: A containing three types of texts, and B containing dates. df: A B CPI_x6 01/01/2015 CPI 01/01/2015 CPI_x9 01/03/2015 CPI 01/05/2015 In addition to this, I ha ...
Here is a function I've created: def json_to_pickle(json_path=REVIEWS_JSON_DIR, pickle_path=REVIEWS_PICKLE_DIR, force_update=False): '''Generate a pickled dataframe from specified JSON files.''' current_date = ...
Here is a data frame with name and species columns: name_col species_col 0 alice cat 1 bob cat 2 darwin dog 3 frank ferret We created a new dataframe excluding ferrets: In: df_minus_ferrets = df.drop ...