Questions tagged [pandas]

Pandas is a Python library for data manipulation and analysis. It provides data structures for working with labeled tabular data (DataFrames), multidimensional time series, and cross-sectional datasets of the kind commonly encountered in statistics, experimental science, econometrics, and finance. It is one of the most widely used data science libraries in Python.


A guide on replacing values in a pandas dataframe using interpolation

My dataset, df, looks like this:

    print(df)
         x  outlier_flag
        10             1
       NaN             1
        30             1
       543            -1
        50             1

I want to replace the values flagged with outlier_flag == -1 by interpolating between row['A'][i-1] and row['A'][i+1]. In other words, I need to correct the erron ...
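A minimal sketch of one way to do this, assuming the column is named x as in the printed frame (the excerpt's text refers to 'A') and that only rows with outlier_flag == -1 should change: treat the flagged values as missing, interpolate linearly, and copy the interpolated values back into the flagged rows only. Series.interpolate fills from the nearest non-missing neighbours, so a NaN next to a flagged row would widen the interpolation span.

```python
import numpy as np
import pandas as pd

# Data reconstructed from the printed frame in the question.
df = pd.DataFrame({
    "x": [10, np.nan, 30, 543, 50],
    "outlier_flag": [1, 1, 1, -1, 1],
})

is_outlier = df["outlier_flag"] == -1

# Treat flagged values as missing, then interpolate linearly between neighbours.
filled = df["x"].mask(is_outlier).interpolate(method="linear")

# Keep every unflagged value as-is (including pre-existing NaNs);
# take the interpolated value only where the row was flagged.
df["x_clean"] = df["x"].where(~is_outlier, filled)
print(df)
```

Here the flagged 543 becomes 40.0 (halfway between 30 and 50), while the original NaN in row 1 is left alone because it was not flagged.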

Exploring K Nearest Neighbors Algorithm for Big Data

I am trying to find the nearest neighbors for a dataset A containing 25,000 rows, so I fitted a KNN model on dataset B, which contains 13 million rows. The ultimate objective is to identify the 25,000 rows within dataset B that closely res ...

python pandas machine-learning scikit-learn knn

Finding the smallest value within the data from the past N days

In my dataset, I have the following information:

    ID  Date      X  123_Var  456_Var  789_Var
    A   16-07-19  3  777      250      810
    A   17-07-19  9  637      121      529
    A   20-07-19  2  295      272      490
    A   21-07-19  3  778      600      ...

python pandas numpy
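A sketch of a per-ID rolling minimum, assuming Date is day-month-year and that "the past N days" means the N-day interval ending on each row's date. N_DAYS = 3 is a hypothetical value. It relies on a time-based rolling window (a string offset such as "3D"), which needs the dates sorted within each group; only the X column from the excerpt is used.

```python
import pandas as pd

# Rows copied from the question excerpt (only ID "A" is shown there).
df = pd.DataFrame({
    "ID": ["A", "A", "A", "A"],
    "Date": ["16-07-19", "17-07-19", "20-07-19", "21-07-19"],
    "X": [3, 9, 2, 3],
})

N_DAYS = 3  # hypothetical window length

# The excerpt's dates look like day-month-year; this format is an assumption.
df["Date"] = pd.to_datetime(df["Date"], format="%d-%m-%y")
df = df.sort_values(["ID", "Date"]).reset_index(drop=True)

# Time-based rolling window per ID: for each row, the minimum of X over
# the interval (Date - N days, Date], which includes the current row.
df[f"X_min_{N_DAYS}d"] = (
    df.set_index("Date")
      .groupby("ID")["X"]
      .transform(lambda s: s.rolling(f"{N_DAYS}D").min())
      .to_numpy()
)
print(df)
```

With the default right-closed window, the row for 20-07-19 does not see 17-07-19, because that date lies exactly N days back and the left edge is open.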

Decipher intricate JSON data

After converting the contents of a YAML file into a JSON-style object, I tried to serialize it but ran into errors:

    {'info': {'city': 'Southampton', 'dates': [datetime.date(2005, 6, 13)], 'gender': 'male', ...

python json pandas export-to-csv json-deserialization
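The excerpt does not show the error message, but one likely cause is the datetime.date value: the standard json module cannot encode dates and raises TypeError. A sketch assuming that is the problem: pass a default= hook to json.dumps that converts dates to ISO strings, then, if a CSV is the goal (as the export-to-csv tag suggests), flatten the result with pd.json_normalize. The dictionary below is only the visible fragment of the excerpt.

```python
import datetime
import json

import pandas as pd

# Fragment shaped like the excerpt; the remaining keys are elided there.
data = {
    "info": {
        "city": "Southampton",
        "dates": [datetime.date(2005, 6, 13)],
        "gender": "male",
    }
}

def to_jsonable(obj):
    """Fallback for objects the json module cannot encode natively."""
    if isinstance(obj, (datetime.date, datetime.datetime)):
        return obj.isoformat()
    raise TypeError(f"Object of type {type(obj).__name__} is not JSON serializable")

text = json.dumps(data, default=to_jsonable)

# Nested keys become dotted column names such as "info.city".
flat = pd.json_normalize(json.loads(text))
print(text)
print(flat)
```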

Decrease the index's level

After creating a pivot table, I have the result below, showing the grades of the customers who visited my stores. Using the 'droplevel' method, I managed to flatten the column header into a single level. Now I want to do the same for the index, removing 'Grad ...

python pandas dataframe multiple-columns flatten
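A sketch using hypothetical store data (the names region, store and grade are made up; the question's actual level name is cut off). DataFrame.droplevel works on either axis, so the same call that flattens the column header can drop a level from the row index.

```python
import pandas as pd

# Hypothetical visit records; column names are made up for illustration.
visits = pd.DataFrame({
    "region": ["North", "North", "South", "South"],
    "store": ["S1", "S2", "S3", "S3"],
    "grade": ["A", "B", "A", "B"],
    "count": [5, 3, 4, 2],
})

pt = visits.pivot_table(index=["region", "store"], columns="grade",
                        values=["count"], aggfunc="sum")

# Columns are a two-level MultiIndex ("count", grade); keep only the grade level.
pt = pt.droplevel(0, axis=1)
# Same idea for the rows: drop the "region" level, keeping "store".
pt = pt.droplevel("region", axis=0)
# Optionally clear the leftover axis name shown above the index.
pt = pt.rename_axis(columns=None)
print(pt)
```

If the level to remove should become a regular column instead of disappearing, reset_index(level=...) is the usual alternative to droplevel.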

Unexpectedly large dataset for the Test and Training Sets

I am developing a predictive model using linear regression on a dataset of 157,673 records. The data is stored in a CSV file in this format:

    Timestamp,Signal_1,Signal_2,Signal_3,Signal_4,Signal_5
    2021-04-13 ...

python pandas numpy machine-learning scikit-learn

Find similarities between two string columns in a Python pandas dataframe and store the common strings in a new column

I am working with two pandas dataframes, df1 and df2: df1: df2: item_name item_cleaned abc xyz Def xuy DEF Ghi s GHI lsoe Abc p ABc ois To solve my problem, I need to create a function that can compare the valu ...

python pandas database compare matching

Eliminate repeated datetime index values by including small increments of timedelta

Here is the provided data:

    n = 8
    np.random.seed(42)
    df = pd.DataFrame(index=[dt.datetime(2020,3,31,9,25) + dt.timedelta(seconds=x) for x in np.random.randint(0,10000,size=n).tolist()], data=np.random.randint(0,10 ...

python pandas datetime
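One common approach, sketched here with a small hypothetical frame rather than the truncated data above: number each repeat of a timestamp with groupby(level=0).cumcount() and add that many small increments to the index. The 1 ms increment is an assumption; if real timestamps can already differ by less than that, a shifted repeat could collide with an existing value, so checking index.is_unique afterwards is worthwhile.

```python
import pandas as pd

# Hypothetical frame with repeated timestamps in the index.
idx = pd.to_datetime(["2020-03-31 09:25:00", "2020-03-31 09:25:00",
                      "2020-03-31 09:30:00", "2020-03-31 09:25:00"])
df = pd.DataFrame({"value": [1, 2, 3, 4]}, index=idx)

# Number each repeat of a timestamp 0, 1, 2, ... in order of appearance.
dup_rank = df.groupby(level=0).cumcount()

# Shift the k-th repeat by k milliseconds (increment size is an assumption).
df.index = df.index + pd.to_timedelta(dup_rank.to_numpy(), unit="ms")
print(df.index.is_unique)
```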

Developing a data frame using a list in Pandas

I'm struggling to convert my data into a pandas DataFrame. It should be an easy task, but I can't figure it out. I have the headers and the web data, but I'm stuck on turning the data into a list for the DataFrame constructor.

    from selenium impor ...

python pandas dataframe selenium database-design
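Assuming the scraped cells arrive as one flat list in row order (the excerpt cuts off before the actual code, so this layout and the names headers and cells are hypothetical), the list can be chunked into rows and passed to the DataFrame constructor together with the headers.

```python
import pandas as pd

# Hypothetical scraped values: headers plus a flat list of cell texts.
headers = ["Name", "Price", "Stock"]
cells = ["Widget", "9.99", "12", "Gadget", "19.50", "3"]

# Chunk the flat list into rows of len(headers) cells each.
width = len(headers)
rows = [cells[i:i + width] for i in range(0, len(cells), width)]

df = pd.DataFrame(rows, columns=headers)
print(df)
```

Scraped values are strings, so numeric columns may still need pd.to_numeric afterwards.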

Setting values for a specific group of rows within a Pandas dataframe

I want to apply conditions based on the index values of a pandas DataFrame.

    class test():
        def __init__(self):
            self.l = 1396633637830123000
            self.dfa = pd.DataFrame(np.arange(20).reshape(10,2), columns = ['A', 'B'], inde ...
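A sketch of conditional assignment on index values, outside the class wrapper for brevity. The start value comes from the excerpt; the step between index values and the range bounds are hypothetical, since the index definition is cut off. Building a boolean mask from dfa.index and assigning through .loc updates the frame in place and avoids chained-indexing pitfalls.

```python
import numpy as np
import pandas as pd

# Start value taken from the excerpt; the step is a hypothetical spacing.
start = 1396633637830123000
step = 1_000_000
dfa = pd.DataFrame(np.arange(20).reshape(10, 2), columns=["A", "B"],
                   index=start + step * np.arange(10))

# Boolean mask over the index: rows whose index falls in a chosen range.
lo, hi = start + 2 * step, start + 5 * step
mask = (dfa.index >= lo) & (dfa.index <= hi)

# .loc with a mask assigns to the original frame (no chained indexing).
dfa.loc[mask, "B"] = -1
print(dfa)
```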

Tips for combining cells partially in a vertical direction within the pandas library

Converting boolean values to string format with Pandas

pandas locate_similar_values function returns no results

Transforming individual row values into columns and utilizing them for reference data retrieval

Encoding a list of categories as strings for creating Pandas dummies

Struggling to find the average of values across multiple rows sharing a common identifier, without using a column or slicing method

Encountering a TypeError while attempting to utilize Django and Pandas for displaying data in an HTML format

Filtering pandas dataframe to only show rows from certain months

Utilizing Pandas to Transform Unique Row Values into Columns, Similar to a Pivot Table

Populating a DataFrame cell with a list based on two conditions for removing elements within the list

I'm having trouble understanding the Python pipeline syntax. Can anyone provide an explanation?

The Sklearn KNN Imputer has gaps in its data

What are some effective ways to filter out specific string patterns while using Pandas?

Is there a way to monitor the number of rows being inserted into a table while the insertion process is still in progress?

Is there a way to extract a specific substring from a pandas dataframe using a provided list for filtering?

What is the best way to sum values from a specific column only if there is a matching string in another?

Choose a particular level within the MultiIndex

Error thrown by pandas read_csv function: ValueError

The pandas DataFrame is incrementing one column based on the value in another column and keeping track of the counts

Reshaping grouped data by converting group categories into fields (using GraphLab or a pandas DataFrame)

Generating a stratified K-Fold split for training, testing, and validation datasets

Encountered an issue with tallying the frequency of values in a dataFrame using specific columns for grouping

Generating complex JSON structures from CSV files containing incomplete rows

Other options instead of employing an iterator for naming variables

Execute a function on every pair of rows from a dataframe and columns from another dataframe

Choosing which columns to copy from a Pandas DataFrame

Generate summary columns using a provided list of headers

PANDAS: Transforming arrays into individual numbers in a list

Issue encountered: Jupyter Notebook Import Error - unable to import attribute 'np_version_under1p17' from 'pandas.compat.numpy'

Transforming JSON data into a pandas DataFrame using Python with examples from yahoo_financials

Traversing through different levels of a DataFrame to apply filters

Calculate the mean of a subset of rows within various pandas columns

Checking the authenticity of user access records using pandas data structures

The dtype attribute of a Pandas DataFrame

What is the best method to assign np.nan values to a series based on multiple conditions?

Converting JSON data to a pandas DataFrame requires the list indices to be integers

Pandas read_csv function encounters a MemoryError issue

Use DataFrame columns interactively and filter them with a string-matching statement

Tips on deleting rows in a pandas dataframe based on the value in the first column

Extracting integer values from strings within a DataFrame column containing colons

Python code to transform a dictionary into binary format

What's the best way to combine these date entries into monthly groups?

Using Python Pandas: Utilizing .apply() to pass multiple arguments or column values to a custom function

Generate additional smaller DataFrames by using a groupby function on the original DataFrame

Struggling to open a CSV file using pandas

Python code to verify if a specific condition is satisfied within a certain time period

Problem with a personalized query involving an agent utilizing LangChain and GPT-4

Calculation of rolling median using pandas over a 3-month period

Converting a Pandas dataframe to JSON format: {name of column1: [corresponding values], name of column2: [corresponding values], name

Exceeded server capacity while attempting to extract pricing information from a search bar

Even after removing rows with a certain value, Pandas' `value_counts` function continues to display that dropped value with a count

Place the string into a dataframe

Pandas does not have the capability to interpret Excel data as plain text

Pandas exhibits inconsistent behavior when rounding to the nearest hour

Is it possible to change the structure of a Pandas DataFrame outside of its designated class?

Is it possible to utilize a yyyy-mm-w format when plotting datetime data?

Divide the rows of a pandas dataframe into separate columns

Changing dictionary rows into individual columns in pandas dataframes

Transform JSON data into a table format using Python's nested import mechanism

Retrieve characteristics from the initial dataset utilized in establishing a TensorFlow dataset

Is it possible to locate a specific column name within an Excel spreadsheet using Pandas?

Using the pandas library, you can save and manage numerous data sets within a single h5 file by utilizing the pd

How can we group data by minute, hour, day, month, and year?

Transform the information into a matrix data structure

Using Tweepy to pull tweets from Twitter