Python pandas: filtering Excel data fails when the filter value contains an em-dash or hyphen

I'm having trouble reading an Excel file and printing the rows where the 'Category' column matches a specific value that contains an em-dash or hyphen. Any other filter value works fine. I'm looking for help getting this filter to work. The Excel data is shown below; the value I filter on in the 'Category' column appears with an em-dash once the file is opened.

Contents of the Excel file test.xlsx, which is filtered on the 'Category' column:

Name    Age  Category
Tom     15   cata
Joseph  21   catb
Krish   22   cata
John    32   Cat – AB

import pandas as pd
from pathlib import Path

DATA_DIR = Path.cwd() / r'E:'
excelA = DATA_DIR / 'Test.xlsx'

df = pd.read_excel(excelA)

values1 = df

# The filter below does not work because of the em dash,
# but it works if you use catg = ['cata'] instead
catg = ['Cat – AB']

df_new = df[df['Category'].isin(catg)]

print(df_new)

Answer №1

To simplify the comparison, consider replacing the dash in the 'Category' values with a regular hyphen (-) before filtering.

def replace_em_dash(word):
    return word.replace(chr(8211), chr(45))

df['Category'] = df['Category'].apply(replace_em_dash)

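The Unicode code point replaced here is 8211 (U+2013), and the regular hyphen is 45 (U+002D). Note that chr(8211) is strictly an en dash; an em dash is chr(8212) (U+2014), so use that value instead if the cell really contains an em dash.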
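
If it is unclear which dash the cell actually contains, one way to check is to print the code point and Unicode name of every non-ASCII character in a value. This is a minimal sketch; the helper name non_ascii_chars and the sample string are hypothetical and only stand in for a real cell from the 'Category' column:

```python
import unicodedata


def non_ascii_chars(text):
    """Return (character, code point, Unicode name) for each non-ASCII character."""
    return [
        (ch, ord(ch), unicodedata.name(ch, "UNKNOWN"))
        for ch in text
        if ord(ch) > 127
    ]


# Hypothetical sample value; in practice pass a real cell,
# e.g. df['Category'].iloc[3]
sample = "Cat \u2013 AB"
print(non_ascii_chars(sample))  # [('–', 8211, 'EN DASH')]
```

Whatever code point is reported is the one to pass to chr() in the replacement.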

df[df['Category'] == 'Cat - AB']

   Name  Age  Category
3  John   32  Cat - AB
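
The spreadsheet may contain other dash variants as well (for example an em dash, U+2014), so a slightly more general approach is to map several dash-like characters to a hyphen in one pass. The sketch below uses sample data that mirrors the table in the question instead of reading the real file. The names DASHES and normalize_dashes are hypothetical, and the set of characters is an assumption to adjust as needed:

```python
import pandas as pd

# Sample data mirroring the table in the question (not read from the real file).
df = pd.DataFrame({
    "Name": ["Tom", "Joseph", "Krish", "John"],
    "Age": [15, 21, 22, 32],
    "Category": ["cata", "catb", "cata", "Cat \u2013 AB"],
})

# Assumed set of dash-like characters: hyphen, non-breaking hyphen,
# figure dash, en dash, em dash, minus sign.
DASHES = "\u2010\u2011\u2012\u2013\u2014\u2212"
DASH_TABLE = str.maketrans({ch: "-" for ch in DASHES})


def normalize_dashes(series):
    """Replace every dash-like character in a string Series with '-'."""
    return series.str.translate(DASH_TABLE)


df["Category"] = normalize_dashes(df["Category"])

# The filter can now be written with a plain keyboard hyphen.
df_new = df[df["Category"].isin(["Cat - AB"])]
print(df_new)
```

Series.str.translate leaves missing values (NaN) untouched, so this also works when some 'Category' cells are empty.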
