Regex works on Regex101 but returns an empty list in a Jupyter notebook

import re
with open('names.txt') as f:
    data = f.readlines()

twitter_pattern = re.compile(r"\s{1}[@]\w+")

twitter_match = twitter_pattern.findall(str(data))
print(twitter_match)

names.txt contains a list of full names, phone numbers, and Twitter handles. The pattern \s{1}[@]\w+ is expected to retrieve only the Twitter handles, but it returns an empty list. The same pattern works correctly on Regex101, yet fails when run in a Jupyter notebook.

The contents of the file match the test string used in the Regex101 link:

Osterberg, Sven-Erik    [email protected]       Governor, Norrbotten    @sverik
, Tim   [email protected]        Enchanter, Killer Rabbit Cave
Butz, Ryan  [email protected]  (555) 555-5543  CEO, Coding Temple  @ryanbutz
Doctor, The [email protected]       Time Lord, Gallifrey
Exampleson, Example [email protected]  555-555-5552    Example, Example Co.    @example
Pael, Ripal [email protected] (555) 555-5553  Teacher, Coding Temple  @ripalp

Answer №1

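readlines() reads the file as a list of strings, one per line, and each string keeps its trailing line break.

The document

Hello
World

produces the list (assuming the file ends with a line break)

['Hello\n', 'World\n']

str(data) is the text representation of that list, not the text of the file. Printed, it looks like

['Hello\n', 'World\n']

where \n is now a literal backslash followed by the letter n, because whitespace characters such as line breaks and tabs are written out as escape sequences.

In this scenario, you end up with [ and ], a lot of extra quotes and commas, and escaped whitespace. If the columns in names.txt are separated by tabs, each tab becomes the two characters \t, so there is no longer a whitespace character in front of the Twitter handle and \s has nothing to match.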
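The effect can be reproduced without the file. This is a minimal sketch, assuming the fields are tab-separated; the single sample line is hypothetical and stands in for what readlines() would return.

```python
import re

# Hypothetical sample standing in for the result of readlines();
# the fields are assumed to be tab-separated.
lines = ["Butz, Ryan\tCEO, Coding Temple\t@ryanbutz\n"]

# str() of a list uses repr() for each item, so the tab is written
# as the two characters backslash and t.
as_text = str(lines)
print(as_text)

pattern = re.compile(r"\s@\w+")
in_repr = pattern.findall(as_text)    # no whitespace left before "@"
in_line = pattern.findall(lines[0])   # the raw line still has a real tab
print(in_repr)
print(in_line)
```

The first search finds nothing because the character before @ is the letter t, while the second finds the handle together with its leading tab.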

To fix this, read the file as a single string instead of a list of lines.

with open('names.txt') as f:
    data = f.read()              # use read() instead of readlines()

Furthermore, keep the regex simple and easy to read. \s{1}[@]\w+ is equivalent to \s@\w+, so use the shorter form.
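Putting both suggestions together might look like the sketch below. The sample data is hypothetical (tab-separated, emails omitted), and io.StringIO stands in for open('names.txt') so the snippet runs on its own. The capturing group is an optional addition, not part of the original pattern; it returns each handle without the leading whitespace.

```python
import io
import re

# Hypothetical sample data; io.StringIO stands in for open('names.txt').
sample = (
    "Osterberg, Sven-Erik\tGovernor, Norrbotten\t@sverik\n"
    "Doctor, The\tTime Lord, Gallifrey\n"
    "Butz, Ryan\t(555) 555-5543\tCEO, Coding Temple\t@ryanbutz\n"
)

with io.StringIO(sample) as f:
    text = f.read()  # one string; tabs and newlines stay real whitespace

# The capturing group returns the handle without the leading whitespace.
handles = re.findall(r"\s(@\w+)", text)
print(handles)  # ['@sverik', '@ryanbutz']
```

Because the pattern requires whitespace directly before @, an email address in the same line would not match, since its @ follows a word character.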
