Looking to pull out certain numbers from a mix of text and symbols in Excel columns using Python?

I am new to automating tasks in Python involving Excel. I need assistance with extracting specific numbers that are surrounded by different characters within columns.

Actual DATA 

                Column A
     kDGK~202287653976 ~LD ~ 8904567
     SIP~12335678 ~202267858245~LD~8936272
     SIN112592~ LD ~ SIN112592
     0194X0322 ~ LD ~ 202243296291
     

Expected Output

                Column B
             202287653976
             202267858245
                  -
             202243296291
     

I want to extract 12 digits starting with "2022" and leave a blank cell for those that don't meet the condition. It seems like a simple task, but I can't seem to figure out how to do it.

Thank you in advance for your help.

Answer №1

If you want to extract a number in Python, you can use regular expressions:

import re
col_a = 'kDGK~202287653976 ~LD ~ 8904567'
match = re.search(r'(2022\d+)', col_a)
if match:
    col_b = match[0]

The variable match will be None if nothing is found, or it will be a "match object" - in this case, match[0] will give you the desired number.
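
To leave the cell blank when nothing matches, as the question asks, the check above can be wrapped in a small helper. This is only a minimal sketch: the function name extract_2022_number is hypothetical (it is not part of the original answer), and the sample strings are the four rows from the question.

```python
import re

def extract_2022_number(text):
    """Return the first run of digits starting with 2022, or '' if there is none."""
    # Hypothetical helper: empty cells arrive as None, and other
    # non-string values (e.g. numbers) are converted to str first.
    if text is None:
        return ''
    match = re.search(r'2022\d+', str(text))
    return match[0] if match else ''

rows = [
    'kDGK~202287653976 ~LD ~ 8904567',
    'SIP~12335678 ~202267858245~LD~8936272',
    'SIN112592~ LD ~ SIN112592',
    '0194X0322 ~ LD ~ 202243296291',
]
results = [extract_2022_number(r) for r in rows]
print(results)
```

The third row contains no "2022" at all, so the helper returns an empty string for it, which can be written to column B as a blank cell.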

UPDATE

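The regex above will find "2022" followed by one or more digits. If you specifically want exactly 8 digits after "2022" (12 digits in total), you should use re.search(r'(2022\d{8})', col_a) instead. Note that this pattern still matches the first 12 digits of a longer run of digits.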
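
If a longer number that merely begins with "2022" should be rejected rather than truncated, one option is to add lookarounds so the 12 digits cannot be preceded or followed by another digit. This is a sketch under that assumption; the helper name find_exact and the 16-digit sample string are made up for illustration and do not come from the question.

```python
import re

# Exactly 12 digits starting with 2022, not embedded in a longer run of digits.
PATTERN = re.compile(r'(?<!\d)2022\d{8}(?!\d)')

def find_exact(text):
    """Return the 12-digit match, or None if the text has no such number."""
    match = PATTERN.search(text)
    return match[0] if match else None

# Sample row from the question: a standalone 12-digit number.
from_question = find_exact('kDGK~202287653976 ~LD ~ 8904567')

# Hypothetical row: 16 digits starting with 2022, so it is rejected here,
# whereas the plain pattern 2022\d{8} would return its first 12 digits.
too_long = find_exact('X~2022876539761234~Y')

print(from_question, too_long)
```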

UPDATE 2

If you are working with openpyxl, the complete code would look something like this:

from openpyxl import load_workbook
import re

wb = load_workbook('somefile.xlsx')
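ws = wb.active

# openpyxl rows and columns are 1-indexed, so start at 1 and include the last row
for row in range(1, ws.max_row + 1):
    value = ws.cell(row, 1).value
    if value is None:  # skip empty cells; re.search() needs a string
        continue
    match = re.search(r'(2022\d+)', str(value))
    if match:
        ws.cell(row, 2).value = match[0]

wb.save('somefile.xlsx')  # write the changes back to the file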
