Locating specific phrases within a vast text document using Python

Here is the code I have written:

with open("WinUpdates.txt") as f:
    data=[]
    for elem in f:
        data.append(elem)

with open("checked.txt", "w") as f:
    check=True
    for item in data:
        if "KB2982791" in item:
            f.write("KB2982791\n")
            check=False
        if "KB2970228" in item:
            f.write("KB2970228\n")
            check=False
        if "KB2918614" in item:
            f.write("KB2918614\n")
            check=False
        if "KB2993651" in item:
            f.write("KB2993651\n")
            check=False
        if "KB2975719" in item:
            f.write("KB2975719\n")
            check=False
        if "KB2975331" in item:
            f.write("KB2975331\n")
            check=False
        if "KB2506212" in item:
            f.write("KB2506212\n")
            check=False
        if "KB3004394" in item:
            f.write("KB3004394\n")
            check=False
        if "KB3114409" in item:
            f.write("KB3114409\n")
            check=False
        if "KB3114570" in item:
            f.write("KB3114570\n")
            check=False

    if check:
        f.write("No faulty Windows Updates found!")

The file "WinUpdates.txt" contains numerous lines like the ones shown below:

http://support.microsoft.com/?kbid=2980245 RECHTS Update
KB2980245 NT-AUTORITÄT\SYSTEM 8/18/2014
http://support.microsoft.com/?kbid=2981580 RECHTS Update
KB2981580 NT-AUTORITÄT\SYSTEM 8/18/2014
... (more content)

I know that four of these updates are in the file, but my code doesn't find them when I run it. When I write the contents of the data list to a new text file, they look correct. Why is the code failing to find the updates?

Answer №1

Just a heads up: you can make your code much more concise by looping over a tuple of KB IDs instead of writing a separate if statement for each one. Also, since the new data file is only 63342 bytes, you can simply read its entire contents into a single string instead of storing it as a list of strings.

kb_ids = (
    "KB2982791",
    "KB2970228",
    "KB2918614",
    "KB2993651",
    "KB2975719",
    "KB2975331",
    "KB2506212",
    "KB3004394",
    "KB3114409",
    "KB3114570",
)

with open("WinUpdates.txt") as f:
    data = f.read()

check = True
with open("checked.txt", "w") as f:
    for kb in kb_ids:
        if kb in data:
            f.write(kb + "\n")
            check = False

    if check:
        f.write("No faulty Windows Updates found!\n")

Here are the contents of checked.txt when the script is run against the data file you provided:

KB2970228
KB2918614
KB2993651
KB2506212
KB3004394

Note that this script writes the KB IDs it finds in the order they are listed in the kb_ids tuple, not in the order they appear in "WinUpdates.txt".
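
If you would rather have the IDs in the order they first appear in the file, one option is to sort the matches by the position str.find returns. A minimal sketch, assuming the whole file has already been read into a string as above; the helper name found_in_file_order and the sample text are hypothetical:

```python
def found_in_file_order(data, kb_ids):
    """Return the KB IDs present in data, ordered by their first occurrence."""
    # Keep only the IDs that occur somewhere in the text.
    found = [kb for kb in kb_ids if kb in data]
    # str.find gives the index of the first occurrence, so sorting on it
    # reproduces the order in which the IDs appear in the file.
    return sorted(found, key=data.find)


sample = "KB3004394 ...\nKB2970228 ...\nKB2506212 ...\n"
print(found_in_file_order(sample, ("KB2970228", "KB2506212", "KB3004394", "KB3114570")))
# ['KB3004394', 'KB2970228', 'KB2506212']
```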

If the file is large, searching the whole string once for every KB ID may not be the fastest approach. It's worth running some timing tests (e.g. with the timeit module) to find out what works best for your data.
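
A minimal timing sketch, assuming the data is already in a string. It compares one `in` search per ID with a single pass of a regular expression built from all the IDs; the regex variant is a swapped-in alternative, not something this answer proposes. The function names and the synthetic sample are made up, so the timings say nothing about your real file. The regex version also assumes no ID is a prefix of another, which holds for these fixed-length IDs.

```python
import re
import timeit

KB_IDS = ("KB2982791", "KB2970228", "KB2918614", "KB2993651", "KB2975719",
          "KB2975331", "KB2506212", "KB3004394", "KB3114409", "KB3114570")


def scan_per_id(data):
    # One substring search over the whole text for each KB ID.
    return {kb for kb in KB_IDS if kb in data}


# A single compiled pattern that matches any one of the IDs.
PATTERN = re.compile("|".join(map(re.escape, KB_IDS)))


def scan_once(data):
    # One regex pass over the text, collecting every match.
    return set(PATTERN.findall(data))


# Hypothetical test data: lots of filler lines plus two of the wanted IDs.
sample = "KB1234567 RECHTS Update\n" * 2000 + "KB2970228\nKB3004394\n"

# Both approaches should agree before comparing their speed.
assert scan_per_id(sample) == scan_once(sample)
for func in (scan_per_id, scan_once):
    seconds = timeit.timeit(lambda: func(sample), number=100)
    print(f"{func.__name__}: {seconds:.4f} s for 100 runs")
```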

If you do want to read a file into a list, you don't need a for loop; you can simply do this:

with open("WinUpdates.txt") as f:
    data = f.readlines()
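
Be aware that readlines() keeps the trailing "\n" on each line. That is harmless for an `in` check, but it matters if you compare whole lines. If you want the lines without their endings, read().splitlines() is one alternative. A small sketch, with io.StringIO standing in for the real file and made-up sample text:

```python
import io

text = "KB2980245 RECHTS Update\nKB2981580 RECHTS Update\n"

# readlines() keeps the "\n" at the end of every line.
with_newlines = io.StringIO(text).readlines()

# read().splitlines() drops the line endings.
without_newlines = io.StringIO(text).read().splitlines()

print(with_newlines[0].endswith("\n"))     # True
print(without_newlines[0].endswith("\n"))  # False
```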

Alternatively, you can process the file line by line without loading it into a list:

kb_ids = (
    "KB2982791",
    "KB2970228",
    "KB2918614",
    "KB2993651",
    "KB2975719",
    "KB2975331",
    "KB2506212",
    "KB3004394",
    "KB3114409",
    "KB3114570",
)

check = True
with open("WinUpdates.txt") as fin:
    with open("checked.txt", "w") as fout:
        for data in fin:
            for kb in kb_ids:
                if kb in data:
                    fout.write(kb + "\n")
                    check = False

        if check:
            fout.write("No faulty Windows Updates found!\n")
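
One caveat with the line-by-line version: if the same ID shows up on more than one line, it is written more than once. A minimal sketch that streams any iterable of lines and reports each ID only once, in the order it is first seen. The function name find_kbs is hypothetical, and io.StringIO stands in for the open file:

```python
import io


def find_kbs(lines, kb_ids):
    """Return each KB ID found in lines once, in the order first seen."""
    seen = []
    for line in lines:
        for kb in kb_ids:
            # Record an ID only the first time it is found.
            if kb in line and kb not in seen:
                seen.append(kb)
    return seen


fake_file = io.StringIO("KB3004394 a\nKB2970228 b\nKB3004394 c\n")
print(find_kbs(fake_file, ("KB2970228", "KB3004394")))
# ['KB3004394', 'KB2970228']
```

With the real file you would pass the open file object, e.g. `with open("WinUpdates.txt") as fin: found = find_kbs(fin, kb_ids)`, and then write each entry of found to checked.txt, or the "No faulty Windows Updates found!" message if the list is empty.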

Since Python 2.7 (and 3.1), the two with statements can be combined into a single one by separating the open() calls with a comma.
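
For example, the line-by-line version could open both files in one with statement. The sketch below first writes a small stand-in WinUpdates.txt into a temporary directory so it runs on its own; that sample content and the shortened kb_ids tuple are made up for illustration:

```python
import os
import tempfile

kb_ids = ("KB2970228", "KB3004394", "KB3114570")

with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "WinUpdates.txt")
    dst = os.path.join(tmp, "checked.txt")

    # Hypothetical stand-in for the real WinUpdates.txt.
    with open(src, "w") as f:
        f.write("KB2970228 RECHTS Update 8/18/2014\nKB3004394 RECHTS Update 8/18/2014\n")

    check = True
    # Both files are opened in a single with statement.
    with open(src) as fin, open(dst, "w") as fout:
        for line in fin:
            for kb in kb_ids:
                if kb in line:
                    fout.write(kb + "\n")
                    check = False

        if check:
            fout.write("No faulty Windows Updates found!\n")

    # Read the result back before the temporary directory is deleted.
    with open(dst) as f:
        result = f.read()

print(result)
```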

Answer №2

I have made the necessary additions and corrections; see the comments in the code for explanations. This solution worked for me, and I believe it will work for you too. Have a great day!

with open("WinUpdates.txt", "r") as f:  # "r" (read) is the default mode, so it can also be omitted
    data = f.read()  # Instead of converting data into a list, keeping it as one string is enough

with open("checked.txt", "w") as f:
    check = True
    if "KB2982791" in data:
        f.write("KB2982791\n")
        check = False
    if "KB2970228" in data:
        f.write("KB2970228\n")
        check = False
    if "KB2918614" in data:
        f.write("KB2918614\n")
        check = False
    if "KB2993651" in data:
        f.write("KB2993651\n")
        check = False
    if "KB2975719" in data:
        f.write("KB2975719\n")
        check = False
    if "KB2975331" in data:
        f.write("KB2975331\n")
        check = False
    if "KB2506212" in data:
        f.write("KB2506212\n")
        check = False
    if "KB3004394" in data:
        f.write("KB3004394\n")
        check = False
    if "KB3114409" in data:
        f.write("KB3114409\n")
        check = False
    if "KB3114570" in data:
        f.write("KB3114570\n")
        check = False

    if check:
        f.write("No faulty Windows Updates found!\n")

Similar questions

If you have not found the answer to your question, or if you are interested in this topic, take a look at the similar questions below.

Guide on processing a hefty JSON file by rounding decimals to the nearest integer and calculating the average of y values when there are duplicate x values

I am currently exploring ways to efficiently filter and parse through extensive JSON data sets. My main goal is to round the x values to the nearest integer. In cases where there are duplicate entries, I aim to calculate the average of the y values while ...

What is the process for locating elements with the listitem role using Selenium?

Having trouble accessing and cycling through a collection of list items on a webpage using Python. The elements to be fetched are defined as follows, with 4 present on the page: <div data-test-component="StencilReactView" role="listitem& ...

What steps should be taken to set up a Python Selenium project for use on a client's machine?

As a new freelance python programmer, I recently took on a project to create a script that scrapes specific information online. Nothing sketchy, just counting how often certain keywords show up in search results. I used Selenium to write the script, but n ...

Exploring Deep Q Learning **WITHOUT** the use of OpenAI Gym

Seeking tutorials or courses on q learning without relying on open ai gym. I am working on a convolutional q learning model using pytorch and open ai gym, which is straightforward. However, applying it to environments outside of open ai gym, especially n ...

Steps for revealing the actual URL using Selenium-IDE

(I'm French so please excuse any language errors...) I am trying to use Selenium to download PDFs from certain websites. I have attempted using the web-driver option, but I have around 500 URLs to navigate through using Firefox... So, utilizing Selen ...

Python Scrapy | Techniques for transferring the response data to the main function within the spider

I have been searching extensively for a solution using various search engines, but I may not be entering the correct keywords. Although I am aware that I can use the shell to manipulate CSS and XPath selectors right away, I am curious to know if it is poss ...

Does the keyword type in Python refer to a function or a class that is built-in?

There is often confusion around the definition of type in Python. Some believe it to be a built-in function when provided with one argument, and a metaclass when provided with three arguments. However, according to Python's official documentation, th ...

an inplace operation has altered one of the necessary variables for gradient calculation:

While attempting to calculate the loss of the policy target network in Deep Deterministic Policy Gradient Algorithms using PyTorch 1.5, an error is encountered as shown below. File "F:\agents\ddpg.py", line 128, in train_model polic ...

Discovering collections of vectors that add up to zero

I am working with four arrays, each containing 3 arrays. For example: set1 = [array([1, 0, 0]), array([-1, 0, 0]), array([0, 1, 0]), ...] My goal is to determine the number of combinations of vectors that sum to zero. The current solution involves nested ...

Is there a way for me to change related_name in inherited or child objects?

Consider the given models: class Module(models.Model): pass class Content(models.Model): module = models.ForeignKey(Module, related_name='contents') class Blog(Module): pass class Post(Content): pass I am looking to retrieve al ...

Having trouble continuously clicking the 'more' button to access all the complete reviews

I have developed a Python script using Selenium to extract all the reviews from a specific page on Google Maps. This page contains numerous reviews that are only visible when scrolling down. My script successfully retrieves all of them. However, I am curr ...

Deactivating web links in PowerPoint using python-pptx

Being relatively new to XML and the python-pptx module, my goal is to eliminate a single hyperlink that appears on every page. My approach thus far has involved retrieving my files, converting them to zip format, and then extracting them into separate fol ...

Establishing the encoding format

My Firebird database is encoded in ISO-8859-1, but I'm struggling to correctly set it in my connection. I have attempted the following: conn = fdb.connect(dsn='mydatabase.fdb', user='***', password='***', charset=' ...

What could be causing the np.where function to return an empty tuple when checking if x is equal to the np.array([1,2,

To solve the problem of finding all the different indexes in np.where(x == [0,1,2,3,4,5...9323]), I am faced with the challenge of figuring out how to obtain specific indexes. When I try using the command np.where(x == np.array([1,2,3])), it returns an emp ...

Move spaces from one string to another in Python

Is it feasible in Python 2 to replicate whitespace elements such as spaces and tabs from one string to another? For instance, if the original string includes three leading spaces and a trailing tab like " Hi\t", can those exact whitespace characters ...

Could there be a more efficient method for accessing the file referenced in a Python stack trace?

When faced with a Python stack trace in either Output or Terminal, how can one quickly navigate to the specific file or line where the error occurred? I've noticed that Ctrl-click doesn't work for this purpose. I've been using ctrl-e and man ...

Python Selenium: How to locate elements using xpath in the presence of duplicate elements within the HTML code

Currently, I am utilizing selenium to extract data from a liquor sales website to streamline the process of adding product information to a spreadsheet. My workflow involves logging into the website using selenium and searching for the specific product. Wh ...

Django REST FrameWork JWT prohibits the provision of data and self-decoding

I currently have these API endpoints set up: urlpatterns += [ path('api-token-auth/', obtain_jwt_token), path('api-token-verify/', verify_jwt_token), path('api-token-refresh/', refresh_jwt_token), path('a ...

Error: The state of the element is invalid for importing a file

Currently, I am working on creating an automated test to import multiple files using the web application's UI. When the "Import" button is clicked, it launches Windows Explorer. The XPATH I'm using in my test is as follows: filename_field = //di ...

Using an `if else` statement to verify various conditions with only one input of text

I am experimenting with a modified version of the classic 3-Cup Monte Program. Students are using Brython in CodeHS Ide to create a game where they draw "cups" and randomly place a white ball under one of them. The drawing part is working perfectly fine. H ...