Python script to locate the identical sentence in two separate files

Hello there. I am a beginner in Python and need help writing code that searches an HTML file for exact keywords listed in a text file. For example, looking for the keywords from keyword.txt in data.html. Currently, my code matches on the first word alone instead of the entire sentence.

The keywords in my keyword file are:


Hello welcome
Hello welcome to this page
Hello world

My data file contains:


Hello
hello good day

Based on this scenario, it should return "no match", but it's currently returning "match found".

I also need assistance on how to make sure it searches for all keywords line by line against the HTML page.

Your help is greatly appreciated. Thank you in advance.

This is my current code:


import re

keyfile = 'keyword.txt'
testfile = 'data.txt'
keys = set(key.lower() for key in re.findall(r'\w+', open(keyfile , "r").readline()))
with open(testfile) as f:
    for line in f:
        words = set(word.lower() for word in re.findall(r'\w+', line))
        if keys & words:
            print "match found"

Answer №1

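Your current code reads only the first line of the keyword file and splits it into individual words, so keys ends up as {'hello', 'welcome'}, and the data line "Hello" shares the word "hello" with it. Change the line that builds keys to iterate over open(keyfile, "r") instead of

re.findall(r'\w+', open(keyfile, "r").readline())

so that entire lines are added to the key set, not just individual words. Additionally, make sure the matching part is modified to compare entire lines.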

Your updated code should resemble:

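keyfile = 'keyword.txt'
testfile = 'data.txt'

# Read every keyword line as one whole phrase. Strip the trailing
# newline and skip blank lines so the comparison below is reliable.
with open(keyfile) as kf:
    keys = set(line.strip().lower() for line in kf if line.strip())

with open(testfile) as f:
    for line in f:
        # Compare the entire line, not its individual words.
        if line.strip().lower() in keys:
            print("match found")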

Following these changes should address your issue.
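
If the keywords should be found anywhere inside the HTML page rather than as a line of their own, something along these lines may help. This is only a minimal sketch for Python 3, not a definitive solution: the names TextExtractor, html_to_text and find_phrases are made up for illustration, the tags are stripped with the standard library's html.parser, and each keyword is then searched for as a whole, case-insensitive phrase.

```python
import re
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Hypothetical helper: collects the text of an HTML document, ignoring tags."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        # Called by HTMLParser for every run of text between tags.
        self.chunks.append(data)


def html_to_text(html):
    """Return the visible text of html with whitespace collapsed to single spaces."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(" ".join(parser.chunks).split())


def find_phrases(keywords, text):
    """Return the keywords that occur in text as whole phrases, ignoring case."""
    found = []
    for keyword in keywords:
        phrase = " ".join(keyword.split())
        if not phrase:
            continue  # skip blank keyword lines
        # The lookarounds stop "Hello world" from matching inside "Hello worldwide".
        pattern = r"(?<!\w)" + re.escape(phrase) + r"(?!\w)"
        if re.search(pattern, text, flags=re.IGNORECASE):
            found.append(phrase)
    return found
```

To use it with your files, you could read keyword.txt into a list of lines, read data.html into one string, and print "match found" whenever find_phrases(keywords, html_to_text(html)) returns a non-empty list. Note that this simple extractor also keeps the text inside script and style elements, so those may need to be filtered out for real pages.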
