Error in nearest neighbor computation with custom distance function in sklearn

My attempt to use a custom distance function with sklearn's NearestNeighbors appears to be yielding incorrect distances. I have two vectors and typically compute their cosine similarity, then use 1 - cosine_similarity as the distance metric.

Here's the snippet of code in question:

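from sklearn.metrics.pairwise import cosine_similarity
from sklearn.neighbors import NearestNeighbors

def dist_fun(x, y):
    # cosine_similarity expects 2D inputs of shape (n_samples, n_features);
    # [0, 0] extracts the scalar that a metric callable must return
    return 1 - cosine_similarity(x.reshape(1, -1), y.reshape(1, -1))[0, 0]

# x is the 2D data array of shape (n_samples, n_features)
nbrs = NearestNeighbors(n_neighbors=499, algorithm='brute', metric=dist_fun)
nbrs.fit(x)
# kneighbors also expects a 2D query array
distances, indices = nbrs.kneighbors(x[10].reshape(1, -1))
print(distances)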

The output is perplexing: the distance values increase monotonically. This behavior persists even when I query nearest neighbors with different sample sizes.

array([[ -4.44089210e-16,   3.56164824e-03,   8.85347066e-02,
          9.26700271e-02,   9.58609825e-02,   9.64477012e-02,
          9.71035356e-02,   9.73237473e-02,   9.80660138e-02,
          9.80660138e-02,   9.80660138e-02,   9.83731564e-02,
          1.00054271e-01,   1.01234246e-01,   1.01811852e-01,
          1.01849141e-01,   1.02621459e-01,   1.03175060e-01,
          ... (remaining content omitted for brevity) ...
          1.72180127e-01]])

Answer №1

Expanding upon my previous comment, the NearestNeighbors (NN) algorithm will always output distances in ascending order, representing the K nearest neighbors from closest to furthest.

The distances array holds the distances from the query point to each of its K nearest neighbors, sorted in ascending order. The indices array holds the row index, within the fitted data, of the neighbor at each position, so indices[0][j] identifies the point whose distance is distances[0][j].
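To make the ordering concrete, here is a minimal sketch. It assumes random data in place of the asker's x, and it swaps in sklearn's built-in metric='cosine' (which computes 1 - cosine similarity) instead of the custom dist_fun, so the data and the small n_neighbors value are illustrative only.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

# Hypothetical data standing in for the asker's x: 50 samples, 8 features
rng = np.random.default_rng(0)
x = rng.random((50, 8))

# Built-in cosine distance (1 - cosine similarity), used here instead of dist_fun
nbrs = NearestNeighbors(n_neighbors=5, algorithm='brute', metric='cosine')
nbrs.fit(x)

# Query with one fitted point; kneighbors expects a 2D array
distances, indices = nbrs.kneighbors(x[10].reshape(1, -1))

# Distances come back sorted from closest to furthest
is_sorted = bool(np.all(np.diff(distances[0]) >= 0))
print(is_sorted)

# The query point is part of the fitted data, so it is its own nearest neighbor
print(indices[0][0])
```

Because the query point was part of the fitted data, the first entry is the point itself with a distance of about zero, which matches the tiny first value in the question's output.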

If you want to retrieve the original values, you can use the following code:

y = x[indices]

After executing this code snippet, the array y will contain the K nearest points for each query point, ordered from closest to furthest.
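A short sketch of what that indexing returns, assuming x is a 2D NumPy array. The data, the two-row query, and metric='cosine' are assumptions made for illustration, not the asker's setup.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

# Hypothetical data: 30 samples, 4 features
rng = np.random.default_rng(1)
x = rng.random((30, 4))

nbrs = NearestNeighbors(n_neighbors=3, algorithm='brute', metric='cosine').fit(x)

# Query with the first two fitted points
distances, indices = nbrs.kneighbors(x[:2])

# Fancy indexing with a (n_queries, K) index array yields
# an array of shape (n_queries, K, n_features)
y = x[indices]
print(y.shape)

# Neighbors of the first query point, closest first
nearest_to_first = y[0]
```

With a single query point, as in the question, y[0] holds the K neighbor vectors in the same order as distances[0].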
