What is the process for transforming a collection of linked pairs of identifiers into a grouping of identifiers?

I have a dataset with pairs (and sometimes triples) of IDs that act as links in a chain.

+------+-----+
| from | to  |
+------+-----+
| id1  | id2 |
| id2  | id3 |
| id4  | id5 |
+------+-----+

I am looking to organize these links into clusters or families:

+-----+----------+
| id  | familyID |
+-----+----------+
| id1 |        1 |
| id2 |        1 |
| id3 |        1 |
| id4 |        2 |
| id5 |        2 |
+-----+----------+

This involves grouping all chained links into a single family and assigning an ID to each family. In the example above, the first two rows create one family, while the last row creates another family.

Solution Approach

To accomplish this task, I plan to use node.js to process large batches of data rows efficiently and insert them into a table with assigned family IDs.

Challenge Encountered

One major challenge is dealing with the high volume of ID pairs within the dataset. Additionally, new IDs will need to be added over time, which requires updating existing families with new members.

Are there effective algorithms available for clustering pairs of data into families or clusters, taking scalability into account?

Answer №1

I experimented with two tables modeled on the ones in your question. The first table, Base, holds link data like what you already have.

Table Base, fromID, toID
Table chain, fromID, chainID (numeric, null allowed)

I first inserted every distinct ID from Base (both fromID and toID values) into chain with chainID set to null. A null chainID marks a row as unprocessed.

I then ran two statements repeatedly. The first one...

update chain c 
  set chainID = n 
  where chainID is null and exists ( select 1 from base b where b.fromID = c.fromID )
  order by fromID 
  limit 1

This assigns the next chain ID to the first row that lacks one (n has to be computed and incremented each time the statement runs).

The second statement propagates that chain ID to the linked records...

update chain c 
    join base b on b.toID = c.fromID
    join chain c1 on b.fromID = c1.fromID
    set c.chainID = c1.chainID 
    where c.chainID is null and c1.chainID is not null

Repeat the second update until it affects no rows, which means the current chain is complete. Then run the first update again to start the next chain, and so on. When the first update affects zero rows, every ID has been assigned to a chain.

If you try this approach, test it on more intricate data to see how well it holds up. Note that the second update only copies a chain ID from a link's fromID to its toID, so two links that share a toID but have different fromIDs can end up in separate chains.
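
To experiment outside the database, the seed-and-propagate loop described above can be mimicked in memory. This is only a minimal sketch in JavaScript: the function name assignChains and the array-of-pairs input are hypothetical, chosen for illustration, and the Map merely stands in for the chain table.

```javascript
// In-memory sketch of the two repeated updates.
// `links` stands in for the Base table; the returned Map stands in for chain.
function assignChains(links) {
  // Every distinct ID starts unprocessed (chainID = null).
  const chain = new Map();
  for (const [from, to] of links) {
    chain.set(from, null);
    chain.set(to, null);
  }

  let nextChainId = 1;
  for (;;) {
    // First update: seed the first unprocessed ID that appears as a fromID.
    const seed = [...chain.keys()]
      .filter((id) => chain.get(id) === null && links.some(([from]) => from === id))
      .sort()[0];
    if (seed === undefined) break; // "zero rows affected": nothing left to do
    chain.set(seed, nextChainId);

    // Second update: copy chain IDs from fromID to toID until nothing changes.
    let changed = true;
    while (changed) {
      changed = false;
      for (const [from, to] of links) {
        if (chain.get(to) === null && chain.get(from) !== null) {
          chain.set(to, chain.get(from));
          changed = true;
        }
      }
    }
    nextChainId += 1;
  }
  return chain;
}

const links = [["id1", "id2"], ["id2", "id3"], ["id4", "id5"]];
console.log(assignChains(links));
```

On the sample data from the question this puts id1, id2 and id3 in chain 1 and id4, id5 in chain 2. Like the SQL, the sketch propagates in one direction only (fromID to toID), so it is a convenient place to try out trickier inputs before running anything against real tables.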

Answer №2

This looks like a graph clustering problem, with familyID serving as the cluster number.

I found a relevant question related to this topic.

Check out the algorithm description here, which will need to be implemented based on the conditions provided.
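
As one possible illustration of graph clustering for this kind of data (not necessarily the algorithm referred to above), a union-find (disjoint-set) structure groups IDs as links arrive and also copes with links added later. The sketch below is in JavaScript under stated assumptions: the class name FamilyIndex and its methods are hypothetical, everything is kept in memory, and family numbers are simply handed out in order of first appearance.

```javascript
// Union-find (disjoint-set) sketch; class and method names are illustrative.
class FamilyIndex {
  constructor() {
    this.parent = new Map(); // id -> parent id; a root is its own parent
  }

  // Return the root representing the family of `id`, registering new IDs.
  find(id) {
    if (!this.parent.has(id)) this.parent.set(id, id);
    let root = id;
    while (this.parent.get(root) !== root) root = this.parent.get(root);
    // Path compression: point every visited node directly at the root.
    let node = id;
    while (node !== root) {
      const next = this.parent.get(node);
      this.parent.set(node, root);
      node = next;
    }
    return root;
  }

  // Record a link, merging the two families if they differ.
  addLink(from, to) {
    const a = this.find(from);
    const b = this.find(to);
    if (a !== b) this.parent.set(b, a);
  }

  // Number the families 1, 2, ... in order of first appearance.
  families() {
    const familyIds = new Map();
    const result = new Map();
    for (const id of this.parent.keys()) {
      const root = this.find(id);
      if (!familyIds.has(root)) familyIds.set(root, familyIds.size + 1);
      result.set(id, familyIds.get(root));
    }
    return result;
  }
}

const index = new FamilyIndex();
for (const [from, to] of [["id1", "id2"], ["id2", "id3"], ["id4", "id5"]]) {
  index.addLink(from, to);
}
console.log(index.families());
```

A later call such as index.addLink("id3", "id4") merges the two families into one. Family numbers computed this way are recomputed on each call to families(), so if familyID values are persisted in a table, the rows of a merged family would need to be updated as well.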
