What is the most effective method for storing large volumes of data on a daily basis that can be categorized for statistical analysis?

I am currently creating a unique tracking tool tailored for marketing campaigns. This specialized tool acts as a mediator between advertisements and landing pages, responsible for recording all user data including user-agent information, IP addresses, clicks on landing pages, and geocoding details such as country and ISP associated with the users' IPs.

However, I am facing some design challenges:

  • The volume of traffic on these campaigns is extremely high, potentially resulting in millions of rows being inserted daily. Since this system caters to multiple users, storing all data in a single table could lead to chaos. Perhaps dividing the data into separate tables, one per user, might alleviate this issue, but I am uncertain about this approach.
  • The data storage process must be executed swiftly (within milliseconds), leading me to believe that NodeJS would outperform PHP due to its speed and resource efficiency. I want to avoid server crashes caused by insufficient RAM.
  • For statistical analysis purposes, I require data grouping. While there may be one row for each user visiting a landing page, it is imperative to aggregate this information accurately in order to reflect the total number of impressions on said landing page. These queries need to be conducted quickly given the large dataset involved.
  • To geocode IP addresses effectively, precise details such as Country, ISP, and connection type are necessary. However, incorporating an API service for real-time data retrieval could potentially slow down the saving process. It needs to be done instantaneously without delaying execution.

Following the data-saving procedure, the system should promptly redirect to the landing page to prevent missing out on potential leads. Time sensitivity is crucial.

My primary objective revolves around finding optimal solutions for:

  • Effectively managing extensive databases
  • Rapidly capturing user data (in milliseconds)
  • If feasible, expediting ip geocoding while not hindering operations
  • Enhancing schema and query efficiency for generating statistics

Do you have any recommendations? Thank you in advance.

Answer №1

Avoid the chaos of having one table per user; it's not ideal.

Handling millions of rows daily, possibly at a rapid rate, may require a staging approach where multiple rows are gathered and batch-inserted. Before delving deeper, provide details on the data flow: Single or multiple clients? UI or batch processes? Tentative CREATE TABLE structure, etc.

To deal with large datasets, consider creating and incrementally updating "Summary tables."

If you need to map user IP addresses to countries, that's a separate issue that has already been addressed.

In pursuit of real-time processing in milliseconds, some compromises will have to be made.

For more information, visit ; explore the three blogs on Data Warehouse Techniques.

Optimizing storage by day

To ensure rows for a single day are stored together in InnoDB, prioritize datetime in the PRIMARY KEY. This can significantly enhance certain queries by enabling sequential data scanning and reducing disk I/O. If AUTO_INCREMENT is necessary, modify the PK as follows:

PRIMARY KEY(datetime, id),  -- for clustering and uniqueness
INDEX(id)  -- to satisfy AUTO_INCREMENT

This strategy works well for short time ranges if the data exceeds available RAM, but larger time spans may suffer from slower I/O speeds.

Managing dynamic data with summary tables

Though challenging, maintaining summary tables could be feasible with deeper insights into the data and updates.

Scanning a million rows within a sub-second timeframe isn't feasible, even with optimizations. Summary tables offer a faster solution for fetching desired data.

Reducing data size

  • Prefer smaller datatypes like INT over BIGINT when possible; use UNSIGNED appropriately.
  • Normalize repeated strings to save space.

Smaller data is easier to cache, leading to improved performance when accessing disk storage.

Similar questions

If you have not found the answer to your question or you are interested in this topic, then look at other similar questions below or use the search

Unleashing the full power of Node.JS asynchronous operations

I've been struggling to grasp how to effectively manage the asynchronous nature of Node.JS. Despite reading extensively on the topic and experimenting with message passing and callback functions, I can't seem to get my object constructor to load ...

Issue encountered during npm installation command

Recently diving into nodejs and experimenting with the Visual Studio Code editor. Encountering difficulties in installing packages, with an error message indicating a possible issue related to the proxy. Despite attempting various solutions found online ( ...

Error message: NodeJS express unknown function or method get()

Currently, I am facing an issue while working with express and Pug (Jade) to render a page as the get() function is returning as undefined along with warnings... I followed these steps: npm install express --save npm install pug --save Here's a sn ...

Problems encountered when transferring information from jQuery to PHP through .ajax request

Hey there! I am currently working with Yii and facing an issue while trying to pass some data to a controller method called events. This is how my jQuery ajax call looks like: var objectToSend = { "categories" : [selectedOption],"datefrom" : month + "" + ...

Enhanced Slider Display featuring Multiple Posts for Improved Performance

My Sample Page features a slider that displays over 200 posts, each containing 5 images. However, the slider loads all the images at once, causing my page speed to be very slow. I am looking for an optimized way to display the slider without compromising l ...

What is the best way to verify a user's login status in AngularJS using $routeChangeStart event?

I am new to AngularJS and I need help checking if my user is logged in or not using $routeChangeStart. Controller angular.module('crud') .controller('SigninCtrl', function ($scope,$location,User,$http) { $scope.si ...

Fetching data from multiple URLs with an unknown quantity asynchronously

What would be the most efficient approach for performing an unknown quantity of asynchronous GET requests? Imagine you have a base URL, '', and you want to execute a series of asynchronous requests to various paths like: [ 'http://www. ...

Having trouble with installing the npm package "testmybot"

I am attempting to utilize this sample to showcase the testing of a chatbot using the node "testmybot" package. However, when I run the "npm install" command, I encounter an error. Please refer to the attached screenshot for details. Steps I have taken ...

Develop a nodejs script to make a request using a curl or similar method

Can anyone help me figure out how to replicate the functionality of this OpenSSL command using Node.js or curl? The command is: openssl s_client api-prd.koerich.com.br:443 2> / dev / null | openssl x509 -noout -dates. I have been unsuccessful in my at ...

When I use ORDER BY in my query, I encounter the ORA-00933 error, indicating that the SQL command

Hey there, I have a question regarding resultSet processing in my query. The query runs smoothly until I add 'ORDER BY name des', resulting in an ORA-00933: SQL command not properly ended error. This is the current query I am using: router.post ...

Is there a way to retrieve the administrator role from a stored value of `a:1:{s:13:"administrator";b:1;}` when querying from a MySQL database

$authors = $wpdb->get_results('SELECT wp_users.user_login, wp_users.display_name, wp_usermeta.meta_value FROM wp_users INNER JOIN wp_usermeta ON wp_users.ID = wp_usermeta.user_id WHERE wp_usermeta.meta_key = "wp_capabilities&qu ...

Move data from one mysql table to another in different databases using PDO

After spending the last hour researching, I still haven't found a straightforward solution that doesn't involve complicated exports and imports. All I want to do is establish a PDO connection with two databases so that I can utilize them both in ...

Understanding Doctrine: Choosing a User's Associated Store

I have a User entity with attached stores in a many to many relationship database. I'm retrieving the stores in my controller using: $userrepository = $this->getEntityManager()->getRepository('Thuiswinkelen\Entity\User'); $u ...

Cypress CI encountered an issue: unable to connect, error code ECONNREFUSED for address 127.0.0.1 on port

Embarking on a small project, I am aiming to conduct Cypress tests against a Nodejs application that I sourced from a GitHub example. Moreover, my intention is to execute these tests within GitLab CI/CD environment. The contents of my yml file outline the ...

Learn how to update a fixed value by adding the content entered into the Input textfield using Material-UI

I made a field using the Input component from material-ui: <Input placeholder="0.00" value={rate} onChange={event => { this.setState({ `obj.rate`, event.target.value }); }} /> Whenever I input a rate into this field, ...

Electron: Interactive menu customization

Is there a way in Electron to dynamically enable/disable specific MenuItem in the context menu based on the element that the user right-clicks on? Additionally, I am looking for a method to identify the exact element clicked and pass that information to th ...

The engine for integrating [email protected] with node-red-contrib-amqp is not compatible

Is the latest version of node-red installed on Ubuntu 20.04? I'm attempting to include the node-red-contrib-amqp plugin according to the instructions on the GitHub page. https://github.com/abreits/node-red-contrib-amqp#installation $ sudo npm instal ...

How come I am unable to display the message "Greetings Earth" using ExpressJS?

I'm currently working on adding an experience section to my website profile page. However, when I try to submit the form on add-experience.html, it redirects me to /Add-Experience-2 but I'm unable to display the "Hello World" text. app.post(" ...

The Express server automatically shuts down following the completion of 5 GET requests

The functionality of this code is as expected, however, after the fifth GET request, it successfully executes the backend operation (storing data in the database) but does not log anything on the server and there are no frontend changes (ReactJS). const ex ...

Real-time chat functionality using Laravel 5.2, socket.io, and private messaging with redis

TLDR; Seeking best solution for private instant messenger integration with Laravel. I currently have a live real-time chat feature on my Laravel website, but I am missing the ability for private messaging. The chat is located on the home page and every ti ...