Is there a way to use jq to divide a JSON stream of objects into individual files according to the values of a specific object property?

I'm dealing with a hefty file (20GB+ compressed) named input.json, which contains a stream of JSON objects like so:

{
    "timestamp": "12345",
    "name": "Some name",
    "type": "typea"
}
{
    "timestamp": "12345",
    "name": "Some name",
    "type": "typea"
}
{
    "timestamp": "12345",
    "name": "Some name",
    "type": "typeb"
}

The goal is to split this giant file by the value of the type property into typea.json, typeb.json, etc., each containing only the JSON objects of that type.

Splitting smaller files wasn't an issue, but processing a file this large exhausts the memory on my AWS instance. I know I need to use --stream to keep memory consumption in check, but I can't figure out how.

Running

cat input.json | jq -c --stream 'select(.[0][0]=="type") | .[1]'

retrieves the values of each type property, but how can I use this information to filter the objects accordingly?

Any guidance or insights would be immensely appreciated!

Answer №1

If the JSON objects in the file are relatively small (typically a few MB or less), you probably don't need the --stream command-line option, which is rather complex to use and is usually reserved for individual JSON documents too large to fit in memory.
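To illustrate the point above, here is a small sketch (the file names sample.json, plain.out and streamed.out are made up for the demo). Without -s, jq already reads a file of concatenated top-level objects one object at a time, so a plain select(...) filter works without --stream. If --stream is used anyway, the usual idiom is -n with fromstream(inputs), which reassembles each top-level object from its [path, value] events. Both commands below should therefore produce the same output.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical sample input: a stream of top-level objects (not an array).
cat > sample.json <<'EOF'
{"timestamp": "12345", "name": "Some name", "type": "typea"}
{"timestamp": "12345", "name": "Some name", "type": "typeb"}
EOF

# Without --stream: jq parses and filters one top-level object at a time.
jq -c 'select(.type == "typea")' sample.json > plain.out

# With --stream: -n plus fromstream(inputs) rebuilds each top-level object
# from its [path, leaf] events before the same select is applied.
jq -cn --stream 'fromstream(inputs) | select(.type == "typea")' sample.json > streamed.out

# Both approaches should produce identical output.
diff plain.out streamed.out && echo "identical"
```

For input like the question's, the first form is simpler and has the same memory profile; --stream only pays off when a single object is itself huge.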

However, there are still decisions to make. You can choose between a multi-pass approach involving multiple calls to jq (N or N+1 calls, where N is the number of output files), or a single-call approach in which jq's output is piped to a program such as awk that splits it into separate files. Each method has its own advantages and disadvantages, but if reading the input file multiple times is acceptable, the first approach might be more efficient.
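Both approaches can be sketched as shell functions, assuming every object has a string-valued type whose value is safe to use as a file name (no slashes or tabs) and that the input is an uncompressed file. The function names split_multipass and split_awk are hypothetical. The multi-pass version makes N+1 jq calls: one to collect the distinct types, then one per type. The single-call version has jq emit one "type, tab, compact JSON" line per object and lets awk route each line to its file.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Multi-pass (hypothetical helper): 1 jq call to list the distinct types,
# then 1 jq call per type, each re-reading the whole input file.
split_multipass() {
  local input=$1 t
  jq -r '.type' "$input" | sort -u | while IFS= read -r t; do
    jq -c --arg t "$t" 'select(.type == $t)' "$input" > "$t.json"
  done
}

# Single call (hypothetical helper): jq writes "type<TAB>document" lines and
# awk writes each document to <type>.json. tojson escapes control characters,
# so it never emits a literal tab or newline; $2 is therefore the whole
# document as long as the type value itself contains no tab.
split_awk() {
  local input=$1
  jq -r '.type + "\t" + tojson' "$input" |
    awk -F '\t' '{ print $2 > ($1 ".json") }'
}
```

Usage: run split_multipass input.json or split_awk input.json in the directory where the output files should go. Within a single awk run, the first write to a file truncates it and later writes append, so rerunning either function replaces the previous output.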

To estimate the computational resources needed, consider measuring what running jq empty input.json uses (the empty filter makes jq parse the whole input while producing no output).

Based on your description, the memory issue you encountered is likely related to unzipping the file.
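One way to act on both suggestions is to decompress on the fly and measure jq alone. This is a sketch under two assumptions the question does not confirm: the data is gzip-compressed (the default file name input.json.gz is hypothetical), and GNU time is available at /usr/bin/time, whose -v report includes "Maximum resident set size". The helper name measure_jq_empty is also hypothetical.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: report the resources jq needs just to parse the data.
# Assumes gzip compression; $1 defaults to a hypothetical input.json.gz.
measure_jq_empty() {
  local input=${1:-input.json.gz}
  # gzip -dc decompresses to stdout as a stream, so nothing is unpacked
  # to disk and the decompressor itself needs very little memory.
  if /usr/bin/time -v true >/dev/null 2>&1; then
    # GNU time -v prints a report on stderr that includes
    # "Maximum resident set size (kbytes)", i.e. jq's peak memory.
    gzip -dc "$input" | /usr/bin/time -v jq empty
  else
    # Fallback: bash's time keyword reports elapsed and CPU time only.
    time (gzip -dc "$input" | jq empty)
  fi
}
```

If the file uses a different compressor, swap gzip -dc for its streaming equivalent (for example zstd -dc). The same decompress-to-pipe pattern can feed any of the splitting commands in this thread.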

Answer №2

Use jq to emit a NUL-delimited stream of (type, document) pairs, and native bash (4.1 or newer) to write each document out through a persistent set of file descriptors, one per output file:

#!/usr/bin/env bash
case $BASH_VERSION in ''|[1-3].*|4.0*) echo "ERROR: Bash 4.1 required" >&2; exit 1;; esac

declare -A output_fds=( )

while IFS= read -r -d '' type && IFS= read -r -d '' content; do
  if [[ ${output_fds[$type]} ]]; then  # checking for existing file handle for this output file
    curr_fd=${output_fds[$type]}       # reusing it if available.
  else
    exec {curr_fd}>"$type.json"        # opening a new output file...
    output_fds[$type]=$curr_fd         # storing its file descriptor for later use.
  fi
  printf '%s\n' "$content" >&"$curr_fd"
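  # jq reads the files named as arguments (or stdin if there are none);
  # tostring guards against non-string (e.g. numeric) types, which make + fail.
done < <(jq -j '(.type | tostring) + "\u0000" + tojson + "\u0000"' "$@")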

This way only a few records are held in memory at a time, which keeps memory usage low and lets the script handle very large files, provided the individual records are of reasonable size.
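To see the framing the read loop consumes, a jq filter equivalent to the script's (for string-valued type fields) can be run on a single object with the NUL bytes made visible. The record is made up for illustration; tr replaces each NUL with a | so both fields show on one line.

```shell
#!/usr/bin/env bash
set -euo pipefail

# One made-up record, framed the way the read loop expects.
record='{"timestamp":"12345","name":"Some name","type":"typea"}'

# jq -j prints raw strings with no trailing newline; each record becomes
# "<type>\0<compact JSON>\0". tr makes the NUL separators visible as "|".
framed=$(printf '%s\n' "$record" |
  jq -j '.type + "\u0000" + tojson + "\u0000"' | tr '\0' '|')

echo "$framed"
# prints: typea|{"timestamp":"12345","name":"Some name","type":"typea"}|
```

Each read -d '' in the loop consumes exactly one of these NUL-terminated fields, which is why a document containing newlines or arbitrary text cannot desynchronize the type/content pairing.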
