Dividing a JSON array into two separate rows with Spark using Scala

Here is the structure of my dataframe:


root
 |-- runKeyId: string (nullable = true)
 |-- entities: string (nullable = true)

+--------+--------------------------------------------------------------------------------------------+
|runKeyId|entities                                                                                    |
+--------+--------------------------------------------------------------------------------------------+
|1       |{"Partition":[{"Name":"ABC"},{"Name":"DBC"}],"id":339},{"Partition":{"Name":"DDD"},"id":339}|
+--------+--------------------------------------------------------------------------------------------+

I'm looking to explode it with Scala to get this result:


+--------+--------------------------------------------------------------------------------------------+
|runKeyId|entities                                                                                    |
+--------+--------------------------------------------------------------------------------------------+
|1       |{"Partition":[{"Name":"ABC"},{"Name":"DBC"}],"id":339}                                      |
|2       |{"Partition":{"Name":"DDD"},"id":339}                                                       |
+--------+--------------------------------------------------------------------------------------------+

Answer №1

The entities value is not valid JSON as it stands: it holds two objects separated by a comma, with no enclosing array. Wrap the string in square brackets so it becomes a JSON array, then parse it with from_json and explode the result into one row per element:

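import org.apache.spark.sql.functions._
import spark.implicits._ // needed for toDF and $"..."; already in scope in spark-shell

val dataFrame = Seq(
  ("1", "{\"Partition\":[{\"Name\":\"ABC\"},{\"Name\":\"DBC\"}],\"id\":339},{\"Partition\":{\"Name\":\"DDD\"},\"id\":339}")
).toDF("runKeyId", "entities")
  // Wrap the comma-separated objects in [ ] so the string is a valid JSON array
  .withColumn("entities", concat(lit("["), $"entities", lit("]")))

// Infer the array schema from the first row, parse the string,
// emit one row per array element, then serialize each element back to JSON
val finalDF = dataFrame.withColumn("entities",
  explode(from_json($"entities", schema_of_json(dataFrame.select($"entities").first().getString(0))))
).withColumn("entities", to_json($"entities"))

finalDF.show(false)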

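Result (Partition comes back as an escaped string because it is an array in the first object and a struct in the second, so the inferred schema falls back to a string type for that field):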

+--------+----------------------------------------------------------------+
|runKeyId|entities                                                        |
+--------+----------------------------------------------------------------+
|1       |{"Partition":"[{\"Name\":\"ABC\"},{\"Name\":\"DBC\"}]","id":339}|
|1       |{"Partition":"{\"Name\":\"DDD\"}","id":339}                     |
+--------+----------------------------------------------------------------+
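
If the goal is to keep each object exactly as it appears in the input, as in the desired output in the question, one option is to parse the wrapped string as an array of strings instead of inferring a schema. The following is a minimal sketch and not part of the original answer. It assumes a Spark version (3.x here) in which from_json accepts ArrayType(StringType) and returns each nested object as its JSON text, which is worth verifying on your cluster. The object name SplitEntities, the local SparkSession and the variable names are illustrative.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, concat, explode, from_json, lit}
import org.apache.spark.sql.types.{ArrayType, StringType}

object SplitEntities {
  def main(args: Array[String]): Unit = {
    // Local session for illustration only; in spark-shell, `spark` already exists.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("split-entities")
      .getOrCreate()
    import spark.implicits._

    // Same sample row as in the question.
    val raw = Seq(
      ("1", """{"Partition":[{"Name":"ABC"},{"Name":"DBC"}],"id":339},{"Partition":{"Name":"DDD"},"id":339}""")
    ).toDF("runKeyId", "entities")

    val rawElements = raw
      // Wrap the comma-separated objects in [ ] so the string is a valid JSON array.
      .withColumn("entities", concat(lit("["), col("entities"), lit("]")))
      // Assumption: with a StringType element schema, each object is kept as JSON text.
      .withColumn("entities", explode(from_json(col("entities"), ArrayType(StringType))))

    rawElements.show(false)
    spark.stop()
  }
}
```

Because no schema is inferred here, the two different shapes of Partition are never merged into a single type, so no field is coerced into an escaped string.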
