Retrieving data from a JSON object stored within a database column

There is a dataframe presented below:

+-------+--------------------------------
|__key__|______value____________________| 
|  1    | {"name":"John", "age": 34}    |
|  2    | {"name":"Rose", "age": 50}    |

The goal is to extract all the age values from this dataframe and store them in an array.

val x = df_clean.withColumn("value", col("value.age"))
x.show(false)

An error occurs when trying to execute this operation.

Error message: Exception in thread "main" org.apache.spark.sql.AnalysisException: Can't extract value from value#89: need struct type but got string;

How can this issue be resolved?

EDIT

 val schema = existingSparkSession.read.json(df_clean.select("value").as[String]).schema
    val my_json = df_clean.select(from_json(col("value"), schema).alias("jsonValue"))
    my_json.printSchema()
    val df_final = my_json.withColumn("age", col("jsonValue.age"))
    df_final.show(false)

No errors are thrown after implementing these changes. However, there is no visible output as well.

EDIT 2

println("---+++++--------")
df_clean.select("value").take(1)
println("---+++++--------")

Output:

---+++++--------
---+++++--------

Answer №1

To generate a schema for long JSON data, you can utilize the from_json function along with a schema.

import org.apache.spark.sql.functions._

val df = Seq(
  (1, "{\"name\":\"John\", \"age\": 34}"),
  (2, "{\"name\":\"Rose\", \"age\": 50}")
).toDF("key", "value")

val schema = spark.read.json(df.select("value").as[String]).schema

val resultDF = df.withColumn("value", from_json($"value", schema))

resultDF.show(false)
resultDF.printSchema()

Expected Output:

+---+----------+
|key|value     |
+---+----------+
|1  |{34, John}|
|2  |{50, Rose}|
+---+----------+

Generated Schema:

root
 |-- key: integer (nullable = false)
 |-- value: struct (nullable = true)
 |    |-- age: long (nullable = true)
 |    |-- name: string (nullable = true)

If you need to access nested fields directly, you can use the get_json_object function.

df.withColumn("name", get_json_object($"value", "$.name"))
  .withColumn("age", get_json_object($"value", "$.age"))
  .show(false)

Similar questions

If you have not found the answer to your question or you are interested in this topic, then look at other similar questions below or use the search

The system is unable to locate a supporting entity with the identifier '[object Object]', as it is classified as an 'object'

I'm currently working on an Angular 2 application where I am retrieving data from an API and receiving JSON in the following format. { "makes": null, "models": null, "trims": null, "years": null, "assetTypes": { "2": "Auto ...

Using Node.js, I am utilizing a Javascript API to perform a POST request, reading data

Utilizing JavaScript, we perform an API POST request using an external JSON file. The setup involves two main files: order.json and app.js Server information: Ubuntu 22.04 Node v19.2.0 npm 8.19.3 The script retrieves data from the order.js file using the ...

Struggling to convert JSON data into a variable, but all I'm getting is a null response

I am facing an issue where I am attempting to convert a JSON file into a PHP variable, but the result is that the PHP variable ends up being null. Here is the PHP code I have written: $heroes['names'] = json_decode(file_get_contents("file://D:/X ...

Creating an efficient post request using retrofit

Currently, I am delving into the world of Retrofit as it appears to provide solutions to many of the challenges I face with JSON requests and their handling. It is clear that interfaces play a key role in defining the methods utilized, particularly when m ...

What is the best way to transform a JSON object from a remote source into an Array using JavaScript?

Attempting to transform the JSON object retrieved from my Icecast server into an array for easy access to current listener statistics to display in HTML. Below is the JavaScript code being used: const endpoint = 'http://stream.8k.nz:8000/status-json ...

I'm puzzled as to why my HTTP request in Node.js is returning an empty body

Currently, I am facing an issue where I am sending a HTTP request to a server and expecting a response that contains all the necessary information. However, when attempting to parse the body using .parseJSON(), an exception is thrown indicating that the ...

How can a custom format structure be established for the json export feature in Scrapy? If it is possible, what is the process for doing so

As a beginner in the world of Python and Scrapy, I am struggling with the complexities of Scrapy documentation. Despite successfully creating a spider for my school project to scrape data, I am facing issues with the formatting in JSON export. Here is a sn ...

JSON representing an array of strings in JavaScript

Encountering difficulties when trying to pass two arrays of strings as arguments in JSON format to call an ASMX Web Service method using jQuery's "POST". The Web Method looks like this: [ScriptMethod(ResponseFormat=ResponseFormat.Json)] publ ...

An issue arises when using JSON.parse() with regular expression values

I am encountering an issue with parsing a JSON string encoded with PHP 5.2 json_encode(). Here is the JSON string: {"foo":"\\."} Although this JSON string is valid according to jsonlint.com, when using the native JSON.parse() method in Chrome a ...

Combining JSON elements using JsonPath (JayWay)

Given a basic json structure: { "Keys": [ {"Format": "A", "Subtype": "A1"}, {"Format": "A", "Subtype": "A2"}, {"Format": "B", "Subtype": "A1"}] } I am looking to create a new output by combining the Format and Subtype values using JsonPath expres ...

Saving data in a JSON format within the browser's localStorage, whether it be a nested object-tree or a collection

In a scenario with two JSON objects, Car and Wheel, where Car contains a collection of Wheel along with other primitive-type properties: Car: Name Speed Wheels Wheel: Thickness Friction When considering saving this car using localStorage, there are ...

Retrieve the JSON data once the user has applied the filter

Is there a more efficient way to improve the performance of handling an extremely large JSON file with 5000 keys and 5 values each? Currently, I am using AngularJS with a backend in Drupal 7. Here is my view: <ul class="list-group border-0"> < ...

Tips for implementing an Item loader in my Scrapy spider script?

After working on a Scrapy spider to extract news articles and data from a website, I encountered an issue with excessive whitespace in one of the items. Seeking a solution, I came across recommendations for using an Item Loader in the Scrapy documentation ...

Customizing push notifications for multiple servers on iOS with unique messages

In my experience setting up push notifications in iOS projects, I have encountered an issue with message format when changing servers. When using my local machine or a dedicated server, the messages come through correctly. However, switching to web hosting ...

Iterating Through an Array using Foreach Loop in PHP after using json_decode

I am having some issues with a foreach loop that I'm trying to run through an array. Specifically, I keep coming across the error message "Undefined index: text". Here is the code snippet in question: $tweets = json_decode($response,true); foreach ( ...

Could one potentially use jQuery to navigate through JSON output?

My goal is to generate a JSON list that includes CSS classes and their respective URL records. Here's an example: var jsonList = [{ "CSSClass": "testclass1", "VideoUrl": "/Movies/movie.flv" }, { "CSSClass": "testclass2", "VideoUrl": "/Movies/ ...

Searching for a specific value in various Json files: A guide

My goal is to create an application where users can input longitude and latitude coordinates of a location, and the application will return the associated grid code from one of three JSON data files. I am attempting to search through all three files simult ...

What is the best way to send an HTTP request in AngularJS to receive data in JSON format?

I am trying to create an AngularJS app that can send HTTP requests for JSON data. I have written the code in my index.html file to request JSON data using AngularJS, but for some reason, the JSON data is not being printed. When I check the console in Fire ...

How to parse JSON in JavaScript/jQuery while preserving the original order

Below is a json object that I have. var json1 = {"00" : "00", "15" : "15", "30" : "30", "45" : "45"}; I am trying to populate a select element using the above json in the following way. var selElem = $('<select>', {'name' : nam ...

Efficiently loading data using lazy loading in jsTree

I have been facing a challenge with dynamically loading the nodes of a jtree upon expansion. The limited documentation I could find is located towards the end of this specific page. Several solutions involve creating nodes individually using a loop, simil ...