Create a new field in a DynamicFrame using AWS Glue and set its value based on the value of a matching row in another DynamicFrame

As a beginner in AWS Glue and PySpark, I'm having trouble with a transformation. I have two DynamicFrames. The values of one column in the first (name) need to become new columns in the second DynamicFrame. Each new column should be filled with the corresponding value from the first table's value column, but only in the row whose id matches the id of that row in the first table. Here is an example of the initial data:

Table 1             Table 2
+--+-----+-----+    +--+-----+-----+
|id|name |value|    |id|col1 |col2 |
+--+-----+-----+    +--+-----+-----+
| 1|name1| 10  |    | 1|str1 |val1 |
+--+-----+-----+    +--+-----+-----+
| 2|name2| 20  |    | 2|str2 |val2 |
+--+-----+-----+    +--+-----+-----+

The desired output format should look like this:

Table 2
+--+-----+-----+-----+-----+
|id|col1 |col2 |name1|name2|
+--+-----+-----+-----+-----+
| 1|str1 |val1 | 10  |     |  <--- 10 goes only here, because this row's id matches the id of the name1 row in Table 1
+--+-----+-----+-----+-----+
| 2|str2 |val2 |     | 20  |  <--- 20 goes only here, because this row's id matches the id of the name2 row in Table 1
+--+-----+-----+-----+-----+

Answer №1

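Convert the two DynamicFrames to Spark DataFrames (for example with toDF()) and call them df1 (Table 1) and df2 (Table 2). Pivot df1 so that each distinct name becomes a column, then join the result to df2 on id: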

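from awsglue.dynamicframe import DynamicFrame

new_df = df1.groupBy('id').pivot('name').sum('value')  # one column per distinct name (name1, name2, ...)
final_df = df2.join(new_df, on='id', how='left')  # 'left' keeps every row of df2, even ids missing from df1
final_df.show(truncate=False)
final_dyf = DynamicFrame.fromDF(final_df, glueContext, 'final_dyf')  # back to a DynamicFrame; glueContext comes from the Glue job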
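To make clear what groupBy('id').pivot('name').sum('value') followed by a join on id computes, here is a plain-Python sketch over lists of dicts. No Spark or Glue is involved; the helpers pivot_sum and left_join are hypothetical names for illustration only, None stands in for Spark's null, and the id 3 row is a made-up extra row with no match in Table 1.

```python
# Plain-Python illustration of pivot + sum followed by a left join on 'id'.
# pivot_sum and left_join are hypothetical helpers, not Spark or Glue APIs.

def pivot_sum(rows, key, pivot_col, value_col):
    """Return ({key_value: {pivot_value: summed value}}, sorted pivot values)."""
    pivoted = {}
    pivot_values = set()
    for row in rows:
        cell = pivoted.setdefault(row[key], {})
        name = row[pivot_col]
        pivot_values.add(name)
        cell[name] = cell.get(name, 0) + row[value_col]
    # Sort the new column names so the output has a stable column order
    return pivoted, sorted(pivot_values)


def left_join(left_rows, pivoted, pivot_values, key):
    """Attach one column per pivot value to every left row; missing cells become None."""
    result = []
    for row in left_rows:
        sums = pivoted.get(row[key], {})
        merged = dict(row)
        for name in pivot_values:
            merged[name] = sums.get(name)  # None plays the role of Spark's null
        result.append(merged)
    return result


table1 = [
    {"id": 1, "name": "name1", "value": 10},
    {"id": 2, "name": "name2", "value": 20},
]
table2 = [
    {"id": 1, "col1": "str1", "col2": "val1"},
    {"id": 2, "col1": "str2", "col2": "val2"},
    {"id": 3, "col1": "str3", "col2": "val3"},  # hypothetical row with no match in Table 1
]

pivoted, names = pivot_sum(table1, "id", "name", "value")
for row in left_join(table2, pivoted, names, "id"):
    print(row)
```

With how='inner' instead of a left join, the id 3 row would be dropped from the result; the left join keeps it with empty name1 and name2 columns, which matches the goal of adding columns to Table 2 without losing any of its rows.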

Similar questions

If you have not found the answer to your question, or you are interested in this topic, look at the similar questions below.

Instructions for uploading multiple files to a website with Python and Selenium

Having trouble uploading multiple PGP files to a website using Python and Selenium. The script seems to only recognize the first file and not the rest. Any suggestions on how to fix this issue? I've checked my code, which includes the use of Selenium ...

Exploring the concept of nesting control dependencies in tensorflow

When executing the test below from unittest import TestCase import tensorflow as tf class TestControl(TestCase): def test_control_dep(self): print(tf.__version__) a = tf.get_variable('a', initializer=tf.constant(0.0)) d_optim = ...

Exploring Python's Dictionary Manipulation

In Python, I have crafted a dictionary where words from a given text are stored as keys and the number of times they appear in the text is tracked as the corresponding values. This dictionary has been sorted based on the frequency of occurrence in descendi ...

Every time I try to utilize the socket module in Python, I encounter the issue: 'socket.gaierror: [Errno 11001] getaddrinfo failed'

Currently, I am working on a straightforward script that sends messages from a client to a server. Everything works smoothly when the client and server are on the same computer. However, an issue arises when attempting to send a message from a different la ...

Press the button within the table as its name undergoes periodic changes

I am using Python along with Selenium to automate the process of selecting and reserving a room on a website. The site displays a table containing available rooms, and my goal is to locate a specific room and click on the corresponding button within the ta ...

Using Python dictionaries: When the keys match, simply multiply their corresponding values

I have a code snippet where I am trying to compare two dictionaries and multiply values if the keys match. Here is what I have so far: dict_a = { 'r':100, 'y':110, 'a':210 } print('Enter The Number Of It ...

Executing PHP code that imports a Python library

My attempt to execute a Python script from a PHP file is encountering issues when it comes to loading a local library. Surprisingly, the PHP successfully calls the Python script without the local libraries, and manually launching the Python script works fl ...

Tips for aligning tick labels with superscript numbers in matplotlib

In my current project, I'm working on generating a figure with the x-axis set to a base-10 log scale. I want the labels to display as plain numbers (1, 10, 100) for shorter values and in an abbreviated format with superscripts when they are longer ($1 ...

Float data type not supported in BMI Calculator

I have created the following script: # function to determine BMI and display a result based on user input def calculate_bmi(weight, height): bmi_value = weight // (height ** 2) if bmi_value < 18.5: print("Underweight "), print(bmi_value ...

What is the best way to isolate specific strings that begin with a symbol from a text file using Python?

Is there a way to extract words that begin with the symbol '$' from a text file? Here is an example of the text file (ascii): @ExtendedAttr = nvp_add(@ExtendedAttr, "severity", $severity, "description", $description, "eventID", $eventID, ...

Guide to harvesting popular searches on Google

My challenge involves scraping Google Hot Trends. Initially, I attempted to utilize Chrome developer tools to capture all requests, however, no requests were being made. Therefore, I turned to using selenium, but encountered difficulties in fetching the ...

What could be causing the intermittent StaleElementReferenceException with Firefox when using splinter's is_text_present() function, while not experiencing the issue in Chrome or

Recently, I updated my test environment to the newest versions of selenium, splinter, and Firefox. However, one of my tests now shows failures about 80% of the time when using Firefox. The error message is as follows: Traceback (most recent call last): ...

Is there a way to determine the quantity of boxes based on their specific placement?

Currently working on a python memory game, I am struggling with determining the number of boxes that the user clicks based on the cursor's position. This is what I have so far: number = ev.pos[y]//boxsize*numboxsx+ev.pos[x]//boxsize (this calculati ...

Function is missing a parameter during deployment on Heroku, whereas it works fine when running locally

While attempting to deploy my Django app on Heroku, I encountered an error after running git push heroku master. The error occurred when executing python manage.py collectstatic --noinput. Error Traceback: remote: -----> $ python orphantracker/orphan ...

How can I display the y-axis values on a histogram plot created in Python?

I'm having trouble getting my y-axis values to print in increments of 100 for a histogram plot in Python. Everything else seems to be working correctly. Below is the code I'm using: #!/usr/local/bin/python3 import math from sys import argv from ...

Encountering problems with Python type annotations when inheriting types and overloading members

Here is an example below using Python 3.7 where I am struggling to correctly annotate my code. Mypy is showing errors in the annotations which are explained in comments. I have a "generic class" that contains "generic members", and concrete classes along ...

Obtaining Array of Items using XPath in Selenium

Currently, I am working on automating my monthly expense report process through Selenium and Python. Although I can successfully log in and navigate to the list of expenses, dates, and locations, I am encountering difficulties when trying to save them to a ...

Generating graphs of modulus functions using matplotlib

As someone who is new to matplotlib and plotting, I am attempting to plot the function |x| + |y| = 0.1 using matplotlib. However, I am struggling with the syntax provided. Is there a way to achieve this using a single function? Additionally, when I incre ...

Python - BeautifulSoup - NoneType

Hello everyone. I've been working on scraping some data from a property listing website using Beautifulsoup and Requests. Although I'm able to retrieve the data I need, I keep encountering an error when I try to add .text. Any assistance would ...

Discover your screen orientation using Python

I recently developed a Python game using Pygame which operates flawlessly in both Portrait and Landscape orientations. However, I encountered an issue when the user rotates their device while the game is running, causing everything to appear jumbled up on ...