Creating a dynamically named dataframe in PySpark from configuration settings

I need to build the final dataframe's name dynamically from the configuration (by joining final_df and suffix). Whenever I run the code below, it throws "SyntaxError: can't assign to operator". Strangely, if I replace each["final_df"]+'_'+ each["suffix"] with a hard-coded variable name, the code works flawlessly.

Data:

df_source_1 = spark.createDataFrame(
    [
        (123, 10),
        (123, 15),
        (123, 20),
    ],
    ("cust_id", "value"),
)

Configuration:

config = """
                [ 
                  {
                      "source_df":"df_source_1",
                      "suffix": "new", 
                      "group":["cust_id"],
                      "final_df": "df_taregt_1"
                  }
                ]
                """   

Code:

import json
from pyspark.sql.functions import sum  # Spark's sum (shadows Python's built-in sum)

for each in json.loads(config):
    print("Before=", each["final_df"])  # str object
    print(each["final_df"] + "_" + each["suffix"])  # df_taregt_1_new, the print statement works
    each["final_df"] + "_" + each["suffix"] = eval(each["source_df"]).groupBy(each["group"]).agg(sum("value"))  # Errors out. Here I need to assign the dataframe to df_taregt_1_new

Any assistance would be greatly appreciated.
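A minimal sketch, in plain Python with no Spark session, of why the last line of the loop fails: the left-hand side of an assignment must be a target (a name, attribute, or subscript), and each["final_df"] + "_" + each["suffix"] is an expression. The snippet compiles the offending statement from a string so the error can be caught; the exact message differs between Python versions, so it is only printed. The dict frames is an illustrative name, not from the question.

```python
# Stand-in for one entry of the parsed config (no Spark needed).
each = {"final_df": "df_taregt_1", "suffix": "new"}

# On the right-hand side the expression is fine: it just builds a string.
name = each["final_df"] + "_" + each["suffix"]
print(name)  # df_taregt_1_new

# Using the same expression as an assignment target is rejected by the compiler.
bad_statement = 'each["final_df"] + "_" + each["suffix"] = 42'
try:
    compile(bad_statement, "<question>", "exec")
    error = None
except SyntaxError as exc:
    error = exc  # keep a reference; 'exc' is cleared after the except block
    print("SyntaxError:", exc.msg)

# A subscript, by contrast, is a valid target, so a dict can hold the result
# under the dynamically built name.
frames = {}
frames[each["final_df"] + "_" + each["suffix"]] = 42
print(frames)  # {'df_taregt_1_new': 42}
```

Because the check happens at compile time, the loop never runs at all; no value of each["suffix"] can make that statement legal.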

Answer №1

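The error occurs because the left-hand side of an assignment must be an assignment target (a variable name, an attribute, or a subscript), not an expression such as each["final_df"] + '_' + each["suffix"]. Instead of creating dynamically named variables, store the dataframes in a dictionary keyed by the generated name: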

import json
from pyspark.sql import functions as F

data_dict = {}
data_dict["df_source_1"] = spark.createDataFrame(
    [(123, 10), (123, 15), (123, 20)], ("cust_id", "value")
)

for item in json.loads(config):
    # Key built from the config, e.g. "df_taregt_1_new"
    target_name = item["final_df"] + "_" + item["suffix"]
    data_dict[target_name] = (
        data_dict[item["source_df"]].groupBy(item["group"]).agg(F.sum("value"))
    )

Rather than creating dynamically named variables, storing all related dataframes in a dictionary keeps them organized and easy to reference by name. It also makes existence checks simple: a membership test such as "df_taregt_1_new" in data_dict tells you whether a dataframe has been created.
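To try the dictionary pattern without a Spark session, here is a minimal sketch in plain Python. It assumes each source "dataframe" is a list of tuples, and the helper group_sum is hypothetical: it only imitates what groupBy(...).agg(sum("value")) would compute and is not part of PySpark. The config string mirrors the one in the question.

```python
import json
from collections import defaultdict

config = """
[
  {"source_df": "df_source_1", "suffix": "new",
   "group": ["cust_id"], "final_df": "df_taregt_1"}
]
"""


def group_sum(rows, columns, group_cols, value_col):
    """Hypothetical stand-in for df.groupBy(group_cols).agg(sum(value_col)).

    rows is a list of tuples whose fields are named by columns.
    Returns a dict mapping each group-key tuple to the summed value.
    """
    idx = {c: i for i, c in enumerate(columns)}
    totals = defaultdict(int)
    for row in rows:
        key = tuple(row[idx[c]] for c in group_cols)
        totals[key] += row[idx[value_col]]
    return dict(totals)


columns = ("cust_id", "value")
data_dict = {"df_source_1": [(123, 10), (123, 15), (123, 20)]}

for item in json.loads(config):
    # Same key construction as in the answer above.
    target_name = item["final_df"] + "_" + item["suffix"]
    data_dict[target_name] = group_sum(
        data_dict[item["source_df"]], columns, item["group"], "value"
    )

# Membership tests replace "does this variable exist?" checks.
print("df_taregt_1_new" in data_dict)  # True
print(data_dict["df_taregt_1_new"])    # {(123,): 45}
print("df_taregt_2_new" in data_dict)  # False
```

The design point carries over unchanged to Spark: only the values stored in the dictionary differ, while the key construction and lookups stay the same.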

Similar questions


Unable to communicate over a socket that has been connected using socket.bind()

I'm currently developing a program that is supposed to retrieve a packet and then transfer it to another port using the socket.send() method. However, I am facing an issue where after attempting to send the message with a copied packet, nothing seem ...

Utilizing BeautifulSoup to Extract Links

Currently, I am extracting cricket schedules from a specific website using Python's Beautiful Soup library. The webpage I am scraping corresponds to the following URL: www.ecb.c0.uk/stats/fixtures-results?m=1&y=2016 This particular link displays ...

Python 3.10 raised an UnboundLocalError indicating that the variable 'driver' was referenced before being assigned

Just starting out with Python, need some help import pytest from selenium import webdriver @pytest.fixture() def setup(browser): if browser == "chrome": driver = webdriver.Chrome() print("Launching chrome browser.... ...

A decorator for dynamically matching regular expressions in class methods

Decorating many methods, especially in situations like delegate class implementations, can become quite cumbersome. Imagine a 3rd party "service" class with numerous methods. You find yourself wanting to override many of t ...

Adding new keywords to the header of a FITS file using astropy's io module

I've been attempting to add new cards to the primary header of an existing FITS file. Despite receiving a 'successful' message in the terminal, when I view the header info in DS9, my new card is not displayed. It seems like my changes are no ...

What is the best way to analyze Azure SQL data across development, testing, and production environments using Python?

My task involves comparing the number of rows in Azure SQL databases to ensure data quality. With approximately 50 database tables and occasional additions of new tables, I aim to develop a Python script that can connect to three environments and generate ...

Setting a list as a value in a dataframe - a step-by-step guide

My goal is to populate cells in a table with the monthly activity data of GitHub users, organized by years and months (e.g., 2021_01, 2022_10). https://i.stack.imgur.com/BGwfx.jpg The Xpath for this information can be found here: //*[@id="js-contrib ...

Visualizing data with Matplotlib: Histogram displaying frequency in thousands

I am currently working on a histogram in matplotlib that contains around 260,000 data points. The issue I am facing is that the y-axis on the histogram displays high numbers like 100,000. What I would prefer is to have the y labels represent thousands ins ...

The Corona Simulator ceased to function properly once it established a connection with the server

Having an issue with two server files when working with the Corona simulator. One file is functioning properly while the other isn't. I am struggling to identify the difference between these two files. Below is the code for both servers: Non-working ...

Using a Client Certificate from Azure Key Vault in a Python Azure Function App HTTPS Request

My aim is to initiate an HTTPS-Request from a Python Azure Function App while authenticating the request using a client certificate stored in an Azure Key Vault. During testing, I successfully executed the HTTPS-Request with the pkcs12 container loaded fr ...

Integrating tkinter GUI with selenium testing

As I delve into using the Tkinter plugin for the first time, my knowledge is limited to what I could gather from tutorials. Unlike the answers I found, which suggest putting a class inside the Python file being built, I have a pre-compiled Test class with ...

Swapping out an existing line of code

Currently, I'm working in Python and attempting to develop a "loading screen" that shows repeated dots ("..."), pauses for a second, and then replaces them with another set of dots. Is there a way to achieve this effect in Python? Any guidance on how ...

Grouping Nearby Shapes/Enclosing Boxes

I am trying to analyze an image that contains several rectangular shapes that are not very clear: https://i.stack.imgur.com/0JTGP.png My goal is to group these nearby rectangles together to achieve a final output like this: https://i.stack.imgur.com/UWI1 ...

Methods for deleting all instances of a substring within a collection of lists

I have a collection of strings in Python obtained from reading a .DAT file, structured like this: datContent = [['\x00\x00\x00\x00\x00\x00NGDUID\x00\x00\x00\x00\x00C\SAMPLEx00\x00\x ...

pip installing packages to the wrong folder despite the correct `which pip` path

Currently running Mac OS X 10.10, my goal is to utilize pip for installing packages intended for my homebrew-installed version of Python (located in /usr/local/bin/python, which serves as an alias pointing to /usr/local/Cellar/python/2.7.11/Frameworks/Pyth ...

Using the list as keys is the best way to eliminate duplicate values from a nested

Initially, I had two regular lists named files and g_list. My goal was to eliminate duplicates from files and synchronize it with g_list. I came across this solution: from collections import OrderedDict as odict od = odict.fromkeys(zip(files, g_list) ...

Implementing the Fill Down Algorithm with Pandas

Within the dataset provided, I am tasked with filling in the 'Parent' column as follows: All values in the column should be labeled as CISCO except for rows 0 and 7 which are to remain blank. It is noteworthy that 'CISCO' appears in th ...

How can I update the code to remove the DeprecationWarning caused by using asyncio.get_event_loop()?

After reviewing the code below, I have come across some confusing warnings: DeprecationWarning: There is no current event loop loop = asyncio.get_event_loop() and also, DeprecationWarning: There is no current event loop loop.run_until_complete(asyncio ...

I attempted to access the website through selenium using this code, but to my surprise, the site shut down immediately upon opening

Upon running this code, the webpage quickly opens and then just as swiftly closes. Despite using the most up-to-date version, I am unable to access the page. It is worth noting that I am a Windows user. The intention behind running this code was to access ...

The dashboards list could not be loaded due to an error in decoding the JSON format within Apache Superset

I encountered an issue while fetching dashboards in Superset. The error message states: ERROR:root:Expecting ',' delimiter: line 1 column 106 (char 105) I observed this error while monitoring the pod. On the front-end, only the following messa ...