Tips for optimizing session.add with various relationships to improve performance

Below is the model structure of my source code, represented as an array in a dictionary format.

# data structure
user_list = [{user_name: 'A', 
  email: '<a href="/cdn-cgi/l/email-protection" class="__cf_email__" data-cfemail="8feeeeeecfeeeeeea1ece0e2">[email protected]</a>', 
  items:[{name:'a_itme1', properties:[{1....},{2....}...]}
 ]} * 100]

I am attempting to insert this data into a Postgresql database using SQLAlchemy. The database includes tables for users, entities, attributes, and relationships between users/items and items/properties.

for u in user_list:
  new_user = User(user_name=u.get('user_name'),....)
  session.add(new_user)
  session.flush()
  for item in u.get('items'):
    new_item = Item(name=item.get('name'),.....)
    session.add(new_item)
    session.flush()
    new_item_link = UserItemLink(user_id=new_user.id, item_id=new_item.id,...)
    session.add(new_item_link)
    session.flush()
    for prop in item.properties:
      new_properties = Properties(name=prop.get('name'),...)
      session.add(new_properties)
      session.flush()
      new_prop_link = ItemPropLink(item_id=new_item.id, prop_id=new_properties.id,...)
      session.add(new_prop_link)
      session.flush()
session.commit()

This is the simplified version of my models:

class User(Base):
    __tablename__ = 'user'

    id = Column(Integer, Identity(always=True, start=1, increment=1, minvalue=1, maxvalue=2147483647, cycle=False, cache=1), primary_key=True)
    name = Column(String(20))
    email = Column(String(50))

    user_item_link = relationship('UserItemLink', back_populates='user')

class Item(Base):
    __tablename__ = 'item'

    id = Column(Integer, Identity(always=True, start=1, increment=1, minvalue=1, maxvalue=2147483647, cycle=False, cache=1), primary_key=True)
    name = Column(String(50))
    note = Column(String(50))

    user_item_link = relationship('UserItemLink', back_populates='item')

class Properties(Base):
    __tablename__ = 'properties'

    id = Column(Integer, Identity(always=True, start=1, increment=1, minvalue=1, maxvalue=2147483647, cycle=False, cache=1), primary_key=True)
    name = Column(String(50))
    value = Column(String(50))

    item_prop_link = relationship('ItemPropLink', back_populates='properties')

class UserItemLink(Base):
    __tablename__ = 'user_item_link'

    id = Column(Integer, Identity(always=True, start=1, increment=1, minvalue=1, maxvalue=2147483647, cycle=False, cache=1), primary_key=True)
    user_id = Column(ForeignKey('db.user.id'), nullable=False)
    item_id = Column(ForeignKey('db.item.id'), nullable=False)

The above information has been condensed for clarity. However, I have noticed that there is a significant delay when adding user information sequentially in the current setup, taking around 8 seconds or more for inputting 100 users.

I would appreciate any advice on how to improve the efficiency of Python and SQLAlchemy processes.

Answer №1

Utilizing the established relationships within your models allows for the creation of intricate objects without relying solely on ids:

with Session() as session, session.begin():
    for user in user_list:
        user_item_links = []
        for item_data in user.get('items'):
            item_prop_links = []
            for prop_data in item_data['properties']:
                item_prop_link = ItemPropLink()
                item_prop_link.properties = Properties(name=prop_data.get('name'), value=prop_data.get('value'))
                item_prop_links.append(item_prop_link)
            item = Item(name=item_data.get('name'), item_prop_link=item_prop_links)
            user_item_link = UserItemLink()
            user_item_link.item = item
            user_item_links.append(user_item_link)
        new_user = User(name=user.get('user_name'), email=user.get('email'), user_item_link=user_item_links)
        session.add(new_user)

Upon committing the session, SQLAlchemy will handle setting the foreign keys automatically, eliminating the need for manual flushing.

Similar questions

If you have not found the answer to your question or you are interested in this topic, then look at other similar questions below or use the search

Has the sympy pretty printing functionality been malfunctioning in the latest version of Jupyter Notebook?

Previously, I utilized pretty printing of math in the ipython notebook. However, after upgrading to jupyter (along with many other ipython-related packages), the pretty printing functionality no longer behaves as it did before. To set it up, I typically us ...

What is the best way to verify if one set of entire words is a subset of another set when

words = 'NotAllowed,Trinity,Allowed' selected_word = 'NotAllowed,Allowe' name = frozenset(selected_word) if name.issubset(words) == 1: print 'yes' else: print 'no' The code above outputs 'yes' ba ...

Generate additional rows within a dataset based on a particular column indicating the desired quantity of rows (flatten?)

I am currently working with data that is structured in the following format: ID qi di b start_date end_date delta_t 0 1232111 363.639856 0.902817 2.000000e+01 2020-07-01 2021-05-05 230 1 2445252 304.3775 ...

Transforming DBF documents into CSV style utilizing ydbf

I've been working on converting a specific DBF file into CSV format, and I'm encountering an issue with one particular value in the script. My tool of choice for this task is the ydbf package in Python. So far, I have a total of 598 DBF files th ...

I'm wondering how I can pass an argument in the command line interface and then use it in my Python code. For example, if I were to write "pytest --headless"

How can I modify conftest.py to enable headless mode when running test.py with the command pytest --headless in the terminal? By default, it should run in regular mode (show browser). conftest.py: def pytest_addoption(parser): parser.addoption("- ...

Exploring Numpy Arrays through Loops and Searches

Currently, I am looping through a numpy array and performing a search operation which is taking approximately 60 seconds to complete for arrays (npArray1 and npArray2 in the sample code) containing around 300,000 values. To elaborate, I am seeking the ind ...

Using Selenium in Python to effectively capture and analyze network traffic responses

I'm encountering a problem where I can't seem to access the network traffic responses in Firefox using Selenium (Python). While solutions are available for the Chrome webdriver, my specific requirement is to work with the Firefox version. Despite ...

Using the pandas library, you can save and manage numerous data sets within a single h5 file by utilizing the pd

If I have two different dataframes, import pandas as pd df1 = pd.DataFrame({'col1':[0,2,3,2],'col2':[1,0,0,1]}) df2 = pd.DataFrame({'col12':[0,1,2,1],'col22':[1,1,1,1]}) After successfully storing df1 with the comm ...

Creating a user-friendly commands menu button for a telegram bot with the help of telebot

I've been struggling to get the list of commands set up for my telegram bot, but for some reason it's not working. Here is the code I've been using: bot = telebot.TeleBot("TOKEN") bot.set_my_commands(commands = ['start',& ...

What is the best way to package numpy array data?

My goal is to create a custom class that extends the base type of numpy array, class CustomArray(numpy.ndarray): @classmethod def init_from_data(cls, ...): cls(numpy.empty(...)) However, I encountered an issue where multi-dimensional array types ...

Challenges encountered when trying to use Python Selenium for web scraping

from selenium import webdriver import os os.chdir("C:\\Users\\czoca\\PycharmProjects\\pythonProject4") driver = webdriver.Chrome() driver.get("https://www.sitetoscrape.com/page1.html") driver.implicitly_wait(10) ele ...

Is there a way to sum/subtract an integer column by Business Days from a datetime column?

Here is a sample of my data frame: ID Number of Days Off First Day Off A01 3 16/03/2021 B01 10 24/03/2021 C02 3 31/03/2021 D03 2 02/04/2021 I am looking for a way to calculate the "First Day Back from Time Off" column. I attempted to use it ...

Is there a way to swap out the "-" symbol in Pandas without affecting the values for pd.eval() in the future?

Whenever I try to replace the "-" character in my data, it affects other values as well. df['land_area'] = df['land_area'].str.replace("-", '0') I need to ensure that the evaluation process will run smoothly without any comp ...

Could the sluggishness of Selenium Python with Chrome be attributed to cookies causing issues?

After researching, I discovered that Selenium Chrome may perform better if certain optimizations are applied such as implicit waits, headless mode, usage of ID and CSS selectors, etc. However, before making these adjustments, I am concerned about whether c ...

Is there a way to quickly obtain a sorted list without any duplicates in just one line

Since the return value of sort() is None, the code snippet below will not achieve the desired result: def get_sorted_unique_items(arr): return list(set(arr)).sort() Do you have any suggestions for a more effective solution? ...

The Chrome Driver indicates compatibility with version 114, but it is actually intended to support version 113 - as mentioned in Selenium Python

from selenium import webdriver from selenium.webdriver.common.by import By from selenium.webdriver.chrome.service import Service from webdriver_manager.chrome import ChromeDriverManager from time import sleep driver = webdriver.Chrome(service=Service(Chr ...

Tips for retrieving the xpath of elements within a div using python selenium

<div class="col-md-6 profile-card_col"> <span class="label">Company Name :</span> <p class="value">Aspad Foolad Asia</p> </div> For a while now, I've been trying to troublesho ...

Transform a collection of lists containing tuples into a NumPy array

Just diving into the world of python and finding myself stuck on what seems to be a simple issue. Here's the list of coordinates I'm working with: [(-80983.957, 175470.593, 393.486), (-80994.122, 175469.889, 394.391), (-80996.591, 175469.757, 3 ...

The Django POST request is rejecting due to a missing or incorrect CSRF token, despite having included the token in the form

I'm encountering a 403 response when making a POST request despite including csrf_token in the data for an AJAX request. I made sure that csrf_token is not empty before sending the request, so everything seems correct. What could be causing this error ...

Tips for restructuring a pandas data frame

I have a dataframe that looks like this: id points 0 1 (2,3) 1 1 (2,4) 2 1 (4,6) 3 5 (6,7) 4 5 (8,9) My goal is to transform it into the following format: id points 0 1 (2,3), (2,4), (4,6) 1 5 (6,7), (8,9) Can anyone pro ...