Tips for optimizing session.add with various relationships to improve performance

Question

Below is the structure of my input data: a list of dictionaries, each holding a user with nested items and properties.

# data structure
user_list = [{'user_name': 'A',
  'email': 'aaa@aaa.com',
  'items': [{'name': 'a_item1', 'properties': [{1....},{2....}...]}
 ]}] * 100

I am trying to insert this data into a PostgreSQL database using SQLAlchemy. The database has tables for users, items, and properties, plus link tables for the user/item and item/property relationships.

for u in user_list:
  new_user = User(name=u.get('user_name'),....)
  session.add(new_user)
  session.flush()  # flush to obtain new_user.id
  for item in u.get('items'):
    new_item = Item(name=item.get('name'),.....)
    session.add(new_item)
    session.flush()  # flush to obtain new_item.id
    new_item_link = UserItemLink(user_id=new_user.id, item_id=new_item.id,...)
    session.add(new_item_link)
    session.flush()
    for prop in item.get('properties'):
      new_properties = Properties(name=prop.get('name'),...)
      session.add(new_properties)
      session.flush()  # flush to obtain new_properties.id
      new_prop_link = ItemPropLink(item_id=new_item.id, prop_id=new_properties.id,...)
      session.add(new_prop_link)
      session.flush()
session.commit()

This is the simplified version of my models:

class User(Base):
    __tablename__ = 'user'

    id = Column(Integer, Identity(always=True, start=1, increment=1, minvalue=1, maxvalue=2147483647, cycle=False, cache=1), primary_key=True)
    name = Column(String(20))
    email = Column(String(50))

    user_item_link = relationship('UserItemLink', back_populates='user')

class Item(Base):
    __tablename__ = 'item'

    id = Column(Integer, Identity(always=True, start=1, increment=1, minvalue=1, maxvalue=2147483647, cycle=False, cache=1), primary_key=True)
    name = Column(String(50))
    note = Column(String(50))

    user_item_link = relationship('UserItemLink', back_populates='item')

class Properties(Base):
    __tablename__ = 'properties'

    id = Column(Integer, Identity(always=True, start=1, increment=1, minvalue=1, maxvalue=2147483647, cycle=False, cache=1), primary_key=True)
    name = Column(String(50))
    value = Column(String(50))

    item_prop_link = relationship('ItemPropLink', back_populates='properties')

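class UserItemLink(Base):
    __tablename__ = 'user_item_link'

    id = Column(Integer, Identity(always=True, start=1, increment=1, minvalue=1, maxvalue=2147483647, cycle=False, cache=1), primary_key=True)
    user_id = Column(ForeignKey('db.user.id'), nullable=False)
    item_id = Column(ForeignKey('db.item.id'), nullable=False)

    # Counterparts of the back_populates declared on User and Item
    user = relationship('User', back_populates='user_item_link')
    item = relationship('Item', back_populates='user_item_link')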

The code above has been condensed for clarity. With the current setup, adding users one at a time is noticeably slow: inserting 100 users takes around 8 seconds or more.

I would appreciate any advice on how to speed up this insert with Python and SQLAlchemy.

python performance for-loop sqlalchemy

Answer 1

Use the relationships already defined on your models to build the whole object graph in memory, instead of flushing after every insert just to obtain ids:

# Session is assumed to be a sessionmaker bound to your engine.
with Session() as session, session.begin():
    for user in user_list:
        user_item_links = []
        for item_data in user.get('items'):
            item_prop_links = []
            for prop_data in item_data['properties']:
                # Attach each property to its link object instead of copying ids.
                item_prop_link = ItemPropLink()
                item_prop_link.properties = Properties(name=prop_data.get('name'), value=prop_data.get('value'))
                item_prop_links.append(item_prop_link)
            # Assumes Item has an item_prop_link relationship to ItemPropLink
            # (not shown in the question's simplified models).
            item = Item(name=item_data.get('name'), item_prop_link=item_prop_links)
            user_item_link = UserItemLink()
            user_item_link.item = item
            user_item_links.append(user_item_link)
        new_user = User(name=user.get('user_name'), email=user.get('email'), user_item_link=user_item_links)
        # Adding the user cascades to every object reachable from it.
        session.add(new_user)

When the session commits, SQLAlchemy flushes the pending objects, inserts the rows in dependency order, and sets the foreign keys itself, so the manual flush() calls are no longer needed.
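
To check this behaviour on a small scale, here is a minimal, self-contained sketch. It is not your real schema: it runs against an in-memory SQLite database instead of PostgreSQL, uses plain integer primary keys in place of the Identity columns, and the ItemPropLink model, the relationships on the link classes, and the sample data are all assumptions made for illustration. It counts flushes with an after_flush event listener and then confirms that the foreign keys were filled in.

```python
from sqlalchemy import Column, ForeignKey, Integer, String, create_engine, event
from sqlalchemy.orm import declarative_base, relationship, sessionmaker

Base = declarative_base()

# Trimmed-down models. ItemPropLink and the link-side relationships are
# assumptions; plain integer keys stand in for the Identity columns.
class User(Base):
    __tablename__ = "user"
    id = Column(Integer, primary_key=True)
    name = Column(String(20))
    email = Column(String(50))
    user_item_link = relationship("UserItemLink", back_populates="user")

class Item(Base):
    __tablename__ = "item"
    id = Column(Integer, primary_key=True)
    name = Column(String(50))
    user_item_link = relationship("UserItemLink", back_populates="item")
    item_prop_link = relationship("ItemPropLink", back_populates="item")

class Properties(Base):
    __tablename__ = "properties"
    id = Column(Integer, primary_key=True)
    name = Column(String(50))
    value = Column(String(50))
    item_prop_link = relationship("ItemPropLink", back_populates="properties")

class UserItemLink(Base):
    __tablename__ = "user_item_link"
    id = Column(Integer, primary_key=True)
    user_id = Column(ForeignKey("user.id"), nullable=False)
    item_id = Column(ForeignKey("item.id"), nullable=False)
    user = relationship("User", back_populates="user_item_link")
    item = relationship("Item", back_populates="user_item_link")

class ItemPropLink(Base):
    __tablename__ = "item_prop_link"
    id = Column(Integer, primary_key=True)
    item_id = Column(ForeignKey("item.id"), nullable=False)
    prop_id = Column(ForeignKey("properties.id"), nullable=False)
    item = relationship("Item", back_populates="item_prop_link")
    properties = relationship("Properties", back_populates="item_prop_link")

engine = create_engine("sqlite://")  # in-memory database, for the demo only
Base.metadata.create_all(engine)
Session = sessionmaker(bind=engine)

flush_count = 0

@event.listens_for(Session, "after_flush")
def count_flush(session, flush_context):
    """Increment a counter every time a session actually flushes."""
    global flush_count
    flush_count += 1

# Made-up sample data in the same shape as the question's user_list.
user_list = [
    {"user_name": f"user{i}", "email": f"user{i}@example.com",
     "items": [{"name": "item1",
                "properties": [{"name": "p1", "value": "v1"},
                               {"name": "p2", "value": "v2"}]}]}
    for i in range(3)
]

with Session() as session, session.begin():
    for u in user_list:
        user = User(name=u["user_name"], email=u["email"])
        for item_data in u["items"]:
            item = Item(name=item_data["name"])
            for prop_data in item_data["properties"]:
                link = ItemPropLink(properties=Properties(**prop_data))
                item.item_prop_link.append(link)
            user.user_item_link.append(UserItemLink(item=item))
        session.add(user)  # cascades to items, properties and link rows

flushes_during_insert = flush_count

# Read the data back and check that the foreign keys were populated.
with Session() as session:
    n_users = session.query(User).count()
    n_prop_links = session.query(ItemPropLink).count()
    first = session.query(UserItemLink).first()
    fk_ok = first.user_id == first.user.id and first.item_id == first.item.id

print(flushes_during_insert, n_users, n_prop_links, fk_ok)
```

In this sketch nothing is sent to the database until the transaction block exits, so the whole graph goes out in a single flush rather than one flush per object. How much time that saves against your PostgreSQL setup is something you would need to measure yourself.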
