Right-joining a table that uses soft delete in Python SQLAlchemy

Saturday September 02, 2017

Typically when implementing a new SQL table, I use soft deletes over hard deletes. Soft deletes enables one to make audit operations without having a separate analytics system, and in many cases (say, purchases which reference old inventory), soft deletes are a requirement.

The only issue is when you try to join a table with soft deletes, funny things can happen. Take the following query, where we have accounts, and active credit cards for people to perform payment, which can be deleted as a user removes cards from their account.

This query returns joined rows with credit cards that have been soft-deleted - not what we want:

(db.session.query(Account, CreditCard)
.join(PaymentPlan), isouter=True)

The naive solution is to filter:

(db.session.query(Account, CreditCard)
.join(CreditCard, isouter=True)
.filter(CreditCard.deleted_at.is_(None)))

However, that removes the rows that previously had a payment plan, but that payment plan has now been deleted - still not what we want.

(db.session.query(Account, CreditCard)
.join(CreditCard, and_(CreditCard.account_uuid==Account.uuid, CreditCard.deleted_at.is_(None)), isouter=True))

This is the proper query and gives us our desired result. Unfortunately, it’s fairly verbose and isn’t using the intelligent join mechanisms of SQLAlchemy. The .join method is defined on the Query class of SQLAlchemy. After some digging through sqlalchemy/orm/query.py and then sqlalchemy/sql/selectable.py, I was able to monkey patch Query by creating a custom .join_alive method, giving us the behavior we hope for:

from sqlalchemy.sql.selectable import Join, _interpret_as_from
from sqlalchemy.orm.query import Query
from sqlalchemy import and_

def join_alive(self, right, isouter=False, full=False):
    # left reference stolen from sqlalchemy/orm/query.py#L2187
    left = self._entities[0].entity_zero_or_selectable
    left_tbl = _interpret_as_from(left)
    right_tbl = _interpret_as_from(right).self_group()
    onclause = Join._join_condition(left_tbl, right_tbl)
    onclause = and_(
        onclause,
        right.deleted_at.is_(None),
    )
    return self.join(right, onclause, isouter=isouter, full=full)


Query.join_alive = join_alive

Now, lo-and-behold, this code does what we want:

(db.session.query(Account, CreditCard)
            .join_alive(CreditCard, isouter=True))

All of my tables use the deleted_at field, so this code is okay dynamically using that column. A more extensible version of this function might take an onclause parameter, and use the sqlalchemy.and_ function to compose it with the auto-discovered join so that the join criteria doesn’t need to be specified. Moreover, I haven't done much testing on this function to make sure it works in all cases - I'll leave that for another day.


Want to receive similar articles? (No spam, promise!)