1

I work with SQL Alchemy a lot and, as a "lazy programmer", I enjoy the convenience it offers. But the "diligent programmer" in me often worries about optimisation and the performance of the queries and more the performance of my application in respect to those queries.

A major pitfall that programmers face when using an ORM is doing work in your application that should be done on the database side.

So here I am, wondering when to sacrifice performance for convenience. For example, in my application I might often need to do something of the sort:

Convenient

users = User.query.filter_by(some_column=True).all()  # list of User objects (all columns)
refined_users = []
for user in users:  # do something with a couple of columns
    refined_users.append((user.name, user.age))

Optimised

users = session.query(User.name, User.age).filter(User.some_column==True).all()  # list of tuples

So this is a pretty simple and contrived example where you would clearly opt for the optimised option. But in cases where I have to query many columns, it becomes tedious. I also like the convenience of accessing columns as user.name whereas the optimised example just returns the data in a list of a tuples (i.e.: no column names).

So whilst querying the entire object (i.e.: every column) is convenient and most likely doesn't have that big an impact on my database, it still feels a little dirty knowing that I can be nicer to my database.

Robert Harvey
  • 198,589
  • 55
  • 464
  • 673
turnip
  • 1,657
  • 2
  • 15
  • 21
  • I think this is mostly going to be a judgement call on your part. You will not typically see a huge performance difference. If there were a noticeable difference, you'd find it when profiling. If you're not profiling yet, your code is fast enough. Avoid premature optimization. – amon Oct 02 '18 at 17:05
  • In this particular case, the "optimized" call is clearer to me. However, Robert Harvey has the right answer. You do it when it's necessary. – Berin Loritsch Oct 02 '18 at 19:08

2 Answers2

5

The way you make this decision is by optimizing when you need to.

There are many scenarios (e.g. a single-record data-entry form) where pulling all of the fields is perfectly acceptable, since you're only working with a single record and/or you're going to use most or all of the fields anyway.

Conversely, there are instances where you're only pulling one or two fields and it doesn't make sense to retrieve all of them. Your scenario lies somewhere in the middle.

How do you make the decision to optimize?

  1. Identify an actual performance problem that is causing a significant impact.
  2. Measure the actual performance, using a profiler or timers.
  3. Optimize those areas where the improvement in performance exceeds the additional effort required.
Robert Harvey
  • 198,589
  • 55
  • 464
  • 673
0

The slow part of a database operation is the data transfer: disk to memory, and then the network transfer from server to client. Retrieving only select columns does nothing to reduce disk to memory transfer for row-storage databases, but may reduce network transfer time. The difference will be largest if you are retrieving only a few columns from wide rows.

ORMs may produce queries that optimize poorly compared or fail to take advantage of existing indices, but that's not happening here. Don't worry, be happy.

kevin cline
  • 33,608
  • 3
  • 71
  • 142