We're dealing with a lot of "data analysis": basically different sorts of data wrangling, aggregations and calculations using pandas. Usually, the data is time series data.
All underlying data is stored in a SQL database.
As of now, we usually wrap all data access logic into a single "DBWrapper" class that implements methods returning different datasets as pandas DataFrames. (I.e. all database queries are defined there.)
My question is: is there any established architectural/design pattern applicable here?
I have looked at https://martinfowler.com/eaaCatalog/index.html, but can't seem to find anything that matches. The patterns described in EAA seem to be focused on domain classes.
Current implementations usually look something like this:
import pandas as pd


class DbWrapper:
    def __init__(self) -> None:
        self._db_conn = None  # connection is created/injected elsewhere

    def get_price_data(self, ticker, start_date, end_date) -> pd.DataFrame:
        query = """SELECT * FROM Prices WHERE ticker = ? AND date BETWEEN ? AND ?"""
        params = (ticker, start_date, end_date)
        return pd.read_sql(query, self._db_conn, params=params)

    def get_volume_data(self, ticker, start_date, end_date) -> pd.DataFrame:
        query = """SELECT * FROM Volume WHERE ticker = ? AND date BETWEEN ? AND ?"""
        params = (ticker, start_date, end_date)
        return pd.read_sql(query, self._db_conn, params=params)