We're dealing with a lot of "data analysis": basically different sorts of data wrangling, aggregations and calculations using pandas. Usually, the data is time series data.
All underlying data is stored in a SQL database.
As of now, we usually wrap all data access logic into a single "DBWrapper" class that implements methods returning different datasets as pandas DataFrames. (I.e. all database queries are defined there.)
My question is: is there any established architectural/design pattern applicable here?
I have looked at https://martinfowler.com/eaaCatalog/index.html, but can't seem to find anything that matches. The patterns described in EAA seem to be focused on domain classes.
Current implementations usually look something like this:
import pandas as pd


class DbWrapper:
    def __init__(self) -> None:
        self._db_conn = None  # connection is created/injected elsewhere

    def get_price_data(self, ticker, start_date, end_date) -> pd.DataFrame:
        query = """SELECT * FROM Prices WHERE ticker = ? AND date BETWEEN ? AND ?"""
        params = (ticker, start_date, end_date)
        return pd.read_sql(query, self._db_conn, params=params)

    def get_volume_data(self, ticker, start_date, end_date) -> pd.DataFrame:
        query = """SELECT * FROM Volume WHERE ticker = ? AND date BETWEEN ? AND ?"""
        params = (ticker, start_date, end_date)
        return pd.read_sql(query, self._db_conn, params=params)