Abstracting constrained strings in serializer. Good or bad practice?

Question

I'm using FastAPI and in my schemas (that is, serializers) I have something like this:

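from pydantic import StrictStr, BaseModel

# Pydantic v1: StrictStr subclasses ConstrainedStr, so overriding the
# max_length class attribute adds a length check to the strict string type.
class Str255(StrictStr):
    max_length = 255

# my schemas:
class CatSchema(BaseModel):
    name: Str255
    owner: Str255
    age: int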

The reason I use Str255 is to apply DRY, by avoiding the following:

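from pydantic import BaseModel, constr

# my schemas:
# StrictStr itself takes no max_length argument; constr(strict=True, ...)
# is the inline way to write "StrictStr(max_length=255)".
class CatSchema(BaseModel):
    name: constr(strict=True, max_length=255)
    owner: constr(strict=True, max_length=255)
    age: int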

However, I might have to define 5 or 10 Str variations: Str10, Str50, etc.

Is it good practice to create those classes or should I keep it as StrictStr(max_length=...)? Does the answer change if I have 40 variations of Str?

I don't see how `Str255` is really any more abstract than `StrictStr(max_length=255)`; in both cases the reader sees something which describes a string with a maximum length of 255, so they seem to have exactly the same level of abstraction (i.e. it is not adding any generalisation nor hiding any information), and neither is more or less 'DRY' than the other. It looks more like abbreviation, which is often something to be wary of for anything with wide scope/visibility and likely to be reused in a lot of places - https://softwareengineering.stackexchange.com/questions/24077/are-short-identifiers-bad/ — Ben Cottrell, May 02 '23 at 16:46

score 11 · Answer 1 · answered May 02 '23 at 16:41

Your idea is a bit like defining a constant SEVEN = 7 because you want to avoid magic constants like "7" in your code. This kind of misses the point about naming such values. "DRY" shouldn't be understood as eliminating all repetition, but as ensuring that concepts in your code are defined once explicitly, not just implicitly and all over the place.

In my experience, there are two good reasons for giving something a name.

1. We might name something because it has some meaning. The name would express the "why", not the "how". For example, I assume that you have this repeated length limit of 255 chars due to security reasons or database constraints, not just because you like this number. Then give this limit a clear name. If the same limit is used in two places for different reasons, it is likely they should be using different names as well. Such code has no unnecessary repetition because each concept is defined and named once.

2. Sometimes, a pattern is repeated throughout the code base so frequently that you have to create an abstraction to simplify the code. This isn't always worth it. Every abstraction makes the overall code a bit more complex, but sometimes it can hide more complexity than it introduces. There is also the risk that you abstract over two occurrences of a pattern that currently happen to be the same, but for entirely different reasons. If you then want to change the code in only one place, you risk accidentally affecting unrelated code that uses the same abstraction. Blind application of "DRY" can make code really, really hard to maintain in the future.

So I'd suggest being hesitant to introduce abstractions like this, unless you can give that abstraction a clear name, ideally a name from the problem domain. It can be very useful to repeat yourself a couple of times if you can't yet see which patterns are worth abstracting.

A classic example would be an age of maturity and a VAT rate that both happen to be 18. If you extract that "duplication" out, then you have, for all eternity, tied the age of maturity to the VAT rate. Or, in Germany, drinking age, driving age, and voting age are the same. Then, you get your first customer in the US … — Jörg W Mittag, May 02 '23 at 17:03
Yup, numbers and code mean different things in different contexts, even when they look identical. — candied_orange, May 02 '23 at 17:51
I chose 255 because it's what my DB constraints are (sorry for not mentioning it). But I might have `StrictStr10`, `StrictStr20`, `StrictStr255`. What name would be most suitable? — PythonForEver, May 03 '23 at 07:22
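The drinking/driving/voting age example from the comments above can be sketched in plain Python. Instead of one shared `18`, each rule gets its own named field, so supporting a market with different rules means supplying new values rather than untangling a shared constant. `AgeRules`, `allowed`, and the ages used are hypothetical and illustrative, not real legal data.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class AgeRules:
    """One field per concept, even when the values currently coincide."""
    drinking_age: int
    driving_age: int
    voting_age: int


def allowed(rules: AgeRules, age: int) -> Dict[str, bool]:
    """Check an age against each rule independently."""
    return {
        "drink": age >= rules.drinking_age,
        "drive": age >= rules.driving_age,
        "vote": age >= rules.voting_age,
    }


# Hypothetical jurisdictions; the numbers are illustrative only.
home = AgeRules(drinking_age=18, driving_age=18, voting_age=18)
other = AgeRules(drinking_age=21, driving_age=16, voting_age=18)

print(allowed(home, 18))   # {'drink': True, 'drive': True, 'vote': True}
print(allowed(other, 19))  # {'drink': False, 'drive': True, 'vote': True}
```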

mentallurg · Answer 2 · 2023-05-03T12:23:39.137

Str255 complicates the code compared to StrictStr(max_length=...). Introducing a named type/constraint might make sense only if you have multiple occurrences of the same logical constraint, not just the same technical one. For instance, if you have an attribute that represents a person's name and it is used in multiple places in your API, then it may make sense to make it consistent across all API methods:

from pydantic import BaseModel, StrictStr

class PersonName(StrictStr):  # Pydantic v1 style, as in the question
    max_length = 123

class CatSchema(BaseModel):
    name: PersonName
    owner: PersonName
    age: int

I say "might make sense" because the code should be not only uniform but also easily readable.

That's why I'd suggest starting with purely technical constraints like StrictStr(max_length=...), giving readability priority over uniformity.
