How to implement some constrained .NET struct
, where its state with default values (set by CLR to technical defaults, i.e. nulls, zeros, etc.) should be prohibited due to some design constraint?
You can't. As of 2022 and C# 10.0, there still is no way to prevent consuming code from having default
struct values:
LimitedString[] values = new LimitedString[ 5 ];
MethodThatRequiresNonNullString( values[0].Value ); // <-- This will always fail at runtime, without _any_ compile-time warnings or errors.
Q.E.D.
Whereas if LimitedString
were a class
type, and if C# 8.0 nullable-reference-types were enabled, you'd get a compile-time warning that LimitedString[] values
should be typed as LimitedString?[]
and that the values[0].Value
dereference is unsafe.
However this does not necessarily mean that you should be using a class
type for this: it just means you need to understand how you can implement struct
types correctly and appropriately.
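To make that concrete, here is a minimal sketch of the ways a struct value can come into existence without its constructor ever running. The LimitedString below is a simplified stand-in for yours, and Holder and DefaultDemo are hypothetical names used only for illustration.

```csharp
using System;

// Simplified stand-in for the LimitedString in the question; only its shape matters here.
public readonly struct LimitedString
{
    public LimitedString( String value, Int32 maxLength )
    {
        if( value is null ) throw new ArgumentNullException( nameof(value) );
        if( maxLength < 1 ) throw new ArgumentOutOfRangeException( nameof(maxLength) );
        if( value.Length > maxLength ) throw new ArgumentException( "Too long.", nameof(value) );
        this.Value     = value;
        this.MaxLength = maxLength;
    }

    public String Value     { get; }
    public Int32  MaxLength { get; }
}

// Hypothetical class with a struct-typed field that nobody assigns.
public sealed class Holder
{
    public LimitedString Field;
}

public static class DefaultDemo
{
    // Each method returns a LimitedString whose constructor never ran,
    // so Value is null and MaxLength is 0 despite the constructor's checks.
    public static LimitedString ViaDefaultExpression() => default;

    public static LimitedString ViaArrayElement()
    {
        LimitedString[] values = new LimitedString[ 1 ];
        return values[ 0 ];
    }

    public static LimitedString ViaUnassignedField() => new Holder().Field;
}
```

None of these three paths produces a compiler warning, which is why the struct has to defend itself at runtime.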
For example in case of trivial struct LimitedString
, properties are String Value
and int MaxLength
, where value of the MaxLength
property must be at least 1
. Value 0
is not allowed by design.
But when I initialize the structure, I have 0 there. How to force value 10
into defaults?
You appear to be thinking that it's okay to define classes and structs that can be instantiated into an invalid state and then have their properties set afterwards until they're somehow "initialized". This is not how classes or structs should be designed.
(I blame WinForms and WPF/XAML for so many .NET developers getting into this plainly wrong mindset, because WinForms and WPF/XAML basically require all component classes to have parameterless constructors and be post-hoc initialized).
Constructors exist to ensure that their newly created objects are in a specific valid state; this means having to assert preconditions about their parameter values (using ArgumentException
). And in .NET, struct
types should always be immutable (which necessarily means that their properties are strictly get
-only: no set
nor init
properties!) so always write readonly struct
-types, not struct
-types.
With that in mind, let's review your first question again:
But when I initialize the structure, I have 0 there. How to force value 10
into defaults?
You should make maxLength
a constructor parameter, not a property you set after construction - which means the constructor can also validate the String value
.
However because maxLength
is not an invariant of your program it means your LimitedString
becomes less useful: a method that accepts a LimitedString limStr
parameter has no useful compile-time guarantee about the actual limStr.Value.Length
, so it would have to check limStr.Value.Length
itself at runtime, which is hardly better than just passing a String
value. Instead the MaxLength
value should be expressed as a _type-parameter_ of LimitedString
, e.g. LimitedString<MaxLength: 10>
. Unfortunately C# does not support int
type-parameters like C++ does, though you can hack it in other ways (but that's another discussion). I'll continue with my answer anyway, but I'll disregard the _invariance_ aspects of your MaxLength
design.
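For the curious, one possible shape of that hack is sketched below: the limit is carried by a marker struct passed as a type argument. This is only an illustration of the idea and not a recommendation. IMaxLength, Max10 and Max50 are names I made up for this sketch, and the rest of this answer does not depend on it.

```csharp
using System;

// Hypothetical marker interface: each implementing struct names exactly one limit.
public interface IMaxLength
{
    Int32 Value { get; }
}

// One marker type per limit. These names are invented for this sketch.
public readonly struct Max10 : IMaxLength { public Int32 Value => 10; }
public readonly struct Max50 : IMaxLength { public Int32 Value => 50; }

public readonly struct LimitedString<TMax>
    where TMax : struct, IMaxLength
{
    private readonly String value_doNotReadDirectly;

    public LimitedString( String value )
    {
        if( value is null ) throw new ArgumentNullException( nameof(value) );
        if( value.Length > MaxLength ) throw new ArgumentException( $"Length {value.Length} exceeds {MaxLength}.", nameof(value) );
        this.value_doNotReadDirectly = value;
    }

    // The limit comes from the type argument, so it is identical for every LimitedString<TMax>.
    public static Int32 MaxLength => default(TMax).Value;

    // Still has to guard against default(LimitedString<TMax>), like any other struct.
    public String Value => this.value_doNotReadDirectly ?? throw new InvalidOperationException( "This is a default value and was not constructed correctly." );
}
```

With this shape a method signature such as `void Rename( LimitedString<Max10> name )` states the limit in the type itself, so the callee no longer needs to re-check the length.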
Throw exception in static
parameterless constructor → force using only constructor(s) with parameters. Impractical, parameterless use is expected sometimes.
Again, you misunderstand the purpose of constructors (and are also seemingly getting confused by the type-level static LimitedString()
"static constructor", which is actually completely irrelevant to your question, as your struct LimitedString
won't have any static
members).
You should have a parameterized constructor (accepting String value, Int32 maxLength
) and your constructor must throw new ArgumentException
to make precondition assertions about those parameters' argument values. That's the whole point of a constructor, regardless of whether it's a struct
or a class
's constructor.
...However, C#/.NET struct
types always have a zero-initialized state that cannot be removed (an unavoidable consequence of the low-level details of how struct
types work: it's their default
or "zero" value!). C# 10 does let you declare your own parameterless constructor, but default(T) and new T[n] never call it. So in C#, whenever you're writing a method with a struct
-type parameter or struct
-typed property-setter you always need to be cognizant of the possibility that that input is default
: and then act accordingly depending on your business/domain rules (i.e. "can a default
or "uninitialized" value of this type ever be considered valid in my program?"). If not, then your program needs to reject it appropriately: either by throwing an ArgumentException
, returning false
from a Try...
-pattern method, etc.
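As a rough sketch of what "act accordingly" can look like on the consuming side, here are both rejection styles next to each other. The IsDefault property and the Labels class are hypothetical helpers I've added for the example, and the struct is cut down to the minimum needed to compile.

```csharp
using System;

// Cut-down stand-in. IsDefault is a hypothetical helper, not something from the question.
public readonly struct LimitedString
{
    private readonly String value;
    private readonly Int32  maxLength;

    public LimitedString( String value, Int32 maxLength )
    {
        if( value is null ) throw new ArgumentNullException( nameof(value) );
        if( maxLength < 1 ) throw new ArgumentOutOfRangeException( nameof(maxLength) );
        if( value.Length > maxLength ) throw new ArgumentException( "Too long.", nameof(value) );
        this.value     = value;
        this.maxLength = maxLength;
    }

    // The constructor never stores null, so a null field can only mean "default".
    public Boolean IsDefault => this.value is null;

    public String Value     => this.value ?? throw new InvalidOperationException( "Not constructed." );
    public Int32  MaxLength => this.maxLength != 0 ? this.maxLength : throw new InvalidOperationException( "Not constructed." );
}

public static class Labels
{
    // Throwing style: a default argument is treated as a caller bug.
    public static String Format( LimitedString text )
    {
        if( text.IsDefault ) throw new ArgumentException( "Value is default(LimitedString).", nameof(text) );
        return "[" + text.Value + "]";
    }

    // Try-pattern style: failure is reported through the return value instead.
    public static Boolean TryFormat( LimitedString text, out String formatted )
    {
        if( text.IsDefault )
        {
            formatted = String.Empty;
            return false;
        }

        formatted = "[" + text.Value + "]";
        return true;
    }
}
```

Which style fits depends on whether a default value reaching that method is a programming error or an expected input.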
Add helper private field IsInitialized
and while it is false
, assume default values, i.e. MaxLength = 10
. Slightly higher complexity inside the struct.
You actually wouldn't need to add a whole new field to detect default
struct state: assuming your constructor requires the String value
argument to be non-null
(in addition to checking the length) before storing it in readonly String myStringValue;
then you know that if the struct is default
then the myStringValue
field will also be null
(as null == default(String)
) - so just checking if this.myStringValue is null
is enough to tell you the struct is invalid. But you don't even have to do that: as you said, MaxLength
must be > 0
and because default(Int32) == 0
you could always just check if this.maxLengthValue == default(Int32)
to see if your struct is invalid.
Slightly higher complexity inside the struct.
Unfortunately in the case of struct
types, that "slightly higher complexity" is absolutely necessary because a struct
's member methods and properties can be invoked on default
instances (whereas a class
will never have its instance methods invoked when this == null
), so all of your struct's externally visible members (i.e. public
and internal
) must self-validate this
as a precondition.
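A small sketch of the difference, using a throwaway struct Greeting that I've invented for the purpose: both methods can be called on default(Greeting), but only the guarded one fails with an exception that names the actual problem.

```csharp
using System;

// Hypothetical struct used only to show member calls on a default instance.
public readonly struct Greeting
{
    private readonly String text;

    public Greeting( String text )
    {
        this.text = text ?? throw new ArgumentNullException( nameof(text) );
    }

    // Unguarded: still runs on default(Greeting), then fails with NullReferenceException.
    public Int32 UnsafeLength() => this.text.Length;

    // Guarded: fails at the boundary with an exception describing the invalid state.
    public Int32 SafeLength() =>
        ( this.text ?? throw new InvalidOperationException( "default(Greeting) is not a valid value." ) ).Length;
}
```

Calling `default(Greeting).UnsafeLength()` compiles cleanly and only blows up inside the method body, which is exactly the situation the self-validation is there to prevent.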
Is option #2 a legitimate way or does this violate some design principles?
On the contrary: Option #2 is the only way (and Option #1 is either nonsensical or demonstrates a lack of understanding of OOP fundamentals and the purpose of constructors).
With that in mind, let's review the hard-and-fast rules for struct
type design (especially since C# 7 made significant improvements to struct
types with the addition of readonly struct
, for example):
TL;DR: aka Hard and fast rules:
- If your instance's data is mutable, or exceeds ~32 bytes in aggregate, then you should use a
class
instead of a struct
.
- Use
readonly struct LimitedString
, not struct LimitedString
.
- Define your
struct LimitedString
's state using only fields, not auto-properties...
- ...and never read those fields directly!
- Instead all read access to those fields should be indirectly done via wrapper getter-only properties, which all ensure
this != default(LimitedString)
before returning.
- Ensure all members and consumers of your
struct LimitedString
use only those self-validating wrapper properties.
But why?
Because any and all struct
-types can always possibly be default
, it means that if struct-type's default
state is invalid (and so should never be encountered during program operation) it means all public members of structs must be self-validating in some way or another.
I find the best approach is to always use private readonly
fields and require all access to their data to be via an expression-bodied property that performs the state validation. This does mean that you cannot use auto-properties to avoid having to define both the field and property for the same logical member *grumble*.
Rather than using your struct LimitedString
for my first example, let's review this contrived example struct Foo
instead, which features more problem-areas:
public struct Foo
{
public Foo( Bar bar, Qux qux, Int32 neverZero, Int32 canBeZero )
{
if( neverZero == 0 ) throw new ArgumentOutOfRangeException( paramName: nameof(neverZero), actualValue: neverZero, message: "Cannot be zero." );
this.Bar = bar ?? throw new ArgumentNullException(nameof(bar));
this.qux = qux ?? throw new ArgumentNullException(nameof(qux));
this.NeverZero = neverZero;
this.CanBeZero = canBeZero;
}
public Bar Bar { get; }
private readonly Qux qux;
public Int32 NeverZero { get; }
public Int32 CanBeZero { get; }
public SomethingElse Baz()
{
return this.Bar.Hmmm( this.qux ).LoremIpsum( 123 );
}
public CompletelyDifferent MyHovercraftIsFullOfEels()
{
return this.qux.IWillNotBuyThistTobaccanistItIsScratched();
}
}
...you'll need this instead:
public readonly struct Foo
{
public Foo( Bar bar, Qux qux, Int32 neverZero, Int32 canBeZero )
{
if( neverZero == 0 ) throw new ArgumentOutOfRangeException( paramName: nameof(neverZero), actualValue: neverZero, message: "Cannot be zero." );
this.bar_DoNotReadDirectlyExceptViaProperty = bar ?? throw new ArgumentNullException(nameof(bar));
this.qux_DoNotReadDirectlyExceptViaProperty = qux ?? throw new ArgumentNullException(nameof(qux));
this.neverZero_DoNotReadDirectlyExceptViaProperty = neverZero;
this.canBeZero_DoNotReadDirectlyExceptViaProperty = canBeZero;
}
private readonly Bar bar_DoNotReadDirectlyExceptViaProperty;
private readonly Qux qux_DoNotReadDirectlyExceptViaProperty;
private readonly Int32 neverZero_DoNotReadDirectlyExceptViaProperty;
private readonly Int32? canBeZero_DoNotReadDirectlyExceptViaProperty;
public Bar Bar => this.bar_DoNotReadDirectlyExceptViaProperty ?? throw new InvalidOperationException();
private Qux Qux => this.qux_DoNotReadDirectlyExceptViaProperty ?? throw new InvalidOperationException();
public Int32 NeverZero => this.neverZero_DoNotReadDirectlyExceptViaProperty != 0 ? this.neverZero_DoNotReadDirectlyExceptViaProperty : throw new InvalidOperationException();
public Int32 CanBeZero => this.canBeZero_DoNotReadDirectlyExceptViaProperty ?? throw new InvalidOperationException();
public SomethingElse Baz()
{
return this.Bar.Hmmm( this.Qux ).LoremIpsum( 123 );
}
public CompletelyDifferent MyHovercraftIsFullOfEels()
{
return this.Qux.IWillNotBuyThistTobaccanistItIsScratched();
}
}
- The main thing to observe here is that now all accesses of
struct Foo
's state (i.e. its fields) from the Baz
and MyHovercraftIsFullOfEels
methods now go via the Bar
and Qux
properties instead of accessing the fields directly (which would cause a NullReferenceException
, even when using C# 8.0 nullable-reference-types).
- The field name style
bar_DoNotReadDirectlyExceptViaProperty
is intentionally ugly: it mitigates the risk of human-factors (e.g. if the struct Foo
code is ever modified in the future by someone who is unfamiliar with the patterns in use here) and ensures all accesses go through the self-validating wrapper properties: Bar
and Qux
respectively.
- Warning sign naming is a common technique, e.g.
dangerouslySetInnerHTML
in ReactJS. I don't know if there's a formal name for the technique though.
- Ideally C# would let us have property-scoped fields which would make this technique completely moot but it doesn't. Oh well.
- I suppose you could write a Roslyn code-analysis rule that enforces the requirement that all
readonly struct
fields are always read only via their matched self-validating properties.
- Observe how the self-validating property-getter throws
InvalidOperationException
because that is the specific exception type you should use for this situation (emphasis mine):
InvalidOperationException
: The exception that is thrown when a method call is invalid for the object's current state.
- I'm sure we're all familiar with the maxim that "property-getters should never throw exceptions", but it should really be rephrased as "property-getters on correctly constructed objects should never throw exceptions".
- This is because it's the constructor's responsibility to ensure the object is in a valid state - that's the entire point of constructors in the first place: to make guarantees about the state of an object.
- Therefore, if a
struct
value is not constructed correctly (e.g. SomeStruct val = default(SomeStruct);
) then its property-getters should throw an exception, because it's an exceptional circumstance and you really do want the program to fail.
- The
Int32 canBeZero
value is stored in a Nullable<Int32>
field (aka Int32?
aka int?
) because otherwise it is impossible to differentiate between default(Int32)
and zero, while using Nullable<Int32>
makes it possible to detect never-initialized fields that store Int32
values - or any other value-type T
where default(T)
is meaningful in your application (e.g. TimeSpan
).
- However this isn't necessary for the
Int32 NeverZero
property because the domain-rules (which are implemented in the constructor) which specifically disallow zero means that default(Int32)
is an invalid value, so there's no need to use Nullable<Int32>
as the underlying field type.
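Here is a small sketch of the same idea with TimeSpan, where zero is a perfectly meaningful value. RetryPolicy is a hypothetical type I've made up for the example; assume that a delay of zero means "retry immediately".

```csharp
using System;

// Hypothetical struct: TimeSpan.Zero is a valid delay, so it cannot double as the "not constructed" marker.
public readonly struct RetryPolicy
{
    // Nullable<TimeSpan>: null means the constructor never ran, TimeSpan.Zero means "no delay".
    private readonly TimeSpan? delay_DoNotReadDirectlyExceptViaProperty;

    public RetryPolicy( TimeSpan delay )
    {
        if( delay < TimeSpan.Zero ) throw new ArgumentOutOfRangeException( nameof(delay), delay, "Cannot be negative." );
        this.delay_DoNotReadDirectlyExceptViaProperty = delay;
    }

    public TimeSpan Delay => this.delay_DoNotReadDirectlyExceptViaProperty
        ?? throw new InvalidOperationException( "This RetryPolicy is a default value and was not constructed correctly." );
}
```

With a plain TimeSpan field, `default(RetryPolicy)` and `new RetryPolicy( TimeSpan.Zero )` would be indistinguishable.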
A better struct LimitedString
So having considered that, this is how I would design your struct LimitedString
:
public readonly struct LimitedString
{
public LimitedString( String value, Int32 maxLength )
{
if( value is null ) throw new ArgumentNullException(nameof(value));
if( maxLength < 1 ) throw new ArgumentOutOfRangeException( paramName: nameof(maxLength), actualValue: maxLength, message: "Value must be positive and non-zero." );
if( value.Length > maxLength ) throw new ArgumentException( message: $"Length {value.Length} exceeds maxLength {maxLength}.", paramName: nameof(value) );
this.value_doNotReadDirectly = value;
this.maxLength_doNotReadDirectly = maxLength;
}
private readonly String value_doNotReadDirectly;
private readonly Int32 maxLength_doNotReadDirectly;
public String Value => this.value_doNotReadDirectly ?? throw new InvalidOperationException( "This LimitedString is a default value and was not constructed correctly." );
public Int32 MaxLength => this.maxLength_doNotReadDirectly >= 1 ? this.maxLength_doNotReadDirectly : throw new InvalidOperationException( "This LimitedString is a default value and was not constructed correctly." );
}
While struct LimitedString
now correctly handles both default
and constructed states, it still needs other improvements made to it in order to make it suitable for production use, such as these listed below:
- With the above
public readonly struct LimitedString
, as-is, Visual Studio will instantly prompt you to override Equals
and GetHashCode
.
- Implementing
GetHashCode
correctly is very important for struct
types and is an absolute requirement if you want to use it as a TKey
type in a Dictionary<TKey,TValue>
or as T
in HashSet<T>
.
- You'll also want to override
ToString()
too, so you can pass LimitedString
directly into String.Format
args or Console.Write
, as well as to get a better experience in the VS debugger.
- This is trivial: just do
public override String ToString() => this.Value.ToString();
.
- Also consider adding
[DebuggerDisplay]
too.
- Add an
implicit
conversion operator to String
: because every valid LimitedString
represents a valid String
.
- But don't add an
implicit
conversion from String
: because not every non-null String
is a valid LimitedString
.
- And also because
LimitedString
requires a maxLength
parameter, which is not present in a normal String
value, of course.
- Implement
IEquatable<LimitedString>
and maybe IEquatable<String>
too.
- Add public methods and forwarded
get
-only properties to LimitedString
that match the public API surface of String
so that your LimitedString
type can be a drop-in replacement for a String
parameter or property in your existing code.
- so add members like
public Int32 Length => this.Value.Length;
and public String Trim() => this.Value.Trim();
.
- This can be tedious though, and it is unfortunate that C# still cannot auto-generate those members for us.
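Pulling a few of those bullets together, a sketch of the fleshed-out type might look like the following. Treat it as one possible shape under my own assumptions: ordinal string comparison, HashCode.Combine (which needs .NET Core 2.1 or later), and equality members that go through the self-validating properties and therefore throw on default instances.

```csharp
#nullable enable
using System;
using System.Diagnostics;

[DebuggerDisplay( "{Value} (max {MaxLength})" )]
public readonly struct LimitedString : IEquatable<LimitedString>
{
    private readonly String? value_doNotReadDirectly;
    private readonly Int32   maxLength_doNotReadDirectly;

    public LimitedString( String value, Int32 maxLength )
    {
        if( value is null ) throw new ArgumentNullException( nameof(value) );
        if( maxLength < 1 ) throw new ArgumentOutOfRangeException( nameof(maxLength), maxLength, "Must be at least 1." );
        if( value.Length > maxLength ) throw new ArgumentException( $"Length {value.Length} exceeds maxLength {maxLength}.", nameof(value) );
        this.value_doNotReadDirectly     = value;
        this.maxLength_doNotReadDirectly = maxLength;
    }

    public String Value     => this.value_doNotReadDirectly ?? throw new InvalidOperationException( "Not constructed." );
    public Int32  MaxLength => this.maxLength_doNotReadDirectly >= 1 ? this.maxLength_doNotReadDirectly : throw new InvalidOperationException( "Not constructed." );

    // Forwarded String members go through Value, so they inherit its default-check.
    public Int32 Length => this.Value.Length;

    public override String ToString() => this.Value;

    // Assumption: two values are equal when both the text (ordinal) and the limit match.
    public Boolean Equals( LimitedString other ) =>
        this.MaxLength == other.MaxLength && String.Equals( this.Value, other.Value, StringComparison.Ordinal );

    public override Boolean Equals( Object? obj ) => obj is LimitedString other && this.Equals( other );

    public override Int32 GetHashCode() => HashCode.Combine( this.Value, this.MaxLength );

    public static Boolean operator ==( LimitedString left, LimitedString right ) => left.Equals( right );
    public static Boolean operator !=( LimitedString left, LimitedString right ) => !left.Equals( right );

    // Implicit in this direction only: every valid LimitedString is a valid String.
    public static implicit operator String( LimitedString self ) => self.Value;
}
```

Whether Equals and GetHashCode should throw on default instances or treat them as their own equivalence class is a judgment call; throwing keeps them consistent with every other member.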
A real-world example: Refinement types.
I use readonly struct
-types as refinement types in my projects, which is handy for when I want to express a static constraint over some business object.
For example, supposing I have class User
with an optional String? EmailAddress { get; }
property and I want to pass it around in some code that requires the User
to have a valid e-mail address (not just a non-null String
e-mail address value) then I'd solve this problem by defining readonly struct UserWithValidEmailAddress : IReadOnlyUser
and then changing all member parameters and return-types in the code that requires e-mail addresses from User
to UserWithValidEmailAddress
, which means the compiler will give me a compile-time error if any code tries to pass a User
(because that User
object could have an invalid or absent e-mail address) without first validating it with a validating-factory method.
- I use
readonly struct
because that means it's essentially a zero-cost abstraction, which we don't get with class
-types (as class
-types always involve a GC allocation and subsequent collection) whereas struct
types are "free" until/unless boxed.
- The use of
implicit
conversion and implementing the IReadOnly...
interface means that struct
-types' lack of inheritance isn't a problem either (and even if these refinement types were class
types instead of struct
types inheritance would be the worst way to implement them).
- In all of my projects I always define
IReadOnly...
interfaces for all domain-types, so class User
will have IReadOnlyUser
which has only read-only versions of User
's instance properties in addition to all methods that don't cause any mutation.
- Having
IReadOnly...
interfaces is also useful because C# still doesn't have C++-style const
-correctness. *grumble*.
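For readers who haven't seen the IReadOnly... convention before, a minimal sketch follows. The member list is guessed from the properties used later in this answer, and the Reports class is a hypothetical consumer added for illustration.

```csharp
#nullable enable
using System;

// Read-only view of User. The members are assumed from the properties this answer mentions.
public interface IReadOnlyUser
{
    String   UserName     { get; }
    String   DisplayName  { get; }
    DateTime Created      { get; }
    String?  EmailAddress { get; }
}

public sealed class User : IReadOnlyUser
{
    public User( String userName, String displayName, DateTime created, String? emailAddress )
    {
        this.UserName     = userName    ?? throw new ArgumentNullException( nameof(userName) );
        this.DisplayName  = displayName ?? throw new ArgumentNullException( nameof(displayName) );
        this.Created      = created;
        this.EmailAddress = emailAddress;
    }

    public String   UserName     { get; }
    public String   DisplayName  { get; set; } // Mutable on User, but get-only through IReadOnlyUser.
    public DateTime Created      { get; }
    public String?  EmailAddress { get; }
}

// Hypothetical consumer.
public static class Reports
{
    // Accepting IReadOnlyUser tells callers this method cannot mutate the user through this parameter.
    public static String Describe( IReadOnlyUser user ) =>
        user.DisplayName + " <" + ( user.EmailAddress ?? "no e-mail" ) + ">";
}
```

Because a struct can implement the same interface, a refinement type can be passed to `Reports.Describe` without any inheritance relationship to User (at the cost of boxing at that call).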
Unfortunately because struct
-types can always exist in their default
form, it's necessary for them to self-validate inside every public instance member, not just inside their parameterized constructor. And unfortunately, C# auto-properties (used for brevity) cannot self-validate, which makes them unsafe - so those have to be implemented as private fields with public get-only properties which throw InvalidOperationException
if they detect they're in an invalid/default
state (which is trivial: just ensure that reference-type fields are non-null
, or if all fields are value-types, then use System.Nullable<T>
for the first field and throw if that's null
at runtime - or add a bool
field that will always be true
if any of the defined ctors are used).
So this is what my UserWithValidEmailAddress
looks like:
public readonly struct UserWithValidEmailAddress : IReadOnlyUser
{
public static Boolean TryCreate( User user, [NotNullWhen(true)] out UserWithValidEmailAddress? valid )
{
if( ValidateEmailAddress( user.EmailAddress ) )
{
valid = new UserWithValidEmailAddress( user, new MailAddress( user.EmailAddress ) );
return true;
}
valid = default;
return false;
}
public static implicit operator User( UserWithValidEmailAddress self )
{
return self.User;
}
private UserWithValidEmailAddress( User user, MailAddress addr )
{
this.user_DoNotReadDirectlyExceptViaProperty = user ?? throw new ArgumentNullException(nameof(user));
this.validatedMailAddress = addr ?? throw new ArgumentNullException(nameof(addr));
}
private readonly User user_DoNotReadDirectlyExceptViaProperty;
private readonly MailAddress validatedMailAddress;
public User User => this.user_DoNotReadDirectlyExceptViaProperty ?? throw new InvalidOperationException("This is a default(UserWithValidEmailAddress).");
public MailAddress ValidEmailAddress => this.validatedMailAddress ?? throw new InvalidOperationException("This is a default(UserWithValidEmailAddress).");
public override String ToString() => this.User.ToString();
#region IReadOnlyUser
// This code-block is generated by my own tooling, which helps with the tedium.
// Because all of these members go through `UserWithValidEmailAddress.User` (which self-validates) instead of directly accessing the `user_DoNotReadDirectlyExceptViaProperty` field it means `default`-safety is guaranteed.
public String UserName => this.User.UserName;
public String DisplayName => this.User.DisplayName;
public DateTime Created => this.User.Created;
// etc
// The IReadOnlyUser.EmailAddress property is implemented explicitly, instead of as a public member, because it's redundant: if a method is using `UserWithValidEmailAddress` then it should use the `MailAddress ValidEmailAddress` property instead.
String IReadOnlyUser.EmailAddress => this.User.EmailAddress;
#endregion
}
While this looks tedious to write every time I want a refinement type, I have a VS code-snippet that inserts the outline for me, as well as having other tooling to automatically generate the #region IReadOnlyUser
part too.
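To show what this buys you at the call site, here is a trimmed-down, self-contained sketch. It leaves out IReadOnlyUser, and it validates by letting the MailAddress constructor throw FormatException, which is my own stand-in for whatever ValidateEmailAddress really does. The Mailer class is hypothetical.

```csharp
#nullable enable
using System;
using System.Diagnostics.CodeAnalysis;
using System.Net.Mail;

public sealed class User
{
    public User( String userName, String? emailAddress )
    {
        this.UserName     = userName ?? throw new ArgumentNullException( nameof(userName) );
        this.EmailAddress = emailAddress;
    }

    public String  UserName     { get; }
    public String? EmailAddress { get; }
}

public readonly struct UserWithValidEmailAddress
{
    private readonly User?        user_DoNotReadDirectlyExceptViaProperty;
    private readonly MailAddress? validatedMailAddress;

    private UserWithValidEmailAddress( User user, MailAddress addr )
    {
        this.user_DoNotReadDirectlyExceptViaProperty = user ?? throw new ArgumentNullException( nameof(user) );
        this.validatedMailAddress                    = addr ?? throw new ArgumentNullException( nameof(addr) );
    }

    // Validating factory: the only way to obtain a non-default instance.
    public static Boolean TryCreate( User user, [NotNullWhen(true)] out UserWithValidEmailAddress? valid )
    {
        valid = default;
        if( user is null || String.IsNullOrWhiteSpace( user.EmailAddress ) ) return false;

        try
        {
            // Assumption: "valid" here just means the MailAddress constructor accepts the text.
            valid = new UserWithValidEmailAddress( user, new MailAddress( user.EmailAddress ) );
            return true;
        }
        catch( FormatException )
        {
            return false;
        }
    }

    public User        User              => this.user_DoNotReadDirectlyExceptViaProperty ?? throw new InvalidOperationException( "This is a default(UserWithValidEmailAddress)." );
    public MailAddress ValidEmailAddress => this.validatedMailAddress ?? throw new InvalidOperationException( "This is a default(UserWithValidEmailAddress)." );
}

// Hypothetical consumer.
public static class Mailer
{
    // The parameter type is the proof: no e-mail validation is needed inside this method.
    public static String Describe( UserWithValidEmailAddress recipient ) =>
        recipient.User.UserName + " <" + recipient.ValidEmailAddress.Address + ">";

    // Callers holding a plain User must go through TryCreate first.
    public static String? TryDescribe( User user ) =>
        UserWithValidEmailAddress.TryCreate( user, out UserWithValidEmailAddress? valid ) ? Describe( valid.Value ) : null;
}
```

Passing a plain User to `Mailer.Describe` is a compile-time error, so the validation cannot be skipped by accident.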