Never use Strings in Java?

Question

I stumbled upon a blog entry discouraging the use of Strings in Java for making your code lack semantics, suggesting that you should use thin wrapper classes instead. This is the before and after examples the said entry provides to illustrate the matter:

public void bookTicket(
  String name,
  String firstName,
  String film,
  int count,
  String cinema);

public void bookTicket(
  Name name,
  FirstName firstName,
  Film film,
  Count count,
  Cinema cinema);

In my experience reading programming blogs I've come to the conclusion that 90% is nonsense, but I'm left wondering whether this is a valid point. Somehow it doesn't feel right to me, but I couldn't exactly pinpoint what is amiss with the style of programming.

Relevant discussion on the original Wiki: ["what is that the toString() of?"](http://c2.com/cgi/wiki?ToStringQuestion) — Christoffer Hammarström, Jan 26 '12 at 15:07
I've read at least one book (which may have been the first edition of Code Complete) that recommended typedefing all of your primitive types to have names from the problem domain. Same idea. — user16764, Jan 28 '12 at 04:57
@user16764 yes, you can definitely find that in code complete (see my comment about it below) — DPM, Jan 28 '12 at 10:35
Related primitive obsession http://haacked.com/archive/2012/09/30/primitive-obsession-custom-string-types-and-self-referencing-generic-constraints.aspx/ — Daniel Little, Jul 01 '14 at 00:12

DeadMG · Accepted Answer · 2012-01-26T15:12:32.057

88

Encapsulation is there to protect your program against change. Is the representation of a Name going to change? If not, then you're wasting your time and YAGNI applies.

Edit: I've read the blog post and he has a fundamentally good idea. The problem is that he's scaled it way too far. Something like String orderId is really bad, because presumably "!"£$%^&*())ADAFVF is not a valid orderId. This means that String represents a lot more possible values than there are valid orderIds. However, for something like a name, then you can't possibly predict what might or might not be a valid name, and any String is a valid name.

In the first instance, you are (correctly) reducing the possible inputs to only valid ones. In the second instance, you have achieved no narrowing of possible valid inputs.

Edit again: Consider the case of invalid input. If you write "Gareth Gobulcoque" as your name, it's going to look silly, it's not going to be the end of the world. If you put in an invalid OrderID, chances are that it's simply not going to function.

edited Jan 26 '12 at 15:12

answered Jan 25 '12 at 22:10

DeadMG

36,794
8
70
139

5

With order ids I would still probably prefer adding a check in the code accepting ids and leaving the String. The classes are supposed to provide methods for operating with data. If some class only checks the validity of some data and then does nothing, this doesn't seem to me proper. OOP is good, but you shouldn't overdo it. – Malcolm Jan 25 '12 at 22:53
13

@Malcolm: But then you have no way of knowing which Strings are validated except to validate them again and again. – DeadMG Jan 26 '12 at 00:56
7

_"[...] you can't possibly predict what might or might not be a valid name, and any String is a valid name."_ I guess one strength of this technique is that if your parameter is of type `Name`, you cannot accidentally pass another unrelated String value. Where you expect a `Name`, only a `Name` will compile, and the String "I owe you $5" cannot be accidentally accepted. (Note this is unrelated to any kind of name validation!) – Andres F. Jan 26 '12 at 03:01
How often do we really do something like that @AndresF. I've never seen that problem happen before. – user606723 Jan 26 '12 at 06:37
1

@AndresF. Not necessarily. If I read a String from a file or a database, I have only their word that it is a `Name`- it could, in fact, be any unrelated String value. – DeadMG Jan 26 '12 at 06:59
@DeadMG Yes, I thought that you could justify creation of such a small class by reducing the number of checks, but it depends on the overall design. Maybe ids are a part of a larger class, which can handle all the validation combining the functionality of all the small classes-validators. I'm not saying that having OrderId instead of String is always bad, it's just something I wouldn't do by default. – Malcolm Jan 26 '12 at 07:27
6

Creating types that are specific to a particular use adds semantic richness, and helps add safety. Plus, creating types to represent specific values helps a lot when using an IoC container to autowire your classes - easy to resolve the right bean to use when it's the only bean registered for a particular class. Much more effort is required when it's just one of many strings registered. – Dathan Jan 26 '12 at 07:52
2

@DeadMG Of course it's not foolproof; nothing is. But what you said can only happen at DB boundaries (or more in general, at any interface with the "outside world"); the rest of your app is _typesafe_. With plain Strings, the problem can occur anywhere. – Andres F. Jan 26 '12 at 12:21
1

@user606723 You mean you never experienced debugging a complex system and finding the wrong kind of value in a variable? I envy you! :) – Andres F. Jan 26 '12 at 12:23
@AndresF. But there is no problem in reality. That's the whole point. – DeadMG Jan 26 '12 at 15:11
3

@DeadMG That's argumentative, and not something we can resolve in comments. I'd say misplacing values of primitive types does happen, and hardening your interfaces is one way to improve the situation. – Andres F. Jan 26 '12 at 15:21
@AndresF., I have not. I can't understand how something like that would pass unit testing/model testing and I've never seen it before. – user606723 Jan 26 '12 at 16:07
2

@user606723 Keep in mind yours is the same argument proponents of dynamic languages have against static typing: "static typing never saved me from anything. I can't understand how a typing error can get past unit testing" ;) An obvious response would be "yes, unit testing can help detect this sort of errors, but I'd rather spend my time writing more interesting unit tests". – Andres F. Jan 26 '12 at 17:15
@AndresF. Perhaps, but it's hard to come up with a downside to static casting. I can't say the same for this. 1. time consuming 2. **You need to unit test it anyway** 3. lost transparency. (Over abstraction can definitely be a negative thing even if it's free from a initial standpoint.) – user606723 Jan 26 '12 at 17:43
@user606723 1- How exactly is this significantly more time consuming? 2- You need to unit test your domain anyway. String-as-a-name is less testable than `Name`. 3- How does this "lose" transparency? (Hope you're not talking about _implementation_ transparency) – Andres F. Jan 26 '12 at 17:56
1

@Andres: Just because static typing > dynamic typing (and I'm in agreement with this) does not mean that in absolutely every situation it is a valid expenditure of time. Yes, having a Name class might be *better*. But will it be *worth the time*? No. – DeadMG Jan 26 '12 at 18:00

Makach · Answer 2 · 2012-01-25T22:21:18.643

45

That's just crazy:)

Use common sense
Don't over-engineer!
Keep it small simple!
You Arent Gonna Need It (YagNi)

edited Jan 25 '12 at 22:21

answered Jan 25 '12 at 22:09

Makach

577
3
13

2

That's my opinion as well, but the thing is that I remember this suggestion in "Code Complete". Of course not everything in that book is indisputable, but at least it makes me think twice before rejecting an idea. – DPM Jan 25 '12 at 22:17
8

You just mentioned some slogans without any real justification. Can you elaborate on your opinion? Some accepted patterns and language features, for example, may look like added complexity, yet offer something valuable in return (for example: static typing) – Andres F. Jan 26 '12 at 02:00
12

Common sense is anything but common. – Kaz Dragon Jan 26 '12 at 09:50
1

+1, to this I would add that when a language supports named parameters, and the IDE is good, such is the case with C# and VS2010, then one does not have to compensate for the lack of features in the language with the crazy patterns. There is no need for class named X and a class named Y, if one can write `var p = new Point(x: 10, y:20);` Also, it is not like Cinema is that different from a string. I would understand if one were to deal with physical quantities, such as pressure, temperature, energy, where units differ and some cannot be negative. The author of the blog needs to try functional. – Job Jan 26 '12 at 19:41
1

+1 for "Don't over-engineer!" – Ivan Jan 26 '12 at 22:00

score 23 · Answer 3 · answered Jan 25 '12 at 23:28

23

I agree with the author, mostly. If there is any behavior peculiar to a field, like validation of an order id, then I would create a class to represent that type. His second point is even more important: if you have a set of fields that represent some concept like an address, then create a class for that concept. If you are programming in Java, you are paying a high price for static typing. You may as well get all the value you can.

answered Jan 25 '12 at 23:28

kevin cline

33,608
3
71
142

2

Can the downvoter comment? – kevin cline Jan 26 '12 at 22:57
What is the high price we are paying for static typing? – Richard Tingle Jan 09 '15 at 21:32
@RichardTingle: The time to recompile the code and restart the JVM for every code change. The time lost when you make a source-compatible but binary-incompatible change and things that need to be recompiled aren't and you get "MissingMethodException". – kevin cline Jan 10 '15 at 02:38
I've never experienced a problem with java applications (what with continuous compilation it's usually already compiled by the time I hit run) but it's a fair point with web WARs for which redeploying does seem a little slower – Richard Tingle Jan 10 '15 at 10:57

score 16 · Answer 4 · edited Jun 16 '20 at 10:01

Don't do it; it will overcomplicate stuff and You Ain't Gonna Need It

... is the answer I would have written here 2 years ago. Now, though, I'm not so sure; in fact, in recent months I've started migrating old code to this format, not because I have nothing better to do but because I genuinely needed it for implementing new features or changing existing ones. I understand the automatic aversions others here have seeing that code, but I think this is something that deserves serious thought.

Benefits

Chief among benefits is the ability to modify and extend the code. If you use

class Point {
    int x,y;
    // other point operations
}

instead of just passing a couple of integers around - which is something that, unfortunately, many interfaces do - then it becomes far easier to later add another dimension. Or change the type to double. If you use List<Author> authors or List<Person> authors instead of List<String> authors it later becomes much easier to add more information to what an author represents. Writing it down like this it feels like I'm stating the obvious, but in practice I've been guilty of using strings this way many times myself, especially in cases where it wasn't obvious at start then I'd need more than a string.

I'm currently trying to refactor some string list which is intertwined throughout my code because I need more information there, and I feel the pain :\

Beyond that, I agree with the blog's author that it carries more semantic information, making it easier for the reader to understand. While parameters are often given meaningful names and get a dedicated line of documentation, this is often not the case with fields or locals.

The final benefit is type safety, for obvious reasons, but in my eyes it's a minor thing here.

Drawbacks

It takes longer to write. Writing a small class is fast and easy but not effortless, especially if you need a lot of these classes. If you find yourself stopping every 3 minutes to write some new wrapper class, it can be a real detriment to your concentration, too. I'd like to think, however, that this state of effort will usually only happen at the first stage of writing any piece of code; I can usually get a pretty good idea quickly of what entities will need to be involved.

It can involve a lot of redundant setters (or constructions) and getters. The blog author gives the truly ugly example of new Point(x(10), y(10)) instead of new Point(10, 10), and I'd like to add that a usage might also involve stuff like Math.max(p.x.get(), p.y.get()) instead of Math.max(p.x, p.y). And long code is often considered harder to read, and justly so. But in all honesty, I have a feeling a lot of code moves objects around, and only select methods create it, and even fewer need access to its gritty details (which isn't OOPy anyway).

Debatable

I'd say whether or not this helps with the readability of code is debatable. Yes, more semantic information, but longer code. Yes, it's easier to understand the role of each local, but it's harder to understand what you can do with it unless you go and read its documentation.

As with most other programming schools of thought, I think it's unhealthy to take this one to the extreme. I don't see myself ever separating the x and y coordinate to be each of a different type. I don't think Count is necessary when an int should suffice. I dislike the unsigned int usage in C - while theoretically good, it just doesn't give you enough information, and it does prohibit extending your code later to support that magical -1. Sometimes you do need simplicity.

I think that blog post is a bit on the extreme side. But overall, I have learned from painful experience that the basic idea behind it is made from the right stuff.

I have a deep aversion to over-engineered code. I truly do. But used right, I don't think this over-engineering.

Bill K · Answer 5 · 2015-01-09T18:30:34.857

Although it is kind of overkill, I often think most stuff I've seen is under-engineered instead.

It's not just "Safety", One of the really nice things about Java is that it helps you a lot with remembering/figuring out just what any given library method call needs/expects.

The WORST (by far) Java library I've worked with was written by someone who was very fond of Smalltalk and modeled the GUI library after it to make it act more like smalltalk--the problem being every method took the same base object, but couldn't actually USE everything that the base object could be cast to, so you were back to guessing what to pass into methods and not knowing if you had failed until runtime (Something I used to have to deal with every single time when I was working in C).

Another issue--If you pass around strings, ints, collections and arrays without objects, all you have is balls of data with no meaning. This seems natural when you think in terms of libraries that "some app" will use, but when designing an entire app it's much more helpful to assign meaning (code) to all your data in the place where the data is defined and only think in terms of how these high-level objects interact. If you are passing around primitives instead of objects then you are--by definition--altering data in a place different from where it's defined (This is also why I really don't like Setters and Getters--same concept, you are operating on data that isn't yours).

Finally, if you define separate objects for everything you always have a great place to validate everything--for instance if you create an object for zip code and later find that you have to ensure that the zipcode always includes the 4 digit extension you have a perfect place to put it.

It's not actually a bad idea. Thinking about it I'm not even sure I'd say it was over-engineered at all, it's just easier to work with in nearly every way--the one exception being the proliferation of tiny classes but Java classes are so lightweight and easy to write that this is hardly even a cost (they can even be generated).

I'd be really interested to see a well-written java project that actually had too many classes defined (Where that made it harder to program), I'm beginning to think it's not possible to have too many classes.

Egon · Answer 6 · 2012-01-28T21:40:25.257

I think you have to look at this concept from another starting point. Take a look from the perspective of a database designer: the types passed in the first example do not define your parameters in a unique way, let alone a useful way.

public void bookTicket(
  String name,
  String firstName,
  String film,
  int count,
  String cinema);

Two parameters are needed to specify the actual patron that is booking tickets, you might have two different movies with identical names (e.g. remakes), you might have the same movie with different names (e.g. translations). A certain chain of movie theaters might have different branch offices, so how are you going to deal with that in a string and in a consistent way (e.g. are you using $chain ($city) or $chain in $city or even something else and how are you going to make sure that this is used consistently. The worst actually is specifying your patron by means of two parameters, the fact that both a first name and surname are provided does not guarantee a valid customer (and you cannot distinguish two John Does).

The answer to this is to declare types, but these will rarely be thin wrappers as I show above. Most likely, they will function as your data storage or be coupled with some kind of database. So a Cinema object will likely have a name, location, ... and that way you get rid of such ambiguities. If they are thin wrappers, they are by coincidence.

So, IMHO the blog post is just saying "make sure you pass the correct types", its author just made the overly-constrained choice to pick at basic data types in particular (which is the wrong message).

The proposed alternative is better:

public void bookTicket(
  Name name,
  FirstName firstName,
  Film film,
  Count count,
  Cinema cinema);

On the other hand, I think the blog post goes too far with wrapping everything. Count is too generic, I could count apples or oranges with that, add them and still have a situation on my hands where the type system allows me to do nonsensical operations. You can of course apply the same logic as in the blog and define types CountOfOranges etc., but that is also plain silly.

For what it's worth, I'd actually write something like

public Ticket bookTicket(
  Person patron,
  Film film,
  int numberOfTickets,
  Cinema cinema);

Long story short: you shouldn't be passing nonsensical variables; the only times you'd actually specify an object with a value that doesn't determine the actual object, is when you either run a query (e.g. public Collection<Film> findFilmsWithTitle(String title)) or when you are putting together a proof-of-concept. Keep your type system clean, so don't use a type that is too general (e.g. a movie represented by a String) or too restrictive/specific/contrived (e.g. Count instead of int). Use a type that defines your object uniquely and unambiguously whenever possible and viable.

edit: an even shorter summary. For small applications (e.g. proof of concepts): why bother with a complicated design? Just use String or int and get on with it.

For large applications: is it really that likely that you have lots of classes that consist of a single field with a basic data type? If you have little such classes, you just have "normal" objects, nothing special going on there.

I feel the idea of encapsulating strings, ... is just a design that is incomplete: overly complicated for small applications, not complete enough for large applications.

I see your point, but I'd argue you are assuming more than the author is stating. For all we know, his model only has a String for a patron, a String for a film and so on. That hypothetical extra functionality is really the crux of the problem so either he was very "absent-minded" to omit that when making his case or he thinks we should provide more semantic power just because. Again, for all we know, it's the latter he had in mind. — DPM, Jan 28 '12 at 20:48
@Jubbat: indeed I assume more than what the author is stating. But my point is that either you have a simple application, in which case either way is overly complicated. For that scale, maintainability is not an issue and semantic distinction impedes your coding speed. If on the other hand, your application is large, it becomes worthwhile to properly define your types (but it is unlikely to be simple wrappers). IMHO, his examples are just not convincing or have major design flaws beyond the point he is trying to make. — Egon, Jan 28 '12 at 21:32

score 2 · Answer 7 · answered Jan 25 '12 at 22:21

2

To me doing this is the same as using regions in C#. Generally if you feel this is needed to make your code readable then you have bigger problems you should be spending your time on.

answered Jan 25 '12 at 22:21

Tom Squires

17,695
11
67
88

2

+1 for implying the treatment of symptoms rather than the cause. – Job Jan 26 '12 at 19:43

score 2 · Answer 8 · answered Jan 26 '12 at 08:12

I'd say it is a really good idea in a language with a strongly typed typedef like feature.

In Java you don't have this so creating whole new class for these things means the cost probably outweighs the benefit. You can also get 80% of the benefit by being careful about your variable/parameter naming.

score 0 · Answer 9 · answered Jan 26 '12 at 15:38

It would be good, IF String (end Integer, and ... speaking only about String) was not final, so these classes could BE some (restricted) String with meaning and still could be send to some independednt object that knows how to handle the base type (without conversing back and forth).

And "goodnes" of it increase when there are eg. restrictions on all names.

But when creating some app (not library), its always possible to refactor it. So I prefare to start without it.

score 0 · Answer 10 · answered Apr 20 '12 at 12:20

To bring an example: in our currently developed system, there are lots of different entities which can be identified by multiple kinds of identifiers (due to the usage of external systems), sometimes even the same kind of entity. All of the identifiers are Strings - so if someone mixes up which kind of id should be passed as parameter, no compile time error is shown, but the program will blow up at runtime. This happens quite often. So I have to say that this principle's primary purpose is not to protect against change (although it serves that also), but to protect ourselves from bugs. And anyway, if someone designs an API, it has to reflect the concepts of the domain, so it is conceptually useful to define domain-specific classes - everything depends on whether there is order in the mind of the developers, and this can be greatly facilitated by the type system!

@downvoter: can you please give an explanation? – thSoft Apr 26 '12 at 13:58 — thSoft, Apr 26 '12 at 13:58

Never use Strings in Java?

10 Answers10

Benefits

Drawbacks

Debatable

Linked