When creating a Set in Java, what's the difference between the following? Which one should I use, and why?
Set<T> set = new HashSet<>();
HashSet<T> set = new HashSet<>();
In object-oriented programming, there is a concept of "programming to an interface." The idea is you do not really care which type of set you use, only that it fulfills the contract of the Set interface (or List, or Map).
Your examples do not really show the benefits of the first form, but consider this code:
import java.util.HashSet;

public class Example {
    // The concrete type HashSet leaks into the public signature.
    public HashSet<String> doSomething() {
        final HashSet<String> set = new HashSet<>();
        // do stuff with the set
        return set;
    }
}
You use this class all over the place. There are tons of references, and code in multiple locations has to store the result of that method in a HashSet<String> reference.
Oops, now you have a requirements change, and order matters. HashSet does not guarantee iteration order, so now you need a TreeSet. So you update the method to use TreeSet. Except now you have tons of references that need to be updated. Perhaps your IDE can update them for you, but perhaps you have external users of the class, and the change will create headaches for them.
A better alternative is to code to an interface. Instead of naming the implementation in the method signature and body, use plain old Set. If the method's contract is that it returns a sorted set, return a SortedSet. If users of the set also need navigation methods, such as finding the closest element to a given value or iterating in descending order, return a NavigableSet.
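import java.util.SortedSet;
import java.util.TreeSet;

public class Example {
    // Callers see only the SortedSet contract; TreeSet stays an implementation detail.
    public SortedSet<String> doSomething() {
        final SortedSet<String> set = new TreeSet<>();
        // do stuff with the set
        return set;
    }
}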
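To make the payoff concrete, here is a minimal sketch of the calling side. ExampleCaller, firstItem, and the sample strings are made up for illustration; only the doSomething signature comes from the code above.

```java
import java.util.SortedSet;
import java.util.TreeSet;

// Stand-in for the Example class, filled with hypothetical sample data.
class Example {
    SortedSet<String> doSomething() {
        final SortedSet<String> set = new TreeSet<>();
        set.add("pear");
        set.add("apple");
        set.add("fig");
        return set;
    }
}

// Hypothetical caller: it depends only on the SortedSet contract.
class ExampleCaller {
    static String firstItem(Example example) {
        // This line compiles unchanged no matter which SortedSet
        // implementation doSomething() creates internally.
        SortedSet<String> items = example.doSomething();
        return items.first();
    }

    public static void main(String[] args) {
        System.out.println(firstItem(new Example())); // prints "apple"
    }
}
```

If doSomething later switched to another SortedSet implementation, for example ConcurrentSkipListSet, firstItem would not need to change.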
The Java collections framework includes interfaces for most of the common implementations. The Map interface has subinterfaces (SortedMap, NavigableMap) mirroring the set interfaces I already mentioned (makes sense, because sets such as HashSet and TreeSet delegate internally to maps). Lists are an exception: List has no comparable subinterfaces (although ArrayList implements RandomAccess, a marker interface). Queues (including LinkedList) have several interfaces, such as Queue and Deque.
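As a rough illustration of how the map interfaces mirror the set interfaces, the sketch below asks the same "closest element at or below a key" question of a NavigableSet and a NavigableMap. The class name MirroredInterfaces, its helper methods, and the sample values are invented for this example.

```java
import java.util.NavigableMap;
import java.util.NavigableSet;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical helper class, not part of the JDK.
class MirroredInterfaces {
    // Greatest element less than or equal to key, or null if there is none.
    static Integer floorInSet(NavigableSet<Integer> set, int key) {
        return set.floor(key);
    }

    // Greatest map key less than or equal to key, or null if there is none.
    static Integer floorInMap(NavigableMap<Integer, String> map, int key) {
        return map.floorKey(key);
    }

    public static void main(String[] args) {
        NavigableSet<Integer> set = new TreeSet<>();
        NavigableMap<Integer, String> map = new TreeMap<>();
        for (int value : new int[] {10, 20, 30}) {
            set.add(value);
            map.put(value, "value " + value);
        }
        System.out.println(floorInSet(set, 25)); // prints 20
        System.out.println(floorInMap(map, 25)); // prints 20
    }
}
```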
Anyway, code to the highest level interface you can. If a consumer of your interface only wants to iterate, then store a reference to that set in a Collection or even Iterable, insulating it from interface changes later on.
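For instance, a consumer that only iterates could look like the sketch below. IterationOnly and joinAll are hypothetical names chosen for this example; the point is only that the parameter type is Iterable, so sets and lists are accepted alike.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical consumer that needs nothing beyond iteration.
class IterationOnly {
    // Iterable is the widest type that still supports a for-each loop.
    static String joinAll(Iterable<String> items) {
        StringBuilder sb = new StringBuilder();
        for (String item : items) {
            if (sb.length() > 0) {
                sb.append(", ");
            }
            sb.append(item);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Set<String> set = new TreeSet<>(Arrays.asList("b", "a"));
        List<String> list = new ArrayList<>(Arrays.asList("x", "y"));
        System.out.println(joinAll(set));  // prints "a, b"
        System.out.println(joinAll(list)); // prints "x, y"
    }
}
```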
The answer from @Snowman is right for the general case. Most of the time it is correct to code to an interface, for the reasons he stated.
There are cases, however, where you don't want that, because the underlying implementation can make a big difference to performance.
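Consider the case of a List<E> which you process by using List.get(int) and List.set(int, E). While these operations are available for all List implementations, it will make a huge difference in performance whether you are working on an ArrayList or on a LinkedList: indexed access takes constant time on an ArrayList, but a LinkedList has to walk its nodes to reach the index, which takes time proportional to the list's length.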
In such a case you could define that your method accepts only ArrayList as a parameter. Or you could accept List and then create an ArrayList from it for processing.
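The second option might look roughly like this. IndexedProcessing and reversedCopy are hypothetical names, and reversing is just a stand-in for whatever index-heavy work you actually do.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedList;
import java.util.List;

// Hypothetical example of "accept List, process on an ArrayList copy".
class IndexedProcessing {
    // Returns a reversed copy of the input, built with get(int)/set(int, E).
    static <E> List<E> reversedCopy(List<E> input) {
        // Copy first: the indexed calls below then always run on an
        // ArrayList, and the caller's list is left untouched.
        List<E> work = new ArrayList<>(input);
        for (int i = 0, j = work.size() - 1; i < j; i++, j--) {
            E tmp = work.get(i);
            work.set(i, work.get(j));
            work.set(j, tmp);
        }
        return work;
    }

    public static void main(String[] args) {
        List<Integer> linked = new LinkedList<>(Arrays.asList(1, 2, 3));
        System.out.println(reversedCopy(linked)); // prints [3, 2, 1]
    }
}
```

The copy costs one pass over the input, which is usually cheap compared with repeated indexed access on a LinkedList. If the copy matters, one variation is to check whether the input is an instance of RandomAccess and copy only when it is not.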
So the answer is: it depends. In most cases, code to an interface, but when you need to, don't be afraid to specify the exact type you are expecting.