30

We're implementing an adapter for Jaxen (an XPath library for Java) that allows us to use XPath to access the data model of our application.

This is done by implementing classes which map strings (passed to us from Jaxen) into elements of our data model. We estimate we'll need around 100 classes with over 1000 string comparisons in total.

I think that the best way to do this is simple if/else statements with the strings written directly into the code — rather than defining each strings as a constant. For example:

public Object getNode(String name) {
    if ("name".equals(name)) {
        return contact.getFullName();
    } else if ("title".equals(name)) {
        return contact.getTitle();
    } else if ("first_name".equals(name)) {
        return contact.getFirstName();
    } else if ("last_name".equals(name)) {
        return contact.getLastName();
    ...

However I was always taught that we should not embed string values directly into code, but create string constants instead. That would look something like this:

private static final String NAME = "name";
private static final String TITLE = "title";
private static final String FIRST_NAME = "first_name";
private static final String LAST_NAME = "last_name";

public Object getNode(String name) {
    if (NAME.equals(name)) {
        return contact.getFullName();
    } else if (TITLE.equals(name)) {
        return contact.getTitle();
    } else if (FIRST_NAME.equals(name)) {
        return contact.getFirstName();
    } else if (LAST_NAME.equals(name)) {
        return contact.getLastName();
    ...

In this case I think it's a bad idea. The constant will only ever be used once, in the getNode() method. Using the strings directly is just as easy to read and understand as using constants, and saves us writing at least a thousand lines of code.

So is there any reason to define string constants for a single use? Or is it acceptable to use strings directly?


PS. Before anyone suggests using enums instead, we prototyped that but the enum conversion is 15 times slower than simple string comparison so it's not being considered.


Conclusion: The answers below expanded the scope of this question beyond just string constants, so I have two conclusions:

  • It's probably OK to use the strings directly rather than string constants in this scenario, but
  • There are ways to avoid using strings at all, which might be better.

So I'm going to try the wrapper technique which avoids strings completely. Unfortunately we can't use the string switch statement because we're not on Java 7 yet. Ultimately, though, I think the best answer for us is to try each technique and evaluate its performance. The reality is that if one technique is clearly faster then we'll probably choose it regardless of its beauty or adherence to convention.

gutch
  • 520
  • 1
  • 4
  • 10
  • 4
    You don't plan on manually keyboarding a 1000 if statements do you? – JeffO Apr 24 '12 at 02:58
  • Yes we do! No matter how we do implement this, we somehow have to define each unique mapping. We could, say, use Java annotations but then we'd need 1000 annotations... not necessarily an improvement. – gutch Apr 24 '12 at 03:23
  • 1
    I find it so sad how unpleasant something this simple can be in some languages… – Jon Purdy Apr 24 '12 at 04:08
  • I have no problem writing it in another language if it compiles to JVM bytecode... any suggestions? – gutch Apr 24 '12 at 04:42
  • 5
    Java 7 allows Strings as `switch` labels. Use a switch instead of `if` cascades. – Martin Schröder Apr 24 '12 at 06:53
  • 3
    enum conversion is 15 times slower if you convert a string to its enum value! Pass enum directly and compare with another enum value of the same type! – Neil Apr 24 '12 at 07:26
  • @Jeffo If only there was some kind of language we could use to speed up that process and allow only typing the strings, maybe matching them together to create the if statements systematically, some kind of language... We could call it a programming language even! Sorry... Couldn't help myself... – Paul Hazen Apr 24 '12 at 10:38
  • @JeffO Hey Jeff, I wasn't trying to be mean or sarcastic, was trying (and I guess failed) to be funny. It's late (or early) and I'm tired, so I apologize if you were put off by my comment. – Paul Hazen Apr 24 '12 at 12:14
  • If only there were [libraries that did all this for you](http://xstream.codehaus.org)... – Michael K Apr 24 '12 at 12:21
  • 2
    Smells like HashMap can be a solution. – MarioDS Apr 24 '12 at 18:12
  • @MichaelK I don't think XStream is appropriate for our scenario because the purpose here is to handle XPath queries — an XPath query comes in from an external system, and we have to return some data objects in response. AFAIK XStream doesn't support XPath queries; instead it would have to serialise our data into XML that is then loaded into a DOM where XPath queries can be made. That would mean serialising gigabytes and gigabytes of data into a DOM just in case an XPath query needs it... something many orders of magnitude slower than using a Jaxen adapter. – gutch Apr 27 '12 at 04:10
  • Yeah, XStream is built so that you don't have to use XPath or parse a DOM yourself. I didn't realize your queries were externally produced. – Michael K Apr 27 '12 at 12:17
  • *"Using the strings directly is just as easy to read and understand as using constants"* I disagree ! Its not just as easy, its **much easier** ! Adding one level of indirection would just make things a little more bloated here. I'm currently working on a very small project where everything is dispatched with one-use constants referring to files containg one-use constants and so on on several levels. Its a hell to read. Of course, used correctly, constants are of great use, but not in such case. – Balmipour Jan 10 '18 at 10:54

6 Answers6

6

If at all possible use Java 7 which allows you to use Strings in switch statements.

From http://docs.oracle.com/javase/tutorial/java/nutsandbolts/switch.html

public class StringSwitchDemo {

    public static int getMonthNumber(String month) {

        int monthNumber = 0;

        if (month == null) {
            return monthNumber;
        }

        switch (month.toLowerCase()) {
            case "january":
                monthNumber = 1;
                break;
            case "february":
                monthNumber = 2;
                break;
            case "march":
                monthNumber = 3;
                break;
            case "april":
                monthNumber = 4;
                break;
            case "may":
                monthNumber = 5;
                break;
            case "june":
                monthNumber = 6;
                break;
            case "july":
                monthNumber = 7;
                break;
            case "august":
                monthNumber = 8;
                break;
            case "september":
                monthNumber = 9;
                break;
            case "october":
                monthNumber = 10;
                break;
            case "november":
                monthNumber = 11;
                break;
            case "december":
                monthNumber = 12;
                break;
            default: 
                monthNumber = 0;
                break;
        }

        return monthNumber;
    }

    public static void main(String[] args) {

        String month = "August";

        int returnedMonthNumber =
            StringSwitchDemo.getMonthNumber(month);

        if (returnedMonthNumber == 0) {
            System.out.println("Invalid month");
        } else {
            System.out.println(returnedMonthNumber);
        }
    }
}

I have not measured, but I believe that the switch statements compile to a jump table instead of a long list of comparisons. This should be even faster.

Regarding your actual question: If you only use it once you do not need to make it into a constant. Consider however that a constant can be documented and shows up in Javadoc. This may be important for non-trivial string values.

  • 2
    Concerning the jump table. The String switch is replaced by to switches, first is based on the hashcode (equality is checked for all constants with same hashcode) and selects the branch index, second switches on the branch index and selects the original branch code. The latter one is clearly suitable for a branch table, the former is not due to the distribution of the hash function. So any performance advantage is probably due to the hash based realization. – scarfridge Apr 24 '12 at 08:09
  • A very good point; if it performs well it might be worth moving to Java 7 just for this... – gutch Apr 27 '12 at 06:16
6

Try this. The initial reflection is certainly expensive, but if you're going to use it many many times, which I think you will, this is most certainly a better solution what what you're proposing. I don't like using reflection, but I find myself using it when I don't like the alternative to reflection. I do think that this will save your team a lot of headache, but you must pass the name of the method (in lowercase).

In other words, rather than pass "name", you would pass "fullname" because the name of the get method is "getFullName()".

Map<String, Method> methodMapping = null;

public Object getNode(String name) {
    Map<String, Method> methods = getMethodMapping(contact.getClass());
    return methods.get(name).invoke(contact);
}

public Map<String, Method> getMethodMapping(Class<?> contact) {
    if(methodMapping == null) {
        Map<String, Method> mapping = new HashMap<String, Method>();
        Method[] methods = contact.getDeclaredMethods();
        for(Method method : methods) {
            if(method.getParameterTypes().length() == 0) {
                if(method.getName().startsWith("get")) {
                    mapping.put(method.getName().substring(3).toLower(), method);
                } else if (method.getName().startsWith("is"))) {
                    mapping.put(method.getName().substring(2).toLower(), method);
                }
            }
        }
        methodMapping = mapping;
    }
    return methodMapping;
}

If you need to access data contained within members of contact, you might consider building a wrapper class for contact which has all methods to access any information required. This would also be useful for guaranteeing that the names of the access fields will always remain the same (I.e. if wrapper class has getFullName() and you call with fullname, it will always work even if contact's getFullName() has been renamed -- it would cause compilation error before it would let you do that).

public class ContactWrapper {
    private Contact contact;

    public ContactWrapper(Contact contact) {
        this.contact = contact;
    }

    public String getFullName() {
        return contact.getFullName();
    }
    ...
}

This solution has saved me several times, namely when I wanted to have a single data representation to use in jsf datatables and when that data needed to be exported into a report using jasper (which doesn't handle complicated object accessors well in my experience).

Neil
  • 22,670
  • 45
  • 76
  • I like the idea of a wrapper object with the methods called via `.invoke()`, because it does away with string constants entirely. I'm not so keen on runtime reflection to set up the map, though maybe executing `getMethodMapping()` in a `static` block would be OK so that it happens on startup instead of once the system is running. – gutch Apr 27 '12 at 06:13
  • @gutch, wrapper pattern is one I use often, since it tends to risolve a lot of problems related to interface/controller. Interface can always use the wrapper and be happy with it and the controller can be turned inside out in the meantime. All you need to know is what data you want available in the interface. And again, I say for emphasis, I don't normally like reflection, but if it's a web application, it's completely acceptable if you do it on startup since the client isn't going to see any of that wait time. – Neil Apr 27 '12 at 08:42
  • @Neil Why don't to use BeanUtils from Apache commons? It also support embedded objects. You can go through a whole data structure *obj.attrA.attrB.attrN* and It has many other possibilites :-) – Laiv Apr 23 '16 at 20:30
  • Instead of mappings with Maps I would go for @Annotations. Something like JPA does. To define my own Annotation for mapping controller entries (string) with an specific attr or getter. To work with Annotation is quite easy and It's available from Java 1.6 (I think) – Laiv Apr 23 '16 at 20:34
4

If you're going to maintain this (make any sort of nontrivial changes ever), I might actually consider using either some sort of annotation-driven code generation (perhaps via CGLib) or even just a script that writes all the code for you. Imagine the number of typos and errors that could creep in with the approach you're considering...

Steven Schlansker
  • 1,616
  • 14
  • 10
  • We considered annotating existing methods, but some mappings traverse multiple objects (for example "country" mapping to `object.getAddress().getCountry()`) which is difficult to represent with annotations. The if/else string comparisons aren't pretty, but they're fast, flexible, easy to understand and easy to unit test. – gutch Apr 24 '12 at 04:40
  • 1
    You're right about the potential for typos and errors; my only defence there is unit tests. Of course that means yet more code... – gutch Apr 24 '12 at 04:45
2

I would still use constants defined at the top of your classes. It makes your code more maintainable since it's easier to see what can be changed at a later time (if necessary). For instance, "first_name" could become "firstName" at some later time.

Bernard
  • 8,859
  • 31
  • 40
  • I agree, however, if this code will be generated automatically and the constants are not used elsewhere, then it does not matter (the OP says they need to do this in 100 classes). – NoChance Apr 24 '12 at 03:18
  • 5
    I just don't see the "maintainability' angle here you change "first_name" to "givenName" once in one place in either case. However in the case of named constants you are now left with a messy variable "first_name" which refers to a string "givenName" so you probably want to change that as well so you now have three changes in two places – James Anderson Apr 24 '12 at 05:34
  • 1
    With the right IDE, these changes are trivial. What I am advocating is that it is more apparent _where_ to make these changes because you've taken the time to declare constants at the top of the class and don't have to read through the rest of the code in the class in order to make these changes. – Bernard Apr 24 '12 at 12:26
  • But when you are reading the if statement, you have to go back and check that the constant contains the string you think it contains -- nothing saved here. – James Anderson Apr 25 '12 at 10:26
  • 1
    Perhaps, but this is why I name my constants well. – Bernard Apr 25 '12 at 13:35
1

If your naming is consistent (aka "some_whatever" is always mapped to getSomeWhatever()) you could use reflection to determine and execute the get method.

scarfridge
  • 1,796
  • 13
  • 20
  • Better getSome_whatever(). Might be breaking camel case, but it's far more important to make sure reflection works. Plus it has the added advantage that it makes you say, "Why the heck did we do it that way.. oh wait.. Wait! George don't change the name of that method!" – Neil Apr 24 '12 at 07:29
0

I guess, annotation processing could be the solution, even without annotations. It's the thing which can generate all the boring code for you. The downside is that you'll get N generated classes for N model classes. You also can't add anything to an existing class, but writing something like

public Object getNode(String name) {
    return SomeModelClassHelper.getNode(this, name);
}

once per class should not be a problem. Alternatively, you could write something like

public Object getNode(String name) {
    return getHelper(getClass()).getNode(this, name);
}

in a common superclass.


You can use reflection instead of annotation processing for code generation. The downside is that you need your code to compile before you can use reflection on it. This means you can't rely on the generated code in your model classes, unless you generate some stubs.


I'd also consider direct use of reflection. Sure, reflection is slow, but why is it slow? It's because it has to do all the things you need to do, e.g., switch on the field name.

maaartinus
  • 2,633
  • 1
  • 21
  • 29