3

I'm working on designing a new programming language and trying to decide how I will do variable comparisons. Along with many different types of languages, I've used PHP for years and personally had zero bugs related to its comparison operations other than situations where 0 = false. Despite this, I've heard a lot of negativity towards its method of comparing types.

For example, in PHP:

 2  <  100      # True
"2" < "100"     # True
"2" <  100      # True

In Python, string comparison goes like this:

 2  <  100      # True
"2" < "100"     # False
"2" <  100      # False

I don't see any value in Python's implementation (how often do you really need to see which of two strings is lexicographically greater?), and I see almost no risk in PHP's method and a lot of value. I know people claim it can create errors, but I don't see how. Is there ever really going to be a situation where you are testing if (100 == "100") and you don't want the string to be treated as a number? And if you really did, you could use === (which I've also heard people complain about but without any substantial reason).
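For reference, here is a minimal sketch of the same three comparisons under Python 3 (the interpreter version is an assumption). The `False` listed above for the mixed case matches Python 2; Python 3 refuses to order a string against an integer and raises `TypeError` instead.

```python
# Same-type comparisons behave as in the listing above.
int_result = 2 < 100          # numeric comparison
str_result = "2" < "100"      # lexicographic: "2" sorts after "1"

# Mixed str/int ordering is rejected in Python 3 instead of answering False.
try:
    mixed_result = "2" < 100
except TypeError:
    mixed_result = None       # no answer at all; the comparison is an error

print(int_result, str_result, mixed_result)
```

Equality is different from ordering here: `"2" == 2` is allowed in Python 3 and simply evaluates to `False`.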

So, my question is, not counting some of PHP's weird conversion and comparison rules dealing with 0's and nulls and strings mixed with characters and numbers, are there any substantial reasons that comparing ints and strings like this is bad, and are there any real reasons having a === operator is bad?

dallin
  • 20
    `how often do you really need to see which of two strings is lexicographically greater?` Umm... *every single time you sort a list of strings*, just off the top of my head. – Mason Wheeler Feb 05 '14 at 03:10
  • @MasonWheeler lol, got me there! Other than via database ORDER BY, I haven't sorted a list of strings in a while, so it didn't occur to me. Even so though, when I have created sorting algorithms, I always had to make special cases for numbers at the beginning of strings since users don't want them sorted lexicographically but numerically. – dallin Feb 05 '14 at 03:23
  • With all the questions that get shut down, why is this still open? PHP developers don't have a problem with the way PHP handles comparisons. Non-PHP users likely do as they tend to disagree with PHP on the whole. It's that simple. "Why is PHP's method of comparing different types bad"? With 240+ million sites doing fine with it, I'd consider it adequate to say the least. – JSON Feb 05 '14 at 09:57
  • I'm sure many see this question as an opportunity for intellectual discussion, but the real answer is that PHP type comparisons are not considered bad. Just because an issue has room to be criticized doesn't make it a bad thing when you consider how subjective criticism is. "The other side" might consider it bad, but that's universal. I personally find the lack of real IPC in Java a real problem. This is likely the most subjective question I've come across on SE. – JSON Feb 05 '14 at 10:30
  • @ClosetGeek This question is not asking for opinions but for factual negatives and positives to help someone form their own opinion, in this case for the purpose of designing a new language. "Intellectual discussion" is not what the question was aiming for. If it's worded in a way that incites that, it was not my intention and I'd be glad to edit the question. – dallin Feb 05 '14 at 18:26
  • @dallin What is a "factual negative"? The only facts are the defined behaviors for the operators; everything beyond that is opinion. – nmclean Feb 05 '14 at 19:42
  • @nmclean I disagree. That's like saying there is no such thing as negatives at all because saying anything has negatives is all a matter of perspective and opinion. You could say smoking causing cancer is not a negative by this same line of logic. You could say GOTO statements have no negatives. If someone shows me code examples where type juggling can cause unexpected errors, those are factual negatives. That's what I'm looking for. – dallin Feb 05 '14 at 20:06
  • 1
    I interpret "factual" as "objectively true" and find your statement oxymoronic because a "negative" is something *undesirable* and desire is subjective. Ultimately, it comes down to how humans want to be able to express logic. – nmclean Feb 05 '14 at 20:30
  • @dallin - "fact" is indisputable, or is at least without reasonable dispute. The issues this question addresses are highly disputed. – JSON Feb 05 '14 at 21:33
  • 1
    As for a citation, try the [dictionary](http://www.merriam-webster.com/dictionary/objective). PHP's comparison behavior is based on how it handles dynamic types. There's an actual function to it, rather than just 'bad' design. Some may prefer a more standard behavior, but this sacrifices the functional capability of comparing two different types which may have the same value (such as "105" vs 105). In the end it comes down to how different methods might make more or less sense to different people in different situations. In other words it's subjective. – JSON Feb 05 '14 at 21:45
  • 1
    @ClosetGeek I don't think you understand me. I'm asking if there are any objective negatives to string to int type juggling. I am not confused about the difference between objective and subjective and I'm aware of the common subjective opinions on PHP and its usefulness. – dallin Feb 06 '14 at 18:42
  • @dallin I edited my answer a bit. Also keep in mind that removing '===' will require string to bool, string to null, int to bool, etc. Also, _I_ was rather subjective in my prior response. There's enough nonsense criticism of PHP that it tends to get under my skin. – JSON Feb 06 '14 at 20:46
  • @ClosetGeek Thank you for your edit. It was very enlightening and was just what I was looking for. +1. – dallin Feb 06 '14 at 20:53

6 Answers

23

The biggest problem is that an equivalence relation, the mathy term for things like ==, is supposed to satisfy 3 laws.

  1. reflexivity: a == a
  2. symmetry: a == b means b == a
  3. transitivity: a == b and b == c means a == c

All of these are very intuitive and expected. And PHP doesn't follow them.

'0'==0 // true
 0=='' // true
'0'==''// false, AHHHH

So it's not actually an equivalence relation, which is a pretty distressing realization for some mathy people (including me).
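To make the broken transitivity concrete, here is a minimal Python sketch. `loose_eq` is a hypothetical toy model that reproduces only the three comparisons listed above (numeric strings compared as numbers, non-numeric strings treated as 0 when compared with a number). It is not PHP's actual algorithm, and the real rules vary between PHP versions.

```python
def is_numeric(s):
    """True if the string parses as a number (simplified via float())."""
    try:
        float(s)
        return True
    except ValueError:
        return False

def to_number(s):
    """Hypothetical lenient conversion: non-numeric strings become 0."""
    return float(s) if is_numeric(s) else 0.0

def loose_eq(a, b):
    """Toy model of the loose comparisons above, not PHP's real rules."""
    if isinstance(a, str) and isinstance(b, str):
        if is_numeric(a) and is_numeric(b):
            return float(a) == float(b)   # two numeric strings: compare as numbers
        return a == b                     # otherwise: plain string comparison
    if isinstance(a, str):
        a = to_number(a)
    if isinstance(b, str):
        b = to_number(b)
    return a == b

a, b, c = "0", 0, ""
print(loose_eq(a, b), loose_eq(b, c), loose_eq(a, c))
```

Under this model `a == b` and `b == c` both hold while `a == c` does not, which is exactly the transitivity failure described.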

It also hints at one of the things that people really hate about implicit casts: they often behave unexpectedly when combined with the mundane. Because the conversions are unprincipled in this sense, they amount to an arbitrary set of rules; weird stuff happens, and it all needs to be specified case by case.

Basically, we sacrifice consistency, and the developer has to shoulder the extra burden of making sure there are no funny (and expensive) conversions happening behind the scenes. To quote this article:

Language consistency is very important for developer efficiency. Every inconsistent language feature means that developers have one more thing to remember, one more reason to rely on the documentation, or one more situation that breaks their focus. A consistent language lets developers create habits and expectations that work throughout the language, learn the language much more quickly, more easily locate errors, and have fewer things to keep track of at once.

EDIT:

Another gem I stumbled across

  NULL == 0
  NULL < -1

So if you try to sort anything, the result is effectively nondeterministic: it depends entirely on the order in which the comparisons are made. For example, with bubble sort:

  bubble_sort([NULL, -1, 0]) // [NULL, -1, 0]
  bubble_sort([0, -1, NULL]) // [-1, 0, NULL]
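The order dependence can be reproduced with a small Python sketch. `php_like_lt` is a hypothetical stand-in for the two comparisons quoted above: when either side is `None` it compares truthiness, otherwise it compares values. It models only those rules, not PHP's full comparison table.

```python
def php_like_lt(a, b):
    """Toy ordering: None is compared by truthiness, everything else by value."""
    if a is None or b is None:
        return bool(a) < bool(b)
    return a < b

def bubble_sort(items, lt):
    """Plain bubble sort that swaps neighbours when the right one is 'less'."""
    items = list(items)
    for end in range(len(items) - 1, 0, -1):
        for i in range(end):
            if lt(items[i + 1], items[i]):
                items[i], items[i + 1] = items[i + 1], items[i]
    return items

first = bubble_sort([None, -1, 0], php_like_lt)
second = bubble_sort([0, -1, None], php_like_lt)
print(first, second)
```

The same three values come out in two different orders because the ordering is not consistent: `None` is "less than" `-1`, `-1` is less than `0`, yet `None` is not less than `0`.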
daniel gratzer
  • 1
    The solution is *obviously* to pull perl6's equality operators... `eq` vs `==` vs `~~` vs `===` vs `eqv` vs `=:=`. Shouldn't confuse anyone. –  Feb 05 '14 at 01:35
  • 1
    @MichaelT Absolutely. There is no other possible way to solve this issue than with 6 (count 'em) operators. – daniel gratzer Feb 05 '14 at 01:36
  • I understand this, but this doesn't exactly answer the question I meant to ask. I said "not counting some of PHP's weird conversion and comparison rules dealing with 0's and nulls". I was more wondering about the concept of allowing 100 == "100", which would still satisfy the 3 laws. Is there any problem with that? – dallin Feb 05 '14 at 01:38
  • 3
    @dallin That's in the last part, it leads to weird unintuitive behavior because either you allow `0 == '0'` to maintain consistency and sacrifice an equivalence relation. Basically since we're just arbitrarily adding cases to make it "easier" we make a lot of things weirder – daniel gratzer Feb 05 '14 at 01:43
  • @jozefg Thanks for the explanation. If 0 were to be equal to true instead of false, would this solve all the problems then? Are there any other cases like this? – dallin Feb 05 '14 at 01:48
  • 1
    @dallin you'd have a whole host of [other problems](http://programmers.stackexchange.com/questions/198284/why-is-0-false). –  Feb 05 '14 at 01:49
  • @dallin That'd just be weird, in The Good Old Days, we didn't have booleans, we had 1 (true) and 0 (false). In fact C didn't have booleans until quite recently. Most people would be thrown off if suddenly `0` was true. And transitivity would still be broken. – daniel gratzer Feb 05 '14 at 01:49
  • Programmers coming from statically typed languages to PHP often confuse type juggling as a problem with dynamic typing. The two are not related. `C#` has type juggling as well. Objects can also implement `IConvertible` to support type juggling. The problem here is that PHP changed the meaning of the `==` operator. Programmers don't like that. It pisses them off. So people complain about it. Stop expecting `==` in PHP to do what `==` does in `C`. If you can't learn how to use `==` correctly in PHP then you shouldn't be using PHP. – Reactgular Feb 05 '14 at 16:56
  • 3
    @MathewFoscarini My answer hedges that in PHP `==` behaves very counterintuitively to how any sane definition of *equality* is supposed to work. Perl has pervasive implicit coercions as well and it still implements `==` correctly. Sure you should know how your language works but all languages have WTF parts, this qualifies as one of them. – daniel gratzer Feb 05 '14 at 18:50
3

It's the old practitioners versus theoreticians conflict.

In computer science theory, strong typing and the like are considered good, and loose comparison raises a lot of theoretical (and some practical) ambiguities. For example, do you really want ("2.000" == "2.0") to evaluate to true?

However, in practice it's just plain nice to be able to code:

if (user_choice == 2)

rather than the equivalent Java

try {
    // Parse the text input, then compare it as a number.
    if (Integer.parseInt(user_choice) == 2) {
        isTwo = true;
    } else {
        isTwo = false;
    }
} catch (NumberFormatException nex) {
    // Non-numeric input can never equal 2.
    isTwo = false;
}
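For comparison, the same parse-then-compare check can be sketched in Python. `is_choice` is a hypothetical helper name, and the sketch assumes the input always arrives as a string.

```python
def is_choice(user_choice, expected):
    """Return True only if user_choice parses as the integer `expected`."""
    try:
        return int(user_choice) == expected
    except ValueError:
        # Non-integer text such as "two" or "2.0" is simply not a match.
        return False

print(is_choice("2", 2), is_choice("02", 2), is_choice("2.0", 2), is_choice("two", 2))
```

Unlike a loose comparison, `int()` rejects `"2.0"`, so whether such input should count as 2 becomes an explicit decision rather than a side effect of the operator.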
James Anderson
  • 3
    Java is a bad example, we all know that it's verbose. Consider the equivalent Python `2 == int(user_choice)` or even the very strongly statically typed Haskell `2 == read userChoice`. We used an extra 4-5 characters and documented the fact that the type is a string. This is an acceptable trade off to me but it's of course, personal. I think it's more a "consistency vs character count" issue, but a rather small character count in this case.. – daniel gratzer Feb 05 '14 at 01:47
  • 3
    Not really knocking the other languages -- but php is a great language for getting stuff done and all the criticism from CS types tends to ignore this. 80% of the web is now served by php; it's the "Model T Ford" of computer languages -- not the best but it was at the right place at the right time for the right price and it does the job reliably and efficiently if not elegantly. – James Anderson Feb 05 '14 at 01:56
  • 1
    Of course, I can be eggheaded at times but frankly if it gets the job done, then it gets the job done. But PHP does have some bad design decisions in it, objectively. It doesn't mean that it's impractical; all languages have some. And there are some reasons why having some implicit coercions is annoying. – daniel gratzer Feb 05 '14 at 01:59
  • 1
    Not nearly as bad as Java's ".equal()" syntax. :-) – James Anderson Feb 05 '14 at 02:02
  • 1
    @jozefg The equivalent Python example is actually `try: is_two = (2 == int(user_choice)) except ValueError: is_two = False` - not very much nicer at all. (And the Java example could be simplified to use the same boolean assignment, making them essentially the same - the Java one would only be longer because of closing braces) – Izkata Feb 05 '14 at 04:48
  • `if(String.valueOf(userChoice).equals("2"))` - is it really that much more verbose/"impractical"? The difference is about whether you want to trade consistency for some temporary promptness, to paraphrase a certain statesman. – mikołak Feb 05 '14 at 06:35
  • 1
    @TheTerribleSwiftTomato -- given that PHP is mostly used for web apps, and a large part of a web app is accepting text fields from a form, I would say the "(String.valueOf(" and ".equals(" for each numeric form field is a large amount of extra verbiage to deal with for no discernible benefit. Also, given that you are dealing with human input, it's very nice to have "02.0", "2" and "2.000" all equal to 2 and most other strings just evaluate to "false". – James Anderson Feb 05 '14 at 06:51
  • @JamesAnderson : do you really process every text field by hand, or do you use a validation library (as part of a framework or otherwise)? The benefit of PHP's promiscuous implicit conversion is that it makes it easier for beginners to write their first apps - once you start writing your 10th one you usually a) tend to get sick of repeating yourself and b) become aware that accepting such inputs as `"002.0'; DROP DATABASE important_client_data;'"` can be dangerous. – mikołak Feb 05 '14 at 08:06
  • 1
    @JamesAnderson: Frankly, I couldn't disagree more. Maybe I'm a "CS type", but as a software engineer who likes to get things done at speed I've never bought into the idea that hacky code is fast. You get your 0.1 up faster, sure, but the road to 1.0 is made much longer by a poorly architected and buggy system. Speaking specifically for PHP, my own experience is that I'm much faster to get a page rendering in PHP, but I'm much faster to get something useful working in python. – Phoshi Feb 05 '14 at 09:23
  • 1
    @Phoshi --I don't really want to knock other languages. But I do make the point that php is the chosen development language for 80% of web sites -- including at least three of the top ten. There were sound reasons for this. While I like Python its http support is fragmented to put it politely and has just 1 top 100 user (admittedly it is Google :-) ). – James Anderson Feb 05 '14 at 09:58
  • @JamesAnderson: There are many potential reasons for this, and enough of them have nothing to do with language "quality" that you cannot use it as such. Python isn't a dedicated web language and so has many high quality web frameworks, this is not a disadvantage. Indeed, sanely written PHP always uses a framework as well, making it just as "fragmented" (or better put, gives people a choice of good implementations, which is never bad). PHP's popularity, IMO, comes more from being trivial to hack something together in and being preinstalled on $1/m hosting packages. – Phoshi Feb 05 '14 at 10:01
  • 1
    @Phoshi -- precisely the reason it was chosen for Facebook! Just so happens that it easily scales from a college project to a billion cute puppy pictures. – James Anderson Feb 05 '14 at 10:04
  • @JamesAnderson - reliable and efficient isn't the issue. Facebook made it as the most feature-rich service on the web while it was still pure PHP (as well as the most lucrative user-driven platform). This is in fact where the problem lies. Lower grade developers are taking their jobs because they can do the same thing. They can argue about '0' == true vs '0' === true while they pack their desk. I'm sure many do. – JSON Feb 05 '14 at 10:14
  • 1
    @JamesAnderson: Well, given that they ended up having to write their own optimising compiler/VM they aren't really running the same "PHP". Additionally, quality of language design and 'scalability' are at best orthogonal, and realistically opposed concerns. C is very scalable, but few would call it an expressive programmer-first language with no gotchas. The simple fact of the matter is that market penetration has more to do with history and hassle-free usage (not, mind you, hassle free setup. Python/django was significantly more straightforward, but few set up their own PHP stacks). – Phoshi Feb 05 '14 at 11:07
  • 1
    @Izkata Of course the only reason we should do integer parsing is if we *do* want to handle the formatting exception. If we just want to check if it "equals 2" the same way that PHP does (regardless of whether it is actually a number), the equivalent code is simply `if user_choice == str(2)`, compared to Java's `if (user_choice.equals(String.valueOf(2)))`. – nmclean Feb 05 '14 at 20:00
  • @nmclean Good point, but in that case Java version should be `if ("2".equals(user_choice))` – Izkata Feb 05 '14 at 20:18
  • @Izkata Only if "2" is actually a hard-coded literal, which in most cases it shouldn't be. – nmclean Feb 05 '14 at 20:32
3

One of the more complicated projects that I have undertaken was embedding the PHP library into Google's v8 in order to allow javascript to create and otherwise access PHP objects directly, without a form of bridge, and without any form of interpretation (outside of javascript that is). This required access to the zend tables of a PHP object, and all properties and methods of a PHP object were accessible directly in this way from javascript.

From this experience I also got a taste of the challenge of handling PHP data types within C (the lower level structures that is) in a way that is useful in a more dynamic, abstract level above. This was within javascript at this point, not PHP, but was still the value of the true, internal representation of a PHP object.

There are many challenges in these situations, including when it comes to comparisons. In the end though, at least in my opinion, it was more important to respect and adhere to the dynamic nature of PHP. This was a matter of function, not a matter of logical interpretation or intuitive perception etc.

The discussion of whether "0" == true needs to be based on what "0" really is. As a numerical type, 0 is the binary value for false. However, as a string, "0" is valid, non-null, and non-empty. So in the end, it's a matter of whether you think "0" should be seen as a string, a numerical type, or, as a third option, whether you should try to handle it as both in different situations. Each view has valid, empirical reasons for why one way is more or less useful than the other, yet in the end there is no real or "factual" answer to this, and it will always come down to how you, I, and the other guy value "0".

Edit: Also keep in mind that comparisons are one of the most basic operations of a language. While it is possible to make comparisons more "intelligent", this requires more logic within the actual comparison operation itself. With the frequency that comparisons are used, this can have a negative effect on performance, especially with string types. So even this side of the discussion would likely become a matter of preference and/or priority.

Edit 2: To answer the root of your question more fully, dynamic types need either a form of dynamic comparison or a larger variety of comparison operators.

  • '==' alone will only suffice if the operation '==' performs is smart enough to know how to handle each type under all use scenarios, specifically if the left operand and the right operand are different types. The downside to the more dynamic operation is a likely performance hit. The smarter the operation is, the more noticeable the cost is likely to be.
  • More forms of operators (such as the addition of '===') may seem foreign to people new to your language and will likely receive criticism.
  • A third option would be forcing appropriate comparisons for each type at all times, but this will sacrifice dynamism and will go against the goals of a dynamic language (at least in my opinion).

PHP tries to handle comparisons dynamically, but includes '===' to allow strict comparisons as well. The only other option would be to make the default comparisons 'smarter' which will cost performance.

While it's true that the additional operators add complication and dynamic comparisons leave room for error, a junior high student should be able to grasp the concept (and many do). '==' means that left and right operands can be different types, but can lead to different results in specific, defined situations. '===' means that left and right must be the same types. As for practice, it's better to use '===' when checking the return value of a function, as well as is_string, is_int, is_bool, etc. This is the downside of a dynamic language, and not doing so will likely lead to unexpected comparison results at times.
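The advice about checking return values strictly can be illustrated with a small Python sketch, since Python's `==` also treats `0` and `False` as equal. `find_pos` and `strict_eq` are hypothetical helpers: the first mimics a function that returns either a position or `False`, the second is a rough analogue of '===' (same type and same value).

```python
def find_pos(haystack, needle):
    """Hypothetical helper: index of needle, or False when it is absent."""
    index = haystack.find(needle)
    return False if index == -1 else index

def strict_eq(a, b):
    """Same type and same value, loosely analogous to '==='."""
    return type(a) is type(b) and a == b

pos = find_pos("abc", "a")                    # found at index 0
loose_says_missing = (pos == False)           # 0 == False is True in Python
strict_says_missing = strict_eq(pos, False)   # int vs bool: not the same type

print(pos, loose_says_missing, strict_says_missing)
```

The loose check wrongly reports a match at position 0 as "not found", while the strict check tells the two cases apart.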

But in the end, a 'good' comparison will always follow a defined, documented behavior. A language can use Klingon for comparisons as long as the results are well defined.

JSON
1

Another point is this. When I have

"x" == "x"
"2" < "100"

I'd expect also that

"x2" < "x100"
"2x" < "100x"

In other words, the comparison result should remain the same when I add the same string at the front or the end. This is the same with numbers:

if a < b then so is 2*a < 2*b and 2+a < 2+b and a-2 < b-2

=== EDIT ===

It looks like I got downvoted because people read the above in the following way:

String operations must behave like numerical ones.

or

String operations must behave like I said because numerical ones do.

And then give the "counterexample" of

2+a == a+2 therefore "2x" == "x2"??

No, that's not what I said.

Here it is more formal:

If e and § form a Monoid on some type T, then we should have it that, for any x,y,z of type T the following holds:

(z § x) < (z § y) <=> x < y
(x § z) < (y § z) <=> x < y

Since

0 and + form a Monoid on Integers
1 and * form a Monoid on Integers
"" and append form a Monoid on Strings
[] and list concatenation form a Monoid on lists, ...

we could expect the above general laws to hold. They don't in PHP, so that's why we say PHP string comparison is broken.
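A minimal Python sketch of the prepend case, under stated assumptions: `numeric_aware_lt` is a hypothetical toy model of the numeric-string comparison this answer attributes to PHP (two numeric strings are compared as numbers, anything else lexicographically), and numeric detection is simplified to "parses with `float()`". Only the prefix example from the answer is checked here.

```python
def is_numeric(s):
    """True if the string parses as a number (simplified via float())."""
    try:
        float(s)
        return True
    except ValueError:
        return False

def numeric_aware_lt(a, b):
    """Toy model: numeric strings compare as numbers, others lexicographically."""
    if is_numeric(a) and is_numeric(b):
        return float(a) < float(b)
    return a < b

pairs = [("2", "100"), ("x2", "x100")]   # second pair: same strings with "x" prepended

lexicographic = [a < b for a, b in pairs]
numeric_aware = [numeric_aware_lt(a, b) for a, b in pairs]

print(lexicographic, numeric_aware)
```

Plain lexicographic comparison gives the same answer for both pairs, whereas the numeric-aware model flips its answer once the common prefix is added.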

Observe that

a § b == b § a

is not a requirement for Monoids, though it is true for addition and multiplication of numbers. Specifically, it is invalid to conclude that string append must be commutative because integer addition is commutative.

Ingo
  • Downvoters, any comments? – Ingo Feb 06 '14 at 09:07
  • `2+a == a+2`; therefore `"x2" == "2x"`? – nmclean Feb 06 '14 at 16:29
  • @nmclean Nonsense. I nowhere said that. I said that, if `a < b` and you do append or prepend the same to both a and b, the relation should still hold for the results. Go, try in any language except PHP and you will see that strings behave the way I gave as examples above. (That is "x2" > "x100" **because** "2" > "100" **because** '2' > '1') – Ingo Feb 06 '14 at 16:40
  • *"I nowhere said that"* -- Exactly. The fact that you **don't** agree with this shows the logical inconsistency of your point. **If** we should expect the results to be the same after adding to both sides **because** that is true with numbers, **then** we should expect string adding rules to be consistent with the numerical counterpart as well. – nmclean Feb 06 '14 at 16:59
  • @nmclean I didn't say **because it is true with numbers** (I just observed that some numerical operators are ordering preserving, *just like* string appends.). But even if I did say "What is true for string operations, is also true for number operations." the conclusion that the reverse holds is a logical fallacy. (Compare: *When it is true that p is prime, then it is also true that p is a number.*) – Ingo Feb 06 '14 at 17:49
  • Your last sentence is true, but inapplicable: *"x should be true for strings because it is true for numbers"* **means** *"what is true for numbers should be true for strings"*; I did not need to *infer* any "reverse" holding true because the conclusion was already directly given. And if the number behavior is **not** the basis of your expectation, then what is? – nmclean Feb 06 '14 at 18:02
  • @nmclean You still invent the **because**. Just remove (mentally) starting from *This is the same with numbers*. Then you see the basis of my expectation better: *[string] comparison result should remain the same when I append the same string on the front or the end [on both operands]*. – Ingo Feb 06 '14 at 18:12
  • Hence the question: *And if the number behavior is **not** the basis of your expectation, then what is?* Without the "because" connection, there is no support given in your answer. **Why**, given "2" < "100", would you expect that "2x" < "100x"? What is your point of reference? – nmclean Feb 06 '14 at 18:19
  • @nmclean - See my edit. Note that the *monoid operation with constant preserves ordering* is the basis for implementing string, or more generally, list comparison: if the head elements are equal, you just compare the rest. This is how it is done in every programming language except PHP and in the CPU - the comparison is complete on the first unequal character. – Ingo Feb 06 '14 at 18:37
1

Data on the Web is transmitted as strings, for example a GET request: www.foo.bar/?number=2

Even though we intuitively know that this is a number that is sent as a request and not a string, technically it is a string, since that is the only type of data you can get from a URL.
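This is easy to see with Python's standard `urllib.parse` module. The sketch below parses the example URL (the `http://` scheme is added here for illustration) and shows that the query value arrives as text and must be converted explicitly.

```python
from urllib.parse import urlparse, parse_qs

url = "http://www.foo.bar/?number=2"   # scheme added for illustration

# parse_qs maps each parameter name to a list of string values.
query = parse_qs(urlparse(url).query)
raw = query["number"][0]

print(type(raw).__name__, repr(raw))   # the value is the string '2'

number = int(raw)                      # explicit conversion to a number
```

A language with automatic conversion skips the last step; a stricter one makes the program say where text becomes a number.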

IMO, Web languages such as PHP and Javascript have dynamic typing allowing automatic conversion of types depending on context mainly because of that. This is a pragmatic approach and I think this is also why they vastly dominate the Web.

I think that it's likely that if PHP had been created for another environment than the Web (like Python, C, whatever...) the typing rules would have been different and probably more strongly typed and less dynamic, but this is a Web language and dynamic typing fits the ecosystem it is in.

Pascalc
0

You ask what situation would ever come up in which you want to compare 100 to "105" and want the string to be converted to a number, but I'm having more trouble coming up with a situation in which you'd want to deliberately compare 100 to "105" at all.

This situation happens in PHP all the time, of course, as well as in many other programming languages, but convert-and-compare is almost never what you actually want. Very nearly 100% of the time, you want to compare two numbers or two strings, and the only reason the types don't match is that something's wrong in the way you process the user input.

PHP tries to Do What I Mean, which it assumes to be "convert the string to a number, and compare the numbers." At first blush, this sounds like a reasonable way of doing things, as long as PHP's assumptions about what you want are correct. But the moment you depart from those assumptions, this comes back to bite you. It might not even cause harm during that particular operation, instead wreaking havoc much later down the line, but the earlier the error can be caught, the easier it is to find the source and fix it.

And that, ultimately, is the problem with PHP's type comparison: in a well-meaning attempt to Do What I Mean, it masks errors that come up earlier in the code. Hidden bugs stay hidden because of it, only surfacing later to cause trouble.
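A minimal Python sketch of how a lenient conversion can mask an input error. `lenient_int` is a hypothetical "Do What I Mean" converter invented for this illustration (it keeps any leading digits and otherwise returns 0). It is not PHP's actual conversion routine.

```python
import re

def lenient_int(text):
    """Hypothetical 'Do What I Mean' conversion: leading digits, else 0."""
    match = re.match(r"\s*[+-]?\d+", text)
    return int(match.group()) if match else 0

typo = "1O5"   # the letter O was typed instead of a zero

# The lenient path quietly compares 1 < 100 and carries on.
silently_wrong = lenient_int(typo) < 100

# The strict path fails immediately, at the point where the bad input enters.
try:
    int(typo)
    caught_early = False
except ValueError:
    caught_early = True

print(lenient_int(typo), silently_wrong, caught_early)
```

With the lenient conversion the mistyped value flows onward as `1`; with the strict one the problem surfaces at the source.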

The Spooniest