2

Implementing operator overloading for e.g. + (plus) isn't terribly difficult if you know it's a binary operator. One can just parse expression + expression

But what if the programmer can choose whether + is binary, unary (prefix/postfix), part of a ternary, or something else? E.g.

  • .. ? .. : ..
  • ..²
  • .. | .. | .. | .. | ..
  • .. 大 .. 小 ..
  • 不 ..

The problem seems to be that the arity should be known at parsing time, but it isn't, it's only known after the whole compiling/transpiling.

Some ideas:

  • I think the whole problem kind of goes away by switching to Lisp-like syntax (not familiar enough with Lisp yet), but I'm gonna make it difficult by saying the syntax has to be somewhat C-like as in the examples.
  • Some kind of restriction like having overloads at the top / in a separate file so they can be compiled first.
  • (Maybe Arity must be fixed for each symbol - doesn't really answer the question though).

Is there a way that isn't terrible?

Does it become more possible by assuming operators can be distinguished from other identifiers at the parsing phase (e.g. other identifiers are alphanumeric)?

Kilian Foth
  • 107,706
  • 45
  • 295
  • 310
Mark
  • 662
  • 5
  • 12
  • 1
    Tangent (https://github.com/Telastyn/Tangent) supports this, but its terribleness is up for debate. Under the covers it effectively does your second idea, parsing the declarations of functions before the bodies of them. – Telastyn Aug 19 '17 at 23:59

2 Answers2

4

If a language supports user-defined operators (as opposed to overloading of existing operators), then the language is generally structured in a way so that arity, associativity, and precedence are known at parsing time!

Of course, this also prevents you from using a run-of-the-mill parser generator, but there are alternatives (like writing a parser by hand).

An operator must be declared before it is used. This is similar to how C/C++ functions must be declared before they are called.

Example: Haskell. Here operators can be freely declared from a set of allowed characters. You can specify associativity, arity, and precedence level with a declaration like infixr 5 **. Otherwise, operators are declared like ordinary functions, e.g. (++) a b = concat a b. When operators are enclosed to one or both sides with parentheses, they lose their arity, e.g. we can write equivalently a + b, (a +) b, (+ b) a, (+) a b.

Example: Scala. In Scala operators are just normal methods, and methods don't have to be invoked with dot notation: so a.m(b) is equivalent to a m b. So if we define a method called ++ we can write a ++ b. Some operators are more special and can only be overloaded with specially named methods, e.g. the function call operator with the apply() method. The precedence level of an operator is determined by the first character in the operator. So the operators +, +*, and +| all have the same precedence.

Example: Perl6. The Perl6 language includes a powerful grammar engine. New rules can be added to the grammar on the fly, for example by declaring a subroutine as an operator. An operator is always declared with an arity, so multi sub prefix:<++>($x), multi sub infix:<++>($x, $y), and multi sub postfix:<++>($x) can coexist as different operators. Operators are not restricted to symbols but can also contain letters. All in all the operator system of this language is extremely complex and flexible.

amon
  • 132,749
  • 27
  • 279
  • 375
  • It seems like Haskell and Perl6 achieve something like what I want, thanks for the examples! For `An operator must be declared before it is used. This is similar to how C/C++ functions must be declared before they are called.`: function calls are easy to parse without the function being defined, but for the operators you can't parse `a+-b` without knowing the arity of `+` and `-`, right? Does it mean that Haskell and Perl6 must connect parsing and semantic analysis, so that e.g. when an operator overload is recognized, the parser can be updated to be able to parse this operator correctly? – Mark Aug 19 '17 at 17:31
  • @Mark Functions can be interpreted as n-ary operators. Perl5 is a bit unique because a program is executed in multiple phases. BEGIN phase code is immediately executed, during parsing. This code can dynamically define functions in a way that affects parsing for the rest of the file, e.g. defining a function as nullary or unary or n-ary with low precedence → Parsing Perl is undecidable/Turing complete. Perl6 generalizes this by directly offering access to the grammar engine that parses the file. This makes static analysis supremely difficult. You don't have to make those mistakes. – amon Aug 19 '17 at 17:58
  • Scala needs to blend parsing & semantic analysis: is `a b c` a chained call `a.b.c` or a binary method `a.b(c)`? This depends on whether `a.b` provides a method `c` or is a function that takes one argument. Haskell is simpler and completely static: you need to explicitly declare all operators with precedence and associativity before they can be defined as functions and used. No semantic analysis needed for parsing. I.e. you first do `infixr <@: 9` so the parser knows how to handle it, later you can declare the operator with a type, e.g. `(<@:) :: Int -> [a] -> [a]` – amon Aug 19 '17 at 18:00
  • It seems like `you first do infixr <@: 9 so the parser knows how to handle it` contradicts `do semantic analysis needed for parsing`, doesn't it? But I think I get the general idea now, thanks for all the info! – Mark Aug 19 '17 at 18:10
  • Actually, you don't need to define associativity *before* using an operator – by default, everything will be a low-precedence left-associative infix operator, I think. Any string of symbols is interpreted as a single operator, i.e. `a+-b` is equivalent to `(+-) a b`. The declaration like `infixr 5 <@:` can be used to instruct the parser to handle an operator differently. All of that is parser-level. Later, during semantic analysis it can be determined that no `+-` function exists – Haskell will even suggest similarly named operators like `++` or `-`. – amon Aug 19 '17 at 18:13
  • I see, I get the idea now. Seems I should really make time to learn finally Haskell. This information has been very useful, thanks! – Mark Aug 19 '17 at 18:17
2

You didn't mention the language.

In Swift, there is first a syntax for character sequences that could be operators. For example, "+" is an operator not because it is built into the language, but because it is implemented as part of the standard library.

Second, there is syntax that determines whether an operator is a unary or binary operator. In A+B or A + B the "+" is a binary operator, in A * +B it is a unary operator. (A*+B would have a binary operator *+).

With this syntax, the operator is found, and then it is treated very much like a function call. If the operator isn't found, that's just like the function call syntax for an undefined function name. (I left out operator precedence and left or right associativity, which are defined per operator).

So you can easily define a unary operator *+ for example, and a binary operator *+ as well, and the compiler has no problem keeping them apart. (Swift beginners are often surprised that a+ b or a +b don't compile).

gnasher729
  • 42,090
  • 4
  • 59
  • 119
  • It's not about a specific language. Using whitespace to hint at arity is a good idea though, although I can't really visualize yet how it'd extend to ternary or higher... – Mark Aug 19 '17 at 17:23
  • In Fortress, whitespace doesn't *change* precedence, but it can be used by the programmer to tell the compiler what he *thinks* the precedence is, and the compiler will abort with an error if the programmer is wrong. IOW: `a+b*c` will compile, `a + b * c` will compile, `a + b*c` will compile, but `a+b * c` is an error. Whitespace doesn't change how an expression is parsed, but it *does* change whether the compiler accepts the expression or not. – Jörg W Mittag Aug 20 '17 at 09:09