Writing a new programming language - when and how to bootstrap datastructures?

Question

I'm writing my own programming language, and so far it has gone well in terms of what I set out to accomplish. Now I'd like to bootstrap some pre-existing data structures and/or objects, but I'm not sure how to begin.

When the compiler begins, do I splice in these add-ins so they're part of the scope of the application?

If I make these in some core library, my concern is how I distribute the library in addition to the compiler--or are they part of the compiler?

I get that there are probably a number of plausible ways to approach this, but I'm having trouble setting my direction. If it helps, the language is on top of the .NET core (i.e., it compiles to CLR code).

Any help or suggestions are very much appreciated!

Even such a trivial, low-level language as C cannot be served without a runtime library. See, for example, how gcc handles `crt0.o`, `crtbegin.o` and the like. For .NET, you're likely to end up with a single runtime library DLL, which has to be shipped with all the binaries produced by your compiler. — SK-logic, Jul 10 '12 at 11:09
@SK-logic Low-level languages such as C can very well be used without a runtime library. When the computer starts, there is nothing, not even a runtime library. Yet, I've written a simple operating system kernel in C, which, of course, does not depend on some crt files or anything from the runtime. Things like _memcpy_ had to be written from scratch, directly as part of the kernel, but structs and unions are supported by the compiler without a runtime. It may not be simple, but it is possible and therefore your statement is false. — Daniel A.A. Pelsmaeker, Aug 09 '12 at 13:49
@Virtlink, of course - you either use a default runtime library or you have to provide your own. — SK-logic, Aug 09 '12 at 14:14
@SK-logic For most languages, in most cases, you are right that a runtime is needed. But if I provide _my own_ then I don't call it a runtime. It's just a bunch of functions, not required but merely nice to have, that can do anything you want and have any name you want. There is no requirement to have anything runtime-like or named `memcpy`, for example. It's even in the same executable, which is clearly not the case for a runtime. — Daniel A.A. Pelsmaeker, Aug 09 '12 at 14:19
Strange question and terminology: usually, we bootstrap a *compiler*, not some *datastructures*. — Basile Starynkevitch, Jul 26 '14 at 16:56

score 1 · Answer 1 · answered Jul 10 '12 at 12:23

When the compiler begins, do I splice in these add-ins so they're part of the scope of the application?

That depends on how the language is designed. If there is special syntax to define a list for example, then you should include the list. If there's not, then it should be part of a (standard) library and follow the rules of other libraries.
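To make the "special syntax" case concrete, here is a minimal sketch in Python of how a compiler might lower a list literal into calls on a library type. The type name `MyLang.Core.List`, its `Add` method, and the pseudo-instruction tuples are all assumptions invented for illustration; they are not real CIL and not part of any existing library.

```python
# Hypothetical name: "MyLang.Core.List" is invented for this illustration.
CORE_LIST = "MyLang.Core.List"


def lower_list_literal(elements, list_type=CORE_LIST):
    """Lower the literal [e1, e2, ...] to pseudo-instructions on a library type."""
    ops = [("new", list_type)]            # construct an empty list
    for element in elements:
        ops.append(("dup",))              # keep the list reference on the stack
        ops.append(("push_const", element))
        ops.append(("call", list_type + ".Add"))
    return ops


if __name__ == "__main__":
    for op in lower_list_literal([1, 2, 3]):
        print(op)
```

The point of the sketch is that the compiler only needs to know the name and shape of the list type. The list's implementation can still live in a library.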

If I make these in some core library, my concern is how I distribute the library in addition to the compiler--or are they part of the compiler?

Again, if the language syntax requires the structures to function then maybe just include them in the compiler. Otherwise make them a library and bundle the library with the compiler, or make it a simple install dependency.

However, it's often good to keep your compiler as simple as possible. If your data structures can be expressed in the language you are writing and can't be optimized by the compiler explicitly, perhaps leave the definition of the structures to a library in your new language. — David Cowden, Jul 10 '12 at 13:45

score 1 · Answer 2 · answered Jul 26 '14 at 10:44

In the core runtime library, which you will distribute with your compiler.

Your question is a bit abstract, which I think has misled others in how they've tried to answer. I shall assume your language has a domain of interest, and there are some core objects (base classes, perhaps) which are critical for the language. If your language was for writing games then perhaps you might have some special base classes for players, non-player characters, objects and terrain. This would allow you to have special keywords and language constructs for creating games.

While it is tempting to somehow 'emit' those classes from inside the compiler, in practice it makes more sense to write them as a separate library with known features and have the compiler assume that they will exist. On top of the 'core' library you then build one or more 'extension' or 'add-on' libraries as the language system grows. The compiler knows nothing about these.

In fact it turns out that you should put as little as possible inside the compiler and the core because it is far easier to extend the language system by add-ons than by extending the compiler and core itself.
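One way to picture "the compiler assumes they will exist" is a small table of well-known type references that seeds the outermost scope. The following Python sketch rests on assumptions: the library name `MyLang.Core`, the `TypeRef` and `Scope` classes, and both helper functions are hypothetical and only illustrate the idea.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class TypeRef:
    """A reference to a type that lives in a library, not in the compiler."""
    assembly: str    # library expected to ship alongside the compiler
    full_name: str   # fully qualified type name inside that library


# Hypothetical names: "MyLang.Core" and its types are invented for illustration.
WELL_KNOWN_TYPES = {
    "List": TypeRef("MyLang.Core", "MyLang.Core.List"),
    "Map": TypeRef("MyLang.Core", "MyLang.Core.Map"),
}


@dataclass
class Scope:
    symbols: dict = field(default_factory=dict)
    parent: "Scope | None" = None

    def lookup(self, name):
        """Search this scope, then each enclosing scope in turn."""
        scope = self
        while scope is not None:
            if name in scope.symbols:
                return scope.symbols[name]
            scope = scope.parent
        raise KeyError(f"undefined symbol: {name}")


def make_root_scope():
    """Seed the outermost scope with references to the core library types."""
    return Scope(symbols=dict(WELL_KNOWN_TYPES))


def make_program_scope():
    """User code gets a child scope, so it can shadow core names without altering them."""
    return Scope(parent=make_root_scope())
```

Add-on libraries would not appear in this table at all. They would be resolved like any other user-referenced library, which keeps the compiler's built-in knowledge small.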

If this is off the mark, please edit your question to provide some more concrete details.

score 0 · Answer 3 · answered Aug 09 '12 at 14:01

The .NET Framework and all its tools form a two-part system: it has a compiler that compiles a language such as C# into an assembly with CIL code, and a just-in-time (JIT) compiler that compiles the CIL to machine code.

A big advantage of the CLR is that you can specify any type you want just by using its fully qualified name. In addition, the assembly specifies in which assemblies such types can be found. Later, when the JIT compiler comes along, it looks at the fully qualified type name, loads the referenced assembly that contains the type, and uses that. The type does not have to be known beforehand, and there is never an issue with circular references. So if you put some types in a core library, all you have to do is make sure the CLR assemblies generated from your language reference your core library; the types in it are then available for use.
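As a loose analogy only, the "store a name now, load the type on first use" idea can be sketched in Python with `importlib`. This is not how the CLR is implemented. The helper `resolve_type` and the list `REFERENCED_TYPES` are made up for the example, and standard-library modules stand in for a core library.

```python
import importlib


def resolve_type(qualified_name):
    """Load the module named in `qualified_name` and return the named attribute."""
    module_name, _, type_name = qualified_name.rpartition(".")
    module = importlib.import_module(module_name)  # loaded on demand, cached afterwards
    return getattr(module, type_name)


# The "compiled program" stores only names; nothing is loaded until first use.
REFERENCED_TYPES = ["collections.OrderedDict", "fractions.Fraction"]

if __name__ == "__main__":
    for name in REFERENCED_TYPES:
        print(name, "->", resolve_type(name))
```

In this analogy the compiler's job is limited to emitting the right names and library references, while the core library itself is just another dependency shipped next to the program.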

The JIT compiler includes only logic to handle some primitive types (32-bit integer, 64-bit integer, float, managed references and managed pointers) and knows about some very special object classes (Object, Enum, ValueType, Delegate, ...). Almost all other functionality is either simulated with the primitive types (e.g. boolean true is any integer with a non-zero value) or supported through (virtual or non-virtual) method calls (properties, methods, overloaded operators, coercions and conversions all use method calls internally). Only some core functionality that cannot be expressed (efficiently) in a managed language (such as adding integers, or allocating arrays) is integrated in the JIT compiler.

score 0 · Answer 4 · answered Jul 29 '14 at 06:34

Since you are building a language on top of the .NET runtime, I think what you want to do is distribute a DLL that initializes the wanted structures/environment, then have your compiler-generated MSIL code call an entry point in that DLL before it executes the main() function (or however your programming language finds its entry point) of the user's program.

This is analogous to how libc variables and runtime dependencies are initialized by assembly code that is executed before the user's main() function [or more precisely the assembly code generated for the prolog+body of main()] is actually called in C.
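The shape of that generated start-up code can be sketched in Python. Everything here is hypothetical: the `Runtime` class stands in for the runtime DLL, `user_main` for the user's program, and `generated_entry_point` for the wrapper the compiler would emit.

```python
class Runtime:
    """Stand-in for the core runtime library shipped with the compiler."""
    initialized = False
    state = {}

    @classmethod
    def initialize(cls):
        """Set up the environment once; later calls do nothing."""
        if cls.initialized:
            return
        cls.state["output"] = []
        cls.initialized = True


def user_main():
    """Stand-in for the user's program, which relies on the initialized runtime."""
    Runtime.state["output"].append("hello")
    return 0


def generated_entry_point():
    """What the compiler would emit: initialize the runtime, then run user code."""
    Runtime.initialize()
    return user_main()


if __name__ == "__main__":
    raise SystemExit(generated_entry_point())
```

Making `initialize` idempotent is a design choice in this sketch; it keeps the wrapper safe if more than one generated entry point ends up calling it.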
