The current big thing seems to be VMs. Java’s got one, .NET has one, hell, even Perl’s getting one. Now, Java had one for several reasons, mostly write-once-run-anywhereness. I’m not sure why .NET has one. Microsoft can’t want people to write code that can run anywhere but on Microsoft operating systems, unless they’re planning on moving away from x86. (Watch out Intel, they’re onto you!) Perl seems to be getting one because it’s the fashionable thing to do. “Everyone else was doing it and I wanted to be cool.”[1]
Now, for the closed source community a virtual machine is important. It mostly obscures your source code, er, “intellectual property”, and it gives you hardware independence so you don’t have to develop for lots of different platforms. But these requirements don’t exist for open source software. We don’t want to hide the source code; in fact, we work very hard to make sure that everyone has access to it. Many open source scripting languages (such as PHP) don’t use a VM, and really, no one cares particularly much. Putting a PHP script under the GPL is like selling a car with a license that says that everywhere the car is, you take the chassis along too. I mean, a car that doesn’t have a chassis isn’t a car.
Now, the Mono people have apparently stated that the reason they used .NET was because they wanted an easy way to provide bindings to multiple languages. They can now target .NET and then every .NET “enabled” language gets the bindings for free without a lot of work being repeated. Presumably they also used .NET because Sun would get rather angry at them if they tried to use Java with bindings to Gnome. Also, programmers desperately need a decent language for writing applications in. C is a great language for writing low-level code in, but it’s a royal PIA for doing high-level application development. C++ has the potential to be great, if only it had a decent standard library. Want to fetch a URL? Whoops, you’ve got to write code for HTTP, name resolution and network sockets on top of the C APIs, which don’t “mesh” with C++ very well. (Look at getaddrinfo(3), and consider how much effort it takes to wrap that nicely into some C++ classes, and to repeat that for every program you write. Compare that to the Java/.NET/Python/Perl libraries.) Python is quite popular for writing small to medium sized GUI applications, particularly with the Twisted library. But it’s not exactly zippy, and many people won’t touch it on religious grounds.[2]
So, my thought here is: why are we following this lead? We have the best representation of the program at our disposal, the source code. The source, as written by the programmer, has the highest level of expressiveness you can achieve. Instead of trying to JIT some abstract byte code, you can compile the source directly into executable code. Instead of .pyc files littering the place, you can have “.so”s and executables kicking around. In fact some distros are starting to do this, for example Gentoo, where you download the source, then compile it to meet the specifications of your computer.
So what pieces of the puzzle are we missing? Well, the major feature .NET has brought to the table is a standard ABI. Basically, the idea is that programs written in one .NET language can call functions written in another .NET language, or even outside the .NET framework using P/Invoke. What we should be doing is trying to come up with a standard ABI that encapsulates all these things and that compilers for different languages can all implement. This ABI could even be a subset of the C++ ABI, as C++ seems to be a superset of every language feature ever. You want strings (std::wstring)? Templates? Even things like garbage collection can be “easily” retrofitted into C++ (boost::shared_ptr<> or even more complicated systems). VMS did this: you could call Pascal code from C, and both from Fortran. The Unix people have quite happily standardised on the C ABI, and almost everything supports it.
I tried doing some of these things before. Swig provides a way to make arbitrary C/C++ code available to a large range of languages. I’ve tinkered with writing code that uses nm(1) and c++filt(1) to automatically generate a Swig file and compile it, with all of this triggered from a Python import hook. So you can say “import foo” and it will go off, read the symbols out of /usr/lib/libfoo.so, process them with c++filt to get the function prototypes, generate a Swig file, compile said Swig file into a .so which Python can load as a module, and then load it.
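To make the first step of that pipeline concrete, here is a minimal sketch of the symbol extraction, assuming GNU binutils (nm and c++filt) are on the PATH. The function names `exported_symbols`, `demangle` and `list_functions` are made up for illustration and are not from the original autoswig code. Note that c++filt gives you the parameter types of an ordinary C++ function but not its return type, so this alone is not a full prototype.

```python
import subprocess


def exported_symbols(nm_output):
    """Pick the defined code symbols out of `nm -D --defined-only` output.

    Each defined symbol is printed as "address type name"; type T is a
    global function in the text section and W is a weak symbol.
    """
    names = []
    for line in nm_output.splitlines():
        parts = line.split()
        if len(parts) == 3 and parts[1] in ("T", "W"):
            names.append(parts[2])
    return names


def demangle(names):
    """Pipe mangled names through c++filt(1), one name per line."""
    result = subprocess.run(
        ["c++filt"],
        input="\n".join(names),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.splitlines()


def list_functions(library_path):
    """Return the demangled names of the functions a shared library exports."""
    result = subprocess.run(
        ["nm", "-D", "--defined-only", library_path],
        capture_output=True,
        text=True,
        check=True,
    )
    return demangle(exported_symbols(result.stdout))
```

Something like `list_functions("/usr/lib/libfoo.so")` would then feed whatever generates the Swig interface file. Parsing the mangled names first and demangling afterwards keeps the split on whitespace safe, since demangled C++ signatures contain spaces.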
The code to do this is all fairly straightforward, and the results can be easily cached in ~/.autoswig/python/ or whatever. It’s dependent on reasonably standard tools, and will work for a large set of languages (basically whatever Swig supports). It can be easily improved by just dlopen(3)ing the library, using libbfd(3) to parse the debugging information to get the prototypes of the functions, and some magic code to marshal data to the library, which is nearly what P/Invoke does. P/Invoke, however, requires you to write your own prototypes; my suggestion is to use the debugging (stabs) information that is already in the binary to build the prototypes directly. My biggest issue with this was not wanting to write platform-specific “magic code” (as mentioned above), since I don’t have access to a wide range of hardware. My other major issue is that calling conventions are a nightmare, particularly when it comes to whose responsibility it is to free allocated memory.
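For a feel of what the dlopen-and-marshal approach looks like in practice, here is a small sketch using Python’s standard ctypes module, which plays roughly the role P/Invoke does in .NET. It assumes a Unix-like system where `ctypes.CDLL(None)` exposes the C library, and the wrapper names `c_strlen` and `c_strdup` are invented for this example. The prototypes are written by hand, which is exactly the step that debugging information could fill in automatically.

```python
import ctypes

# CDLL(None) dlopen()s the running process itself, which on Linux and
# macOS makes the C library's symbols available.
libc = ctypes.CDLL(None)

# Hand-written prototypes: nothing in the library tells ctypes what
# these functions take or return.
libc.strlen.argtypes = [ctypes.c_char_p]
libc.strlen.restype = ctypes.c_size_t

libc.strdup.argtypes = [ctypes.c_char_p]
libc.strdup.restype = ctypes.c_void_p  # keep the raw pointer so it can be freed

libc.free.argtypes = [ctypes.c_void_p]
libc.free.restype = None


def c_strlen(text):
    """Marshal a Python bytes object to a char* and call strlen(3)."""
    return libc.strlen(text)


def c_strdup(text):
    """Call strdup(3), copy the result into Python, then free the C copy.

    strdup allocates with malloc, so the caller owns the memory and has
    to free it. The prototype alone does not say that.
    """
    ptr = libc.strdup(text)
    if not ptr:
        raise MemoryError("strdup returned NULL")
    try:
        return ctypes.string_at(ptr)
    finally:
        libc.free(ptr)
```

The `c_strdup` wrapper is the memory ownership problem in miniature: knowing that the caller must free the result comes from the man page, not from anything a tool could read out of the binary.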
So I ask, why is the open source community wasting time on VMs when we can use the strength of our source to provide a superior solution?
[1]: OK, I’m being harsh on the Perl people again :) Both Perl and Python have a VM that they are “compiled to”, basically to cache the result of parsing the source.
[2]: Apparently many people were Fortran programmers in another life and have a genetic fear of whitespace being used by the compiler, irrespective of the fact that people will moan if that whitespace isn’t there anyway.