Archive for September, 2004

Why an interpreted language isn’t for us.

Wednesday, September 29th, 2004

So I’ve been making several postings recently about why I think that Mono and Java are the wrong choice for the open source community. I believe we should work towards a stable, portable ABI that all languages can adhere to and use as a common base. I’d love to be able to write code in Python that can call C++, C, Perl and Ruby functions as easily as it can call a Python function. So, a short summary of the articles:

But JIT is faster…

Wednesday, September 29th, 2004

One of the arguments from people who recommend Java or the CLR is that just-in-time compilation lets the runtime take into account how an application is actually used, and optimise the generated code for that. Part of this idea comes from the HP Dynamo project. My personal theory is that because JIT compilers are dealing with a whole program they can do global optimisations, whereas normal compilers get fed one source file at a time and therefore cannot do the optimisations that the JITers can. From personal experience I’ve discovered that marking functions “static” in C means gcc does a much, much better job of optimising them, because it knows it has seen every caller of that function. Linkers that do link-time optimisation seem to get similar-sounding benchmark results.

So, my hypothesis is that designing a language so that global optimisations are easy to do, or designing compilers to take advantage of global optimisations, will produce code that rivals what the JITers produce.

Garbage Collection

Saturday, September 25th, 2004

Current GC implementations keep the time spent on garbage collection down by collecting infrequently, i.e. only after a certain amount of memory has been allocated. This wastes memory, which in turn slows things down. To my understanding, most JVMs at the moment have a nursery that objects are “born” into. When the nursery is full, its objects are garbage collected and the survivors are moved into another region of memory. This means every application is going to use at least as much memory as its nursery. If every application did this, we’d run out of memory long before we otherwise would.
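
To make the footprint argument concrete, here’s a toy model, not how any particular JVM actually works: short-lived objects are bump-allocated into a fixed-size nursery that is only emptied when it fills up. The function name simulate_nursery and the one-slot-per-object assumption are mine, purely for illustration.

```python
def simulate_nursery(capacity, allocations):
    """Toy model: allocate `allocations` objects, one slot each, into a
    nursery of `capacity` slots. Every object dies right after it is used,
    so at most one object is ever live. Returns (peak_footprint, collections)."""
    used = 0            # nursery slots handed out since the last collection
    peak_footprint = 0  # most slots ever in use at once
    collections = 0
    for _ in range(allocations):
        if used == capacity:
            # Nursery full: trace survivors. In this model nothing survives,
            # so the whole nursery is reclaimed at once.
            used = 0
            collections += 1
        used += 1  # bump-allocate one object
        peak_footprint = max(peak_footprint, used)
    return peak_footprint, collections
```

simulate_nursery(1000, 10000) peaks at 1000 slots after 9 collections, even though only one object is ever live at a time. Freeing each object as soon as it dies would keep the footprint at a single slot.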

Now, I think garbage collection is a great idea. Programmers suck at memory management; I know I do. But I don’t think the current methods and approaches are the way forward.

Before garbage collection we had… what?

Before garbage collection, and still today in lower-level languages, programmers explicitly free(3) memory themselves. In general this works well enough. Occasionally a program has a huge memory leak, but programmers are skilful at creating and using tools such as valgrind to find and fix these problems. It’s not perfect, but it is possible.

Most objects die young

Most objects are destroyed shortly after they are created; it’s rare for an object to live for a long time. This is known as the weak generational hypothesis, and there is a lot of established literature about it.

Most objects cannot form cycles

Most objects can’t form cycles as long as type safety is enforced. An object such as a string cannot point to another object, so it can safely be refcounted. In fact, std::string in C++ is often implemented using refcounting, so only a pointer is passed around from place to place.

I hypothesise that most objects that do form cycles are ones that implement collections of some description. Doubly linked lists have a prev pointer, and trees often have a parent pointer. If we provide a good standard collection class library that is smart enough to take care of its own memory management, either outside the compiler or inside it with hints such as “weak”, then potentially circular objects become very rare.
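
A minimal Python sketch of the “weak” hint idea, assuming CPython’s reference counting: the tree keeps strong references from parent to children but only a weakref back to the parent, so dropping the root frees the whole tree without the cycle collector ever running. Node and freed_without_cycle_collector are hypothetical names for this illustration.

```python
import gc
import weakref


class Node:
    """Tree node: child links are strong, the parent link is weak, so the
    tree never contains a strong reference cycle."""

    def __init__(self, name, parent=None):
        self.name = name
        self.children = []
        # A weak reference does not keep the parent alive.
        self._parent = weakref.ref(parent) if parent is not None else None
        if parent is not None:
            parent.children.append(self)

    @property
    def parent(self):
        # Dereference the weak pointer; None if there is no (live) parent.
        return self._parent() if self._parent is not None else None


def freed_without_cycle_collector():
    """Build root -> child, drop the root with the cycle collector disabled,
    and report whether plain refcounting freed the child."""
    was_enabled = gc.isenabled()
    gc.disable()
    try:
        root = Node("root")
        probe = weakref.ref(Node("child", root))
        alive_before = probe() is not None  # kept alive by root.children
        del root
        return alive_before and probe() is None
    finally:
        if was_enabled:
            gc.enable()
```

On an implementation without reference counting, such as PyPy, the child would only be reclaimed when the tracing GC next runs.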

The gripping conclusion

So I suggest that the best way to do garbage collection is at compile time. Java and .NET can’t do this, because at compile time they are only compiling to bytecode for a virtual machine, and the VM can’t trust that bytecode. However, it should be possible for a compiler to recognise the following cases:

  • If an object’s lifetime is a single block of code and no pointers to it leak out of that block, then the object can simply be allocated on the stack.
  • If an object must be allocated on the heap, its refcount can often be implicit, based on the PC (where execution is in the code). In particular, if a block of code doesn’t “leak” any references to the object, then the refcount need not be updated at all. If it does need to be updated, it only needs updating once, by the sum of the leaked references.
  • If an object can form a circular loop, then warn the programmer that this can happen and fall back on a full garbage collection system, perhaps with hints from the programmer. (For example, I would treat all blocks that can potentially form loops as “long lived” and bypass the nursery.)
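
Here’s a toy version of the first two cases, run over a hypothetical three-address IR made up for the purpose (the op names new, use, store and return are not from any real compiler). An object that never leaks goes on the stack; an object that leaks gets a single refcount adjustment equal to the number of leaked references. It doesn’t model aliasing or the cycle case.

```python
from dataclasses import dataclass

# Hypothetical IR for one block of code, one (op, variable) pair per step:
#   ("new", x)     allocate an object and bind it to local x
#   ("use", x)     use x locally (read a field, call a method, ...)
#   ("store", x)   store x into a global or into another object's field
#   ("return", x)  return x to the caller
LEAKING_OPS = {"store", "return"}


@dataclass
class Decision:
    placement: str       # "stack" or "heap"
    refcount_delta: int  # one adjustment for the whole block


def escape_analysis(block):
    leaks = {}
    for op, var in block:
        if op == "new":
            leaks[var] = 0
        elif op in LEAKING_OPS and var in leaks:
            leaks[var] += 1
        # "use" never leaks a reference, so it costs no refcount traffic.
    decisions = {}
    for var, count in leaks.items():
        if count == 0:
            # Nothing escapes: the object dies with the block, so stack it.
            decisions[var] = Decision("stack", 0)
        else:
            # It escapes: heap-allocate, and bump the count once by the
            # number of leaked references rather than once per operation.
            decisions[var] = Decision("heap", count)
    return decisions
```

For a block that allocates tmp and only uses it locally, tmp lands on the stack; a result that is stored once and returned once gets a heap allocation with a single +2 adjustment.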

The memory manager should also be able to do a much better job, as it knows more about what the program is doing. For instance, if you are malloc(3)ing a fixed-size object, then flag it to be put on a free list when it is released, since you are likely to allocate objects of the same size again later.
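
Allocators in C do this with per-size free lists; the Python sketch below only models the bookkeeping, using bytearray buffers to stand in for fixed-size blocks. FreeList and its fresh counter are hypothetical names.

```python
class FreeList:
    """Recycles fixed-size buffers instead of handing them back to the
    allocator, betting that a same-sized request will come along soon."""

    def __init__(self, size):
        self.size = size
        self._free = []  # released buffers waiting to be reused
        self.fresh = 0   # buffers we actually had to allocate

    def alloc(self):
        if self._free:
            buf = self._free.pop()
            buf[:] = bytes(self.size)  # zero it, as calloc(3) would
            return buf
        self.fresh += 1
        return bytearray(self.size)

    def free(self, buf):
        if len(buf) != self.size:
            raise ValueError("buffer is not the size this free list manages")
        self._free.append(buf)
```

Freeing a buffer and then allocating again hands back the same object, so only the first request pays for a real allocation.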

Why emulation is not the way forward.

Friday, September 24th, 2004

So after several long, heated discussions about my previous blog entry, I’ve got another, related point to make. Emulating Windows will not help us win. If you can target Linux and Windows with the same executable, which OS are you going to test on? That’s right, Windows. Ask any Java developer about write once, run anywhere. They’ll tell you it’s write once, test everywhere. In fact, they get bug reports from such obscure hardware that they cannot easily reproduce or test the bug. So you’ll end up with software that (probably due to some bug in the software) works well under Windows but fails to run under Linux. If you have to choose which OS to run these applications on, which one are you going to choose? Windows, of course. So we’ve gone to all this effort to have people develop software under Windows for users running, uh, Windows.

However, it’s even worse than that: instead of developing a Linux version of their software, people can say “It should just work under Linux”. So now we don’t have a Linux version of the software; we have a Windows version that runs under Linux, and everyone complains that it just doesn’t look right. Maybe it calls things a “Wizard” instead of a “Druid”, or maybe it refers to the “Start Menu” and “Internet Explorer” in the help and documentation. Linux isn’t a first-class citizen.

  • So, why write a business app for Linux users? Write it in C# on Windows, and those Linux hippies can use it too.
  • Why port random-important-app to Linux? We can just get them to run it under Wine.

So, I say that Wine and .NET are bad for Linux. They don’t help, they just give people excuses for not helping us.

VMs and the open source community

Monday, September 20th, 2004

The current big thing seems to be VMs. Java’s got one, .NET has one, hell, even Perl’s getting one. Java has one for several reasons, mostly write-once-run-anywhere-ness. I’m not sure why .NET has one. Microsoft can’t want people writing code that runs anywhere other than Microsoft operating systems, unless they’re planning on moving away from x86. (Watch out Intel, they’re onto you!) Perl seems to be getting one because it’s the fashionable thing to do: “Everyone else was doing it and I wanted to be cool.”[1]

Now, for the closed source community a virtual machine is important. It mostly obscures your source code, er, “intellectual property”, and it gives you hardware independence, so you don’t have to develop for lots of different platforms. But these requirements don’t exist for open source software. We don’t want to hide the source code; in fact, we work very hard to make sure that everyone has access to it. Many open source scripting languages (such as PHP) don’t use a VM, and really, no one particularly cares. Putting a PHP script under the GPL is like selling a car with a license that says that wherever the car goes, the chassis has to go too. I mean, a car that doesn’t have a chassis isn’t a car.

Now, the Mono people have apparently stated that the reason they used Mono was that they wanted an easy way to provide bindings to multiple languages. They can now target .NET, and every .NET “enabled” language gets the bindings for free without a lot of work being repeated. Presumably they also used .NET because Sun would get rather angry at them if they tried to use Java with bindings to GNOME. Also, programmers desperately need a decent language for writing applications in. C is a great language for writing low-level code, but it’s a royal PIA for high-level application development. C++ has the potential to be great if it actually had a decent standard library. Want to fetch a URL? Whoops, you’ve got to write code for HTTP, name resolution and network sockets on top of the C APIs, which don’t “mesh” with C++ very well. (Look at getaddrinfo(3) and consider how much effort it takes to wrap it nicely in some C++ classes, and to repeat that for every program you write; compare that with the Java/.NET/Python/Perl libraries.) Python is quite popular for writing small to medium-sized GUI applications, particularly with the Twisted library. But it’s not exactly zippy, and many people won’t touch it on religious grounds.[2]
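
For contrast, here’s what name resolution looks like in Python: socket.getaddrinfo wraps getaddrinfo(3), frees the result list for you and hands back plain tuples. The resolve helper is just an illustration; passing a numeric address means no DNS lookup is needed.

```python
import socket


def resolve(host, port):
    """Return the (address, port) pairs that host resolves to, TCP only."""
    infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    # Each entry is (family, type, proto, canonname, sockaddr); the first two
    # sockaddr fields are the address and port for both IPv4 and IPv6.
    return [sockaddr[:2] for _family, _type, _proto, _canon, sockaddr in infos]
```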

So, my question is: why are we following their lead? We have the best representation of the program at our disposal: the source code. The source, as written by the programmer, is the most expressive form the program will ever take. Instead of trying to JIT some abstract bytecode, you can compile the source directly into executable code. Instead of .pyc files littering the place, you can have .so files and executables kicking around. In fact, some distros are starting to do this; Gentoo, for example, has you download the source and then compile it to suit your computer.

So what pieces of the puzzle are we missing? Well, the major feature .NET has brought to the table is a standard ABI: programs written in one .NET language can call functions written in another .NET language, or even outside the .NET framework using P/Invoke. What we should be doing is coming up with a standard ABI that encapsulates all these things and that compilers for different languages can all implement. This ABI could even be a subset of the C++ ABI, as C++ seems to be a superset of every language feature ever. You want strings (std::wstring)? Templates? Even things like garbage collection can be “easily” retrofitted onto C++ (boost::shared_ptr<> or even more complicated systems). VMS did this: you could call Pascal code from C, and both from Fortran. The Unix people have quite happily standardised on the C ABI, and almost everything supports it.
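
The C ABI is what makes this sort of thing work today. As an illustration, Python’s standard ctypes module can dlopen(3) the C library and call strlen directly, although, much like P/Invoke, you have to declare the prototype yourself. This sketch assumes a POSIX system; c_strlen is a hypothetical wrapper name.

```python
import ctypes
import ctypes.util

# dlopen(3) the C library. find_library may return None on some systems;
# CDLL(None) then falls back to the symbols already loaded into the process.
_libc = ctypes.CDLL(ctypes.util.find_library("c"))

# Declare the prototype by hand: size_t strlen(const char *s)
_libc.strlen.argtypes = [ctypes.c_char_p]
_libc.strlen.restype = ctypes.c_size_t


def c_strlen(data: bytes) -> int:
    """Call the C library's strlen through the C ABI."""
    return _libc.strlen(data)
```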

I’ve tried doing some of this before. SWIG provides a way to make arbitrary C/C++ code available to a large range of languages. I’ve tinkered with writing code that uses nm(1) and c++filt(1) to automatically generate a SWIG file and compile it, all triggered from a Python import hook. You say “import foo” and it goes off, reads the symbols out of /usr/lib/libfoo.so, processes them with c++filt to get the function prototypes, generates a SWIG file, compiles that into a .so which Python can load as a module, and then loads it.

The code to do this is all fairly straightforward, and the results can easily be cached in ~/.autoswig/python/ or similar. It depends only on reasonably standard tools, and will work for a large set of languages (basically whatever SWIG supports). It could easily be improved by just dlopen(3)ing the library, using libbfd(3) to parse the debugging information to get the function prototypes, and some magic code to marshal data to the library, which is nearly what P/Invoke does. P/Invoke, however, requires you to write your own prototypes; my suggestion is to use the debugging (stabs) information that is already in the binary to build the prototype directly. My biggest issue was not wanting to write platform-specific “magic code” (as mentioned above), since I don’t have access to a wide range of hardware. My other major issue was that calling conventions are a nightmare, particularly the question of whose responsibility it is to free allocated memory.
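
The import-hook half of this is easy to sketch in modern Python. The version below swaps SWIG for ctypes (so there’s no compile step) and, lacking any debug-info parsing, lets ctypes assume int arguments and an int return value. The autolib_ prefix and the SharedLibraryFinder name are hypothetical; it relies on ctypes.util.find_library and PEP 562 module-level __getattr__, so it needs Python 3.7 or later.

```python
import ctypes
import ctypes.util
import importlib.abc
import importlib.machinery
import sys

PREFIX = "autolib_"  # hypothetical convention: "import autolib_m" binds libm


class SharedLibraryFinder(importlib.abc.MetaPathFinder, importlib.abc.Loader):
    def find_spec(self, fullname, path, target=None):
        if not fullname.startswith(PREFIX):
            return None  # not ours: let the normal import machinery handle it
        libpath = ctypes.util.find_library(fullname[len(PREFIX):])
        if libpath is None:
            return None  # no such library, so the import fails normally
        return importlib.machinery.ModuleSpec(fullname, self, origin=libpath)

    def create_module(self, spec):
        return None  # use the default module object

    def exec_module(self, module):
        lib = ctypes.CDLL(module.__spec__.origin)  # dlopen(3) the library
        # PEP 562: module-level __getattr__ resolves symbols lazily on first
        # access. Without prototypes, ctypes assumes int args and int return.
        module.__getattr__ = lambda name: getattr(lib, name)


sys.meta_path.append(SharedLibraryFinder())
```

With the finder installed, import autolib_c binds the C library and autolib_c.strlen(b"abc") returns 3. Anything that doesn’t fit the int-in, int-out guess still needs a real prototype, which is exactly what reading the debugging information would supply.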

So I ask: why is the open source community wasting time on VMs when we can use the strength of our source to provide a superior solution?


[1]: OK, I’m being harsh on the Perl people again :) Both Perl and Python have a VM that they are “compiled to”, basically to cache the result of parsing the source.

[2]: Apparently many people were Fortran programmers in a previous life and have a genetic fear of the compiler paying attention to whitespace, even though people will moan if that whitespace isn’t there anyway.

Few minor things

Monday, September 20th, 2004

I’ve just been blog-spammed. Sigh. So I’m playing around with moderation, such as moderating all comments that contain a URL. Since, to my knowledge, no one has legitimately posted a URL yet, that sounds fine by me :) Maybe I’ll hack something up to allow a whitelist of links to places like the WLUG wiki (if a link is useful to me, it’s useful to others…), Slashdot, or other random places that people usually link to.
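
The moderation rule I’m describing (hold anything with a link, unless the link is on a whitelist) is simple enough to sketch. The whitelisted domains below are illustrative placeholders, and needs_moderation is a hypothetical name rather than anything WordPress provides.

```python
import re
from urllib.parse import urlparse

# Illustrative whitelist; in practice it would list the sites readers
# commonly link to.
WHITELIST = {"slashdot.org", "wlug.org.nz"}

# Rough URL matcher: a scheme followed by everything up to whitespace,
# an angle bracket or a quote.
URL_RE = re.compile(r"https?://[^\s<>\"']+", re.IGNORECASE)


def host_is_whitelisted(host):
    # Accept a whitelisted domain itself or any subdomain of it.
    return any(host == d or host.endswith("." + d) for d in WHITELIST)


def needs_moderation(comment):
    """Hold a comment for moderation if it links anywhere off the whitelist."""
    for url in URL_RE.findall(comment):
        url = url.rstrip(".,;:!?)")  # drop trailing sentence punctuation
        host = urlparse(url).hostname or ""  # urlparse lowercases the host
        if not host_is_whitelisted(host):
            return True
    return False
```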

What is the state of the art in dealing with blog spam? Type-the-word-in-the-image? I don’t like forcing registration on people; I get peeved when blogs demand my email address, so I don’t want to force others to do it. There has been talk about various blacklists; do these actually work? Are any worth using with WordPress?

I dunno, leave me a comment to discuss what you think is a wise idea here.

 

Usability vs Customisation

Wednesday, September 15th, 2004

I’m getting sick of people muttering about usability. Now, don’t get me wrong, I’m not against usability, but usability as people propose it comes at a cost, and that cost is functionality.

For instance, the GNOME people have changed their default window manager from Sawfish to Metacity. Metacity’s “easier to use” because it has no features. There’s nothing confusing for users to configure. However, you lose a lot of functionality. You can’t have multiple workspaces with multiple desktops. You can’t bind scripts to keys anymore. There’s a whole heap of things you can’t do. This is apparently a “feature”: it’s not confusing to users. Not that I know of any user who would want to configure their window manager without expecting pain and suffering.

My problem isn’t that the programs are now magically usable. My problem is that there is now no way to do the things I was doing before. It’s not that the features have been hidden behind an “Advanced…” button, or even hidden so that they can only be configured via an obscure about:config interface or by editing the configuration files directly. Those features just don’t exist. I’d be happy to drop to vi and edit a config file to configure my window manager; in fact, that was how I did it for many years. I’d almost be insulted if there was an easy GUI way of setting things up.

The reason I moved to Linux from ol’ MS-DOS was that I liked the power and flexibility of Linux. I could see everything, and contrary to popular belief, for a programmer at least, Linux has more documentation than you could ever wish for. If you can’t find the documentation for something, you can always just go and read the source! But the reasons I came to love Linux are being eroded. Choices are being reduced in the name of usability.

Usability experts regularly complain that open source software is hard to use and has thousands of tiny, obscure options. But that’s because open source software has been made by programmers for programmers, and programmers want, if not need, to be able to tweak things. Can the two ideals coexist? I think they can. Advanced users are willing to take the time to read the documentation to figure out how to customise weird options. No advanced user would turn down having features made easier and quicker to use in the common case. But don’t take away features that “new users” don’t want or need! They don’t have to be right there “in your face”. They can be hidden behind an Advanced… button, in a command line option, or in a config file. Advanced users will look for the feature and be pleasantly surprised to find it, and users who are afraid of the myriad of options won’t see them.
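
In command line terms, “hidden but not gone” could look like the sketch below: Python’s argparse accepts help=argparse.SUPPRESS, which keeps an option working while leaving it out of --help. The window manager and its option names are made up for illustration.

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser(
        prog="wm", description="A hypothetical window manager.")
    # The common case stays visible in --help.
    parser.add_argument("--theme", default="default",
                        help="colour theme to use")
    # Advanced options still work, but argparse.SUPPRESS keeps them out of
    # --help and the usage line; document them in the man page instead.
    parser.add_argument("--workspaces-per-desktop", type=int, default=1,
                        help=argparse.SUPPRESS)
    parser.add_argument("--bind-script", action="append", metavar="KEY=SCRIPT",
                        help=argparse.SUPPRESS)
    return parser
```

Running wm --help shows only --theme, yet wm --workspaces-per-desktop 4 still works for anyone who goes looking for it.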

I came to Linux because of the choices. Please! Stop limiting them!