Archive for June, 2006

Operating Systems

Saturday, June 3rd, 2006

Recently I've been looking at Inferno and reading a bit about Plan 9 (there seems to be a lot written about Plan 9, and barely anything written about Inferno).  Some of their ideas just really blew me away.  I understood that everything was a file, and I understood that you could export things over 9P/Styx, and I knew that you could build up your own namespace.  What I didn't realise was just how cool this is!

  • Want to use the CD-ROM drive in that machine over there?  Go mount its device locally.
  • Your window manager provides windows as files.  If you move them over the network, you have a remote display.
  • Your text editor exposes its world as a file, so you can control it that way.
  • Want to debug a process remotely?  Just mount /proc over the network and attach a debugger to it.
  • Networking is done via files, so want to use someone else's network?  Go mount their stuff locally!
  • You can mount arbitrary shell commands!  Neat!  This is how ftpfs and friends work.  Why oh why don't other OSes have FTP filesystems?

I can't help but think, though, that something is missing.  I can move files around a network, but I can't move processes.  Inferno has a virtual machine to abstract the hardware away, and its filesystem can abstract everything else away, so why can't I move processes?

I want it so you never "quit" an application.  You can "close" it, but all that ever does is serialise the process to disk.  When you run it again, you just thaw the process and it starts up where it left off.

I want processes to be "checkpointed" the way documents are "autosaved" today.  Power outage?  That's OK: when the machine comes up again, you revive your last checkpoint and continue from where you left off.  If a process crashes, you can restore it to any previous checkpoint.
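
As a rough illustration of the "autosave a process" idea, here is a minimal Python sketch of application-level checkpointing. It assumes the program keeps all of its state in one picklable object, and Checkpointer is a hypothetical helper. This is not real process checkpointing: open files, sockets and threads are not captured.

```python
import os
import pickle
import tempfile


class Checkpointer:
    """Hypothetical helper: keeps numbered checkpoints of a state object in a directory."""

    def __init__(self, directory):
        self.directory = directory
        os.makedirs(directory, exist_ok=True)

    def _path(self, n):
        return os.path.join(self.directory, f"checkpoint-{n:06d}.pickle")

    def checkpoints(self):
        """Return the sorted checkpoint numbers currently on disk."""
        nums = []
        for name in os.listdir(self.directory):
            if name.startswith("checkpoint-") and name.endswith(".pickle"):
                nums.append(int(name[len("checkpoint-"):-len(".pickle")]))
        return sorted(nums)

    def save(self, state):
        """'Autosave': write a new checkpoint and return its number."""
        existing = self.checkpoints()
        n = existing[-1] + 1 if existing else 1
        # Write to a temporary file, then rename it into place, so a power
        # cut mid-write never leaves a half-written checkpoint behind.
        fd, tmp = tempfile.mkstemp(dir=self.directory)
        with os.fdopen(fd, "wb") as f:
            pickle.dump(state, f)
        os.replace(tmp, self._path(n))
        return n

    def restore(self, n=None):
        """Thaw checkpoint n (default: the latest); None if there are none."""
        existing = self.checkpoints()
        if not existing:
            return None
        n = existing[-1] if n is None else n
        with open(self._path(n), "rb") as f:
            return pickle.load(f)
```

On start-up a program would call `restore()` and fall back to a fresh state if it returns None. After a crash it could call `restore(n)` to roll back to an earlier checkpoint.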

I want processes to migrate!  I want to go to work and be able to pull my web browser over to my machine at work and continue running it there.  Sure, networking will have to round-trip via my machine at home, but that's already doable with Plan 9 (see above).

When I shut down my laptop, I want all my running programs to migrate onto my fileserver at home.  There they might be serialised to disk, or they might keep running while I'm away.  My IM client might be designed in two halves: a UI part which migrates onto my phone (via Bluetooth), and the actual program which migrates to my file server (or desktop) and keeps communicating with its UI half over GPRS.

I want this to use my "Internet Drive" for disk storage.  I want to write programs for it in a programming language like the one I described two posts ago.  I want a pony.

The Internet Drive

Saturday, June 3rd, 2006

There has been some discussion about Google/Amazon/Apple/Microsoft/and-everyone-else providing storage on the Internet that you can use to store files and move them between places. Personally, I think there's a much more interesting solution here.

The idea is that every person dedicates a small portion of their disk to a "cache", say 50% of their free space (which would of course change dynamically). When they access a file on the "Internet Drive", their computer locates another computer that has a copy, downloads it and stores it in the local cache, so further accesses to the same file happen at the speed of the local disk. When you modify the file, it is replicated to x other computers elsewhere, so if any one computer fails (or is just "offline"), the file can be restored from another one.  x can be selected on a per-file basis (maybe 1 for temporary files, 3 for "normal" files, 10 for important files).
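
The two numbers in that paragraph, the cache budget and the per-file replica count x, can be sketched in a few lines of Python. The 50% share and the 1/3/10 levels come from the text. Peers are assumed to be identified by string IDs. The placement rule is rendezvous (highest-random-weight) hashing, which is swapped in here as one plausible way to pick "x other computers"; it is not part of the original proposal.

```python
import hashlib

# Replica counts per importance level, taken from the text above.
REPLICAS = {"temporary": 1, "normal": 3, "important": 10}


def cache_budget(free_bytes, share=0.5):
    """Bytes to set aside for the cache: a share of the current free space."""
    return int(free_bytes * share)


def _weight(peer, file_id):
    # Deterministic pseudo-random score for a (peer, file) pair.
    digest = hashlib.sha256(f"{peer}:{file_id}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


def choose_replicas(file_id, importance, peers, self_id):
    """Pick which other peers should hold copies of file_id (rendezvous hashing)."""
    x = REPLICAS[importance]
    candidates = [p for p in peers if p != self_id]
    # Every node ranks the candidates the same way, so anyone can work out
    # where a file's replicas live without asking a central server.
    ranked = sorted(candidates, key=lambda p: _weight(p, file_id), reverse=True)
    return ranked[:x]  # fewer than x if there aren't enough peers
```

In practice the free space could come from `shutil.disk_usage(path).free`. Because the budget is recomputed from that figure, the cache grows and shrinks with the disk as the text suggests.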

Each person gets their own personal "Internet Drive", so when you install the software you have a blank drive where you can store files.  However, you can make a "seed" for any file or directory.  These seeds can be sent to someone else via email, posted on a website, or whatever.  The recipient "plants" the seed in their drive and gets access to that file or directory.  Want to share those photos with your family?  Put them in a directory and send them a seed of the directory.  Want to access your Internet Drive from work?  Just send yourself a seed that you can plant at work.
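
To make "seeds" a little more concrete, here is a Python sketch of one hypothetical seed format: a small text token naming the drive and path it grants access to. Planting it adds an entry to the recipient's namespace. The format, `make_seed` and `plant_seed` are all made up for illustration. A real design would also need signing or encryption so that seeds can't be forged or read in transit.

```python
import base64
import json


def make_seed(drive_id, path):
    """Hypothetical seed: a URL-safe token encoding which drive and path it refers to."""
    payload = json.dumps({"drive": drive_id, "path": path}, sort_keys=True)
    return base64.urlsafe_b64encode(payload.encode()).decode()


def plant_seed(namespace, seed, mount_as):
    """Add the seeded file or directory to the recipient's drive namespace."""
    info = json.loads(base64.urlsafe_b64decode(seed.encode()))
    namespace[mount_as] = (info["drive"], info["path"])
    return namespace
```

Sending yourself a seed for "/" of your own drive would be the "access it from work" case.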

Except in special circumstances, people have a small working set of files that they use, only a few hundred MB.  Of those, they modify an even smaller set.  Many of the larger files are shared between multiple computers and are never modified.  Files that are modified are usually only modified by one person, and that person can't be in two places at once.

If your computer is disconnected from the Internet, you can still continue to work as normal; when you reconnect, the computer will deal with expiring stale files and backing up modified ones.  If a conflict occurs, you punt to the user: let them know there is a conflict and let them deal with it.  Conflicts should be rare enough that nobody needs to worry about this.  Coda showed that this is quite possible.
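
The reconnect step can be sketched as a small decision function in Python. This sketch assumes each cached file remembers the version number it was last synced at and whether it was edited while offline. The field names and the action strings are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class CachedFile:
    synced_version: int     # version number at the last successful sync
    modified_offline: bool  # edited locally while disconnected?


def reconcile(local: CachedFile, remote_version: int) -> str:
    """Decide what to do with one cached file after reconnecting."""
    remote_changed = remote_version > local.synced_version
    if local.modified_offline and remote_changed:
        return "conflict"   # both sides changed: punt to the user
    if local.modified_offline:
        return "upload"     # back up and replicate our change
    if remote_changed:
        return "expire"     # our cached copy is stale; refetch on next access
    return "unchanged"
```

Only the first branch needs a human. Since a file is usually modified by one person in one place, that branch should rarely fire.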

Compilers

Saturday, June 3rd, 2006

I read a presentation by Tim Sweeney about programming language design.  He looks at how game developers use their programming tools and reports some surprising results.  I was quite impressed by it.

I had noticed some similar things around 2000.  I came across a big block comment in some code (ircu) explaining what all the variables were doing in a particularly hairy piece of code, and thought "I wish the compiler could read these comments and make sure the program matched them".  At about the same time, Beep pointed out that I should be using assert(3).

Since then I've decided that assert(3)s are far more powerful than comments.  They document what the code expects, both to the programmer and to the compiler.  They don't get out of date, because the minute they are wrong your program aborts with an (only slightly) useful error message.  A commented-out assert() is a sure sign that something's amiss in your code.
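
Here is a small illustration of asserts doing a comment's job, written in Python rather than C. Python's `assert` is also stripped when the interpreter runs with `-O`, much as NDEBUG disables assert(3). In this binary search, the precondition and the loop invariant are stated as checks that fail loudly the moment they stop being true.

```python
def binary_search(items, target):
    """Return an index of target in the sorted list items, or -1."""
    # Precondition: items is sorted (an O(n) check, fine for a debug build).
    assert all(a <= b for a, b in zip(items, items[1:])), "items must be sorted"
    lo, hi = 0, len(items)
    while lo < hi:
        # Loop invariant: target can only be in items[lo:hi].
        assert target not in items[:lo] and target not in items[hi:]
        mid = (lo + hi) // 2
        if items[mid] < target:
            lo = mid + 1
        elif items[mid] > target:
            hi = mid
        else:
            return mid
    return -1
```

A comment saying "items must be sorted" would silently go stale. The assert instead fires the first time someone passes an unsorted list.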

However, the compiler doesn't make use of these hints.  It doesn't warn me when assert()s conflict, as in assert(x==0); assert(5/x>1); where the second can never hold once the first does (it divides by zero).  It doesn't use assert()s as an optimisation guide.  It doesn't propagate assert()s backwards or forwards through the code.  I can't set up class invariants.  I can't set up assertions as continuous relationships between variables.
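
Without compiler support, class invariants can only be approximated at run time. The following Python sketch wraps `__init__` and every public method so that an `_invariant()` check is asserted after each call. The `invariant` decorator and the `_invariant` hook are hypothetical names. The sketch detects a broken invariant after the fact; it does not roll the change back, and a compiler could do far better statically.

```python
import functools
import inspect


def invariant(cls):
    """Wrap __init__ and every public method so cls._invariant() is asserted afterwards."""
    for name, attr in list(vars(cls).items()):
        if inspect.isfunction(attr) and (name == "__init__" or not name.startswith("_")):
            setattr(cls, name, _check_after(attr))
    return cls


def _check_after(method):
    @functools.wraps(method)
    def wrapper(self, *args, **kwargs):
        result = method(self, *args, **kwargs)
        assert self._invariant(), f"invariant broken by {method.__name__}"
        return result
    return wrapper


@invariant
class Account:
    def __init__(self, balance=0):
        self.balance = balance

    def _invariant(self):
        # The relationship that must hold after every public operation.
        return self.balance >= 0

    def withdraw(self, amount):
        self.balance -= amount
```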

The compiler doesn't let me annotate types with meaningful information.  Almost every pointer passed to a function should be "not null", and the compiler could check this statically.  Global and static variables, thread constructor arguments and any other data structure shared between threads should have a "protectedBy(Mutex)" attribute, and the compiler should enforce it by raising a compile error if a thread accesses the variable without holding the correct lock.  Some variables should have constrained ranges (>1, >0, 0..10, etc.).  Compilers have already shown they can achieve near miracles with type checking.  Why can't we tell the compiler what we're actually trying to achieve?
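
Here is a run-time stand-in for the "protectedBy(Mutex)" attribute, sketched in Python. A guarded value can only be read or written while the current thread holds its lock. Where a compiler could reject bad accesses statically, this sketch asserts at each access. `OwnedLock` and `Guarded` are hypothetical names.

```python
import threading


class OwnedLock:
    """A lock that remembers which thread currently holds it."""

    def __init__(self):
        self._lock = threading.Lock()
        self._owner = None

    def __enter__(self):
        self._lock.acquire()
        self._owner = threading.get_ident()
        return self

    def __exit__(self, *exc):
        self._owner = None
        self._lock.release()
        return False

    def held_by_me(self):
        return self._owner == threading.get_ident()


class Guarded:
    """A value that is protectedBy(lock): every access checks the lock is held."""

    def __init__(self, value, lock):
        self._value = value
        self.lock = lock

    @property
    def value(self):
        assert self.lock.held_by_me(), "read without holding the lock"
        return self._value

    @value.setter
    def value(self, new):
        assert self.lock.held_by_me(), "write without holding the lock"
        self._value = new
```

Usage is `with lock: counter.value += 1`. The same access outside the `with` block fails immediately instead of racing silently.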

And besides, why are we even using integers for all this stuff?  Why don't we have better concepts like foreach/map/reduce/foldl/foldr in our languages?  Why do we have to code for loops with integer indexes all the time, given the number of bugs that introduces?  When we really need an integer, we want either a limited-range type (a month has between 28 and 31 days, so why should the compiler allow any other value?), a "BigNum" integer with practically infinite precision, or a "high speed" integer for use in vector operations.
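
Here is a Python sketch of two of these wishes. `DaysInMonth` is a hypothetical range-limited integer type, checked at run time rather than by a compiler. A year's length is then computed by mapping months to their lengths and folding with `+`, with no index into an array. Python's built-in `int` already has arbitrary precision, which covers the "BigNum" case.

```python
import calendar
import functools
import operator


class DaysInMonth(int):
    """Hypothetical limited-range integer: only 28..31 are valid."""
    LOW, HIGH = 28, 31

    def __new__(cls, value):
        if not cls.LOW <= value <= cls.HIGH:
            raise ValueError(f"{value} is not between {cls.LOW} and {cls.HIGH}")
        return super().__new__(cls, value)


def days_in_year(year):
    # Map each month (1..12) to its length, then fold the lengths with +.
    lengths = map(lambda m: DaysInMonth(calendar.monthrange(year, m)[1]), range(1, 13))
    return functools.reduce(operator.add, lengths, 0)
```

For example, `days_in_year(2006)` is 365 and `days_in_year(2008)` is 366. Constructing `DaysInMonth(32)` raises `ValueError`.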

While I'm here, why on earth do we still have an array datatype?  We should have a "list" datatype, and if the compiler determines the size of the list, and that the list isn't excessively sparse or dynamic, it could implement it as an array.  If we only append to the end of the list and pop off the front, a singly linked list might be appropriate.  If we ask for it to be "sorted", it could be stored as a tree.  If we use "insert", "delete" and "key exists" methods, perhaps it should be implemented as a hash table.  If we use append/pop and ask for it to be sorted, maybe it should be stored as a priority queue.  These are all determinable statically at compile time, so why do we as programmers have to decide what's going on?  Worse yet, in many languages, why do we still have to code these data structures over and over again?
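
The selection rules in that paragraph can be written down as a small decision function. Here it is in Python as a sketch. It assumes the compiler has already collected the set of operations a program performs on a given list. `choose_representation` and the operation names are hypothetical, and the rule order is one reasonable reading of the text.

```python
def choose_representation(ops, sorted_order=False):
    """Pick a concrete data structure from the operations used on a 'list'."""
    ops = set(ops)
    if {"insert", "delete", "key_exists"} <= ops:
        return "hash table"
    if {"append", "pop"} <= ops and sorted_order:
        return "priority queue"
    if sorted_order:
        return "tree"
    if "pop_front" in ops and ops <= {"append", "pop_front"}:
        return "singly linked list"
    # Fixed size, not sparse, not very dynamic: a plain array will do.
    return "array"
```

A real compiler would also need the size and sparsity analysis the text mentions before choosing "array". Here that is simply the default.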

The compiler is our friend.  It's doing us a favour by relieving us of some of the (significant) burden of writing programs.  Why, then, do we keep secrets from it, and in places downright lie to it?  We should involve it more in what our program is doing, and let it join in and proofread our code for us.