Archive for June, 2005

World Wide Chat

Wednesday, June 29th, 2005

Today I was thinking about IRC, and how it’s dead (prompted by a good friend of mine). IRC doesn’t know it’s dead yet, it’s wondering around still thinking it’s alive. But it’s dead inside. Where it matters. IRC is archaic. It predates ISO-10646 (Unicode) which is annoying. But the fact that it predates ISO-8859 (latin-#) is even worse. It’s bizarre case mapping rules, that are even incorrectly specified in the RFC however make it ripe for confusion! It’s centralised in a spanning tree of servers that can be easily attacked. It’s fragmented into little fiefdoms called “Networks”, and has more political intrigue than a byzantine brothel. IRC is extremely vunerable to denial of service attacks. Because of IRC’s centralised nature denial of service attacks on any part of IRC causes major disruption to the entire network. There is only one reason that IRC still exists today and that is because there is no other medium on the Internet that allows for realtime chat amongst multiple participants that is keyed off a name. Some Instant Messanging networks allow for multiuser chats, but the only way to join the chatroom is to be invited by a current participant, as opposed to IRC’s /join #channelname. The fact that users mistakenly confuse “IRC” the protocol and “mIRC” the de facto windows client, shows that it’s not IRC itself that they are attached to, but a well written client that provides a service that they cannot get from any other application. I’d replace IRC with a well written (preferably crossplatform) client that doesn’t use spanning trees, and doesn’t have the concept of a “network”. I’d model it more on the idea of a “URL” or “email address”. You can DoS any part of the email infrastructure. You can DoS peoples mail servers and stop them getting mail. You can DoS peoples mailing list servers and stop everyone on the list getting mail. There are a few innocent casualtys when this occurs, but nothing compared to the number you get on IRC networks. I’d throw in user registration for good measure. People that know me well can probably guess where I’m going with this. My proposal is to replace IRC with Jabber. It has all the requirements, it can support multiuser chat with people only knowing the username, support for newer standards such as UTF8, XHTML, etc, and has a good community behind it. So perhaps someone needs to write a good client that supports Jabber, some kind of good scripting language and is “channel” orientated, not roster orientated. Maybe even in Java. You could use Rhino for scripting.

Intersections of large distributed datasets

Monday, June 6th, 2005

Heres the problem:
You have two (or more) datasets that you want to return the intersection of, but they are only accessible over a slow link. You can’t upload the smaller set to the larger set and have that do the intersection, so what do you do?

Example:
You have a distributed search engine where each node holds a list of urls that contain one term. A search comes in for “Hello World”, and you want to return the results of the Intersection of the urls that contain “Hello”, and the urls that contain “World”.
My solution:
Both lists are sorted in some way (and both are sorted the same way).

  1. NodeA sends NodeB it’s lowest numbered id.
  2. NodeB sends NodeA it’s lowest numbered id that is greater than or equal to NodeA’s id, and greater than any item it has previously sent.
  3. NodeA and NodeB repeat step 2 until there are no more ids.
  4. NodeA and NodeB now both have the intersection of NodeA’s ids and NodeB’s ids (as they’ve both been sent an id that they have too).

This will take at most 2*n operations where n is the size of the intersection. [Edit: hrm? where did I come up with that from. It’s obviously not!]
As an optimisation you could split the list up into “n” divisions, and send the lowest one in each division each time.