Archive for the ‘General’ Category

Building Hitchhikers Guide To The Galaxy

Tuesday, March 17th, 2009

I’m (hopefully) going travelling soon, and I’d like to have ready access to Wikipedia so I can look things up and generally keep up with what I should know while I’m visiting places. So I’ve spent some time trying to figure out how to get data plans for my phone.

This morning I had an epiphany: why not download the Internet before I leave? My Nokia E66 can take a MicroSD card (according to Nokia it only supports cards up to 8GB, although I don’t see any reason why it wouldn’t support a 16GB card).

There are several articles on how to build an offline Wikipedia. I like the idea of having the entire compressed, text-only English version of Wikipedia (~4.5GB), maybe with the compressed OpenStreetMap data (~5.2GB) to provide some geo-location while offline. And for good measure, maybe a compressed Freebase dump (~1.2GB) to provide more links between articles and information on regions (Wikipedia tends to represent a region by a single point, which is not useful).

Hopefully, with a bit of hacking, I can end up with something like Mobilizy’s Wikitude, using the GPS for location and the accelerometers to figure out direction and rotation in 3D space (Android has an electronic compass, which makes this a bit easier for them), and using Freebase + Wikipedia to annotate the current scene.

What would be really cool would be to have a HUD that overlaid Wikipedia articles on whatever you’re currently looking at through your phone. Although I suspect no matter what you do, you’ll end up looking like a tool.

It’s scary that the sum of human knowledge (Wikipedia + OpenStreetMap + Freebase) fits in ~11GB (ignoring indexes and the like), and that I can fairly easily fit this onto my cellphone, with heaps of room to spare!

New Zealand Copyright Amendment

Friday, January 16th, 2009

Some interesting points I’ve not heard anyone mention about the New Zealand Copyright Amendment (IANAL, YMMV, …).

The submission format

92D Requirements for notice of infringement
A notice referred to in section 92C(3) must—
(a) contain the information prescribed by regulations made under this Act; and
(b) be signed by the copyright owner or the copyright owner’s duly authorised agent.

92D Requirements for notice of infringement

I’m unaware of any regulations made under this act so far, so currently you can’t create a notice of infringement that is prescribed by any regulations… yet.

The Submission format II
Although there hasn’t been any discussion about the submission format yet, it concerns me that you obviously need enough information to uniquely identify the copyright infringer (either the person or the account). If an ISP’s business model involves putting customers behind a NAPT box, then a timestamp and IP address are not sufficient to uniquely identify the user; at the very least you need a timestamp and the 5-tuple used (protocol, source address and port, destination address and port). This is particularly concerning given that we are rapidly running out of IPv4 addresses, and one of the suggested solutions is to place as many customers as possible behind a service provider NAT box. Since connections through a NAPT box are far more ephemeral than IP address allocations, timestamps must be more precise and more accurate. Which customer an IP is assigned to is usually stored along with the rest of the accounting information in RADIUS, so an ISP records it essentially for free. Having to record every connection through a NAPT box would incur a serious overhead, and a data management problem, for an ISP. Also, how long should an ISP hold on to this information, so that it can process these notices of infringement, before it can discard it?
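To make the NAPT point concrete, here is a minimal Python sketch. Everything in it is an assumption for illustration: the log record layout, the customer names, and the addresses (taken from the documentation ranges). It is not how any particular ISP stores its data. It shows that a public IP plus a timestamp can match several customers, while the full 5-tuple plus a precise timestamp narrows it down to one.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NatLogEntry:
    """One hypothetical NAPT mapping, valid from `start` to `end` (seconds)."""
    start: float
    end: float
    proto: str
    public_ip: str
    public_port: int
    remote_ip: str
    remote_port: int
    customer: str


# Made-up log: three customers share one public address.
LOG = [
    NatLogEntry(1000.0, 1060.0, "tcp", "203.0.113.7", 40001, "198.51.100.20", 6881, "alice"),
    NatLogEntry(1010.0, 1015.0, "tcp", "203.0.113.7", 40002, "198.51.100.20", 6881, "bob"),
    # The same public port is reused by someone else shortly afterwards.
    NatLogEntry(1070.0, 1090.0, "tcp", "203.0.113.7", 40001, "198.51.100.20", 6881, "carol"),
]


def customers_by_ip_and_time(log, public_ip, when):
    """Everyone who had a mapping on `public_ip` at time `when`."""
    return sorted({e.customer for e in log
                   if e.public_ip == public_ip and e.start <= when <= e.end})


def customers_by_five_tuple(log, proto, public_ip, public_port,
                            remote_ip, remote_port, when):
    """Everyone whose mapping matches the full 5-tuple at time `when`."""
    wanted = (proto, public_ip, public_port, remote_ip, remote_port)
    return sorted({e.customer for e in log
                   if (e.proto, e.public_ip, e.public_port,
                       e.remote_ip, e.remote_port) == wanted
                   and e.start <= when <= e.end})
```

With this log, asking only for "203.0.113.7 at t=1012" matches both alice and bob. Adding the ports picks out one of them, and because port 40001 is reused a few seconds later, a timestamp that is off by a minute would point at carol instead of alice.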

You can only disconnect people.

92A Internet service provider must have policy for terminating accounts of repeat infringers
(1) An Internet service provider must adopt and reasonably implement a policy that provides for termination, in appropriate circumstances, of the account with that Internet service provider of a repeat infringer.
(2) In subsection (1), repeat infringer means a person who repeatedly infringes the copyright in a work by using 1 or more of the Internet services of the Internet service provider to do a restricted act without the consent of the copyright owner.

92A Internet service provider must have policy for terminating accounts of repeat infringers

This leads me to some interesting questions. If Alice is a member of an organisation, the organisation has an account, and Alice repeatedly infringes people’s copyright, then the account that Alice is using is the organisation’s, not Alice’s. Is the organisation (perhaps Alice’s place of work) considered an ISP? In the more obvious case, if Bob sits at an Internet Cafe and infringes people’s copyright, can the Internet Cafe’s account get shut down? If the Internet Cafe buys its bandwidth from LittleIspInc, can LittleIspInc’s account get shut down by their upstream? What should happen if UpstreamInc receives a notice for Bob’s infringement? Obviously it should pass it to LittleIspInc, and LittleIspInc should pass it on to the Internet Cafe, who should terminate Bob’s account. In this case, Bob probably doesn’t even have an account at all. Are Internet Cafes going to require ID so they can check people against lists of previously banned users?

If LittleIspInc gets a series of notifications from UpstreamInc, should LittleIspInc be cut off, even though it’s multiple different downstream customers of LittleIspInc that have been infringing? Should the Internet Cafe get cut off if it has multiple different customers infringe? What if the Internet Cafe places everyone behind NAPT, and the infringement notices aren’t specific enough to identify an individual person?

Fake notices of infringement
While I’m not a lawyer, I’m sure there are laws already about sending fake infringement notices. So anyone who’s doing this maliciously is likely to get themselves into trouble.

False Positives
OK, this one I have seen people talk about at length. There is no incentive for people sending notices of infringement to make sure they aren’t generating false positives. If people are too abusive they will probably end up running into trouble, but as long as they put in a reasonable effort, it seems to me that they are likely to get away with it.

I’ve seen people sent takedown notices for Open Office because some automated tool decided it was actually Microsoft Office (at the time, an unintended compliment I’m sure). I have seen people asked to take their photos down, because someone /else/ had permission to use the photo and was incorrectly believed to be the copyright holder.

Under this law, you appear to have no right of reply, no way to state your case and point out that you are innocent. ISPs don’t appear to have the right to make a judgement as to the quality of the notice of infringement (not that the ISPs want this responsibility).

What’s an ISP?
I can’t find a definition anywhere of what is considered an ISP. Does it include anyone providing IPv4/IPv6 connections? If I run a public IPv4 network that doesn’t connect with the Internet, am I an ISP? If I run a public packet switched network (such as X.25), am I an ISP? Is a disconnected UUCP graph considered an ISP? Is a FidoNet BBS considered an ISP, given that you can send FidoNet files and emails around (even though no one in a FidoNet network need be connected to “The Internet”)? Is the phone system an Internet, given that I can dial anyone and send them data via a modem? Can I call Telecom and get them to disconnect an account for infringing my copyright?

In Summary
I don’t like this law. It seems to have too many problems. It appears that it could force ISPs to use real world IPv4 addresses where their use is unwarranted and impractical, thus hastening the depletion of the IPv4 address pool. I am not a lawyer; I’m trying to interpret this the best I can without any formal law training, but I do know something about the technology from the ISP point of view.

Hostnames, Domain names, and “Official” host names

Tuesday, January 16th, 2007

Due to regular arguments I’ve had with people about this I thought I should document this somewhere for people to see:

Host names
The syntax for a hostname was defined waay back in RFC 952. It’s defined as:

      <hname> ::= <name>*["."<name>]
      <name>  ::= <let>[*[<let-or-digit-or-hyphen>]<let-or-digit>]

(if you prefer as a regex /[a-z]([a-z0-9-]*[a-z0-9])?(\.[a-z]([a-z0-9-]*[a-z0-9])?)*/.)
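As a quick way to try the grammar out, here is a small Python sketch built on the regex above. The function name is made up for illustration, and two choices are my assumptions rather than part of the regex as written: matching is case-insensitive, and the whole string must match (`fullmatch`) rather than just a prefix.

```python
import re

# The regex from the text, compiled case-insensitively (an assumption:
# the pattern as written only lists lower-case letters).
HOSTNAME_RE = re.compile(
    r"[a-z]([a-z0-9-]*[a-z0-9])?(\.[a-z]([a-z0-9-]*[a-z0-9])?)*",
    re.IGNORECASE,
)


def is_rfc952_hostname(name: str) -> bool:
    """Return True if the whole of `name` matches the RFC 952 grammar."""
    return HOSTNAME_RE.fullmatch(name) is not None


print(is_rfc952_hostname("mail.example.com"))       # True
print(is_rfc952_hostname("_sip._tcp.example.com"))  # False: "_" is not allowed
print(is_rfc952_hostname("bad-.example.com"))       # False: label ends in "-"
```

Note that this follows the grammar quoted here literally, so a label starting with a digit is rejected.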

RFC 952 also says a computer has an “official” hostname and zero or more “nicknames” or aliases, but these are discouraged.

Domain names
A domain name is something that’s in DNS. The rules for domain names are much less strict about which characters are allowed. RFC 2782 uses this to avoid collisions with hostnames by requiring that SRV records have a “_” prepended to them, since no valid hostname can contain a “_”.

Implications
Other premises:

  • RFC 821 says in section 3.7:
          Whenever domain names are used in SMTP only the official names are
          used, the use of nicknames or aliases is not allowed.
    
  • RFC 1123 says:
          5.2.5  HELO Command: 
    
             The sender-SMTP MUST ensure that the <domain> parameter in a
             HELO command is a valid principal host domain name for the
             client host.
    
  • RFC 2181 Section 10.3:
                                          [...] Thus, if an alias is used as the value
       of an NS or MX record, no address will be returned with the NS or MX
       value.  This can cause extra queries, and extra network burden, on
       every query.  It is trivial for the DNS administrator to avoid this
       by resolving the alias and placing the canonical name directly in the
       affected record just once when it is updated or installed.
    
  • RFC 2821 Section 4.3.1
       Note: all the greeting-type replies have the official name (the
       fully-qualified primary domain name) of the server host as the first
       word following the reply code.
    

Therefore:

  • You cannot have a “_” in your HELO/EHLO, even though Microsoft Windows machines love to do this. Expect your mail to be dropped.
  • MX’s should point to the “official” (canonical) name of the mail server, which should in theory be the same as the name produced in the 220 banner.

Updated: with Aristotle’s fixed regex. Oops! My bad.

“Or” considered harmful.

Monday, October 30th, 2006

On the weekend we decided that "or" (as used in the English language) is ambiguous and leads to confusion. We decided that instead we should use three terms:

  • "andor" (to mean "are any of these true?")
  • "xor" or "either-or" or "exclusive-or" (to mean "is exactly one of these true?")
  • "ewok" (to mean "which of these are true?")

Some examples:

  • Is it wet outside ewok is it fine? ("It's wet outside"). Is it wet outside andor is it fine? ("Yes"). Is it wet outside xor is it fine? ("Yes").
  • Am I mad ewok have small furry animals invaded this conversation? ("You are both mad, and small furry animals have invaded this conversation"). Am I mad andor have small furry animals invaded this conversation? ("Yes"). Am I mad xor have small furry animals invaded this conversation? ("No", because both are true).
  • Would you like green eggs and ham on a plane ewok on a train? ("Mu": I don't want green eggs on either). Would you like green eggs and ham on a plane andor on a train? ("No"). Would you like green eggs and ham on a plane xor on a train? ("No").
  • Should this be blue ewok green ewok red? ("Green"). Should this be blue andor green andor red? ("Yes"). Should this be blue xor green xor red? ("Yes").
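For the programmers in the audience, the three terms can be sketched as Python functions. This is just one reading of the definitions above. The function signatures are my own invention, and "ewok" here returns the names of the true options, with an empty list standing in for "Mu".

```python
def andor(*claims):
    """Are any of these true?"""
    return any(claims)


def xor(*claims):
    """Is exactly one of these true?"""
    return sum(bool(c) for c in claims) == 1


def ewok(**claims):
    """Which of these are true? Returns the names of the true ones."""
    return [name for name, truth in claims.items() if truth]


# "Is it wet outside <term> is it fine?" on a wet day:
print(ewok(wet=True, fine=False))   # ['wet']
print(andor(True, False))           # True
print(xor(True, False))             # True

# Mad, and small furry animals have invaded the conversation:
print(ewok(mad=True, animals=True))  # ['mad', 'animals']
print(xor(True, True))               # False, because both are true
```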

So, in closing, I think we should stop using "or" in everyday conversation and instead use "andor", "xor" and "ewok", to be more precise about what question we're asking.

Fragmenting IP

Monday, July 17th, 2006

When fragmenting IP packets, you usually split the packet up into "n-1" pMTU sized fragments, and 1 fragment holding the remainder.  Now with wireless networks, the larger a packet is the more likely it is to be lost or corrupted.  I suspect it would make more sense to fragment packets into equal sized pieces, so totlen/n bytes each.  Most routing overhead is per packet, and the total number of packets hasn't changed.  Also this would tend to avoid so many pMTU issues, as your packets are probably going to be smaller than the pMTU.
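Here is a rough Python sketch comparing the two splitting strategies. The function names are made up, and it assumes a 20 byte IPv4 header with no options. One wrinkle the idea has to respect: IPv4 fragment offsets are counted in units of 8 bytes, so every fragment except the last must carry a multiple of 8 bytes, which means "totlen/n" has to be rounded up to the next multiple of 8.

```python
IP_HEADER = 20  # assumption: IPv4 header without options


def _max_payload(pmtu):
    """Largest per-fragment payload that fits the pMTU, in whole 8 byte units."""
    return (pmtu - IP_HEADER) // 8 * 8


def usual_fragments(payload_len, pmtu):
    """n-1 full sized fragments, then one fragment with the remainder."""
    max_payload = _max_payload(pmtu)
    n = -(-payload_len // max_payload)  # ceiling division
    return [max_payload] * (n - 1) + [payload_len - max_payload * (n - 1)]


def equal_fragments(payload_len, pmtu):
    """The same number of fragments, but as close to equal sized as possible."""
    max_payload = _max_payload(pmtu)
    n = -(-payload_len // max_payload)
    per = -(-payload_len // (n * 8)) * 8  # payload_len / n, rounded up to 8
    return [per] * (n - 1) + [payload_len - per * (n - 1)]


print(usual_fragments(4000, 1500))  # [1480, 1480, 1040]
print(equal_fragments(4000, 1500))  # [1336, 1336, 1328]
```

Both strategies produce three fragments for a 4000 byte payload over a 1500 byte pMTU, but the equal split keeps every fragment well under the pMTU.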

Operating Systems

Saturday, June 3rd, 2006

Recently I've been looking at Inferno and reading a bit about Plan 9 (there seems to be a lot written about Plan 9, and barely anything written about Inferno).  Some of their ideas just really blew me away.  I understood that everything was a file, and I understood that you could export things over 9P/Styx, and I knew that you could build up your own namespace.  What I didn't realise was just how cool this is!

  • Want to use the CDROM drive in that machine over there?  Go mount its device locally.
  • Your window manager provides windows as files.  If you move them over the network, then you have remote display.
  • Your text editor provides its world as a file, so you can control it like that.
  • Want to debug a process remotely?  Just mount /proc over the network and attach a debugger to it.
  • Networking is done via files, so want to use someone else's networking?  Go mount their stuff locally!
  • You can mount arbitrary shell commands! Neat!  This is how ftpfs and friends work.  Why oh why don't other OSes have FTPFSes?

I can't help but think, though, that there is something missing.  I can move files around a network, but I can't move processes.  Inferno has a virtual machine to abstract the hardware away, its filesystem can abstract everything else away, but I can't move processes?

I want it so you don't ever "quit" an application, you can "close" it, but all that ever does is serialise the process to disk.  When you run it again you just thaw the process and start up where it left off.

I want processes to be "checkpointed" like you would "autosave" today. Power outage? That's OK, when the machine comes up again, you can revive your last checkpoint and continue from where you left off.   If a process crashes you can restore it back to any previous checkpoint.

I want processes to migrate!  I want to go to work and be able to pull my web browser to my machine at work and continue running it.  Sure, networking will have to round trip via my machine at home, but that's doable already with Plan 9 (see above).

When I shut down my laptop, I want all my running programs to migrate onto my fileserver at home.  There they might be serialised to disk, or they might continue running while I'm away.  My IM client might be designed with two halves, a UI part which migrates onto my phone (via bluetooth) and the actual program which migrates to my file server (or desktop) and continues to communicate with its UI thread over GPRS.

I want this to use my "Internet Drive" for disk storage.  I want to write programs for this in a programming language like I described two posts ago.   I want a pony.

The Internet Drive

Saturday, June 3rd, 2006

There has been some discussion about google/amazon/apple/microsoft/and-everyone-else providing storage on the Internet that you can use to store files and move them around between places. Personally, I think that there's a much more interesting solution here.

The idea is that every person dedicates a small portion of their disk to a "cache", say 50% of their free space (which would dynamically change of course). When they access a file off the "Internet Drive", they locate it off a computer somewhere that has it, and download it and store it in their cache, so that further accesses to the same file happen at the speed of the local disk. When you modify the file it is replicated to x other computers elsewhere, thus if any one computer fails (or is just "offline") then it can be restored from another computer.  x can be selected on a per file basis (maybe 1 for temporary files, 3 for "normal" files, 10 for important files).

Each person gets their own personal "Internet Drive", so when you install the software you have a blank drive where you can store files.  However you can make a "seed" file of another file or directory.  These seed files can be sent to someone else via email, or posted on a website or whatever.  The recipient "plants" the "seed" file in their drive and they get access to that file or directory.   Want to share those photos with your family? Put them in a directory and send them a seed of the directory.   Want to access your Internet Drive from work?  Just send yourself a seed at work that you can plant.

Except in special circumstances, people have a small working set of files that they use, only a few hundred MB.  Of those they modify an even smaller set.  A lot of the larger files are usually shared amongst multiple computers, and are never modified.  Files that are modified are usually only modified by one person, and that person can't be in two places at once.

If your computer is disconnected from the Internet then you can still continue to work as normal; when you reconnect to the Internet the computer will deal with expiring stale files and backing up modified ones.  If a conflict occurs, then you punt to the user: let them know that there is a conflict and let them deal with it. It should be a rare enough occurrence that no one should worry.  Coda showed that this was quite possible.

iPod Nano: Stop Sign

Friday, May 26th, 2006

So I've recently received a second hand iPod nano (2GB) from my brother, who upgraded to a large iPod video. He gave it to me dead flat, so I plugged it into my laptop so it could charge, and after a while it started saying "Do Not Disconnect". Ah, it's registered itself as USB mass storage and thinks it's in use. Fair enough, let's see what Linux thought of it. Linux found two partitions, /dev/sda1 and /dev/sda2. Let's try and mount them. /dev/sda1 failed to mount (the partition type wasn't detected). Fair enough, let's try /dev/sda2. /dev/sda2 mounted OK with some interesting stuff on it. Now what was /dev/sda1? Let's try file(1). Hrm, it doesn't detect it. Let's try less then… wait, what's THAT?

perry@shine:/mnt$ sudo dd if=/dev/sda1 bs=512 count=1 | fold -w 16
{{~~  /-----\
{{~~ /       \
{{~~|         |
{{~~| S T O P |
{{~~|         |
{{~~ \       /
{{~~  \-----/
Copyright(C) 200
1 Apple Computer
, Inc.----------
----------------
----------------
----------------
----------------
----------------
---------------
]ih[@

It appears that the first sector of the first partition has an ASCII art stop sign in it!

Now *THAT* is cool.

Code Smells

Friday, October 21st, 2005

What code smells do you look for in a project? Here are a few I’ve seen recently:

  • Crazy compiler flags, e.g. -O6 when the compiler only goes up to -O3.
  • Warnings during build.
  • Missing ./configure. You can get away with this if your project is small. If it’s more than one .c file you probably want one.
  • Programs that ask you questions about how to compile the program. How am I supposed to automate this for packaging?
  • Fails to build (!)
  • Unnecessary code (particularly casts). This suggests that the programmer doesn’t actually understand what they are doing.
  • Creating your own protocol/fileformat/convention/library for doing things when there are perfectly good systems already in place. A mail system that doesn’t speak SMTP?
  • The directory layout. Does it untar into one large directory? Does the src go in its own directory? Is the src broken up into libraries/plugins in their own directories?
  • Support for only one db (usually mysql). You obviously don’t understand SQL if you can’t write a program that uses
  • No obvious ChangeLog.
  • No publicly available RCS.
  • Random undocumented constants.

Random projects

    Wednesday, September 7th, 2005

    I really don’t like writing blogs about random cruft that isn’t relevant to anyone, so I don’t blog what I saw a dog do to this other dog today, or anything like that. I try and keep this place for my rants (as suggested by the title). But today I feel I’ve accomplished some stuff that I should advertise a little. I probably should do this more often; it at least means I “release” my programs properly.

    Conference bot

    A bot that links together Google Talk users into one public conference room.

    This bot has been used as the basis of many bots for Google Talk.

    TV Renamer

    Guesses the episode numbers of a set of files and renames them to a standard pattern.

    TR

    TR is of course my traceroute mesh program used for visualising multiple paths through the Internet. See also my whois webscript for doing various common lookups and queries.

    World Wide Chat

    Wednesday, June 29th, 2005

    Today I was thinking about IRC, and how it’s dead (prompted by a good friend of mine). IRC doesn’t know it’s dead yet; it’s wandering around still thinking it’s alive. But it’s dead inside. Where it matters.

    IRC is archaic. It predates ISO-10646 (Unicode), which is annoying. But the fact that it predates ISO-8859 (latin-#) is even worse. Its bizarre case-mapping rules, which are even incorrectly specified in the RFC, make it ripe for confusion! It’s centralised in a spanning tree of servers that can be easily attacked. It’s fragmented into little fiefdoms called “Networks”, and has more political intrigue than a byzantine brothel.

    IRC is extremely vulnerable to denial of service attacks. Because of IRC’s centralised nature, a denial of service attack on any part of an IRC network causes major disruption to the entire network.

    There is only one reason that IRC still exists today, and that is because there is no other medium on the Internet that allows for realtime chat amongst multiple participants that is keyed off a name. Some Instant Messaging networks allow for multiuser chats, but the only way to join the chatroom is to be invited by a current participant, as opposed to IRC’s /join #channelname. The fact that users confuse “IRC” the protocol with “mIRC” the de facto Windows client shows that it’s not IRC itself that they are attached to, but a well written client that provides a service that they cannot get from any other application.

    I’d replace IRC with a well written (preferably cross-platform) client that doesn’t use spanning trees, and doesn’t have the concept of a “network”. I’d model it more on the idea of a “URL” or “email address”. You can DoS any part of the email infrastructure. You can DoS people’s mail servers and stop them getting mail. You can DoS people’s mailing list servers and stop everyone on the list getting mail. There are a few innocent casualties when this occurs, but nothing compared to the number you get on IRC networks. I’d throw in user registration for good measure.

    People that know me well can probably guess where I’m going with this. My proposal is to replace IRC with Jabber. It meets all the requirements: it can support multiuser chat with people only knowing the username, it supports newer standards such as UTF-8, XHTML, etc., and it has a good community behind it. So perhaps someone needs to write a good client that supports Jabber, has some kind of good scripting language, and is “channel” orientated, not roster orientated. Maybe even in Java. You could use Rhino for scripting.

    Java’s walled garden

    Wednesday, May 18th, 2005

    One of the things that I realised the other day is that Java is a very closed system. Java has lots of technologies that only work if everyone is using Java: Java RMI, Java serialisation, Java Message Service, etc. It’s hard to get these to interoperate with, say, Python, Perl, Ruby, C or C++. I guess this is one of the reasons that I’ve never been particularly taken with Java development; it’s just too hard to get it to work as part of a team of larger programs.

    Server Name Indication, or how to virtual host SSL.

    Friday, March 25th, 2005

    So after reading chipux’s blog entry on TLS Upgrade in HTTP/1.1 I decided that I should get on and do some coding for Mozilla, and have an attempt at implementing this. It would solve a problem that I’ve had for ages of virtual hosting SSL connections.

    I quickly remembered why I hate state machines, and how complicated HTTP really is, and how complicated SSL is, and that trying to do both together is even more complicated. But then someone pointed out Mozilla bug 116168 (TLS server name indication extension support in NSS). After reading RFC 3546: Transport Layer Security (TLS) Extensions, I decided that it’s probably the better way to go. It allows for virtual hosting of more than just HTTP: SMTP, IMAPS, POPS, LDAPS, etc. The bug for this is Mozilla Bug 116169: Browser support for TLS server name indication. So I scrapped my earlier implementation of TLS Upgrade and started implementing this. It turned out to be very easy, only 20 or so lines of very simple code. The most complicated function is strlen(3). The only problem I had was that ss->url actually contains a hostname, not a URL. Solved.

    Now, for a minor diversion. OpenSSL doesn’t seem to support Server Name Indication, so the usual Apache SSL libraries (which use OpenSSL) can’t support Server Name Indication. But chipux to the rescue again, with his mod_gnutls module for Apache. This module uses GnuTLS instead of OpenSSL for providing SSL/TLS support. And GnuTLS does support Server Name Indication.

    So now I have to test my module, and that involves compiling a more up to date version of apache. Sigh.

    The evils of RFC1918

    Thursday, January 27th, 2005

    We were discussing today people using RFC1918 for operational networks. People suggested that with good filtering on the edge of the network (disallowing packets entering or leaving with RFC1918 source or destination addresses), RFC1918 addresses are perfectly usable. I suggest that this is dangerous.

    Using RFC1918 addresses for routers is especially dangerous. A router that must respond with an error (such as fragmentation required, or TTL exceeded) and that has only RFC1918 addresses can only respond with a packet sourced from an RFC1918 address. This packet will be dropped at the next “site” border. This creates a Path MTU Discovery blackhole, or a traceroute “*” on the graph. The former problem is actually a big one. A Path MTU Discovery blackhole is a major problem on the Internet, and is difficult to diagnose. It presents as weird behaviour where you can get small files over the link, but the connection “freezes” if you get more data over the link. [Update: This can occur if you don’t announce your routers’ IPs into BGP too.]

    There is no way to track down someone who’s using an RFC1918 address short of logging into every router and monitoring its ports to figure out which interface the packet(s) are coming in from, and repeating the procedure further upstream. This makes reporting broken machines leaking RFC1918 addresses onto the public Internet nearly impossible. When the boundary drops are not properly put in place you get all kinds of problems, including 2 to 3% of queries at the F root nameserver being from RFC1918 addresses in 2001.

    People can even multiply the problem by adding RFC1918 addresses to the global DNS database. There are situations where people publish a host that resolves to an RFC1918 address as a low numbered MX. The idea is that the sender won’t be able to connect to the host, so they’ll deliver to a higher MX, which will then attempt to connect to the lower MX and succeed. Obviously no thought has been given to what happens if the sender can reach a machine with that address, or what will happen if it’s running a mail server.

    Other problems occur with RFC1918 that aren’t directly associated with using RFC1918 addresses, such as bogus queries against the root name servers to reverse lookup RFC1918 addresses. Up to 10% of queries against the root name servers are looking for the reverse lookups for RFC1918 addresses.

    I’ve personally experienced sshing into a machine that had an RFC1918 address and ending up on a router somewhere inside someone else’s network. Since then I have beefed up the boundary checks on my network to include reject routes, to stop packets destined for RFC1918 addresses leaving my network space.

    People always swear that they are never going to connect to anyone else’s network, and thus it doesn’t matter that they use non-unique space. This is proven false time and time again, mostly by the use of VPNs but occasionally by the merging of companies or by other means. Sorting out the mess of trying to talk to multiple organisations all using the same RFC1918 address space is a nightmare. Even trying to get transit across someone’s network is fraught with danger, as they are almost certainly using RFC1918 somewhere, so your packets are unlikely to reach the far end of their network.

    I have personally seen situations where someone has brought in a computer with a common RFC1918 address (192.168.1.1) and plugged it into a network, bringing it down, as that was an address also used by the default gateway. This would be avoided if people didn’t always end up using the same addresses. It can be ameliorated by using addresses out of 172.16.0.0/12, which seems to be mostly unused, presumably because the non-multiple-of-8 prefix makes calculating netmasks complicated.

    The major reason we have RFC1918 space is that we just don’t have enough IPv4 addresses to go around. A lot of machines should have real world addresses but don’t, so they are hidden behind NAT and are using RFC1918 addresses because they are the only addresses that can be assured not to be used legitimately elsewhere on the Internet.

    APNIC (and a lot of others) claim that getting an IPv4 range is very easy. I’d be inclined to believe them in the cases they expect you to be in (an ISP or large organisation). I’ve never tried to get IPs from APNIC myself. The only thing I’d want IP addresses for would be the Metanet, which would almost deserve them. However I don’t have any way of providing routing for those IPs at a reasonable price, and I don’t think APNIC will give me addresses that I don’t intend to make globally routable. Hrm, perhaps I’ll ask at NZNOG. Also, while in theory getting IPv4 addresses is free, there are a lot of hidden costs. There is a lot of paperwork to fill out (showing how you plan to use the space). Also, to get more than one range or a large range, you must be a member of APNIC, which is a reasonably pricey proposition for an organisation that doesn’t have the Internet as a primary service.

    RFC1918 space is often given as a “Security Measure”. This is bogus. The security you get from RFC1918 space is from the address space being filtered. If you cannot configure your filters correctly, then you will be left exposed no matter what address space you use. It’s just as likely that people will be able to route into your network with filtered RFC1918 space as with filtered public address space. Most of the problems on modern networks are quite happy to jump over firewalls, and attack RFC1918 address space as well as they attack any other. Most of the time this is due to people bringing in compromised laptops, fetching infected webpages, reading trojaned emails, using weak passwords, or just simply falling for a phishing scam.

    So, in conclusion, RFC1918 is bad. There are lots of bad failure modes, including addresses leaking via IP packets (as sources or destinations), inside packets (in DNS queries), or being used near-maliciously in DNS. The small size of the RFC1918 ranges leads to collisions between people, which can cause all kinds of weird issues. To resolve these problems we need more addresses so we don’t need to rely on private address space.
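For anyone wanting to check addresses against the RFC1918 ranges in a script, here is a small Python sketch using the standard `ipaddress` module. The function names are made up for illustration, and the border check is only a toy version of the filtering discussed above, not a real router configuration.

```python
import ipaddress

# The three private ranges set aside by RFC1918.
RFC1918_NETS = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("172.16.0.0/12"),
    ipaddress.ip_network("192.168.0.0/16"),
]


def is_rfc1918(addr: str) -> bool:
    """True if `addr` falls inside one of the RFC1918 ranges."""
    ip = ipaddress.ip_address(addr)
    return any(ip in net for net in RFC1918_NETS)


def should_drop_at_border(src: str, dst: str) -> bool:
    """Toy border filter: drop anything with an RFC1918 source or destination."""
    return is_rfc1918(src) or is_rfc1918(dst)


print(is_rfc1918("192.168.1.1"))   # True
print(is_rfc1918("172.32.0.1"))    # False, just outside 172.16.0.0/12
print(RFC1918_NETS[1].netmask)     # 255.240.0.0
```

The last line shows the awkward netmask that comes with the /12 prefix, which may be part of why that range sees less use.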

    Bridging the Multicast Last Mile.

    Saturday, January 15th, 2005

    Assuming your ISP has multicast, and even assuming that they deliver multicast to you over DSL or some such, the chances of your computer being able to do multicast are slim. I haven’t seen a NAT box that supported multicast routing. Some support RIP for simple routing, but I’ve never seen any that support multicast routing. I could be blind, there might be billions of little routers out there that do multicast routing, but I’ve never noticed them.

    So if you’re a content producer, you just don’t bother with multicast: nobody has it, nobody wants it, and it’s too difficult to set up and maintain for the benefit you gain (since you don’t gain anything, any setup is considered too difficult).

    Now I’ve not done much research in the multicast world so I can’t speak as authoritatively here as I usually do, but anyway, here’s my idea. Streaming clients and the like should first attempt to connect using multicast, and will almost always fail. Oh well, at least this means if you do have multicast support you don’t need to do anything crazy.

    Step two: If you can’t connect using multicast, you look up a multicast exploder server, preferably via anycast, but perhaps by a configured IP. You open a control channel to them and state which multicast stream you’re interested in; they open up a port and you send a UDP packet to it saying “Yep! Here I am!”. This packet is important as it tells the remote end what IP:Port to send the multicast data to, and that works even if you’re behind NAT. The multicast exploder then sends the multicast stream to you unicast over the “connection” you just made. You have to maintain the connection, as you usually do with multicast, by sending keepalive messages over the control channel.

    Step three: If you can’t connect to a multicast exploder, fall back to using the normal unicast methods.
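
The “Yep! Here I am!” packet in step two is the interesting bit, so here is a rough sketch of it over loopback UDP. Everything here is an assumption made for illustration: the `HELLO` message format, the function names and the fake stream packets are invented, and the control channel and keepalives are left out entirely.

```python
import socket


def exploder_serve_once(sock, packets):
    """Wait for one hello datagram, then unicast packets back to its sender.

    The source address seen by recvfrom() is whatever the NAT box mapped the
    client to, so replying to it gets back through the NAT.
    """
    hello, client_addr = sock.recvfrom(2048)
    for packet in packets:
        sock.sendto(packet, client_addr)
    return hello, client_addr


def demo():
    """Run the client and the exploder in one process over 127.0.0.1."""
    exploder = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    client = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        exploder.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
        exploder.settimeout(2.0)
        client.settimeout(2.0)
        # The hello packet; its payload format is made up for this sketch.
        client.sendto(b"HELLO 239.1.2.3:5004", exploder.getsockname())
        hello, client_addr = exploder_serve_once(
            exploder, [b"frame-1", b"frame-2"]
        )
        received = [client.recvfrom(2048)[0] for _ in range(2)]
        return hello, client_addr, client.getsockname()[1], received
    finally:
        exploder.close()
        client.close()
```

The client never tells the exploder its own address; the exploder learns it from the packet, which is the same trick Teredo relies on.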

    This makes it easy for people who are isolated from the multicast segments of the Internet to easily join them and communicate. It pushes the “Explosion” of traffic out further towards the edges of the Internet (although not as far as multicast itself does). It’s based on similar principles to Teredo and 6to4 which I’ve discussed before, and it should be nearly trivial to implement. So now I guess I have to implement it and convince the IETF to RFC it or something. But first my auth server (see my previous post)

    Update: Woot! Someone’s been working on it already

    Authentication on the World Wide Web

    Thursday, January 13th, 2005

    I’m sick of typing usernames and passwords into webpages all across the Internet. Like most people, I use the same password for most “low security” sites. I use my browser to remember passwords for me so I don’t have to. I’m lazy. But the thing that irritates me more than having to log in to all these different sites is the fact that I have to implement new authentication systems all the time. Create a new website? Write yet another authentication system.

    So I thought I should do my own Single Sign On system. My goals:

    • It should be secure (duh!).
    • You should be able to run your own SSO server.
    • It should only do authentication. It says “this user is l33tkiddie” but doesn’t tell you l33tkiddie’s email address, credit card information or even name. As a website it’s up to you to collect their details. I’m just doing authentication.
    • You have the right not to be authenticated without your explicit permission. (Although you may waive this right.)
    • Be simple enough that anyone can use any server. You don’t need to pay $10,000US per year for permission to use it. It’s an open protocol, anyone can add support for SSO, anyone can run their own SSO server for their own users.
    • Authentication servers are not tied down to username/passwords, they are free to authenticate you via other methods (client side SSL certs would be one example).
    • You don’t need to trust the site that’s authenticating you, i.e. you don’t ever give them your password.
    • Make phishing attacks hard
    • You can have multiple identities. eg, Isomer the Undernet Oper, Perry the programmer, and l33tkiddie the warez courier.

    Things I’d like (but not so sure about at the moment)

    • Sites not being able to aggregate user data between themselves. Each site gets a userid token which represents you, and each site gets a different token. Two sites can’t figure out you’re the same person without outside help
    • The ability for sites to correspond with you without your email address, and for you to know which site sent you the message and be able to revoke a sites ability to contact you at any time.
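
One way the per-site userid token might work, as a sketch only: the auth server derives each token with an HMAC over the user and the site, keyed with a secret only it knows. The use of HMAC-SHA256 and the name `site_token` are my assumptions here, not part of the design above.

```python
import hashlib
import hmac


def site_token(server_secret, user, site):
    """Derive a stable, site-specific token for a user (hypothetical scheme).

    server_secret is bytes known only to the auth server. Without it, two
    sites cannot tell that their tokens refer to the same person.
    """
    message = f"{user}\n{site}".encode("utf-8")
    return hmac.new(server_secret, message, hashlib.sha256).hexdigest()
```

The same user always gets the same token at a given site, so the site can still recognise returning users without ever learning who they are elsewhere.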

    Things I’m relying on:

    • DNS is secure (ha!)
    • SSL is secure (heh)
    • Users aren’t terminally stupid. (bwahahaha)

    So my idea of how you’d implement this is as follows:

    • You go to sitea.example.com and it asks you for your username. You type in perry@secure.meta.net.nz.
    • sitea.example.com sends a redirect to https://secure.meta.net.nz/auth?user=perry@secure.meta.net.nz&return=http://sitea.example.com/login&cookie1= where cookie1 is sha1(user:secret:time) and secret is just some secret that sitea knows.
    • secure.meta.net.nz authenticates the user, and sends a redirect back to http://sitea.example.com/login?user=perry@secure.meta.net.nz&cookie2= where cookie2 is sign(hash(user, cookie1, return)), signed with secure.meta.net.nz’s private key. sitea verifies the signature by fetching https://secure.meta.net.nz/public.key (and caching it).
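
Sitea's half of the first redirect could look something like this. It is a sketch under assumptions: the function names are invented, the auth server is taken to be whatever follows the `@` in the username, and the signing of cookie2 is not shown because Python's standard library has no public key signing.

```python
import hashlib
import time
from urllib.parse import urlencode


def make_cookie1(user, secret, timestamp):
    """cookie1 = sha1(user:secret:time), hex encoded."""
    material = f"{user}:{secret}:{timestamp}"
    return hashlib.sha1(material.encode("utf-8")).hexdigest()


def auth_redirect(user, secret, return_url, now=None):
    """Build the URL sitea redirects the browser to (hypothetical helper)."""
    timestamp = int(time.time()) if now is None else int(now)
    # Assumption: the part after '@' names the user's auth server.
    server = user.rsplit("@", 1)[1]
    query = urlencode({
        "user": user,
        "return": return_url,
        "cookie1": make_cookie1(user, secret, timestamp),
    })
    return f"https://{server}/auth?{query}"
```

Note that `urlencode` percent-escapes the `@` and the return URL, which the hand-written URLs above gloss over.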

    So, can anyone see any fundamental flaws in this? This should be safe against MitM attacks (I think). Phishing attacks can be thwarted by showing a different style of login screen based on the username and a secret known to secure.meta.net.nz. Unless the phishing site knows secure.meta.net.nz’s secret, they can’t replicate the correct login screen, and thus the phishing attack is avoided.
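
A minimal sketch of the per-user login screen, assuming the style is picked from the bytes of an HMAC over the username keyed with the server's secret. The palettes, layouts and fonts listed are placeholders, and `login_style` is an invented name.

```python
import hashlib
import hmac

# Placeholder choices; a real server would want far more variety than this.
COLOURS = ["red", "green", "blue", "orange", "purple", "teal"]
LAYOUTS = ["form-left", "form-right", "form-centre"]
FONTS = ["serif", "sans-serif", "monospace"]


def login_style(server_secret, username):
    """Pick a login page style that only the holder of server_secret can predict."""
    digest = hmac.new(
        server_secret, username.encode("utf-8"), hashlib.sha256
    ).digest()
    return {
        "colour": COLOURS[digest[0] % len(COLOURS)],
        "layout": LAYOUTS[digest[1] % len(LAYOUTS)],
        "font": FONTS[digest[2] % len(FONTS)],
    }
```

With only a handful of options a phisher could simply guess, so the number of distinguishable styles matters as much as the secret does.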

    Anyway, I’d like to hear people’s opinions. I’m going to try and code it up. The biggest problem at the moment is that things like PHP don’t seem to have public key functions.

    Update: Some notes from various people:

    • The antiphishing stuff would provide a different page to every user based on their username. This could change the colours, the layout of the page, the wording, the font (!), icons around the page etc.
    • Authentication doesn’t necessarily require a password. You could show people 10 pages, each with a 10×10 grid of images. The user selects the right image on each page; this gives 100**10 (that is, 10^20, or roughly 66 bits) different combinations and perhaps would have more entropy than passwords (since people are unlikely to use their birthdate).
    • Authentication could also use SSL client certificates from CACert or something.
    • I forgot to send the timestamp back to sitea; this prevents replay attacks.
    • I forgot to send the original cookie1 back to sitea unmodified, so that sitea doesn’t require any extra storage (like syncookies).
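
The last two notes mean sitea can check a returning login without storing anything: it recomputes cookie1 from the user, its own secret and the returned timestamp, and rejects anything too old. A sketch, with an assumed five minute window and invented function names; checking cookie2's signature is a separate step not shown here.

```python
import hashlib
import hmac


def make_cookie1(user, secret, timestamp):
    """cookie1 = sha1(user:secret:time), hex encoded."""
    material = f"{user}:{secret}:{timestamp}"
    return hashlib.sha1(material.encode("utf-8")).hexdigest()


def check_return(user, secret, timestamp, cookie1, now, max_age=300):
    """Statelessly validate the cookie1 and timestamp that came back."""
    # Reject stale (or future-dated) timestamps to limit replay.
    if not 0 <= now - timestamp <= max_age:
        return False
    expected = make_cookie1(user, secret, timestamp)
    # Constant-time comparison, on bytes so odd input cannot raise.
    return hmac.compare_digest(expected.encode("utf-8"), cookie1.encode("utf-8"))
```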

     

    Dynamic languages and C

    Tuesday, December 21st, 2004

    I’ve been tinkering with an idea for a while. Given an API specification, it’s fairly easy to write glue code so that you can call from your favourite dynamic language into C. Swig is a program that, given an API, will generate all the glue code you need to make that library available in a variety of languages (python, perl, java, ruby, tcl, guile and heaps of others). The problem is you have to sit down and write this glue yourself. But do you? When compiled with debugging information, C libraries contain information about the types of all the symbols in the file. C++ is even easier: the name mangling standards provide all the type information you could ever want.

    So, every so often I tinker around with writing a python library that will use Python’s import hooks to let you import arbitrary C libraries and call through transparently into the functions contained within. There are however several stumbling blocks.

    First is that calling an arbitrary C function is difficult. Oh sure, dlsym(3) can return a pointer to whatever symbol you care to name, but you still need a way of passing it arbitrary parameters. gcc has __builtin_apply() but that seems uh quirky at best, and isn’t really designed with this in mind. Then, after that, I’ve got to write something to parse .so files and read the stabs debugging information. Now I’m planning on doing this in python, calling through to libbfd (for which I can manually provide the symbol table and type information).
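
For comparison, this is what the manual version looks like with `ctypes`, which is not the approach described above but does solve the "call a dlsym'd pointer with arbitrary parameters" half of the problem. The sketch assumes a POSIX system; `c_strlen` is just an illustrative wrapper. Notice that the argument and return types still have to be written out by hand, which is the step debugging information could in principle fill in.

```python
import ctypes
import ctypes.util

# find_library may return None; on POSIX, CDLL(None) then exposes the symbols
# already loaded into the running process, which include the C library.
libc = ctypes.CDLL(ctypes.util.find_library("c"))

# Look up the symbol (much like dlsym) and declare its signature by hand:
# size_t strlen(const char *s);
strlen = libc.strlen
strlen.argtypes = [ctypes.c_char_p]
strlen.restype = ctypes.c_size_t


def c_strlen(data):
    """Call the C library's strlen on a bytes object."""
    return strlen(data)
```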

    Anyway, just my thoughts. Thought I should document them somewhere as I don’t think I’ve mentioned them here anywhere.

    IPv6 and the Internet (again!)

    Sunday, December 12th, 2004

    This seems to be a regular topic for me, oh well. Anyway, I’ve not mentioned Teredo here before so I thought I really should mention it. Teredo is a nifty protocol that lets you get a realworld IPv6 address on a machine that is behind most common forms of NAT. Teredo means that even if you are behind a NAT box that doesn’t easily support UPnP, or where manual configuration is too hard or too dynamic to be easily done, you can get a realworld address (even if it is IPv6 only) and use that.

    So why would you want a realworld IP address? Well, the major reasons are VoIP and P2P. If you have a realworld IP address then people can call you, or you can more actively participate in a modern P2P network (where more active participation is usually rewarded with faster download speeds).

    Teredo is trivial to set up for the gazillions of Microsoft clients out there: all you need to do is type two commands at the command line (documented on the wlug wiki). From a programmer’s point of view it’s easy to use: all you need is to support IPv6. No platform-specific APIs or anything; practically every machine worth mentioning supports IPv6 these days, and the code isn’t hard to modify from IPv4-only to IPv6-capable.
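
To give a feel for how small the IPv4-only to IPv6-capable change can be, here is a sketch of a client connect routine that stops hardcoding `AF_INET` and instead asks `getaddrinfo` for every address family. The function names are invented for illustration.

```python
import socket


def candidate_addresses(host, port):
    """Return (family, sockaddr) pairs for host, covering IPv4 and IPv6 alike."""
    infos = socket.getaddrinfo(
        host, port, family=socket.AF_UNSPEC, type=socket.SOCK_STREAM
    )
    return [(family, sockaddr) for family, _type, _proto, _canon, sockaddr in infos]


def connect_any(host, port, timeout=5.0):
    """Try each candidate address in turn and return the first connected socket."""
    last_error = None
    for family, sockaddr in candidate_addresses(host, port):
        sock = None
        try:
            sock = socket.socket(family, socket.SOCK_STREAM)
            sock.settimeout(timeout)
            sock.connect(sockaddr)
            return sock
        except OSError as exc:
            last_error = exc
            if sock is not None:
                sock.close()
    raise last_error or OSError("no addresses found")
```

The rest of the program keeps using the returned socket exactly as before; only the address lookup and socket creation needed to change.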

    So why aren’t people flocking all over Teredo and 6to4? Coz the clients don’t support it. I can’t fathom why they don’t support it; it should be a trivial thing for them to do and would bring huge advantages to them. If even Bittorrent supported IPv6 fully then there would be a way forward for people who are stuck behind NAT, and there’s a possibility that >30% of the Internet traffic would be IPv6 today. That’s enough traffic for people to sit up and take notice and start getting serious about their IPv6 deployments. No more horrible tunnel brokers with uptime that makes a mole look positively high flying. No more routes crisscrossing the Pacific (and sometimes the Atlantic too).

    Bittorrent seems to support IPv6 reasonably easily in the protocol, and the support is in some of the clients, but the trackers don’t support IPv6. And without the trackers to tell the other clients about IPv6 clients there is no IPv6 swarm. Sigh.

    So one of my goals this summer is to get at least one tracker up and going with IPv6 support so we can experiment with Teredo and discover what the problems are, and so we can shine a torch out of the maze of limited IPv4 addresses and into the wide open expanse of IPv6 with no naive NAT gatekeepers.

    The (un?)reliability of the Internet

    Tuesday, October 26th, 2004

    I’ve been tinkering with some of the tools I’ve been working on. I have my Traceroute Mesh tool which I wrote ages ago. With some help from AJ from Maxnet, I got a BGP feed, and started doing some more interesting things. I wrote a simple Whois Interface, with syntax colour highlighting, and information live from BGP. It’s kinda useful to see what prefixes are announced by an AS, and what AS announces what prefix, as well as looking up abuse information. I had fun writing this, especially implementing the leaky bucket rate limiting code.
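
For anyone who hasn't met it, a leaky bucket rate limiter can be sketched in a few lines. This is not the code from the whois tool, just a generic illustration; the class name, the parameters and the choice to pass the clock in as `now` are all mine.

```python
class LeakyBucket:
    """Allow bursts up to `capacity`, draining at `leak_rate` units per second."""

    def __init__(self, capacity, leak_rate):
        self.capacity = float(capacity)
        self.leak_rate = float(leak_rate)
        self.level = 0.0   # how full the bucket currently is
        self.last = None   # time of the previous request

    def allow(self, now, cost=1.0):
        """Return True if a request arriving at time `now` (seconds) may proceed."""
        if self.last is not None:
            elapsed = max(0.0, now - self.last)
            # Drain whatever leaked out since the previous request.
            self.level = max(0.0, self.level - elapsed * self.leak_rate)
        self.last = now
        if self.level + cost > self.capacity:
            return False   # bucket would overflow: reject
        self.level += cost
        return True
```

Typical use would be one bucket per client IP, calling `bucket.allow(time.time())` before answering each query.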

    I also wrote a tool that takes a destination prefix/AS, and shows all the paths I’ve ever seen to that prefix/AS (astr). This can be used when the Internet is “broken” to figure out how the Internet actually works when it’s working, or to look at historic data. Yay. However it gave me an idea: looking at almost any prefix, it shows multiple paths. So how unstable is the Internet?

    Well, the answer seems to be “very” unstable. For each prefix I measured the longest time it had between route changes, and plotted this. This shows for instance that 50% of prefixes change more frequently than once every 2 days.
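
The measurement itself is simple enough to sketch. This assumes the BGP feed has already been boiled down to (prefix, timestamp) pairs, one per observed route change; the function names and the input format are assumptions, not the original tool.

```python
from collections import defaultdict


def longest_stable_period(events):
    """Map each prefix to the longest gap (seconds) between its route changes.

    events: iterable of (prefix, timestamp_seconds) pairs. Prefixes with
    fewer than two changes have no gap and are left out.
    """
    times = defaultdict(list)
    for prefix, timestamp in events:
        times[prefix].append(timestamp)
    longest = {}
    for prefix, stamps in times.items():
        stamps.sort()
        gaps = [later - earlier for earlier, later in zip(stamps, stamps[1:])]
        if gaps:
            longest[prefix] = max(gaps)
    return longest


def fraction_below(longest, threshold):
    """Fraction of prefixes whose longest stable period is under threshold."""
    if not longest:
        return 0.0
    return sum(1 for gap in longest.values() if gap < threshold) / len(longest)
```

`fraction_below(longest, 2 * 86400)` is then the "changes more often than once every 2 days" figure.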

    Why an interpreted language isn’t for us.

    Wednesday, September 29th, 2004

    So I’ve been making several postings recently about why I think that Mono and Java are the wrong choice for the open source community. I believe we should work towards a stable and portable ABI that all languages can adhere to and can use to form a common base. I know that I’d love to be able to write code in python that can call c++, c, perl and ruby functions as easily as it can call a python function. So, a short summary of the articles: