2012-12-05

Writing a stream to a zipfile in Python, harder than you think!

So here's the problem: you have a stream (a file-like object) in Python and you want to spool its contents into a zip archive. Sounds like a common requirement? It turns out to be surprisingly hard. I propose a solution here using hooks.

There are two methods for writing data to a zip file in the Python zipfile module.

ZipFile.write(filename[, arcname[, compress_type]])

and

ZipFile.writestr(zinfo_or_arcname, bytes[, compress_type])

The first takes the name of a file, opens it and spools the contents into the archive in 8K chunks. Sounds like a good fit for what I want, except that I have a file-like object, not a file name, and ZipFile.write won't accept that. I could create a temporary file on disk, write my data to that and then pass the name of the temporary file instead, but that supposes (a) that I have access to the file system for writing and (b) that I don't mind spooling the data twice, once to the disk and once back out again for storage in the archive.

Before you protest, the ZipFile object only requires a file-like object with support for seek and tell; it doesn't actually have to be a file in the file system, so (a) is still a valid scenario. We will have to ditch any clever ideas of spooling a zip file directly over network connections though. A closer look at the implementation shows that once the data has been compressed and written out to the archive, the stream is wound back to the archive entry's header to update the information about the compressed and uncompressed sizes. Still, even if you are buffering the output, at least you are dealing with the smaller compressed data and not the original uncompressed source.

So if ZipFile.write doesn't work for streams, what about using ZipFile.writestr instead? This takes the data as a string of bytes (in memory). For larger files this is unlikely to be practicable. I did wonder about tricking this method with a string-like object, but even if I could do that the method would still attempt to create an ordinary string containing the entire compressed data, which won't work for large streams.

Solution 1

The first solution is taken from a suggestion on StackOverflow. The idea is to wrap the ZipFile object and write a new method. Clearly that would be something good for the module maintainers to consider but it requires considerable copying of code. If I'm going to be so dependent on the internals of the ZipFile object implementation I might as well look to see if there is a better way.

Solution 2

Looking at the ZipFile implementation the write method is clearly very close to what I want to do. If only it would accept a file-like object! A closer look reveals that it only does two things with the passed filename. It calls os.stat and then, shortly afterwards, calls open to get a file-like object.

This got me thinking about whether or not I could trick the write method into accepting something other than the name of a file. I created an object (which I called a VirtualFilePath) and gave it stat and open methods. The implementation is not important; this object essentially wraps my file-like object, simulating these two operating system functions.
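
For illustration, here is a minimal sketch of what such a wrapper might look like; the details are invented for this example and only the open and stat methods matter:

import os, stat, time

class VirtualFilePath(object):
    """Sketch: wraps a file-like object so it can pose as a named file."""

    def __init__(self, stream, size=0):
        self.stream = stream
        self.size = size    # may be unknown, see below

    def open(self, *params):
        # ZipFile.write only ever opens the 'file' for reading
        return self.stream

    def stat(self):
        # fake an os.stat result; only st_mode, st_size and st_mtime matter
        now = int(time.time())
        return os.stat_result(
            (stat.S_IFREG | 0644, 0, 0, 1, 0, 0, self.size, now, now, now))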

Unfortunately, I can't pass a VirtualFilePath to the operating system's open function; I'll get an error that it wasn't expecting an instance. The same goes for os.stat. However, I can write hooks to intercept these calls and redirect them to my methods if the argument is a VirtualFilePath. This is basically what my solution looks like:

import os, __builtin__

# keep references to the real implementations so we can delegate to them
stat_pass = os.stat
open_pass = __builtin__.open

def stat_hook(path):
    if isinstance(path, VirtualFilePath):
        return path.stat()
    else:
        return stat_pass(path)

def open_hook(path, *params):
    if isinstance(path, VirtualFilePath):
        return path.open(*params)
    else:
        return open_pass(path, *params)

class ZipHooks(object):
    hookCount = 0

    def __init__(self):
        # only install the hooks once, however many ZipHooks objects exist
        if not ZipHooks.hookCount:
            os.stat = stat_hook
            __builtin__.open = open_hook
        ZipHooks.hookCount += 1

    def __enter__(self):
        return self

    def __exit__(self, type, value, traceback):
        self.Unhook()

    def Unhook(self):
        ZipHooks.hookCount -= 1
        if not ZipHooks.hookCount:
            # last one out restores the original functions
            os.stat = stat_pass
            __builtin__.open = open_pass

This code adds hooks which detect my VirtualFilePath object when it is passed to open or stat and redirect those calls. To make it easier to manage the hooks we create a ZipHooks object with __enter__ and __exit__ methods, allowing it to be used in a 'with' statement like this:

with ZipHooks() as zh:
    # add stuff to an archive using VirtualFilePath here

There's one final detail to clear up: stat is supposed to return the size of the file, but what if I don't know it because I'm reading data from a stream? In fact, closer inspection of the ZipFile.write method's implementation shows that it doesn't really rely on the size returned by stat, as it monitors both compressed and uncompressed sizes itself and re-stuffs the header when it back-tracks.

The only other bits of stat that ZipFile.write is interested in are the modification date of the file and the mode (which it uses to determine whether the file is really a directory). So if your file-like object isn't very file-like at all it won't matter too much, because you only have to fake these fields in the stat result.
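
Putting it all together, here is roughly how the pieces might be used. This is just a sketch assuming the VirtualFilePath above, and arcname is passed explicitly so that ZipFile never has to treat the wrapper as a path string:

import io
import zipfile

src = io.BytesIO(b"hello, zip world")    # stands in for any file-like object
with ZipHooks():
    zf = zipfile.ZipFile("out.zip", "w", zipfile.ZIP_DEFLATED)
    zf.write(VirtualFilePath(src), arcname="hello.txt")
    zf.close()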

2012-08-14

Thomson Routers from plusnet: problems using Gmail over wifi?

Summary

If you came to this page because of a Google search for this problem then here is my advice in a nutshell:

  1. Turn off the "Web Browsing Interception" feature in your router by setting it to "disabled".
  2. Set your wifi to b/g operation only to reduce data speeds - you probably don't need more than 54Mb/s if you are on ADSL
  3. Power the router off, count to 30 and power it back on.
  4. If the problem returns, repeat step 3.
  5. If the problem comes back frequently consider using a cheap timer plug to power cycle the router automatically in the middle of the night.

Oh, while you're here, could you just look at my PC?

Everyone who works in any job that is vaguely computer related will know that sometimes, when visiting a friend or relation's house the conversation will come around to some little problem they are having with their PC or home network. I had just such an experience at the weekend. I would normally have attempted to post the technical workaround to the plusnet forum where the particular issue I was debugging is being discussed but you have to be a plusnet customer to use the forums and I'm just a visiting techie so I'm posting what I know here.

The Symptoms

The scenario is a simple home network with a Thomson TG585 v8 ADSL router supplied by plusnet and an ordinary Dell laptop. I actually got a report of this problem earlier in the year. The main symptom was that email with attachments couldn't be sent. As the laptop owner spoke to me on a cordless phone I could hear some interference on the line each time the laptop tried to send the email.

The First Diagnosis

This network is in a fairly isolated spot with only one other house within wifi range, so interference from other wifi networks seemed unlikely. However, the house does have a cordless phone with a signal booster to reach the outbuildings. As is often the case with wifi issues you have to suspect interference; some older cordless phones are simply unusable alongside 2.4GHz routers. The obvious test is to instruct the user to plug their laptop directly into one of the ethernet ports on the back of the router. Sure enough, the problem went away.

So is wifi really incompatible with this make of phone? The phone is of a modern design and is supposed to be compatible with wifi but perhaps that booster is the problem? To make the cable-based workaround easier the router was moved nearer the laptop. This took it further away from the phone and the booster - I had hoped that this might improve wifi connectivity enough for attachments to work but the problem remained. Needless to say, next time I visited the house the subject came up!

The problem gets worse

I must confess, I had forgotten about the problem completely until I arrived at the house and attempted to send an email via their wifi network from my own laptop. I was able to receive all my email no problem, web browsing was fast but I could not send messages through either my work email or through my Gmail account. Could this be related to the problem with sending attachments? Here is someone asking a similar question: Gmail, Windows Live Mail and Plusnet problems?

The problem appears so black-and-white I began to suspect that it must be a block at the ISP until I read this thread on plusnet's community forum: Mail cannot send attachments. The thread ends with an answer from a plusnet employee but the original problem from someone using Apple Mail (as I do) goes undiagnosed.

Eventually one of my smaller email messages managed to sneak out of my laptop at the Nth time of trying. Intermittent problems may be the worst to diagnose but at least this showed me that it is just a really bad connectivity problem.

I'm not alone

At this point I went back to my first diagnosis. I tried every channel, I powered off every other wireless device I could find (including the phones) and I still had the same trouble. This is beginning to look like a bug in the router. I checked the firmware version (8.2.7.8) and found a new thread that matched my symptoms: Thomson TG585V8 poor upload speed over wifi and then the more general: Wireless connectivity on the Thomson V8. The second of these has 13 pages of discussion and brings the problem right up to date: but with no resolution!

The solution

To recap, we have a router that is perfectly capable of sending (downloading) data over wifi at high speed but is very bad at receiving it during uploads. I started suspecting some type of buffer overflow or error in one of the protocol stacks but restricting it to b or b/g operation didn't fix the issue either. I then started looking at other tools and settings in the router and came upon "Web Browsing Interception". I have no idea what this means but it was set to automatic. I found one post on a DSL forum which was enough to raise suspicion: Web Browsing Interception?. I disabled it and applied the settings, almost immediately my email started going through as normal.

The same feature is cited in this thread relating to different Thomson hardware (which I found later): "Solved: Thomson Speedtouch Modem, disconnects often and slows down browsing".

So did this solve the problem with attachments? No. At least not directly. Turning this setting off stopped the router interfering with mail clients making outbound connections to Gmail and Microsoft mail products. But I was still getting larger attachments stuck at about the 100Kb mark. It may be that all I have done is free up enough resource (RAM or CPU) to send small messages by turning off a feature of the router software. In which case, it seems likely that a resource leak of some kind is to blame in these routers. To test the theory I powered the router on and off, waited ages for it to restart (it really is slow) and then tried again. Everything worked fine - all attachments sending without error, even large ones.

So is this Web Browsing Interception feature to blame? Possibly, the router has been powered on and off before without fixing the problem so it seems likely that this feature is either causing or exacerbating a resource leak in the unit.

2012-07-08

The demise of iGoogle: is this the beginning of the end for widgets?

What's happening to iGoogle? - Web Search Help:

So I woke up this morning and found a warning on my home page.  iGoogle is going away it calmly announced.  November 2013, but earlier than that if you are using a 'mobile' device.

So that means my home page is going away.  I can finally give up on the idea that Google will come up with a decent iGoogle experience on the iPad, or even on my Android phone.  I may have just upgraded to a slightly larger 15" MacBook but it is still portable - it must be, I've just carried it from the UK to New Orleans ready for OSCELOT and Blackboard's annual 'world' conference.  In fact, thanks to United Airlines' upgrade programme I was even able to plug it in and use it for most of the flight.


So what am I going to miss about my home page?  I'll miss the "Lego men" theme but I can probably live without Expedia reminders which count me down to my next trip.  In fact, if they would only just say something more appropriate than "Enjoy your holiday" I might miss that gadget a bit more.  I have quick access to Google Reader but I increasingly find myself reading blogs on my phone these days - it just seems like the right thing to do on the tube (London's underground railway).  I'm not sure about Google bookmarks - I assume they're staying so I might have to make them my home page instead.  Finally, I'm not sure I've ever actually chatted to anyone through iGoogle.

So I'm over iGoogle, not as easily as Bookmark lists but nothing I can't handle.

Is this the end of widgets?


iGoogle was one of the more interesting widget platforms when it launched.  You can write your own widgets, get them published to their gadget list so that other users can download them and install them on their iGoogle page.  They're small, simpler than full-blown computer applications on the desktop, simpler and smaller than complete websites.  iGoogle is a platform which reduces the barrier to entry for all sorts of cool little apps.  It is particularly good for apps that allow you to access information stored on the web, especially if it is accessible by JSON or similar web services.

You may notice a strong resemblance to mobile apps.  Google certainly have and this is the main reason why iGoogle is going away.  It is no longer the right platform.  People with 15" screens organizing lots of widgets on their browser-based home page are an anachronism.  These days people organize their apps on 'home screens', flipping between them with gestures.  They don't need another platform.  Apple have already seen this coming, in fact, they are having a significant influence on the future.  There is already convergence in newer versions of Mac OS.

There's a lot of engineering involved in persuading browsers to act as a software platform.  The browser doesn't do it all for you (and plugins do not seem like the solution because they are browser specific).  There are a number of widget toolkits available for would-be portal engineers but the most popular portals tend to have their own widget implementations (just look at your LMS or Sharepoint).

For many years I've been watching the various widget specifications emerge from the W3C, lots of clever engineers are involved (those I know I hold in high regard) but I'm just beginning to get that sickening feeling that it is all going to have been a learning experience.  At the end of the process we may have a technically elegant, well-specified white elephant.

As someone who has spent many years developing software in the e-Learning sector I've always found it hard to draw the line between applied research which is solely for the purposes of education and applied research which is more generally applicable but is being driven by the e-Learning sector.  As a community, we often stretch the existing platforms and end up building out new frameworks only to have to throw stuff away as the market changes underneath us.  The web replaced HyperCard and Toolbook in just this way - some of the content got migrated but the elaborate courseware management systems (as we used to call them) all had to be thrown away.

'via Blog this'

2012-07-02

QTI Pre-Conference Workshop: next week!

Sadly I won't be able to make this event next week but I thought I'd pass on a link to the flyer in case there is anyone still making travel plans.

http://caaconference.co.uk/wp-content/uploads/CAA-2012-Pre-Conference-Workshop.pdf

I'm still making the odd change to the QTI migration tool - and the integration with the PyAssess library is going well.  This will bring various benefits, like the ability to populate the correct response for most item types when converting from v1.  So if you have a v1 to v2 migration question coming out of the workshop please feel free to get in touch or post it here.

'via Blog this'

2012-06-19

What is ipsative assessment and why would I use it? | Getting Results -- The Questionmark Blog

What is ipsative assessment and why would I use it? | Getting Results -- The Questionmark Blog:

Having recently registered for the eAssessment Scotland conference I was reminded that last year I learnt a new word there: ipsative assessment.

This is a nice summary on the subject from John Kleeman.

'via Blog this'

2012-06-15

Viewing OData $metadata with XSLT

In a recent post I talked about work I've been doing on OData. See Atom, OData and Binary Blobs for a primer on Atom and some examples of OData feeds.

In a nutshell, OData adds a definition language for database-like Entities (think SQL tables) to Atom and provides conventions for representing their properties (think SQL columns) in XML. The definition language is based on ADO.NET; yes, they could have chosen different bindings, which would have made more work for their engineers, less for the rest of us, and improved the chances of widespread adoption. But it is what it is (a phrase I seem to be hearing a lot of recently).

One of the OData-defined conventions is that data services can publish a metadata document which describes the entities that you are likely to encounter in the Atom feeds it publishes. This can be useful if you want to POST a new entry and you don't know what type of data property X is supposed to have. To get the metadata document you just GET a special URL; for the sample data service published by Microsoft it is:

http://services.odata.org/OData/OData.svc/$metadata

To see what this might look like in the real world you can also look at the $metadata document published as part of the Netflix OData service I used as the source of my examples last time.

http://odata.netflix.com/v2/Catalog/$metadata

Wouldn't it be nice to have a simple documented form of this information? The schema on which it is based even allows annotations and Documentation elements. I browsed the web a bit but couldn't find anyone who had done this so I wrote a little XSLT myself. Here is a picture of part of the output from transforming the Netflix feed

Now there is an issue here. One of the things that I've commented on before is the annoying habit of specification writers to change the namespace they use when a new version is published. I can see why some people might do this but when 90% of the spec is the same it just causes frustration as tools which look for one namespace have to have significant revisions just to work with a minor addition of some optional elements.

As a result, I've published two versions of the XSLT that I used to create the above picture. If somebody out there knows how I can do all of this in one XSL file without lots of copying and pasting I'd love to know.

The first xslt uses the namespace from CSDL 1.1 and can be used to transform the metadata from the sample OData service published by Microsoft. The second xslt uses the namespace from CSDL 2.0 and must be used to transform the metadata from Netflix. If you get a blank HTML file with just a heading then try the other one. When clicking on these links your browser may attempt to render them as HTML, you can "View Source" to see the XSLT as plain text.

Here is how I've used these files on my Mac:

$ curl http://services.odata.org/OData/OData.svc/\$metadata > sample.xml
$ xsltproc odata2html-v1p1.xsl sample.xml > sample.html

$ curl http://odata.netflix.com/v2/Catalog/\$metadata > netflix.xml
$ xsltproc odata2html-v2.xsl netflix.xml > netflix.html
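
If you are not sure which of the two stylesheets a service needs, a small script can tell you which CSDL namespace the $metadata document uses. This is just a sketch using the standard library's ElementTree; it simply reports the namespace of the first Schema element it finds:

import xml.etree.ElementTree as ET

def csdl_namespace(metadata_file):
    """Return the namespace URI of the first Schema element found."""
    for event, el in ET.iterparse(metadata_file):
        if el.tag.endswith("}Schema"):
            return el.tag[1:el.tag.index("}")]
    return None

print csdl_namespace("sample.xml")     # the namespace used by the Microsoft sample
print csdl_namespace("netflix.xml")    # the namespace used by Netflix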

This transform could obviously be improved a lot; it only shows you the Entities, Complex Types and Associations, though I am rather proud of the hyperlinking between them.

2012-06-07

Launching learning activities with AICC CMI-5: GET or POST?

With the AICC working on a new specification to update the existing use of the CMI data model and replace the ageing (but still popular!) HACP, it seems like a good opportunity to fix the concept of launch via a web browser and harmonise learning technology standards with best practice in mainstream applications. By the way, if you are curious, HACP is defined in the 1993 document [CMI001] CMI Guidelines for Interoperability, AICC.

The Learning Management System (LMS) typically does the launch by creating a special page that is delivered to the user's web browser containing a form (or a naked hyperlink with query parameters) to trigger the activity in a remote system. There are some subtle things going on here, because best practice in HTML has been based on the assumption that the server that generated the form is the one that will receive the submission, whereas our community tends to rely on cross-site form submission (XSFS perhaps?).
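
To make the mechanics concrete, here is a rough sketch of the kind of auto-submitting page an LMS might generate to do the launch with POST. The action URL and parameter names are purely illustrative and are not taken from the CMI-5 draft:

import cgi

# Illustrative only: the action URL and parameter names below are invented,
# they just show the shape of a cross-site POST launch page.
LAUNCH_TEMPLATE = """<html><body onload="document.forms[0].submit()">
<form method="post" action="%(action)s">
<input type="hidden" name="activity_id" value="%(activity_id)s"/>
<input type="hidden" name="session_id" value="%(session_id)s"/>
<noscript><input type="submit" value="Launch activity"/></noscript>
</form></body></html>"""

def launch_page(action, activity_id, session_id):
    q = lambda s: cgi.escape(s, True)    # escape values for use in attributes
    return LAUNCH_TEMPLATE % {
        'action': q(action),
        'activity_id': q(activity_id),
        'session_id': q(session_id)}

print launch_page("https://tool.example.com/launch", "act-123", "sess-456")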

Anyway, you can read my recommendation to the AICC in the open forum created by them to discuss the CMI-5 draft specification. And of course, if you haven't had a look yet I'd encourage you to have a quick look over the specification yourself. The more people review and comment on these documents the better.

Link: Topic: 8.1 Web (Browser) Environment

The topic I just started, with my new proposal on how the LMS should launch a learning activity in future. I'm recommending that the POST method is used, with GET supported for backwards compatibility only. So if you support AICC standards please read and review my proposal, because you might have to do some work on your launch code to support CMI-5 if it is accepted.

Link: CMI-5 Forum Home Page

The home page of the AICC CMI-5 public forum, see all threads about the proposed specification here. Note that only registered users can see the interactive specification viewer in the forums but there is a general public download on the main CMI-5 page:

Link: CMI-5 Home Page

You can download the full specification as a PDF from here.

2012-06-05

Lion, wxPython and py2app: targeting Carbon & Cocoa

About this time last year I wrote a blog entry on installing wxPython and py2app on my Mac running Snow Leopard. Well I have since upgraded to Lion and the same installation seemed to keep working just fine. This weekend I've actually upgraded to a new Mac and I thought this would be an excellent opportunity to grapple with this installation again and perhaps advance my understanding of what I'm doing.

Apple's Migration Assistant (a.k.a. Setup Assistant) had other ideas. You have to hand it to them, it took about 2-3 hours with the two machines plugged together and everything was copied across and working without any intervention. They really do make it easy to buy a new Mac and get productive straight away.

So is this blog post the Python equivalent of the "pass" statement? Just a no-op?

Well not quite. At the moment, I'm building my QTI migration tool using the Carbon version of wxPython, which means forcing Python to work in 32-bit mode. That's getting a bit out-dated now, and it means I can't take full advantage of my new 8GB Mac. I need to embrace 64-bit builds, I need to figure out how to build for the Cocoa version of wxPython and I need to figure out how to do this while retaining my ability to create the 32-bit build for older hardware/versions of Mac OS.

So here is how I now recommend doing this...

Step 1: Install Python

The lesson from last time was that you can't rely on the python versions installed by Apple to do these builds for you. If you run python on a clean Lion install you'll get a 64-bit version of python 2.7.1:

$ which python
/usr/bin/python
$ python
Python 2.7.1 (r271:86832, Jul 31 2011, 19:30:53) 
[GCC 4.2.1 (Based on Apple Inc. build 5658) (LLVM build 2335.15.00)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import sys;"%X"%sys.maxsize
'7FFFFFFFFFFFFFFF'

Thanks to How to tell if my python shell is executing in 32bit or 64bit mode? for the tip on printing maxsize.

I want to keep python pointing here because the command-line version of the migration tool, and the supporting Pyslet package (which can be used independently) need to work 'out-of-the-box'. The Migration Assistant had helpfully copied over my .bash_profile which included modifications made by my custom Mac binary install of Python 2.7 last year. The modifications are well commented and help to explain the different paths we'll be dealing with:

# Setting PATH for Python 2.7
# The orginal version is saved in .bash_profile.pysave
PATH="/Library/Frameworks/Python.framework/Versions/2.7/bin:${PATH}"
export PATH

Firstly, note that the Mac binaries install in /Library/Frameworks/ whereas the pre-loaded Apple installations are all in /System/Library/Frameworks/. This is a fairly subtle difference so a little bit of concentration is required to prevent mistakes. Anyway, as per the instructions above I restored my .bash_profile from .bash_profile.pysave and confirmed (as above) that I was getting the Apple python.

It seems like 2.7.3 is the latest version available as a Mac binary build from the main python download page. This will make it a bit easier to check I'm running the right interpreter! So I downloaded the dmg from the following link and ran the installer: http://www.python.org/ftp/python/2.7.3/python-2.7.3-macosx10.6.dmg. For me this was an upgrade rather than a clean install. The resulting binaries are put on the path in /usr/local/bin. By default, the interpreter runs in 64bit mode but it can be invoked in 32-bit mode too:

$ /usr/local/bin/python
Python 2.7.3 (v2.7.3:70274d53c1dd, Apr  9 2012, 20:52:43) 
[GCC 4.2.1 (Apple Inc. build 5666) (dot 3)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import sys;"%X"%sys.maxsize
'7FFFFFFFFFFFFFFF'
$ which python-32
/usr/local/bin/python-32
$ python-32
Python 2.7.3 (v2.7.3:70274d53c1dd, Apr  9 2012, 20:52:43) 
[GCC 4.2.1 (Apple Inc. build 5666) (dot 3)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import sys;"%X"%sys.maxsize
'7FFFFFFF'

This might be enough, but the wxPython home page does recommend that you use different python installations if you want to run both the Carbon and Cocoa versions. So I'll repeat the installation with a python 2.6 build. The current binary build is 2.6.6; this is missing some important security fixes that have been included in 2.6.8, but it looks safe enough for the migration tool. I downloaded the 2.6 installer from here. When I ran the installer I made sure I only installed the framework, as I don't want everything else getting in the way.

I now have a python 2.6 installation in /Library/Frameworks/Python.framework/Versions/2.6

$ /Library/Frameworks/Python.framework/Versions/2.6/bin/python2.6
Python 2.6.6 (r266:84374, Aug 31 2010, 11:00:51) 
[GCC 4.0.1 (Apple Inc. build 5493)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import sys;"%X"%sys.maxsize
'7FFFFFFF'

Notice that this is a 32-bit build. Apple actually ship a 64-bit build of 2.6.7, so care will be needed: typing python2.6 at the terminal will not bring up this new installation.

To make life a little bit easier I always create a bin directory in my home directory and add it to the path in my .bash_profile using lines like these:

PATH=~/bin:${PATH}
export PATH

This is going to come in very handy in the next step.

Step 2: setuptools and easy_install

setuptools is required by lots of Python packages; it is designed to make your life very easy, but it takes a bit of fiddling to get it working with these custom installations. It's an egg, which means it runs magically from the command line. I'll show you the process of installing it on python 2.6, but the instructions for putting it in 2.7 are almost identical (it's just a different egg).

I downloaded the egg from here: http://pypi.python.org/packages/2.6/s/setuptools/setuptools-0.6c11-py2.6.egg#md5=bfa92100bd772d5a213eedd356d64086 and then took a peek at the top of the script:

$ head -n 8 setuptools-0.6c11-py2.6.egg 
#!/bin/sh
if [ `basename $0` = "setuptools-0.6c11-py2.6.egg" ]
then exec python2.6 -c "import sys, os; sys.path.insert(0, os.path.abspath('$0')); from setuptools.command.easy_install import bootstrap; sys.exit(bootstrap())" "$@"
else
  echo $0 is not the correct name for this egg file.
  echo Please rename it back to setuptools-0.6c11-py2.6.egg and try again.
  exec false
fi

I've only shown the top 8 lines here, the rest is binary encoded gibberish. The thing to notice is that, on line 3, it invokes python2.6 directly, so if I want to control which python installation setuptools is installed for I need to ensure that python2.6 invokes the correct interpreter. That's where my local bin directory and path manipulation come in handy.

$ cd ~/bin
$ ln -s /Library/Frameworks/Python.framework/Versions/2.6/bin/python2.6 python2.6
$ ln -s /Library/Frameworks/Python.framework/Versions/2.7/bin/python2.7 python2.7

Now for me, and anyone who inherits my $PATH, invoking python2.6 will start my custom MacPython install.

$ sudo -l python2.6
/Users/swl10/bin/python2.6

Fortunately sudo is configured to inherit my environment. It was worth checking as this is configurable. I can now install setuptools from the egg:

$ sudo sh setuptools-0.6c11-py2.6.egg 
Password: [I had to type my root password here]
Processing setuptools-0.6c11-py2.6.egg
Copying setuptools-0.6c11-py2.6.egg to /Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages
...

I did the same for python 2.7 (note that you need a different egg) and then added links to easy_install to my bin directory:

$ cd ~/bin
$ ln -s /Library/Frameworks/Python.framework/Versions/2.7/bin/easy_install easy_install-2.7
$ ln -s /Library/Frameworks/Python.framework/Versions/2.6/bin/easy_install easy_install-2.6

Step 3: wxPython

wxPython is a binary installer tied to a particular python version. However, I believe it uses a scatter gun approach to search for python installations and will install itself everywhere with a single click. That is the reason why it is better to run completely different versions of python if you want completely different versions of wxPython. In fact, if you find yourself with multiple installs there is a wiki page that explains how to switch between versions. But a little playing reveals that this refers to Major.Minor version numbers. It can't cope with the subtlety of switching between builds or switching between Carbon and Cocoa as far as I can tell so this won't help us.

My plan is to install the Carbon wxPython (which is 32bit only) for python 2.6 and the newer Cocoa wxPython for python 2.7. The wxPython download page has a stable and unstable version but to get Cocoa I'll need to use the unstable version. The stability referred to is that of the API, rather than the quality of the code. Being cautious I downloaded the stable 2.8 (Carbon) installer for python 2.6 and the unstable 2.9 Cocoa installer for python 2.7. Installation is easy but look out for a useful script on the disk image which allows you to review and manage your installations. To invoke the script you can just run it from the command line:

$ /Volumes/wxPython2.9-osx-2.9.3.1-cocoa-py2.7/uninstall_wxPython.py

When I was done with the installations it reported the following configurations as being present:

  1.  wxPython2.8-osx-unicode-universal-py2.6     2.8.12.1
  2.  wxPython2.9-osx-cocoa-py2.7                 2.9.3.1

(If, like me, you are upgrading from previous installations you may have to clean up older builds here.) At this point I tested my wxPython based programs and confirmed that they were working OK. I was impressed that the Cocoa version seems to work unchanged.

Step 4: py2app

With the groundwork done right, the last step is very simple. The symlinks we put in for easy_install make it easy to install py2app.

$ sudo easy_install-2.6 -U py2app
Password: [type your root password here]
Searching for py2app
Reading http://pypi.python.org/simple/py2app/
Reading http://undefined.org/python/#py2app
Reading http://bitbucket.org/ronaldoussoren/py2app
Best match: py2app 0.6.4

...[snip]...

Installed /Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/altgraph-0.9-py2.6.egg
Finished processing dependencies for py2app

The process is very similar for python 2.7.

I've now added the following to my setup script:

import wx
if "cocoa" in wx.version():
    suffix = "-Cocoa"
else:
    suffix = "-Carbon"

The suffix is then appended to the name passed to the setup call itself. The following command results in a 32bit, Carbon binary, compatible with OS X 10.3 onwards.

$ python2.6 setup.py py2app

While this command creates a Cocoa based 64bit binary for 10.5 and later.

$ python2.7 setup.py py2app

And that is how to target both Carbon and Cocoa in your wxPython projects.
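
For completeness, a minimal setup.py along these lines might look roughly like the sketch below; the application name and main script are illustrative stand-ins rather than the migration tool's real values:

from setuptools import setup
import wx

# choose a suffix so the two builds get distinct application names
if "cocoa" in wx.version():
    suffix = "-Cocoa"
else:
    suffix = "-Carbon"

setup(
    name="MyApp" + suffix,                # illustrative name
    app=["myapp.py"],                     # illustrative main script
    options={'py2app': {'argv_emulation': True}},
    setup_requires=['py2app'],
)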

2012-06-01

Atom, OData and Binary Blobs

I've been doing a lot of work on Atom and OData recently. I'm a real fan of Atom and the related Atom Publishing Protocol (APP for short). OData is a specification from Microsoft which builds on these two basic building blocks of the internet to provide standard conventions for querying feeds and representing properties using a SQL-like model.

Given that OData can be used to easily expose data currently residing in SQL databases it is not surprising that the issue of binary blobs is one that takes a little research to figure out. At first sight it isn't obvious how OData deals with them, in fact, it isn't even obvious how APP deals with them!

Atom Primer

Most of us are familiar with the idea of an RSS feed for following news articles and blogs like this one (this article prompted me to add the gadget to my blogger templates to make it easier to subscribe). Atom is a slightly more formal definition of the same concept and is available as an option for subscribing to this blog too. Understanding the origins of Atom helps when trying to understand the Atom data model, especially if you are coming to Atom from a SQL/OData point of view.

Atom is all about feeds (lists) of entries. The data you want, be it a news article, blog post or a row in your database table is an entry. A feed might be everything, such as all the articles in your blog or all the rows in your database table, or it may be a filtered subset such as all the articles in your blog with a particular tag or all the rows in your table that match a certain query.

Atom adheres closely to the REST-based service concept. Each entry has its own unique URI. Feeds also have their own URIs. For example, the Atom feed URL for this blog is:

http://swl10.blogspot.com/feeds/posts/default

But if you are only interested in the Python language then you might want to use a different feed:

http://swl10.blogspot.com/feeds/posts/default/-/Python

Obviously the first feed contains all the entries in the second feed too!

Atom is XML-based so an entry is represented by an <entry> element and the content of an entry is represented by a <content> child element. Here's an abbreviated example from this blog's Atom feed. Note that the atom-defined metadata elements appear as siblings of the content...

<entry>
  <id>tag:blogger.com,1999:blog-8659912959976079554.post-4875480159917130568</id>
  <published>2011-07-17T16:00:00.000+01:00</published>
  <updated>2011-07-17T16:00:06.090+01:00</updated>
  <category scheme="http://www.blogger.com/atom/ns#" term="QTI"/>
  <category scheme="http://www.blogger.com/atom/ns#" term="Python"/>
  <title type="text">Using gencodec to make a custom character mapping</title>
  <content type="html">One of the problems I face...</content>
  <link rel="edit" type="application/atom+xml"
    href="http://www.blogger.com/feeds/8659912959976079554/posts/default/4875480159917130568"/>
  <link rel="self" type="application/atom+xml"
    href="http://www.blogger.com/feeds/8659912959976079554/posts/default/4875480159917130568"/>
</entry>

For blog articles, this content is typically html text (yes, horribly escaped to allow it to pass through XML parsers). Atom actually defines three types of native content, 'html', 'text' and 'xhtml'. It also allows the content element to contain a single child element corresponding to other XML media types. OData uses this method to represent the property name/value pairs that might correspond to the column names and values for a row in the database table you are exposing.

Here's another abbreviated example taken from the Netflix OData People feed:

<entry>
  <id>http://odata.netflix.com/v2/Catalog/People(189)</id>
  <title type="text">Bruce Abbott</title>
  <updated>2012-06-01T07:55:17Z</updated>
  <category term="Netflix.Catalog.v2.Person" scheme="http://schemas.microsoft.com/ado/2007/08/dataservices/scheme" />
  <content type="application/xml">
    <m:properties>
      <d:Id m:type="Edm.Int32">189</d:Id>
      <d:Name>Bruce Abbott</d:Name>
    </m:properties>
  </content>
</entry>

Notice the application/xml content type and the single properties element from Microsoft's metadata schema.

Any other type of content is considered to be external media. But Atom can still describe it, it can still associate metadata with it and it can still organize it into feeds...

Binary Blobs as Media

There is nothing stopping the content of an entry from being a non-text binary blob of data. You just change the type attribute to be your favourite blob format and add a src attribute to point to an external file or base-64 encode it and include it in the entry itself (this second method is rarely used I think).

Obviously the URL of the entry (the XML document containing the <entry> tag) is not the same as the URL of the media resource, but they are closely related. The entry is referred to as a Media Link because it contains the metadata about the media file (such as the title, updated date etc) and it links to it. The media file itself is known as a media resource.

There's a problem with OData though. OData requires the child of the content element to be the properties element (see example above) and the type attribute to be application/xml. But Atom says there can only be one content element per entry. So how can OData be used for binary blobs?

The answer is a bit of a hack. When the entry is a media link entry the properties move into the metadata area of the entry. Here's another abbreviated example from Netflix which illustrates the technique:

<entry>
  <id>http://odata.netflix.com/v2/Catalog/Titles('13aly')</id>
  <title type="text">Red Hot Chili Peppers: Funky Monks</title>
  <summary type="html">Lead singer Anthony Kiedis...</summary>
  <updated>2012-01-31T09:45:16Z</updated>
  <author>
    <name />
  </author>
  <category term="Netflix.Catalog.v2.Title"
    scheme="http://schemas.microsoft.com/ado/2007/08/dataservices/scheme" />
  <content type="image/jpeg" src="http://cdn-0.nflximg.com/en_us/boxshots/large/5632678.jpg" />
  <m:properties xmlns:m="http://schemas.microsoft.com/ado/2007/08/dataservices/metadata"
    xmlns:d="http://schemas.microsoft.com/ado/2007/08/dataservices">
    <d:Id>13aly</d:Id>
    <d:Name>Red Hot Chili Peppers: Funky Monks</d:Name>
    <d:ShortName>Red Hot Chili Peppers: Funky Monks</d:ShortName>
    <d:Synopsis>Lead singer Anthony Kiedis...</d:Synopsis>
    <d:ReleaseYear m:type="Edm.Int32">1991</d:ReleaseYear>
    <d:Url>http://www.netflix.com/Movie/Red_Hot_Chili_Peppers_Funky_Monks/5632678</d:Url>
    <!-- more properties.... -->
  </m:properties>
</entry>

This entry is taken from the Titles feed, notice that the entry is a media-link to the large box graphic for the film.

Binary Blobs and APP

APP adds a protocol for publishing information to Atom feeds and OData builds on APP to allow data feeds to be writable, not just read-only streams. You can't upload your own titles to Netflix as far as I know so I don't have an example here. The details are all in section 9.6 of RFC 5023 but in a nutshell, if you post a binary blob to a feed the server should store the blob and create a media link entry that points to it (populated with a minimal set of metadata). Once created, you can then update the metadata with HTTP's PUT method on the media link's edit URI directly, or update the binary blob by using HTTP's PUT method on the edit-media URI of the media resource. (These links are given in the <link> elements in the entries; see the first example above.)
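
To make that flow concrete, here is a minimal sketch of posting a media resource to an APP collection using Python's urllib2. The collection URL and file name are invented for illustration, and a real client would of course also handle authentication and errors:

import urllib2

# Hypothetical APP collection URL; a real service advertises its collections
# in a service document.
COLLECTION = "http://example.com/app/pictures/"

data = open("boxshot.jpg", "rb").read()          # any local image will do
req = urllib2.Request(COLLECTION, data)
req.add_header("Content-Type", "image/jpeg")     # the media type of the blob
req.add_header("Slug", "boxshot.jpg")            # requested name, per RFC 5023
response = urllib2.urlopen(req)

# On success the server replies 201 Created; the Location header points at
# the new media link entry and the body contains that entry.
print response.getcode(), response.info().getheader("Location")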

There is no reason why binary blobs can't be XML files of course. Many of the technical standards for education that I work with are very data-centric. They define the format of XML documents such as QTI, which are designed to be opaque to management systems like item banks (an item bank is essentially a special-purpose content management system for questions used in assessment).

So publishing feeds using OData or APP from an item bank would most likely use these techniques for making the underlying content available to third party systems. Questions often contain media resources (e.g., images) of course but even the question content itself is typically marked up using XML, as it is in QTI. This data is not easy to represent as a simple list of property values and would typically be stored as a blob in a database or as a file in a repository. Therefore, it is probably better to think of this data as a media resource when exposing it via APP/OData.

2012-05-22

Common Cartridge, Namespaces and Dependency Injection

This post is about coping with a significant change to the newer (public) versions of the IMS Common Cartridge specification.  This change won't affect everyone the same way; your implementation may just shrug it off.  However, I found I had to make an important change to the QTI migration tool code to make it possible to read QTI version 1 files from the newer form of cartridges.

There have been three versions of this specification now, versions 1.0, 1.1 and most recently version 1.2.  The significant change for me was between versions 1.0 (published October 2008) and 1.1 (revised May 2011).

Changing Namespaces

The key change between 1.0 and 1.1 was to the namespaces used in the XML files.  In version 1.0, the default namespace for content packaging elements is used in the manifest file: http://www.imsglobal.org/xsd/imscp_v1p1.

Content Packaging has also been through several revisions.  The v1p1 namespace (above) was defined in the widely used Content Packaging 1.1 (now on revision 4).  The same namespace was used for most of the elements in the (public draft) of the newer IMS Content Packaging version 1.2 specification too.  In this case, the decision was made to augment the revised specification with a new schema containing definitions of the new elements only.  The existing elements would stay in the 1.1 namespace to ensure that tools that recognise version 1.1 packages continue to work, ignoring the unrecognised extension elements.

Confusingly though, the schema definition provided with the content packaging specification is located here: http://www.imsglobal.org/xsd/imscp_v1p1.xsd whereas the schema definition provided with the common cartridge specification (1.0), for the same namespace, is located here: http://www.imsglobal.org/profile/cc/ccv1p0/derived_schema/imscp_v1p2.xsd.  That's two different definition files for the same namespace.  Given this discrepancy it is not surprising that newer revisions of common cartridge have chosen to use a new namespace entirely.  In the case of 1.1, the namespace used for the basic content packaging elements was changed to http://www.imsglobal.org/xsd/imsccv1p1/imscp_v1p1.

But this decision is not without consequences.  The decision to retain a consistent namespace in the various revisions of the Content Packaging specification enabled existing tools to continue working.  Sure enough, the decision to change the namespace in Common Cartridge means that some tools will not continue working, including my Python libraries used in the QTI migration tool.

From Parser to Python Class

In the early days of XML, you could identify an element within a document by its name, scoped perhaps by the PUBLIC identifier given in the document type definition.  The disadvantage was that all elements had to be defined in the same scope.  Namespace prefixes were used to help sort this mess out.  A namespace-aware parser splits off the namespace prefix (everything up to the colon) from the element name and uses it to identify the element by a pair of strings: the namespace (a URI) and the remainder of the element name.

The XML parser at the heart of my python libraries uses these namespace/name pairs as keys into a dictionary which it uses to look up the class object it should use to represent the element.  The advantage of this approach is that I can add behaviour to the XML elements when they are deserialized from their XML representations through the methods defined on the corresponding classes.  Furthermore, a rich class hierarchy can be defined allowing concepts such as XHTML's organization of elements into groups like 'inline elements' to be represented directly in the class hierarchy.

If I need two different XML definitions to map to the same class I can easily do this by adding multiple entries to the dictionary and mapping them to the same class.  So at first glance I seem to have avoided some of the problems inherent in tight coupling of classes.  The following two elements could be mapped to the same Manifest class in my program:

('http://www.imsglobal.org/xsd/imscp_v1p1', 'manifest')
('http://www.imsglobal.org/xsd/imsccv1p1/imscp_v1p1', 'manifest')
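
Conceptually the lookup table is just a dictionary keyed on (namespace, name) pairs; the toy sketch below illustrates the idea (it is not the actual Pyslet code):

class Manifest(object):
    """Stand-in for the real Manifest class."""
    pass

class GenericElement(object):
    """Fallback used for unrecognised elements."""
    pass

CP_NS = "http://www.imsglobal.org/xsd/imscp_v1p1"
CC_NS = "http://www.imsglobal.org/xsd/imsccv1p1/imscp_v1p1"

ELEMENT_CLASSES = {
    (CP_NS, 'manifest'): Manifest,
    (CC_NS, 'manifest'): Manifest,    # both names map to the same class
}

def class_for(ns, name):
    return ELEMENT_CLASSES.get((ns, name), GenericElement)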

This would work fine when reading the manifest from the XML stream but what about writing manifests?  How does my Manifest class know which namespace to use when I'm creating a new manifest?  The following code snippet from the python interpreter shows me creating an instance of a Manifest (I pass None as the element's parent).  The instance knows which namespace it should be in:

>>> import pyslet.imscpv1p2 as cp
>>> m=cp.Manifest(None)
>>> print m

<manifest xmlns="http://www.imsglobal.org/xsd/imscp_v1p1">
 <organizations/>
 <resources/>
</manifest>

This clearly won't work for the new common cartridges.  The Manifest class 'knows' the namespace it is supposed to be in because its canonical XML name is provided as a class attribute on its definition.  The obvious solution is to wrap the class with a special common cartridge Manifest that overrides this attribute.  That is relatively easy to do, here is the updated definition:

class Manifest(cp.Manifest):
    XMLNAME=("http://www.imsglobal.org/xsd/imsccv1p1/imscp_v1p1",'manifest')

Unfortunately, this doesn't do enough.  Continuing to use the python interpreter....

>>> class Manifest(cp.Manifest):
...     XMLNAME=("http://www.imsglobal.org/xsd/imsccv1p1/imscp_v1p1",'manifest')
... 
>>> m=Manifest(None)
>>> print m

<manifest xmlns="http://www.imsglobal.org/xsd/imsccv1p1/imscp_v1p1">
    <organizations xmlns="http://www.imsglobal.org/xsd/imscp_v1p1"/>
    <resources xmlns="http://www.imsglobal.org/xsd/imscp_v1p1"/>
</manifest>

Now we've got the namespace correct on the manifest but the required organizations and resources elements are still created in the old namespace.

The Return of Tight Coupling

If I'm going to fix this issue I'm going to have to wrap the classes used for all the elements in the Content Packaging specification.  That sounds like a bit of a chore, but remember that the reason the namespace has changed is that Common Cartridge has added some additional constraints to the specification, so we're likely to have to override at least some of the behaviours too.

Unfortunately, wrapping the classes still isn't enough.  In the above example the organizations and resources elements are required children of the manifest.  So when I created my instance of the Manifest class the Manifest's constructor needed to create instances of the related Organizations and Resources classes and it does this using the default implementations, not the wrapped versions I've defined in my Common Cartridge module.  This is known as tight coupling, and the solution is to adopt a dependency injection solution.  For a more comprehensive primer on common solutions to this pattern you could do worse than reading Martin Fowler's article Inversion of Control Containers and the Dependency Injection pattern.

The important point here is that the logic inside my Manifest class, including the logic that takes place during construction, needs to be decoupled from the decision to use a particular class object to instantiate the Organizations and Resources elements.  These dependencies need to be injected into the code somehow.

I must admit, I find the example solutions in Java frameworks confusing because the additional coding required to satisfy the compiler makes it harder to see what is really going on.  There aren't many good examples of how to solve the problem in python.  The python wiki points straight to an article called Dependency Injection The Python Way.  But this article describes a full feature broker (like the service locator solution) which seems like overkill for my coupling problem.

A simpler solution is to pass dependencies in (in my case on the constructor) following a pattern similar to the one in this blogpost.   In fact, this poster is trying to solve a related problem of module-level dependency but the basic idea is the same.  I could pass the wrapped class objects to the constructor.

Dependency Injection using Class Attributes

The spirit of the python language is certainly one of adopting the simplest solution that solves the problem.  So here is my dependency injection solution to this specific case of tight coupling.

I start by adding class attributes to set class dependencies.  My base Manifest class now looks something like this:

class Manifest:
    XMLNAME=("http://www.imsglobal.org/xsd/imscp_v1p1",'manifest')
    MetadataClass=Metadata
    OrganizationsClass=Organizations
    ResourcesClass=Resources

    # method definitions and other attributes follow...

And in my Common Cartridge module it is overridden like this:

class Manifest(cp.Manifest):
    XMLNAME=("http://www.imsglobal.org/xsd/imsccv1p1/imscp_v1p1",'manifest')
    MetadataClass=Metadata
    OrganizationsClass=Organizations
    ResourcesClass=Resources

Although these look similar, in the first case the Metadata, Organizations and Resources names refer to classes in the base Content Packaging module whereas in the second definition they refer to overrides in the Common Cartridge Module (note the use of cp.Manifest to select the base class from the original Content Packaging module).

Now the original Manifest's constructor is modified to use these class attributes to create the required child elements:

    def __init__(self,parent):
        self.Metadata=None
        self.Organizations=self.OrganizationsClass(self)
        self.Resources=self.ResourcesClass(self)

The upshot is that when I create an instance of the Common Cartridge Manifest I don't need to override the constructor just to solve the dependency problem. The base class constructor will now create the correct Organizations and Resources members using the overridden class attributes.

I've abbreviated the code a bit, if you want to see the full implementation you can see it in the trunk of the pyslet framework.

2012-05-04

IMS LTI and the length of oauth_consumer_key

I ran into an interesting problem today.  While playing around with the IMS LTI specification I hit MySQL's restriction that keys can be at most 1000 bytes.


ERROR 1071 (42000): Specified key was too long; max key length is 1000 bytes


OAuth uses the concept of a consumer key to identify the system from which a signed HTTP request has been generated.  The consumer key can, in theory, be any Unicode string of characters and the specification is silent on the issue of a maximum length.  The LTI specification uses examples in which the consumer key is derived from the DNS name of the originating system, perhaps prefixed with some additional identifier.  A DNS name can be a maximum of 255 characters, but the character set of a DNS name is restricted to a simple ASCII subset.  International domain names are now allowed but these are transformed into the simpler form so the effective maximum for a domain name using characters outside the simple ASCII set is reduced.

It seems likely that an oauth_consumer_key is going to get used as a key in a database table at some point during your implementation.  The clue is in the name.

A field such as VARCHAR(255) seems reasonable as storage, provided the character set of the field can take arbitrary Unicode characters.   Unfortunately this is likely to reserve a large amount of space: MySQL reserves 3 bytes per character when the UTF-8 character set is used, to ensure that the worst case encoding is accommodated.  That means that this key alone takes up 765 bytes of the 1000 byte limit, leaving only 235 bytes for any compound keys.  If the compound key is also a VARCHAR that's a maximum of VARCHAR(78), which seems short if the compound key is something like LTI's context_id, which is also an arbitrary Unicode string with no size restriction.  The context_id identifies the course within the Tool Consumer, so a combined key of oauth_consumer_key and context_id looks like a reasonable choice.

One possibility might be to collapse consumer key values onto ASCII using the same (or a similar) algorithm to the one used for international domain names (see RFC 3490).  This algorithm would then allow use of the ASCII character set for these keys with the benefit that keys based on domain names, even if expressed in the Unicode original form, would end up taking 255 bytes or less.  Doing the translation may add to the overhead of the look-up but the benefit of reducing the overall key size might pay off anyway.
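
Here is a sketch of how that might look in Python 2 using the built-in idna codec; note that consumer keys which are not well-formed domain names would need extra handling, because the codec rejects labels it considers invalid:

key = u"b\u00fccher.example.com"    # e.g. a key derived from an international domain name
ascii_key = key.encode("idna")
print ascii_key                     # xn--bcher-kva.example.com
print len(ascii_key)                # 25 bytes, safely inside an ASCII VARCHAR(255)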

2012-05-01

SOAP - how has it survived so long?

Pete Lacey’s Weblog : The S stands for Simple:

I just got handed this link by my development team and thought it was pretty funny.  Don't be put off by the long page, it is mainly comments, some of which are pretty funny in themselves.

I was still laughing when I saw that the post dates from 2006.  Suddenly it seems poignant instead, how come we are still wrestling with such a hard to fathom and implement 'simple' protocol when the truly simple protocol, HTTP, has been staring us in the face all these years?

'via Blog this'

2012-04-23

Fixing a broken Office Document Cache

So you are using Microsoft Sharepoint to collaborate with your friends.  You navigate to a document and select "Edit in Microsoft Excel" (choose your Office program of choice here) and Excel is launched, starts downloading the file and then pops up a dialog box to say:

Could not open 'https://yoursharepointserver.sharepoint.com/yoursite/Shared Documents/sales figures.xlsx'

Then you realise you are getting this error for every document you try to edit!  No other details, no feedback, nothing in the event viewer.  Internet search results all seem to send you down the add-the-site-as-a-trusted-site, turn-off-security-settings-in-Internet-Explorer route, etc.  Well, before you run your PC in 'please hack me' mode you might want to read on...

If your PC is like mine, your Document Cache is probably done for.  You might try deleting it using the Upload Center (Start > All Programs > Microsoft Office > Microsoft Office 2010 Tools > Microsoft Office 2010 Upload Center) as per these instructions from Microsoft.

You may just find that you get a screen like this:

Upload Center error screenshot

... and that deleting the cache in the settings screen causes the Upload Center to crash.  In which case, you will need these invaluable instructions from the (unofficial?) Microsoft Office Support blog:

Manually Rename Office Document Cache ~ Microsoft Office Support:


'via Blog this'

2012-02-02

-1 for Google+Tweet

Last September I recommended use of a Chrome Extension called Google+Tweet on this blog.

I no longer recommend this. Google+Tweet has recently become a source of annoying and hard to remove adware called dropinsavings$ which pops up over various websites. For more information see one of the replies to this thread:

how to delete "drop in savings" bug ...keeps dropping down each web site where I am trying to buy something. - Google Groups:

'via Blog this'

2012-01-30

We ignore the politics of the DNS at our peril

Most people who use the Internet will be aware of the Stop Online Piracy Act (or SOPA); if you hadn't heard of it before January 18th you are likely to have become aware of it through the co-ordinated blackout of the internet on that day.

Like other recent additions made by the legislature, the proposed act seems to have a purpose which aligns poorly with the current system of law.  To a certain extent, a system of written legislation (in contrast to a system of common law) is like a vast computer program.  Sometimes a new law is required to model something new in society.  The invention of the printing press really did trigger a debate about the ownership of intellectual outputs, giving rise to copyright law.

Sometimes all that is required is a modification to an existing law to help clarify the way the law deals with a new test case, essentially making that part of the model a little less abstract than it was before.  But when a new law, with a clear purpose, requires widespread modifications it seems to me to be a sign that the purpose of the new law is not well aligned with the architecture of the social model our legal system represents.

Much has been written and spoken about SOPA and even more on copyright in general.  I found Cory Doctorow's talk on The Coming War on General Purpose Computing an amusing but thought-provoking take on this.  In particular, his talk persuaded me that we probably should try and think about information and communications technology (computers and networks) as something genuinely new.  New in the sense that the printing press was new.  New in the sense that they have changed our social norms.  New in the sense that we need a new part of our legal model to help us resolve disputes, deal with new types of harm and criminalise new types of behaviour.

But in this blog post, I don't want to concentrate so much on the copyright aspects of the proposed bill but on the international dimension, in particular the Domain Name System (DNS).  The DNS is an internationally agreed way to look up the network location of an internet server from its friendly name (like swl0.blogspot.com).  Of course, 'internationally agreed' raises the question: agreed by which parties?
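To make that look-up concrete, here it is in a couple of lines of Python; the address printed is simply whatever answer the resolver your machine is configured to use hands back.

    # The DNS look-up a browser performs behind the scenes: friendly name
    # in, network address out.  Which answer you get depends entirely on
    # which resolver (your ISP's, Google's, ...) your machine is set to ask.
    import socket

    print(socket.gethostbyname('swl0.blogspot.com'))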

Although a little disingenuous, it could be said that the DNS works by agreement between the US government and a US-registered corporation that has been granted a monopoly over the technical operation of the DNS by the US government.  The corporation is called ICANN (see also their entry in Wikipedia).  I actually think that ICANN is probably pretty well meaning; the choice of California as the territory in which it is incorporated represents a continuity with its predecessor, IANA, which was essentially created by network engineers to deal with the practical tasks of registration rather than the politics.

ICANN works by charging members an annual fee and, in exchange, it delegates the operation of one of the top-level domains to that member.  A top-level domain is the bit after the last 'dot' in the name, so the name www.walkers.co.uk (a popular brand of potato crisp in the UK) is in the top-level domain "uk".  ICANN reserves two-letter top-level domains for members that represent the countries that use that two-letter code.  In fact, the UK is an odd exception because our two-letter code is GB but UK was already widely in use as a country-specific top-level domain in pre-internet networking.  I once heard a story that the choice of UK had nothing to do with the abbreviation of United Kingdom but was actually because of the importance of the University of Kent in the early networking infrastructure.  It may be a myth but a page from the BBC on the History of its Internet Service suggests there may also be some truth in it.

In the US, where the internet was invented and from where it is still controlled, top-level domains were used directly, for example by government (.gov) and military (.mil) registrants.  The history of .edu is a little more confusing but from 2001 onwards its use has been restricted to US institutions (just after I applied, unsuccessfully, to get a UK institution registered in .edu).  The situation is different for .com, .org, .net and all the other fancy new top-level domains.  Registration is open internationally.

But the advantage of using the country-codes in the DNS is that it makes it easy to see who you are dealing with, including helping you understand the likely legal system that you'll use to resolve disputes with them.  I know that if I'm interacting with a website that has a name ending in "co.uk" then there is a UK-registered company subject to UK law involved.

Internet users in the US are not so fortunate.  In fact there is a top-level .us domain reserved for US organizations but it was originally sub-divided by state.  That would actually make sense given the way the American legal system is largely state based, with commercial agreements and even taxation being subject to state law.  But although .us registrations are now open to any US-based institution it is rarely used (a quick search indicates that Apple have registered apple.us but Microsoft have not bothered with microsoft.us).  This tendency to ignore the country domains means that most people (in the US but increasingly elsewhere) have no idea what legal basis their internet dealings have.  Indeed, many country domains are now misused purely for their lexical appearance (witness Libya's .ly domain and the little known Pacific island of Tuvalu which owns the .tv domain).

So what does all this have to do with SOPA?  Well, for some time now the DNS has been dragged into legal disputes between companies, with the result that websites have been disappearing from the internet all over the world based solely on the verdict of a US court.  SOPA would have made (or should I say, will make?) it much easier for this sort of thing to happen.

One of the reasons that lawmakers, and in particular the copyright publishers, want to put sites off the air is that internet users can't tell the difference between content legally distributed by legitimate American businesses and content that is essentially illegally imported into the US each time you click a link or download a file.  Turning off access to foreign sites containing offending material seems to make more sense than pursuing the humble user - a move which has been very unpopular when tried by the RIAA.

Although probably a lost opportunity, it would have been great if the public could have been given a clearer distinction between sites hosted outside their legal jurisdiction and those within it.  Imagine if your browser warned you like this: "You are downloading this file from a site registered in <Country>.  By downloading this file you take full responsibility for complying with US laws regarding trade with <Country>.  Do you want to continue?".  It isn't just information coming into the US that is the problem either: for years some computer programs could not be exported from the US due to their use of strong cryptography.

The likely effect of continued SOPA-style DNS seizures in the US is the break up of the DNS itself with countries like China doing something similar to block US sites that host information that is considered illegal by the Chinese government.  This type of information ban does not have a good history of being effective, even before the internet made it ridiculously easy to share information.  In 1985 the UK government tried to ban publication of the book Spycatcher.  The book continued to be published in America (and even in Scotland) and so the UK government found itself unable to prevent the information in the book becoming widely known.  More recently the internet has been used to mercilessly break UK court injunctions, most notably concerning the behaviour of a top footballer (soccer player) with a previously clean-living image.  In that case, the campaign to tweet the news in violation of the injunction seemed to gain momentum as an expression of wilful resistance to information censorship.
 
Anyway, SOPA only makes the break-up of the DNS more likely.  If, or when, it does break up (arguably this is already happening, with Google now providing an alternative DNS) it will be interesting to see which patterns emerge to replace it.

One model is that each country would continue to have its own registry and choose which sources of foreign information its service providers are allowed to provide access to, mandating use of the centrally approved database.  This model of central control will be attractive to some governments but looks impractical in countries where the genie is already out of the bottle.

Alternatively service providers may just take control themselves to compete with Google.  They already field requests for non-existent sites and attempt to provide (sponsored) links to the sites you might have been looking for.  Your ISP could run the system much more like the telephone book/directory enquiries services of the pre-internet age.  Looked at from the point of view of a business, if you want to sell products to an ISP's customers you'll have to pay each and every one of them - this isn't far off what Google is already doing with search terms.  It also provides a way for cable TV operators (who are often also ISPs) to get back in on the act without resorting to 'packet-shaping' to penalise people that won't partner with them.  Developments along these lines make the current system of DNS look very cheap.

But perhaps names don't matter anymore?  The confusion between address and trademark has dogged the DNS anyway and QR codes are rapidly taking the place of the domain name on adverts.  One of the design criteria for URLs was that they could be written on the back of an envelope.  But with printing being so cheap and QR codes being small enough to print on the back of an envelope it doesn't matter what the URL looks like anymore.

Yet another alternative is the 'app' model.  Access to information is increasingly through mobile apps that link directly to information sources on the internet.  The URL is replaced in these models with an icon on the screen of your device.  Businesses will have to pay the store for their listing of course.

Unfortunately, these alternatives may address parts of the problem (such as trademark infringement) but sites still end up being resolved to a network location one way or another.  We know from experience that attempts to control information are not very effective.  If we ignore the cross-border politics behind the DNS we may just destroy the DNS and shift the fight to the network routing layer instead.  That would cause even more trouble.

2012-01-20

Explore QTI in depth

Explore QTI in depth: "Xatapult"

Interesting little article explaining the basics of the QTI v2 data model - I think this type of document provides a much better overview than the documentation that comes with the specification itself.

2012-01-09

Happy New Year from Apple

You may have heard about Apple's concerns over the batteries in the first generation iPod nanos; that's the nano that Steve Jobs famously pulled from the little pocket in his jeans, helping to explain the mystery of what the little pocket in jeans is actually for.  The result was a recall programme and I dutifully sent off my two first-gen nanos for repair or replacement.

Well there was some discussion on Apple's forums about exactly what would be sent back by Apple given that the first generation is now over 5 years old.  The first people to get their returns reported that Apple had indeed found enough spare units to ship identical replacements - but they seemed to quickly run out and I was notified that both of mine were stuck in the repair programme waiting for replacement units.

I have to admit, I did like the design of my first generation nano.  I used mine consistently up to the point of the recall and, even though the battery was indeed failing to hold enough charge to be useful when on the move, it is hard to throw something like that away.  Perhaps Apple misjudged this level of attachment; lesser electronic items would surely have been discarded years ago, and then they might have had enough spare parts to go around.  (I predict good prices for 'new' first generation nanos on eBay.)

Anyway, Apple have dragged me from the noughties into the new decade: this morning I got a knock on the door from the UPS man and took delivery of two fancy new 6th generation units as replacements.  I'm sorry to see the click wheel go as I preferred the tactile feedback it gave and I fear that if I put the new nano in the 'little pocket' of my jeans I'll need tweezers to get it out again.  But I'm not averse to shiny new things and I was beginning to look pretty stupid using an iPad as a portable MP3 player so...

Thanks Apple and Happy New Year!