John O'Donohue laid to rest today in Fanore

John O’Donohue, poet and philosopher, is being laid to rest in my home parish of Fanore in the Burren, County Clare today. John passed away suddenly last week while on holiday in France. I remember John as a friendly, kind and intelligent man, and I was lucky enough to attend some of his inspirational sermons when he visited Fanore church during his time as a priest. He received his PhD from Tübingen in 1990, was a renowned expert on the philosopher Hegel, and wrote many influential books on Celtic spirituality. John was certainly the most important ambassador for the Burren and Connemara that I can think of, and he also spoke Irish as his native language while living in County Galway. I know that many others will join with me in sending sincere thoughts to his family and friends at this time.

You can also read this tribute to John on the Huffington Post, and view some recent articles about John from the Galway Advertiser (1, 2). KLCS-TV, a PBS station in Los Angeles, will feature a special tribute to John at 8:00 PM tonight on the show “Between the Lines”. There will be a public memorial service for John in Galway on February 2nd. More information on John is available from his official website and (in German) from You can also listen to interviews with John from NPR in 1999 and 2005.

Brewster Kahle's (Internet Archive) ISWC talk on worldwide distributed knowledge

Universal access to all knowledge can be one of our greatest achievements.

The keynote speech at ISWC 2007 was given this morning by Brewster Kahle, co-founder of the Internet Archive and also of Alexa Internet. Brewster’s talk discussed the challenges in putting various types of media online, from books to video:

  • He started by talking about digitising books (1 book = 1 MB; the Library of Congress = 26 million books = 26 TB; with images, somewhat larger). At present, it costs about $30 to scan a book in the US. For 10 cents a page, books or microfilm can now be scanned at various centres around the States and put online. 250,000 books have been scanned in so far and are held in eight online collections. He also talked about making books available to people through the OLPC project. Still, most people like having printed books, so book mobiles for print-on-demand books are now coming. A book mobile charges just $1 to print and bind a short book.
  • Next up was audio, and Brewster discussed issues related to putting recorded sound works online. At best, there are two to three million discs that have been commercially distributed. The biggest issue with this is in relation to rights. Rock ‘n’ roll concerts are the most popular category of the Internet Archive audio files (with 40,000 concerts so far); the Internet Archive offers bands its hosting service (“unlimited storage, unlimited bandwidth, forever, for free”) if they waive any issues with rights. There are various cultural materials that do not work well in terms of record sales, but there are many people who are very interested in having these published online. Audio costs about $10 per disc (per hour) to digitise. The Internet Archive has 100,000 items in 100 collections.
  • Moving images or video was next. Most people think of Hollywood films in relation to video, but at most there are 150,000 to 200,000 video items that are designed for movie theatres, and half of these are Indian! Many are locked up in copyright, and are problematic. The Internet Archive has 1,000 of these (out of copyright or otherwise permitted). There are other types of materials that people want to see: thousands of archival films, advertisements, training films and government films, being downloaded in the millions. Brewster also put out a call to academics at the conference to put their lectures online in bulk at the Internet Archive. It costs $15 per video hour for digitisation services. Brewster estimates that there are 400 “original” television channels (ignoring duplicate rebroadcasts). If you record a television channel for one year, it requires 10 TB, with a cost of $20,000 for that year. The Television Archive people at the Internet Archive have been recording 20 channels from around the world since 2000 (it’s currently about 1 PB in size) – that’s 1 million hours of TV – but not much has been made available just yet (apart from video from the week of 9/11). The Internet Archive currently has 55,000 videos in 100 collections.
  • Software was next. For example, a good archival source is old software that can be reused / replayed via virtual machines or emulators. Brewster came out against the Digital Millennium Copyright Act, which is “horrible for libraries” and for the publishing industry.
  • The Internet Archive is best known for archiving web pages. It started in 1996, by taking a snapshot of every accessible page on a website. It is now about 2 PB in size, with over 100 billion pages. Most people use this service to find their old materials again, since most people “don’t keep their own materials very well”. (Incidentally, Yahoo! came to the Internet Archive to get a 10-year-old version of their own homepage.)
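For anyone who wants to check the back-of-envelope numbers from the talk, here is a rough sketch in Python. The per-book and per-channel figures are the ones Brewster quoted; the decimal units (1 TB = 1,000,000 MB), the seven-year recording span (2000 to 2007) and the function names are my own assumptions for illustration.

```python
# Rough check of the storage estimates quoted in the talk.
# Assumptions (not from the talk): decimal units, 24x7 recording,
# and a seven-year span (2000 to 2007) for the television archive.

MB_PER_TB = 1_000_000
TB_PER_PB = 1_000
HOURS_PER_YEAR = 24 * 365


def library_size_tb(num_books: int, mb_per_book: float = 1.0) -> float:
    """Text-only size of a book collection in TB (1 book = 1 MB in the talk)."""
    return num_books * mb_per_book / MB_PER_TB


def tv_archive(channels: int, years: int, tb_per_channel_year: float = 10.0):
    """Return (size in PB, hours recorded) for continuously recorded channels."""
    size_pb = channels * years * tb_per_channel_year / TB_PER_PB
    hours = channels * years * HOURS_PER_YEAR
    return size_pb, hours


if __name__ == "__main__":
    print("Library of Congress (TB):", library_size_tb(26_000_000))
    size_pb, hours = tv_archive(channels=20, years=7)
    print("TV archive (PB):", size_pb, "hours:", hours)
```

Under these assumptions the television archive comes out at roughly 1.4 PB and 1.2 million hours, which is in the same ballpark as the "about 1 PB" and "1 million hours" mentioned in the talk.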

Brewster then talked about preservation issues, i.e., how to keep the materials available. He referenced the famous library at Alexandria, Egypt, which unfortunately is best known for burning. Libraries also tend to be burned by governments due to changes in policies and interests, so the computer world's solution to this is backups. The Internet Archive in San Francisco has four employees and 1 PB of storage (including the power bill, bandwidth and people costs, their total costs are about $3,000,000 per year; 6 GB bandwidth is used per second; their storage hardware costs $700,000 for 1 PB). They have a backup of their book and web materials in Alexandria, and also store audio material at the European Archive in Amsterdam. Also, their Open Content Alliance initiative allows various people and organisations to come together to create joint collections for all to use.

Access was the next topic of his presentation. Search is making in-roads in terms of time-based search. One can see how words and their usage change over time (e.g., “marine life”). Semantic Web applications for access can help people to deal with the onslaught of information. There is a huge need to take large related subsets of the Internet Archive collections and to help them make sense for people. Great work has been done recently on wikis and search, but there is a need to “add something more to the mix” to bring structure to this project. To do this, Brewster reckons we need the ease of access and authoring from the wiki world, but also ways to incorporate the structure that we all know is in there, so that it can be flexible enough for people to add structure one item at a time or to have computers help with this task.

In the recent initiative ““, the idea is to build one webpage for every book ever published (not just ones still for sale) to include content, metadata, reviews, etc. The relevant concepts in this project include: creating Semantic Web concepts for authors, works and entities; having wiki-editable data and templates; using a tuple-based database with history; making it all open source (both the data and the code, in Python). has 10 million book records, with 250k in full text.
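To give a feel for what a "tuple-based database with history" might look like, here is a minimal sketch in Python. This is purely my own illustration and not the project's actual schema or code: the `TupleStore` class, its method names and the `book/1` identifier are all hypothetical. The idea is simply that every edit is appended as a (revision, subject, key, value) tuple, so older versions of a record are never lost.

```python
from dataclasses import dataclass, field
from typing import Any, List, Optional, Tuple


@dataclass
class TupleStore:
    """Hypothetical append-only store of (revision, subject, key, value) tuples."""
    log: List[Tuple[int, str, str, Any]] = field(default_factory=list)
    revision: int = 0

    def put(self, subject: str, key: str, value: Any) -> int:
        """Record a new value; earlier values stay in the log as history."""
        self.revision += 1
        self.log.append((self.revision, subject, key, value))
        return self.revision

    def get(self, subject: str, key: str, at_revision: Optional[int] = None) -> Any:
        """Return the value as of a given revision (latest if omitted)."""
        limit = self.revision if at_revision is None else at_revision
        result = None
        for rev, s, k, v in self.log:
            if rev > limit:
                break  # the log is ordered by revision
            if s == subject and k == key:
                result = v
        return result

    def history(self, subject: str, key: str) -> List[Tuple[int, Any]]:
        """Return every (revision, value) ever recorded for this key."""
        return [(rev, v) for rev, s, k, v in self.log if s == subject and k == key]


if __name__ == "__main__":
    store = TupleStore()
    store.put("book/1", "title", "Weaving teh Web")   # wiki-style edit with a typo
    store.put("book/1", "author", "Tim Berners-Lee")
    store.put("book/1", "title", "Weaving the Web")   # a later correction
    print(store.get("book/1", "title"))
    print(store.history("book/1", "title"))
```

Because nothing is overwritten, a wiki-style edit can always be inspected or rolled back by reading the record at an earlier revision.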

I really enjoyed this talk, and having been a fan of the Wayback Machine for many years, I think there could be an interesting link to the SIOC Project if we think in terms of archiving people’s conversations from the Web, mailing lists and discussion groups for reuse by us and the generations to come.

Latest news from Patrick Tilley, author of "The Amtrak Wars"

Got a very interesting comment from Patrick Tilley, author of “The Amtrak Wars” amongst other things, which I am re-quoting below for a wider audience:

Just came across your website and wanted to thank you for your sympathetic review of The Amtrak Wars – and all the readers who posted comments.

You might call this a blast from the past. To end speculation, I am not dead and am still writing. True, this July I will turn 79 but the brain still works and, as long as there is breath in my body, Book Seven is still a runner – along with Eight and Nine.

The original scheme was for 12 books but to my great regret I let things slide. Life intervened… “Events, dear boy,” as Macmillan reportedly said to explain why things didn’t go as planned during his premiership.

I think three books will see me out – and make everybody happy.

Warmest regards to the fans of Amtrak everywhere – Patrick Tilley
(Him wot rote ’em all…)

Great to hear from you Patrick, and I’m looking forward to this, as I’m sure many others are! (For those of you who haven’t read “The Amtrak Wars”, I have an old review. This site was named “Cloudlands” in June 1997 after a location in Amtrak, and my online nickname “Cloud-Warrior” or “Cloud” for short comes from a character in the series.)

Read Tim Berners-Lee's "Weaving the Web"

Perhaps it was the fact that we are meeting Sir Tim Berners-Lee soon that at last prompted me to read “Weaving the Web” (combined with the fact that a plane journey is a good time for reading), but in any case I managed to read this book on my flight from Shannon to Boston on Tuesday.

It’s a wonderful story of how the visionary efforts of one and then a few like-minded souls can leverage many, many others towards an amazing vision in a relatively short period of time. The Web is still just a teenager, but it must be marked in dog years squared or something because it is so much more than I’m sure even Tim himself could have imagined it would be 17 years ago.

I am usually a fairly slow reader, but I just seemed to get into a flow reading this book and during that I marked out some parts of interest to today’s Web and some of the work that I am involved in. There’s also a lot of prescient stuff, for example, the browser / editor systems he describes are like today’s blogs, the annotation services are like, and the collaborative tools are our wikis. I’d like to quote / paraphrase some parts and comment briefly on them (maybe this will be interesting for you readers, maybe not). [Let me also say that while these are just some things that personally interest me, if they resonate with you I’d advise you to get the book and find out what other ideas may form…]

Connecting disparate things, like discussions and pages

“I had to show how this system could integrate very disparate things, so I provided an example of an Internet newsgroup message, and a page from my old Enquire program.”

“All [W3C] mail is instantly archived to the Web with a persistent URI.”

Years later and this is still so relevant. It’s an aim of SIOC to integrate data from Usenet newsgroups and mailing lists and of course other discussions with any other relevant pages on the Web. And some more related thoughts, if you can imagine full histories of communities of interest and their discussions being semantically described…

“When new people joined a group they would have the legacy of decisions and reasons available for inspection. When people left the group their work would already have been captured and integrated. As an exciting bonus, machine analysis of the web of knowledge could perhaps allow the participants to draw conclusions about management and organisation of their collective activity that they would not otherwise have elucidated.”

And by describing exactly who says what (expert finding), another problem is solved:

DOT ie by Alex French

I bought DOT ie by Alex French, Mercer Press last week. Was happy of course to see a page about with a nice screenshot from the main page of the site 🙂 (Frozen in time at the top of the latest posts list in the screenshot is rymus on the “Boards Beers Cork 4” thread in the “Cork City” forum.)

It’s a good little book, with chapters on “Getting Online”, “The Basics of Web Browsing and Email”, “Advanced Topics”, and “Protecting Yourself Online”. The part I found most useful was chapter 5, the “Web Directory”, which has sites of interest to newbies or experienced users of the Web in Ireland. Some of these I already knew, and some I now want to check out…