Category Archives: US


Open government and Linked Data; now it's time to draft…

For the past few months, there have been a variety of calls for feedback and suggestions on how the US Government can move towards becoming more open and transparent, especially in terms of their dealings with citizens and also for disseminating information about their recent financial stimulus package.

As part of this, the National Dialogue forum was set up to solicit solutions for ways of monitoring the “expenditure and use of recovery funds”. Tim Berners-Lee wrote a proposal on how linked open data could provide semantically-rich, linkable and reusable data from Recovery.gov. I also blogged about this recently, detailing some ideas for how discussions by citizens on the various uses of expenditure (represented using SIOC and FOAF) could be linked together with financial grant information (in custom vocabularies).
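
As a rough illustration of that idea, here is a minimal sketch (in Python, using rdflib) of how a citizen's forum post described with SIOC and FOAF could be linked to a grant described in a custom vocabulary. The SIOC and FOAF namespaces and terms are real; the grant vocabulary and all of the URIs are invented purely for the example.

    # A sketch only: links a SIOC post and its FOAF author to a hypothetical grant resource.
    from rdflib import Graph, Literal, Namespace, RDF, URIRef

    SIOC = Namespace("http://rdfs.org/sioc/ns#")
    FOAF = Namespace("http://xmlns.com/foaf/0.1/")
    GRANT = Namespace("http://example.org/recovery/grant#")   # hypothetical vocabulary

    g = Graph()
    g.bind("sioc", SIOC)
    g.bind("foaf", FOAF)
    g.bind("grant", GRANT)

    post = URIRef("http://example.org/forum/post/123")          # hypothetical URIs
    person = URIRef("http://example.org/people/jane#me")
    grant = URIRef("http://example.org/recovery/grants/2009-0042")

    g.add((post, RDF.type, SIOC.Post))
    g.add((post, SIOC.has_creator, person))
    g.add((person, RDF.type, FOAF.Person))
    g.add((person, FOAF.name, Literal("Jane Citizen")))
    g.add((post, SIOC.topic, grant))                # the post discusses this grant
    g.add((grant, RDF.type, GRANT.RecoveryGrant))
    g.add((grant, GRANT.amountUSD, Literal(250000)))

    print(g.serialize(format="turtle"))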

More recently, the Open Government Initiative solicited ideas for a government that is “more transparent, participatory, and collaborative”, and the brainstorming and discussion phases have just ended. This process is now in its third phase, where the ideas proposed to solve various challenges are to be more formally drafted in a collaborative manner.

What is surprising about this is how few submissions and contributions have been put into this third and final phase (see graph below), especially considering that there is only one week for this to be completed. Some topics have zero submissions, e.g. “Data Transparency via Data.gov: Putting More Data Online”.

[Image: graph of submissions and contributions across the phases of the Open Government Initiative]

This doesn’t mean that people aren’t still thinking about this. On Monday, Tim Berners-Lee published a personal draft document entitled “Putting Government Data Online”. But we need more contributions from the Linked Data community to the drafts during phase three of the Open Government Directive if we truly believe that this solution can make a difference.

For those who want to learn more about Linked Data, click on the image below to go to Tim Berners-Lee’s TED talk on Linked Data.

(I watched it again today, and added a little speech bubble to the image below to express my delight at seeing SIOC profiles on the Linked Open Data cloud slide.)

We also have a recently-established Linked Data Research Centre at DERI in NUI Galway.

[Image: the Linked Open Data cloud slide from Tim Berners-Lee’s TED talk, with a speech bubble added]


Nova Spivack visits DERI, NUI Galway and talks about Twine: Radar Networks' semantic social software product in beta

In association with the IT Association of Galway, DERI recently invited Radar Networks’ Nova Spivack to speak at our research institute in the National University of Ireland, Galway (Nova also gave a keynote talk at BlogTalk 2008 in Cork).

Nova is CEO of one of the companies that is practically applying Semantic Web technologies to social software applications. Radar have a beta product called Twine, a “knowledge networking” application that allows users to share, organise, and find information with people they trust. People create and join “twines” (community containers) around certain topics of interest, and items (documents, bookmarks, media files, etc., which can be commented on) are posted to these twines through a variety of methods. The seminar room was full of both “DERIzens” and members of Galway’s IT community for Nova’s talk on the Semantic Web and Twine (see his slides here), and a lengthy question-and-answer session was followed by some presentations to Nova on ongoing research work at DERI.

I personally find Twine very interesting, and as well as using it to gather information about SIOC, I intend to use it to gather and publish personal interests that I think will be of wider interest (once it leaves beta). As well as producing semantic data (just stick “?rdf” onto the end of any twine.com URL), Twine features some cool functionality that elevates it beyond the social bookmarking sites to which it has been compared, including an extensive choice of twineable item types, twined item customisation (“add detail”) and the “e-mail to a twine” feature, all of which I believe are extremely useful. (I have a few Twine invites left for readers of my blog; drop me an e-mail if you need one.)
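
To make the “?rdf” trick concrete, here is a minimal sketch (Python with rdflib) of how one might fetch and inspect the RDF behind a twine. Twine is a live beta service, so the example twine URL below is hypothetical, and the assumption that the data is served as RDF/XML is mine.

    # Sketch only: the twine URL is hypothetical, and RDF/XML is assumed.
    from rdflib import Graph

    graph = Graph()
    # Append "?rdf" to a twine.com URL to request its semantic data.
    graph.parse("http://www.twine.com/twine/semantic-web?rdf", format="xml")

    # Print every triple the twine exposes.
    for subject, predicate, obj in graph:
        print(subject, predicate, obj)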

There are also the community aspects of twines. I foresee that these twines will act as the “social objects” (see presentation by Jyri) that will draw you back to the service, in a much stronger manner than other social bookmarking sites currently do (due to Twine’s more viral nature, its stronger social networking functionality, better commenting, and a more identifiable “home” for these objects). Of course, having more public users will help, but from experience I know that it is a good idea to build on a core group of regular users (in Twine’s case, mainly techies) before increasing the user base too much.

It’s been an exciting few months in terms of announcements relating to commercial Semantic Web applications. As I mentioned recently in an interview with Rob Cawte for the web2.0japan.com blog, this is becoming obvious with the attention being given to startup companies in this space like Powerset, Metaweb (Freebase) and Radar Networks (Twine), and also since many big companies including Reuters (Calais API), Yahoo! (semantically-enhanced search) and Google (Social Graph API) have recently announced what they are doing with semantic data. There has been a lot of talk recently about the social graph (notably from Google’s Brad Fitzpatrick), which looks at how people are connected together (friends, colleagues, neighbours, etc.), and how such connections can be leveraged across websites. On the Semantic Web with vocabularies like FOAF, SIOC, etc., it is not just people who are connected together in some meaningful way, but documents, events, places, hobbies, pictures, you name it! And it is the commercial applications that exploit these connections that are now becoming interesting…

(Edit: Nova Spivack has blogged about his visit.)

Keynote speakers lined up for BlogTalk

I’m happy to announce that we have four interesting and varied keynote speakers lined up for the BlogTalk 2008 conference on social software in Cork this March.

  • Nova Spivack – Founder and CEO, Radar Networks
    Nova is the entrepreneur behind the Twine “knowledge networking” application, which allows users to share, organise, and find information with people they trust. He will talk about semantic social software for consumers.
  • Rashmi Sinha – Founder, Uzanto
    Rashmi led the team that produced SlideShare, a popular presentation-sharing service that some have described as “YouTube for PowerPoint”. She will talk about lessons learned from designing social software applications.
  • Salim Ismail – Head of Brickhouse, Yahoo!
    Salim is a successful investor and entrepreneur, with expertise in a variety of early-stage startups and Web 2.0 companies including Confabb and PubSub. He will talk about entrepreneurship and social media.
  • The final keynote speaker has been selected but has yet to be 100% confirmed.

You can see further details and longer biographies of the keynote speakers at 2008.blogtalk.net/invitedspeakers. We will also have two invited panel sessions, the details of which will be announced shortly.

Videos of the "Paddy's Valley" pitches to VCs

Well done to the Paddy’s Valley Irish entrepreneurs who gave great pitches to venture capitalists in the Bay Area. And thanks to Damien for the videos. Here are the links.

There are also some “behind the scenes” vodcasts at paddyvalley.blip.tv and an associated social network at paddysvalley.ning.com.

Web 2.0 Expo Tokyo: Evan Williams, co-founder of Twitter – “In conversation with Tim O’Reilly”

The first talk of the day was a conversation between Tim O’Reilly and Evan Williams.

Evan started off by forming a company in his home state of Nebraska, then moved to work for O’Reilly Media for nine months, but says he never liked working for other people. A little later on he formed Pyra, and after a year Blogger (launched in 1999) had become its main focus. They ran out of money in the dot-com bust, had some dark times, and he had to lay off a team of seven in 2000. He kept it alive for another year and built it back up. Evan then started talks with Google and sold Blogger to them in 2003, continuing to run Blogger at Google for two years. He eventually left Google anyway; he says it was partially because of his own personality (again, working for other people), and also because within Google, Blogger was a small fish in a big pond. Part of the reason for selling to Google in the first place was that he had respect for them, it was a good working environment, and they would provide a stable platform for Blogger to grow (eventually without Evan). But in the end, he felt that he’d be happier and more effective outside Google.

So he then went on to start Odeo at Obvious Corp. Because of timing and the fact that they got a lot of attention, they raised a lot of money very easily. He ran Odeo as it was for a year and a half. With Jack Dorsey at Odeo / Obvious, they began the Twitter project. Eventually Evan bought out his investors when he realised Odeo had possibly gotten it wrong as it just didn’t feel right in its current state.

Tim asked Evan what Twitter is and what Web 2.0 trends it shows off. Evan says it’s a simple service described by many as microblogging (a single Twitter message is called a tweet): blogging based on very short updates with the focus on real-time information, “what are you doing?” Those who are interested in what someone is doing can receive updates on the Web or on their mobile. Some people call it “lifestreaming”, according to Tim. Others think it’s just lots of mundane, trivial stuff, e.g. “having toast for breakfast”. It’s interesting not so much because the content itself is interesting, but because you want to find out what someone is doing. Evan gave an example: while a colleague was pulling up dusty carpets in his house, he got a tweet from Evan saying “wine tasting in Napa”, so it’s almost a vision of an “alternative now”. Through Twitter, you can know very minute things about someone’s life: what they’re thinking, that they’re tired, etc. Historically, we have only known that kind of information about the very few people we are close to (or celebrities!).

The next question from Tim was: how do you design a service that starts off as fun but becomes really useful? A lot of people’s first reaction to Twitter is “why would I do that?”. But then people try it and find lots of other uses. It’s much the same motivation (personal expression and social connection) as other applications like blogging, according to Evan. A lot of it comes from the first users of the application. As an example, Twitter didn’t have a system allowing people to comment, so the users invented one by using the @ sign and a username (e.g., @ev) to comment on other people’s tweets (and that convention has now spread to blog comments). People are using it for conversation in ways that weren’t expected. [Personal rant here: I find the Twitter comment tracking system to be quite poor. If I check my Twitter replies and look at what someone has supposedly replied to, it’s inaccurate simply because there is no direct link between a microblog post and a reply. It seems to assume by default that the recipient’s “previous tweet by time” is what a tweet sender is referring to, even when they aren’t referring to anything at all but are just beginning a new thread of discussion with someone else using the @ convention.]
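
To illustrate the threading problem in that rant, here is a small, purely illustrative Python sketch contrasting a “previous tweet by time” guess with an explicit in_reply_to link. The field names and toy data model are mine, not Twitter’s.

    # Toy comparison of time-based reply guessing vs. an explicit reply link.
    import re
    from dataclasses import dataclass
    from typing import Optional

    @dataclass
    class Tweet:
        id: int
        author: str
        text: str
        in_reply_to: Optional[int] = None   # explicit link; absent in the heuristic model

    MENTION = re.compile(r"@(\w+)")

    def heuristic_parent(tweet, timeline):
        """Guess the parent as the mentioned user's most recent earlier tweet."""
        mentioned = MENTION.findall(tweet.text)
        if not mentioned:
            return None
        candidates = [t for t in timeline if t.author == mentioned[0] and t.id < tweet.id]
        return candidates[-1] if candidates else None

    def explicit_parent(tweet, timeline):
        """Resolve the parent from an explicit in_reply_to id, if present."""
        by_id = {t.id: t for t in timeline}
        return by_id.get(tweet.in_reply_to)

    timeline = [
        Tweet(1, "ev", "wine tasting in Napa"),
        Tweet(2, "ev", "heading home"),
        Tweet(3, "john", "@ev how was Napa?", in_reply_to=1),
    ]
    print(heuristic_parent(timeline[2], timeline).text)  # "heading home" (wrong guess)
    print(explicit_parent(timeline[2], timeline).text)   # "wine tasting in Napa"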

Tim said that the team did a lot for Twitter in terms of usability by offering an API that enabled services like Twittervision. Evan said that their API has been surprisingly successful: there are at least a dozen desktop applications, others that extract data and present it in different ways, various bots that post information to Twitter (URLs, news, weather, etc.), and more recently a timer application that will send a message at a certain time in the future as a reminder (e.g., via the SMS gateway). The key thing with the API is to build a simple service and make it reusable by other applications.
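
To give a flavour of how little code such a bot needs, here is a hedged Python sketch against the statuses/update endpoint with HTTP Basic authentication that Twitter’s REST API offers at the moment. The bot name, credentials and weather text are made up, and this is only an illustration of the kind of bot described above, not any particular one.

    # Illustrative only: hypothetical bot posting a short status update.
    import requests

    def post_weather_update(username, password, text):
        """Post a short status update, e.g. a weather summary, to Twitter."""
        resp = requests.post(
            "http://twitter.com/statuses/update.json",
            data={"status": text[:140]},     # tweets are limited to 140 characters
            auth=(username, password),       # HTTP Basic auth, as the API currently uses
            timeout=10,
        )
        resp.raise_for_status()
        return resp.json()

    # post_weather_update("weatherbot", "secret", "Galway: 12°C, light rain")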

Right now, Twitter doesn’t have a business model: a luxury at this time, since money is plentiful. At some point, Tim said they may have to be acquired by someone who sees a model or feels that they need this feature as part of their offering. Evan said they are going to explore this very soon, but right now they are focussed on building value. A real-time communication network used by millions of people multiple times a day is very valuable, but there is quite a bit of commercial use of Twitter, e.g., Woot (the single special offer item per day site) have a lot of followers on Twitter. It may be in the future that “for this class of use, you have to pay, but for everyone else it’s free”.

20% of Twitter users are in Japan, but they haven’t internationalised the application apart from having double-byte support. Evan says they want to do more, but they are still a small team.

Tim then asked how important it is to have rapid application development for systems like Twitter (which is based on Ruby on Rails). Most of Google’s applications are in Java, C++ and Python, and Evan came out of Google wanting to use a lightweight framework for this kind of development, since there’s a lot of trial and error in creating Web 2.0 applications. With Rails, there are challenges to scaling, and since Twitter is one of the largest Rails applications, there are a lot of problems that have yet to be solved. Twitter’s developers talk to 37signals a lot (and to other developers in the Rails community); incidentally, one of Twitter’s developers has Rails commit privileges.

Tim says there’s a close tie between open source software and Web 2.0. Apparently, it took two weeks to build the first functional prototype of Twitter. There has been a huge change in development practice related to Web 2.0. A key part of Web 2.0 is a willingness to fail, since people may not like certain things in a prototype version. One can’t commit everything to a single proposition, but on the flip side, sometimes you may need to persist (e.g., in the case of Blogger, if you believe in your creation and it seems that people like it).

So, that was it. It was an interesting talk, giving an insight into the experiences of a serial Web 2.0 entrepreneur (of four, or was it five, companies). I didn’t learn anything new about Twitter itself or about what they hope to add to their service in the future (apart from the aforementioned commercial opportunities), but it’s great to have people like Evan who seem to have an intuitive grasp of what people find useful in Web 2.0 applications.

Brewster Kahle's (Internet Archive) ISWC talk on worldwide distributed knowledge

Universal access to all knowledge can be one of our greatest achievements.

The keynote speech at ISWC 2007 was given this morning by Brewster Kahle, co-founder of the Internet Archive and also of Alexa Internet. Brewster’s talk discussed the challenges in putting various types of media online, from books to video:

  • He started by talking about digitising books (1 book = 1 MB; the Library of Congress = 26 million books = 26 TB; with images, somewhat larger). At present, it costs about $30 to scan a book in the US. For 10 cents a page, books or microfilm can now be scanned at various centres around the States and put online. 250,000 books have been scanned so far and are held in eight online collections. He also talked about making books available to people through the OLPC project. Still, most people like having printed books, so bookmobiles for print-on-demand books are now coming: a bookmobile charges just $1 to print and bind a short book.
  • Next up was audio, and Brewster discussed issues related to putting recorded sound works online. At most, there are two to three million discs that have been commercially distributed, and the biggest issue with these is rights. Rock ‘n’ roll concerts are the most popular category of the Internet Archive’s audio files (with 40,000 concerts so far); the Internet Archive offers bands its hosting service (“unlimited storage, unlimited bandwidth, forever, for free”) if they waive any issues with rights. There are various cultural materials that do not work well in terms of record sales, but many people are very interested in having them published online. Audio costs about $10 per disc (per hour) to digitise. The Internet Archive has 100,000 audio items in 100 collections.
  • Moving images or video was next. Most people think of Hollywood films in relation to video, but at most there are 150,000 to 200,000 video items that were designed for movie theatres, and half of these are Indian! Many are locked up in copyright, and are problematic. The Internet Archive has 1,000 of these (out of copyright or otherwise permitted). There are other types of material that people want to see: thousands of archival films, advertisements, training films and government films, which are being downloaded in the millions. Brewster also put out a call to academics at the conference to put their lectures online in bulk at the Internet Archive. It costs $15 per video hour for digitisation services. Brewster estimates that there are about 400 “original” television channels (ignoring duplicate rebroadcasts). Recording a television channel for one year requires 10 TB and costs about $20,000. The Television Archive people at the Internet Archive have been recording 20 channels from around the world since 2000 (it’s currently about 1 PB in size) – that’s 1 million hours of TV – but not much has been made available just yet (apart from video from the week of 9/11). The Internet Archive currently has 55,000 videos in 100 collections. (A quick back-of-envelope check of these storage figures appears after this list.)
  • Software was next: old software is a good archival source, as it can be reused / replayed via virtual machines or emulators. Brewster came out against the Digital Millennium Copyright Act, which is “horrible for libraries” and for the publishing industry.
  • The Internet Archive is best known for archiving web pages. It started in 1996 by taking snapshots of every accessible page on each website. It is now about 2 PB in size, with over 100 billion pages. Most people use this service to find their old materials again, since most people “don’t keep their own materials very well”. (Incidentally, Yahoo! came to the Internet Archive to get a 10-year-old version of their own homepage.)
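
The storage figures quoted in the list above hold up to a quick back-of-envelope check; the sketch below simply re-does that arithmetic in Python, using the rough numbers from the talk.

    # Rough arithmetic only, based on the figures quoted in the talk.
    MB_PER_TB = 1_000_000

    # Books: 26 million books at roughly 1 MB each.
    library_of_congress_tb = 26_000_000 * 1 / MB_PER_TB
    print(library_of_congress_tb)            # ~26 TB, as quoted

    # Television: 10 TB per channel per year, 20 channels recorded since 2000.
    channels, years, tb_per_channel_year = 20, 7, 10
    print(channels * years * tb_per_channel_year / 1000)   # ~1.4 PB, i.e. "about 1 PB"

    # Hours of TV: 20 channels, 7 years, 24 hours a day.
    print(channels * years * 365 * 24)       # ~1.2 million hours, i.e. "1 million hours"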

Brewster then talked about preservation issues, i.e., how to keep the materials available. He referenced the famous library at Alexandria, Egypt, which unfortunately is best known for having burned. Libraries also tend to be burned by governments due to changes in policies and interests, so the computer-world solution to this is backups. The Internet Archive in San Francisco has four employees and 1 PB of storage (including power, bandwidth and people, their total costs are about $3,000,000 per year; about 6 GB of bandwidth is used per second; and their storage hardware costs $700,000 per PB). They have a backup of their book and web materials in Alexandria, and also store audio material at the European Archive in Amsterdam. Their Open Content Alliance initiative also allows various people and organisations to come together to create joint collections for all to use.
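
Those operating figures also reduce to some simple per-terabyte numbers; again, this is just illustrative arithmetic on the costs quoted above.

    # Rough arithmetic on the preservation costs quoted above.
    total_cost_per_year = 3_000_000    # dollars, including power, bandwidth and people
    storage_tb = 1000                  # 1 PB of storage
    print(total_cost_per_year / storage_tb)   # ~$3,000 per TB per year, all-in

    hardware_cost_per_pb = 700_000
    print(hardware_cost_per_pb / storage_tb)  # ~$700 per TB of raw storage hardware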

Access was the next topic of his presentation. Search is making inroads in terms of time-based search: one can see how words and their usage change over time (e.g., “marine life”). Semantic Web applications for access can help people deal with the onslaught of information. There is a huge need to take large related subsets of the Internet Archive collections and help people make sense of them. Great work has been done recently on wikis and search, but there is a need to “add something more to the mix” to bring structure to this project. To do this, Brewster reckons we need the ease of access and authoring of the wiki world, but also ways to incorporate the structure that we all know is in there, flexible enough for people to add structure one item at a time or for computers to help with the task.

In the recent “OpenLibrary.org” initiative, the idea is to build one web page for every book ever published (not just those still for sale), including content, metadata, reviews, etc. The relevant concepts in this project include: creating Semantic Web concepts for authors, works and entities; having wiki-editable data and templates; using a tuple-based database with history; and making it all open source (both the data and the code, in Python). OpenLibrary.org has 10 million book records, with 250k available in full text.
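
As a toy illustration of the “tuple-based database with history” idea (and nothing more than that; this is not OpenLibrary’s actual schema), here is a Python sketch where every edit appends a new revision instead of overwriting the old value.

    # Toy append-only tuple store: rows of (key, property, value, revision).
    from dataclasses import dataclass, field
    from typing import Any, List, Tuple

    @dataclass
    class VersionedStore:
        rows: List[Tuple[str, str, Any, int]] = field(default_factory=list)

        def set(self, key: str, prop: str, value: Any) -> None:
            # Each edit gets the next revision number for that (key, property) pair.
            rev = 1 + max((r[3] for r in self.rows if r[0] == key and r[1] == prop), default=0)
            self.rows.append((key, prop, value, rev))

        def get(self, key: str, prop: str) -> Any:
            matches = [r for r in self.rows if r[0] == key and r[1] == prop]
            return matches[-1][2] if matches else None

        def history(self, key: str, prop: str) -> List[Tuple[int, Any]]:
            return [(r[3], r[2]) for r in self.rows if r[0] == key and r[1] == prop]

    store = VersionedStore()
    store.set("/books/OL1M", "title", "Draft title")
    store.set("/books/OL1M", "title", "Corrected title")
    print(store.get("/books/OL1M", "title"))      # latest value: "Corrected title"
    print(store.history("/books/OL1M", "title"))  # [(1, 'Draft title'), (2, 'Corrected title')]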

I really enjoyed this talk, and having been a fan of the Wayback Machine for many years, I think there could be an interesting link to the SIOC Project if we think in terms of archiving people’s conversations from the Web, mailing lists and discussion groups for reuse by us and the generations to come.