Category Archives: Data Portability

BlogTalk 2009 (6th International Social Software Conference) – Call for Proposals – September 1st and 2nd – Jeju, Korea


BlogTalk 2009
The 6th International Conf. on Social Software
September 1st and 2nd, 2009
Jeju Island, Korea


Following the international success of the last five BlogTalk events, the next BlogTalk – to be held in Jeju Island, Korea on September 1st and 2nd, 2009 – is continuing with its focus on social software, while remaining committed to the diverse cultures, practices and tools of our emerging networked society. The conference (which this year will be co-located with Lift Asia 09) is designed to maintain a sustainable dialog between developers, innovative academics and scholars who study social software and social media, practitioners and administrators in corporate and educational settings, and other general members of the social software and social media communities.

We invite you to submit a proposal for presentation at the BlogTalk 2009 conference. Possible areas include, but are not limited to:

  • Forms and consequences of emerging social software practices
  • Social software in enterprise and educational environments
  • The political impact of social software and social media
  • Applications, prototypes, concepts and standards

Participants and proposal categories

Due to the interdisciplinary nature of the conference, audiences will come from different fields of practice and will have different professional backgrounds. We strongly encourage proposals to bridge these cultural differences and to be understandable for all groups alike. Along those lines, we will offer three different submission categories:

  • Academic
  • Developer
  • Practitioner

For academics, BlogTalk is an ideal conference for presenting and exchanging research work from current and future social software projects at an international level. For developers, the conference is a great opportunity to fly ideas, visions and prototypes in front of a distinguished audience of peers, to discuss, to link-up and to learn (developers may choose to give a practical demonstration rather than a formal presentation if they so wish). For practitioners, this is a venue to discuss use cases for social software and social media, and to report on any results you may have with like-minded individuals.

Submitting your proposals

You must submit a one-page abstract of the work you intend to present for review purposes (not to exceed 600 words). Please upload your submission along with some personal information using the EasyChair conference area for BlogTalk 2009. You will receive a confirmation of the arrival of your submission immediately. The submission deadline is June 27th, 2009.

Following notification of acceptance, you will be invited to submit a short or long paper (four or eight pages respectively) for the conference proceedings. BlogTalk is a peer-reviewed conference.

Timeline and important dates

  • One-page abstract submission deadline: June 27th, 2009
  • Notification of acceptance or rejection: July 13th, 2009
  • Full paper submission deadline: August 27th, 2009

(Due to the tight schedule we expect that there will be no deadline extension. As with previous BlogTalk conferences, we will work hard to endow a fund for supporting travel costs. As soon as we review all of the papers we will be able to announce more details.)



"The Social Semantic Web": now available to pre-order from Springer and Amazon

Our forthcoming book entitled “The Social Semantic Web”, to be published by Springer in Autumn 2009, is now available to pre-order from both Springer and Amazon.


An accompanying website for the book will be at

Prize winners visualise Irish online life in the SIOC Data Competition

The winners of the SIOC (pronounced “shock”) data competition being run by DERI at the National University of Ireland, Galway have been announced. The competition ran from September to October 2008, and the brief was to produce an interesting creation based on a data set of discussion posts reflecting ten years of Irish online life from Ireland’s largest community website. The competition had about sixty registrants and there were eight final submissions of very high quality.

First prize

The top winning submission was entitled “SIOC.ME: A Real-Time Interactive Visualisation of Semantic Data within a 3-D Space”. The entry illustrates how 3-D visualisations may be harnessed not only to provide an interactive means of presenting or browsing data but also to create useful data analysis tools, especially for manipulating the “semantic” (meaningful) data from online communities and social networking sites. The entry was submitted by Darren Geraghty, a user interface and interaction designer, and it was praised by the judges for the huge amount of effort that went into creating it. A video of the application may be viewed here and a demonstration of the tool can be seen at

Second prize

In second place was a visualisation application called “boardsview” by Stephen Dolan of Trinity College Dublin. This is an interactive, real-time animation where one can watch the historical content from many discussion forums changing in real or compressed time. In this application, you can zoom into a particular forum to see individual users posting messages or to see threads being created and destroyed.

Third prize

Third prize was awarded to the “Forum Activity Graph” by Drew Perttula from California. This entry was a visualisation showing the popularity of forums on the site as coloured rivers of information, which were then rendered and displayed using Google Maps.

Other final submissions included:

  • “Forum Map Demonstration” by Tristan Webb and Ian Dickinson of HP Labs Bristol, a demonstration of self-organising maps applied to an information navigation problem in a big community site,
  • “WebThere: Semantic APML Profiles” by Brian MacKay from Pennsylvania, a service for creating and maintaining profiles of user interests and attention preferences in social websites,
  • “Find Something Interesting” by ITT Dublin’s Alexandra Roshchina and Aleksey Kharkov, an application to provide recommendations of the most interesting posts and threads based on interest-matching and graph-mining techniques,
  • “ChartBoards” by Martin Harrigan of TCD, a tool for examining community trends via term frequencies, and,
  • “Visualising the Community Culture with Charts” by Eoin McLoughlin of TCD, where various graph types were used to simplify the huge amount of available community data to something that could allow someone to easily grasp its size and depth.

The competition was judged by an independent panel of three experts: Ian Davis, Chief Technology Officer with Talis; Harry Halpin, researcher at the University of Edinburgh and chair of the W3C GRDDL working group; and Peter Mika, researcher at Yahoo! Research Barcelona and author of the book “Social Networks and the Semantic Web”. The first prize is an Amazon voucher for $4000; second prize is a voucher for $2000; third prize is a voucher for $1000.

"The distributed social web"

I read an interesting Gartner talk summary by Ross Dawson about the distributed social web, via another blog post by Chris Saad. Building blocks like OpenID, OAuth and microformats are mentioned in both posts, and I wanted to pipe up on behalf of the Semantic Web (if I may)…

A distributed social web is one of the ultimate goals of projects like FOAF and SIOC. Both FOAF and SIOC have recently been listed by Yahoo! SearchMonkey as recommended vocabularies (FOAF for personal profiles and social networks and SIOC for blogs, discussion forums and Q&A sites). Ross, if you like this topic, then you’ll probably love ideas like SMOB (Semantic Microblogging), where people can keep their microblog entries in their own space and then push them to as many Twitter-like aggregation services as they want. See my post on this here.

Also, here’s a slidedeck about SIOC for the uninitiated:


Tales from the SIOC-o-sphere #8

It’s time for another installment from the world of SIOC!



If you wish to contribute to the next article, join the SIOC Twine and use the tag “siocosphere9” when you add items.

My week in California

I had a nice productive week in San Jose / San Francisco last week, where I attended the Semantic Technologies Conference 2008 (SemTech 2008) and some other nearby events. SemTech 2008 had a record attendance of over 1000 people, and it was great to meet up with old friends and new (some of whom I had often conversed with online but not in real life).

  • Arriving on Sunday afternoon, Uldis, Stefan and I prepared for our SemTech 2008 tutorial. On Monday, we gave the tutorial entitled “The Future of Social Networks on the Internet: The Need for Semantics”, inspired by our IEEE Internet Computing article from last year. You can get the slides here. We talked about how a combination of FOAF and SIOC could be used to represent and interlink people and social objects within and across social websites. The tutorial was well received and we had some interesting questions afterwards…
  • On Tuesday morning, I chaired a late-breaking DataPortability interest group session, where I quizzed Chris Saad on the initiative and we had a good discussion with Daniela Barbosa, Danny Ayers, Ian Davis, Henry Story, Uldis and others. Afterwards, I attended the keynote talks by Nova Spivack and Eric Miller. You may already have seen my reports here and here respectively.
  • On Tuesday afternoon, I met with Sanjay Sabnani, CEO of CrowdGather and friend Chris. CrowdGather is a big network of medium to large message board sites that includes the huge General Mayhem community. (Disclaimer: I am on the CrowdGather Inc. board of advisors.) That evening, we met Ashely and went along to the SF Beta event (“The San Francisco Web 2.0 Mixer”), where I saw some interesting demos including Hitchsters (share taxi trips to the airport). After dinner, we had drinks with TouristR’s Conor Wade, LeFora co-founder Vinnie Lauria and friend David. Unfortunately, I was pretty much “wiped” with jet lag by then.
  • On Wednesday, I took it easy. From the lovely Hotel Kabuki in Japantown, I wandered up Fillmore to see what old breakfast haunt Galette had become (it’s now La Boulange). I skipped on to another breakfast favourite, Ella’s, and had something of a mammoth breakfast (yes, those three plates of food in the picture!) that kept me going for the day. After a stop at Kinokuniya, where I picked up the latest in the Alita: Last Order manga series, I walked on and later drove over the Golden Gate Bridge, and then headed back south again for an evening spent with family in the locality.
  • On Thursday, I attended some more SemTech 2008 talks in the morning including Steven Forth et al. from Monitor presenting about Team Learning on Semantic Mediawiki and also part of the FISHBOWL SemTech Reflections discussion session. In the afternoon, a team of us DERI researchers headed up to Radar Networks in San Francisco where we presented some of our work and brainstormed on things we could do together.

And I flew back on Friday, arriving back in Galway on Saturday. San Francisco is still a very special place to me, and I look forward to a proper family holiday there in the next year or three. Funnily enough, on Sunday I was driving behind a car with a California license plate on a Galway road – it was a long way from home!

Now, it’s catch-up time again. We’ve had a busy few weeks here in DERI what with our major funding review (which was held on-site a fortnight ago), so a lot of stuff went by the wayside (if I haven’t replied to you yet, please accept my apologies as I have a backlog of e-mail to get through and also my phone SIM card died this morning).

So what else is happening? I had an interview with Maryrose Lyons yesterday for the latest Brightspark Consulting newsletter, and today I’m correcting some exam papers that were put on a very long finger. I also got a copy of Jonathan Zittrain’s “The Future of the Internet – And How to Stop It” in the post which I’m looking forward to reading soon…

SemTech 2008: Nova Spivack (Radar Networks) – "Experience from the Cutting Edge of the Semantic Market"

Nova Spivack of Radar Networks gave a keynote talk at the 2008 Semantic Technologies Conference this morning.

He started off by giving some background to Twine. Twine is a service that lets you share what you know. When Nova pitched the original idea for the underlying platform to VCs in 2003, he was told that it was a technology in search of a problem. Thanks to DARPA and SRI, Nova had carried out some research in this field for a few years. The initial proposal to VCs was to develop next-generation personal assistants based on the Semantic Web. After the initial knock-back, Nova went out again to raise funding, and Paul Allen stepped in as the first outside angel with Vulcan Capital.

Radar started working on the first commercial version of the underlying platform and also began work on the Twine application. The platform underneath Twine is not something they’ve talked about much so far, and they will discuss it (not at this conference) in the Fall. Radar also want to allow people who are not Semantic Web-savvy to build applications that use the Semantic Web without doing any programming.

Twine was announced last October at the Web 2.0 Summit. They began the invite-only beta soon after that. The focus of Twine is interests. It’s a different type of social network. Facebook is often used for managing your relationships, LinkedIn for your career, and Twine is for your interests. He called it “interest networking” as opposed to social networking.

With Twine, you can share knowledge, track interests with feeds, carry out information management in groups or communities, build or participate in communities around your interests, and collaborate with others. The key activities are organise, share and discover.

Twine allows you to find things that might be of interest to you based on what you are doing. The key “secret sauce” is that everything in Twine is generated from an ontology. The entire site – user interface elements, sidebar, navbar, buttons, etc. – comes from an application ontology.

Similarly, the data is modelled on an ontology. Twine isn’t limited to these ontologies: Radar are beginning the process of bringing in other ontologies and using them in Twine. Later, they will allow people to make their own ontologies (e.g. to express domain-specific stuff). In the long run, this community-built infrastructure will make Twine more extensible.

Twine does natural language processing on text, mainly providing auto tagging with semantic capabilities. It has an underlying ontology with a million instances of thousands of concepts to generate these tags (right now, they are exposing just some of these). Radar are also looking at statistical analyses or clustering of related content, more of which we will see in the Fall (mainly, which people, items and interests are related to each other). For example, “here are a bunch of things that are all about movies you like”. Twine uses machine learning to create these clusters.
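
As a rough illustration of ontology-driven auto tagging, the sketch below matches concept labels from a tiny ontology against a piece of text and returns typed tags. The talk did not describe Twine’s actual NLP pipeline, so the `ONTOLOGY` dictionary, its `ex:` URIs and the `auto_tag` function are all hypothetical; plain string matching stands in for real entity recognition.

```python
import re

# Hypothetical mini-ontology: concept URI -> (concept type, surface labels).
ONTOLOGY = {
    "ex:SanFrancisco": ("Place", ["San Francisco", "SF"]),
    "ex:SemanticWeb": ("Topic", ["Semantic Web"]),
    "ex:RadarNetworks": ("Company", ["Radar Networks"]),
}


def auto_tag(text):
    """Return (concept URI, concept type) pairs whose labels occur in text."""
    tags = []
    for uri, (concept_type, labels) in ONTOLOGY.items():
        for label in labels:
            # Whole-word, case-sensitive match so "SF" does not fire inside "SFO".
            if re.search(r"\b" + re.escape(label) + r"\b", text):
                tags.append((uri, concept_type))
                break  # one matching label per concept is enough
    return tags


print(auto_tag("Radar Networks demoed Twine to the Semantic Web crowd in SF."))
```

Because each tag points at a typed concept rather than a bare string, “SF” and “San Francisco” collapse to the same `Place`; a production system with a million instances would also need disambiguation, which this sketch skips.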

Twine search also has semantic capabilities. You can filter bookmarks by the companies they are related to, or filter people by the places they are from. Underneath Twine, they have also done a lot of work on scaling.
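
Filtering by typed relationships becomes simple once items carry semantic links. A minimal sketch, assuming items are stored as subject-predicate-object triples; the item identifiers and the `ex:aboutCompany` / `ex:fromPlace` predicates are invented for illustration and are not Twine’s schema.

```python
from collections import Counter

# Hypothetical semantic links extracted from items (subject, predicate, object).
TRIPLES = [
    ("bookmark:1", "ex:aboutCompany", "ex:Google"),
    ("bookmark:2", "ex:aboutCompany", "ex:Yahoo"),
    ("bookmark:3", "ex:aboutCompany", "ex:Google"),
    ("person:alice", "ex:fromPlace", "ex:Galway"),
    ("person:bob", "ex:fromPlace", "ex:SanJose"),
]


def filter_by(predicate, value, triples=TRIPLES):
    """Subjects linked to `value` through `predicate`, e.g. bookmarks about a company."""
    return sorted(s for s, p, o in triples if p == predicate and o == value)


def facet_counts(predicate, triples=TRIPLES):
    """How many subjects point at each value: the numbers shown beside filter options."""
    return Counter(o for _, p, o in triples if p == predicate)


print(filter_by("ex:aboutCompany", "ex:Google"))  # ['bookmark:1', 'bookmark:3']
print(facet_counts("ex:aboutCompany"))
```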

Consumer prime-time launch of Twine is slated for the Fall. A good few bugs still have to be addressed, but Nova says there has been a “wonderful flowering of participation and friendships” in Twine. Many networks of like-minded people with common interests are being formed, and it is very interesting to see this take place. Nova himself has 500 contacts in Twine, and just 300 in Facebook. He now uses it as his main news source. David Lewis (the top Twiner) has 1000+ contacts in Twine. David Lewis (also at the conference) has nearly 1500 contacts in Twine.

Twine wants to bring semantics to the masses, and is not just aiming at Semantic Web researchers: it has to be mainstream. The main common thread in feedback received is that the interface needs to be simplified more. (Nova says he shaved his head as part of this new simpler interface :-)) Someone who knows nothing about structured data or auto tagging should be able to figure out in a few minutes or even seconds how to use it. It takes a few days at the moment to get a sense of the value, but Nova says it can be very addictive when you get into it.

Individuals are the first market, even if you are on your own and don’t have any friends 🙂 It is even more valuable if you are connected to other people or join groups, giving a richer network effect. The main value proposition is that you can keep track of things you like and people you know, and capture knowledge you think is important.

Motley Fool recently talked about Google killers. Twine is not one, according to Nova, as it is not trying to index the entire Web. Twine is about the information that you think is important, not everything available. Twine also pulls in related things (e.g. from links in an e-mail), capturing information around the information that you bring in.

When groups start using Twine, collective intelligence starts to take place (by leveraging other people who are researching stuff, finding things, testing, commenting, etc.). It’s a type of communal knowledge base similar to other things like Wikia or Freebase. However, unlike many public communal sites, in Twine more than half of the data and activities are private (60%). Therefore privacy and permission control is very important, and it goes deep into the Twine data.

Initially Radar had their own triple store, an LGPL one from the CALO project. They found that it didn’t scale towards web-scale applications, and it didn’t have the levels of transaction control you’d need from an enterprise application. They decided to go for a SQL database (PostgreSQL) with WebDAV. However, relational databases weren’t optimised for the “shape” of data that they were putting into it, so it needed to be tweaked. They’ve had no performance issues so far, but they may move to a federated model next year. Twine uses an eight-element tuple store (subject-predicate-object, provenance, time stamp, confidence value, and other statistics about the triple or item itself). They can do predicate inferencing across statements, access control, etc. The platform is all written in Java, and Twine then sits on top of that.
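
To make the “eight-element tuple” idea concrete, here is a small in-memory sketch. Twine’s real platform is Java on PostgreSQL and its exact columns were not listed, so this is not their code: the first six fields follow the talk, while `acl` and `hits` are hypothetical stand-ins for the unnamed “other statistics”. The query shows why extra columns are handy, since predicate inferencing (via a made-up sub-property rule), access control and a confidence threshold are all applied in one pass over the rows.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Statement:
    """One row of a hypothetical eight-element tuple store."""
    subject: str
    predicate: str
    obj: str
    provenance: str        # where the statement came from
    timestamp: datetime    # when it was asserted
    confidence: float      # e.g. an auto-tagger's score between 0 and 1
    acl: str = "private"   # hypothetical access-control label
    hits: int = 0          # hypothetical usage statistic


# Hypothetical property hierarchy used for predicate inferencing.
SUBPROPERTY_OF = {"ex:colleagueOf": "foaf:knows"}


def super_properties(predicate):
    """Yield the predicate and every property it is (transitively) a sub-property of."""
    while predicate is not None:
        yield predicate
        predicate = SUBPROPERTY_OF.get(predicate)


def query(store, predicate, viewer_groups, min_confidence=0.0):
    """Statements matching `predicate` directly or by inference, filtered by
    access control and confidence in a single pass over the rows."""
    return [
        st for st in store
        if predicate in super_properties(st.predicate)
        and st.acl in viewer_groups
        and st.confidence >= min_confidence
    ]


now = datetime.now(timezone.utc)
store = [
    Statement("ex:nova", "ex:colleagueOf", "ex:alice", "import:email", now, 0.9, "public"),
    Statement("ex:nova", "foaf:knows", "ex:bob", "user:nova", now, 1.0, "private"),
    Statement("ex:nova", "foaf:knows", "ex:carol", "autotag", now, 0.4, "public"),
]

# A public viewer asking for foaf:knows gets only the inferred colleagueOf link:
# the second row is private and the third is below the confidence threshold.
for st in query(store, "foaf:knows", {"public"}, min_confidence=0.5):
    print(st.subject, st.predicate, st.obj)  # ex:nova ex:colleagueOf ex:alice
```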

Next he talked about the Twine beta status. There have been 20000 beta testers in the last 30 days, 9000 twines created, 150000 items added, 60% of twines are private, and new features are being added every four weeks (in point releases). Some of the feature requests they’ve received include import capabilities, interoperability with other apps, and the ability to use other ontologies.

Twine will stay in invite beta for the summer. Soon, they will take off the password door to the public twines, so that they will all be visible to search engines. Radar will be SEO-ing the content automatically, so you will see more “walk-ins” after that happens. They will still be able to control who gets an account, but stuff will be publicly accessible.

In the Fall, Radar will open it so that anyone can open an account. You will be able to really customise Twine, to author and develop rich semantic content. Nova says that Twine will then be a step beyond blogs and wikis when it happens (but he can’t say much about the new stuff for now).

Next, there were some questions.

Q: The first one was about privacy. What if you add something and then later you decide that you want to delete it – is it really deleted or does Twine keep it around?

A: Nova answered that currently, it is not really deleted; it goes into a non-visible triple. But they will be doing that (really deleting it) soon.
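
The difference between hiding a triple and really deleting it can be shown with a toy store. This is a hypothetical sketch of the two behaviours Nova described, not Twine’s implementation; the `TripleStore` class and its method names are invented.

```python
class TripleStore:
    """Toy store contrasting hiding a triple with actually deleting it."""

    def __init__(self):
        self._triples = {}  # (s, p, o) -> True if visible to users

    def add(self, s, p, o):
        self._triples[(s, p, o)] = True

    def hide(self, s, p, o):
        # "Soft" delete: the triple is kept but filtered out of every result.
        if (s, p, o) in self._triples:
            self._triples[(s, p, o)] = False

    def purge(self, s, p, o):
        # "Hard" delete: the triple is removed from storage altogether.
        self._triples.pop((s, p, o), None)

    def visible(self):
        return [t for t, shown in self._triples.items() if shown]

    def stored(self):
        return list(self._triples)


store = TripleStore()
store.add("item:42", "dc:title", "Draft notes")
store.hide("item:42", "dc:title", "Draft notes")
print(store.visible(), store.stored())  # [] [('item:42', 'dc:title', 'Draft notes')]
store.purge("item:42", "dc:title", "Draft notes")
print(store.stored())  # []
```

After `hide`, users no longer see the item but it is still stored, which is exactly the privacy concern raised in the question; only `purge` removes it.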

Q: What is the approach to interoperability with Twine? What other types of semantic applications will Twine work with?

A: Today, Twine works with e-mail (in / out), RSS (get feeds out), and browsers (e.g. for bookmarking). There have been lots of requests for interoperability with mindmaps, various databases, enterprise applications, etc., so Radar are giving it a lot of thought. Twine has to provide APIs. They have a REST and a SPARQL API: they are not fully ready just yet, but by the end of the year Twine will have a usable REST API. Unfortunately, Radar can’t handle the long tail of feature requests (there are just too many), but an API will help people to make their own add-ons.

Then there’s the ontology level. You will be able to get the data about you or related to you out of Twine in RDF. You should also be able to get stuff out using other ontologies that are common, e.g. using FOAF, SIOC (yay!), or Dublin Core.
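
For a feel of what an export “using FOAF, SIOC or Dublin Core” might look like, the sketch below writes a person and their items as N-Triples. Twine’s actual export format was not described; the `example.org` URIs and the `export_profile` function are hypothetical, while the property URIs (`foaf:name`, `foaf:knows`, `foaf:maker`, `sioc:Post`, `sioc:content`, `dcterms:title`) are real terms from those vocabularies. N-Triples was picked only because it needs no library.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
FOAF = "http://xmlns.com/foaf/0.1/"
SIOC = "http://rdfs.org/sioc/ns#"
DCTERMS = "http://purl.org/dc/terms/"


def iri(value):
    return f"<{value}>"


def literal(value):
    # Escape backslashes, quotes and line breaks as N-Triples requires.
    escaped = (value.replace("\\", "\\\\").replace('"', '\\"')
               .replace("\n", "\\n").replace("\r", "\\r"))
    return f'"{escaped}"'


def export_profile(person_uri, name, friends, items):
    """Serialise a person, their contacts and their items as N-Triples.

    friends: list of person URIs; items: list of (item URI, title, text).
    """
    lines = [
        f"{iri(person_uri)} {iri(RDF_TYPE)} {iri(FOAF + 'Person')} .",
        f"{iri(person_uri)} {iri(FOAF + 'name')} {literal(name)} .",
    ]
    for friend in friends:
        lines.append(f"{iri(person_uri)} {iri(FOAF + 'knows')} {iri(friend)} .")
    for item_uri, title, text in items:
        lines.append(f"{iri(item_uri)} {iri(RDF_TYPE)} {iri(SIOC + 'Post')} .")
        lines.append(f"{iri(item_uri)} {iri(DCTERMS + 'title')} {literal(title)} .")
        lines.append(f"{iri(item_uri)} {iri(SIOC + 'content')} {literal(text)} .")
        lines.append(f"{iri(item_uri)} {iri(FOAF + 'maker')} {iri(person_uri)} .")
    return "\n".join(lines)


print(export_profile(
    "http://example.org/people/alice",
    "Alice",
    ["http://example.org/people/bob"],
    [("http://example.org/items/1", "Notes on SIOC", 'She said "hi"')],
))
```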

They are also looking at specific adaptors that they need to build. For example, this includes importers for Digg, desktop bookmark files, Outlook contacts, and a bunch of others. They will be rolling out some of these in the Fall timeframe. There may also be demand for Lotus Notes or Exchange interoperability. Radar may actually look first at interoperating with other semantic applications like Freebase. They have already hardcoded in some interoperability with Amazon, for example.
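
A desktop bookmark importer can be surprisingly small. The sketch below assumes the Netscape bookmark file format (the `bookmarks.html` layout that browsers such as Firefox export) and uses only Python’s standard `html.parser`; the `import_bookmarks` helper and the sample URLs are hypothetical, not one of Radar’s adaptors.

```python
from html.parser import HTMLParser


class BookmarkParser(HTMLParser):
    """Collect (url, title) pairs from a Netscape-style bookmark export."""

    def __init__(self):
        super().__init__()
        self.bookmarks = []
        self._href = None
        self._title = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":  # HTMLParser lower-cases tag and attribute names
            self._href = dict(attrs).get("href")
            self._title = []

    def handle_data(self, data):
        if self._href is not None:  # only keep text inside a link
            self._title.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.bookmarks.append((self._href, "".join(self._title).strip()))
            self._href = None


def import_bookmarks(html_text):
    parser = BookmarkParser()
    parser.feed(html_text)
    parser.close()
    return parser.bookmarks


SAMPLE = """<!DOCTYPE NETSCAPE-Bookmark-file-1>
<DL><p>
    <DT><A HREF="http://example.org/sioc" ADD_DATE="1210000000">SIOC project</A>
    <DT><A HREF="http://example.org/foaf">FOAF project</A>
</DL><p>"""

print(import_bookmarks(SAMPLE))
```

Each `(url, title)` pair could then be turned into an item and passed through auto tagging like anything else added to the service.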

Q: When Radar went to VCs and were turned down, was Twine part of the pitch? (For the second time around with Paul Allen, the questioner presumed that Nova did have it as part of the pitch.)

A: In 2003, Radar had a desktop-based semantic tool called “Personal Radar”. It was basically a Java-based P2P “Twine” using RDF. It had lots of eye candy and visualisations. The VCs said “semantic what?” and it was extremely hard to explain P2P, Semantic Web, RDF, and knowledge sharing to them. He said the VCs are mainly interested in when you are going to make money for them. But most of his pitch was blue sky, with no business plan, demonstrating a piece of technology, and pushing the fact that he knows people will need it. Paul Allen was more visionary, and he really believes adding structure to the Web is inevitable. He was willing to take a bet before they were in business. Then they went on to get Series A funding. The VCs said it was too early, but they eventually got it. Series B wasn’t as hard, and it fell into place in a matter of weeks, so it was a good round.

Even though there’s a lot of talk about the Semantic Web in the press and on the Web, most VCs are still figuring it out now and they are interested in making just one bet in the space. The main thing you need to avoid is being a platform without having any applications to show. It has to be compelling, where you can envisage users using them. Valley VCs are jaded about platforms.

Q: As one imports information from various places, what exactly is there in Twine that will prevent a person having to merge any duplicate objects?

A: Nova said there is limited duplication detection at the moment, but this will be improved in a few months. Most people submit similar bookmarks and it is reasonably straightforward to identify these, e.g. when the same item is arrived at through different paths on a website and has different URLs.
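
Twine’s duplicate detection was not described in detail, but the URL case Nova mentioned is commonly handled by normalising URLs before comparing them. A generic sketch using the standard `urllib.parse` module: it lower-cases the scheme and host, drops default ports, trailing slashes and fragments, and sorts query parameters. The `IGNORED_PARAMS` list is an assumption; real sites may need per-site rules.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumed list of query parameters that do not change the page content.
IGNORED_PARAMS = {"utm_source", "utm_medium", "utm_campaign", "sessionid"}


def normalise(url):
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    host = parts.netloc.lower()
    # Drop default ports so http://x:80/ and http://x/ compare equal.
    if (scheme == "http" and host.endswith(":80")) or (scheme == "https" and host.endswith(":443")):
        host = host.rsplit(":", 1)[0]
    path = parts.path or "/"
    if len(path) > 1:
        path = path.rstrip("/")
    params = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
              if k not in IGNORED_PARAMS]
    query = urlencode(sorted(params))
    return urlunsplit((scheme, host, path, query, ""))  # fragment dropped


def find_duplicates(urls):
    """Group URLs that normalise to the same key; return only groups of 2+."""
    groups = {}
    for url in urls:
        groups.setdefault(normalise(url), []).append(url)
    return {key: members for key, members in groups.items() if len(members) > 1}


print(normalise("HTTP://Example.org:80/page/?b=2&a=1&utm_source=feed#top"))
# http://example.org/page?a=1&b=2
```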

Q: Ivan Herman from the W3C asked if Radar were considering leveraging the linked open data community?

A: Nova said that DBpedia would be one of those main sources of data that they want to integrate with – the FOAF-scape, the SIOC-o-sphere, and DBpedia. Wikipedia URIs are already being used to identify tags, and this is something they will leverage.
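
Using Wikipedia URIs for tags links naturally to DBpedia, whose resource URIs mirror English Wikipedia article titles (`http://dbpedia.org/resource/<Title>`). The sketch below only builds candidate URIs from a free-text tag; the helper names are hypothetical, and a real system would still need to check that the page exists, follow redirects and disambiguate.

```python
from urllib.parse import quote


def wikipedia_title(tag):
    """Put a free-text tag into Wikipedia title form: underscores for spaces and
    an upper-case first letter (titles are case-sensitive after that)."""
    title = "_".join(tag.split())
    return title[:1].upper() + title[1:]


def candidate_uris(tag):
    """Candidate Wikipedia and DBpedia URIs for a tag (no lookup is performed)."""
    title = quote(wikipedia_title(tag), safe="_()',")
    return {
        "wikipedia": "http://en.wikipedia.org/wiki/" + title,
        "dbpedia": "http://dbpedia.org/resource/" + title,
    }


print(candidate_uris("linked data")["dbpedia"])  # http://dbpedia.org/resource/Linked_data
```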

Q: How can copyright be managed in Twine?

A: Nova said that it’s thanks to the Digital Millennium Copyright Act (DMCA). It provides a safe harbour if you cannot reasonably prevent against anything and everything being uploaded (and are unaware of it). Twine’s user agreement says please do not add other people’s copyright material. Fair use is okay, and if you share something copyrighted, it is better to have a blurb with a link to the main content. Therefore, Twine is using the same procedure as in other UGC sites.

Q: How are Radar going to make money?

A: Twine is focused on advertising as the first revenue stream. Twine has semantic profile of users and groups, so it can understand their interests very well. Twine will start to show sponsored content or ads in Twine based on these interests. If something is extremely relevant to your interests, then it is almost like content (even if it is sponsored). They will be pilot testing this advertising soon.

Q: Have Radar been approached by Google, Facebook, as the value proposition for Twine is very interesting?

A: Nova said they are not trying to compete with Facebook (right now!), but rather they are trying to find the magic formula that will work for Twine right now. Facebook has a lot of fluffy stuff: vampires, weird games, etc. Nova said he’d prefer to spin the bottle with a real person. Twine will focus on professional people who have a stronger need for a particular interest, doing things technically that are outside the scope of what they are doing at the moment.

Q: Why does Twine use tuple storage: why is it not using a quad?

A: Nova said it’s faster in their system, so for performance reasons they decided to avoid reification.
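
The trade-off behind that answer can be seen by comparing standard RDF reification, where a statement is described by `rdf:Statement`, `rdf:subject`, `rdf:predicate` and `rdf:object` triples before metadata can be attached, with one wider row. A hypothetical sketch; the `ex:provenance` and `ex:confidence` predicates and the function names are invented for illustration.

```python
def reify(node, s, p, o, provenance, confidence):
    """Standard RDF reification: four triples describe the statement,
    then metadata hangs off the statement node."""
    return [
        (node, "rdf:type", "rdf:Statement"),
        (node, "rdf:subject", s),
        (node, "rdf:predicate", p),
        (node, "rdf:object", o),
        (node, "ex:provenance", provenance),
        (node, "ex:confidence", confidence),
    ]


def confidence_via_reification(triples, s, p, o):
    # Regroup rows by statement node (a self-join in SQL terms), then find
    # the node whose subject/predicate/object match and read its metadata.
    by_node = {}
    for node, pred, val in triples:
        by_node.setdefault(node, {})[pred] = val
    for props in by_node.values():
        if (props.get("rdf:subject"), props.get("rdf:predicate"), props.get("rdf:object")) == (s, p, o):
            return props.get("ex:confidence")
    return None


def wide_row(s, p, o, provenance, confidence):
    """The same information as one tuple: a single row read, no joins."""
    return (s, p, o, provenance, confidence)


rows = reify("_:st1", "ex:nova", "foaf:knows", "ex:alice", "autotag", 0.8)
print(len(rows), confidence_via_reification(rows, "ex:nova", "foaf:knows", "ex:alice"))  # 6 0.8
print(wide_row("ex:nova", "foaf:knows", "ex:alice", "autotag", 0.8)[4])  # 0.8
```

Six triples (plus the base triple, if it is also asserted) versus one row is the kind of overhead a wide tuple avoids.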

(I will also post my notes from Eric Miller’s keynote in the next day or three.)

SemTech sessions related to data portability / IEEE Computing article on portable data

It’s been a busy few weeks, with announcements from many sides including Google (Friend Connect), Facebook (Connect) and MySpace (Data Availability). Next week, the Semantic Technologies Conference will be held in San Jose, California, and you can bet that discussions around the need for portable data will be scattered throughout.

  • On Monday, Stefan, Uldis and I will present a tutorial (which will also cover data portability aspects of ontologies such as SIOC and FOAF) entitled “The Future of Social Networks: The Need for Semantics”.
  • On Monday evening at 8 PM, there will be an informal meetup of some people in the Fairmont Hotel’s Lobby Lounge, so if you have an interest in data portability, feel free to join us.
  • On Tuesday at 7:15 AM, I will chair a “Data Portability Interest Group” meeting. Attendees will include Chris Saad, Daniela Barbosa, Henry Story, and yours truly.
  • Then on Tuesday afternoon at 2:00 PM, Jim Benedetto, Senior Vice President of Technology with MySpace, will talk about “Data Availability at MySpace”.

Last month, IEEE Computing published an article by Karen Heyman entitled “The Move to Make Social Data Portable”. I was interviewed for the piece along with Michael Pick (social media expert), Duncan Riley (b5media), John McCrea (Plaxo), Craig Knoblock (ISI), Chris Saad, Dave Treadwell (Microsoft), Kevin Marks (Google), Chris Kelly (Facebook), Marc Canter (Broadband Mechanics), and Bill Washburn (OpenID). Technology solutions mentioned included RSS, OpenID, OAuth, microformats, RDF, APML, SIOC and FOAF. Here are my original answers to Karen’s questions.


Prototype for distributed / decentralised microblogging using semantics

Download the paper and get the code.

Try out our anonymous client and server demos for SMOB.

Michael Arrington of TechCrunch wrote an interesting blog post on Monday about a “decentralised Twitter”, which was picked up by Dave Winer, Marc Canter and Chris Saad amongst others.

I’m happy to say that we have recently described and shown how this can work. Alex has been the driving force behind a paper that we (Alexandre Passant, Tuukka Hastrup, Uldis Bojars and I) have written for SFSW 2008, demonstrating a prototype called SMOB for distributed / decentralised microblogging:

Microblogging: A Semantic Web and Distributed Approach

The prototype uses FOAF and SIOC to model microbloggers, their properties, account and service information, and the microblog updates that users create. A multitude of publishing services can ping one or a set of aggregating servers as selected by each user, and it is important to note that users retain control of their own data through self hosting.
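
The publishing side can be sketched in a few lines: describe each update with SIOC, FOAF and Dublin Core terms on the author’s own site, then notify whichever aggregators the author picked. This is not the SMOB code. The ping format (a form POST carrying only the post URI) and the `example.org` hosts are hypothetical assumptions; the vocabulary terms, including `sioct:MicroblogPost` from the SIOC Types module, are real. Requests are built but not sent.

```python
from datetime import datetime, timezone
from urllib.parse import urlencode
from urllib.request import Request

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
SIOC = "http://rdfs.org/sioc/ns#"
SIOCT = "http://rdfs.org/sioc/types#"
FOAF = "http://xmlns.com/foaf/0.1/"
DCTERMS = "http://purl.org/dc/terms/"
XSD_DATETIME = "http://www.w3.org/2001/XMLSchema#dateTime"


def post_triples(post_uri, author_uri, text, created):
    """Describe one microblog update as N-Triples, hosted on the author's own site."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return "\n".join([
        f"<{post_uri}> <{RDF_TYPE}> <{SIOCT}MicroblogPost> .",
        f'<{post_uri}> <{SIOC}content> "{escaped}" .',
        f"<{post_uri}> <{FOAF}maker> <{author_uri}> .",
        f'<{post_uri}> <{DCTERMS}created> "{created.isoformat()}"^^<{XSD_DATETIME}> .',
    ])


def build_pings(post_uri, aggregators):
    """Build (but do not send) one HTTP POST per aggregator the user selected.

    Hypothetical ping format: only the post URI is sent, and each aggregator
    fetches the RDF from the publisher, so the data stays self-hosted.
    """
    body = urlencode({"uri": post_uri}).encode("utf-8")
    return [Request(url, data=body, method="POST") for url in aggregators]


post = "http://alice.example.org/smob/post/1"
print(post_triples(post, "http://alice.example.org/#me", "Hello from Galway",
                   datetime(2008, 5, 12, 10, 0, tzinfo=timezone.utc)))
for req in build_pings(post, ["http://agg1.example.org/ping", "http://agg2.example.org/ping"]):
    print(req.get_method(), req.full_url)  # urllib.request.urlopen(req) would send it
```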

The aggregate view of microblogs uses ARC2 for storage / querying and Exhibit for the user interface. Security and privacy are open issues, but can be addressed in some part by requiring OpenID authentication.
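
On the aggregator side, the job is to merge updates pinged from many publishers, keep one copy per post URI, order them newest first, and hand them to the interface. The real server uses ARC2 (a PHP RDF store) and Exhibit; the Python below is only an illustration of that flow. The post dictionary layout is an assumption, and the output is roughly the JSON shape Exhibit reads (an `items` array whose entries carry a `label`).

```python
import json
from datetime import datetime


def aggregate(posts, limit=20):
    """Merge posts from many publishers, newest first, one copy per post URI.

    Each post is assumed to be a dict with 'uri', 'author', 'content' and
    'created' (an ISO 8601 string with a timezone offset).
    """
    latest = {}
    for post in posts:
        latest[post["uri"]] = post  # a later ping for the same URI replaces the earlier one
    ordered = sorted(latest.values(),
                     key=lambda p: datetime.fromisoformat(p["created"]),
                     reverse=True)
    return ordered[:limit]


def to_exhibit_json(posts):
    """Serialise aggregated posts as Exhibit-style JSON items."""
    items = [
        {
            "type": "MicroblogPost",
            "id": p["uri"],
            "label": p["content"],
            "author": p["author"],
            "created": p["created"],
        }
        for p in posts
    ]
    return json.dumps({"items": items}, indent=2)


posts = [
    {"uri": "u1", "author": "alice", "content": "first", "created": "2008-05-12T10:00:00+00:00"},
    {"uri": "u2", "author": "bob", "content": "second", "created": "2008-05-12T11:00:00+00:00"},
    {"uri": "u1", "author": "alice", "content": "first (edited)", "created": "2008-05-12T10:05:00+00:00"},
]
print(to_exhibit_json(aggregate(posts)))
```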

The SMOB prototype code (both the semantic microblogging publishing client and server-based web service) is available here. You can install your own client and post to our demo server (set up today by Tuukka) here. There are some pictures below of it in use:

Latest updates rendered in Exhibit

Map view of latest updates with Exhibit

Global architecture of distributed semantic microblogging
