SemTech sessions related to data portability / IEEE Computing article on portable data

It’s been a busy few weeks for data portability, with announcements from many sides including Google (Friend Connect), Facebook (Connect) and MySpace (Data Availability). Next week, the Semantic Technologies Conference will be held in San Jose, California, and you can bet that discussions around the need for portable data will be scattered throughout.

  • On Monday, Stefan, Uldis and I will present a tutorial (which will also cover data portability aspects of ontologies such as SIOC and FOAF) entitled “The Future of Social Networks: The Need for Semantics”.
  • On Monday evening at 8 PM, there will be an informal meetup of some people in the Fairmont Hotel’s Lobby Lounge, so if you have an interest in data portability, feel free to join us.
  • On Tuesday at 7:15 AM, I will chair a “Data Portability Interest Group” meeting. Attendees will include Chris Saad, Daniela Barbosa, Henry Story, and yours truly.
  • Then on Tuesday afternoon at 2:00 PM, Jim Benedetto, Senior Vice President of Technology with MySpace, will talk about “Data Availability at MySpace”.

Last month, IEEE Computing published an article by Karen Heyman entitled “The Move to Make Social Data Portable”. I was interviewed for the piece along with Michael Pick (social media expert), Duncan Riley (b5media), John McCrea (Plaxo), Craig Knoblock (ISI), Chris Saad, Dave Treadwell (Microsoft), Kevin Marks (Google), Chris Kelly (Facebook), Marc Canter (Broadband Mechanics), and Bill Washburn (OpenID). Technology solutions mentioned included RSS, OpenID, OAuth, microformats, RDF, APML, SIOC and FOAF. Here are my original answers to Karen’s questions.

Why is data portability important to users and to companies?

I think for users there are three things of importance. Firstly, people are tired of constantly having to repeat their personal profile definitions across a range of social media and social networking sites. Secondly, having to search for your contacts – colleagues or friends – is becoming increasingly tiring as new sites appear. Thirdly, if you decide you want to change services from one platform to another, there are very few easy-to-use mechanisms for bringing your content items with you (photos, blog posts, whatever). But most importantly, users like to think that they have full control over their own data – that means having the freedom to bring their data with them if they choose to use it elsewhere.

Where do you see it going in the immediate future and long-term?

In the immediate future, I think that DataPortability will act as a rallying point for solutions to some common set of scenarios that we are encountering today: migrating social network profiles, locating friends, transporting content items. It can do this by building on the vast amount of work that has been done in related efforts such as the Semantic Web (projects like FOAF, SIOC, etc.), the microformats community, OpenID, RSS / Atom, OPML, etc. Rather than inventing new standards, they can build on existing published formats or “de-facto” standards that have emerged (standing on the shoulders of giants, as it were).

In what ways do you see the big players getting involved?

There is a growing feeling that the big players need to support the wishes of the users in this direction, as expressed by the “Bill of Rights for Users of the Social Web” published about 5 or 6 months ago (by Smarr, Canter, Scoble and Arrington). I think companies are realising that providing mechanisms for data portability doesn’t necessarily mean that users will leave their sites en masse. By providing open methods to access data on sites, via APIs or query mechanisms or embedded markup (e.g., Facebook’s FBQL/FBML, the Twitter or Flickr APIs), the big players are allowing others to build new and interesting applications on top of their sites which encourage users to stay on board. Users also feel happier knowing that they have access to their data if they need it, which builds loyalty rather than anger at restrictive user data agreements. Lastly, these companies can open up avenues for an influx of new users who can easily bring their data over via data portability mechanisms.

Why, in your opinion, are they getting involved?

With all such initiatives, it is important to be aware of all implementation strategies and technical / policy blueprints when they are being drafted, so that opinions can be expressed at a formative stage (rather than finding something untenable if you joined later on). There is also a publicity opportunity for the big players, since those who do not support such an initiative may be seen as opposing the views of many users who want to have transparency when joining new services or sites.

Could you give a brief technical overview of how your ontology work will contribute to the effort, and how it is (or perhaps isn’t) different from the sort of ontology work that goes on when one tries to integrate databases?

Sure. SIOC, or Semantically-Interlinked Online Communities, is a Semantic Web ontology for representing rich data from social websites – enabling reuse and interoperability of this data. It has recently achieved significant adoption through its usage in a variety of commercial and open-source software applications (see here), and is commonly used in conjunction with the FOAF (Friend of a Friend) vocabulary for expressing personal profile and social networking information. The SIOC ontology has been published as a W3C Member Submission, submitted by 16 organisations.

The SIOC ontology started with concepts used to describe discussion sites such as blogs and forums. Users create Posts organised in Forums which are hosted on Sites. These concepts are now subclasses of higher-level concepts, used to describe many other types of social websites (e.g., photo and bookmark sharing services, collaborative workareas): data spaces (sioc:Space), containers (sioc:Container) and content items (sioc:Item). SIOC can be used to provide a representation of all content items created by a person (via their user accounts) on various social websites, and this can be nicely combined with the FOAF profile of the person who holds the associated user accounts.
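To make the layering concrete, here is a minimal sketch in plain Python of the subclass relationships just described (Post under Item, Forum under Container, Site under Space). The `ex:` terms are hypothetical additions of my own to show how a photo-sharing site might slot in; they are not part of SIOC, and the helper names are purely illustrative.

```python
# Subclass links named in the text: Post -> Item, Forum -> Container, Site -> Space.
SUBCLASS_OF = {
    "sioc:Post": "sioc:Item",
    "sioc:Forum": "sioc:Container",
    "sioc:Site": "sioc:Space",
}

# Hypothetical extension for a photo-sharing site (ex: names are illustrative only).
SUBCLASS_OF["ex:PhotoAlbum"] = "sioc:Container"
SUBCLASS_OF["ex:Photo"] = "sioc:Item"


def ancestors(term):
    """Return the chain of superclasses for a term, nearest first."""
    chain = []
    while term in SUBCLASS_OF:
        term = SUBCLASS_OF[term]
        chain.append(term)
    return chain


def is_a(term, target):
    """True if term is the target class or one of its subclasses."""
    return term == target or target in ancestors(term)


if __name__ == "__main__":
    for term in ("sioc:Post", "ex:Photo", "sioc:Forum", "ex:PhotoAlbum"):
        print(term, "->", ancestors(term))
```

The point of the higher-level classes is that a tool written against sioc:Item and sioc:Container can handle a blog post and a photo in the same way, without knowing about either site in advance.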

One of the problems with combining social media data is in knowing what accounts the user holds on different social media sites so that one can access information about the content created by the user on each of these sites. A combination of the FOAF vocabulary and SIOC can be used to describe content created by a person across several different sites by including a list of their social media site accounts in personal FOAF profiles and using SIOC to express user-created content on these sites.
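As a rough illustration of that combination, the sketch below stores a handful of statements as plain (subject, predicate, object) tuples: the FOAF profile lists the accounts a person holds, and SIOC links each content item to the account that created it. All URIs are invented for the example, I am assuming the `foaf:holdsAccount` and `sioc:has_creator` properties for the two links, and a real application would use an RDF library rather than Python lists.

```python
# Each statement is (subject, predicate, object). All URIs are made up.
TRIPLES = [
    # FOAF profile: the person lists the accounts they hold.
    ("ex:person", "foaf:holdsAccount", "blog:user/42"),
    ("ex:person", "foaf:holdsAccount", "photos:user/p7"),
    # SIOC data from each site: items point at the account that created them.
    ("blog:post/1", "sioc:has_creator", "blog:user/42"),
    ("blog:post/2", "sioc:has_creator", "blog:user/42"),
    ("photos:img/9", "sioc:has_creator", "photos:user/p7"),
    ("blog:post/3", "sioc:has_creator", "blog:user/99"),  # someone else's post
]


def accounts_of(person, triples):
    """Accounts listed in the person's FOAF profile."""
    return [o for s, p, o in triples if s == person and p == "foaf:holdsAccount"]


def content_by(person, triples):
    """All items created through any of the person's accounts, across sites."""
    accounts = set(accounts_of(person, triples))
    return sorted(
        s for s, p, o in triples if p == "sioc:has_creator" and o in accounts
    )


if __name__ == "__main__":
    print(accounts_of("ex:person", TRIPLES))
    print(content_by("ex:person", TRIPLES))
```

The account list in the FOAF profile is what joins the two sites together: without it, there is nothing to say that the blog account and the photo account belong to the same person.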

A sample DataPortability scenario involves using the YADIS communications protocol to discover an identity for a particular person, which then returns a YADIS/XRDS document indicating which identities that person prefers to use (e.g., referencing an OpenID account and associated FOAF profile), and what services those identities are held on. Then, the WRFS abstraction model can be used to find out what containers the returned identities hold on those services. SIOC is an ideal representation method for describing the content of those containers and the items, and the structure / connections therein.
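The order of those steps can be sketched as below. To be clear, this does not speak YADIS, XRDS or WRFS at all: each stage is replaced by an in-memory dictionary of my own invention, and every URL, key and function name is hypothetical. It only shows how the output of one stage feeds the next.

```python
# Stage 1 stand-in (YADIS/XRDS): identifier -> preferred identities and
# the service each identity is held on.
XRDS = {
    "http://example.org/bob": [
        {"identity": "http://bob.example.net/", "service": "blog.example.com"},
        {"identity": "http://bob.example.net/", "service": "photos.example.com"},
    ]
}

# Stage 2 stand-in (WRFS): (service, identity) -> containers held there.
CONTAINERS = {
    ("blog.example.com", "http://bob.example.net/"): [
        "http://blog.example.com/bob/weblog",
    ],
    ("photos.example.com", "http://bob.example.net/"): [
        "http://photos.example.com/bob/gallery",
    ],
}

# Stage 3 stand-in (SIOC): container -> the items it contains.
ITEMS = {
    "http://blog.example.com/bob/weblog": ["post/1", "post/2"],
    "http://photos.example.com/bob/gallery": ["img/9"],
}


def discover(identifier):
    """Walk identifier -> identities -> containers -> items."""
    found = {}
    for entry in XRDS.get(identifier, []):
        key = (entry["service"], entry["identity"])
        for container in CONTAINERS.get(key, []):
            found[container] = ITEMS.get(container, [])
    return found


if __name__ == "__main__":
    for container, items in discover("http://example.org/bob").items():
        print(container, items)
```

In a real deployment each dictionary lookup would be a network request to a different party, which is why agreeing on the formats exchanged at each step matters more than any one implementation.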

For example, in this picture (the vocabulary terms are shown in dark grey: foaf:knows, sioc:User, etc.) Bob holds user accounts on various social websites (two shown for clarity, but there may be many more), and via those accounts he creates content items (usually within containers of some sort, e.g., in a bookmark folder, personal weblog, message board or image gallery) on those sites. He should be able to port not only his social graph (in this case, his connections to Alice and Carol), but also his personal containers / sets of content items and perhaps even associated comment replies.

But SIOC isn’t just for personal containers of data. Another issue for efforts such as the DataPortability workgroup is whether methods can be used to port not just personal sets of data but communities of data. SIOC was initially intended to provide a way to describe the content from online communities, like mailing lists, message boards, etc. It was soon used for people’s blogs (since the post plus reply structure is very similar to community discussions; it’s just that the first poster is usually one person in blogs), and more recently for other personal sets of Web 2.0-type content items. But if someone runs a community site, and they decide that they want to port their group from one place to another, SIOC can be used to fully describe the structure (and content if combined with other vocabularies) of most communities.

How is this all going to interoperate?

I would say that the fewer new APIs and ontologies that are written or described, the better. A lot of work has been put into various related APIs and ontologies that can enable data portability, and the focus should be on showing how certain typical scenarios or use cases can be solved using a combination of pieces that we have already. For example, on this page, Josh Patterson shows how a combination of YADIS, XRDS, WRFS, SIOC and FOAF can be used to discover and collect one’s data from a variety of sources.
