I am happy to announce that the judges for the boards.ie SIOC Data Competition are:
We had about sixty registrants and eight final submissions of very high quality. We will announce the winners in a few weeks' time…
The Social Semantic Web
Open to the public, no attendance fee
The Social Web – social networking services, blogs and wikis – has captured the attention of millions of users as well as billions of dollars in investment and acquisition. As more social websites form around the connections between people and their objects of interest, more intuitive methods are needed for representing and navigating the content in these sites. Also, to better enable user access to multiple sites, interoperability among social websites is required. This talk will describe the semantic technologies that can be used to interconnect both people and objects on the Social Web.
John Breslin, BE (Electronics), PhD, MIET – www.johnbreslin.org
John Breslin is a lecturer at the Department of Electronic Engineering in the College of Engineering and Informatics at the National University of Ireland, Galway. He is also an associate researcher and leader of the Social Software Unit at the Digital Enterprise Research Institute (DERI) in NUI Galway, the world’s largest Semantic Web research institute. He is the founder of the SIOC project, which aims to interlink online community sites using semantic technologies, and which has been deployed in over 50 applications including Yahoo! SearchMonkey. The Irish Internet Association presented him with Net Visionary awards in 2005 and 2006 for the Irish community website boards.ie, which he co-founded in 2000.
For further information contact: Mark on 087 1251858 / firstname.lastname@example.org
or the Institution of Engineering and Technology Ireland Network.
I was interviewed last week by Colin Meek from Journalism.co.uk on the topic of “Web 3.0” and what it means for journalists… You can read the full article in two parts (1, 2). My original answers are part of an interview on their Insite blog. I also had the chance to talk about various DERI offerings in the Semantic Web area including SIOC, SWSE, Sindice, Semantic Radar, etc.
Colin also asked me about other readable data that is being crawled by Semantic Web search engines like Sindice, SWSE or Swoogle. These search engines can usually match keywords in any data that has been crawled or integrated into a semantic store, not just data about people. It could be structured information about people, places, dates, library documents, blog items or topics, whatever. In fact, there is no limit to the types of things that can be indexed and searched, since the data format used is RDF (an open data model that can be adapted to describe pretty much anything). Anyone can reuse existing RDF vocabularies like SIOC to publish data, or they can publish data using their own custom vocabularies (e.g. to describe stamp collecting or Bollywood movie genres or whatever), or they can combine public and custom vocabularies (e.g. take FOAF and your own vocabulary about soccer to describe players and managers on a soccer team). Geotemporal information is particularly useful across a range of domains, and provides nice semantic linkages between things. For example, having geographic information and time information is useful for describing where people have been and when, for detailing historical events or TV shows, for timetabling and scheduling of events, etc., and for connecting all of these things together (“I’m travelling to Edinburgh next week: show me all the TV shows of relevance and any upcoming events I should be aware of according to my interests…”).
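Here is a small Python sketch (standard library only) that mixes vocabularies in this way. It describes a player and a manager using real FOAF terms (`foaf:Person`, `foaf:name`) together with a custom soccer vocabulary, and then writes the result out as N-Triples. The `http://example.org/soccer#` namespace and its terms (`Team`, `playsFor`, `manages`) are hypothetical placeholders, not a published ontology. The resource IRIs and names are made up as well.

```python
# Sketch: mix public FOAF terms with a hypothetical custom soccer vocabulary
# and serialise the resulting triples as N-Triples (standard library only).

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
FOAF = "http://xmlns.com/foaf/0.1/"
SOCCER = "http://example.org/soccer#"  # hypothetical custom vocabulary
DATA = "http://example.org/people/"    # hypothetical resource IRIs


def iri(value: str) -> str:
    """Format an IRI as an N-Triples term."""
    return f"<{value}>"


def literal(value: str) -> str:
    """Format a plain string literal, escaping characters N-Triples reserves."""
    escaped = (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )
    return f'"{escaped}"'


player, manager, team = DATA + "player1", DATA + "manager1", DATA + "team1"

triples = [
    # Public FOAF vocabulary: who the people are.
    (iri(player), iri(RDF_TYPE), iri(FOAF + "Person")),
    (iri(player), iri(FOAF + "name"), literal("Example Player")),
    (iri(manager), iri(RDF_TYPE), iri(FOAF + "Person")),
    (iri(manager), iri(FOAF + "name"), literal("Example Manager")),
    # Custom (hypothetical) vocabulary: their roles on a team.
    (iri(team), iri(RDF_TYPE), iri(SOCCER + "Team")),
    (iri(player), iri(SOCCER + "playsFor"), iri(team)),
    (iri(manager), iri(SOCCER + "manages"), iri(team)),
]


def to_ntriples(triples) -> str:
    """Serialise (subject, predicate, object) terms, one triple per line."""
    return "".join(f"{s} {p} {o} .\n" for s, p, o in triples)


document = to_ntriples(triples)
print(document, end="")
```

In practice an RDF library would handle serialisation and escaping. The hand-rolled helpers here only keep the example self-contained.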
The keyword searches in the Sindice search engine allow you to find more information on where resources of interest are (searching for “john breslin” will point to all public pages that contain semantic information about yours truly). Sindice also has an API that can provide results in a reusable (semantic) format that can be leveraged by other applications. Alternatively, SWSE (Semantic Web Search Engine) shows you semantic information about the object of interest (e.g. my phone number, my friends, etc.), which may be derived from multiple sources (this information on me comes from tens of sources consolidated together via unique identifiers for me or through what’s called “object consolidation”).
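Object consolidation can be sketched roughly as follows. The sketch assumes that different sources describe the same person and share a value for `foaf:mbox_sha1sum`, which FOAF declares as an inverse functional property. Two subjects with the same value are therefore taken to be the same individual. This is an illustration only, not SWSE's actual implementation, and the triples and IRIs are made up. A small union-find structure merges subjects that share a key, so that chains of matches are merged too.

```python
# Sketch of object consolidation: subjects that share a value for an inverse
# functional property (here foaf:mbox_sha1sum) are treated as the same entity
# and their descriptions are merged. All data below is hypothetical.
from collections import defaultdict

KEY = "foaf:mbox_sha1sum"

triples = [
    ("http://source-a.example/me", KEY, "abc123"),
    ("http://source-a.example/me", "foaf:name", "Example Person"),
    ("http://source-b.example/jb", KEY, "abc123"),
    ("http://source-b.example/jb", "foaf:knows", "http://source-b.example/friend"),
    ("http://source-c.example/x", KEY, "zzz999"),
    ("http://source-c.example/x", "foaf:name", "Someone Else"),
]


def consolidate(triples, key=KEY):
    """Return {canonical subject: set of (predicate, object)} after merging."""
    parent = {}

    def find(x):
        # Union-find lookup with path halving.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            # Keep the lexicographically smaller IRI as the canonical one.
            ra, rb = min(ra, rb), max(ra, rb)
            parent[rb] = ra

    first_subject = {}  # key value -> first subject seen with that value
    for s, p, o in triples:
        if p == key:
            if o in first_subject:
                union(first_subject[o], s)
            else:
                first_subject[o] = s

    # Rewrite every statement onto its canonical subject; sets drop duplicates.
    merged = defaultdict(set)
    for s, p, o in triples:
        merged[find(s)].add((p, o))
    return dict(merged)


for subject, props in consolidate(triples).items():
    print(subject, sorted(props))
```

The source-a and source-b descriptions end up under one identifier, combining the name from one source with the friend link from the other.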
For me, this article highlighted that the Semantic Web community needs to be very aware that one of the key features of the Social Web, for journalists and for many others, is the ability to find a lot of personal and sensitive information about people. With the advent of “Web 3.0”, we need to realise that the availability of contextual and semantically-related information is going to become even more apparent (“with great power comes great responsibility”), and people will talk about it in both positive and negative terms. Site owners need to be educated about what semantic data they may be publishing (knowingly or unknowingly, even if it’s just RSS feeds), and developers should determine exactly what opt-in or opt-out mechanisms are required before implementing semantic solutions. Users should also be aware of the benefits and other potential uses of their semantic data.
I think now is the time to avert any scares, because in reality the data that is on the Web or the Social Web can be used in new ways anyway, whether metadata is present or not (some facts can be derived from the pages themselves). Google have recently implemented some discussion forum parsing algorithms to determine how many posts are in a thread, how many users posted in that thread and when the last post was made. You can see this in the results of a search I did for “irish pubs boards.ie” below. It’s not complete, and probably relies on identifying certain HTML structures for non-Google discussion sites: for example, you can see two threads in the middle that don’t show details of the total posts or commenters. But it’s moving towards the SIOC vision of providing more metadata about discussions on the Web to help you find more relevant information, whether the site owners want to provide Semantic Web data or not!
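When forum posts carry SIOC-style metadata, the thread statistics Google has to infer from HTML structure can simply be read off the data. The minimal sketch below assumes that each post records three things: its thread (as `sioc:has_container` would), its author (as `sioc:has_creator` would) and its creation time (as `dcterms:created` would). The `Post` record and the sample data are hypothetical.

```python
# Sketch: compute thread statistics directly from SIOC-style post metadata.
# The Post fields mirror sioc:has_container, sioc:has_creator and
# dcterms:created; the sample records are hypothetical.
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Post:
    thread: str        # container the post belongs to
    creator: str       # user account that wrote the post
    created: datetime  # when the post was made


def thread_summary(posts, thread):
    """Return (post count, distinct posters, time of last post) for a thread."""
    in_thread = [p for p in posts if p.thread == thread]
    if not in_thread:
        return 0, 0, None
    return (
        len(in_thread),
        len({p.creator for p in in_thread}),
        max(p.created for p in in_thread),
    )


posts = [
    Post("thread/1", "user/a", datetime(2008, 11, 1, 20, 15)),
    Post("thread/1", "user/b", datetime(2008, 11, 1, 21, 2)),
    Post("thread/1", "user/a", datetime(2008, 11, 2, 9, 40)),
    Post("thread/2", "user/c", datetime(2008, 11, 3, 12, 0)),
]

print(thread_summary(posts, "thread/1"))
```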
Making data available semantically enables computers to help us do things we cannot easily do (or cannot do at all) right now, and this is what makes it so powerful. We also need to think more towards educating people about the benefits as well as how we can minimise any hazards. Is this a job for W3C SWEO? As my colleague James Cooley said: “I think scientists thought the benefits of GM food were so obvious that there was no case to make. Then you got Frankenstein Food and the game was up.”
For journalists interested in the Semantic Web, I’d recommend reading the paper “SemWebbing the London Gazette” by Jeni Tennison and John Sheridan, which describes how they have exposed information from their newspaper website using RDFa so that it becomes easy to reuse (slides here). You can also view some interesting slides by Colin Meek from a seminar he gave to journalists about the Social Web in Oslo a few days ago. They are in three parts (1, 2, 3). I’ve embedded the third part (on the Semantic Web) below…
It’s time for another installment from the world of SIOC!
Previous SIOC-o-sphere articles:
…will be the 1st September. I sincerely apologise for the delay, which was due to technical difficulties (we needed a signup mechanism in place), my holidays during the first two weeks of August, and settling into the new job.
To enter, you should sign up for a user account at data.sioc-project.org; we will ring to confirm your details; then after your account is enabled, you will be able to access the data sets from the 1st September. We will also have an entry submission system available from that date (in case you make something really cool on the first day)! You can make as many submissions as you wish, but use of the data sets is restricted to the duration of the competition and during the demonstration period in November…
Please note that the start date for this competition has been delayed while we install a secure authentication mechanism for accessing the data sets.
The Digital Enterprise Research Institute (DERI) at NUI Galway is running a unique competition from 1st August to 30th September 2008 in conjunction with boards.ie, Ireland’s largest discussion forum site. The competition is an open contest in which entrants can win over €4000 in Amazon.com vouchers by submitting an interesting creation based on a data set of discussion posts from boards.ie over the past ten years:
Read the rules and find out more information on the contest at:
The data set (approximately 9 million documents) has been represented in the Semantically-Interlinked Online Communities (SIOC) open data format developed by DERI, NUI Galway for expressing the information contained in social websites (forums, mailing lists, blogs, etc.). Entrants may create whatever they feel is interesting based on this data: it could be a novel web application that makes use of the data set, a report on analyses performed on the data, a tool that allows one to visualise or browse the semantic structure, or whatever else the imagination can come up with!
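An analysis-style entry could start from something like the sketch below, which counts posts per forum. It assumes the data has been exported or converted to N-Triples; the competition's actual file format is not stated here. It uses the real SIOC terms `sioc:Post` and `sioc:has_container`, but the sample IRIs are hypothetical. The regular-expression parser only recognises IRI objects and plain literals and skips anything else, such as blank nodes, typed or language-tagged literals, and comments. A proper RDF parser would be needed for the full data set.

```python
# Sketch: count sioc:Post resources per container from N-Triples text.
# Only IRI objects and plain literals are recognised; other lines are skipped.
import re
from collections import Counter

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
SIOC = "http://rdfs.org/sioc/ns#"

# <subject> <predicate> (<object> | "literal") .
TRIPLE = re.compile(r'^<([^>]*)>\s+<([^>]*)>\s+(<[^>]*>|".*")\s*\.\s*$')

SAMPLE = """\
<http://example.org/post/1> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://rdfs.org/sioc/ns#Post> .
<http://example.org/post/1> <http://rdfs.org/sioc/ns#has_container> <http://example.org/forum/soccer> .
<http://example.org/post/1> <http://rdfs.org/sioc/ns#content> "Great match last night" .
<http://example.org/post/2> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://rdfs.org/sioc/ns#Post> .
<http://example.org/post/2> <http://rdfs.org/sioc/ns#has_container> <http://example.org/forum/soccer> .
<http://example.org/post/3> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://rdfs.org/sioc/ns#Post> .
<http://example.org/post/3> <http://rdfs.org/sioc/ns#has_container> <http://example.org/forum/motors> .
"""


def posts_per_forum(ntriples: str) -> Counter:
    """Count posts in each container, using rdf:type and sioc:has_container."""
    posts, container_of = set(), {}
    for line in ntriples.splitlines():
        match = TRIPLE.match(line)
        if match is None:
            continue  # blank nodes, typed/tagged literals, comments, etc.
        s, p, o = match.groups()
        if p == RDF_TYPE and o == f"<{SIOC}Post>":
            posts.add(s)
        elif p == SIOC + "has_container" and o.startswith("<"):
            container_of[s] = o[1:-1]
    return Counter(container_of[s] for s in posts if s in container_of)


print(posts_per_forum(SAMPLE).most_common())
```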
The data reflects ten years of Irish online life, collected between 1998 and 2008 from boards.ie. boards.ie is one of Ireland’s busiest websites, with over a million unique visitors a month. The most popular discussion areas are ‘after hours’, soccer, motors, poker, and computers. Popular topic threads include one about a virtual pub (over 4000 pages), member discussions (2800 pages), poker stories (1800 pages), Liverpool rumours (1250 pages), recruitment in the Gardaí (800 pages), and a freebie list (250 pages).
To enter the competition, go to data.sioc-project.org to access the data sets and view the guidelines. There will be three prizes for the top entries, as judged by an independent panel of three experts. The contest is open to anyone except current or former researchers with DERI and employees of boards.ie Ltd. One person may make multiple entry submissions. The closing date is the 30th September 2008.
The purpose of this contest is to generate interesting applications or creations that make use of community data represented in the SIOC Semantic Web format. All rights to these creations will remain with the contest participants (not including the underlying data, whose copyright remains with the creators). Neither DERI nor boards.ie Ltd. will acquire any commercial rights to these applications or creations as submitted through this contest. Up until now, this data has been publicly viewable, but it was difficult to leverage it without any added semantics because it was embedded in heavily-styled HTML pages.
[DERI is a Centre for Science, Engineering and Technology (CSET) established at NUI Galway in 2003 with funding from Science Foundation Ireland (SFI). After five years of operation, DERI has become an internationally-recognised institute in Semantic Web research, education and technology transfer.]
Via Damien, I tried out Google Trends for boards.ie in comparison to the two main Irish newspaper sites.
Here are the stats for Ireland only from Google Trends (blue = boards.ie; red = ireland.com; yellow = independent.ie):
Here are the stats for all regions from Google Trends:
Here is the worldwide graph from Compete:
And finally, here is the worldwide graph from Alexa:
There are a lot of variations! Although ComScore also do rankings, I am not sure whether the service is publicly available, and Quantcast’s analysis seems more US-focussed. While some people are not so sure about the figures (1, 2), it is an indicator of sorts, even if it’s just to see if you are in the same league…
From Vexorg: “1,006,314 unique visitors for the month to yesterday”. Yay!