
BlogTalk 2009 (6th International Social Software Conference) – Call for Proposals – September 1st and 2nd – Jeju, Korea


BlogTalk 2009
The 6th International Conf. on Social Software
September 1st and 2nd, 2009
Jeju Island, Korea


Following the international success of the last five BlogTalk events, the next BlogTalk – to be held in Jeju Island, Korea on September 1st and 2nd, 2009 – is continuing with its focus on social software, while remaining committed to the diverse cultures, practices and tools of our emerging networked society. The conference (which this year will be co-located with Lift Asia 09) is designed to maintain a sustainable dialog between developers, innovative academics and scholars who study social software and social media, practitioners and administrators in corporate and educational settings, and other general members of the social software and social media communities.

We invite you to submit a proposal for presentation at the BlogTalk 2009 conference. Possible areas include, but are not limited to:

  • Forms and consequences of emerging social software practices
  • Social software in enterprise and educational environments
  • The political impact of social software and social media
  • Applications, prototypes, concepts and standards

Participants and proposal categories

Due to the interdisciplinary nature of the conference, audiences will come from different fields of practice and will have different professional backgrounds. We strongly encourage proposals to bridge these cultural differences and to be understandable for all groups alike. Along those lines, we will offer three different submission categories:

  • Academic
  • Developer
  • Practitioner

For academics, BlogTalk is an ideal conference for presenting and exchanging research from current and future social software projects at an international level. For developers, the conference is a great opportunity to float ideas, visions and prototypes in front of a distinguished audience of peers, to discuss, to link up and to learn (developers may choose to give a practical demonstration rather than a formal presentation if they wish). For practitioners, this is a venue to discuss use cases for social software and social media with like-minded individuals, and to report on any results you may have.

Submitting your proposals

You must submit a one-page abstract of the work you intend to present for review purposes (not to exceed 600 words). Please upload your submission along with some personal information using the EasyChair conference area for BlogTalk 2009. You will receive a confirmation of the arrival of your submission immediately. The submission deadline is June 27th, 2009.

Following notification of acceptance, you will be invited to submit a short or long paper (four or eight pages respectively) for the conference proceedings. BlogTalk is a peer-reviewed conference.

Timeline and important dates

  • One-page abstract submission deadline: June 27th, 2009
  • Notification of acceptance or rejection: July 13th, 2009
  • Full paper submission deadline: August 27th, 2009

(Due to the tight schedule, we expect that there will be no deadline extensions. As with previous BlogTalk conferences, we will work hard to raise a fund for supporting travel costs. Once we have reviewed all of the papers, we will be able to announce more details.)



Talk by Barney Pell (CTO of Powerset) at ISWC 2007

Barney Pell gave the opening talk of the day at ISWC this morning. Barney is the former CEO, and now CTO, of the natural language search company Powerset.

He talked about how natural language (NL) helps the Semantic Web (SW), addressing both sides of the chicken-and-egg problem. On one side, annotations can be created from unstructured text, and ontologies can be generated, mapped and linked. On the other, NL search can consume SW information and can expose SW services in response to NL queries.

The goal of Powerset is to enable people to interact with information and services as naturally and effectively as possible, by combining NL and scalable search technology. Natural language search interprets the Web, indexes it, interprets queries, searches and matches.

Historically, search has matched query intents with document intents, and changes in the document model have driven the latest innovations. The first is proximity: there has been a shift from documents being a “bag of keywords” to a “vector of keywords”. The second relates to anchor text: adding off-page text to the search index is next.

Documents are loaded with linguistic structure that is mostly discarded and ignored (due to cost and complexity), but it has immense value. A document’s intent is actually encoded in this linguistic structure. Powerset’s semantic indexer extracts meaning from the linguistic structure, and Barney believes that they are just at the start of exciting times in this area.

Converging trends that are enabling this NL search are language technologies, lexical and ontological knowledge resources, Moore’s law, open-source software, and commodity computing.

Powerset integrates diverse resources, e.g. websites, newsfeeds, blogs, archives, metadata (“MetaSearch”), video, and podcasts. It can also do real-time queries to databases, where an NL query is converted into a database query. Barney maintains that results from databases drive further engagement.

He then gave some demos of Powerset. With the example “Sir Edward Heath died from pneumonia”, Barney showed how Powerset parses each sentence; extracts entities and semantic relationships, identifying and expanding these to similar entities, relationships and abstractions; and indexes multiple facts for each sentence. He showed an interesting demonstration where multiple queries on the same topic to Powerset retrieve the same “facts”. The information on the various entities or relationships can come from multiple sources: e.g., information on Edward Heath or Deng Xiaoping is from Freebase, while details on pneumonia come from WordNet.
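Powerset’s actual pipeline is proprietary and built on full linguistic parsing, but the underlying idea – extract a fact from a sentence and index it under each of its parts, so that differently-phrased queries on the same topic retrieve the same fact – can be sketched in a few lines. Everything below (the regex pattern, the `FactIndex` class, the tiny relation list) is invented purely for illustration:

```python
import re
from collections import defaultdict

# Toy stand-in for a linguistic parser: a naive pattern that pulls a
# (subject, relation, object) triple out of very simple sentences.
PATTERN = re.compile(
    r"^(?P<subj>[A-Z][\w .]*?) (?P<rel>died from|defeated) (?P<obj>.+?)\.?$"
)

def extract_fact(sentence):
    """Return a (subject, relation, object) triple, or None."""
    m = PATTERN.match(sentence)
    return (m.group("subj"), m.group("rel"), m.group("obj")) if m else None

class FactIndex:
    """Index each extracted fact under every one of its terms."""
    def __init__(self):
        self._index = defaultdict(set)

    def add(self, fact):
        for term in fact:
            self._index[term.lower()].add(fact)

    def query(self, term):
        return self._index.get(term.lower(), set())

idx = FactIndex()
idx.add(extract_fact("Sir Edward Heath died from pneumonia."))

# Queries phrased around different parts of the topic reach the same fact:
print(idx.query("pneumonia"))
print(idx.query("Sir Edward Heath"))
```

In Powerset’s real system the parser, entity expansion and fact store are of course far richer; the sketch only illustrates the multiple-entry-points-per-fact indexing that the demo showed.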

He gave an example of the search query “Who said something about WMDs?”. This is difficult to express with keyword search: you need to capture both that someone “said something” and that it was about weapons of mass destruction. Barney also showed a parse for the famous wrestler/actor Hulk Hogan, with all the relations or “connections” to him (e.g., defeat) and the subjects or “things” that he is related to (e.g., André the Giant).

Powerset’s language technologies are the result of commercialising the XLE work from PARC, leveraging their “multidimensional, multilingual architecture produced from long-term research”. Some of their main challenges are in the areas of scalability, systems integration, incorporating various data and knowledge resources, and enriching the user experience.

He next talked about accelerating the SW ecosystem. Barney said that the wisdom of crowds can help to accelerate the Semantic Web. What starts as a broad platform gets deeper faster when it gets deployed at a large scale, realising a Semantic Web faster than expected. This drive comes from four types of people:

  • The first category is publishers, who upload their ontologies to get more traffic, and can get feedback to help with improving their content.
  • Users are the next group, as they will “play games” to create and improve resources, will provide feedback to get better search, and will create (lightweight, simple) ontologies for personalisation and organising their own groups.
  • There are also developers, who can package knowledge for specialised applications (e.g., for vertical search).
  • Finally, advertisers will want to create and upload ontologies to express all the things that should match their commercial offerings.

For the community, Powerset will provide various APIs and will give access to their technologies to build mashups and other applications. Powerset’s other community contributions are in the form of datasets, annotations, and open-source software.

Their commercial model is based on advertising (like most search engines) and on licensing their technologies to other companies or search engines. Another related company (a friend of Barney’s) is True Knowledge.

I’m still waiting for my Powerset Labs account to be approved; looking forward to getting in there and trying it out myself. Thanks to Barney for the great talk.

Brewster Kahle's (Internet Archive) ISWC talk on worldwide distributed knowledge

Universal access to all knowledge can be one of our greatest achievements.

The keynote speech at ISWC 2007 was given this morning by Brewster Kahle, co-founder of the Internet Archive and also of Alexa Internet. Brewster’s talk discussed the challenges in putting various types of media online, from books to video:

  • He started by talking about digitising books (1 book = 1 MB; the Library of Congress = 26 million books = 26 TB; somewhat larger with images). At present, it costs about $30 to scan a book in the US. For 10 cents a page, books or microfilm can now be scanned at various centres around the States and put online. 250,000 books have been scanned so far and are held in eight online collections. He also talked about making books available to people through the OLPC project. Still, most people like having printed books, so bookmobiles for print-on-demand books are now coming. A bookmobile charges just $1 to print and bind a short book.
  • Next up was audio, and Brewster discussed issues related to putting recorded sound works online. At most, two to three million discs have been commercially distributed, and the biggest issue with these is rights. Rock ‘n’ roll concerts are the most popular category of the Internet Archive’s audio files (with 40,000 concerts so far); the Internet Archive offers bands its hosting service – “unlimited storage, unlimited bandwidth, forever, for free” – if they waive any rights issues. There are various cultural materials that do not work well in terms of record sales, but many people are very interested in having them published online. Audio costs about $10 per disc (per hour) to digitise. The Internet Archive has 100,000 audio items in 100 collections.
  • Moving images or video came next. Most people think of Hollywood films in relation to video, but at most there are 150,000 to 200,000 video items designed for movie theatres, and half of these are Indian! Many are locked up in copyright and are problematic; the Internet Archive has 1,000 that are out of copyright or otherwise permitted. There are other types of materials that people want to see: thousands of archival films, advertisements, training films and government films, downloaded in the millions. Brewster also put out a call to academics at the conference to put their lectures online in bulk at the Internet Archive. It costs $15 per video hour for digitisation services. Brewster estimates that there are 400 “original” television channels (ignoring duplicate rebroadcasts). Recording a television channel for one year requires 10 TB, at a cost of $20,000 for that year. The Television Archive people at the Internet Archive have been recording 20 channels from around the world since 2000 (currently about 1 PB in size – that’s 1 million hours of TV), but not much has been made available just yet (apart from video from the week of 9/11). The Internet Archive currently has 55,000 videos in 100 collections.
  • Software was next. For example, a good archival source is old software that can be reused / replayed via virtual machines or emulators. Brewster came out against the Digital Millennium Copyright Act, which is “horrible for libraries” and for the publishing industry.
  • The Internet Archive is best known for archiving web pages. It started in 1996, by taking a snapshot of every accessible page on a website. It is now about 2 PB in size, with over 100 billion pages. Most people use this service to find their old materials again, since most people “don’t keep their own materials very well”. (Incidentally, Yahoo! came to the Internet Archive to get a 10-year-old version of their own homepage.)
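The figures above can be sanity-checked with some quick arithmetic – a sketch assuming decimal units (1 TB = 1,000,000 MB, 1 PB = 1,000 TB), 1 MB per text-only book, and recording from 2000 up to the 2007 talk:

```python
# Books: 1 book ≈ 1 MB of plain text; Library of Congress ≈ 26 million books.
MB_PER_BOOK = 1
LOC_BOOKS = 26_000_000
loc_tb = LOC_BOOKS * MB_PER_BOOK / 1_000_000  # MB -> TB (decimal units)
print(f"Library of Congress, text only: {loc_tb:.0f} TB")

# Television: 10 TB per channel-year, 20 channels recorded since 2000.
TB_PER_CHANNEL_YEAR = 10
CHANNELS, YEARS = 20, 7  # seven years of recording by the 2007 talk
tv_pb = CHANNELS * YEARS * TB_PER_CHANNEL_YEAR / 1_000  # TB -> PB
tv_hours = CHANNELS * YEARS * 365 * 24
print(f"Television Archive: ~{tv_pb:.1f} PB, ~{tv_hours / 1e6:.1f} million hours")
```

This puts the 20-channel archive at roughly 1.4 PB and 1.2 million hours – the same ballpark as the “about 1 PB” and “1 million hours” quoted in the talk.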

Brewster then talked about preservation issues, i.e., how to keep the materials available. He referenced the famous library at Alexandria, Egypt, which is unfortunately best known for burning. Libraries tend to be burned by governments due to changes in policies and interests, so the computer-world solution to this is backups. The Internet Archive in San Francisco has four employees and 1 PB of storage (including the power bill, bandwidth and staff, their total costs are about $3,000,000 per year; 6 GB of bandwidth is used per second; their storage hardware costs $700,000 per PB). They keep a backup of their book and web materials in Alexandria, and also store audio material at the European Archive in Amsterdam. Their Open Content Alliance initiative allows various people and organisations to come together to create joint collections for all to use.

Access was the next topic of his presentation. Search is making inroads into time-based search: one can see how words and their usage change over time (e.g., “marine life”). Semantic Web applications for access can help people deal with the onslaught of information. There is a huge need to take large related subsets of the Internet Archive collections and help them make sense for people. Great work has been done recently on wikis and search, but there is a need to “add something more to the mix” to bring structure to this project. To do this, Brewster reckons we need the ease of access and authoring of the wiki world, but also ways to incorporate the structure that we all know is in there – flexible enough for people to add structure one item at a time, or to have computers help with this task.

In the recent Open Library initiative, the idea is to build one webpage for every book ever published (not just those still for sale), including content, metadata, reviews, etc. The relevant concepts in this project include: creating Semantic Web concepts for authors, works and entities; having wiki-editable data and templates; using a tuple-based database with history; and making it all open source (both the data and the code, in Python). Open Library has 10 million book records, with 250,000 in full text.

I really enjoyed this talk, and having been a fan of the Wayback Machine for many years, I think there could be an interesting link to the SIOC Project if we think in terms of archiving people’s conversations from the Web, mailing lists and discussion groups for reuse by us and the generations to come.

At the International Semantic Web Conference in Busan

I arrived in Busan on Sunday evening for the 6th International Semantic Web Conference in Busan, Korea. Busan is a great big place with three million people; very impressive as you drive into the city from the airport.

Yesterday, I chaired the 2nd International ExpertFinder Workshop (or FEWS, “Finding Experts on the Web with Semantics”), where we had six interesting and varied papers. The workshop had about 35 attendees, and this bodes very well for future events. We also had a meeting about the ExpertFinder initiative for FOAF afterwards. Thanks to the ISWC 2007 Metadata Chairs Tom and Knud, metadata from FEWS is available here.

From DERI, NUI Galway, both Tudor et al. (“SALT: Weaving the Claim Web”) and Andreas et al. (“YARS2: A Federated Repository for Querying Graph Structured Data from the Web”) have been nominated for the best student paper award. Best of luck to you! (With Hak-Lae et al., I also had a submission for the Semantic Web Challenge at the conference.)


Last night, members from DERI Galway and DERI Seoul had dinner at a famous local fish restaurant. Sebastian snapped some great pictures of our meal, and here’s a video of something wriggling that he and Andreas bravely ate…

Some more images from Busan: a lovely sea view from the Paradise Hotel and a city view from the other side, a Japanese-style Captain Kirk toilet, and maybe butter is good for your heart.