Saturday, November 04, 2006

OpenDocument and Microformats

Microformats let you embed chunks of semantic metadata in XHTML documents using simple conventions built on the existing tags and attributes, mainly class names. Common examples are event and contact information marked up with the hCalendar and hCard microformats. What's nice about this is that machines can accurately parse and extract key entities from any web page that carries microformat metadata. Microformats are really starting to take off on the web, especially among Web 2.0 sites.

OpenDocument is an OASIS XML standard for storing and exchanging office documents (e.g. reports, spreadsheets, and presentations). Given that most of the information organizations generate today takes the form of such office-type documents, if the OpenDocument format really takes off, it would be nice to be able to embed semantic metadata in these documents using microformats.

Many organizations are adopting content-management solutions to better manage and use their corporate information. Many of these solutions rely on sophisticated text-mining components to extract key entities from documents so that they can be better categorized and searched. Imagine how much more accurate that text mining could be if microformat metadata were embedded in the office documents. Not only would accuracy improve, but all that corporate information would open up to the same kinds of interesting uses we are seeing with information on the web.
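To make the "machines can parse it" point concrete, here is a minimal sketch of pulling hCard data out of an XHTML fragment using only the Python standard library. The sample markup, the person and company in it, and the names `HCardParser` and `extract_hcards` are all made up for illustration. It handles only three hCard properties (`fn`, `org`, `tel`) and assumes well-formed XHTML with no nested cards, so treat it as a sketch rather than a real microformats parser.

```python
from html.parser import HTMLParser

# Illustrative XHTML fragment carrying an hCard; the contact details are invented.
SAMPLE = """
<div class="vcard">
  <span class="fn">Jane Doe</span>,
  <span class="org">Example Corp</span>,
  <span class="tel">+1-555-0100</span>
</div>
"""

# A small subset of hCard property names, enough for this sketch.
HCARD_PROPS = {"fn", "org", "tel"}


class HCardParser(HTMLParser):
    """Collect the text of elements whose class names an hCard property."""

    def __init__(self):
        super().__init__()
        self.cards = []   # one dict per vcard found
        self._stack = []  # one marker per open element: "vcard", a property, or None

    def handle_starttag(self, tag, attrs):
        classes = set((dict(attrs).get("class") or "").split())
        marker = None
        if "vcard" in classes:
            # Root element of a new hCard.
            self.cards.append({})
            marker = "vcard"
        elif "vcard" in self._stack and classes & HCARD_PROPS:
            # Property element inside an open hCard.
            marker = sorted(classes & HCARD_PROPS)[0]
        self._stack.append(marker)

    def handle_endtag(self, tag):
        # Assumes well-formed XHTML, so every start tag has a matching end tag.
        if self._stack:
            self._stack.pop()

    def handle_data(self, data):
        # Attribute text to the innermost open property element, if there is one.
        for marker in reversed(self._stack):
            if marker == "vcard":
                break  # text sits directly in the card, outside any property
            if marker is not None:
                card = self.cards[-1]
                card[marker] = card.get(marker, "") + data
                break


def extract_hcards(markup):
    """Return a list of dicts, one per hCard found in the markup."""
    parser = HCardParser()
    parser.feed(markup)
    parser.close()
    return [{k: v.strip() for k, v in card.items()} for card in parser.cards]


if __name__ == "__main__":
    print(extract_hcards(SAMPLE))
```

The point is that the extractor never has to guess which string is a name and which is a company: the class names say so. A text-mining component working over office documents would get the same benefit if the documents carried equivalent markup.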

