What we're really doing is classifying our data as either structured or unstructured. Let's start with structured data, which is data organized in a defined structure so that each element is identifiable. The most universal form of structured data is a database like SQL or
By comparison, unstructured data has no identifiable structure. It typically includes bitmap images and objects, text and other data types that are not part of a database. Most enterprise data today can actually be considered unstructured. Email is one example. Even though the messages themselves are organized in a database, such as Microsoft Exchange or Lotus Notes, the body of each message is free-form text without any structure at all; the data is considered raw. Documents are another example. Although a Word document has some formatting attached to it, the content of the document is completely free-form.
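To make the contrast concrete, here is a minimal Python sketch using only the standard library. The table name, column names, addresses and message text are all invented for illustration; they are assumptions, not part of any real system. It shows a value being retrieved from a database by column name, and then an email whose headers are addressable fields but whose body is just a string.

```python
import sqlite3
from email import message_from_string

# Structured: every value sits in a named column of a table,
# so a query can identify exactly the element it wants.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, customer TEXT, total REAL)")
conn.execute("INSERT INTO orders VALUES (1, 'Acme', 250.0)")
total = conn.execute(
    "SELECT total FROM orders WHERE customer = 'Acme'"
).fetchone()[0]

# Unstructured: the mail system stores the message and exposes its
# headers as fields, but the body is free-form text (hypothetical message).
raw = (
    "From: alice@example.com\n"
    "Subject: Order\n"
    "\n"
    "Hi, please bill Acme 250 dollars for the last order.\n"
)
msg = message_from_string(raw)
sender = msg["From"]      # addressable header field
body = msg.get_payload()  # one undifferentiated string

print(total)
print(sender)
print(body)
```

The order total comes back as a typed value from a known column. The same figure inside the email body can only be found by reading or searching the text.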
The nature of some data types, such as spreadsheets, is still a matter of debate. The spreadsheet itself has some structure in its rows and columns, but the data you put into each cell is not regulated by the application, whether that is Excel or another spreadsheet program.
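The spreadsheet case can be sketched in Python as well. The sheet contents below are made up for illustration, and a CSV string stands in for a real spreadsheet file. The grid gives every cell a row and a column, yet nothing forces the cells in a column to hold the same kind of value.

```python
import csv
import io

# A hypothetical sheet: the grid is regular, the cell contents are not.
sheet = io.StringIO(
    "item,amount\n"
    "widget,100\n"
    "gadget,about 50\n"
    "gizmo,n/a\n"
)
rows = list(csv.DictReader(sheet))


def is_number(cell):
    """Return True if the cell text parses as a number."""
    try:
        float(cell)
        return True
    except ValueError:
        return False


# Split the rows by whether the "amount" cell actually holds a number.
numeric = [r["item"] for r in rows if is_number(r["amount"])]
freeform = [r["item"] for r in rows if not is_number(r["amount"])]

print(numeric)
print(freeform)
```

Only one of the three "amount" cells holds a usable number here, which is why spreadsheets fall somewhere between the two categories.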
This was first published in March 2007