2007-01-30

Symbolic Tagging: Tags 2.0

Tagging has become extremely popular these days, and with good reason - people naturally catalog things under various categories mentally, and are able to recall them by any of those routes. So, it makes sense to use a multiple-tagging system rather than a singular system like plain categories or folders.

Combining tagging with social networking, as Del.icio.us does, is particularly effective - the community builds an aggregate set of tags that classifies data for use by the whole. The problem, however, is that not everyone uses the same tags to mean the same things.

The words are just symbols; tags demand meaning. It makes sense, in a simple system, to consider words and their meaning to be one and the same; however, as the system expands, it needs to understand the relationships between words and meanings. Enter symbolic tagging. Rather than making the tags the words and the words the tags, separate the two - after all, they are in fact separate.

From an architecture perspective, this means we need two constructs: the symbology (words) and the semantics (tags), with a many-to-many (1..*:1..*) relationship between the two. One symbol can have multiple semantics, and one semantic can have multiple symbols. As an example, let's say two symbols, "foo" and "bar", both share a semantic X. When a user searches for "foo", they will see all records associated with semantic X; the same happens when they search for "bar". Alternatively, assume the symbol "baz" is attached to two semantics, X and Y. When a user searches for "baz", they will see all records associated with semantic X, Y, or both.
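To make the foo/bar/baz example concrete, here is a minimal sketch in Python. The class name, method names, and record ids ("r1", "r2") are invented for illustration; this is one possible in-memory shape for the symbol/semantic relationship, not a description of how any real tagging service stores its data.

```python
from collections import defaultdict


class SymbolicTagMap:
    """Hypothetical many-to-many map between symbols (words) and semantics."""

    def __init__(self):
        self.symbol_to_semantics = defaultdict(set)   # "foo" -> {"X"}
        self.semantic_to_symbols = defaultdict(set)   # "X" -> {"foo", "bar"}
        self.semantic_to_records = defaultdict(set)   # "X" -> {"r1"}

    def link(self, symbol, semantic):
        # Record the relationship in both directions.
        self.symbol_to_semantics[symbol].add(semantic)
        self.semantic_to_symbols[semantic].add(symbol)

    def attach(self, record, semantic):
        # Records hang off semantics, not off the words themselves.
        self.semantic_to_records[semantic].add(record)

    def search(self, symbol):
        # Union of the records of every semantic the symbol maps to.
        records = set()
        for semantic in self.symbol_to_semantics.get(symbol, ()):
            records |= self.semantic_to_records.get(semantic, set())
        return records


tags = SymbolicTagMap()
tags.link("foo", "X")
tags.link("bar", "X")
tags.link("baz", "X")
tags.link("baz", "Y")
tags.attach("r1", "X")
tags.attach("r2", "Y")

print(tags.search("foo"))  # records under X
print(tags.search("baz"))  # records under X, Y, or both
```

Searching for "foo" or "bar" returns the same records because both resolve to X, while "baz" returns the union of X and Y. A real system would probably want to rank or group the results by semantic rather than return a flat set.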

How does the system learn which symbols reflect which semantics? The same way it determines which records match which tags - community input. Take, for example, the following process for developing and refining a symbolic-semantic map.

1. As new records are added, the user adding them tags those records as they normally would. Any time a user inputs a tag that isn't already in the map, a new symbol is created for the tag, and a new semantic is created for it as well. At initialization, the two share a 1:1 relationship.
2. The user may opt to go through a refinement process, either manually initiated, or initiated as an additional step in the new-record process. This refinement process prompts the user with a list of tags they have used, and for each tag T, lists other tags associated with records which are also associated with tag T. The user may mark zero or more of these related tags as being synonymous with the tag in question.
3. For each tag being marked synonymous, if that tag is associated with more than one semantic, the user may choose one or more semantic associations between the two tags. The semantics can be identified by the list of tags associated with them. If none of the listed semantics is appropriate, the user may choose "other" to create a new semantic and link it to the two tags being compared. If the tag in question is only associated with one semantic, that semantic is automatically used, avoiding the additional step.
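The three steps above can be sketched roughly as follows. Everything here is hypothetical: the TagStore class, its method names, and the integer semantic ids are made up for illustration. The sketch also assumes one particular reading of step 3, namely that the candidate semantics are those of the related tag being marked synonymous, and that records remember the symbols users typed.

```python
from collections import defaultdict
from itertools import count


class TagStore:
    """Hypothetical in-memory store; all names here are illustrative."""

    def __init__(self):
        self._next_id = count(1)                  # semantic ids: 1, 2, 3, ...
        self.symbol_semantics = defaultdict(set)  # symbol -> semantic ids
        self.semantic_symbols = defaultdict(set)  # semantic id -> symbols
        self.record_symbols = defaultdict(set)    # record -> symbols as typed

    def _link(self, symbol, semantic):
        self.symbol_semantics[symbol].add(semantic)
        self.semantic_symbols[semantic].add(symbol)

    def new_semantic(self, *symbols):
        semantic = next(self._next_id)
        for symbol in symbols:
            self._link(symbol, semantic)
        return semantic

    def tag(self, record, symbol):
        # Step 1: an unseen tag gets a new symbol and its own 1:1 semantic.
        if symbol not in self.symbol_semantics:
            self.new_semantic(symbol)
        self.record_symbols[record].add(symbol)

    def related(self, symbol):
        # Step 2: other tags found on records that also carry this tag.
        found = set()
        for symbols in self.record_symbols.values():
            if symbol in symbols:
                found |= symbols - {symbol}
        return found

    def mark_synonym(self, tag, other, chosen=None, create_new=False):
        # Step 3: link `tag` to one or more of `other`'s semantics.
        candidates = self.symbol_semantics.get(other, set())
        if create_new:
            # The "other" option: a fresh semantic shared by both tags.
            return {self.new_semantic(tag, other)}
        if len(candidates) == 1:
            # Only one candidate, so it is used without asking.
            chosen = set(candidates)
        elif not chosen or not set(chosen) <= candidates:
            raise ValueError("choose from the other tag's semantics")
        for semantic in chosen:
            self._link(tag, semantic)
        return set(chosen)


store = TagStore()
store.tag("r1", "python")
store.tag("r1", "py")
store.tag("r2", "python")
store.tag("r2", "snake")

print(store.related("python"))             # candidates offered in step 2
print(store.mark_synonym("python", "py"))  # "py" has one semantic: automatic
```

After the last call, "python" belongs to two semantics, so marking another tag as synonymous with "python" would require an explicit choice (or the "other" option). Aggregating the choices of many users, for example by vote count, is left out of this sketch.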

This allows the system to continue its organic social self-construction while greatly improving the quality of tag browsing and searching as a whole. It also imparts far more meaning to the dataset itself, which can be used in other areas of research - the data from one large-scale implementation could prove invaluable for semantic computing, computational linguistics, and social networking research and development.

So, Del.icio.us - are you up to the task?
