A Comparative Study of LibraryThing Folksonomies and Library of Congress Subject Headings

With a view to bringing this blog back to life, I have decided to post the dissertation I submitted as part of my Master's degree in Information Science.

Please use the link below to see the properly formatted document including images and raw data.

Hopefully this will be of some use to those interested in tagging and folksonomies as a means of information retrieval and lead to some lively debate on the merits of tagging.

https://www.dropbox.com/s/7gvxoeniw8w73pu/Dissertation%20Final.pdf

CITY UNIVERSITY LONDON

A comparative study of LibraryThing folksonomies and Library of Congress Subject Headings.

Toby Sugden

September 2012

Submitted in partial fulfillment of the requirements for the degree of MSc

In Information Science

Supervisor: Ayse Goker

Abstract

Purpose of Study: This dissertation forms a comparative study between Library of Congress Subject Headings and the folksonomy tag clouds on LibraryThing. It aims to discover similarities and differences between the tags contained in LibraryThing tag clouds and the Library of Congress Subject Headings. Where user applied tags do not match the controlled vocabulary subject heading, a process of coding is used to discover the broad categories of information being tagged. Attention is also given to problematic tags, i.e. those that hinder the information retrieval effectiveness of user applied tags.

Methodology: A system of coding was used to identify three broad categories of tags in relation to Library of Congress Subject Headings: Match, Partial Match and No Match. Where the tags were not a match to the subject headings further coding was used to draw out some general categories of information that these tags represent.

Results: This study found that 7% of tags were an exact match to Library of Congress Subject Headings and 19% were partial matches. The remaining 74% of tags were not a match to a Library of Congress Subject Heading. It was also discovered that 19% of tags were user specific while 24% repeated information already represented in the tag cloud by another term.

Table of Contents
Abstract

Introduction

Scope and Definition

LibraryThing

Library of Congress Subject Headings

Aims and Objectives

Methodology

Overview of Methodology

Sampling

Data Collection

Coding System

Exact Match

Partial Match

No Match

Statistics

Literature Review

Definition of Folksonomy

Strengths of Tagging and Folksonomies

Problems Associated with Tagging and Folksonomies

The Long Tail and Tagging Consensus

Visual Representation of Subject Terms

Comparative Studies With Controlled Vocabularies

Folksonomies and Information Retrieval

Results of Coding

General Observations

Number of Headings/Tags

Keywords

Tag Contents

Exact Matches

Partial Matches

No Match

Subject Information

Genre Information

Format Information

Reading Lists

Series Information

Author Name

Problematical Tags

Personal Tags

Repeated Tags

Discussion of Results

Answers to Research Questions

Conclusion

Works Cited

Appendix A – Reflection

Appendix B – Proposal

Working Title

Introduction

Aims and Objectives

Scope and Definition

LibraryThing

Library of Congress Subject Headings

Definitions

Research Context/Literature Review

Methodology

Sampling

Data Collection

Coding System

Statistics

Evaluation

Work Plan

Resources

Ethics and Confidentiality

Select Bibliography

Appendix C – Collected Tags

Introduction
In 2003, the social bookmarking website Delicious popularised a form of categorisation known as tagging. Users were free to add their own keywords to their bookmarks, not only to aid their own retrieval of items but also to allow other users to browse items based on the user-assigned keywords. Although this was not the first implementation of tagging, Delicious helped set a trend of socially created keyword indexing which now features prominently on a range of websites including blogs, Amazon.com, LibraryThing and library OPACs.

Whereas indexing has previously been a complex task performed by highly skilled professionals, social tagging allows any user to create their own indexing terms free from the controls of a fixed vocabulary. With their increasingly widespread usage across the web, social tagging and folksonomies have become an important area of study for information professionals. On the one hand, social tagging seems to offer a means of dealing with the huge amount of information that is produced on the web. Moreover, social tagging may provide a means of circumventing the problem of imposing a top-down vocabulary upon users which may not adequately reflect their needs.

However, tagging systems are not without problems. Many of the strengths of tagging originate from its lack of control, but this same lack of control invariably leads to problems. As we shall see, with no system to deal with synonyms and homonyms, sets of tags can frequently contain several terms with duplicate meanings, while personal tags can also reduce the value of tagging systems to others.

This paper aims to contribute to this fast growing area of study. Through an analysis of the tags applied to items on the social cataloguing website LibraryThing, it will look at the relationship between the folksonomic tag clouds on LibraryThing and the controlled vocabulary terms of the Library of Congress Subject Headings.

Scope and Definition
Given the time and resource constraints of a Masters dissertation, it is necessary to apply a number of limitations to the scope of this paper. Firstly, this will be primarily a quantitative study into the relationship between folksonomies and controlled vocabulary systems. The primary mode of research will be the production of statistical measures of user applied tags and an evaluation of those statistics. The application of a coding system created by the researcher and the subsequent evaluation will introduce some qualitative elements into the study. However, it is not within the remit of this dissertation to perform in-depth qualitative investigations into precisely why users choose certain tags, although qualitative studies of tagging behaviour through questionnaires and interviews would certainly provide valuable areas for future study.

As the tags and subject headings will be manually collected by the researcher, the sample size of bibliographical items from which to draw statistics will be naturally limited. Even so, it is hoped that the results of this small scale study will provide grounds for a more comprehensive experiment with larger sets of computer collected data.

The scope of this dissertation will cover tags drawn from one website (LibraryThing) and subject headings drawn from one controlled vocabulary system (Library of Congress Subject Headings). The reasoning for these choices is as follows:

LibraryThing
LibraryThing is a social cataloguing site. A registered user is able to create their own catalogue of books with bibliographic information drawn from the Library of Congress, all five national Amazon sites, and more than 690 world libraries (About us, LibraryThing.com). Users are then able to edit and organise the catalogue information in their personal library in any way they wish. The primary social aspect of LibraryThing (and the subject of this dissertation) is the ability for users to ‘tag’ items in their library.

LibraryThing differentiates between a ‘work’ and a ‘book’. The book information page in a user’s library refers to their personal copy. Users can edit any of the bibliographic information on this record to suit their particular needs and can apply tags to books to create their own system of organisation. A LibraryThing ‘work’ is the public record of a particular book. A ‘work’ brings together all different copies of a book, regardless of edition, title variation, or language, and combines all of the tags that LibraryThing community users have applied to the book in their personal libraries (Some LibraryThing Concepts, LibraryThing.com). Therefore, although individual users may use completely individualistic means of tagging their own items, they are collectively adding to a socially created folksonomy.

Tags are displayed in the form of a tag cloud representing the top thirty tags added to that particular work, with the text size reflecting the number of users who have applied that particular term. There is also the option to show all of the tags beyond the top thirty. It is here that you tend to find the highly personalised tags such as ‘my top shelf’, where only one person has used the term and it therefore has not ranked highly enough to appear in the main tag cloud.
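The aggregation described above can be illustrated with a minimal Python sketch. It is not LibraryThing's actual implementation: the function name build_tag_cloud, the sample data and the case-insensitive counting are all assumptions made for illustration.

```python
from collections import Counter


def build_tag_cloud(user_tags, top_n=30):
    """Aggregate per-user tags for one work into (tag, user_count) pairs.

    user_tags maps a (hypothetical) user id to the tags that user applied
    to their own copy of the book.
    """
    counts = Counter()
    for tags in user_tags.values():
        # Count each tag once per user; folding case is an assumption.
        counts.update({t.strip().lower() for t in tags})
    # Most users first; ties are broken alphabetically for a stable order.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:top_n]


# Invented example data, not collected tags.
example = {
    "user_a": ["Fiction", "classic", "my top shelf"],
    "user_b": ["fiction", "19th century"],
    "user_c": ["fiction", "classic"],
}
print(build_tag_cloud(example, top_n=2))  # [('fiction', 3), ('classic', 2)]
```

With a small cut-off, the single-user tag ‘my top shelf’ drops out of the cloud, which mirrors the filtering effect described above.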

LibraryThing seemed the ideal choice for harvesting user tags for a number of reasons. Firstly, LibraryThing was founded in 2005 and has since built up a sizeable user base. This means that there should be a large number of tags available and each item should have tags applied by more than one user. Perhaps one of the most important features of LibraryThing for this particular study is that it consists of bibliographic information. This makes it easier to compare to controlled vocabulary systems, which are almost exclusively aimed at classifying bibliographic information.

Library of Congress Subject Headings
The Library of Congress Subject Headings are a controlled vocabulary of subject headings created and maintained by the United States Library of Congress for bibliographic records. Library of Congress subject headings are hierarchical in nature with the listings providing broader and narrower terms to use as necessary.

There are several reasons for choosing the Library of Congress Subject Headings as a comparison to LibraryThing tags. Firstly it is one of the most widely used and comprehensive controlled vocabulary subject description systems making it the most authoritative comparison to the socially created tags on LibraryThing. Secondly, the Library of Congress Subject Headings are primarily focused on describing the subject content of bibliographic material. As the tags from LibraryThing are also describing bibliographic material it seemed valid that comparisons could be drawn between the two. Another important reason for choosing the LCSH is that the subject headings are freely available as part of the Library of Congress Catalogue.

Aims and Objectives

The overall aim of this dissertation is to compare and contrast the tagging practices of users of LibraryThing to the controlled vocabulary Library of Congress Subject Headings. In particular it aims to answer the following research questions:
1. What degree of overlap is there between the user produced tags on LibraryThing and the controlled vocabulary Library of Congress Subject Headings?

The degree of overlap will be revealed through the coding categories Exact Match and Partial Match. Exact Match tags will indicate an extremely high level of overlap between user applied tags and subject headings, while partial matches will indicate that users are tagging the same broad concepts as the Library of Congress Subject Headings but are using different terminology or a different level of specificity.

2. Where user tags are not a match to a Library of Congress Subject Heading, what categories of information are users tagging?

This research question will be answered by the results of the No Match category of tags. Through the process of coding, tags that are not a match to a Library of Congress Subject Heading will be categorised in order to discover some broad types of information that are tagged.

3. What proportion of user applied tags could prove to be problematic for browsing / information retrieval?

The coding of tags that are not a match to Library of Congress Subject headings will also identify tags that could prove to be problematical for browsing and information retrieval. Problematical tags will include personal tags which are only relevant to the users that applied them and tags which duplicate the information of other tags already in the tag cloud of an item.

Methodology

Overview of Methodology

This dissertation aims to use coding methodology in order to compare and contrast tagging practices and controlled vocabulary subject headings. It will divide the sample of tags extracted from LibraryThing into three main categories: Exact Match, Partial Match and No Match, depending on their relationship to the Library of Congress Subject Headings. The Partial Match and No Match categories will then be subdivided through further coding to facilitate comparison.

The decision was taken to make this primarily a quantitative study into tagging practices in the hope that it will discover some general trends that could prove to be useful areas for further qualitative investigation. Although primarily a quantitative investigation, the use of coding does introduce qualitative elements.

The decision to employ a coding system was influenced by the methodology used by Wetterstrom in his paper, The Complementarity of Tags and LCSH – A Tagging Experiment and Investigation into Added Value in a New Zealand Library Context. As in Wetterstrom’s investigation a coding methodology seemed to be the best way of giving meaning to large quantities of tags and facilitating comparison to the Library of Congress Subject Headings.

Sampling
The chosen sampling method is cluster sampling, where each cluster is taken to be a particular genre (where genre is taken to mean a broad subject area such as classic fiction, crime, romance etc.). In order to determine the choices of genre, the researcher used the genres that form the faceted classification scheme employed by the WH Smith and Waterstones online bookshops.

An initial pilot study was performed to provide an indication of the number of tags which could be collected. The pilot study discovered that the process of collecting tags and then coding each tag individually was very time consuming. As such it was determined that feasibly no more than eleven books across six genres could be collected. It was hoped that the chosen genres (biography/autobiography, mystery/thriller, history, science fiction/fantasy/horror, classic fiction) would provide an interesting cross-section of books and may highlight any differences in tagging practices across genres. The reasoning behind this sampling procedure was that it would be valuable to compare tagging practices across multiple genres. One of the purported strengths of folksonomies is that they may better reflect the terminology of particular communities. By sampling different genres we may be able to see whether specific terms are used by a particular fanbase of that genre.

Random sampling did not seem a feasible or useful option as the random sample could select items that have tags entered by only a single user or could select items that are available on LibraryThing but not the Library of Congress Catalogue.

Only the top 30 tags will be examined for the purposes of this study. These represent the terms most commonly applied by multiple users and therefore form a folksonomy. The LibraryThing website displays the top 30 tags on the main page of each book in the form of a tag cloud, providing a visual representation of the popularity of each term through the font size and weight used. The choice to limit the sample of tags to the top 30 was made on the basis that most users would not choose to use the option to view all tags and that the time taken to code the full range of tags would limit the number of books that could be used for the purposes of this study. In some cases where fewer tags have been applied to a particular book, there may be some tags in the top 30 which have only been applied once by one user.

Examining the tags in the tag cloud also enables us to see whether folksonomies are able to effectively filter out irrelevant personal tags. This will prove useful when trying to answer the third research question.

Data Collection
A cluster sample of books from the LibraryThing website was selected by the researcher. The top 30 tags and the number of users who applied each term were copied from the LibraryThing page of each book into Notepad++. A macro was then used to remove any extra formatting such as brackets and to position each tag on a new line with the user count tab-delimited. This information was then pasted into a Microsoft Excel spreadsheet with the title of the book.

The Library of Congress subject headings were retrieved for each book from the Library of Congress online catalogue (http://catalog.loc.gov/), processed in the same way as the tags and pasted into a separate page in the spreadsheet.
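For readers who would rather script this clean-up step than record a Notepad++ macro, the following Python sketch shows one possible equivalent. It assumes the copied text looks like "tag (count)" entries separated by commas; that input format, the pattern and the function name tags_to_tsv are illustrative assumptions rather than a record of the macro actually used.

```python
import re

# Assumed input shape: "tag (count)" entries, e.g. "classic (1,020)".
TAG_PATTERN = re.compile(r"([^()]+?)\s*\(([\d,]+)\)")


def tags_to_tsv(raw):
    """Return one 'tag<TAB>user_count' line per tag found in raw."""
    lines = []
    for tag, count in TAG_PATTERN.findall(raw):
        tag = tag.strip(" ,\n\t")              # drop separators left between entries
        users = int(count.replace(",", ""))    # "1,020" -> 1020
        lines.append(f"{tag}\t{users}")
    return "\n".join(lines)


# Invented sample text, not collected data.
raw = "19th century (512), classic (1,020), fiction (2,345)"
print(tags_to_tsv(raw))
```

The tab-separated output can be pasted straight into a spreadsheet, with the tag and the user count landing in separate columns. Tags that themselves contain brackets would need extra handling.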

Coding System
In order to compare the tag results of LibraryThing to the subject headings of the Library of Congress Subject Headings and to produce meaningful results as to what types of information users are tagging, it was necessary to develop a coding system.

The first stage of the coding process was to divide the tags into three broad categories borrowed from the work of Wetterstrom. The three categories were:

1. Exact Match

2. Partial Match

3. No Match

Exact Match
A tag was coded as an exact match if the tag term occurred precisely as a Library of Congress subject heading term. Where the Library of Congress subject headings contained subdivisions, it was decided that a tag would still be an exact match if it was included in the subdivision. For example, the tag ‘United States’ would be an exact match to the subject heading Statesmen -- United States -- Biography. To distinguish tags that were a match to subdivisions, they were coded “MATCH (SUB)”.
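The coding in this study was done by hand, but the exact match rule can be sketched in Python to make it concrete. The sketch rests on several assumptions: comparison ignores case, subdivisions are separated by a double hyphen or a dash, and any component of a subdivided heading counts as a subdivision match. The function names and the heading "Presidents" in the example are illustrative only.

```python
import re

# Subdivision separators: "--" or a single en/em dash, with optional spaces.
SEPARATOR = re.compile(r"\s*(?:--|[\u2013\u2014])\s*")


def split_heading(heading):
    """Split a subject heading into its main heading and subdivisions."""
    return [part.strip() for part in SEPARATOR.split(heading) if part.strip()]


def code_exact_match(tag, headings):
    """Return 'MATCH', 'MATCH (SUB)' or None for one tag and a book's headings."""
    needle = tag.strip().casefold()
    found_in_subdivided_heading = False
    for heading in headings:
        parts = [part.casefold() for part in split_heading(heading)]
        if len(parts) == 1 and parts[0] == needle:
            return "MATCH"  # tag equals an undivided heading
        if len(parts) > 1 and needle in parts:
            found_in_subdivided_heading = True
    return "MATCH (SUB)" if found_in_subdivided_heading else None


headings = ["Statesmen -- United States -- Biography", "Presidents"]
print(code_exact_match("United States", headings))  # MATCH (SUB)
print(code_exact_match("presidents", headings))     # MATCH
print(code_exact_match("politics", headings))       # None
```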

Partial Match
A partial match was deemed to be a tag that was describing the same concept as a Library of Congress Subject Heading but used different terminology or a different level of specificity. An initial round of coding was performed whereby a tag that was thought to be a partial match was designated the code “2”. After the initial round of coding a second round of coding was performed where further codes were added creating new categories of partial matches where necessary. The codes that were developed are as follows:

Synonym: The tag uses a term with the same meaning as the Library of Congress Subject Heading, including subdivisions. A thesaurus was used to help designate a term as a synonym.

Broader Term: The tag describes a broader concept that contains within it the Library of Congress Subject Heading.

Narrower Term: The tag provides a greater degree of specificity than the Library of Congress Subject Heading.

Inversion of Person or Place Name: The tag contains a person or place name also in the subject headings but lacks the heading’s inversion or specificity. (This subdivision was added during the coding process when it was noticed that a large number of users were tagging people and place names in a different format to the Library of Congress Subject Headings.)
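As an illustration of the last category, the sketch below checks whether a tag is the direct-order form of an inverted personal name heading. It covers personal names only, assumes the heading follows a "Surname, Forename, dates" pattern, and treats any component containing digits as a date qualifier. The function names and the example heading are illustrative; the study itself made these judgements manually.

```python
import re


def direct_order(heading_name):
    """Turn 'Surname, Forename, 1900-1980' into 'Forename Surname'."""
    parts = [part.strip() for part in heading_name.split(",")]
    # Drop empty parts and qualifiers containing digits (assumed life dates).
    parts = [part for part in parts if part and not re.search(r"\d", part)]
    if len(parts) < 2:
        return " ".join(parts)
    surname, forenames = parts[0], parts[1]
    return f"{forenames} {surname}"


def is_name_inversion(tag, heading_name):
    """True if the tag is the heading's name written in direct order."""
    tag_norm = " ".join(tag.split()).casefold()
    same_as_heading = tag_norm == heading_name.strip().casefold()
    return tag_norm == direct_order(heading_name).casefold() and not same_as_heading


print(is_name_inversion("Winston Churchill", "Churchill, Winston, 1874-1965"))  # True
print(is_name_inversion("Churchill", "Churchill, Winston, 1874-1965"))          # False
```

A tag such as ‘Churchill’ on its own lacks the heading's specificity rather than its inversion, so a fuller version would need a second check for that case.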

No Match
These were tags which discussed concepts that were not covered in the Library of Congress Subject Headings. Tags that were not a match were initially coded as “3”. These were then coded further into the following categories:

1) Subject – These were tags which described the subject matter of a book that was not referenced in the Library of Congress Subject Headings.

2) Literary Genre – These were tags that attempted to place the book into a recognisable literary genre such as “thriller”, “classic” or “mystery”.

3) Author name – Tags which contained the name of the author of the book.

4) Book format – This category included tags which described the physical properties of a book such as paperback or hardback. References to digital formats such as ebook, ipad or audiobook were also included. This category also included any tags relating to movie adaptations of the book.

5) Personal tags – These were tags which were user specific and therefore of no value to other users, such as “read”, “to read” or “mine”. This category also contained tags where the meaning was unknown.

6) Reading lists – This category was added due to the number of users applying tags such as “1001 books”.

7) Series information – If a tag contained the name of a series which contained the book in question then this was coded with “7”.

8) Awards – These were tags which referenced an award such as “Booker Prize” or “Nebula Award”.

Unlike the three main categories Match, Partial Match and No Match which were borrowed from Wetterstrom’s study, the subdivisions of the No Match categories were created during the process of coding and emerged naturally from the data. This was seen as preferable to creating prescribed categories and attempting to fit the data into them.

The use of a coding system will introduce a qualitative and potentially biased element to the study as the researcher will be creating his own categories based on a personal interpretation of the data. One of the potential problems in using coding systems is that the researcher imposes categories that distort the data rather than allowing the data to influence the categories (Pickard, 2007). Although this will be to some extent unavoidable, similar studies using coding systems have still managed to produce useful information (Wetterstrom, 2008).

Statistics
Once the data was collected Microsoft Excel was used to produce statistics.
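The statistics in this study were produced in Excel. For readers working in code, a rough Python equivalent of the percentage calculation is sketched below; the function name category_percentages and the sample codes are invented for illustration and are not the study's data.

```python
from collections import Counter


def category_percentages(codes):
    """Return each code's share of all coded tags, as a percentage."""
    counts = Counter(codes)
    total = sum(counts.values())
    return {code: round(100 * n / total, 1) for code, n in counts.items()}


# Invented codes for four tags, purely to show the calculation.
codes = ["MATCH", "PARTIAL", "NO MATCH", "NO MATCH"]
print(category_percentages(codes))
# {'MATCH': 25.0, 'PARTIAL': 25.0, 'NO MATCH': 50.0}
```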

Literature Review

Definition of Folksonomy
Initial discussion around folksonomies began in 2004 when Thomas Vander Wal coined the term folksonomy in response to the increasing use of tagging on the World Wide Web through sites such as Flickr and Delicious. For Vander Wal, a folksonomy is “the result of personal free tagging of information and objects (anything with a URL) for one’s own retrieval.” Furthermore, he claims that:

“The value in this external tagging is derived from people using their own vocabulary and adding explicit meaning, which may come from inferred understanding of the information / object. People are not so much categorizing, as providing a means to connect items (placing hooks) to provide their meaning in their own understanding.”

Vander Wal’s comments have now been integrated into the current definition of a folksonomy as “a classification system derived from user-generated electronic tags or keywords that annotate and describe online content” (Dictionary.com).

Lux, Granitzer and Kern (2009) distinguish two types of folksonomy, namely broad folksonomies and narrow folksonomies. Narrow folksonomies are based on tags added by the owner or creator of a resource and are primarily designed for individual users to retrieve their own items. For example, on Flickr the creator of an image can assign their own tagging terms to facilitate retrieval of their own images. As a narrow folksonomy is for the use of only a single person, tags do not have to make sense to anyone other than the creator of the terms. Broad folksonomies, which will be the focus of this dissertation, instead allow multiple users to assign tags to a single item or resource. When multiple users apply the same term these are then aggregated, allowing for a ranking of terms based on popularity. Tag clouds are often used as a means of visually displaying the popularity of terms within a broad folksonomy, with the most commonly tagged terms appearing larger and bolder.

Folksonomies are best defined as a categorisation rather than a classification system. In traditional classification an entity is systematically assigned to one class within a system of mutually exclusive and non-overlapping classes (Jacobs, 2004; Halpin et al., 2007). Categorisation, in contrast, is the process of “dividing the world of experience into groups or categories whose members share some perceptible similarity within a given context” (Jacobs, 2004). The fact that categorisation is contextual means that categories are not fixed as in classification but rather variable and flexible.

Another significant difference between classification and categorisation is that classification schemes are also based upon the principle of author intent. The librarian is intended to be neutral in the process of classification giving emphasis to what the author intended rather than their own views (Peterson, 2006).

Based on the definitions provided by Jacobs, tagging almost certainly qualifies as a categorisation rather than a classification system.

Strengths of Tagging and Folksonomies

The application of controlled vocabulary systems is an expensive and time consuming task that can only be performed by well trained professionals. In recent years, increasingly large amounts of web content have been generated. This is particularly the case with the advent of Web 2.0 where nearly anyone can become a producer of web content through technologies such as blogs. It is simply not feasible that this vast amount of information can be classified using traditional controlled vocabulary methods. For this reason some library and information professionals argue that tagging and folksonomies may provide a possible solution (Quintarelli, 2005).

Firstly, folksonomies have a low barrier of entry. Unlike controlled vocabularies, there are no complex rules and no need to consult schedules and thesauri. Furthermore, their popularity on the web means that tagging would be a familiar practice to many users (Fichter, 2006). Therefore, by incorporating tagging into existing library systems, the inclusive nature of tagging systems can help library patrons feel more involved (Fichter, 2006; Spiteri, 2006). Trant (2006) echoes this idea of tags leading to user engagement. In a paper on the role of tagging and folksonomy in art museums, Trant claims that “Folksonomy… Offer ways for art museums to engage with their communities and to understand what users of on-line museum collections see as important” (p. 1).

The fact that folksonomies draw their terminology from their users means that they have the potential to better reflect both the information needs and vocabulary of users. It has been argued that the incorporation of tags into library catalogues would supplement controlled vocabularies (Spiteri, 2006). Mathes supports this, arguing that user tagging systems could be used to inform librarians about user preferred vocabulary and subject terms which could then be integrated into the more formal systems of a library catalogue (Mathes, 2004). Trant (2006) makes a similar point, claiming that tagging could be used to bridge the semantic gap between controlled and user vocabularies. Melissa Adler (2009) has argued that tagging systems can empower user communities by allowing them to name resources in their own terms. Her study looked at whether tags can better reflect the terminology of the transgender community. Adler found that there was a language gap between the transgender community and the authorised subject headings from the Library of Congress, with the most frequently applied user terms on LibraryThing being far less common or non-existent in the subject headings. Although she acknowledges that there are problems associated with tags on their own, Adler recommends that such systems could be integrated into existing library catalogues to enhance organisation, representation and retrieval of transgender themed material.

Bates and Rowley (2010) discovered that the LibraryThing folksonomy offered improved discoverability and representation of LGBTQ resources over Library of Congress Subject Headings. However, they felt that because LibraryThing users were generally from the United States of America it was somewhat limited in its representation of ethnic minority resources. They concluded that folksonomies, like controlled vocabulary systems, are still prone to bias in world view and subject representation.

Problems Associated with Tagging and Folksonomies

Despite some of the potential strengths of tagging, its uncontrolled nature introduces several inherent weaknesses. Firstly, the lack of synonym and homonym control invariably leads to inconsistencies, with redundant synonymous tags providing what Thomas, Caudle and Schmitz term “noise” and homonym tags potentially misdirecting users (Thomas, Caudle and Schmitz, 2009). Furthermore, without a controlled vocabulary, resources can fall under multiple synonymous terms, meaning that browsing by one tag does not retrieve all the relevant items. The lack of grammatical and vocabulary control also means that plural and singular forms, conjugated words and compound words may be used, as well as user specific terms that have no use to anyone else (Guy and Tonkin, 2006).

A further problem with folksonomies is that folksonomy based systems do not show hierarchical links between terms. Chan argues that this hierarchy is an important element for narrowing down searches and finding related terms (Chan, 1986).

Thomas, Caudle and Schmitz explored some of the problems associated with tagging in their article, Trashy Tags: Problematic Tags in LibraryThing. They discovered that the lack of rules governing the way people tag leads to inconsistencies that hinder the efficacy of such systems. Their analysis of harvested LibraryThing tags revealed that more than a third of the inconsistencies were the result of tag variations where multiple tags with the same meaning were used on a book. Other types of inconsistencies such as spelling mistakes and personal tags were less apparent but still contributed to the ‘messiness’ of folksonomies. They concluded that these problems could be addressed by offering users suggestions and recommendations of tags or by allowing users to edit or combine tags.

The Long Tail and Tagging Consensus

Mathes (2004) has shown that tag distribution in broad folksonomies follows a long tail power law where the most commonly used tags are highly visible and therefore influence future taggers to apply these terms while the long tail is formed of a large number of tags used by very few people. The power curve effect of folksonomies shows that generally a large number of people agree on the use of a small number of terms while a small number of people choose to apply less popular terms. Robu, Halpin and Shepherd’s study (2009) has shown that as a result of this power curve distribution of tags, with a sufficient number of tags, an implicit form of consensus is reached by the users of the system around the tags that best describe the resource. Golder and Huberman (2006) confirmed this and found that a degree of consensus could be found with as few as 100 tags.
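To make the shape of the long tail concrete, the following Python sketch builds a hypothetical tag cloud whose counts fall away in a Zipf-like fashion and measures how much of the total tagging activity is concentrated in the most popular tags. The counts, the exponent and the function names are invented for illustration and are not drawn from LibraryThing data or from any of the studies cited.

```python
def zipf_counts(n_tags, top_count, exponent=1.0):
    """Hypothetical tag counts: the tag at rank r is applied roughly
    top_count / r**exponent times (never fewer than once)."""
    return [max(1, round(top_count / rank ** exponent))
            for rank in range(1, n_tags + 1)]


def head_share(counts, k):
    """Fraction of all tag applications accounted for by the k most used tags."""
    ordered = sorted(counts, reverse=True)
    return sum(ordered[:k]) / sum(ordered)


# Invented example: 100 distinct tags, the most popular applied 500 times.
counts = zipf_counts(n_tags=100, top_count=500)
print(f"Top 10 of 100 tags: {head_share(counts, 10):.0%} of all tag applications")
print(f"Tags applied fewer than 10 times: {sum(1 for c in counts if c < 10)}")
```

Under these assumed figures a handful of tags carries most of the activity while many tags are applied only a few times, which is the pattern the studies above describe.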

Visual Representation of Subject Terms

As we can see from the image above, a major difference between the tags featured on LibraryThing and the subject headings on the Library of Congress online catalogue is the way in which they are visually represented. On LibraryThing, tags are presented in the form of a tag cloud which displays the terms most commonly used. The popularity of a term is represented visually in the tag cloud through the use of a progressively larger and bolder font. In contrast, the Library of Congress Subject Headings are un-weighted and laid out in a simple ordered list.
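The weighting of a tag cloud can be illustrated with a short Python sketch. This is only an assumed linear scaling between a minimum and maximum font size; the way LibraryThing actually sizes its tags is not documented in this study, and the tag counts below are invented.

```python
def font_size(count, min_count, max_count, min_pt=10, max_pt=28):
    """Linearly scale a tag's usage count to a font size in points
    (an assumed scheme, not LibraryThing's own)."""
    if max_count == min_count:
        return min_pt
    fraction = (count - min_count) / (max_count - min_count)
    return round(min_pt + fraction * (max_pt - min_pt))


# Invented usage counts for three tags on a single book.
tag_counts = {"fiction": 130, "London": 65, "chick lit": 10}
lowest, highest = min(tag_counts.values()), max(tag_counts.values())
sizes = {tag: font_size(n, lowest, highest) for tag, n in tag_counts.items()}
print(sizes)
```

An un-weighted list, by contrast, would render every term at the same size regardless of how often it was applied.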

The visual display of tags within a tag cloud should not be dismissed as purely aesthetic as it actively plays an important role in the way in which tags are applied and used for information retrieval. James Sinclair and Michael Cardew-Hall’s study The Folksonomy Tag Cloud: When is it Useful conducted an experiment where participants were given the option of using a tag cloud or a traditional search interface to answer a series of questions (Sinclair and Cardew-Hall, 2007). The study found that while users preferred to use the search interface to answer questions that required specific information, users chose the tag cloud in response to more general information seeking tasks. The results highlighted that the tag cloud was especially suited to browsing as the tags generally conveyed broad topics that had a high level of recall.

The fact that the subject terms are visually weighted depending on the number of users who choose to apply the term highlights the social aspect of tagging and that this is a categorisation system that is firmly rooted in Web 2.0. By Web 2.0 we mean a view of the web as primarily collaborative, with users not simply acting as passive recipients of information, but rather actively creating and interacting with information on the web (O’Reilly, 2005). In Web 2.0 Architectures the authors use the comparative example of web directories such as Yahoo’s web directory and tagging systems to highlight the difference between Web 1.0 and Web 2.0 (Governor, Nickull and Hinchcliffe, 2009). Web directories, with their hierarchical classification devised and maintained by a small editorial team, represent Web 1.0, while the collaborative and user led nature of tagging and folksonomy represents Web 2.0.

The example used by Governor, Nickull and Hinchcliffe is equally applicable to the relationship between the Library of Congress Subject Headings and LibraryThing. As with Web 1.0, the Library of Congress Subject Headings are restrictive rather than inclusive, relying on the decisions made by a small set of professionals. Furthermore, the system is built upon the assumption that there are particular relevant headings to be applied and that the author’s intent is of primary importance rather than the information needs and information seeking behaviour of its users (Peterson, 2008). In contrast, the folksonomy represents a means of categorisation built by users, for users. In this sense it is inclusive and representative of both the users’ vocabulary and their information needs and behaviour. Furthermore, as Heejin Park has described in her paper A Conceptual Framework to Study Folksonomic Interaction, a folksonomy provides users with a social network to share tags and items. When a user assigns a tag to an item, each tag serves as a link to additional resources tagged the same way by other users. This means that users are indirectly linking up with other users who are sharing the same tags. Through this shared terminology a folksonomy offers the opportunity for users to more easily discover others who have similar interests and to learn of their resources (Park, 2011). This highlights the complex social dynamic of folksonomic systems, something that cannot be replicated in traditional systems such as the Library of Congress Subject Headings.

Comparative Studies With Controlled Vocabularies

Several comparative studies between tagging and controlled vocabularies have been performed.

Nowick and Mering compared user search terms on the topic of water quality with three controlled vocabularies including the Library of Congress Subject Headings (Nowick & Mering, 2003). They found that only 30-40% of the user search terms were a match with an LCSH.

Trant (2006) looked at tags in an art museum context, comparing user supplied tags to cataloguer created documentation. The Metropolitan Museum of Art conducted a series of tests to see if untrained cataloguers using tags could provide useful description and access points to items in their collection. Trant found that 77% of the user applied tags were valid and related to the item in question. The tagging results showed high levels of tag consistency with the 5 most common terms being used by an average of 34.8% of taggers and the 3 most common terms were used by an average of 41.8% of taggers. This high level of agreement between taggers supports the idea that a stable folksonomy could emerge from user applied tags.

Wetterstrom’s paper The Complementarity of Tags and LCSH – A Tagging Experiment and Investigation into Added Value in a New Zealand Library Context (2009) used a similar coding methodology to this study in order to investigate whether incorporation of folksonomic systems into library catalogues can add value to controlled vocabularies. Wetterstrom was particularly interested in whether tags can add value to controlled vocabularies for New Zealand library patrons by better reflecting regional terms. The study asked 20 volunteers to tag a selection of books. The tags were then coded in comparison to the Library of Congress Subject Headings for the same books based on whether they were a match, a partial match or did not appear in the subject headings. Wetterstrom discovered that 75% of tags were not a match to a Library of Congress Subject Heading and therefore complemented library catalogues by introducing new terms. Wetterstrom also found that the majority of non-matching tags were either the result of popular language (21.63%) or indicated a different point of view through a related term (19.29%). In terms of specificity, Wetterstrom found that 14.16% were broader than the Library of Congress Subject Headings while 19.62% were narrower.

Rolla (2009) similarly performed a comparative analysis of tagging and Library of Congress Subject Headings this time using tags drawn from LibraryThing. Rolla found that 75% of tags on LibraryThing were describing the same broad concept as a Library of Congress Subject Heading however they used terminology not found in the subject headings. Rolla found that specificity ranged from very broad subject terms to narrow terms which drew out very specific topics which were not necessarily representative of the overall subject matter of the book.

Alenka Sauperl (2010) also performed a comparative study between a controlled vocabulary and tags, this time looking at the UDC (Universal Decimal Classification). Because of the wide scope of the UDC, tags were not limited to those applied to bibliographic resources. Instead tags were drawn from Amazon, LibraryThing, Delicious and 43 Things. Tags were collected from each website and then categorised through a process of content analysis. The study showed that 90% of tags in Delicious and 91% of tags in 43 Things represented concepts that could be expressed through the UDC. In contrast only 79% of tags in LibraryThing and 63% of tags on Amazon were a match. Under closer examination the study discovered that names were the most common category of tag on LibraryThing followed by topic and genre which were also represented in the UDC. Categories of tag which were not to be found in the UDC constituted 24% and included awards, series information, edition, evaluation, experience action, occasion and purpose, availability and related work. In contrast to Wetterstrom, Sauperl argues that folksonomies offer little value to the description of bibliographic information as very few of the concepts found in tags on LibraryThing were a match to UDC terms.

Folksonomies and Information Retrieval

A number of authors have looked at the value of tags as a form of searching and information retrieval. Kipp and Campbell performed a qualitative study as to how useful a selection of LIS educated users found tagging as a means of searching for information on PubMed and CiteULike (Kipp & Campbell, 2010). Users seemed to find that the greatest value of tags was as a means of serendipitous browsing from one article to the next and of locating related articles, rather than as a means of direct searching. This seems to match the results of Sinclair and Cardew-Hall’s study on folksonomy tag clouds. Marliese, Caudle and Schmitz looked at what they termed the ‘messiness’ of tags and how these characteristics could influence their value for search and retrieval in library catalogues (Marliese, Caudle & Schmitz, 2010). Their findings suggest that the biggest problem with user created tags is the high number of variations of tags that already existed in the system (36%). They suggested that a means of reducing the number of ‘messy tags’ may be for tagging systems to provide suggestions and recommendations to give guidelines for creating tags. Morrison performed a user evaluation study of the search retrieval effectiveness of folksonomies on the World Wide Web (Morrison, 2008). User relevance judgements suggested that folksonomy search results overlapped with those from the other systems, and documents found by both search engines and folksonomies were significantly more likely to be judged relevant than those returned by any single IR system type.

Results of Coding

Before delving into an analysis of the coding of the LibraryThing tags it may be useful to make some general observations as to the similarities and differences between the user applied tags of LibraryThing and the Library of Congress Subject Headings.

General Observations

Number of Headings/Tags

The first observable difference between the user applied tags on LibraryThing and the Library of Congress Subject Headings is that there are significantly more tags applied to a book than there are subject headings. All of the selected books for this study contained at least 27 or 30 tags. These were the numbers of tags visible as part of the ‘tag cloud’ on each item’s LibraryThing page. In actual fact most of the items contained many more tags than those analysed here; however, they were not applied by enough users to constitute the top level tags forming the tag cloud. In contrast to the LibraryThing tags, the maximum number of subject headings was 9 (The Great Gatsby and Portrait of a Lady) while eight books only had one subject heading applied (A History of Britain, Europe: A History, Fifty Shades of Grey, A History of Britain, Jerusalem: The Biography, Les Miserables, Stalingrad and Terry Jones’ Barbarians). The mean average number of subject headings applied to the books in this study was 4. These results would appear to match those of Rolla, who found that the average number of LibraryThing tags across his sample of books was 42.78 while the average number of subject headings was 3.80.

The number of tags in comparison to subject headings suggests that tagging systems could offer users a greater number of subject access points to an item. In theory this could facilitate improved information retrieval through search and browsing. However this only holds true providing that all of the tags are relevant to the item in question. If a book contains a high number of personal tags which are only applicable to the users who assigned them then the number of subject access points decreases. As the results show 19% of the tags in this study were deemed to be personal. These ‘trashy’ tags as termed by Thomas, Caudle and Schmitz will be explored further in the evaluation of the personal tags coded in this study.

Keywords

LibraryThing allows for more than one word to be used for each tag. Similarly Library of Congress Subject Headings generally are made up of strings of pre-coordinated terms. This means that a single tag or a single subject heading can consist of multiple searchable keywords. In order to find out how many keywords were contained within a single tag or subject heading, the following formula was used:

=IF(LEN(TRIM(Cellreference))=0,0,LEN(TRIM(Cellreference))-LEN(SUBSTITUTE(Cellreference," ",""))+1)

This formula counted words that were separated by a space or comma. As a result it counted stopwords such as “a”, “and” etc which are usually ignored in information retrieval. However it is hoped that this will still provide a rough figure for comparison with other studies.
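For readers who prefer to see the counting logic outside a spreadsheet, the following is a minimal Python sketch of the same calculation. The function names are illustrative, and the sketch follows the formula as written: it counts space-separated words, so stopwords and free-standing punctuation are included in the total.

```python
import re


def spreadsheet_trim(text):
    """Mimic the spreadsheet TRIM function: remove leading and trailing
    spaces and collapse internal runs of spaces to a single space."""
    return re.sub(r" +", " ", text.strip(" "))


def count_keywords(cell):
    """Python equivalent of the spreadsheet formula: length of the trimmed
    text minus length of the text with all spaces removed, plus one."""
    trimmed = spreadsheet_trim(cell)
    if len(trimmed) == 0:
        return 0
    return len(trimmed) - len(cell.replace(" ", "")) + 1


for term in ["coming of age", "fiction", "Civilization, Classical."]:
    print(term, "->", count_keywords(term))
```

A tag such as “coming of age” therefore counts as three keywords even though “of” would normally be treated as a stopword.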

The mean average number of keywords per tag across the total of 1965 tags collected was 1.32 while the mean average number of keywords per subject heading was 3.5 across the total of 279 subject headings collected. The average number of keywords per book was 39.2 while the average number of subject heading keywords per book was 14.6.

These figures differ somewhat from those of Rolla’s study. Rolla found that the average number of keywords per book was 45.42 while the average number of subject heading keywords per book was 9.99 (Rolla, 2009). There are several factors that could have influenced the discrepancy between the results of this study and Rolla’s. Firstly, as already stated, the number of keywords in this study was calculated automatically using a spreadsheet formula that would incorporate stopwords, while Rolla did not include such terms in his calculation. However, this would have led to a higher number of keywords counted in this study. A more likely explanation is the choice of books in each study. Rolla used books that were retrieved using the search terms “nonfiction”, “Africa”, “history”, “Mexico” and “immigration”. As a result Rolla’s sample would primarily consist of non-fiction works while this study has used a cross section of different genres with a large number of fiction books.

Tag Contents

As outlined in the methodology, through a process of coding three main categories of tags were created, namely Exact Matches, Partial Matches and finally tags that were not a match to a Library of Congress Subject Heading.

Exact matches

Total number of exact matches: 142
Percentage of total tags: 7%
Number of exact matches that were subdivisions: 67
Percentage of exact matches that were subdivisions: 47%

Out of the 1965 tags collected, a total of 142 tags were considered to be a match to any of the terms contained in the Library of Congress Subject Headings. This constitutes 7.1% of the total number of tags. Of these, 67 (47%) were a match to a Library of Congress subdivision rather than the main subject term.

The low percentage of exact matches is hardly surprising for several reasons. Firstly the terminology and phrasing used in the Library of Congress Subject Headings is very formal in comparison to the terminology used in the LibraryThing tags. Secondly given that there is a significantly higher ratio of tags to subject headings, even if there was a tag match to all of the subject headings this would still only constitute a relatively low percentage (14%) of the total tags collected.
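The 14% ceiling mentioned above follows directly from the totals reported earlier (1965 tags and 279 subject headings). The short Python sketch below simply reproduces that arithmetic; the variable names are illustrative.

```python
total_tags = 1965      # tags collected in this study
total_headings = 279   # subject headings collected in this study

# If every subject heading were matched by exactly one tag, this is the
# largest share of tags that could be coded as a match.
max_match_share = total_headings / total_tags
print(f"Upper bound on matching tags: {max_match_share:.0%}")
```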

The fact that nearly 50% of the exact matches were to subdivisions shows that users are generally tagging at a broader level than the Library of Congress Subject Headings. Out of the tags that were a match to a Library of Congress subdivision, 47 were a match to a form subdivision such as “biography” or “fiction”, 11 were a match to the topical subdivision “history”, 7 were broad time period subdivisions such as “20th Century”, while the remaining were matches to geographical subdivisions. These subdivisions are broad terms that lack specificity in and of themselves and are only used by the Library of Congress as part of a subject heading (Chan, 1986). The low percentage of exact matches could also signal that the Library of Congress may not be using terminology that reflects the preferred vocabulary of the wider community.
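The distinction between a match to a main heading and a match to a subdivision can be sketched in Python. The coding in this study was carried out by hand, so the code below is only an illustration of the criterion, under the assumption that subdivisions are separated by a dash and that comparison ignores case and a trailing full stop. The function names are hypothetical.

```python
import re


def heading_terms(heading):
    """Split a subject heading string into its main term and subdivisions.
    Assumes subdivisions are separated by an em dash, en dash or double hyphen."""
    text = heading.strip().rstrip(".")
    parts = re.split(r"\s*(?:—|–|--)\s*", text)
    return [part for part in parts if part]


def code_exact_match(tag, headings):
    """Return 'main heading', 'subdivision' or None for a single tag."""
    wanted = tag.strip().casefold()
    result = None
    for heading in headings:
        for position, term in enumerate(heading_terms(heading)):
            if term.casefold() == wanted:
                if position == 0:
                    return "main heading"
                result = "subdivision"
    return result


headings = ["Tropical fish — Encyclopedias.", "Africa –History"]
for tag in ["history", "Tropical Fish", "aquarium"]:
    print(tag, "->", code_exact_match(tag, headings))
```

A tag such as “history” is coded as a subdivision match here because it never appears as the leading element of a heading.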

Partial Matches

Tags were coded as a partial match if they were seen to convey the same or a similar broad concept as a Library of Congress Subject Heading but expressed this using different terms from the controlled vocabulary of the Library of Congress. In comparison to the exact match criteria, the decision making process for coding tags as partial matches was primarily qualitative in nature.

This study found that 363 of the 1965 tags (19%) were partial matches. This figure is somewhat higher than the number of partial matches found by Wetterstrom (15%). An explanation for the difference in partial matches between this study and Wetterstrom’s is the different definitions of partial matches used. While Wetterstrom used the Library of Congress specifications to determine whether a tag was a partial match, this study in contrast has taken a more qualitative approach, defining a partial match as either:

(a) Synonym

(b) Broader concept

(c) Narrower concept

(d) Inversion of person or place name

The fact that 41% of the partial matches were synonyms of a Library of Congress Subject Heading shows that there is a language gap between the way in which LibraryThing users and Library of Congress cataloguers are describing the same concept. Many of these synonyms were the result of the somewhat formal language used by the Library of Congress controlled vocabulary which does not reflect the more common everyday language that features in tags. For example users preferred to use terms such as “comedy” and “funny” to the controlled vocabulary term “Humorous Stories”. Similarly users chose to use terms such as “coming of age” rather than the somewhat antiquated term “Bildungsromane” used by the Library of Congress. Examples such as these reveal the language gap between ordinary users and controlled vocabulary systems.

Broader synonyms were terms that expressed the same general subject as a Library of Congress Subject Heading but lacked the specificity of the subject heading. For example the tag cloud for a biography of Mary Queen of Scots featured tags such as “Elizabeth I” or simply “Elizabeth” while the related subject heading provided by the Library of Congress was “Great Britain — History — Elizabeth, 1558-1603.” Here we can see that LibraryThing users are tagging a core element of the subject heading but are lacking the specificity of the elements Great Britain, history and the time period. As in the case of Exact Matches, broad partial matches tended to be synonyms of Library of Congress subdivisions.

The very low number of narrower concepts coded is generally not that surprising. As most of the Library of Congress Subject Headings are coded to a high level of specificity there is very little opportunity to express these terms any narrower.

Differences in the description of person and place names constituted 15% of the partial matches, with no single user tagging an author or place name in the format used by the Library of Congress. In the subject headings, names of persons are written in an inverted form whereby the last name precedes the first, such as “Dahl, Roald”. The use of inverted headings has its roots in the use of the card catalogue where the choice of word to be used as the entry element was highly important (Chan, 1986). On the web such inversions are no longer necessary and as such when users tagged a whole name they tended to do so in a natural order such as “Benjamin Franklin”. As well as lacking the inverted form of the Library of Congress Subject Headings, many users also favoured abbreviated forms of names such as “Abraham” for Abraham Lincoln or “Fry” for Stephen Fry. This highlights the informal nature of tagging in comparison to traditional methods of categorisation. A similarly informal approach can be seen in user tags of place names. In the Library of Congress Subject Headings ambiguous place names are qualified by placing the country in brackets such as “Oxford (England)” while user tags lack this degree of specificity. However, there are problems associated with the informality of people and place names in tags, namely that they introduce a level of ambiguity known as polysemy whereby a word can have multiple different meanings (Golder and Huberman, 2006). The tag “Abraham” for instance, which was used on the biography of Abraham Lincoln, was also used on several unrelated biblical books. This means that a user clicking the tag to find similar items about Abraham Lincoln retrieves the set of results found in fig. 6:

As we can see, only some of the retrieved books are about Abraham Lincoln. In contrast, clicking the Library of Congress Subject Heading “Lincoln, Abraham, 1809-1865.” retrieves a set of results that are all directly related and relevant to the original book. Ambiguous place names produce similar problems. This shows that in an information retrieval context the ambiguity of user applied tags can lead to serious problems of relevancy which hamper the browsing and search value of such systems.
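The relationship between inverted headings and natural order tags can be illustrated with a small Python sketch. It is a simplification made for this example only: it assumes a heading of the form “Surname, Forename” optionally followed by life dates, and does not attempt to handle qualifiers such as “(Fictitious character)”. The function names are hypothetical.

```python
import re


def natural_order(heading):
    """Turn an inverted personal name heading such as 'Dahl, Roald'
    into natural order ('Roald Dahl'), dropping any life dates."""
    text = heading.strip().rstrip(".")
    parts = [part.strip() for part in text.split(",")]
    # Remove elements that look like life dates, e.g. '1809-1865' or '1916-'.
    parts = [part for part in parts if not re.fullmatch(r"\d{4}-(\d{4})?", part)]
    if len(parts) < 2:
        return text
    surname, forenames = parts[0], parts[1]
    return f"{forenames} {surname}"


def tag_matches_name(tag, heading):
    """True if a tag equals the natural order form of the heading, ignoring case."""
    return tag.strip().casefold() == natural_order(heading).casefold()


print(natural_order("Lincoln, Abraham, 1809-1865."))
print(tag_matches_name("abraham lincoln", "Lincoln, Abraham, 1809-1865."))
print(tag_matches_name("Abraham", "Lincoln, Abraham, 1809-1865."))
```

The abbreviated tag “Abraham” fails this comparison, which reflects the ambiguity discussed above: on its own the tag cannot be tied to a single heading.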

Partial matches provide a useful insight into tagging practice on LibraryThing and its relationship to formal subject headings. Firstly, when taken alongside the figures for exact matches, partial matches show that there is a general level of agreement over the subject matter of a book between the contributors to a folksonomy and the cataloguers at the Library of Congress. If we combine the figures of Exact Matches and Partial Matches we can see that 26% of the tags collected were discussing the same or similar concepts as the Library of Congress Subject Headings. Furthermore, if we look at the number of matches or partial matches per book we can see that all of the books sampled have at least one match / partial match to a Library of Congress Subject Heading. The range of matches/partial matches to subject headings per book varied greatly. The highest number was 22 partial or exact matches (The Time Traveler’s Wife) while three of the books only shared one subject term with the subject headings. The average number of matches/partial matches per book was 8. Therefore, if we assume that the formal subject headings provided by the Library of Congress accurately describe the core concepts of a book, then we can see that even without cataloguing guidelines and a controlled vocabulary, the users on LibraryThing are still managing to draw out some of the core subject terms.
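The combined figure can be checked from the counts reported in this study (142 exact matches and 363 partial matches out of 1965 tags). The Python sketch below reproduces the calculation; the variable names are illustrative.

```python
total_tags = 1965
exact_matches = 142
partial_matches = 363

# Share of tags that express the same or a similar concept as a subject heading.
combined_share = (exact_matches + partial_matches) / total_tags
print(f"Exact or partial matches: {combined_share:.0%}")
```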

The higher incidence of partial matches than exact matches offers support to Wetterstrom’s assertion that folksonomies can add value to controlled vocabulary subject headings by broadening their vocabulary and ensuring that user preferred terms are provided to supplement the controlled vocabulary terms. Similarly Spiteri (2005) has claimed that user applied tags can be used to develop controlled vocabularies that match the language and preferred terms of users, particularly for small scale controlled vocabularies such as those found on intranets.

No Match

Total number of tags that were not a match to LCSH: 1460
Percentage of total tags: 74%
The results (Fig. 7) show that the overwhelming majority of tags (74%) were not a match to a Library of Congress Subject Heading. This figure seems to match Wetterstrom’s result of 75% of tags not being a match to a formal subject heading suggesting that coding of such tags was performed in line with other similar studies.

As the results show tags that were not a match to a Library of Congress Subject Heading were broken into seven categories. This section will provide an analysis of each category in turn apart from personal tags which will be dealt with separately.

Subject Information

The most commonly applied tags (33%) that were not a match to a Library of Congress Subject Heading were tags which were subject terms not included in the formal subject headings. According to Wetterstrom, these tags are potentially complementary to Library of Congress Subject Headings in that they provide terms that users are interested in but that are not currently represented.

The content and specificity of these tags varied greatly. Broad historical time periods such as “medieval history”, “European History” and “Roman Empire” frequently appeared. The imprecise nature of terms such as these means that they would not usually feature in the Library of Congress Subject Headings. Their popularity within the LibraryThing tag clouds however may show that users find them a more useful and accessible means of categorising and browsing. This is understandable given that terms such as these are often favoured in situations such as book shops and may be more familiar to users on LibraryThing. Moreover, broader terms facilitate personal cataloguing where a user may not want all of their books on history spread out across narrow time periods that may potentially cover only a single book.

As in Rolla’s study it was found that there was no clear consensus as to how to express historical time periods. Nearly all of the tags relating to time periods were either broad named periods such as “Medieval” and “Middle Ages” or a particular century such as “19th century” or “15th century”. There were only five tags in the sample of this study that offered more precise time periods and even these were limited to decades such as “1930s”. This could indicate that users don’t require such narrow levels of specificity.

Other broad terms included broad thematic references such as “politics”, “religion” and “war”. As with broad time periods, these terms lack the required specificity on their own to be used as Library of Congress Subject Headings. Instead they would be combined with other terms such as “United States –Politics and government –1775-1783.”

The large number of broad subject terms in the results of No Match tags may indicate that users prefer a higher degree of recall of their own items over precision. For example, users may prefer being able to retrieve all of their books on medieval history through one tag rather than using a larger number of more specific tags.

As well as broad tags, LibraryThing users also tagged items with narrow or specific terms. Library of Congress cataloguing guidelines generally advise applying subject headings that are appropriate to the overall content of the book (Chan, 1986, Rolla, 2009). On this basis the granularity of subject terms tends to be rather broad. To use a simplistic example an encyclopedia of tropical fish would not include subject headings to the level of specific species of fish but rather the general subject heading “Tropical fish — Encyclopedias.”. This is because the overall subject of the book is not any specific species but tropical fish in general. However the results of this study have found that LibraryThing users introduced tags that act more in line with index terms, noting specific topics mentioned that do not necessarily constitute the overall subject matter of the book. For example the Library of Congress Subject Headings for the book The Classical World: An epic History From Homer to Hadrian were:

Civilization, Classical. Civilization, Greco-Roman. Mediterranean Region –Civilization.

These subject headings summarise the overall subject content of the book. In contrast LibraryThing users included terms such as “Sparta”, “Italy”, “Rome”, “Greece” and “philosophy”. While these subjects are probably mentioned within the text of the book and may well feature prominently, under Library of Congress rules it would not be correct to say that the subject of The Classical World is philosophy. Rolla discovered similar tagging practices in his investigation. For example books that were comprehensive histories of Africa were given broad subject headings appropriate to the overall content such as “Africa –History”. However LibraryThing users added terms for more specific subjects such as “slavery”, “colonialism” and “exploration”. The inclusion of these narrower terms is to some extent a positive feature of folksonomies. Tags such as these provide a greater insight into the content of a book and may help users locate books on these topics that would not be located through subject headings. However there are problems. A user has no way of knowing how much content within the book actually matches the subject of the tag. In some cases it may be that a user applies a specific tag on the basis of a passing mention in a book that otherwise has little or nothing to do with that term. Users browsing via that tag may therefore retrieve items that have very little relevance to their information need. To some extent the long tail feature noted by Golder and Huberman should filter out instances such as these appearing in the tag cloud of a book as it is unlikely that large numbers of users will tag a subject based on only passing mention.

Geographic information that was not included in the subject headings was also frequently tagged. In Bridget Jones’ Diary 130 people applied the tag “London”, the principal setting of the book. Similarly “London” was applied to David Nicholls’ One Day. In many fiction books setting can play an integral part in the plot. Moreover users may wish to discover new books set in a particular country.

Users also chose to tag protagonists and characters in fiction that were not apparent in the Library of Congress Subject Headings. The subject headings analysed during this study tended to only include significant literary characters such as “Poirot, Hercule (Fictitious character)”. In contrast LibraryThing users tagged character names such as “Lisbeth Salander” from The Girl With the Dragon Tattoo and “Luc Martineau” from See Jane Score. The lack of character names for modern fiction suggests an inconsistency in the cataloguing practices of the Library of Congress and also a potential bias towards more comprehensive subject cataloguing for classical works of literature at the expense of modern fiction.

Thematic tags were less common than the broad geographic and time period tags but add richness to the subject information of a book provided by the tag cloud. Examples of thematic tags include “dystopia”, “daily life”, “social commentary”, “religion” and “feminism”.

Genre Information

After subject tags the second most popular category was genre information at 30%. Many of these genre tags categorised the genre of the book according to author nationality such as “British literature” or “American literature”. These tags do have potential to be useful for the discovery of new books. The tag “classic” and variants such as “classic literature” and “classic fiction” were very frequently used for books. Books tagged as “classic” ranged from 19th century and early 20th century works that would often be shelved under a classics section in a book shop to more modern science fiction works such as Dune and I, Robot. The term classic does not occur in any of the Library of Congress Subject Headings, probably because it is a subjective term that not only acts as a genre but also as a mark of literary worth. Other genre terms that appeared very frequently were tags such as “thriller”, “crime”, “detective”, “science fiction” and “fantasy”. Again these are commonly used genre categories in bookshops and their popularity could stem from their familiarity to LibraryThing users.

The books selected for the sample from the science fiction / fantasy / horror genre produced some interesting specialised genre tags. The Lord of the Rings and The Name of the Wind were tagged as “epic fantasy” while the terms “cyberpunk” and “biopunk” were used for Neuromancer, Snow Crash and The Windup Girl. These terms describe subgenres of science fiction that have distinct themes, settings and styles and a strong subcultural following (Wikipedia articles on cyberpunk and biopunk). One of the strengths of folksonomic systems is that they do not restrain users to predefined terms in the same way as controlled vocabulary systems and therefore allow subcultures such as biopunk, steampunk and cyberpunk to introduce their own terminology. Other books also introduced genre terms that emerge out of popular language. For example this study drew out tags such as “tearjerker”, “sports romance”, “fluff” and “chick lit”.

Melissa Adler explored this idea of folksonomies giving subcultures and communities the means to name their own resources in their own terms in her journal article Transcending Library Catalogs: A Comparative Study of Controlled Terms in Library of Congress Subject Headings and User-Generated Tags in LibraryThing for Transgender Books (2009). Adler argues that controlled vocabulary systems are too slow to adapt to new emerging vocabularies.

Folksonomies however are democratic and allow for new terminologies from subcultures to develop (Adler, 2009). Adler’s comment highlights the grassroots bottom up approach of the folksonomy that is directed by its user base in comparison to the top down approach of formal systems.

Format Information

These were tags that described the physical format of the book such as “paperback”, “hardback” and “ebook” or “kindle”. Tags such as these seem to be primarily personal, allowing users to keep track of their own books. However some tags coded as format can be seen to have a wider appeal to other users of LibraryThing. Tags such as “movie” could be useful for users looking to find books that have movie adaptations. Likewise tags such as “audiobook” and “audiable” provide a way for users to share books that are available in this format. Book formats are not included as part of the Library of Congress Subject Headings as these are seen as bibliographic information rather than subject matter.

Reading Lists

Tags coded as reading lists primarily consisted of terms such as “1001” and “1001 books”. Such tags could hold an exploratory value for other users as they help group together books that may otherwise share no subject matter. For example books tagged “1001” or “1001 books” held no consistent Library of Congress Subject Headings.

Series Information

3% of tags were used to link books within a series together. In some cases these corresponded to the official series name such as “Kingkiller Chronicles” and “Millennium Trilogy” or were abbreviated variations such as “Kingkiller” or “Millennium”. To some extent these tags could help users locate other books within the same series. However the variations in series terms may make these unreliable. Moreover LibraryThing also includes a series link as part of the bibliographic information for each book, rendering these tags superfluous. The term “series” was used on 18 of the books collected. As this tag provides no indication of which particular series it is referencing it is unlikely to be particularly helpful.

Author Name

Tags which contained the author’s name made up 4% of the No Match category. As with series information these could to some extent be useful for users looking to browse for works by a particular author. However, again, this information is already provided as part of the bibliographic information.

Problematical Tags

This section will address the issue of problematical tags in folksonomies by looking at the user specific tags that were coded during the process of this study.

Personal Tags

240 or 19% of the tags that were not a match to a Library of Congress Subject Heading were coded as personal tags. Out of the total number of tags collected this amounted to 14%. These were tags that did not appear to have any relevance to any user other than the individual who assigned the tag. Examples of frequent personal tags were “read”, “unread”, “own”, “TBR”, and “favorite”. This seems to be in line with the results of Tonkin et al. who found that 16% of tags on Delicious were personal.

Personal tags are a problematical feature of folksonomies and highlight the tension between users applying tags for their own organisational needs and the overall folksonomic system that results from their tags. For an individual cataloguing their own books, tags such as “read” and “unread” can be immensely useful, allowing them to keep track of items that they have read. However the fact that these personal tags are still evident in tag clouds suggests that folksonomies do not always manage to filter out these irrelevant tags. All of the books sampled in this study apart from one (A History of Private Life) contained at least one personal tag. Furthermore even the books with very high numbers of users applying tags had several personal tags featuring in their tag cloud. For example the tag cloud for Pride and Prejudice, which had the highest number of user applied tags (21154), still contained three personal tags (“favorite”, “own” and “read”).

Repeated Tags
As well as coding personal tags, a separate column in the spreadsheet was used to mark repetitions of tags within the tag cloud. A similar approach to that taken in Thomas, Caudle and Schmitz’s study (2009) was used to identify repeated tags. These were tags that were more or less the same as another tag within the cloud apart from lexical differences (spelling variations, abbreviation, noun – adjective combinations) or were synonyms. Any repeated personal tags were not counted as it was felt that these were not useful for general information retrieval.

This study found that 478 tags were potentially a repetition of another tag already in existence in the tag cloud. This constituted 24% of the total number of tags collected. This figure is considerably lower than the 36% of variations found by Thomas, Caudle and Schmitz (2009) which may be a result of their sampling all of the tags applied to a book rather than the top 30 as sampled in this study.

The high level of repeated terms is a problematical issue of folksonomies. One of the often levelled criticisms of folksonomies is that without a system for grouping related terms together information recall will be low when browsing by a single tag. For example a user browsing using the tag “children’s book” may not necessarily retrieve items tagged with “children”, “children’s”, “children’s fiction” or “children’s literature”. The user will therefore be missing out on potentially relevant books. The high degree of repetition of tags can in one sense be seen in a positive light. By incorporating a variety of terms for the same concept within the tag list of one item its potential for recall is increased. However this is still far from ideal as it is unlikely that each item will contain all of the various synonyms and grammatical variations within its set of tags.
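
The recall problem described above can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the four-book catalogue is invented, and the `variants` group is a hand-built stand-in for the kind of term grouping that a controlled vocabulary would provide. It is not how LibraryThing works internally.

```python
# Invented toy catalogue: book title -> tags appearing in its tag cloud.
books = {
    "Book A": {"children's book", "fantasy"},
    "Book B": {"children's fiction"},
    "Book C": {"children's literature", "classic"},
    "Book D": {"crime"},
}

# Hand-built group of variants treated as one concept (an assumption,
# standing in for the term grouping a controlled vocabulary provides).
variants = {
    "children",
    "children's",
    "children's book",
    "children's fiction",
    "children's literature",
}

def browse_exact(tag):
    """Titles carrying exactly this tag, as when browsing by a single tag."""
    return sorted(title for title, tags in books.items() if tag in tags)

def browse_grouped(group):
    """Titles carrying any tag from a group of equivalent terms."""
    return sorted(title for title, tags in books.items() if tags & group)

exact = browse_exact("children's book")
grouped = browse_grouped(variants)
recall = len(exact) / len(grouped)  # share of relevant books actually found
```

In this toy data, browsing by the single tag finds one of the three relevant books, which is the loss of recall that repeated tags cause when nothing links the variants together.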

Discussion of Results
This section will look at the results of the coding experiment in light of the aims and objectives. The results of the tagging experiment were enlightening. On the whole the tag clouds analysed during the coding procedure seemed to provide a great deal of useful subject information. The total number of Exact Matches and Partial Matches shows that there is a significant degree of crossover between the tags of LibraryThing users and the controlled vocabulary of the Library of Congress Subject Headings. However the higher number of partial matches than exact matches suggests that there is a difference between the terminology used by Library of Congress cataloguers and the preferred terms of ordinary users. The tags that were coded as not a match to a Library of Congress Subject Heading indicate that there is a significant difference in the tagging practices of ordinary users and professional cataloguers. Many of the tags in this category ran contrary to the conventions of subject headings, being either narrow terms that were not representative of the overall subject matter of the book or overly broad terms. While there are problems associated with tags such as these, their high proportion suggests that users find them useful.

Taking an overall look at the tagging practices of users on LibraryThing we can see several differences in approach from formal methods. First and foremost, as the tags that were not a match to a Library of Congress Subject Heading show, users are not trying to ‘catalogue’ items in the technical sense of the word. Subject cataloguing is a process that aims to provide an overall picture of the main subject matter of the book, usually in line with what the author intended. From the results of this coding experiment we can see that tagging frequently draws out terms that would not be selected by a professional cataloguer. This was particularly apparent in the high number of terms that referred to narrow topics that were not representative of the overall subject of the book. While professional subject cataloguers may claim that such terms are incorrect, we must remember that tagging is a non-restrictive system where users are primarily interested in their own information needs and as such there can be no right or wrong tags. Furthermore the fact that these terms were popular enough to feature in a book’s tag cloud suggests that they are relevant to a large number of LibraryThing users.

Another noticeable aspect of tagging as opposed to subject headings was the stark difference between the pre-coordinated approach of the Library of Congress Subject Headings and the post-coordinated nature of tags. By this we mean that in the subject headings concepts are brought together into a single heading. This serves to add specificity and provides explicit meaning. For example the subject heading “Cromwell, Thomas, Earl of Essex, 1485-1540 –Fiction” makes explicit both that we are talking about Thomas Cromwell, Earl of Essex (as opposed to Oliver Cromwell, for example) and that the book is a fictional account. This means that a user browsing for similar items should only retrieve fictional accounts of the correct Cromwell. In contrast this study has shown that users tend to favour post-coordinate tagging where a tag describes a single broad subject. As already outlined, the lack of specificity introduced through tagging can lead to a lack of relevancy when browsing using these tags.

One possible solution is to allow users to combine multiple tags together into quasi pre-coordinated terms when searching and browsing. For example a user may browse for books that have both the tags “Cromwell” and “fiction”. Although in this example the tag combination still lacks the degree of specificity of the subject heading, an approach such as this could offer a means of gaining greater relevancy when browsing via tags. To some extent this approach has been taken by LibraryThing in its Tag Mash feature, which allows users to suggest combinations of tags which are then combined. Some examples of Tag Mashes are “alcohol, history”, “dogs, humor” and even “erotica, zombies” (LibraryThing blog post on TagMash). A feature such as this could potentially offer a happy medium between the rigidity of formal subject headings and the overly broad nature of singular tags and, as the examples show, offers the potential for some interesting combinations that would be unlikely to be considered by the Library of Congress. However the LibraryThing solution is somewhat limited by the fact that users have to submit potential tag combinations. A better solution would allow on the fly combinations that fit the information needs of the searcher.
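
As a rough illustration of what on the fly combination could look like, the following Python sketch treats each tag cloud as a set and returns only the items that carry every tag the searcher asks for. The catalogue entries and the `browse` function are hypothetical, invented for this example; they do not describe LibraryThing’s Tag Mash implementation.

```python
# Invented toy catalogue: title -> set of tags in the item's tag cloud.
catalogue = {
    "Novel about Thomas Cromwell": {"cromwell", "fiction", "tudor"},
    "Biography of Oliver Cromwell": {"cromwell", "biography", "civil war"},
    "Novel set in Tudor England": {"fiction", "tudor"},
}

def browse(catalogue, *tags):
    """Return titles whose tag cloud contains every requested tag."""
    wanted = set(tags)
    return sorted(
        title for title, cloud in catalogue.items() if wanted <= cloud
    )

# Browsing by one broad tag mixes both Cromwells together.
single = browse(catalogue, "cromwell")

# Combining tags at search time narrows the result without anyone
# having to submit the combination in advance.
combined = browse(catalogue, "cromwell", "fiction")
```

Here the single tag returns both the biography and the novel, while the combination returns only the novel. The combination would still fail to separate a novel about Oliver Cromwell from one about Thomas Cromwell, which is the remaining gap in specificity noted above.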

Tagging practices also differed in their inclusion of bibliographic rather than subject information. Some of the coding categories that emerged, such as format information, series information, awards and author name, represent information that would not find its way into Library of Congress Subject Headings. However in total these formed 16% of the No Match category of tags and therefore meet an information need of LibraryThing users. Although lists and awards only made up a very small percentage of the total tags it is possible to see how these could be an extremely useful means of grouping together books that otherwise may have no other similarities. Tags such as these could be a valuable addition to existing library catalogues, for example for grouping together books for a book group. It is difficult to see what value author name tags and format information could provide to users other than the person who applied the term and as such these terms could possibly be considered problematical.

To some extent the results of this study seem to support the assertion that folksonomies reach a state of consensus. As the tag cloud only showed the thirty most popular terms, the majority of personal tags were effectively hidden from view. If we look for example at the book The Great Gatsby, the tag cloud only contained 2 personal tags. However if we expand the tag cloud to show all tags as in fig. 9 we can see that there is a huge number of personal tags that are hidden from view. These tags represent the “long tail” of the power curve.
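
The effect of the cut-off can be sketched in Python as follows. The tag counts are invented for illustration and are not data from this study, and the cut-off is set to five rather than thirty only to keep the example small; the list of personal tags reuses the examples given earlier in this section.

```python
from collections import Counter

# Personal tags taken from the examples named in this study.
PERSONAL = {"read", "unread", "own", "tbr", "favorite"}

# Invented counts: how many users applied each tag to one book.
counts = Counter({
    "fiction": 900,
    "classic": 700,
    "american literature": 400,
    "read": 350,
    "1920s": 300,
    "own": 40,
    "unread": 35,
    "tbr": 20,
    "favorite": 15,
})

def visible_cloud(counts, size=30):
    """Tags shown in the cloud: only the `size` most frequently applied."""
    return [tag for tag, _ in counts.most_common(size)]

def personal_share(tags):
    """Fraction of the given tags that are personal."""
    return sum(1 for tag in tags if tag in PERSONAL) / len(tags)

top = visible_cloud(counts, size=5)      # the cloud a visitor sees
share_visible = personal_share(top)      # personal share in the cloud
share_all = personal_share(list(counts)) # personal share across all tags
```

With these invented figures, one in five of the visible tags is personal, against five in nine of all distinct tags, which mirrors the way the long tail of personal tags drops out of the displayed cloud.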

The existence of problematical tags within the results of this experiment certainly does highlight some of the problems that have been raised about folksonomies. The results showed that 14% of the total number of tags collected were user specific. It is difficult to see a possible solution to this problem as one of the key features of tagging is that users can tailor their terms to meet their individual information needs. Any attempt to curtail personal tags would therefore severely damage the flexibility of the system.

The fact that 24% of the tags collected repeated information already existing in an item’s tag cloud was significant. This high figure highlights the lack of control of synonyms, homonyms and grammar that is inherent in tagging and folksonomies. While personal tags can be reasonably hidden from view, repeated terms seem to be a persistent problem and featured prominently in the tag clouds. This will have significant implications for the efficacy of folksonomies as a form of information retrieval. In the course of this particular study homonyms were less apparent than synonyms and tended to be restricted to tags of ambiguous last names. However where homonyms do occur they can equally cause problems when trying to retrieve information. Guy and Tonkin noted this problem and used the case of the term “dumbledore” as a description of a bumblebee in Thomas Hardy’s poem An August Midnight. Without a controlled vocabulary to associate the term with bees or insects the tag becomes lost amongst the far more popular use of the term to refer to Albus Dumbledore from the Harry Potter series (Guy and Tonkin, 2006).

There have been numerous suggestions as to how problematical tags may be best dealt with. Some of these centre around the idea of ‘tag literacy’ where folksonomies operate on a series of guidelines or rules to help users (Mejias, 2005). For example Mejias’ set of guidelines includes pointers such as “think of tags as personal, but also think of tags as social”, “use plurals to define categories” and “Think specific, but also think general”. Other suggestions have included providing automatic suggestions or recommendations to users or using an underlying thesaurus to link synonyms. Out of the suggestions available the most feasible solution to problematic tags is perhaps to offer an optional list of popular tags for an item. This is the method currently employed on Delicious. For example when bookmarking Librarything.com, Delicious offers the recommended tags: “books”, “library”, “web2.0”, “catalog”, “social”, “tagging”, “tools”, “community”, “reading” and “reference”. This means that a user can choose from the terms that are already in place rather than adding unnecessary repeated terms such as “cataloguing”.

The difficulty with implementing possible solutions to problematical tags is that we run the risk of removing the features that make tagging so appealing, namely its simplicity and flexibility.

Answers to Research Questions

What degree of overlap is there between the user produced tags on LibraryThing and the controlled vocabulary Library of Congress Subject Headings?

This study found that 142 or 7% of tags were an exact match to a Library of Congress Subject Heading while 363 or 18% were partial matches. This means that a combined total of 25% of tags overlapped with the subject headings.

Where user tags are not a match to a Library of Congress Subject Heading, what categories of information are users tagging?

Through the coding of tags that were not a match to a Library of Congress Subject Heading, eight distinct categories of tags were discovered. These were subject information, genre information, author name, book format, personal tags, reading lists, series information and awards. Of these, subject information and literary genre were by far the most popularly tagged categories at 33% and 30% respectively. Tags in these categories seem to provide potentially useful information that was missing from the subject headings and could therefore support the suggestion of incorporating tags into library catalogues.

Although less commonly tagged, categories such as awards and reading lists also provided useful information not apparent in the subject headings and linked together often unrelated books.

What proportion of user applied tags could prove to be problematic for browsing / information retrieval?

Coding revealed that 240 tags, or 19% of those that did not match a subject heading (14% of all tags collected), were user specific and therefore not useful to other users of the folksonomy. Furthermore 478 of the collected tags or 24% were deemed by the researcher to repeat information that was already in the tag cloud of an item. Personal tags were to some degree minimised by the power law distribution of tags in folksonomies, which prevented too many from appearing in the top thirty tags. Repeated tags were shown to have a problematic effect on browsing and information retrieval.

Conclusion
The results of this dissertation seem to support existing comparative studies between tagging systems and controlled vocabularies. The results were generally in line with those of Wetterstrom and show that the majority of tags applied by users to bibliographic records provide information beyond the subject terms provided by the Library of Congress Subject Headings. The results of this dissertation therefore likewise indicate that tagging can be complementary to controlled vocabulary systems and that library catalogues may benefit from the inclusion of terms which better reflect the vocabulary of users. It would be interesting to see further studies which evaluate user opinion of integrated tagging and controlled vocabulary systems.

The results of this coding experiment seem to highlight both the strengths and weaknesses of user tagging and folksonomies. The wide range of tags both in terms of subject matter and specificity highlights the flexibility of the system. However it seems that this flexibility invariably leads to a degree of messiness in folksonomies, supporting the findings of Thomas, Caudle and Schmitz.

Because this study was primarily quantitative it has not managed to gain an insight into the motivations behind user tagging behaviour. A fascinating area of future study would be a qualitative analysis of user tagging behaviour, perhaps using questionnaires, interviews or observation. Qualitative study would also be able to provide user feedback on what users feel are the strengths and weaknesses of folksonomies and how they would react to some of the proposed improvements to tagging such as suggested terms and tag literacy.

Works Cited

Adler, Melissa (2009). “Transcending Library Catalogs: A Comparative Study of Controlled Terms in Library of Congress Subject Headings and User-Generated Tags in LibraryThing for Transgender Books”, Journal of Web Librarianship, 3:309–331, 2009

Bates, Jo and Rowley, Jennifer (2011), “Social reproduction and exclusion in subject indexing: A comparison of public library OPACs and LibraryThing folksonomy”, Journal of Documentation, Vol. 67 Iss: 3 pp. 431 – 448

Carman, Nicholas (2009)“LibraryThing Tags and Library of Congress Subject Headings: A Comparison of Science Fiction and Fantasy Works”. Available at:

Chan, Lois Mai (1986). “Library of Congress Subject Headings: Principles and Practice”. (Colorado : Libraries Unlimited)

Denscombe, Martyn (2010). “The Good Research Guide for Small Scale Social Research Project – Fourth Edition”, (Maidenhead, Open University Press)

Fichter, Darlene (2006). “Intranet Applications for Tagging and Folksonomies,” Online 30, no. 3

Golder, Scott A. and Huberman, Bernardo A., “The Structure of Collaborative Tagging Systems”. Available at: http://www.hpl.hp.com/research/idl/papers/tags/tags.pdf

Golder, S.A. and Huberman, B.A. (2006), “Usage patterns of collaborative tagging systems”, Journal of Information Science, Vol. 32 No. 2, pp. 198-208.

Governor, James, Nickull, Duane and Hinchcliffe, Dion (2009). “Web 2.0 Architectures”. O’Reilly Media

Jacob, E. (2004). “Classification and categorization: A difference that makes a difference”. Library Trends, 52(3):515-540.

Library of Congress Catalogue, (Accessed 25 September 2012), http://catalog.loc.gov/cgi-bin/Pwebrecon.cgi?DB=local&PAGE=First

Librarything.com, “About Us”, (Accessed 25 September 2012), http://www.librarything.com/about

Librarything.com, “Tagmash”, (Accessed 23 September 2012), http://www.librarything.com/wiki/index.php/HelpThing:Tagmash

Librarything.com, “Librarything concepts”, (Accessed 23 September 2012), http://www.librarything.com/concepts

Lux, Mathias, Granitzer, Michael and Kern, Roman (2008). “Aspects of Broad Folksonomies”. Available at: http://www.uni-weimar.de/medien/webis/research/events/tir-07/tir07-papers-final/lux07-aspects-of-broad-folksonomies.pdf (Accessed 25th September)

Mathes, Adam (2004). Folksonomies – Cooperative Classification and Communication Through Shared Metadata, Available at: http://www.adammathes.com/academic/computer-mediated-communication/folksonomies.html (Accessed 25th September)

Mejias, Ulises (2005), Blog post on Tag Literacy, Available at:
http://blog.ulisesmejias.com/2005/04/26/tag-literacy/ (Accessed 27th September)

Morrison, P. J. (2008). “Tagging and searching: Search retrieval effectiveness of folksonomies on the World Wide Web”, Information Processing and Management 44, pp. 1562–1579.

Morville, Peter and Rosenfeld, Louis (2007) “Information Architecture for the World Wide Web”, (O’Reilly, Cambridge).

Munk, T. B. and Mørk, K. (2007). “Folksonomy, the power law & the significance of the least effort”. Knowledge organization 34 pp. 16-33

Nowick, E. A. & Mering, M. (2003). “Comparisons between internet users’ freetext queries and controlled vocabularies: A case study in water quality”, Technical Services Quarterly, 21 (2), 15-32.

Parks, Heejin (2011), “A Conceptual Framework to Study Folksonomic Interaction”. Knowl. Org. 38(2011)No.6

Pickard, A. J. (2007). “Research Methods in Information” (Facet Publishing, London).

Peterson, Elaine (2006). “Some Philosophical Problems with Folksonomy”, D-Lib Magazine November 2006 Volume 12 Number 11

Peterson, Elaine (2008). “Parallel Systems: The Coexistence of Subject Cataloging and Folksonomy”, Library Philosophy and Practice 2008 (April)

Rolla, Peter (2009). “User Tags vs Subject Headings: Can User-Supplied Data Improve Subject Access to Library Collections”. Library Resource and Technical Services Journal, 53(3).

Robu, Valentin, Halpin, Harry and Shepherd, Hana (2009). “Emergence of Consensus and Shared Vocabularies in Collaborative Tagging Systems”, ACM Transactions on the Web, Vol. 3, September 2009, Pages 1-34. Available at: eprints.soton.ac.uk/268192/1/ACMTransactionsPreprint.pdf

Šauperl, A. (2010). “UDC and Folksonomies”, Knowledge Organisation. 37, No.4, pp. 307-317.

Sinclair, James and Cardew-Hall, Michael (2007). “The folksonomy tag cloud: when is it useful?”, Journal of Information Science 2008 34: 15

Spiteri, Louise (2005). “Controlled Vocabularies and Folksonomies.” Presentation at Canadian Metadata Forum, Ottawa, ON, September 27, 2005. Retrieved September, 2012 from http://www.collectionscanada.ca/obj/014005/f2/014005-05209-e-e.pdf.

Spiteri, Louise (2006). “The Use of Folksonomies in Public Library Catalogues,” Serials Librarian 51, no. 2

Thomas, Marliese, Caudle, Dana M. and Schmitz, Cecilia (2009). “Trashy Tags: Problematic Tags in Librarything”, New Library World, Vol. 111 Iss: 5 pp. 223 – 235

Trant, Jennifer (2006). “Exploring the potential for social tagging and folksonomy in art museums: proof of concept”. Available at: http://www.archimuse.com/papers/steve-nrhm-0605preprint.pdf (Accessed 26th September 2012)

Quintarelli, Emanuele (2005). “Folksonomies: power to the people”, paper presented at the ISKO Italy-UniMIB meeting : Milan : June 24, 2005. Available at: http://www.iskoi.org/doc/folksonomies.htm

LibraryThing Tag Mash Blog Post. Available at: http://www.librarything.com/blogs/thingology/2007/07/tagmash-book-tagging-grows-up/

Vander Wal, T. (2007). “Definition of folksonomy”, Vanderwal.net, Accessed 23 May 2012, http://vanderwal.net/folksonomy.html

Voorbij, Henk J. (1998), “Title keywords and subject descriptors: a comparison of subject search entries of books in the humanities and social sciences”, Journal of Documentation, Vol. 54 Iss: 4 pp. 466 – 476

Wetterstrom, Mikael (2008). The Complementarity of Tags and LCSH – A Tagging Experiment and Investigation into Added Value in a New Zealand Library Context. The New Zealand Library and Information Management Journal, Vol. 50, No. 4 May 2008

Appendix A – Reflection

Overall I would say that while I am satisfied with the outcome of my dissertation there is certainly room for improvement.

The final dissertation saw some slight changes from my original proposal. Further reading as part of my literature review indicated that problematic tags would be an interesting and useful area of focus and as such this was incorporated into the dissertation’s aims and objectives. The proposal indicated that an area of investigation would be a comparison between the different genres of books collected. Although this was started, the statistical comparison of six genres proved too time consuming after I fell behind schedule with the coding of the tags.

The most significant problem encountered was underestimating the time frame required to collect, code and produce statistics for the number of tags collected. Although a pilot study was performed I was perhaps overly optimistic as to how many tags I could use, especially considering the learning curve with Excel. I feel that unfortunately as a result too much time was spent on gathering and processing the data, leaving inadequate time to provide an in depth analysis. With hindsight I feel that better results could have been achieved using a smaller sample and doing a more comprehensive analysis along the lines of Rolla’s study. I also feel that a narrower set of aims and objectives may have produced a more targeted project.

On the positive side I am pleased that my results seemed to be similar to those of Wetterstrom’s study which was the primary inspiration for this dissertation. I feel that this shows that my coding methodology was performed correctly. I am also pleased that I chose the subject of tagging and folksonomies which proved to be a fascinating topic generating lively debate in the library and information science literature. I am also glad that I went out of my comfort zone and attempted a quantitative study through which I learned new skills such as Excel.

Appendix B – Proposal

Working Title
A comparison between social tagging on LibraryThing and the Library of Congress Subject Headings. What additional information can tags provide?

Introduction

In 2003, the social bookmarking site Delicious introduced a system whereby users could add their own index terms to their bookmarks. Although this was not the first instance of what is now commonly referred to as tagging, Delicious helped popularise it as a method of classification on the web. Tagging has since become an extremely widespread means for users to classify and index items on the web and features on a wide variety of sites including blogs, Amazon, Flickr and LibraryThing.

With increasingly widespread usage, tagging and folksonomies have become an important area of study for information professionals. Alenka Sauperl’s survey of LISTA noted that in 2005, 6 documents were indexed on the topic of tagging and folksonomies. By 2006 this had increased to 20, in 2007 there were 35, in 2008 there were 23 and there were 7 already at the beginning of 2009 (Sauperl, 2010).

This dissertation aims to contribute to the increasing amount of literature on the topic of social tagging and folksonomies by investigating whether tags have the potential to add more meaning to bibliographic records than is possible through traditional controlled vocabulary systems. If so, it is arguable that the inclusion of tags alongside traditional systems could have the potential to produce a more user centred form of item description.

Aims and Objectives
The first aim of this paper is to see the degree of crossover between socially produced tags on the LibraryThing website and the controlled vocabulary subject headings produced by the Library of Congress. How many of the tags produced by users on LibraryThing are an exact match to the subject headings produced by the Library of Congress? A high correlation between the two would suggest that even without access to a controlled vocabulary, tags are still representative of the core subject matter of a bibliographic item.

Secondly, where the tags are not a match to the terms used in the LCSH, this paper aims to investigate what additional information these tags provide about the item. The fact that users are tagging information that is not included in a controlled system such as the LCSH suggests that there may be information needs that systems such as the LCSH are not meeting. For example do users prefer to use popular terms as access points to bibliographic information rather than the more formal approved terms of the LCSH? Are there more specific levels of genre that are not represented under controlled vocabularies? How often do users use personalised tagging systems?

In order to meet these aims the objectives are as follows:

1. Compare the top thirty tags from a number of LibraryThing bibliographic records with the LCSH of the same record on the Library of Congress Catalogue.

2. Measure how many of the user produced tags are an exact match to the controlled vocabulary terms of the LCSH.

3. For those tags that are a match to LCSH terms, measure the ranking of these tags based on the number of users who assigned them.

4. For those terms that are not an exact match to LCSH terms, evaluate using a coding system what types of information about the bibliographic item these tags represent.

5. From the results of the previous evaluation measures, draw some conclusions as to the potential value of tagging and folksonomies as a form of bibliographic description. Is there a potential benefit for combining tagging systems with controlled vocabularies?

Scope and Definition
Given the time and resource constraints of a master's dissertation it is necessary to apply a number of limitations to the scope of this paper. Firstly, this will be primarily a quantitative study into the relationship between folksonomies and controlled vocabulary systems. The primary mode of research will be the production of statistical measures of user applied tags and an evaluation of those statistics. The application of a coding system created by the researcher and the subsequent evaluation will introduce some qualitative elements into the study. However, it is not within the remit of this dissertation to perform in-depth qualitative investigations into precisely why users choose certain tags, although qualitative studies of tagging behaviour through questionnaires and interviews would certainly provide valuable areas for future study.

As the tags and subject headings will be manually collected by the researcher, the sample size of bibliographical items from which to draw statistics will be naturally limited. Even so, it is hoped that the results of this small scale study will provide grounds for a more comprehensive experiment with larger sets of computer collected data.

The scope of this dissertation will cover tags drawn from one website (LibraryThing) and subject headings drawn from one controlled vocabulary system (Library of Congress Subject Headings). The reasoning for these choices is as follows:

LibraryThing
LibraryThing is a social cataloguing site. A registered user is able to create their own catalogue of books with bibliographic information drawn from the Library of Congress, all five national Amazon sites, and more than 690 world libraries (About us, LibraryThing.com). Users are then able to edit and organise the catalogue information in their personal library in any way they wish. The primary social aspect of LibraryThing (and the subject of this dissertation) is the ability for users to ‘tag’ items in their library.

LibraryThing differentiates between a ‘work’ and a ‘book’. The book information page in a user’s library refers to their personal copy. Users can edit any of the bibliographic information on this record to suit their particular needs and can apply tags to books to create their own system of organisation. A LibraryThing ‘work’ is the public record of a particular book. A ‘work’ brings together all different copies of a book, regardless of edition, title variation, or language and combines all of the tags that LibraryThing community users have applied to the book in their personal libraries (Some LibraryThing Concepts, LibraryThing.com). Therefore, although individual users may use completely individualistic means of tagging their own items, they are collectively adding to a socially created folksonomy.

Tags are displayed in the form of a tag cloud representing the top thirty tags added to that particular work, with the text size reflecting the number of users who have applied that particular term. There is also the option to show all of the tags beyond the top thirty. It is here that you tend to find the highly personalised tags such as ‘my top shelf’, where only one person has used the term and it therefore has not ranked high enough to appear in the main tag cloud.
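The tag cloud behaviour described above can be sketched in a few lines of Python. This is only an illustration: the function name, the font size range and the linear scaling are assumptions, since LibraryThing's actual sizing rule is not documented here.

```python
from collections import Counter

def build_tag_cloud(tag_counts, top_n=30, min_size=10, max_size=32):
    """Return (tag, users, font_size) for the top_n most-used tags.

    tag_counts maps each tag to the number of users who applied it.
    The linear font scaling is an assumption made for illustration.
    """
    top = Counter(tag_counts).most_common(top_n)
    if not top:
        return []
    highest = top[0][1]
    lowest = top[-1][1]
    spread = highest - lowest
    cloud = []
    for tag, users in top:
        # Scale linearly between min_size and max_size; if every tag has
        # the same count, all of them get max_size.
        scale = (users - lowest) / spread if spread else 1.0
        cloud.append((tag, users, round(min_size + scale * (max_size - min_size))))
    return cloud

# Hypothetical counts: the single-user tag falls outside a two-tag cloud.
sample_counts = {"fiction": 120, "classic": 80, "my top shelf": 1}
cloud = build_tag_cloud(sample_counts, top_n=2)
```

With a cut-off of two tags, the personal tag ‘my top shelf’ drops out of the cloud in the same way that single-user tags drop out of the top thirty.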

LibraryThing seemed the ideal choice for harvesting user tags for a number of reasons. Firstly, LibraryThing was founded in 2005 and has since built up a sizeable user base. This means that there should be a large number of tags available and each item should have tags applied by more than one user. Secondly, LibraryThing shows the number of users who have applied a particular tag. This will be useful for measuring the ranking of LCSH terms in the top thirty tags (see objective 3, Aims and objectives). Perhaps one of the most important features of LibraryThing for this particular study is that it consists of bibliographic information. This makes it easier to compare to controlled vocabulary systems, which are almost exclusively aimed at classifying bibliographic information.

Library of Congress Subject Headings
The Library of Congress Subject Headings are a controlled vocabulary of subject headings created and maintained by the United States Library of Congress for bibliographic records. Library of Congress subject headings are hierarchical in nature with the listings providing broader and narrower terms to use as necessary.

There are several reasons for choosing the LCSH as a comparison to LibraryThing tags. Firstly it is one of the most widely used and comprehensive controlled vocabulary subject description systems making it the most authoritative comparison to the socially created tags on LibraryThing. Secondly, the LCSH are primarily focused on describing the subject content of bibliographic material. As the tags from LibraryThing are also describing bibliographic material it seemed valid that comparisons could be drawn between the two. Another important reason for choosing the LCSH is that the subject headings are freely available as part of the Library of Congress Catalogue.

Definitions
There are no formal definitions of the terms tag and folksonomy. It is therefore important to be clear how this paper chooses to define these terms.

Tag
This dissertation will take the term tag to refer to a non-hierarchical keyword or term applied to an item by a user of the website (in this case LibraryThing). On LibraryThing tags can consist of a single word or several words combined into one tag.

Folksonomy
A folksonomy will be taken to mean the collection of tags from several users applied to one item combined together.

Research Context/Literature Review
Tagging refers to the practice of users assigning non-hierarchical keywords (tags) to an item. Without the limitations and complications of controlled vocabularies, tagging allows for quick and easy assignment of keyword terms without the necessity of trained professionals. The ability for users to assign their own terms provides a democratic and social means of indexing where anyone can contribute with no restrictions on the terms they can use (Noruzi 2006; Munk and Mørk, 2007). A folksonomy is when tags from multiple users are combined forming a socially created categorisation system. Golder and Huberman (2006) found that a folksonomy can form a stable pattern after 100 tags with the pattern representing a consensus of the most commonly used tags. According to Thomas Vander Wal (2007) the value of folksonomies comes from people using their own vocabulary to add explicit meaning to an item which may come from an inferred understanding of the object.

The freedom associated with tagging and folksonomies does introduce a number of potential drawbacks. Without any means of linking synonyms and homonyms there is no means of associating terms with a shared meaning (Thomas, Caudle & Schmitz, 2010). Similarly, unlike controlled vocabularies such as the Library of Congress Subject Headings, folksonomy based systems do not show hierarchical links between terms, which can be useful for narrowing down searches (Chan, 1986). Another associated problem is that tag based systems often do not allow searching with more than one tagged term. LibraryThing does, however, allow searching with a combination of tags through its tagmash feature (tagmash, LibraryThing).

There have been several studies that have attempted to compare social tagging and folksonomies with controlled vocabularies. Nowick and Mering compared user search terms on the topic of water quality with three controlled vocabularies including the Library of Congress Subject Headings (Nowick & Mering, 2003). They found that only 30-40% of the user search terms were a match with a LCSH. Wetterstrom compared tags assigned to bibliographic material by volunteers with Library of Congress Subject Headings, using a coding system (Wetterstrom, 2008). Wetterstrom was particularly keen to see whether user supplied tags contained terms that were localised to New Zealand which may not be represented by the Library of Congress terms. He found that the majority of user supplied terms (75.47%) were not a match with the LCSH terms. He therefore suggested that tags could complement Library of Congress subject headings in library catalogues. Sauperl compared the terms from four folksonomies (Amazon, LibraryThing, Delicious, 43 Things) with the Universal Decimal Classification (UDC) to see how well the UDC represented terms on material other than bibliographic material (Sauperl, 2010). The results were surprising in that 91% and 90% of the terms from 43 Things and Delicious (non-bibliographic content) were present in UDC whereas only 33% and 79% of the concepts from Amazon and LibraryThing (bibliographic content) were present. In terms of the categories of tags, names, genres and topics tended to be the most represented across the four sites. As to what value tags offered above controlled vocabularies, Sauperl argues:

“A number of attributes that are important to users are not part of the UDC. Of those, some are part of the bibliographic description: collection/series, edition, accessibility, and URL. It is clear that the bibliographic description in Amazon is poor and does not allow the user to search by date of publication or edition. Users therefore have to resort to other means. They use tags to overcome the weakness of the system.” (Sauperl, 2010 p. 314).

A number of authors have looked at the value of tags as a form of searching and information retrieval. Kipp and Campbell performed a qualitative study as to how useful a selection of LIS educated users found tagging as a means of searching for information on PubMed and CiteULike (Kipp & Campbell, 2010). Users seemed to find that the greatest value of tags was as a means of serendipitous browsing from one article to the next and of locating related articles, rather than as a means of direct searching. Thomas, Caudle and Schmitz looked at what they termed the ‘messiness’ of tags and how these characteristics could influence their value for search and retrieval in library catalogues (Thomas, Caudle & Schmitz, 2010). Their findings suggest that the biggest problem with user created tags is the high number of variations of tags that already existed in the system (36%). They suggested that a means of reducing the number of ‘messy tags’ may be for tagging systems to provide suggestions and recommendations to give guidelines for creating tags. Morrison performed a user evaluation study of the search retrieval effectiveness of folksonomies on the World Wide Web (Morrison, 2008). User relevance judgements suggested that folksonomy search results overlapped with those from the other systems, and documents found by both search engines and folksonomies were significantly more likely to be judged relevant than those returned by any single IR system type.

Methodology

Sampling
My chosen sampling method is to use cluster sampling where each cluster is a particular genre (where genre is taken to mean a broad subject area such as classic fiction, crime, business etc.). There will be an equal number of books selected from each genre. The precise number of genres and books per genre will be decided after an initial pilot study to determine how large a sample is feasible when collecting the data by hand.

The reasoning behind this sampling procedure was that it would be valuable to compare tagging practices across multiple genres. Moreover random sampling did not seem a feasible or useful option as the random sample could select items that have tags entered by only a single user or could select items that are available on LibraryThing but not the Library of Congress Catalogue.
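The selection rules above (equal numbers per genre, skipping items tagged by only one user or missing from the Library of Congress Catalogue) could be sketched as follows. The field names and book titles are hypothetical, and taking candidates in the order the researcher lists them is an assumption; the proposal does not fix a selection order.

```python
def select_sample(candidates, per_genre):
    """Pick up to per_genre eligible books from each genre cluster.

    A book is eligible when more than one user has tagged it and it can
    be found in the Library of Congress Catalogue. Candidates are taken
    in the order supplied by the researcher.
    """
    sample = {}
    for book in candidates:
        chosen = sample.setdefault(book["genre"], [])
        eligible = book["taggers"] > 1 and book["in_loc_catalogue"]
        if eligible and len(chosen) < per_genre:
            chosen.append(book["title"])
    return sample

# Hypothetical candidate list for two genre clusters.
candidates = [
    {"title": "Book A", "genre": "crime", "taggers": 40, "in_loc_catalogue": True},
    {"title": "Book B", "genre": "crime", "taggers": 1, "in_loc_catalogue": True},
    {"title": "Book C", "genre": "crime", "taggers": 12, "in_loc_catalogue": True},
    {"title": "Book D", "genre": "business", "taggers": 9, "in_loc_catalogue": False},
    {"title": "Book E", "genre": "business", "taggers": 25, "in_loc_catalogue": True},
    {"title": "Book F", "genre": "crime", "taggers": 30, "in_loc_catalogue": True},
]
sample = select_sample(candidates, per_genre=2)
```

In this made-up list the business cluster ends up one book short, which is the kind of shortfall the pilot study would need to reveal before the final sample size is fixed.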

Data Collection
A cluster sample of books from the LibraryThing website will be selected by the researcher. The top 30 tags will be entered into an Excel spreadsheet as well as the number of users who have applied a particular term for ranking purposes. The same book will then be retrieved on the Library of Congress catalogue and the LCSH terms will be entered into the spreadsheet.

Coding System
In order to compare the tag results of LibraryThing to the subject headings of the LCSH and to produce meaningful results as to what types of information users are tagging, it will be necessary to create a coding system. I intend to use Mikael Wetterstrom’s coding system as a basis for my own modified system. The exact coding criteria will be developed during the pilot study but some of the expected coding categories include:

Exact Match
If the tag is an exact match to a LCSH for the same item
Related Term
If the tag is a synonym of a LCSH term
Spelling Variation
This category could cover plural deviations from the LCSH or regional variations in spelling
Location
Plot setting
Character/Protagonist
Plot characters
Personal Tag
If the tag refers to something useful only to the user who applied the term. For example a location in
Opinion/review
For example “great”, “rubbish” etc.

As already noted, this is not a complete or finalised coding system but rather some of the expected categories that may be used. The use of a coding system will introduce a qualitative and potentially biased element to the study as the researcher will be creating his own categories based on a personal interpretation of the data. One of the potential problems in using coding systems is that the researcher imposes categories that distort the data rather than allowing the data to influence the categories (Pickard, 2007). Although this will be to some extent unavoidable, similar studies using coding systems have still managed to produce useful information (Wetterstrom, 2008).
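Although the coding will be done by hand, the two most mechanical categories could in principle be pre-checked automatically. The sketch below is an assumption-laden illustration, not the coding system itself: it treats a case-insensitive match as an Exact Match, detects only the simplest plural variation, and leaves everything else for manual coding. The function names and example headings are hypothetical.

```python
def normalise(term):
    """Lower-case a term and collapse repeated whitespace."""
    return " ".join(term.lower().split())

def singular(term):
    # Naive plural handling: drop one trailing "s". Irregular plurals and
    # regional spellings would still need to be coded by hand.
    return term[:-1] if term.endswith("s") and len(term) > 3 else term

def code_tag(tag, headings):
    """Give a provisional code for one tag against a work's LCSH terms."""
    tag_n = normalise(tag)
    headings_n = {normalise(h) for h in headings}
    if tag_n in headings_n:
        return "Exact Match"
    if singular(tag_n) in {singular(h) for h in headings_n}:
        return "Spelling Variation"
    return "Needs manual coding"

# Hypothetical headings for a single work.
headings = ["Whaling", "Whales"]
codes = [code_tag(t, headings) for t in ("whaling", "whale", "my top shelf")]
```

Categories such as Related Term, Personal Tag or Opinion/review depend on interpretation, so a check like this could only narrow down the tags that need the researcher's judgement.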

Statistics
Once the data has been suitably coded it will be possible to produce statistics measuring:

• The number of tags which are exact matches to Library of Congress Subject Headings

o How highly these terms are ranked in the top 30 tags

• The most popular categories of tags across all genres as well as the most popular in specific genres in numerical and percentage terms.
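The measures listed above could be produced from the coded spreadsheet rows roughly as follows. This is a sketch under stated assumptions: the row layout (tag, number of users, category) and the function name are hypothetical, and the actual analysis is planned in Excel rather than Python.

```python
from collections import Counter

def summarise(coded_tags):
    """Summarise the coded tags of one work.

    coded_tags is a list of (tag, users, category) rows. Returns the
    number of tags per category, the percentage per category, and the
    rank of every Exact Match tag within the work's tags.
    """
    # Rank tags by the number of users who applied them (rank 1 = most used).
    ranked = sorted(coded_tags, key=lambda row: row[1], reverse=True)
    counts = Counter(category for _, _, category in ranked)
    total = len(ranked)
    percentages = {cat: round(100 * n / total, 1) for cat, n in counts.items()}
    match_ranks = [
        (tag, rank)
        for rank, (tag, _, category) in enumerate(ranked, start=1)
        if category == "Exact Match"
    ]
    return counts, percentages, match_ranks

# Hypothetical coded rows for one work.
rows = [
    ("whaling", 95, "Exact Match"),
    ("great", 40, "Opinion/review"),
    ("whales", 60, "Spelling Variation"),
    ("my top shelf", 1, "Personal Tag"),
]
counts, percentages, match_ranks = summarise(rows)
```

Running the same summary per genre, and then over all genres combined, would give both the genre-specific and the overall category figures.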

Evaluation
The final stage will be to evaluate the results and draw conclusions, bearing in mind the limitations of the study. From the statistics produced it should be possible to provide answers to the research objectives.

Work Plan
Until I have performed the pilot study it will be difficult to work out a definite time frame. I therefore intend to begin work on the pilot study as soon as possible (within the next two weeks) to ensure that I can plan adequate time for each section. The key stages of the dissertation will be:

1. Literature review

The first stage will be to read widely on the subject of tagging and folksonomies as well as controlled vocabulary systems to build upon the initial literature review given in this proposal. I have already begun the process of collecting and reading articles on these topics and plan to continue reading and write a draft of the literature review while undertaking parts 2, 3 and 4 of this work plan.

2. Exploratory study of tags on LibraryThing

This will serve two main purposes. Firstly it will help identify the clusters that will be used for sampling. Secondly from browsing the tags on LibraryThing I can begin to formulate my coding system. I have already begun this process.

3. Pilot Study

The pilot study will form an important part of the overall dissertation. The pilot study will consist of an evaluation of two items from each of the genres identified in the exploratory study. I will collect the tags and subject headings for each of the items and code the tags with the preliminary coding system formulated in the exploratory study. Through the pilot study I will be able to identify any problems in the sampling method and make any required changes. From the time taken to perform the pilot study I will be able to gauge how many items it is feasible to collect and code manually. It will also allow me to test out my coding system and identify categories that may be missing.

4. Data Collection

After the pilot study I will be able to begin collecting my data from the sample size that was decided by the pilot study. The data collection will consist of locating the required number of items on LibraryThing, entering the top 30 tags and number of users who added each term into a spreadsheet and also entering the LCSH terms for each item.

5. Coding of Tags

It is expected that this will be one of the most time consuming and laborious aspects of the dissertation. I will therefore ensure that ample time is devoted to this stage. Because of its potentially tedious nature I intend to begin a draft of the first section of the dissertation (introduction, aim & objectives, literature review) alongside this stage to help break up the coding process.

6. Statistics

Once the data has been collected and suitably coded I can begin to produce meaningful statistics that can be interpreted and evaluated. This stage will consist of presenting the coded data in a tabular and graphical form to represent statistics such as total number of tags for each category, percentage of tags of a given category and ranking of LCSH terms in the top tags.

7. Draft form

Once these stages are complete I can begin to produce a draft form of the dissertation. During the coding of the LibraryThing tags, the initial sections should already have been started leaving the analysis of results and evaluation as the main sections to complete at this stage. Ideally the draft will be finished a month before the final due date to ensure there is enough time to proof read and improve.

Resources
LibraryThing is available free of charge and is accessible to the researcher. The Library of Congress Subject Headings are freely available as part of the Library of Congress Catalogue. I do not anticipate the need to consult the full subject heading schedules which would be a paid resource. Data collection and the production of statistics will be done using Microsoft Excel which I have access to.

Ethics and Confidentiality
There do not appear to be any ethical or confidentiality issues associated with this dissertation. All of the data collected is anonymous and not of a sensitive nature.

Select Bibliography

Chan, L. M. (1986). “Library of Congress Subject Headings: Principles and Application Second Edition” (Libraries Unlimited, Colorado).

Golder, S.A. and Huberman, B.A. (2006), “Usage patterns of collaborative tagging systems”, Journal of Information Science, Vol. 32 No. 2, pp. 198-208.

Kipp, M. E. I., and Campbell, D. G. (2010). “Searching with Tags: Do Tags Help Users Find Things?” Knowledge Organisation 37 No.4, pp. 239-255.

Library of Congress Catalogue, Accessed 23 May 2012, http://catalog.loc.gov/cgi-bin/Pwebrecon.cgi?DB=local&PAGE=First

Librarything.com, “About Us”, Accessed 23 May 2012, http://www.librarything.com/about

Librarything.com, “Tagmash”, Accessed 23 May 2012, http://www.librarything.com/wiki/index.php/HelpThing:Tagmash

Librarything.com, “Librarything concepts”, Accessed 23 May 2012, http://www.librarything.com/concepts

Munk, T. B. and Mørk, K. (2007). “Folksonomy, the power law & the significance of the least effort”. Knowledge organization 34 pp. 16-33

Morrison, P. J. (2008). “Tagging and searching: Search retrieval effectiveness of folksonomies on the World Wide Web”, Information Processing and Management 44, pp. 1562–1579.

Noruzi, A. (2006). “Folksonomies: (un)controlled vocabulary?” Knowledge organization 33 pp. 199-203.

Nowick, E. A. & Mering, M. (2003). “Comparisons between internet users’ freetext queries and controlled vocabularies: A case study in water quality”, Technical Services Quarterly, 21 (2), 15-32.

Pickard, A. J. (2007). Research Methods in Information (Facet Publishing, London).

Rolla, P.J. (2009). “User Tags versus Subject Headings: Can User-Supplied Data Improve Subject Access to Library Collections?”, LRTS, Vol. 53 Iss. 3, pp. 174-184.

Šauperl, A. (2010). “UDC and Folksonomies”, Knowledge Organisation. 37, No.4, pp. 307-317.

Thomas, M., Caudle, D. M., Schmitz, C. (2010). “Trashy tags: problematic tags in LibraryThing”, New Library World, Vol. 111 Iss. 5, pp. 223-235.

Vander Wal, T. (2007). “Definition of folksonomy”, Vanderwal.net, Accessed 23 May 2012, http://vanderwal.net/folksonomy.html

Wetterstrom, M. (2008) “The Complementarity of Tags and LCSH – A Tagging Experiment and Investigation into Added Value in a New Zealand Library Context”, The New Zealand Library and Information Management Journal Vol. 50 Iss 4 pp. 292-307.


Downloadable data sets [OCLC]

Following on from my post on linked data, OCLC have published bibliographic linked data for the most widely held works in WorldCat. The 1 GB zip file of RDF data has the potential to be a great research tool for library and information professionals.

Mike Teets, OCLC Vice President for Innovation, claims that:

 “This release will make it easier for the wider linked data community—commercial providers, retail organizations, researchers and scholars—to include library information in their workflows.”
Downloadable data sets [OCLC].

This project demonstrates what a valuable source of data libraries can be, and helps place the library at the forefront of the move towards a more semantically linked world wide web.


Linked Data

A quick addition to my last post on the UK Web Archive. Reading a post on the National Archives blog about linked data raised an interesting question about the value of web archives. A key principle of the World Wide Web is the idea of hyperlinking. If we archive a particular site but not any of the pages it links to, are we losing a great deal of the semantic meaning of that website?

The idea of linked data itself is a very interesting theory that we covered as part of the digital information technologies and architectures module at City University London. The main idea behind linked data (sometimes known as the semantic web) is to provide context to information on the web through the Resource Description Framework (RDF). Each RDF statement, known as a triple, contains a subject, a relationship and an object. An example of an RDF triple would be:

The context the RDF triple adds to the term ‘raven’ distinguishes it from other uses of the word such as ‘The Raven’, the Edgar Allan Poe poem. This video by OCLC explains it better than I can, but hopefully you can begin to see why this would be valuable. As well as adding context to words, RDF triples would allow users to explore concepts. For example, if you wanted to know more about ravens you could explore the genus Corvus.
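To make the idea a little more concrete, here is a minimal sketch of triples held as plain Python tuples. The relationship names and the extra entries are my own illustrative labels, not terms from any published vocabulary, and real RDF would identify each part with a URI rather than a bare string.

```python
# Each triple is (subject, relationship, object). The identifiers are
# illustrative labels only; real RDF would use URIs.
triples = [
    ("raven", "is_member_of_genus", "Corvus"),
    ("crow", "is_member_of_genus", "Corvus"),
    ("The Raven", "written_by", "Edgar Allan Poe"),
]

def objects_of(subject, relationship):
    """Return every object linked to a subject by a relationship."""
    return [o for s, r, o in triples if s == subject and r == relationship]

def subjects_with(relationship, obj):
    """Return every subject that has the given relationship to an object."""
    return [s for s, r, o in triples if r == relationship and o == obj]

genus = objects_of("raven", "is_member_of_genus")
relatives = subjects_with("is_member_of_genus", "Corvus")
```

Following the triple from ‘raven’ to the genus and then back out to its other members is the kind of exploration described above, while the bird and the poem stay distinct because they sit in different triples.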

The practicalities associated with the semantic web mean that for now its uses are going to be pretty limited but the possibilities are definitely interesting.


UK Web Archive

Following on from my previous post, another article in the August CILIP Update caught my eye. This time it was Public Life Online: A Lasting Legacy, an article written by Maureen Pennock, the digital preservation manager at the British Library. The article gives an account of the UK Web Archive, set up in 2004 to act as a memory institution of digital material. While the National Archives are conducting a similar program, theirs is focused mostly on archiving central government websites. The British Library web archive in contrast is much more far reaching and ambitious in its aims, attempting to archive what are considered to be websites of cultural significance. Currently there are more than 10,000 different websites held in the archive as well as multiple copies of particular sites from different periods, allowing users to see changes over time.

Currently there are restrictions on which websites can be archived. The British Library is unable to crawl sites without the permission of the owner. However the selective nature of the UK Web Archive is probably a strength allowing it to function as a curated archive that says something about the UK online space. The British Library are currently archiving pages that:

  • reflect the diversity of lives, interests and activities throughout the UK;
  • contain research value or are of research interest;
  • feature political, cultural, social and economic events of national interest;
  • demonstrate innovative use of the web.

Quote from UK Archive Blog

This archive of online content certainly provides researchers with a rich source of information. As the article notes, when analytical software is applied to the archive it can show interesting trends such as emerging search terms. Storing multiple instances of a page could also prove useful for showing changing web technologies and website design.

The UK Web Archive is certainly an exciting project and it will be interesting to see how the British Library copes with making the archive accessible and pages findable as the content continues to grow. Looking at the UK Web Archive website, it seems that there is currently an attempt to classify material, allowing users to browse for websites on a particular subject such as medicine and health. This is also coupled with a full text search. Performing a few quick searches I found the search mechanisms to be very effective, allowing for keyword searching within a particular subject.

The archive seems to have been favorably received by the scholarly community as shown by this UK Web Archive blog post, particularly in the arts and humanities or social sciences disciplines.

The UK Web Archive is a great example of digital preservation and of some of the unique challenges of dealing with material that is ‘born digital’, such as its constantly changing nature. Presumably there are also complex copyright issues associated with a project such as this.

The article “Public Life Online: A Lasting Legacy”, by Maureen Pennock featured in the August 2012 edition of CILIP Update and is available to CILIP members at: http://www.cilip.org.uk/publications/update-magazine/pages/default.aspx


Outsourcing

The article “Finding the Value in Corporate Libraries” is available to CILIP members in the August 2012 edition of CILIP Update available at: http://www.cilip.org.uk/publications/update-magazine/pages/default.aspx

I have just read a very interesting article in the August edition of CILIP Update on outsourcing in the information profession. Rob Corrao and Iain Dunbar of LAC Group, a professional services firm, argue against the fears that many in the information profession feel towards the term ‘outsourcing’ claiming that it can, in actual fact:

“transform knowledge and information managers who may be underused and under-appreciated into integral employees that add visible value to a business”

Having never worked in a commercial information position I do not have any first-hand experience of outsourcing myself. Even so, something about the term ‘outsourcing’ has always had negative connotations, suggesting loss of jobs and devaluation of the profession as a whole. However after reading the article and browsing the LAC Group website I am inclined to think that outsourcing may not necessarily be a bad thing. Firstly the article suggests that outsourcing is not necessarily about removing staff but rather freeing up the time of information professionals through introduction of new technologies or performing routine tasks off site. Interestingly the article noted that:

“managed services can mean leaving staff in-house but under the guidance of LAC where our staff member works as part of the team”

rather than necessarily taking everything off site. The main argument in favour of outsourcing seems to be that it reduces the costs of running an information team and that it frees up time for professionals to perform the tasks that represent real value to the company.

I do have some reservations as to whether outsourcing is limiting the number of entry level information jobs. With information teams becoming progressively smaller it seems to be very difficult to get a foot in the door. Even so it has been interesting to read another perspective on the issue.

Edit

Further reading on a library/information perspective of outsourcing:

Blog post by The Running Librarian http://www.therunninglibrarian.co.uk/2011/10/outsourcing-offshoring-and-right.html

Outsourcing in Law Firm Libraries by Rachel Pergament, Published on April 1, 1999. An old article but features an interesting case study of Baker and McKenzie. http://www.llrx.com/features/outsourcing.htm


CILIP New Professionals Day 2012

A while ago I attended CILIP’s New Professionals Day which brought together a wide range of new information professionals for a day of talks, workshops and networking. Influenced by the day I have (finally!) decided to revive this blog that was originally intended for my masters coursework.

Ned Potter: You already have a brand! Here are 5 ways to influence it…

The day opened with a talk by Ned Potter on personal branding. I must admit I was initially skeptical about this one, expecting to be bombarded with marketing jargon. However I’m happy to say I was wrong! The talk was both engaging and informative and provided an energetic start to the day.

When I think of brand, my mind immediately seems to go to:

 

However these are examples of branding rather than a brand. Your personal brand is about how people perceive you and how you wish to be perceived by others rather than logos and fancy colour schemes. It is important to be aware that we all have a brand whether we like it or not. However the good news is that we have power over this brand.

An important point of the talk was that you do not need to be a super-librarian who blogs, tweets, presents, writes, publishes, organises events and in his spare time fights crime! Rather the best means of creating a personal brand is to focus on the areas and tools that suit your personality and career aspirations. Ned perhaps puts it a little more clearly:

“Ultimately, what gets you the job which pays your wages, is your ideas and the stuff on your CV which is relevant to that particular job. The whole process of building a brand, marketing yourself: that’s a means to the end of getting more opportunities to add exciting and relevant ideas and things on your CV, rather than an end in itself”

Ned Potter

The key is to match your brand to the path you want. Having said that, there are certainly steps anyone can take to improve their personal brand. It is now pretty common practice for employers (or anyone else for that matter) to Google your name. Social media can be used to turn this practice to your advantage. Probably the most important tool in this respect is LinkedIn. A LinkedIn account can act as your personal CV and gives you a professional online presence. Twitter can also be used to build your personal brand. However, it is important to think about the tone and content of your tweets in light of how you wish to present yourself.

The most important message of all however was:

Continuing professional development adventures: What? Why? How?

Next up was the first of three workshops. Led by Emma Illingworth, the aim of the workshop was to look at what CPD is, why we should be doing it, and how we go about it. My impression of CPD entering into the workshop was mostly of structured forms such as CPD23 and chartership. However, the most important thing I took away was that CPD is something that can be done at any time and does not necessarily have to be a structured approach. This also dispelled the myth that CPD requires a significant investment of time and money.

Cyberlibrarians: Information management jobs in the digital age

My second workshop gave a great introduction to some of the IT related roles which may not traditionally be seen as librarian-esque jobs. The workshop was led by Lisa Hutchins, an information architect, and Richard Hawkins, the online information manager at CILIP.

It was interesting to see the parallels between the work of an information architect and more traditional forms of librarianship, as well as to hear about the advantages and challenges of self-employment.

A career in corporate libraries: The pitfalls and the profits

My final workshop looked at the role of corporate libraries, an area I was especially interested in. In particular, I wanted to find out how necessary specialist knowledge, such as a law degree, is for work in corporate libraries. The consensus seemed to be that the lack of a law degree was not necessarily a barrier to entry into the field, as long as you could demonstrate an interest in the law as well as solid research and information management skills.

Lunch

CILIP put on an amazing spread of burritos for lunch.

How to assemble your New Professional’s Toolkit

Bethan Ruddock’s talk introduced the key ideas of her book The New Professional’s Toolkit. She argued for a new professional’s ‘toolkit’ of networks, mentors, a plan, resources and a voice. As with Ned Potter’s talk, the point was that there is no one-size-fits-all approach. Rather, we should aim to use these tools to forge our own path and achieve our own aims.

Social Media

The final talk of the day by Phil Bradley demystified the world of social media. Phil stressed that as information professionals we should be actively engaging with all forms of social media. As one of the primary means of exchanging information, social media is an important tool and we should all be aware of the strengths and weaknesses of various networks.

Pub!

After the final talk many of us headed off to the pub, which gave us a chance to discuss the day. It’s a testament to all of the speakers and participants that NPD2012 managed to inspire such lively debate. Overall, the New Professionals Day was a hugely beneficial experience. I felt that I learned something from each of the talks and would definitely recommend that anyone attend the next one.


DITA coursework 2: Web 2.0 and its applications for the library and information profession.

http://aspiringinfopro.wordpress.com/2012/01/08/dita-coursework-2-web-2-0/

According to Tim O’Reilly, the idea of Web 2.0 emerged out of the dot-com bubble. He noted that the companies that survived the collapse of the technology sector in the early 2000s had in common particular methods, concepts and technologies that allowed them to be cutting-edge compared to their competitors (Zanin-Yost, 2010). There are no hard definitions of Web 2.0 and the term is frequently surrounded by hyperbole and buzzwords. As a launchpad for discussion, the Collins English Dictionary defines Web 2.0 as:

“the internet viewed as a medium in which interactive experience, in the form of blogs, wikis, forums, etc, plays a more important role than simply accessing information”

This definition certainly highlights one of the central characteristics of Web 2.0: that it is dynamic. Rather than seeing the users of the web as polarised between content producers and content consumers, the technologies that underpin Web 2.0 allow web content to be dynamic, with users playing the role of both producer and consumer. To use a simplistic example of the difference between Web 1.0 and Web 2.0, Britannica Online would represent Web 1.0 and Wikipedia Web 2.0 (O’Reilly, 2005). Britannica articles are produced and published in (relatively) static form to be read by web users. Wikipedia, in contrast, is continually changing, and any web user can act as a content producer, adding their own knowledge to an article. The blog is another example of how users are no longer simply consuming information on the web but actively publishing it. Blogs are so successful because they provide a user-friendly ‘what you see is what you get’ interface, allowing users with no prior experience of HTML to publish web content. Moreover, the diary-style format provides something instantly recognisable.

Another defining characteristic of Web 2.0 is that the web acts as a platform. In the world of Web 2.0 it is supposed that most computing activity will occur through the web rather than through applications stored on the user’s PC. In this way the computer becomes less of a platform in itself and more akin to a portal to access the web. While we have yet to see a complete departure from using computers in the traditional sense, there is certainly a continuing move towards ‘cloud computing’, as exemplified by Google’s Chromebook, a laptop based on an operating system which, rather than using natively installed software, relies upon web applications such as Google Docs (Shiels, 2011).

The final key characteristic of Web 2.0, which has already been touched upon, is the idea of collaboration. In What is Web 2.0, O’Reilly talks about harnessing the collective intelligence of a product’s user base through feedback, user reviews, and user-crafted social networks, and points to how sites such as Amazon, Flickr, and Facebook depend on user participation (Casey and Savastinuk, 2006). Indeed, sites such as Facebook provide virtually no content of their own and are completely reliant upon user contribution and collaboration. They are therefore a perfect example of a website which provides a service for others to share information rather than publishing content itself.

So how do Web 2.0 technologies influence the library and information profession? Web 2.0 technologies have certainly caused lively debate amongst library and information professionals. Some have heralded the technologies as vital if libraries are to remain relevant in the modern age. The term ‘Library 2.0’, coined by Michael Casey on his blog ‘LibraryCrunch’ in 2006, places Web 2.0 at the centre of the work of libraries and librarians (Aqil, Ahmad and Siddique, 2011).

There are numerous examples of Web 2.0 technologies being applied in the library environment. For example the City University Library catalogue features social tagging, allowing users to apply their own keywords to books and items to aid other users.

The concept of the tag as a non-hierarchical form of content description is not new to the information profession, which has used techniques such as keyword indexing and subject headings. What makes tagging different, however, is that it allows users to contribute their own ideas and concepts to the material, which may better reflect their information needs, forming what is known as a folksonomy (Aqil, Ahmad and Siddique, 2011). Other examples of increasing the participation of users in their library service include the ability to rate and review items and to share favourite items with others through social media such as Twitter and Facebook. The principle behind incorporating features such as these is to harness the knowledge of users in order to supplement and improve library services (Casey and Savastinuk, 2006). These techniques are also increasing the interactive nature of library services. Instead of functioning as a simple portal to the library catalogue database, the OPAC becomes a platform on which users can tailor the way they access and use the library’s information. It is important to note that, in the case of social tagging for example, Web 2.0 technologies are not replacing existing library and information methods such as cataloguing with controlled vocabularies, but are rather being used to complement them. Moreover, it would seem that techniques such as these are not necessarily revolutionising the way in which users interact with libraries. Generally these technologies are ‘tacked’ onto existing OPACs, providing a degree of user interaction which is always secondary to the primary function of the OPAC.
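
To make the idea of a folksonomy a little more concrete, the short Python sketch below aggregates the tags that individual users apply to a single catalogue item into a simple weighted tag cloud. The user names, the tags and the `build_tag_cloud` function are all invented for illustration; this is not how the City University catalogue or any other real system is implemented.

```python
from collections import Counter


def normalise(tag):
    """Lower-case a tag and collapse extra whitespace so that
    'Dystopia ' and 'dystopia' are counted as the same term."""
    return " ".join(tag.lower().split())


def build_tag_cloud(user_tags):
    """Count how many distinct users applied each tag to an item.

    user_tags maps a (hypothetical) user name to the list of tags
    that user applied. Returns a Counter of tag -> number of users.
    """
    cloud = Counter()
    for tags in user_tags.values():
        # A set ensures a user who repeats a tag is only counted once.
        cloud.update({normalise(t) for t in tags})
    return cloud


# Invented example data for one catalogue item.
example_tags = {
    "user_a": ["Science Fiction", "dystopia", "to read"],
    "user_b": ["science fiction", "Dystopia"],
    "user_c": ["sci-fi", "science fiction"],
}

cloud = build_tag_cloud(example_tags)
for tag, count in cloud.most_common():
    print(f"{tag}: {count}")
```

In this made-up data ‘sci-fi’ and ‘science fiction’ remain separate entries, and ‘to read’ is meaningful only to the user who applied it. These are the kinds of redundancy and user-specific tagging that a controlled vocabulary is designed to avoid.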

A more interesting example of the possibilities afforded to libraries in the Web 2.0 environment is the open-source software Scriblio (http://scriblio.net/). Underpinned by WordPress, Scriblio seems closer to realising the ideal of Library 2.0. The Hong Kong University of Science and Technology Library has implemented Scriblio in its SmartCat catalogue, describing how it “features Web 2.0 technology, such as faceted searching, interactive tag clouds, user comments and RSS feeds” (http://catalog.ust.hk/catalog/?page_id=2). An example catalogue entry shows how some of the features of Web 2.0 have been implemented in a library environment (http://catalog.ust.hk/catalog/archives/836286). For example, it offers users the ability to share a catalogue item over a large number of blogging and social networking sites, the ability to write comments on an item in a similar way to commenting on a blog, and even a QR code providing the title, location and call number on a mobile device. In contrast to the City University library catalogue, Scriblio is built upon WordPress, which means that the Web 2.0 features are much more tightly integrated into the user interface and allow users to play a much more active role in shaping the library service.

Web 2.0 is also providing libraries with a platform to publicise their services. The City University library, for example, is using Twitter to reach out to its student population and provide announcements such as changes to opening hours and new acquisitions. For Casey and Savastinuk, one of the great benefits of libraries participating in social networking is that it encourages regular customer feedback, preventing what they consider to be a tendency for libraries to “plan, implement and forget”. Some libraries are encouraging feedback from users through means such as instant messaging to provide a digital reference service. Another innovative use of Web 2.0 technologies which has potential applications for libraries is the mash-up. According to Merrill, a mash-up is an interactive web application which draws upon content from external data sources and combines it into a new service (Merrill, 2006). Currently the uses of mash-ups in a library context have not been fully explored and tend to be confined to examples such as councils plotting public library locations onto Google Maps.
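
The mash-up principle can be sketched in a few lines of Python. The example below combines two invented data sources, a list of library branches and a separate table of opening hours, into a single GeoJSON document of the kind a web map could plot. All names, coordinates and opening hours are made up, and no real mapping service is called; it is only a rough illustration of combining sources into something new.

```python
import json

# Source 1 (hypothetical): branch names and coordinates.
branches = [
    {"id": "central", "name": "Central Library", "lat": 51.5275, "lon": -0.1025},
    {"id": "north", "name": "North Branch", "lat": 51.5560, "lon": -0.1170},
]

# Source 2 (hypothetical): opening hours keyed by branch id.
opening_hours = {"central": "09:00-20:00", "north": "10:00-17:00"}


def build_mashup(branches, opening_hours):
    """Merge the two sources into one GeoJSON FeatureCollection."""
    features = []
    for branch in branches:
        features.append({
            "type": "Feature",
            # GeoJSON orders coordinates as [longitude, latitude].
            "geometry": {
                "type": "Point",
                "coordinates": [branch["lon"], branch["lat"]],
            },
            "properties": {
                "name": branch["name"],
                # Fall back to "unknown" if the second source has no entry.
                "hours": opening_hours.get(branch["id"], "unknown"),
            },
        })
    return {"type": "FeatureCollection", "features": features}


mashup = build_mashup(branches, opening_hours)
print(json.dumps(mashup, indent=2))
```

Neither source is very useful to a library user on its own; the value of the mash-up comes from presenting them together.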

Conclusion

This essay has looked at some of the practical applications of Web 2.0 in a library and information setting. To conclude, I would like to think more generally about the theoretical implications of Web 2.0 for the nature of library and information services. To some degree the concept of Library 2.0 is not a revolution in library and information services but rather an evolution, building upon the work that information providers have been performing for years. For example, the incorporation of instant messaging into a library service is not necessarily creating a new role for the librarian but rather better allowing them to fulfil the role of reference service in the Web 2.0 age. However, there certainly are some paradigmatic shifts brought about by these new technologies, as noted by Jack Maness. Library 2.0 is in essence not simply about extending the reach of existing library services but rather about allowing users to have a collaborative input. As Maness states, “The biggest change this will lead to is that rather than creating systems and services for patrons, librarians will enable users to create them for themselves” (Maness, 2006). This means librarians will have to relinquish some degree of control over the information contents of their service.

I would also argue that there are a number of areas where we must be cautious. The world of Web 2.0 is rapidly changing. As we have seen with Myspace, it is possible for a web service to fall out of favour very quickly and no longer be relevant. It is therefore important that libraries take a long-term view of the implications of incorporating a site such as Facebook into their library service, in case that site should disappear or be replaced. Another major hurdle when combining Web 2.0 with library services, especially in the public library environment, is privacy. As Litwin has noted, one of the primary features of Web 2.0 services which makes them so useful is that users share personal information about themselves (Litwin, 2006). It is this personal information that makes it possible to tailor services to meet their needs. However, this raises questions for library services about how they can tailor their services to individual needs while at the same time ensuring that users do not inadvertently reveal too much information about themselves.

Aqil, M., Ahmad, P., and Siddique, M. (2011), Web 2.0 and Libraries: Facts and Myths, Journal of Library and Information Technology, Vol. 31, No. 5, Sep 2011, pp. 395-400.

Casey, M., and Savastinuk, L. (2006), Library 2.0: Service for the Next Generation Library, Available at: http://www.libraryjournal.com/article/CA6365200.html

City University London library catalogue, Available at: http://www.city.ac.uk/library/ (accessed: January 07, 2012)

Litwin, R. (2006), The Central Problem of Library 2.0: Privacy, Available at: http://libraryjuicepress.com/blog/?p=68 (accessed: January 07, 2012)

Maness, Jack M. (2006), Library 2.0 Theory: Web 2.0 and its Implications for Libraries, Available at: http://www.webology.org/2006/v3n2/a25.html (accessed: January 07, 2012)

Merrill, Duane (2006), Mashups: The New Breed of Web Apps, Available at: http://www.ibm.com/developerworks/xml/library/x-mashups/index.html (accessed: January 07, 2012)

O’Reilly, Tim (2005), What is Web 2.0, Available at: http://oreilly.com/web2/archive/what-is-web-20.html (accessed: January 07, 2012)

Shiels, Maggie (2011), Google unveils first Chrome powered laptops, Available at: http://www.bbc.co.uk/news/technology-13362111 (accessed: January 07, 2012)

SmartCat: about page, Available at: http://catalog.ust.hk/catalog/?page_id=2 (accessed: January 07, 2012)

SmartCat Example Catalogue Entry, Available at: http://catalog.ust.hk/catalog/archives/836286 (accessed: January 07, 2012)

web 2.0 definition. Dictionary.com. Collins English Dictionary – Complete & Unabridged 10th Edition. HarperCollins Publishers.  Available at: http://dictionary.reference.com/browse/web 2.0 (accessed: January 07, 2012)

Zanin-Yost, A (2010), Library 2.0: Blogs, Wikis, and RSS to Serve the Library, Library Philosophy and Practice, Sep, 2010
