What is text annotation? Five different types of annotations
Natural language processing (NLP) is one of the biggest fields of AI development. Numerous NLP solutions like chatbots, automatic speech recognition and sentiment analysis programs can improve efficiency and productivity in various businesses around the world. Recent breakthroughs in NLP have even shown potential to help the speech impaired communicate freely with automatic speech recognition devices and the people around them. However, none of these amazing technologies would be possible without text annotation and the companies that provide these annotation services.
Training NLP algorithms requires large annotated text datasets, and every project has its own requirements. For developers looking to build text datasets, here is a brief introduction to five common types of text annotation.
1. Entity annotation
Entity annotation is one of the most important processes in the generation of chatbot training datasets and other NLP training data. It is the act of locating, extracting, and tagging entities in text. Types of entity annotation include:
- Named entity recognition (NER): The annotation of entities with proper names.
- Keyphrase tagging: The location and labeling of keywords or keyphrases in text data.
- Part-of-speech (POS) tagging: The identification and annotation of the grammatical category of each word, such as adjective, noun, adverb, or verb.
Entity annotation teaches NLP models how to identify parts of speech, named entities and keyphrases within a text. In this task, annotators read the text thoroughly, locate the target entities, highlight them on the annotation platform and choose from a predetermined list of labels. To help NLP models learn about named entities further, entity annotation is often paired with entity linking.
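The locate, highlight, and label steps described above can be sketched in code. This is only an illustration: the `EntitySpan` record, the `annotate` helper, the label set, and the sample sentence are all hypothetical, and real annotation platforms define their own formats.

```python
from dataclasses import dataclass

# Hypothetical label set; a real project defines its own predetermined list.
ALLOWED_LABELS = {"PERSON", "ORG", "LOCATION"}


@dataclass(frozen=True)
class EntitySpan:
    start: int  # index of the first character of the entity
    end: int    # index one past the last character of the entity
    label: str  # one label from the predetermined list


def annotate(text: str, phrase: str, label: str) -> EntitySpan:
    """Locate the first occurrence of phrase in text and tag it with label."""
    if label not in ALLOWED_LABELS:
        raise ValueError(f"label not in predetermined list: {label!r}")
    start = text.find(phrase)
    if start == -1:
        raise ValueError(f"phrase not found: {phrase!r}")
    return EntitySpan(start, start + len(phrase), label)


text = "Ada Lovelace worked with Charles Babbage in London."
spans = [
    annotate(text, "Ada Lovelace", "PERSON"),
    annotate(text, "Charles Babbage", "PERSON"),
    annotate(text, "London", "LOCATION"),
]

for span in spans:
    print(text[span.start:span.end], "->", span.label)
```

Storing character offsets rather than the entity text itself keeps each label tied to one exact position, which matters when the same word appears more than once in a document.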
2. Entity linking
Whereas entity annotation is the location and annotation of certain entities within a text, entity linking is the process of connecting those entities to larger repositories of data about them. Types of entity linking include:
- End-to-end entity linking: The joint process of first analyzing and annotating entities within a text (named entity recognition) and then performing entity disambiguation.
- Entity disambiguation: The process of linking named entities to knowledge databases about them.
Entity linking is used to improve both search functions and user experience. Annotators are tasked with linking labeled entities within a text to a URL that contains more information about the entity.
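As a rough sketch of entity disambiguation, the snippet below links a mention to one of several candidate URLs by counting context words shared with the sentence. The knowledge base, its `example.org` URLs, and the word-overlap heuristic are assumptions made for illustration; human annotators and production systems rely on much richer signals.

```python
# Hypothetical knowledge base: mention text -> candidate entries.
KNOWLEDGE_BASE = {
    "Paris": [
        {
            "url": "https://example.org/kb/paris-france",
            "context": {"france", "seine", "capital"},
        },
        {
            "url": "https://example.org/kb/paris-texas",
            "context": {"texas", "usa", "county"},
        },
    ],
}


def link_entity(mention, sentence):
    """Return the URL of the candidate sharing the most context words.

    Returns None when the mention has no entry in the knowledge base.
    """
    candidates = KNOWLEDGE_BASE.get(mention, [])
    if not candidates:
        return None
    # Lowercase the words and strip common trailing punctuation.
    words = {word.strip(".,!?").lower() for word in sentence.split()}
    best = max(candidates, key=lambda c: len(c["context"] & words))
    return best["url"]


print(link_entity("Paris", "Paris is the capital of France."))
print(link_entity("Paris", "We drove through Paris, Texas last summer."))
```

When two candidates share the same number of context words, `max` returns the first one listed, so the order of entries acts as a default in this sketch.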
3. Text classification
Also known as text categorization or document classification, text classification tasks annotators with reading a body of text or short lines of text. Annotators must analyze the content, discern the subject, intent and sentiment within it and classify it based on a predetermined list of categories. Whereas entity annotation is the labeling of individual words or phrases, text classification is the process of annotating an entire body or line of text with a single label. Related text annotation types include:
- Document classification: The classification of documents used to help with the sorting and recall of text-based content.
- Product categorization: Crucial for eCommerce sites, product categorization is the sorting of products or services into intuitive classes and categories to help improve search relevance and user experience. Sometimes annotators are shown product descriptions, product images, or both. The annotators would then choose from a list of departments or categories that the client has provided.
- Sentiment annotation: The classification of text based on the emotion, opinion, or sentiment within the text.
Because text classification is a broad category, various annotation types like product categorization or sentiment annotation are technically just specialized forms of text classification.
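The difference from entity annotation can be seen in the shape of the data: each text receives exactly one label from the predetermined list. The sketch below assumes a hypothetical customer-support category list and a hypothetical `label_text` helper; neither comes from a specific tool.

```python
from collections import Counter

# Hypothetical category list; in practice the client supplies it.
CATEGORIES = ("billing", "shipping", "technical support")


def label_text(text, category):
    """Attach exactly one category to a whole text."""
    if category not in CATEGORIES:
        raise ValueError(f"category not in predetermined list: {category!r}")
    return {"text": text, "label": category}


records = [
    label_text("I was charged twice for the same order.", "billing"),
    label_text("My package has not arrived yet.", "shipping"),
    label_text("The app crashes when I open settings.", "technical support"),
    label_text("Why is there an extra fee on my invoice?", "billing"),
]

# Count labels per category, which helps spot an imbalanced dataset.
distribution = Counter(record["label"] for record in records)
print(distribution)
```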
4. Sentiment annotation
Emotional intelligence is one of the most difficult fields of machine learning. Sometimes it is difficult even for humans to guess the true emotion behind a text message or email. It is even more difficult for a machine to determine connotations hidden in texts that use sarcasm, wit, or other casual forms of communication. To help machine learning models understand the sentiment within text, the models are trained with sentiment-annotated text data.
More broadly referred to as sentiment analysis or opinion mining, sentiment annotation is the labeling of emotion, opinion, or sentiment inherent within a body of text. Annotators are given texts to analyze and must choose which label best represents the emotion or opinion within the text. A simple example would be the analysis of customer reviews. Annotators would read the reviews and label them as positive, neutral, or negative.
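Because sentiment can be hard to judge even for humans, one common approach is to have several annotators label each text and keep the most frequent label. The article does not say how labels are combined, so the voting rule, the `most_frequent_label` helper, and the sample reviews below are assumptions for illustration only.

```python
from collections import Counter

SENTIMENT_LABELS = ("positive", "neutral", "negative")


def most_frequent_label(votes):
    """Return the label most annotators chose, or None if the top two tie."""
    if not votes:
        raise ValueError("at least one vote is required")
    for vote in votes:
        if vote not in SENTIMENT_LABELS:
            raise ValueError(f"unknown sentiment label: {vote!r}")
    ranked = Counter(votes).most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return None  # no clear winner; the text needs another review
    return ranked[0][0]


# Hypothetical reviews, each labeled by three annotators.
reviews = {
    "Great product, arrived early!": ["positive", "positive", "positive"],
    "Oh great, it broke on day one.": ["negative", "positive", "negative"],
    "It does what it says.": ["neutral", "positive", "negative"],
}

consensus = {text: most_frequent_label(votes) for text, votes in reviews.items()}
for text, label in consensus.items():
    print(f"{label}: {text}")
```

Returning `None` on a tie, instead of picking a label arbitrarily, flags texts such as sarcastic reviews for a closer look.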
When built correctly with accurate training data, a strong sentiment analysis model can accurately detect the sentiment in user reviews, social media posts, and so on. Businesses can then use the model to track public opinion about their products and to develop future strategies or alter current ones accordingly.
5. Linguistic annotation
Also referred to as corpus annotation, linguistic annotation simply describes the process of tagging language data in text or audio recordings. With linguistic annotation, annotators are tasked with identifying and flagging grammatical, semantic, or phonetic elements in the text or audio data. Types of linguistic annotation include:
- Discourse annotation: The linking of anaphors and cataphors to their antecedent or postcedent subjects. Example: in "James broke the chair. He felt really bad about it.", the anaphor "He" is linked to its antecedent "James".
- Part-of-speech (POS) tagging: The annotation of the part of speech of each word within a text.
- Phonetic annotation: The labeling of intonation, stress and natural pauses in speech.
- Semantic annotation: The annotation of word definitions.
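Discourse annotation can be recorded as a link between two character spans. The sketch below uses the article's example sentence and links the anaphor "He" to its antecedent "James". The `span_of` helper and the dictionary layout are hypothetical, not the format of any particular annotation tool.

```python
import re

text = "James broke the chair. He felt really bad about it."


def span_of(text, phrase):
    """Return (start, end) offsets of the first whole-word match of phrase."""
    match = re.search(rf"\b{re.escape(phrase)}\b", text)
    if match is None:
        raise ValueError(f"phrase not found: {phrase!r}")
    return match.span()


# One discourse link: the pronoun and the earlier mention it refers to.
link = {
    "anaphor": span_of(text, "He"),
    "antecedent": span_of(text, "James"),
}

anaphor_text = text[slice(*link["anaphor"])]
antecedent_text = text[slice(*link["antecedent"])]
print(f"{anaphor_text!r} refers to {antecedent_text!r}")
```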
Linguistic annotation is used to create AI training datasets for a variety of NLP solutions such as chatbots, virtual assistants, search engines, machine translation, and more.
These are just five types of text annotation commonly used in machine learning today. To read more about these five types of text annotation, please see our AI Data Solutions pages.