An overview of text summarization

There are two types of text summarization. Both rely on algorithms that analyze keywords, grammar and sentence structure to produce a summary.

When conducting research, you often need to analyze several papers in a very short time, and going through hundreds of pages to find a few vital pieces of information is difficult. This is where text summarization helps. With AI and machine learning, you can summarize large articles and read the most important information in a matter of minutes.

When software produces the summary, the process is referred to as automatic text summarization. In this article, we will cover the basics of automatic text summarization.

Types of text summarization

Text summarization falls into two types, depending on how the summary is built from the main text.

1. Extraction-based text summarization

In extraction-based text summarization, keywords and phrases are extracted from the main text and compiled to form the summary, hence the name.

The wording of the source stays the same: there is no rephrasing and no use of synonyms. This makes extraction the simpler form of summarization, and it does not require complex algorithms. The downside is that, because the words are used as is, the resulting sentences might not be structured properly.
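As a rough illustration of the extraction idea, here is a minimal Python sketch. It makes a simplifying assumption that is not in the description above: instead of stitching individual keywords together, it scores whole sentences by how frequent their words are in the text and copies the top-scoring ones unchanged. The stop-word list and all function names are hypothetical.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list; a real system would use a fuller one.
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "in", "on", "is", "was", "it", "that"}

def split_sentences(text):
    """Naive sentence splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def tokenize(sentence):
    """Lowercase words with stop words removed."""
    return [w for w in re.findall(r"[a-z']+", sentence.lower()) if w not in STOP_WORDS]

def extractive_summary(text, max_sentences=1):
    """Copy the highest-scoring sentences verbatim, keeping their original order."""
    sentences = split_sentences(text)
    freq = Counter(w for s in sentences for w in tokenize(s))
    # A sentence's score is the sum of the text-wide frequencies of its words.
    scored = [(sum(freq[w] for w in tokenize(s)), i, s) for i, s in enumerate(sentences)]
    top = sorted(scored, key=lambda t: (-t[0], t[1]))[:max_sentences]
    return " ".join(s for _, _, s in sorted(top, key=lambda t: t[1]))

text = ("Heavy rain fell on the city yesterday. "
        "The rain flooded several streets. "
        "Officials cancelled the match.")
print(extractive_summary(text, max_sentences=1))
```

Nothing is rephrased here: every sentence in the output appears word for word in the input, which is the defining property of extraction. Note that summing frequencies favours longer sentences; that is a limitation of this sketch, not of extraction in general.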

2. Abstractive text summarization

Abstractive text summarization takes the keywords, replaces them with suitable synonyms and then rephrases the main text, keeping the meaning the same.

As a result, the summary may share little wording with the main text while preserving its meaning. Because this process involves rephrasing and choosing synonyms, it relies on more complex algorithms. It also aims to keep the summary reader-friendly and grammatically correct.

How machines summarize large texts

Machines follow a fixed procedure to summarize a large text. The steps are given below.

First, the machine extracts key phrases from the source document. Candidate phrases are compared against key phrases already recorded in the algorithm; when a candidate matches a recorded phrase, it is selected and used in the summary.

The machine considers both positively and negatively labelled key phrases. These phrases are vital for generating well-structured sentences.

The machine then uses a binary machine learning classifier to summarize the whole text. The classifier considers the following factors:

  • The length of each sentence
  • The frequency of the key phrase appearing
  • The number of characters in the key phrase

Finally, once the keywords and the length of each sentence are decided, the summary is produced by putting the selected sentences together.

These are just the basics of an average summarizer. Modern summarizers use AI, machine learning models and several more complex algorithms, and can summarize multiple texts.
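The three classifier factors listed above can be computed with a few lines of Python. The sketch below is only an illustration under stated assumptions: the phrase frequency is counted within a single sentence using a plain substring match, and `keep_sentence` is a hypothetical hand-written rule standing in for a trained binary classifier, with made-up thresholds.

```python
import re

def sentence_features(sentence, key_phrase):
    """Compute the three factors for one sentence and one key phrase."""
    lowered = sentence.lower()
    phrase = key_phrase.lower()
    return {
        # Length of the sentence, measured in words.
        "sentence_length": len(re.findall(r"\w+", lowered)),
        # How often the key phrase appears (naive substring count).
        "phrase_frequency": lowered.count(phrase),
        # Number of characters in the key phrase.
        "phrase_chars": len(phrase),
    }

def keep_sentence(features, min_frequency=1, max_length=25):
    """Hypothetical stand-in for a binary classifier: True means 'include'."""
    return (features["phrase_frequency"] >= min_frequency
            and features["sentence_length"] <= max_length)

features = sentence_features("They had to cancel their match because of the rain", "rain")
print(features, keep_sentence(features))
```

In a real system the feature dictionary would be fed to a model trained on labelled examples rather than to fixed thresholds.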

What should be included in a summary?

To make the summary reader-friendly and ensure it has the same meaning as the main document, a summarizer has to consider two aspects.

The first is that all the keywords and key phrases appear in the summary; the second is that the key phrases are structured and expressed in the right way.

When a text summarization algorithm is designed, most of the emphasis tends to go on finding the right keywords. But that is not the only important point: how the keywords are used in the summary needs equal emphasis. This is one reason there are still very few automatic text summarizers that can match the level of human expertise.

To find the right keywords, focus on nouns and pronouns, location, incident, time, and activity. For example, let us look at the sentence “There was heavy rain yesterday. They had to cancel their match because of the rain”. Here, the keywords would be they (pronoun), yesterday (time), cancel the match (activity), heavy rain (incident). These are the first things that should be included in a summary.
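A very rough way to pull candidate keywords out of the example is to drop function words and keep what remains. The Python sketch below does only that; the stop-word list is hypothetical and hand-picked for this one example, and it does not label words as time, activity or incident, which would need a part-of-speech tagger or similar tool.

```python
import re

# Hypothetical stop-word list covering the function words in the example text.
STOP_WORDS = {"there", "was", "had", "to", "their", "because", "of", "the"}

def candidate_keywords(text):
    """Return the remaining words in order of first appearance, without duplicates."""
    keywords = []
    for word in re.findall(r"[a-z]+", text.lower()):
        if word not in STOP_WORDS and word not in keywords:
            keywords.append(word)
    return keywords

text = "There was heavy rain yesterday. They had to cancel their match because of the rain"
print(candidate_keywords(text))
# ['heavy', 'rain', 'yesterday', 'they', 'cancel', 'match']
```

The surviving words line up with the keywords named above: heavy rain, yesterday, they, and cancel the match.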

The next step is to express these keywords in a structured sentence. The result differs between extraction-based and abstractive text summarization. For extraction-based summarization, the summary will be “Heavy rain yesterday, they cancelled the match”.

As mentioned before, extraction-based summarization only takes the keywords and compiles them. It does not give the sentence a proper structure.

For abstractive summarization, the summary would be “They cancelled their match yesterday because of the heavy rain”. Here, the key phrases are properly structured into a sentence. These are the two factors that should always be present in a summary: the keywords and the proper expression of these keywords.

How to evaluate a summary

There are two methods for evaluating a text summary: the intrinsic-extrinsic method and the intertextual-intratextual method.

1. The intrinsic-extrinsic method

The intrinsic process checks whether the summary contains all the proper keywords. It also checks whether the keywords are structured in proper sentences and compares the automatic summary to a human-made summary.

If the keywords are similar and the text is grammatically correct, then it is a good summary. The extrinsic method, by contrast, checks the impact of the summary on some other task, such as judging how relevant a document is or how well a reader comprehends it from the summary alone.
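The keyword comparison against a human-made summary can be approximated by counting shared words. The Python sketch below computes a simplified unigram recall, in the spirit of ROUGE-1 recall but without stemming or any other preprocessing; the function name is an assumption, and the measure says nothing about grammatical correctness.

```python
import re
from collections import Counter

def unigram_recall(candidate, reference):
    """Fraction of the reference summary's words that also occur in the candidate."""
    cand = Counter(re.findall(r"[a-z]+", candidate.lower()))
    ref = Counter(re.findall(r"[a-z]+", reference.lower()))
    if not ref:
        return 0.0
    # Clip each word's credit at the number of times the candidate contains it.
    overlap = sum(min(count, cand[word]) for word, count in ref.items())
    return overlap / sum(ref.values())

automatic = "Heavy rain yesterday, they cancelled the match"
human = "They cancelled their match yesterday because of the heavy rain"
print(unigram_recall(automatic, human))
# 0.7
```

Seven of the ten words in the human-made summary also appear in the automatic one, which gives the score of 0.7.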

2. The intertextual-intratextual method

Intratextual evaluation assesses the output of a specific summarization system, that is, how well a single algorithm summarizes the text. Intertextual evaluation focuses on the contrastive analysis of the outputs of several summarization systems.

This means that when multiple algorithms are used to summarize the same text, their outputs are evaluated against one another.

As a rough rule, extraction-based summarization is evaluated with an intratextual process and abstractive summarization with an intertextual one. For more modern summarizers, AI is also used along with these methods.
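Comparing the outputs of several summarization systems can be sketched as ranking them against one shared reference summary. The Python example below uses Jaccard word overlap as the score, which is an assumption chosen for brevity and not a measure named in this article; the system names and their outputs are made up for illustration.

```python
import re

def jaccard(a, b):
    """Shared distinct words divided by all distinct words in the two texts."""
    words_a = set(re.findall(r"[a-z]+", a.lower()))
    words_b = set(re.findall(r"[a-z]+", b.lower()))
    union = words_a | words_b
    return len(words_a & words_b) / len(union) if union else 0.0

def rank_systems(outputs, reference):
    """Return (system name, score) pairs, best score first."""
    scores = {name: jaccard(text, reference) for name, text in outputs.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

reference = "They cancelled their match yesterday because of the heavy rain"
outputs = {  # hypothetical systems and outputs
    "system_a": "Heavy rain yesterday, they cancelled the match",
    "system_b": "There was heavy rain",
}
for name, score in rank_systems(outputs, reference):
    print(name, round(score, 2))
```

Here the first system ranks higher because it shares more of the reference's vocabulary.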

Final thoughts

Reading large documents in full can take a great deal of time. Summarizing a document with a text summarizer lets you get the key information in a short time. This makes the tool extremely useful for students, researchers and just about anyone who has to go through thousands of words every day.