
Text Summarization — What is it?

Introduction

Recently, text summarization with deep learning has captured my attention, and I have been obsessed with it for days and nights. In this post, I would like to share what I know about the topic and show how to perform the task in a black-box manner, thanks to Hugging Face.


As technology progresses, the flow of information grows at an astounding rate. This benefit, however, brings an unsettling side effect: anyone who needs to do research, or whose work involves searching through documents, ends up drowning in it. We genuinely need a program that can extract the most important points from documents.


This is where automatic text summarization comes in handy. Summarizing means conveying the key points of one or more source texts. The goal of automatic text summarization is to produce a succinct, readable summary that preserves the main ideas and content of the original text. Because the task requires human-level understanding and flexibility with language, whether a machine can do it well is often questioned. Nonetheless, numerous studies have shown that, although the quality is not yet outstanding, the approach is practical to implement and has plenty of room to improve.

Personally, I find this topic very intriguing. When I do research, there are tons of documents I have to deal with, and I thought to myself, “there must be an easier way.” The idea of having a machine perform the task struck me as a breakthrough in the NLP field. For a moment, I believed I had become an inventor, a game changer in this field. However, reality beat fantasy. In the end, I fell in love with the topic and decided to dig deeper into text summarization with deep learning.

Many companies specialize in this feature, which has made their products hard to compete with in the NLP field; Quillbot and Grammarly are two examples. You should try them at least once!

Main Schools

There are 2 main schools in this interesting field:

Extractive approach: a technique for creating summaries of texts by selecting and combining existing sentences from the original document. These sentences are chosen based on their relevance and importance to the overall theme of the text. In other words, extractive summarization picks out the most important pieces of information from each document and puts them together to create a concise overview of the entire topic. Here are some key features of extractive summarization:

  • Simple and efficient: It’s relatively easy to implement and works well on short texts.
  • Preserves factual accuracy: The sentences are taken directly from the source text, so the summary is exactly as accurate as the source itself.
  • Can maintain coherence: When the selected sentences are close together in the original text, the summary tends to flow smoothly.
  • May lack originality: The summary may not contain any new insights or ideas, as it simply reflects the content of the original text.
  • Potentially misses key points: If important information is spread across several sentences rather than stated in one complete sentence, the summarization process may miss it.
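To make the selection idea concrete, here is a minimal sketch of one classic extractive heuristic (word-frequency scoring), written for this post rather than taken from any library: each sentence is scored by the average frequency of its non-stop-words across the document, and the top-scoring sentences are returned in their original order. The naive sentence splitter, the tiny stop-word list, and the function names are illustrative assumptions; real systems use proper tokenizers and stronger scoring.

```python
import re
from collections import Counter

# A tiny, illustrative stop-word list (an assumption; real systems use larger lists).
STOP_WORDS = {
    "a", "an", "and", "are", "as", "by", "for", "from", "in", "is",
    "it", "of", "on", "or", "that", "the", "this", "to", "with",
}


def split_sentences(text: str) -> list[str]:
    # Naive splitter: break after ".", "!" or "?" followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s.strip()]


def tokenize(sentence: str) -> list[str]:
    # Lowercase word tokens with stop words removed.
    return [w for w in re.findall(r"[a-z']+", sentence.lower()) if w not in STOP_WORDS]


def extractive_summary(text: str, num_sentences: int = 2) -> str:
    sentences = split_sentences(text)
    # How often each content word appears in the whole document.
    freq = Counter(w for s in sentences for w in tokenize(s))

    def score(sentence: str) -> float:
        words = tokenize(sentence)
        # Average (not total) frequency, so long sentences are not automatically favoured.
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    # Take the highest-scoring sentences, then restore their original order
    # so the summary reads in the same sequence as the source.
    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    chosen = sorted(ranked[:num_sentences])
    return " ".join(sentences[i] for i in chosen)
```

For example, `extractive_summary("Cats are small pets. Cats like to sleep. Dogs bark loudly. Cats and dogs can be friends.", 2)` returns `"Cats are small pets. Cats like to sleep."`. Every sentence in the output is copied verbatim from the input, which is the property the list above describes.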

Extractive summarization is widely used for summarizing news articles, scientific papers, and other factual texts. It is a good choice when you need a quick, reliable summary that stays faithful to the source.

Abstractive approach: a more sophisticated way of summarizing text that goes a step further than its extractive counterpart. Instead of simply picking existing sentences, it tries to understand the main ideas and concepts of the original text and then rephrases them in its own words, producing a concise and informative summary. Here are some key features of abstractive summarization:

  • Highly insightful: Can capture the main ideas and concepts beyond the surface level of the text, providing deeper understanding.
  • More concise: Often produces shorter summaries compared to extractive methods, focusing on the most essential information.
  • Original and creative: Generates new sentences and phrases, potentially expressing insights not explicitly stated in the source text.
  • More complex to implement: Requires advanced models and training techniques, typically involving deep learning methods like neural networks.
  • May introduce inaccuracies: The generated text, while creative, can deviate from the facts stated in the original source.
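One rough way to see the difference between the two approaches is to count how many n-grams in a summary never appear in the source. The sketch below computes this "novel n-gram ratio" on word bigrams (a measure often used in summarization research to gauge how abstractive a summary is; the function names here are hypothetical): a purely extractive summary should score close to zero, while an abstractive one that rephrases the source scores higher.

```python
import re


def words(text: str) -> list[str]:
    # Lowercase word tokens; punctuation is ignored.
    return re.findall(r"\w+", text.lower())


def ngrams(tokens: list[str], n: int) -> set[tuple[str, ...]]:
    # All distinct contiguous n-grams in the token list.
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def novel_ngram_ratio(source: str, summary: str, n: int = 2) -> float:
    """Fraction of the summary's distinct n-grams that do not occur in the source."""
    source_ngrams = ngrams(words(source), n)
    summary_ngrams = ngrams(words(summary), n)
    if not summary_ngrams:
        return 0.0
    return len(summary_ngrams - source_ngrams) / len(summary_ngrams)


source = "the cat sat on the mat"
print(novel_ngram_ratio(source, "the cat sat on the mat"))  # 0.0 (fully copied)
print(novel_ngram_ratio(source, "a cat rested on a mat"))   # 1.0 (fully reworded)
```

Because extracted sentences are glued together, a few bigrams spanning sentence boundaries can be novel, so real extractive summaries land near zero rather than exactly at zero.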

Abstractive summarization is well suited to complex and creative texts where capturing the essence of the content is crucial. It shines in applications such as summarizing news articles, research papers, or even literary works, where understanding the main points and conveying them succinctly is paramount.

In daily life, we tend to summarize extractively whenever a job calls for it. For instance, when we do research for our company, we want it done quickly but also concisely, so we look through multiple available resources and pick out the most valuable information. Abstractive summarization demands more than that: it has to tackle problems such as semantic representation, inference, and natural language generation, which are considerably harder than the data-driven selection performed by sentence-extraction techniques.

Code Implementation

Hugging Face recently released a noteworthy feature that makes large language models (LLMs) far more approachable for learners who have little experience with deep learning.

Imagine having a vast library of the world’s most advanced AI models at your fingertips, ready to tackle your tasks with just a few lines of code. That is what the Hugging Face Inference API offers: it lets you harness machine learning models without the complexity of training them or managing infrastructure.

Note that you will need a Hugging Face access token, so register an account and create one first. Once you have it, you are ready to stand on the shoulders of giants.

To begin, choose your favorite model and write the necessary lines of code, for example:

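```python
import os

import requests

model_id = "facebook/bart-large-cnn"  # https://huggingface.co/facebook/bart-large-cnn
# Read the access token from the HF_TOKEN environment variable, or paste it in place of "...".
API_TOKEN = os.environ.get("HF_TOKEN", "...")
headers = {"Authorization": f"Bearer {API_TOKEN}"}
API_URL = f"https://api-inference.huggingface.co/models/{model_id}"


def query(payload):
    # Send the payload as JSON and return the decoded JSON response.
    response = requests.post(API_URL, headers=headers, json=payload, timeout=60)
    response.raise_for_status()  # fail loudly on HTTP errors (e.g. an invalid token)
    return response.json()
```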
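The output shown later in this post is a list holding one dictionary with a `summary_text` key. A small helper like the hypothetical `extract_summary` below unwraps that structure and fails loudly on anything else. Treating a dictionary with an `error` key as a failure is an assumption about how the service reports problems, not something this post documents.

```python
def extract_summary(result) -> str:
    """Pull the summary string out of a summarization response.

    Assumes the shape shown in this post: [{'summary_text': '...'}].
    A dict carrying an 'error' key (an assumed failure format) becomes an exception.
    """
    if isinstance(result, dict) and "error" in result:
        raise RuntimeError(f"Inference API error: {result['error']}")
    if isinstance(result, list) and result and isinstance(result[0], dict) and "summary_text" in result[0]:
        return result[0]["summary_text"]
    raise ValueError(f"Unexpected response shape: {result!r}")
```

Usage would look like `summary = extract_summary(query({"inputs": text}))`, which yields a plain string instead of the nested list.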

Taking the second section of this post (Main Schools) as the input text, the call and its result look like this:

text = """
There are 2 main schools in this interesting field:
Extractive approach: this approach is a technique for creating summaries of texts by selecting and combining existing sentences from the original document. These sentences are chosen based on their relevance and importance to the overall theme of the text. In other words, Extractive summarization picks out the most important pieces of information from each document and puts them together to create a concise overview of the entire topic. Here are some key features of extractive summarization:
Simple and efficient: It's relatively easy to implement and works well on short texts.
Preserves factual accuracy: The sentences are taken directly from the source text, so they are guaranteed to be factual.
Maintains coherence: By selecting sentences that are close together in the original text, the summary generally flows smoothly.
May lack originality: The summary may not contain any new insights or ideas, as it simply reflects the content of the original text.
Potentially misses key points: If important information is not contained in complete sentences, it may be missed by the summarization process.

Abstractive approach: this approach is a sophisticated approach to summarizing text that takes things a step further than its extractive counterpart. Instead of simply picking and choosing existing sentences, it delves deeper, understanding the main ideas and concepts of the original text, and then rephrases them in its own words, creating a concise and informative summary. Here are some key features of abstractive summarization:
Highly insightful: Can capture the main ideas and concepts beyond the surface level of the text, providing deeper understanding.
More concise: Often produces shorter summaries compared to extractive methods, focusing on the most essential information.
Original and creative: Generates new sentences and phrases, potentially expressing insights not explicitly stated in the source text.
More complex to implement: Requires advanced models and training techniques, typically involving deep learning methods like neural networks.
May introduce potential inaccuracies: The generated text, while creative, can deviate from the factual content of the original source.

Comparing to our daily life, we can see that we tend to do extractive summarize whenever a job requires the task. For instance, we have to do research for our companies. We want the research to be done as fast as possible but also be concise. Therefore, we would look up multiple available resources and search for the most valuable information. Practicing abstractive approach requires more than that. This is due to the fact that abstractive summarization techniques handle issues that data-driven techniques like sentence extraction find easier to handle, such as semantic representation, inference, and natural language production.
"""

data = query(
    {
        "inputs": text,
        "parameters": {"do_sample": False},
    }
)

>> [{'summary_text': 'There are 2 main schools in this interesting field: extractive and abstractive. Extractive summarization picks out the most important pieces of information from each document and puts them together to create a concise overview of the entire topic. The abstractive approach delves deeper, understanding the main ideas and concepts of the original text, and then rephrases them in its own words.'}]
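`facebook/bart-large-cnn` can only read a limited input (BART's position embeddings cover 1,024 tokens), so a long document has to be split before it is sent to the model. Below is a minimal sketch of one common workaround, assuming a word budget is a good-enough proxy for tokens: the hypothetical `chunk_text` packs whole sentences into chunks of at most `max_words` words. Since subword tokenizers usually produce more tokens than there are words, the default budget is deliberately conservative.

```python
import re


def chunk_text(text: str, max_words: int = 500) -> list[str]:
    """Greedily pack whole sentences into chunks of at most max_words words.

    A single sentence longer than max_words still becomes its own (oversized) chunk.
    """
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks: list[str] = []
    current: list[str] = []
    count = 0
    for sentence in sentences:
        n_words = len(sentence.split())
        # Close the current chunk if adding this sentence would exceed the budget.
        if current and count + n_words > max_words:
            chunks.append(" ".join(current))
            current, count = [], 0
        current.append(sentence)
        count += n_words
    if current:
        chunks.append(" ".join(current))
    return chunks


print(chunk_text("One two three. Four five. Six seven eight nine.", max_words=5))
# ['One two three. Four five.', 'Six seven eight nine.']
```

Each chunk can then be summarized with the `query` function defined earlier, for example `[query({"inputs": chunk}) for chunk in chunk_text(long_text)]` (where `long_text` is a placeholder for your document), and the partial summaries joined or summarized once more.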

Conclusion

Text summarization is a very interesting topic. In fact, it has become hugely popular and is widely seen as a must-have feature for any product built on LLMs.

Thank you for reading this article; I hope it added something to your knowledge bank! Just before you leave:

👉 Be sure to press the like button and follow me. It would be a great motivation for me.

👉 Follow me: Linkedin | Github


All rights reserved

Viblo