Planning before Starting a Project/Thesis

Planning before Starting a Thesis

  • Research Background

There are numerous strategies for stemming inflected words in various languages, yet few works address Bangla word stemming. Consequently, stemming Bangla words remains an unsolved problem. A wide range of inflection cases can occur in the Bangla language. In this report, we develop a rule-based algorithm to stem Bangla words.


  • Problem Statement

Here, we attempt to extract the stem from a word using a rule-based methodology. To do this, we need to remove the suffix from the word, so we first have to check which suffix actually needs to be removed. In other words, we need to find the specific suffix whose removal reduces a word to its corresponding root word. We identify some CFG (context-free grammar) rules that help us recognize the inflection in a word. We then remove the correct suffix from the inflected word and obtain the appropriate stem.

As Bangla word stemming is not widely available or implemented, users who want to work with Bangla may face many problems.



  • Motivation

As Bangla word stemming is not widely available or implemented, users who want to work with Bangla may face many problems. So, to make stemming Bangla words easier for them, we develop a Bangla stemmer.


  • Objective of Research 

This research focuses on the development of new techniques for ….

  1. To convert any word into its root word.
  2. To reduce words.
  3. To remove the suffix from a word.

  • Organization of the Thesis

The thesis comprises five chapters. The thesis organization is generally described as follows:

The first chapter introduces the thesis. The problem and the research motivation are described in this chapter. The research objectives of the thesis are also outlined.

The second chapter describes the overview of the previous work.

The third chapter describes the methodology of our proposed framework.

The fourth chapter describes the output or experimental result of our thesis.

The fifth chapter describes the future scope of this work and gives concluding remarks about the method.

Literature Review

    1. Literature Review
      1. Introduction

In recent years, stemmers have had a significant impact on various research fields, and research on this subject started early. Here we discuss stemmers and lemmatizers, which help us stem words correctly.


  • Scope of Research

In our thesis, we convert any word into its root word. This helps to compress information, because the information can be represented using fewer words. A few words can represent simple sentences, and sometimes a few words represent an entire sentence. If we convert words into root words, we can represent a sentence using a few words, which helps to find the significant words and compress the information.


  • Literature Review

Some related prior works are described below:

In paper [1], the authors examined a rule-based stemmer. This stemmer tries to find the inflection in a word and extract the stem from it. They developed a stemmer that works stepwise and helps to obtain the proper stem from the original word. They state that, because they use a stepwise process, they need less processing time and fewer suffix checks. They achieved 88% accuracy. However, the stemmer handles only verbs and nouns, and it does not work on blocked words.


In paper [2], the authors proposed a recursive rule-based stemmer. The same word is passed through the system repeatedly. On each pass, the stemmer finds the longest matching suffix and removes it. This process recursively removes multiple suffixes from a single word and retrieves a more relevant root. The proposed method stems the word in all possible ways. They achieved a precision of 92%.
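The recursive longest-suffix idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the suffix list, the `MIN_STEM_LEN` guard, and the function names are hypothetical and are not taken from [2], and the sketch follows a single stripping path rather than exploring all possible stems.

```python
# Illustrative suffix list (an assumption, not the rule set used in [2]).
SUFFIXES = ["গুলি", "দের", "কে", "রা", "টি", "র"]

# Hypothetical guard: never strip a word down to fewer than 2 code points.
MIN_STEM_LEN = 2


def strip_longest_suffix(word):
    """Remove the longest matching suffix, or return the word unchanged."""
    for suffix in sorted(SUFFIXES, key=len, reverse=True):
        if word.endswith(suffix) and len(word) - len(suffix) >= MIN_STEM_LEN:
            return word[: -len(suffix)]
    return word


def recursive_stem(word):
    """Strip one suffix at a time until no rule applies any more."""
    stripped = strip_longest_suffix(word)
    if stripped == word:
        return word
    return recursive_stem(stripped)


if __name__ == "__main__":
    for w in ["ছেলেদেরকে", "বইগুলি", "ঘর"]:
        print(w, "->", recursive_stem(w))
```

The length guard is one simple way to limit over-stemming of short words such as "ঘর"; a real stemmer would need a far larger rule set and part-of-speech information.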


In paper [3], the authors built their own stemmer and used it to construct a POS tagger. They proposed an automated POS tagging system that includes 45,000 words. The proposed model was tested on a Bangla corpus. They tag words using the stemmer, and the main focus of this system is a suffix-based approach.


In paper [4], the authors discussed a rule-free stemmer. Text mining has attracted great interest with the enormous growth of textual data. It requires some preprocessing steps, and stemming is one of them. The proposed technique is RFreeStem. The most widely used stemmers depend on rules that are applied to words to remove unnecessary parts. Many of these rules depend on the language.


There are two different types of language structure: analytic and synthetic. Removing all affixes requires a great number of rules. The authors of [4] proposed a rule-free stemmer that can be applied to different languages. The most well-known existing works use rule-based and corpus-based methods. The technique could be improved by refining its dendrogram cutting strategy.

In paper [5], the authors described a rule-based stemmer that handles the suffixes of a word. They proposed an algorithm that works in close space, together with a few rules for building a rule-based stemmer. This algorithm helps to build a spell checker and to retrieve relevant information.


In paper [6], the authors examined another rule-based approach. In this paper, however, they proposed a system that actually finds the stem of a word. They have a dataset of stem words and, using this dataset, they apply a brute-force technique to find the root word. They achieved 89% precision.
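One plausible reading of a brute-force lookup against a stem dataset is sketched below. The stem set `KNOWN_STEMS` and the function name are hypothetical, and the actual procedure in [6] may differ; the sketch simply tries every prefix of the word, longest first, and returns the first one found in the stem list.

```python
# Hypothetical, tiny stem dataset used only for illustration.
KNOWN_STEMS = {"বই", "ছেলে", "ঘর"}


def brute_force_stem(word, stems=KNOWN_STEMS):
    """Return the longest prefix of `word` that is a known stem."""
    for end in range(len(word), 0, -1):
        candidate = word[:end]
        if candidate in stems:
            return candidate
    # No known stem matched, so leave the word unchanged.
    return word


if __name__ == "__main__":
    for w in ["ছেলেদের", "বইগুলি", "কলম"]:
        print(w, "->", brute_force_stem(w))
```

Unlike a suffix-rule stemmer, this approach can only succeed for words whose stems are already in the dataset.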


In paper [7], the authors convert words into vectors so that they can be processed by a neural network. The paper proposed a new neural network architecture and takes Bangla as the reference language. They achieved an accuracy of 69.57%. They use two Bangla datasets, for testing and training, for this purpose. They state that token-based statistical lemmatization methods need additional training sets.


This process needs additional training time. The paper used 10-fold cross-validation to separate the training and testing data. They have 50k words, and it is hard for them to handle such a large number of words in the neural network. They proposed an efficient neural network architecture that takes less time in training.
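For readers unfamiliar with 10-fold cross-validation, the splitting step can be sketched as follows. This is a generic illustration, not the setup from [7]: the function name is hypothetical, the data is a stand-in list of integers, and shuffling is omitted for brevity.

```python
def k_fold_splits(items, k=10):
    """Yield (train, test) lists, using each of the k folds once as test data."""
    # Deal the items into k folds in round-robin order.
    folds = [items[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        yield train, test


if __name__ == "__main__":
    data = list(range(50))  # stand-in for a list of word samples
    for fold_no, (train, test) in enumerate(k_fold_splits(data), start=1):
        print(fold_no, len(train), len(test))
```

Each sample appears in the test set exactly once across the ten rounds, so every sample is used for both training and evaluation.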


  • Summary

Although we faced some issues during rule generation, we overcame them by adding some special rules. We have developed an effective algorithm and method for Bengali stemming that improves on the existing methods in terms of accuracy and complexity. This algorithm can be used in many ways to improve systems that depend on the Bengali language. A stemmer helps to retrieve information from a large database. Sometimes a large database holds a huge amount of data, and we need only part of it. It is hard to search for every word in the database, so we need keyword finding in this case. We therefore convert all the words into root words, which helps us identify the significant words.

    1. Methodology
      1. Introduction 

In our thesis, we propose a stemmer based on a rule-based approach. We divide the sentence into its parts of speech. We mainly focus on verbs and number (organism, substance). We collect a set of rules from Bangla grammar books and propose an algorithm that uses them to stem words.

  • Methodology 
    1. Text normalization

Normalizing text means converting it to a more convenient, standard form. Text normalization is the process of transforming text into a canonical (standard) form. For example, the words “food” and “fud” can both be transformed into “food”, their canonical form. Another example is the mapping of near-identical words such as “stopwords,” “stop-words,” and “stop words” to just “stopwords.”
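The mapping to a canonical form can be sketched with a small lookup table. The table `CANONICAL_FORMS` and the function `normalize` are hypothetical names built from the examples above; a real normalizer would need a much larger table or a learned model.

```python
import re

# Hypothetical variant-to-canonical table built from the examples in the text.
CANONICAL_FORMS = {
    "fud": "food",
    "stop-words": "stopwords",
    "stop words": "stopwords",
}

# Longer variants come first so multi-word entries are matched as a whole.
_VARIANTS = sorted(CANONICAL_FORMS, key=len, reverse=True)
_PATTERN = re.compile(
    r"\b(" + "|".join(re.escape(v) for v in _VARIANTS) + r")\b",
    flags=re.IGNORECASE,
)


def normalize(text):
    """Replace every known variant in `text` with its canonical form."""
    return _PATTERN.sub(lambda m: CANONICAL_FORMS[m.group(0).lower()], text)


if __name__ == "__main__":
    print(normalize("I like fud"))
    print(normalize("Remove stop-words and Stop Words"))
```

The word-boundary anchors keep the rule for "fud" from touching longer words such as "fudge".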

Every NLP task needs to do text normalization:

  1. Segmenting/tokenizing words
  2. Normalizing word formats
  3. Segmenting sentences in running text

  • Pre-processing

Before stemming is done, we have to complete the following four steps: data collection, sentence tokenization, removing punctuation, and removing stop words.
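The last three of these steps can be sketched as a small pipeline. This is a minimal illustration under stated assumptions: data collection is taken as already done, the stop-word list `STOP_WORDS` is a hypothetical three-word sample, and the punctuation set is deliberately small. Sentences are split on the Bangla danda (।) as well as on question and exclamation marks.

```python
import re

# Hypothetical, deliberately tiny stop-word list; a real list is much larger.
STOP_WORDS = {"এবং", "ও", "যে"}


def split_sentences(text):
    """Split running text on the danda, '?' and '!'."""
    parts = re.split(r"[।?!]", text)
    return [p.strip() for p in parts if p.strip()]


def remove_punctuation(sentence):
    """Replace common punctuation marks with spaces."""
    return re.sub(r"[,;:\"'()\-]", " ", sentence)


def preprocess(text):
    """Return one list of non-stop-word tokens per sentence."""
    result = []
    for sentence in split_sentences(text):
        tokens = remove_punctuation(sentence).split()
        result.append([t for t in tokens if t not in STOP_WORDS])
    return result


if __name__ == "__main__":
    for tokens in preprocess("আমি ভাত খাই। সে বই ও খাতা কেনে।"):
        print(tokens)
```

The token lists produced here are what a stemmer would then receive as input.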
