Rules in Natural Language Processing

Hi guys, in this article, I will give a brief introduction to some of the most common rules that need to be followed while performing any NLP task.

This is the second article in my NLP series. If you haven't gone through my previous article, A Brief Introduction to NLP, I suggest you go through it, as it will help you get a better understanding of NLP and why we need these rules.

So, let's begin our NLP journey.

Rules in Natural Language Processing:

For a machine to process natural language, it has to handle several levels of linguistic knowledge: phonology, morphology, syntax, semantics, and pragmatics. Above all, it has to cope with ambiguity. But before that, let us understand what a normal NLP workflow looks like.

Standard NLP workflow

[Figure: NLP workflow]

A typical NLP system follows a step-by-step workflow to reach the desired output. The whole process includes text wrangling, preprocessing, parsing, and finally producing the outcome.
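To make the first two steps a little more concrete, here is a minimal sketch of text wrangling and preprocessing using only the Python standard library. The function names `wrangle` and `preprocess` and the tiny stopword list are my own assumptions for illustration; real projects usually rely on a library such as NLTK or spaCy for this.

```python
import html
import re

# Tiny stopword list, made up for this example; real lists are much longer.
STOPWORDS = {"a", "an", "the", "is", "of", "and", "to"}

def wrangle(raw):
    """Text wrangling: strip HTML tags and entities, collapse whitespace."""
    text = re.sub(r"<[^>]+>", " ", raw)   # drop anything that looks like a tag
    text = html.unescape(text)            # turn &amp; back into &
    return re.sub(r"\s+", " ", text).strip()

def preprocess(text):
    """Preprocessing: lowercase, split into word tokens, drop stopwords."""
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    return [t for t in tokens if t not in STOPWORDS]

raw = "<p>John bought   a book &amp; a pen.</p>"
clean = wrangle(raw)
print(clean)              # John bought a book & a pen.
print(preprocess(clean))  # ['john', 'bought', 'book', 'pen']
```

The cleaned tokens are what the later steps (tagging, parsing) would normally work on.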

Now, let’s start by explaining each and every rule one by one.

1. Words in a Sentence

In any NLP task, the first thing to cover is the nature of the words themselves: whether each one is a noun, a verb, an adjective, and so on. Properties such as tense, infinitive form, number, and person are all part of what PoS (Part of Speech) tagging captures, and it is a technique used in almost any NLP task. We will get to know more about this later on. For now, the point is that much of the information needed to compute an output is carried by the parts of speech and inflected forms of the words in a sentence.

For example:

John bought a book

For each word in the sentence, we determine what type of word it is, such as noun or verb.

John -> Proper Noun

bought -> Verb (past tense)

a -> Determiner

book -> Noun

The underlying idea is that given a sequence of words with their respective tags, we can decide, for the next word, the most likely PoS.
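The idea above can be sketched with a tiny tagger that counts, in a hand-tagged corpus, how often each word carries each tag and how often each tag follows another. The corpus, the tag names, and the functions `train` and `tag_sentence` are all invented for this illustration; this is a greedy toy, not how production taggers are built.

```python
from collections import Counter, defaultdict

# Tiny hand-tagged corpus, invented for this illustration.
CORPUS = [
    [("John", "PROPN"), ("bought", "VERB"), ("a", "DET"), ("book", "NOUN")],
    [("Mary", "PROPN"), ("read", "VERB"), ("the", "DET"), ("book", "NOUN")],
    [("John", "PROPN"), ("will", "AUX"), ("book", "VERB"), ("a", "DET"), ("room", "NOUN")],
    [("Mary", "PROPN"), ("will", "AUX"), ("book", "VERB"), ("a", "DET"), ("table", "NOUN")],
]

def train(corpus):
    emit = defaultdict(Counter)   # word -> how often it carries each tag
    trans = defaultdict(Counter)  # previous tag -> how often each tag follows
    for sentence in corpus:
        prev = "<s>"  # marker for the start of a sentence
        for word, tag in sentence:
            emit[word.lower()][tag] += 1
            trans[prev][tag] += 1
            prev = tag
    return emit, trans

def tag_sentence(words, emit, trans):
    tags = []
    prev = "<s>"
    for word in words:
        candidates = emit.get(word.lower())
        if candidates:
            # Score = word/tag count times (smoothed) count of tag after prev.
            best = max(candidates, key=lambda t: candidates[t] * (trans[prev][t] + 1))
        elif trans[prev]:
            # Unknown word: take the tag that most often follows prev.
            best = trans[prev].most_common(1)[0][0]
        else:
            best = "NOUN"  # arbitrary last-resort guess
        tags.append(best)
        prev = best
    return tags

emit, trans = train(CORPUS)
print(tag_sentence("John bought a book".split(), emit, trans))
# ['PROPN', 'VERB', 'DET', 'NOUN']
print(tag_sentence("Mary will book a room".split(), emit, trans))
# ['PROPN', 'AUX', 'VERB', 'DET', 'NOUN']
```

Notice that "book" is tagged as a noun after a determiner but as a verb after "will": the previous tag is what tips the decision.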

2. Words to Sentence

We have seen that NLP can work with individual words and understand them. But what about syntax? Where is NLP going to use it? Syntax can be a bit tricky to understand. Words are grouped together into related units called chunks (phrases), and NLP uses parsing to analyze sentences according to the grammar of the language.

What is Parsing: Parsing is the process of determining the syntactic structure of a text by analyzing its constituent words based on an underlying grammar (of the language). See the example grammar below, where each line is a rule of the grammar, applied to the example sentence “Tom ate an apple”.

[Figure: example grammar applied to the sentence “Tom ate an apple”]
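Here is a minimal sketch of parsing the same sentence in plain Python with the classic CYK chart-parsing algorithm. The toy grammar below is my own assumption and is not necessarily the grammar shown in the figure; the names `BINARY`, `LEXICON`, and `parse` are hypothetical as well.

```python
# Toy grammar in Chomsky normal form (assumed for illustration).
# (left child, right child) -> parent
BINARY = {
    ("NP", "VP"): "S",    # S  -> NP VP
    ("V", "NP"): "VP",    # VP -> V NP
    ("Det", "N"): "NP",   # NP -> Det N
}
# word -> symbols it can be
LEXICON = {"tom": ["NP"], "ate": ["V"], "an": ["Det"], "apple": ["N"]}

def parse(words):
    """Return a parse tree rooted at S as nested tuples, or None."""
    n = len(words)
    if n == 0:
        return None
    # chart[(i, j)] maps a symbol to one tree covering words[i:j]
    chart = {}
    for i, word in enumerate(words):
        chart[(i, i + 1)] = {sym: (sym, word) for sym in LEXICON.get(word.lower(), [])}
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            cell = {}
            for k in range(i + 1, j):  # try every split point
                for lsym, ltree in chart[(i, k)].items():
                    for rsym, rtree in chart[(k, j)].items():
                        parent = BINARY.get((lsym, rsym))
                        if parent and parent not in cell:
                            cell[parent] = (parent, ltree, rtree)
            chart[(i, j)] = cell
    return chart[(0, n)].get("S")

print(parse("Tom ate an apple".split()))
# ('S', ('NP', 'Tom'), ('VP', ('V', 'ate'), ('NP', ('Det', 'an'), ('N', 'apple'))))
print(parse("ate Tom an apple".split()))
# None
```

The second call returns None because no sequence of rules in this toy grammar can build a sentence from that word order.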

3. Meaning of Words

The meaning behind words can be extremely confusing sometimes. The word “bank”, for example, can be used in two contexts (a financial institution or the side of a river). That is easy for us humans to understand, but for a computer it can be confusing.

This leads to two major concerns that NLP faces:

Synonymy – Words with similar meanings
Polysemy – Words with several meanings

For instance,

He made the world record.

We have a record of the conversation.

The syntax and PoS tagging are similar in the above sentences (“record” is a noun in both), which makes it difficult for an NLP system to tell the two meanings apart. For this type of phrase, a deeper approach is followed that uses world knowledge. That knowledge helps remove the ambiguity and assign the right meaning to each word.
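One simple way to bring in outside knowledge is to compare the words around “record” with a list of words associated with each meaning and pick the meaning with the biggest overlap (a simplified version of the Lesk idea). The two sense names and their keyword sets below are hand-written assumptions for this example; a real system would take them from a resource such as WordNet.

```python
import re

# Hand-written keyword sets for two senses of "record" (assumed, not from a dictionary).
SENSES = {
    "achievement": {"world", "best", "fastest", "broke", "set", "olympic", "performance"},
    "document": {"conversation", "written", "file", "kept", "meeting", "account", "stored"},
}

def disambiguate(sentence, senses=SENSES):
    """Pick the sense whose keywords overlap most with the sentence."""
    context = set(re.findall(r"[a-z]+", sentence.lower()))
    return max(senses, key=lambda s: len(context & senses[s]))

print(disambiguate("He made the world record."))             # achievement
print(disambiguate("We have a record of the conversation."))  # document
```

This only works when the sentence happens to contain one of the keywords, which is exactly why real word sense disambiguation needs much richer knowledge.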

4. Pragmatics

The context of a sentence is the next thing that is essential for extracting its actual meaning. Is it a joke? A sarcastic comment? A serious comment? These things hold a lot of importance when it comes to analyzing the data.

For instance,

Will – That was a dumb move.

James – Well, thank you, that’s so sweet of you.

Why is “dumb” answered with “sweet”? What was it: sarcasm, a joke, or plain irony? Questions like these fall under the more complex part of NLP that deals with the intent behind the words. A classifier can be trained to determine what a tweet or status is really about.

Its features can include word frequencies, or cues such as exaggerated adjectives and unexpectedness. However, there is always room for improvement, and with time NLP systems may get better at picking up intentions as well.
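As a rough sketch of a classifier trained on word frequency, here is a tiny Naive Bayes model written from scratch. The six training sentences and their labels are invented for this example, and the names `tokenize`, `train`, and `classify` are my own; real sarcasm detection needs far more data and richer features than this.

```python
import math
import re
from collections import Counter

# Invented toy training data: (text, label).
TRAIN = [
    ("oh great another monday", "sarcastic"),
    ("wow thank you so sweet of you", "sarcastic"),
    ("yeah right that was so smart", "sarcastic"),
    ("thank you for the help", "sincere"),
    ("that was a great game", "sincere"),
    ("have a nice monday", "sincere"),
]

def tokenize(text):
    return re.findall(r"[a-z]+", text.lower())

def train(examples):
    word_counts = {}        # label -> Counter of word frequencies
    doc_counts = Counter()  # label -> number of training texts
    for text, label in examples:
        doc_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(tokenize(text))
    vocab = {w for counts in word_counts.values() for w in counts}
    return word_counts, doc_counts, vocab

def classify(text, model):
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    scores = {}
    for label, counts in word_counts.items():
        total_words = sum(counts.values())
        score = math.log(doc_counts[label] / total_docs)  # log prior
        for word in tokenize(text):
            if word in vocab:  # skip words never seen in training
                # add-one smoothed log likelihood of the word given the label
                score += math.log((counts[word] + 1) / (total_words + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)

model = train(TRAIN)
print(classify("Well, thank you, that's so sweet of you.", model))  # sarcastic
print(classify("Thank you for the help", model))                    # sincere
```

The model only "knows" sarcasm through the words it saw in the toy data, so it will happily mislabel anything phrased differently.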

5. Syntax & Structure

Just as in programming languages, syntax and structure always go hand in hand in natural language. In NLP this covers the conventions, rules, and principles that govern how words are combined into phrases, clauses, and so on.

These come in handy in a number of ways, including parsing, annotation, and text processing. To grasp them, however, you need to know that they are closely tied to the grammar of the language being processed.


That's all for this article. I hope it has helped you understand the basic rules of NLP and why they are important.

For an introduction to NLP, please go through my previous article.

In my next post, I will continue this series and we will look at some NLP tasks like tokenization, stemming, etc., with their practical implementation.

Please do share your experience with natural language processing, and how you think it can help you, in the comments section.

You can find all my other code on my GitHub and my articles on my blog. Drop a star or a like if you find it useful.

Thank you for reading. I would love to connect with you on LinkedIn.

Do share your valuable feedback and suggestions!

Till then, happy NLP.