This is probably the best time to start working on NLP. Internet connectivity and data accessibility have brought millions of applications to the market. Here we will see how to take advantage of this mobile shift with Natural Language Processing.
Natural language refers to a language that we humans use for everyday communication, such as English, Hindi, or Portuguese. In contrast to artificial languages such as programming languages and mathematical notations, natural languages keep evolving with every generation and are therefore hard to pin down with explicit rules. Natural Language Processing covers any kind of computer manipulation of natural language. It can be as simple as counting word frequencies to compare different writing styles, or it can involve comprehending complete human utterances, at least to the extent of being able to respond with meaningful answers.
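As a rough illustration of the simplest case, counting word frequencies needs nothing beyond Python's standard library. The two sample texts and the helper name `word_frequencies` are made up for this sketch; a real comparison of writing styles would use much larger texts and a proper tokenizer.

```python
import re
from collections import Counter


def word_frequencies(text):
    """Return a Counter of lowercased words in `text` (hypothetical helper)."""
    # Crude tokenizer: runs of letters and apostrophes count as words.
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(words)


# Two invented snippets standing in for different writing styles.
formal = "The results indicate that the method is effective and the cost is low."
casual = "So yeah, the thing works and it's cheap, which is nice."

formal_counts = word_frequencies(formal)
casual_counts = word_frequencies(casual)

# Compare how often a common word appears in each snippet.
print("formal 'the':", formal_counts["the"])
print("casual 'the':", casual_counts["the"])
print(formal_counts.most_common(3))
```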
Many applications emerged in the real world following intense and continued research and development.
NLP plays a prominent role in the following technology trends:
#Knowledge discovery in texts
#Sentiment Analysis in E-Commerce Websites
#Named Entity Extraction
These are some successful implementations of natural language processing (NLP):
NLTK is a leading platform for building Python programs to work with human language data. It provides easy-to-use interfaces to over 50 corpora and lexical resources such as WordNet, along with a suite of text processing libraries for classification, tokenization, stemming, tagging, parsing, and semantic reasoning, wrappers for industrial-strength NLP libraries, and an active discussion forum.
Install NLTK: sudo pip install -U nltk
Install Numpy (optional): sudo pip install -U numpy
Run Python and type: import nltk
Ensure that the NLTK module is installed. On the command line, check for NLTK by running the following command:
$ python -c "import nltk"
If NLTK is installed, this command will complete without error.
In Python’s interactive environment, import the twitter_samples corpus:
>>> from nltk.corpus import twitter_samples
Tokenization is the act of breaking up a string into pieces such as words, keywords, phrases, symbols and other elements, which are called tokens.
>>> import nltk
>>> sentence = "the occupation of taking and printing photographs or making movies"
>>> tokens = nltk.word_tokenize(sentence)
>>> tokens
The output is a list in which each element is one token of the sentence. Now that we have the tokens, we can tag them with the appropriate part-of-speech (POS) tags.
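The tagging step can be sketched as follows. NLTK's usual entry point is `nltk.pos_tag(tokens)`, which relies on a trained model that has to be downloaded separately, so this sketch uses `nltk.tag.RegexpTagger` with a handful of hand-written patterns instead. The patterns are illustrative assumptions, not a realistic grammar of English, and the tokens are produced with `str.split()` to keep the example free of extra downloads.

```python
from nltk.tag import RegexpTagger

# Hand-written (pattern, tag) pairs, tried in order; the first match wins.
# These rules are assumptions made for this example only.
patterns = [
    (r"^(the|a|an)$", "DT"),      # determiners
    (r"^(of|in|on|for)$", "IN"),  # prepositions
    (r"^(and|or)$", "CC"),        # coordinating conjunctions
    (r".*ing$", "VBG"),           # gerunds / present participles
    (r".*s$", "NNS"),             # plural nouns
    (r".*", "NN"),                # fallback: singular noun
]

tagger = RegexpTagger(patterns)

sentence = "the occupation of taking and printing photographs or making movies"
tokens = sentence.split()  # whitespace split stands in for word_tokenize here

tagged = tagger.tag(tokens)
for token, tag in tagged:
    print(token, tag)
```

A trained tagger such as the one behind `nltk.pos_tag` will generally be far more accurate than rules like these; the point here is only the shape of the output, a list of (token, tag) pairs.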
Want to explore more? Visit NLTK.
natural is an NLP library for Node.js. Its design is loosely based on Python's NLTK, in that all algorithms live in the same package.
Installation: You can install via NPM like so:
npm install natural
If you want to install from source (which can be found on GitHub), clone the repository and run npm install from the source directory.
git clone https://github.com/NaturalNode/natural.git
cd natural
npm install .
Now, let us understand Stemming with an example.
In NLP, stemming is the process of reducing inflected words to their base or root form, which is generally a written word form.
Here is a small code snippet that uses the stemmer from “natural”.
var natural = require('natural'),
    stemmer = natural.PorterStemmer;

var stem = stemmer.stem('stems');
console.log(stem);
stem = stemmer.stem('stemming');
console.log(stem);
stem = stemmer.stem('stemmed');
console.log(stem);
stem = stemmer.stem('stem');
console.log(stem);

Output:
stem
stem
stem
stem
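For comparison, roughly the same experiment can be run in Python with NLTK's `PorterStemmer`. This is a minimal sketch that assumes NLTK is installed as described earlier; the stemmer needs no extra data downloads.

```python
from nltk.stem import PorterStemmer

stemmer = PorterStemmer()

# The same four words used in the "natural" snippet.
words = ["stems", "stemming", "stemmed", "stem"]
stems = [stemmer.stem(word) for word in words]

for word, stem in zip(words, stems):
    print(word, "->", stem)
```

All four words reduce to the same stem, which is what lets a search or classification step treat them as one term.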
If you want to explore more about stemming, visit the project on GitHub.
When using ML techniques in NLP, you should always pay attention to what information you need to feed your algorithm and how you can represent that information to get the best results.
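One common way to represent text for a learning algorithm is a bag-of-words vector, where each position counts how often a vocabulary word occurs in a document. The sketch below uses only the standard library; the sample sentences and the helper name `bag_of_words` are invented for illustration, and real pipelines usually add steps such as stemming or stop-word removal first.

```python
from collections import Counter


def bag_of_words(documents):
    """Turn a list of strings into (vocabulary, count vectors)."""
    tokenized = [doc.lower().split() for doc in documents]
    # A sorted vocabulary gives every word a fixed position in the vectors.
    vocabulary = sorted({word for tokens in tokenized for word in tokens})
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append([counts[word] for word in vocabulary])
    return vocabulary, vectors


# Invented example documents.
docs = ["the movie was good", "the movie was not good", "good good movie"]
vocabulary, vectors = bag_of_words(docs)

print(vocabulary)
for vector in vectors:
    print(vector)
```

Note that this representation discards word order, so whether it is a good fit depends on what information the algorithm actually needs.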
Stay tuned for the next update!
November 14, 2017