We have loads and loads of text data waiting to be examined and analysed. But we cannot feed raw text directly into our machine learning and deep learning models; it needs to be cleaned and preprocessed first.

When we say the data “needs to be cleaned”, we mean that it contains unnecessary elements that must be removed, and that useful information needs to be extracted from the raw text to help us build the model more effectively.

We’ll see some of the most common methods to clean the text data and…
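Common cleaning steps include lowercasing, stripping URLs and punctuation, and collapsing whitespace. A minimal sketch of such a cleaning function (the exact steps and regexes here are illustrative, not the article's own):

```python
import re

def clean_text(text):
    """A few common cleaning steps: lowercase, strip URLs and punctuation, collapse spaces."""
    text = text.lower()
    text = re.sub(r"https?://\S+", " ", text)   # remove URLs
    text = re.sub(r"[^a-z\s]", " ", text)       # keep letters and whitespace only
    text = re.sub(r"\s+", " ", text).strip()    # collapse runs of whitespace
    return text

print(clean_text("Check THIS out!! https://example.com :)"))  # → check this out
```

Which steps you apply depends on the downstream model; stemming or stopword removal, for example, may help a bag-of-words classifier but hurt a sequence model.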


Photo by Sigmund on Unsplash

While working with text data, there comes a time when we need to correct the spellings of the words in our corpus.

Datasets like customer reviews (movie reviews, hotel reviews, Amazon reviews, etc.) and conversation logs contain many typographical errors that need fixing for better analysis.

Let’s work on the Spelling Corrector

We are going to use a Python library called TextBlob for our spell corrector. Using the TextBlob library, we can build models for the task of spelling correction. Detecting real-word spelling errors (where the misspelling is itself a valid word) is a much more difficult task, as any word in the input text can be an error.
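TextBlob's corrector is based on Peter Norvig's statistical approach: generate candidate words within one edit of the misspelling and pick the most frequent known one. Before reaching for the library, here is a minimal pure-Python sketch of that idea (the tiny `WORDS` vocabulary is ours, standing in for real corpus frequencies):

```python
from collections import Counter

# Tiny illustrative vocabulary; counts stand in for corpus word frequencies.
WORDS = Counter({"spelling": 10, "speaking": 3, "the": 50, "text": 20})

def edits1(word):
    """All strings one edit (delete, transpose, replace, insert) away from `word`."""
    letters = "abcdefghijklmnopqrstuvwxyz"
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [L + R[1:] for L, R in splits if R]
    transposes = [L + R[1] + R[0] + R[2:] for L, R in splits if len(R) > 1]
    replaces = [L + c + R[1:] for L, R in splits if R for c in letters]
    inserts = [L + c + R for L, R in splits for c in letters]
    return set(deletes + transposes + replaces + inserts)

def correct(word):
    """Return the most frequent known candidate; fall back to the word itself."""
    if word in WORDS:
        return word
    candidates = [w for w in edits1(word) if w in WORDS] or [word]
    return max(candidates, key=lambda w: WORDS[w])

print(correct("speling"))  # "spelling" is one insert away
```

In practice you would simply call `TextBlob(text).correct()` from the library, which applies the same idea with a full English frequency model.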

But it’s…


Photo by NeONBRAND on Unsplash

Before we start building a chatbot, let’s learn some basics about chatbots.

What is a Chatbot?

A computer program that anybody can talk to in ordinary language.

No matter what type of chatbot it is, they all share a similar purpose: to take ordinary human language as input, understand what is being said, and provide a relevant, correct answer based on the knowledge they have.

Why Use Chatbots?

Chatbots excel at completing repetitive tasks and work around the clock. They can work alone or alongside humans, and are effective at completing 60–90% of an average human team’s workload, depending on the use case.

Types of Chatbots

Rule-based bots

With this type…
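A rule-based bot maps user input to responses via hand-written patterns: the first rule whose pattern matches wins, with a fallback reply when nothing matches. A minimal sketch (the rules and responses here are purely illustrative):

```python
import re

# Hand-written (pattern, response) rules -- purely illustrative.
RULES = [
    (r"\b(hi|hello|hey)\b", "Hello! How can I help you?"),
    (r"\bhours?\b", "We are open 9am-5pm, Monday to Friday."),
    (r"\b(bye|goodbye)\b", "Goodbye!"),
]

def reply(message):
    """Return the response of the first rule whose pattern matches the message."""
    for pattern, response in RULES:
        if re.search(pattern, message.lower()):
            return response
    return "Sorry, I don't understand. Could you rephrase?"

print(reply("Hey there"))             # greeting rule fires
print(reply("What are your hours?"))  # hours rule fires
```

This is why rule-based bots are predictable but brittle: any phrasing the rule author didn't anticipate falls through to the fallback.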


Photo by Compare Fibre on Unsplash

We have IMDB data containing movie reviews labelled with their sentiment, positive or negative. We’ll use machine learning to create a binary classifier for the reviews.

We’ll start by downloading the IMDB dataset:

https://www.dropbox.com/s/mdvgzifpfdd05iv/IMDB-Dataset.csv?dl=0

Importing Libraries

Let’s start by importing the libraries:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns
import re
from nltk.stem import PorterStemmer
import spacy
nlp = spacy.load("en_core_web_sm")
from spacy.lang.en.stop_words import STOP_WORDS as stopwords
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, classification_report
from sklearn.metrics import confusion_matrix
from sklearn.ensemble import RandomForestClassifier
from …
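The vectorizers and classifier imported above are the core of the pipeline. As a minimal, self-contained sketch of how they fit together (the four toy reviews below are ours, standing in for the IMDB data; labels are 1 = positive, 0 = negative):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.ensemble import RandomForestClassifier

# Toy stand-in for the IMDB reviews.
reviews = [
    "a wonderful, moving film with great acting",
    "great story and a wonderful cast",
    "boring plot and terrible acting",
    "a terrible, boring waste of time",
]
labels = [1, 1, 0, 0]

# Turn the raw text into TF-IDF feature vectors.
vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(reviews)

# Fit a random forest on the vectorized reviews.
clf = RandomForestClassifier(n_estimators=50, random_state=42)
clf.fit(X, labels)

# Score an unseen review with the same fitted vectorizer.
unseen = vectorizer.transform(["wonderful acting and a great story"])
print(clf.predict(unseen))
```

On the real dataset you would of course clean the text first and hold out a test split with `train_test_split` before scoring.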

Photo by Caspar Camille Rubin on Unsplash

As Machine Learning aspirants, we need to read CSV/Excel files to perform data analysis, but there comes a time when we have to go deeper into loading data and fetch it from a SQL database or a SQL table.

We’ll perform a basic dataloading from a MySQL database table to a pandas DataFrame.

Requirements

We need the following installed on your machine, along with basic knowledge of working with pandas and MySQL:

  1. A SQL client; we’ll use MySQL for our operations
  2. Python, pandas, and pymysql
  3. SQLAlchemy; if not installed, use: pip install sqlalchemy

Let’s Start!

We’ll start with installing packages…
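The core of the task is `pandas.read_sql` plus a SQLAlchemy engine. A minimal runnable sketch follows; it uses an in-memory SQLite database so it works anywhere, but for MySQL you would only swap the engine URL for something like `"mysql+pymysql://user:password@host:3306/dbname"` (those credentials are placeholders for your own):

```python
import pandas as pd
from sqlalchemy import create_engine

# In-memory SQLite stands in for the MySQL server so the sketch is self-contained;
# with MySQL the URL would be "mysql+pymysql://user:password@host:3306/dbname".
engine = create_engine("sqlite:///:memory:")

# Create a small table so there is something to read back.
pd.DataFrame({"id": [1, 2], "review": ["great", "terrible"]}).to_sql(
    "reviews", con=engine, index=False
)

# The key step: pull a SQL table (or any query) straight into a DataFrame.
df = pd.read_sql("SELECT * FROM reviews", con=engine)
print(df)
```

`read_sql` accepts either a table name or a full SQL query, so you can push filtering and joins down to the database before the data ever reaches pandas.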

Damanpreets
