In data analysis, data wrangling (putting the available dataset into the correct shape and format) is as necessary as the analysis itself.

Data rarely arrives in the shape we want, so knowing how to reshape it is essential. Although many software tools can wrangle data, in this article I will show you how to use R and Python to do the job.

The main operations that will be discussed are:

  1. Data Frame Subsetting
  2. Applying functions over rows and columns
  3. Grouping and…


The Mahabharata is an ancient Indian epic that narrates the story of a kingdom known as Hastinapur. The story focuses on two groups of cousins (the Kauravas and the Pandavas) who ultimately went to war against each other. It beautifully describes the development of each character and the dilemmas of right and wrong around them. The story was adapted into a television series by the famous Indian producer and director B. R. Chopra, and its episodes are still available on YouTube.

This article revolves around a very popular episode of the same series where…

Not only model training but everything in life demands a sequence. Take the example of lighting a gas stove:

1) You turn on the gas pipeline valve to the stove.

2) You switch on the knob of the stove.

3) You pass a spark to light it.

Can you rearrange the above three steps and still light the stove? The answer is no. Try any combination, say passing a spark first, then turning the stove knob, and then opening the gas pipeline valve. Yes, you guessed it right: there would be no flame. Sequences are often defined by…

Pandas is an extensive Python library used by data scientists in a wide range of fields. If you have worked with Microsoft Excel or MySQL, you are already familiar with relational data structures and comfortable with data arranged in rows and columns. I came to know about the library while learning machine learning in Python. Pandas not only offers tools to carry out extensive data analysis and statistical inference but, because it is built upon NumPy (another important Python library), it also proves to be a platform to build highly complex machine learning and artificial neural network…

DBSCAN stands for Density-Based Spatial Clustering of Applications with Noise. It is an unsupervised machine learning algorithm that forms clusters based on the density of the data points, that is, on how close the points are to each other. Points that lie outside the dense regions are excluded and treated as noise or outliers. This characteristic makes DBSCAN a good fit for outlier detection and for finding clusters of arbitrary shape. Algorithms like K-Means clustering lack this property: they tend to form roughly spherical clusters and are very sensitive to outliers. …
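The density idea described above can be sketched in a few lines of plain Python. This is a minimal illustration, not a production implementation: the function name `dbscan`, the parameter names `eps` and `min_pts`, and the sample points are all assumptions made for this example.

```python
from math import dist

def dbscan(points, eps, min_pts):
    """Return one label per point: a cluster id (0, 1, ...) or -1 for noise."""
    NOISE, UNVISITED = -1, None
    labels = [UNVISITED] * len(points)

    def neighbors(i):
        # Indices of all points within eps of point i (including i itself).
        return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE            # may later be relabelled as a border point
            continue
        cluster += 1                     # point i is a core point: start a new cluster
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster      # noise reachable from a core point is a border point
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster
            nbrs = neighbors(j)
            if len(nbrs) >= min_pts:     # j is itself a core point: keep expanding
                queue.extend(nbrs)
    return labels

# Hypothetical data: two dense groups and one isolated point.
points = [(1.0, 1.0), (1.2, 0.9), (0.9, 1.1), (1.1, 1.2),
          (5.0, 5.0), (5.1, 4.9), (4.9, 5.2), (5.2, 5.1),
          (9.0, 0.5)]
labels = dbscan(points, eps=0.5, min_pts=3)
```

With these made-up points, the two dense groups become two clusters and the isolated point is labelled `-1` (noise).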

This article will try to:

  • Discuss the idea behind logistic regression
  • Explain it further through an example

What you are already supposed to know

You are given the problem of predicting the gender of a person based on their height. To start with, you are provided with the data of 10 people whose height and gender are known. You are asked to fit a mathematical model to this data in a way that will enable you to predict the gender of some other person whose height value is known but we have no information…
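A problem of this kind can be sketched as a one-feature logistic regression trained by gradient descent. The ten heights and labels below are invented for illustration (they are not the article's data), and the learning rate, iteration count, and the helper name `prob_male` are likewise assumptions.

```python
from math import exp
from statistics import mean, pstdev

# Hypothetical data: heights in cm and a 0/1 label (1 = male).
heights = [150, 152, 155, 158, 160, 170, 172, 175, 178, 180]
is_male = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]

# Standardise the feature so gradient descent behaves well.
mu, sigma = mean(heights), pstdev(heights)
xs = [(h - mu) / sigma for h in heights]

def sigmoid(z):
    return 1.0 / (1.0 + exp(-z))

w, b = 0.0, 0.0          # slope and intercept of the linear part
lr = 0.1                 # learning rate (assumed value)
for _ in range(2000):
    preds = [sigmoid(w * x + b) for x in xs]
    # Gradients of the average log-loss with respect to w and b.
    grad_w = sum((p - y) * x for p, y, x in zip(preds, is_male, xs)) / len(xs)
    grad_b = sum(p - y for p, y in zip(preds, is_male)) / len(xs)
    w -= lr * grad_w
    b -= lr * grad_b

def prob_male(height_cm):
    """Estimated probability that a person of this height is male."""
    return sigmoid(w * (height_cm - mu) / sigma + b)
```

Calling `prob_male(185)` gives a probability above 0.5 and `prob_male(148)` one below 0.5, so thresholding at 0.5 turns the fitted curve into a classifier.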

Regression is most probably the first machine learning algorithm that one learns. It is basic, simple, and at the same time a very useful tool that solves many machine learning problems. This article is about Ridge Regression, a modification of Linear Regression intended to make it more suitable for feature selection. The whole story is divided into four equally important parts, as mentioned below:

  1. Linear Regression: The basic idea of Ordinary Least Squares in linear regression is explained.
  2. Feature Selection: What feature selection in machine learning is and why it is important is illustrated.
  3. Parameter calculation: What parameters are calculated…
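As a rough illustration of how a ridge penalty changes the calculated parameter, consider the simplest possible case: one feature and no intercept, where minimising sum((y - beta*x)^2) + lam*beta^2 gives beta = sum(x*y) / (sum(x^2) + lam). The function name `ridge_slope`, the symbol `lam`, and the data are assumptions made for this sketch, not taken from the article.

```python
def ridge_slope(xs, ys, lam):
    """Slope of a no-intercept, one-feature ridge fit.

    Minimises sum((y - beta * x) ** 2) + lam * beta ** 2.
    With lam = 0 this reduces to ordinary least squares.
    """
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + lam)

xs = [1.0, 2.0, 3.0, 4.0]   # hypothetical feature values
ys = [2.0, 4.0, 6.0, 8.0]   # hypothetical targets (exactly y = 2x)

ols = ridge_slope(xs, ys, lam=0.0)      # ordinary least squares slope
ridge = ridge_slope(xs, ys, lam=10.0)   # penalised slope, pulled toward zero
```

Here the unpenalised slope is 2.0 and the penalised one is 1.5: the larger the penalty, the more the coefficient shrinks toward zero.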

This article will try to:

  • Explain graphically the basics of K-Means Clustering, an unsupervised machine learning algorithm
  • Discuss the method of finding the optimum value of K and the centroid locations of the clusters

What you are already supposed to know:

  • Basic mathematics and Euclidean geometry

In machine learning, one frequently encountered problem is grouping similar data together. Suppose you know the income levels of people and want to group people with similar income levels together. You want to know who the people with low income are, who the people with high or very high income are, which…
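The income-grouping idea can be sketched as a one-dimensional K-Means loop in plain Python. The income values, the choice of K = 3, and the starting centroids are all invented for this example. The function name `kmeans_1d` is hypothetical as well.

```python
def kmeans_1d(values, centroids, max_iter=100):
    """Cluster numbers around the given starting centroids (Lloyd's algorithm)."""
    centroids = list(centroids)
    for _ in range(max_iter):
        # Assignment step: index of the nearest centroid for each value.
        labels = [min(range(len(centroids)), key=lambda c: abs(v - centroids[c]))
                  for v in values]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for c in range(len(centroids)):
            members = [v for v, lab in zip(values, labels) if lab == c]
            new_centroids.append(sum(members) / len(members) if members else centroids[c])
        if new_centroids == centroids:   # nothing moved: converged
            break
        centroids = new_centroids
    return centroids, labels

# Hypothetical incomes (in thousands) with three visible groups.
incomes = [12, 15, 18, 20, 52, 55, 60, 58, 150, 160, 155]
start = [incomes[0], incomes[4], incomes[8]]   # one starting centroid per group (assumed)
centroids, labels = kmeans_1d(incomes, start)
```

On this made-up data the loop converges after two passes, with centroids at 16.25, 56.25, and 155.0 for the low, middle, and high income groups. The result of K-Means depends on the starting centroids, so real uses typically try several initialisations.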

This article will discuss:

  • Decision Trees — A famous classification algorithm in supervised machine learning
  • The basics behind decision trees and how to develop one without a computer

What you are already supposed to know:

  • Basics of statistics and probability

When it comes to classification problems in machine learning, algorithms like Logistic Regression and Discriminant Analysis are the ones that come to mind. There is another type of classification algorithm, called the Decision Tree, which is highly intuitive and easy to interpret and understand. Tree-based models can be used for both regression and classification but we will discuss…

Note: The currency values mentioned in the article are in Indian Rupees

The widespread use of the internet and its modern-day advancement have caught the eye of every entrepreneur. The ways of doing business online are numerous. From selling goods online like Flipkart and Amazon, to selling entertainment online like Netflix, to online advertising companies like TVF, almost every sector of business has now shifted to the cyber world.

While an online business has its own perks, starting one is no less of a challenge. …

