When it comes to data analysis, the knowledge of data wrangling or putting the available dataset in correct shape and format is as necessary as the actual operation of data analytics.
Normally, data doesn’t come in a shape that we desire, and so knowing how to do that becomes essential. Although we have a lot of software tools out there to wrangle the data, in this article I will show you how to use R and Python to do the job.
The main operations that will be discussed are:
Mahabharata is an ancient Indian epic that narrates a story of an establishment known as Hastinapur. The focus of the story is the two groups of cousins (Kauravas and Pandavas)who ultimately went for war against each other. The story beautifully describes the development of each character and the dilemmas of right and wrong around them. The story has been given the shape of a Television series by a very famous Indian producer and director called B. R. Chopra, the episodes of which are still available on YouTube.
This article revolves around a very popular episode of the same series where the cousins play the game of Chausar which later on become the reason for the great war of Kurukshetra. This article is based on a TV episode and may not be compared with the literature in books. …
Not only Model Training, everything in life demands a sequence. Take an example of lighting a gas stove:
1) You turn on the gas pipeline valve to the stove.
2) You switch on the knob of the stove.
3) You pass a spark to light it.
Can you rearrange the above three steps and still light up the stove? The answer is no, try any combination, say passing a spark first and then turning on the stove knob & then lining up the gas pipeline. Yes, you guessed it right, there would be no flame. Sequences are often defined by engineers to operate machines and they call them SOPs (Standard Operating Procedures). A simple SOP for operating a pump would be to first lining out the suction and discharge and then turning it on. The SOPs are defined to make the process smooth, efficient, and error-free. …
Pandas is an extensive Python library used by Data Scientists in a varied number of fields. If you have worked in Microsoft Excel or MySQL, you are already familiar with relational data structures. You must be comfortable with the data arranged in rows and columns. I came to know about the library while learning machine learning in Python. Pandas not only offer you tools to carry out extensive data analysis and statistical inferences but because it is built upon Numpy (another important Python library) it proves to be a platform to build highly complex machine learning and artificial neural network algorithms. The three main data structures provided by Pandas are Data Series (One dimensional), Data Frames (Two dimensional), and Data Panels (multi-dimensional). Pandas is equipped with an enormous number of functions and classes that give you enough power to deal with any data analysis problem. We will in this article discuss the three main functions for data frames that not only find application in almost every problem/project but if understood can prove pivotal. …
This article is also published here
DBSCAN stands for Density-Based Spatial Clustering Application with Noise. It is an unsupervised machine learning algorithm that makes clusters based upon the density of the data points or how close the data is. That said, the points which are outside the dense regions are excluded and treated as noise or outliers. This characteristic of the DBSCAN algorithm makes it a perfect fit for outlier detection and making clusters of arbitrary shape. The algorithms like K-Means Clustering lack this property and make spherical clusters only and are very sensitive to outliers. …
This article will try to:
What you are already supposed to know
You are given a problem of predicting the gender of a person based on his/her height. To start with you are provided the data of 10 people, whose height and gender are known. You are asked to fit a mathematical model in this data in a way that will enable you to predict the gender of some other person whose height value is known but we have no information about his/her gender. This type of problem comes under the classification domain of supervised machine learning. If the problem demands from you to make classifications into various categories like True, False or rich, middle class, poor or Failure, Success, etc then you are dealing with the classification problems. Its counterpart in machine learning is a regression problem where the problem demands from us to predict a continuous value like marks = 33.4%, weight = 60 Kg, etc. …
Regression is most probably the first machine learning algorithm that one learns. It is basic, simple and simultaneously a very useful tool that solves a lot of machine learning problems. This article is about Ridge Regression, a modification over the Linear Regression to make it more suitable for feature selection. The whole story is divided into four equally important parts as mentioned below:
This article will try to:
What you are already supposed to know:
· Basic mathematics and Euclidian geometry
In machine learning, one of the frequently encountered problems is grouping similar data together. You know the income level of people and now you want to group people with similar income levels together. You want to know who are the people with low income power, the people with high or very high income power, which you think can be helpful in devising a perfect marketing strategy. …
This article will discuss:
What you are already supposed to know:
When it comes to classification problems in machine learning, algorithms like Logistic Regression, Discriminant Analysis, etc. are the ones that come into one’s mind. There is another type of classification algorithm which is highly intuitive, easy to interpret and understand called the Decision Tree algorithm. The tree-based models can be used for both regression and classification but we will discuss only the case of classification here. …
Note: The currency values mentioned in the article are in Indian Rupees
The widespread use of the internet and modern-day advancement in the same has caught the eye of every entrepreneur. The ways of doing online business are enormous. From selling goods online as Flipkart and Amazon, selling entertainment online like Netflix to online advertisement companies like TVF, almost every sector of business has shifted to the cyber world now.
While an online business has its own perks, it is no less of a challenge to start one. …