What is Machine Learning?

 


Introduction

Machine Learning is one of the major disciplines within the field of AI. If you want to learn about AI, this post will be helpful.
"Machine learning (ML) is the study of computer algorithms that improve automatically through experience."

This is how Wikipedia defines Machine Learning. In other words, Machine Learning is the discipline in which we let the computer learn from data rather than hard-coding the solving algorithm or logic. We let the computer figure out how to solve a particular problem by exposing it to more and more data relevant to that problem.

Put another way, machine-learning algorithms use statistics to find patterns in large amounts of data. Data, here, covers a lot of things: numbers, words, images, behaviors, action items, and more. If the data can be stored digitally, it can be fed into a machine-learning algorithm.

The human's main role in Machine Learning is to choose the following correctly:

  • A rich data set (in both quality and quantity),
  • A learning/training algorithm,
  • The learning parameters of that algorithm, which need fine-tuning, and
  • Features that suit the problem.

The Difference from Traditional Programming

If you are new to ML, the following comparison will give you a basic intuition of how ML actually works.

In conventional programming, we write the algorithm/function/logic that solves a particular problem. Whenever we feed the program relevant inputs, it simply applies that logic and gives us the answer. The quality of the program depends on the quality of the algorithm/function/logic developed by the programmer.

Figure 1: Traditional Programming Process
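
As a rough illustration of the traditional approach, here is a minimal Python sketch in which the programmer writes the rule by hand. The fare formula and both rate constants are made up for this example and do not come from any real taxi service.

```python
# Traditional programming: the programmer decides the rule in advance.
# Both constants are hypothetical values chosen only for illustration.
BASE_FARE = 2.50      # flat fee charged on every trip
RATE_PER_KM = 1.20    # price per kilometre


def taxi_fare(distance_km: float) -> float:
    """Return the fare using a fixed, hand-coded formula."""
    return BASE_FARE + RATE_PER_KM * distance_km


# The program only applies the rule it was given; it never learns from data.
print(taxi_fare(10.0))
```

If the real pricing changes, someone has to edit the constants or the formula by hand.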


Unlike traditional programming, Machine Learning needs many input-output data points generated by the system to which the algorithm/function/logic is going to be applied. From these data, the learning algorithm automatically formulates the logic that maps system inputs to outputs. So the quality of the solution depends on the learning algorithm as well as the richness of the data set (ideally, the data set covers all possible input-output scenarios). Compared to traditional programming, the ML process has two stages:
  1. Learning Phase
  2. Prediction/ Inference Phase

Figure 2: ML process (Learning Phase, then Prediction/Inference Phase)


Good Features for Learning

A feature is an individual measurable property or characteristic of data. Choosing informative, discriminating, and independent features is a crucial step for effective algorithms in Machine Learning. Features are usually numeric. See the following example.

When we try to predict an Uber (a taxi service) fare, we know that the features that determine the fare are the distance between the pickup and drop-off locations, the time of day (because different times of day have different fares), and the day of the week (is it a holiday or not?). Uber also considers the traffic congestion on the route. We can add weather-related features as well. Is it snowing? Is it raining? When it is snowing, we would expect taxi fares to rise. So if you are trying to predict taxi fares using Machine Learning techniques, these are the features you should use.
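
To make "features are usually numeric" concrete, here is a small Python sketch that turns one trip into a list of numbers. The `Trip` class, its field names, and the encoding choices are assumptions made for illustration, not Uber's actual pricing inputs.

```python
from dataclasses import dataclass


@dataclass
class Trip:
    # Hypothetical record layout for one taxi trip.
    distance_km: float
    hour_of_day: int      # 0-23
    day_of_week: int      # 0 = Monday ... 6 = Sunday
    is_holiday: bool
    is_snowing: bool
    is_raining: bool


def to_features(trip: Trip) -> list[float]:
    """Encode a trip as a flat list of numbers a learning algorithm can use."""
    # One-hot encode the day so the model does not read Sunday (6)
    # as "six times larger" than Tuesday (1).
    day_one_hot = [1.0 if trip.day_of_week == d else 0.0 for d in range(7)]
    return [
        trip.distance_km,
        float(trip.hour_of_day),
        float(trip.is_holiday),
        float(trip.is_snowing),
        float(trip.is_raining),
    ] + day_one_hot


trip = Trip(distance_km=8.4, hour_of_day=18, day_of_week=4,
            is_holiday=False, is_snowing=True, is_raining=False)
features = to_features(trip)
print(features)
```

Each position in the list is one feature; a training set is simply many such lists paired with the fares that were actually charged.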

If your Machine Learning problem is a big-data or data-science task, choosing features will usually look much like the taxi fare example: the features are measurable attributes of each record.

Image- and video-related Machine Learning tasks have their own special kinds of features. For example, you can use the images/frames directly (pixel values) as features, or you can use edges, blobs, HOG features, SIFT features, SURF features, optical flow features, and segmentation data.
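
As a toy illustration of an edge feature, the Python sketch below marks places where brightness changes sharply between neighbouring pixels. It is a deliberately simplified stand-in (a plain horizontal difference), not one of the named detectors such as HOG or SIFT, and the tiny image is invented for the example.

```python
def horizontal_edges(image):
    """Absolute difference between each pixel and its right-hand neighbour."""
    return [
        [abs(row[x + 1] - row[x]) for x in range(len(row) - 1)]
        for row in image
    ]


# A tiny 3x4 grayscale image: dark on the left, bright on the right.
image = [
    [0, 0, 255, 255],
    [0, 0, 255, 255],
    [0, 0, 255, 255],
]

edges = horizontal_edges(image)
print(edges)  # large values mark the boundary between dark and bright
```

The output has one column fewer than the input because each value compares a pair of neighbouring pixels.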

In Natural Language Processing you can use meta-features such as word counts, stop-word counts, punctuation counts, character counts, the language of the text, and many more. The other big group is text-based features, which come from steps such as tokenization, vectorization, stemming, part-of-speech tagging, and named entity extraction.
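
The meta-features mentioned above are easy to compute by hand. Here is a minimal Python sketch using only the standard library; the stop-word list is a tiny made-up sample, and splitting on whitespace is a simplifying assumption rather than a proper tokenizer.

```python
import string

# Tiny illustrative stop-word list; real lists are much longer.
STOP_WORDS = {"the", "a", "an", "is", "of", "and", "to", "in"}


def meta_features(text: str) -> dict:
    """Compute a few simple count-based features for a piece of text."""
    words = text.split()
    # Strip surrounding punctuation before comparing against stop words.
    cleaned = [w.strip(string.punctuation).lower() for w in words]
    return {
        "word_count": len(words),
        "stop_word_count": sum(1 for w in cleaned if w in STOP_WORDS),
        "punctuation_count": sum(1 for ch in text if ch in string.punctuation),
        "char_count": len(text),
    }


sentence = "The cat sat in the hat, and the dog barked!"
text_features = meta_features(sentence)
print(text_features)
```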

Figure 3: Optical Flow Features

Figure 4: Edge Features


A Simple Intuition

Assume that we want to find out the driving logic of a system that actually behaves like y = mx + c (which we do not know beforehand) and we are going to apply an ML technique to find out the underlying logic.

Step 1:
We collect many input-output data points from the system by giving almost all the possible inputs to the system.
Figure 5: A sample data-set


Step 2:
We run the training on these collected data points and let the ML algorithm learn from the data to identify the underlying logic. In this example, the learning algorithm will identify the m and c values of the y = mx + c relationship. This is called the learning phase.


Step 3:
Now that we have the driving function, we can get the result for any arbitrary input to the system. This is called the inference phase.
Figure 6: Learning Phase and Regression Phase
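
The three steps can be sketched in a few lines of Python. This sketch assumes ordinary least squares as the learning algorithm (the post does not name one), and the hidden system y = 3x + 2 is a made-up choice so that we can check the answer.

```python
def fit_line(xs, ys):
    """Learning phase: estimate m and c by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    m = sxy / sxx
    c = mean_y - m * mean_x
    return m, c


def predict(m, c, x):
    """Inference phase: apply the learned relationship to a new input."""
    return m * x + c


# Step 1: collect input-output pairs from the system.
# Here the "unknown" system is y = 3x + 2, chosen only for illustration.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [3.0 * x + 2.0 for x in xs]

# Step 2: learning phase.
m, c = fit_line(xs, ys)

# Step 3: inference phase on an input that was not in the data.
prediction = predict(m, c, 10.0)
print(m, c, prediction)
```

With noise-free data the fit recovers the slope and intercept of the hidden line; with real, noisy data it returns the best-fitting line instead.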


In traditional programming, we would need to find the m and c values of the relationship by some other method. So, in every sense of the word, Machine Learning is letting the computer do the learning part on our behalf.

I hope this example gives you a rough understanding of ML, but do not think that ML is as simple as it seems here, where we just give a bunch of input-output pairs to the computer and it does the rest. If it were that simple, why would companies invest so much in ML research and development? ML is much more complex than this simple example suggests.

Types of Machine Learning Problems


There are two main types of ML problems:
  1. Classification
  2. Regression

Classification

A classification problem is a problem where we are using data to predict which category something falls into. An example of a classification problem could be analyzing an image to determine if it contains a car or a person, or analyzing medical data to determine if a certain person is in a high-risk group for a certain disease or not. In other words, we are trying to use data to make a prediction about a discrete set of values or categories.

Figure 7: Classification

Regression 

Regression problems, on the other hand, are problems where we try to make a prediction on a continuous scale. Examples could be predicting the stock price of a company or predicting tomorrow's temperature based on historical data. The simple example I discussed under the previous heading is a regression example.
Figure 8: Regression

Some ML problems are pure classification or pure regression problems, but others have a classification part as well as a regression part.

Learning Methods

There are three main learning methods in ML:
  1. Supervised Learning 
  2. Unsupervised Learning 
  3. Reinforcement Learning 

Supervised Learning

This type of algorithm works with a target/outcome variable (or dependent variable) that is to be predicted from a given set of predictors (independent variables). Using this set of variables, we generate a function that maps inputs to desired outputs. The training process continues until the model achieves the desired level of accuracy on the training data.

Examples of Supervised Learning Algorithms: Regression, Decision Tree, Random Forest, KNN, Logistic Regression, etc.
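
To show what learning from labelled examples looks like, here is a minimal Python sketch of KNN (k-nearest neighbours), one of the algorithms listed above. The data points and the "low"/"high" labels are invented for the example, and the function name `knn_predict` is just a label chosen here.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Predict a label by majority vote among the k closest training points."""
    # Sort the labelled examples by Euclidean distance to the query point.
    by_distance = sorted(
        zip(train_points, train_labels),
        key=lambda pair: math.dist(pair[0], query),
    )
    nearest_labels = [label for _, label in by_distance[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]


# Labelled training data: each point (predictors) comes with a known outcome.
points = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.0),
          (8.0, 8.0), (8.5, 9.0), (9.0, 8.0)]
labels = ["low", "low", "low", "high", "high", "high"]

print(knn_predict(points, labels, (2.0, 2.0)))
print(knn_predict(points, labels, (7.5, 8.5)))
```

KNN has no separate training loop: the stored labelled examples are the model, and all the work happens at prediction time.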

Unsupervised Learning

In this type of algorithm, we do not have any target or outcome variable to predict/estimate. It is used for clustering a population into different groups, and is widely used for segmenting customers into groups for specific interventions.

Examples of Unsupervised Learning: K-means Clustering
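
Here is a minimal Python sketch of K-means on one-dimensional data, in the spirit of the customer segmentation example. The spend values, the starting centroids, and the fixed iteration count are all assumptions made to keep the example short.

```python
def k_means_1d(values, centroids, iterations=10):
    """Group numbers around centroids; no labels are ever provided."""
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        # Assignment step: put each value in the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for v in values:
            nearest = min(range(len(centroids)),
                          key=lambda i: abs(v - centroids[i]))
            clusters[nearest].append(v)
        # Update step: move each centroid to the mean of its cluster.
        # An empty cluster keeps its previous centroid.
        centroids = [
            sum(c) / len(c) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return centroids, clusters


# Hypothetical monthly spend per customer.
spend = [10.0, 12.0, 11.0, 50.0, 52.0, 49.0]
centroids, clusters = k_means_1d(spend, centroids=[spend[0], spend[-1]])
print(centroids)
print(clusters)
```

The algorithm finds the low-spend and high-spend groups on its own; deciding what each group means is still left to a person.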

Reinforcement Learning

Using this type of algorithm, the machine is trained to make specific decisions. It works this way: the machine is exposed to an environment where it trains itself continually using trial and error. The machine learns from past experience and tries to capture the best possible knowledge to make accurate business decisions.

Google DeepMind is one of the best-known groups working toward general-purpose AI, and Reinforcement Learning is central to systems such as its AlphaGo. Reinforcement Learning is likely to remain a major learning technique in the effort to build more capable AI. You can learn more about AI from the previous post about AI.

Examples of Reinforcement Learning: Monte Carlo, Q-learning, DDPG
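
To give a feel for trial-and-error learning, here is a small Python sketch of tabular Q-learning. The environment (a five-cell corridor with a reward at the right end) and all parameter values are made up for illustration.

```python
import random

N_STATES = 5          # cells 0..4; reaching cell 4 ends the episode
ACTIONS = [-1, +1]    # move left or move right


def step(state, action):
    """Hypothetical environment: walk along the corridor, reward at the end."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    done = next_state == N_STATES - 1
    reward = 1.0 if done else 0.0
    return next_state, reward, done


def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Learn a table of action values by trial and error."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            if rng.random() < epsilon:
                a = rng.randrange(2)                      # explore
            else:
                best = max(q[state])                      # exploit, random tie-break
                a = rng.choice([i for i in range(2) if q[state][i] == best])
            next_state, reward, done = step(state, ACTIONS[a])
            # Q-learning update: move the estimate toward
            # reward + discounted best value of the next state.
            target = reward + (0.0 if done else gamma * max(q[next_state]))
            q[state][a] += alpha * (target - q[state][a])
            state = next_state
    return q


q_table = train()
# Greedy policy for the non-terminal cells: the best action in each one.
policy = [ACTIONS[max(range(2), key=lambda i: row[i])] for row in q_table[:-1]]
print(policy)
```

Nobody tells the agent that moving right is correct; it discovers this from the rewards it collects.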



Machine Learning Algorithms

There are many ML algorithms developed over time. Some popular algorithms are,
