Digital Image Processing

Digital Images

The definition from Wikipedia is,

"A digital image is an image composed of picture elements, also known as pixels, each with finite, discrete quantities of numeric representation for its intensity or gray level that is an output from its two-dimensional functions fed as input by its spatial coordinates denoted with x, y on the x-axis and y-axis, respectively"
In simple terms, a digital image is an image made up of a large number of square-shaped pieces called pixels, whose values represent the color or intensity at a particular location in the image. Simply put, a digital image is like a jigsaw puzzle whose pieces are square-shaped.

Figure 1: A zoomed digital image
Figure 2: A jigsaw puzzle




Digital images are created/captured using digital computers or digital image sensors (digital cameras). Images with file extensions such as .jpg, .gif, .png, .tif, .bmp, .psd, and .eps are digital images.

Digital Image Processing (DIP)

Digital image processing deals with the manipulation of digital images through a digital computer. It is a sub-field of signals and systems but focuses particularly on images. DIP focuses on developing a computer system that is able to perform processing on an image. The input of that system is a digital image; the system processes that image using efficient algorithms and gives an image as output.

The most common example is Adobe Photoshop, one of the most widely used applications for processing digital images. You can learn how the magic inside Adobe Photoshop works, and many more things, in the field of DIP.

Figure 3: DIP System block diagram
source: https://www.tutorialspoint.com/dip/index.htm

Image Coordinate System


For indexing each pixel in a digital image, a coordinate system is defined. The origin of the image coordinate system is at the top-left corner of the image. The positive x direction runs along the width, and the positive y direction runs down the height.

Figure 4: Image Coordinate System
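To make the coordinate convention concrete, here is a minimal sketch assuming OpenCV and NumPy are installed ("example.jpg" is just a placeholder file name). Note that NumPy-backed libraries index an image as image[row, column], i.e., image[y, x].

# Pixel indexing sketch: rows correspond to y, columns correspond to x.
import cv2

img = cv2.imread("example.jpg")      # placeholder file name
height, width = img.shape[:2]        # rows = height (y), columns = width (x)

x, y = 10, 20                        # a point 10 pixels right, 20 pixels down
pixel = img[y, x]                    # note the [y, x] (row, column) order
print(height, width, pixel)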


Measure Image Size (Bits per Pixel - bpp)

Bpp, or bits per pixel, denotes the number of bits used to store each pixel. The number of different colors in an image depends on this color depth. By default, most image processing libraries load an image in 8 bpp (uint8) format, which means a pixel value can range from 0 to 255 (i.e., 0 to 2^8 - 1). You can change this to other data formats as needed, such as,
  • binary - 1 bpp
  • int8 - 8 bpp
  • int16 - 16 bpp
  • int32/ long int - 32 bpp
  • float - 32 bpp
  • double - 64 bpp
The size of an image depends on three things.
  • Number of rows (in pixels)
  • Number of columns (in pixels)
  • Number of bits per pixel
The formula for calculating the size of an image is given below.

Size of an image = rows x columns x bpp
As an example, consider a gray-scale image with 1024 rows and 1024 columns. Since it is a gray-scale image, it has 256 different shades of gray (0 - 255), i.e., 8 bits per pixel. Putting these values into the formula, we get

Image size = 1024 x 1024 x 8 bits
           = 8388608 bits
           = 8388608 / 8 = 1048576 bytes
           = 1048576 / 1024 = 1024 KB
           = 1024 / 1024 = 1 MB
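You can verify this calculation with a few lines of plain Python arithmetic:

# Image size = rows x columns x bpp
rows, cols, bpp = 1024, 1024, 8

size_bits  = rows * cols * bpp        # 8,388,608 bits
size_bytes = size_bits // 8           # 1,048,576 bytes
size_mb    = size_bytes / (1024 * 1024)
print(size_bits, size_bytes, size_mb)  # -> 8388608 1048576 1.0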

Color Spaces

Color spaces are different types of color modes, used in image processing and signals and systems for various purposes. Some of the common color spaces are,

Binary/ Black and White Images

This is not strictly a color space, but it is good to know about.

A binary image is one that consists of pixels that can have one of exactly two colors, usually black and white. Binary images are also called bi-level or two-level. This means each pixel can be stored as a single bit (i.e., a 0 or 1). The names black-and-white and B&W are often used for this concept, but it should not be confused with gray-scale images. In Photoshop parlance, a binary image is the same as an image in "Bitmap" mode.

Figure 5: A Binary Image
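As a rough illustration, the following sketch (assuming OpenCV; "example.jpg" is a placeholder file name) produces a binary image by thresholding a gray-scale image:

# Threshold a gray-scale image so every pixel becomes either 0 or 255.
import cv2

gray = cv2.imread("example.jpg", cv2.IMREAD_GRAYSCALE)
_, binary = cv2.threshold(gray, 127, 255, cv2.THRESH_BINARY)
cv2.imwrite("binary.png", binary)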


Gray-scale

A gray-scale picture just needs intensity information - how bright a particular pixel is. The higher the value, the greater the intensity. So for a gray-scale image, all you need is a single byte (normally uint8/ 8 bpp is used) for each pixel. One byte (8 bits) can store a value from 0 to 255, which covers all possible shades of gray.

Figure 6: Gray-scale color range


Figure 7: A Gray-scale Image
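A quick sketch (again assuming OpenCV, with "example.jpg" as a placeholder) shows that a gray-scale image is just a single 2D matrix of uint8 values:

# Inspect the shape, data type, and value range of a gray-scale image.
import cv2

gray = cv2.imread("example.jpg", cv2.IMREAD_GRAYSCALE)
print(gray.shape)               # (rows, columns) - only one channel
print(gray.dtype)               # uint8 -> 8 bits per pixel
print(gray.min(), gray.max())   # values lie between 0 and 255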

RGB

RGB is the most widely used color space. The main purpose of the RGB color model is the sensing, representation, and display of images in electronic systems, such as televisions and computers. The RGB model states that each color image is actually formed of three different images: a red image, a green image, and a blue image. A normal gray-scale image can be defined by only one matrix/color component, but a color image is composed of three different matrices/ color components.

RGB uses additive color mixing because it describes what kind of light needs to be emitted to produce a given color.

One color image matrix = red matrix + green matrix + blue matrix

Figure 8: RGB Additive Color Mixing
source: https://en.wikipedia.org/wiki/Color_space


Figure 9: RGB Components
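As an illustration, here is a minimal sketch (assuming OpenCV; "example.jpg" is a placeholder) that splits a color image into its three component matrices. Keep in mind that OpenCV loads images in B, G, R channel order:

# Split a color image into its three single-channel matrices and recombine them.
import cv2

img = cv2.imread("example.jpg")
b, g, r = cv2.split(img)                 # three single-channel matrices
print(img.shape, r.shape)                # e.g. (rows, cols, 3) and (rows, cols)
merged = cv2.merge((b, g, r))            # recombine the three matrices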


CMY and CMYK

These are the color models/ spaces used in the Printing process. 

The CMY and CMYK color models are subtractive color models (whereas RGB is an additive color model). They use subtractive color mixing, which suits the printing process because it describes what kinds of inks need to be applied so that the light reflected from the substrate/ base medium and through the inks produces a given color.

The name CMY comes from the initials of the three subtractive primary colors: cyan, magenta, and yellow. 

CMYK is used in color printing and is also used to describe the printing process itself. CMYK refers to the four ink plates used in some color printing: cyan, magenta, yellow, and key (black). One starts with a white substrate (canvas, page, etc.), and uses ink to subtract color from white to create an image. CMYK stores ink values for cyan, magenta, yellow and black.

Figure 10: CMY Subtractive Color Mixing
source: https://en.wikipedia.org/wiki/Color_space
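For illustration only, the simple relationship between RGB and CMY (C = 1 - R, M = 1 - G, Y = 1 - B, with channels normalized to [0, 1]) can be sketched as follows; real printing pipelines use more sophisticated, device-dependent conversions:

# Approximate CMY values as the subtractive complement of normalized RGB.
import cv2

img = cv2.imread("example.jpg")                   # placeholder file name (BGR)
rgb = cv2.cvtColor(img, cv2.COLOR_BGR2RGB) / 255.0
cmy = 1.0 - rgb                                   # C = 1 - R, M = 1 - G, Y = 1 - B
print(cmy.min(), cmy.max())                       # values stay in [0, 1]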

Y'UV

Y’UV defines a color space in terms of one luma/ brightness component (Y’) and two chrominance/ color components (U and V). The Y’UV color model is used in legacy composite analog color video standards.

The Y’CbCr color model contains Y’, the luma component, and Cb and Cr, the blue-difference and red-difference chrominance components. Its common applications include JPEG and MPEG compression.

Y’UV is often used as a term for Y’CbCr; however, they are different formats. The main difference between the two is that Y'UV is analog while Y'CbCr is digital.

Figure 12: Y'CbCr Components
source: https://www.tutorialspoint.com/dip/introduction_to_color_spaces.htm
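A minimal sketch of the conversion using OpenCV, which names this conversion COLOR_BGR2YCrCb and orders the channels Y, Cr, Cb ("example.jpg" is a placeholder file name):

# Convert an image to the Y'CbCr space and separate luma from chroma.
import cv2

img = cv2.imread("example.jpg")
ycrcb = cv2.cvtColor(img, cv2.COLOR_BGR2YCrCb)
y, cr, cb = cv2.split(ycrcb)        # luma and the two chroma channels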


Basic Concepts in DIP 

The following concepts are essential to properly learning DIP.

Image Kernels

An image kernel is a small matrix used to apply effects like the ones you might find in Photoshop, such as blurring, sharpening, outlining, or embossing. Kernels are also used in machine learning for 'feature extraction', a technique for determining the most important portions of an image. Some simple kernels and their effects after convolution (which we will discuss shortly) are shown below. After learning what happens in convolution, you can predict the effect of a kernel just by analyzing it.

Figure 13: Image Left Shifting


Figure 14: Image Smoothing/ Blurring

Figure 15: Image Sharpening
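For reference, here is a rough sketch of some common 3 x 3 kernels written as NumPy arrays; their effects will make more sense after the convolution section below:

# Common 3 x 3 kernels expressed as small matrices.
import numpy as np

blur_kernel = np.ones((3, 3), dtype=np.float32) / 9.0       # box blur / smoothing

sharpen_kernel = np.array([[ 0, -1,  0],
                           [-1,  5, -1],
                           [ 0, -1,  0]], dtype=np.float32)  # sharpening

outline_kernel = np.array([[-1, -1, -1],
                           [-1,  8, -1],
                           [-1, -1, -1]], dtype=np.float32)  # outlining / edges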

Image Filters

Simply put, a set of kernels stacked together is called an image filter (a 3D element). A filter could even be a single kernel. The terms kernel and filter are often used as synonyms, but they have two different meanings; this difference becomes much clearer in the study of Convolutional Neural Networks. In this post, you can treat kernel and filter as synonyms, since we are talking about filters with only one kernel.
Figure 16: Image Filter Example

Convolution

Convolution is the mathematical term used to describe the process of applying image kernels/ filters to a digital image. If the image is denoted as f and the image kernel is denoted as w, then the convolution process, or applying w to f, is denoted as w * f. The asterisk (*) denotes the convolution operation.

The mathematical definition of the 2D convolution (which is related to DIP) is as follows. 
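For a kernel w of size (2a + 1) x (2b + 1), the standard textbook definition can be written as

(w * f)(x, y) = \sum_{s=-a}^{a} \sum_{t=-b}^{b} w(s, t) \, f(x - s, y - t)

where the sums run over every element of the kernel.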

A graphical representation of convolution will make it easy to understand. Assume we are applying the kernel, 

 0  -1   0
-1   5  -1
 0  -1   0


to an image. Convolution works by determining the value of a central pixel by adding the weighted values of all its neighbors together. Since the kernel is 3 x 3, we apply the convolution to 3 x 3 image patches as well. The first 3 x 3 image patch is colored purple in Figure 17. We simply
  • Place our kernel matrix on top of the image patch,
  • Multiply the overlapping image pixel values by the relevant kernel element value (element-wise multiplication),
  • Sum them up (105 x 0 + 102 x -1 + 100 x 0 + 103 x -1 + 99 x 5 + 103 x -1 + 101 x 0 + 98 x -1 + 104 x 0 = 89),
  • Assign the result to the middle pixel of the 3 x 3 image patch,
  • Shift the kernel one pixel to the right and apply the same process until we reach the right edge,
  • Then move the kernel one pixel down and continue the process until we cover the entire image.
The process is simply a repetition of shift --> multiply --> sum.
Figure 17: How Convolution is done
source: https://embarc.org/embarc_mli/doc/build/html/MLI_kernels/convolution_2d.html

You will realize that in this process we are unable to fill the border pixels, because a full 3 x 3 image patch cannot be formed around them. Due to this issue, the resultant image will be smaller than the original image. If we keep applying several filters to an image, the final result will be much smaller than the original input image.

One solution is to pad the image by replicating the edge pixels around it before doing the convolution. There are many other padding techniques to handle this issue. The following demonstration will give you a better idea of convolution.
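As a rough demonstration (assuming OpenCV and NumPy; "example.jpg" is a placeholder file name), the sketch below first repeats the multiply-and-sum step on the 3 x 3 patch from the worked example above, and then applies the same kernel to a whole image with replicated border pixels so the output keeps the original size:

import numpy as np
import cv2

# The sharpening kernel used above.
kernel = np.array([[ 0, -1,  0],
                   [-1,  5, -1],
                   [ 0, -1,  0]], dtype=np.float32)

# The 3 x 3 image patch values from the worked example.
patch = np.array([[105, 102, 100],
                  [103,  99, 103],
                  [101,  98, 104]], dtype=np.float32)

# Element-wise multiply and sum -> 89.0, the new value of the center pixel.
print(np.sum(patch * kernel))

# Apply the kernel to a whole image; BORDER_REPLICATE pads by repeating edge pixels.
img = cv2.imread("example.jpg", cv2.IMREAD_GRAYSCALE)
sharpened = cv2.filter2D(img, -1, kernel, borderType=cv2.BORDER_REPLICATE)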

Convolution is the major operation used in DIP to perform filtering and to find image features (edges, corners, blobs, etc.).

Note that the operation referred to as convolution in the image processing and machine learning fields differs slightly from the strict mathematical definition of convolution (the kernel is usually not flipped, which technically makes it a cross-correlation). If you are interested, please check this link.

Video Processing

A video is nothing but a fast-moving sequence of frames/ images. Therefore, all the image processing techniques we apply to an image can be applied to a video (to its frames) as well. Video processing involves some additional details on top of image processing, but if you are familiar with image processing, video processing will also be an easy field for you to study.
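Here is a minimal sketch (assuming OpenCV; "example.mp4" is a placeholder file name) of treating a video as a sequence of frames and applying an image-processing step to each one:

# Read a video frame by frame and process each frame like an ordinary image.
import cv2

cap = cv2.VideoCapture("example.mp4")
while True:
    ok, frame = cap.read()          # each frame is just an image
    if not ok:
        break                       # end of the video
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)   # any image operation works here
cap.release()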

Tools and Libraries for Computer Vision

Computer vision is the field of study that covers both image processing and video processing. There are many open-source as well as closed-source tools and libraries developed for this field. Some popular tools and libraries are,

OpenCV

OpenCV (Open Source Computer Vision Library) is a free, open-source computer vision library that contains many different functions for computer vision and machine learning. It includes many algorithms that can perform a variety of tasks, including face detection and recognition, object identification, monitoring moving objects, tracking camera movements, tracking eye movements, extracting 3D models of objects, creating augmented reality overlays on a scene, recognizing similar images in an image database, etc. OpenCV has interfaces for C++, Python, Java, MATLAB, etc., and it supports various operating systems such as Windows, Android, Mac OS, and Linux.

Figure 19: OpenCV Logo

Figure 20: OpenCV Object Tracking Example
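As a small, illustrative example of one of the tasks mentioned above, the sketch below runs face detection with one of the Haar cascade models that ship with the OpenCV Python package ("people.jpg" is a placeholder file name):

# Detect faces with a bundled Haar cascade and draw a rectangle around each one.
import cv2

face_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

img = cv2.imread("people.jpg")
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
faces = face_cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
for (x, y, w, h) in faces:
    cv2.rectangle(img, (x, y), (x + w, y + h), (0, 255, 0), 2)
cv2.imwrite("faces.png", img)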

Matlab

Matlab (Matrix Laboratory) is a numerical computing environment developed by MathWorks, first released in 1984. It contains the Computer Vision Toolbox, which provides various algorithms and functions for computer vision, including object detection, object tracking, feature detection, feature matching, camera calibration, 3D reconstruction, etc. You can also create and train custom object detectors in Matlab using machine learning algorithms such as YOLO v2, ACF, and Faster R-CNN. These algorithms can be run on multicore processors and GPUs to make them much faster. The Matlab toolbox algorithms support code generation in C and C++. Matlab is not a free tool.

Figure 21: Matlab Logo

Figure 22: Matlab Medical Image Processing Example

Octave

Octave is a free tool that is very similar to Matlab in both behavior and syntax, but Octave does not have the specialized toolboxes that are part of Matlab. It has many of the basic built-in functions and algorithms related to computer vision, just like Matlab.

Figure 23: Octave Logo

Figure 24: Octave Microscope Image Processing Example

TensorFlow

TensorFlow is a free, open-source platform that provides a wide variety of tools, libraries, and resources for Artificial Intelligence and Machine Learning, including computer vision. It was created by the Google Brain team and was initially released on November 9, 2015. You can use TensorFlow to build and train machine learning models related to computer vision, including facial recognition, object identification, etc. Google also released the Pixel Visual Core (PVC) in 2017, an image, vision, and Artificial Intelligence processor for mobile devices, which also supports TensorFlow for machine learning. TensorFlow's primary API is for Python, but it also supports languages such as C, C++, Java, JavaScript, Go, and Swift, though without an API backward-compatibility guarantee. There are also third-party packages for languages like MATLAB, C#, Julia, Scala, R, and Rust.

Figure 25: Tensorflow Logo

Figure 26: Tensorflow Object Detection Example

Pytorch

PyTorch is a similar framework by Facebook (Meta) and, along with TensorFlow, one of the leading deep learning frameworks.
Figure 27: PyTorch Logo

CUDA

CUDA (Compute Unified Device Architecture) is a parallel computing platform and API by NVIDIA. CUDA uses the power of GPUs to deliver incredible performance. The toolkit incorporates the NVIDIA Performance Primitives (NPP) library, which contains a set of image, signal, and video processing functions. Developers can program in various languages such as C, C++, Fortran, MATLAB, and Python while using CUDA. CUDA only supports NVIDIA GPUs.

Figure 28: CUDA Logo

Figure 29: Pose Estimation in NVIDIA Edge Devices with CUDA boost

Keras

Keras is another free, open-source, Python-based deep learning library that combines the best of several libraries. It has gained popularity because it can run on top of TensorFlow, Microsoft Cognitive Toolkit (CNTK), PlaidML, or Theano. Keras is often used for quick experimentation with deep neural networks. Currently, TensorFlow has built-in support for Keras.

Figure 30: Keras Logo


Figure 31: Keras - A Segmentation Network

Future Posts

In future posts, we will discuss the concepts mentioned here with practical implementations in Python.

