Support Vector Regression (SVR)

SVR is a machine learning technique generally used for predicting continuous or numerical values. While standard regression models seek to minimize the differences between predicted and actual values, SVR takes a different approach, fitting a “tube” around the data points: errors inside the tube are tolerated, and the goal is to fit as many data points as possible inside this tube while keeping the model as simple (flat) as possible. SVR is very beneficial when dealing with non-linear relationships in data, since it uses kernel functions to map the input features into a higher-dimensional space where complex patterns can be discovered.

One of SVR’s primary advantages is its ability to handle outliers gracefully. Errors inside the tube are ignored entirely, and points outside it are penalized only linearly, so extreme values have limited influence on the model and the fit is resistant to noise in the data. SVR has applications in a variety of fields, including finance, biology, and environmental research, where precise prediction of continuous variables is critical. Support Vector Regression is a key tool in the data scientist’s toolset for regression problems due to its flexibility, particularly in detecting subtle patterns and handling outliers.
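
Below is a minimal sketch of this epsilon-tube idea using scikit-learn’s SVR; the synthetic sine-wave data and the hyperparameters (C, epsilon, RBF kernel) are illustrative assumptions rather than recommendations.

# A minimal sketch of epsilon-tube regression with scikit-learn's SVR.
# The synthetic sine-wave data and hyperparameters are illustrative only.
import numpy as np
from sklearn.svm import SVR

rng = np.random.RandomState(0)
X = np.sort(5 * rng.rand(80, 1), axis=0)          # 80 samples, 1 feature
y = np.sin(X).ravel() + 0.1 * rng.randn(80)       # noisy non-linear target

# The RBF kernel maps inputs to a higher-dimensional space;
# epsilon sets the width of the tube within which errors are ignored.
model = SVR(kernel="rbf", C=10.0, epsilon=0.1)
model.fit(X, y)

print(model.predict([[2.5]]))                      # predict for a new input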

Support Vector Machines (SVM)

SVM is a strong supervised machine learning method that is used for classification and regression tasks. SVM’s basic idea is to find a hyperplane in a high-dimensional space that best separates data points of distinct classes. The “support vectors” are the data points nearest to the decision boundary or hyperplane. The margin, which is the distance between the support vectors and the decision boundary, is maximized by SVM. A wider margin indicates better generalization to previously unseen data and increased resistance to noise in the training data.

SVM’s capacity to handle non-linear relationships in data using kernel functions is one of its primary strengths. Kernel functions map the input features into a higher-dimensional space, allowing a hyperplane to be found that effectively separates the data in this transformed space. As a result, SVM can capture complicated decision boundaries and achieve high accuracy in a wide range of scenarios. Furthermore, SVM is less prone to overfitting than many other algorithms, since margin maximization encourages a more generalizable model.

While SVMs thrive in many applications, they may struggle with very large datasets or many classes. Training an SVM on a large dataset can be computationally expensive, and the method can overfit when the number of features greatly exceeds the number of samples unless the kernel and regularization are chosen carefully. Despite these limitations, SVM continues to be a popular choice in a variety of disciplines, including image classification, text categorization, and bioinformatics, due to its versatility and effectiveness in dealing with diverse and complicated datasets.
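
As a concrete illustration, here is a minimal sketch of SVM classification with scikit-learn’s SVC; the Iris dataset and the RBF-kernel settings are assumptions chosen only for demonstration.

# A minimal sketch of SVM classification with scikit-learn; the Iris data
# and the RBF-kernel hyperparameters are illustrative assumptions.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)

# C trades off a wide margin against training errors;
# the RBF kernel lets the decision boundary be non-linear.
clf = SVC(kernel="rbf", C=1.0, gamma="scale")
clf.fit(X_train, y_train)

print("Test accuracy:", clf.score(X_test, y_test))
print("Support vectors per class:", clf.n_support_)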

BERT

For the third project we are taking a dataset of restaurant comments and analyzing their sentiment, and we are considering using a BERT model.

BERT (Bidirectional Encoder Representations from Transformers) is a powerful natural language processing (NLP) approach. Built on the Transformer architecture, it represents a breakthrough in language understanding.

One of BERT’s defining features is its bidirectional context awareness, which enables it to take into account both left and right context words at the same time. In contrast to earlier NLP models that processed text in a unidirectional manner, BERT pre-trains on vast volumes of textual data by predicting missing words in sentences. This allows it to acquire deeply contextualized representations of words. Through this pre-training procedure, BERT gains a thorough grasp of linguistic subtleties and semantics.

In a variety of NLP tasks, such as named entity recognition, sentiment analysis, and question answering, BERT has shown outstanding performance. It is a foundational model in the field because of its capacity to capture context-rich embeddings, and its pre-trained representations can be fine-tuned with comparatively little task-specific data for particular downstream tasks. BERT has had a significant influence on the field of NLP, inspiring the creation of many cutting-edge models that draw from its design and guiding ideas.
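
For the restaurant-comment sentiment idea above, one quick way to prototype is the Hugging Face Transformers pipeline, which loads a pre-trained BERT-family sentiment model. This is only a sketch: the example comments are made up, and the default English model is an assumption that could be replaced by a checkpoint fine-tuned on restaurant reviews.

# A minimal sketch of sentiment analysis on restaurant comments using a
# pre-trained BERT-family model via the Hugging Face Transformers pipeline.
# The comments are made up; the default English sentiment model is an
# assumption and can be swapped for a fine-tuned checkpoint.
from transformers import pipeline

classifier = pipeline("sentiment-analysis")

comments = [
    "The pasta was delicious and the staff were very friendly.",
    "We waited an hour for cold food. Never coming back.",
]

for comment, result in zip(comments, classifier(comments)):
    print(f"{result['label']} ({result['score']:.2f}): {comment}")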

Confusion Matrix

As the name implies, a confusion matrix is a numerical matrix that indicates the confusion points in a model. The confusion matrix is a structured method of mapping the predictions to the original classes to which the data belong. In other words, it is a class-wise distribution of the predictive performance of a classification model. This suggests that confusion matrices are only applicable in supervised learning frameworks—that is, when the output distribution is known.

Confusion Matrix for Binary classification:

A dataset with just two unique categories of data is called a binary class dataset. To keep things simple, we might refer to these two groups as the “positive” and the “negative.”

Assume that the dataset we use to assess a machine learning model has a binary class imbalance, with 60 samples in the test set’s positive class and 40 samples in its negative class.

Now, in order to fully understand the confusion matrix for this binary classification problem, we must first become familiar with the following terms (a short computation sketch follows the list):

  • True Positive (TP) refers to a sample belonging to the positive class being classified correctly.
  • True Negative (TN) refers to a sample belonging to the negative class being classified correctly.
  • False Positive (FP) refers to a sample belonging to the negative class but being classified wrongly as belonging to the positive class.
  • False Negative (FN) refers to a sample belonging to the positive class but being classified wrongly as belonging to the negative class.
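
Here is a small sketch of how these four counts can be computed with scikit-learn; the label vectors are hypothetical stand-ins for a real test set.

# A minimal sketch of computing TP, TN, FP and FN with scikit-learn.
# The label vectors below are hypothetical, mirroring a test set with a
# positive (1) and a negative (0) class as described above.
import numpy as np
from sklearn.metrics import confusion_matrix

y_true = np.array([1, 1, 1, 1, 0, 0, 0, 0, 0, 0])   # actual classes
y_pred = np.array([1, 1, 0, 1, 0, 0, 1, 0, 0, 0])   # model predictions

# With labels=[0, 1] the matrix is laid out as [[TN, FP], [FN, TP]].
tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
print(f"TP={tp}, TN={tn}, FP={fp}, FN={fn}")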

SVM – Support Vector Machines

For problems involving regression and classification, Support Vector Machines (SVMs) represent a stable and adaptable class of supervised machine learning algorithms. An SVM is especially useful in situations where there are more features than samples, since its main objective is to locate a hyperplane in a high-dimensional space that maximizes the margin between the classes. The data points that are closest to the decision boundary and affect its position are known as support vectors, and they are essential to its success. SVMs are useful in a wide range of applications, including bioinformatics, image classification, and handwriting recognition. They perform well in high-dimensional spaces and remain relatively resilient to outliers. The algorithm’s usefulness in non-linear classification and regression problems is further enhanced by its capacity to handle complex relationships thanks to the kernel approach. SVMs can be an effective tool in your machine learning toolbox if you’re working with data where a distinct margin of separation between classes is essential.
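
To make the kernel point concrete, here is a minimal sketch comparing a linear-kernel SVM with an RBF-kernel SVM on data that is not linearly separable; the make_moons dataset and the settings are illustrative assumptions.

# A minimal sketch of the kernel trick: a linear SVM vs. an RBF-kernel SVM
# on data that is not linearly separable.
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_moons(n_samples=400, noise=0.2, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

for kernel in ("linear", "rbf"):
    clf = SVC(kernel=kernel, C=1.0).fit(X_train, y_train)
    print(f"{kernel} kernel accuracy: {clf.score(X_test, y_test):.2f}")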

LSTM

Long Short-Term Memory (LSTM) networks can be thought of as very intelligent memory instruments.

When it comes to tasks where things happen in a specific order, such as words in a sentence or stock values over time, they excel. LSTMs, in contrast to earlier techniques, feature an architecture that helps them remember critical information over long stretches of time.

Imagine having smart gates and a dedicated memory cell that decide what information should be remembered and what should be forgotten at each step. Because of this, LSTMs are excellent at comprehending language, identifying speech, and forecasting future trends in fields like finance.

Put another way, LSTMs are constructed using a set of principles that enable them to gradually discover patterns in data. LSTMs are the clever technology that allows a computer to be taught to recognize a friend’s voice or anticipate the words that will come next in a sentence. They let computers comprehend and process sequential data in incredibly intelligent ways; they are the superheroes of computer programming.
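
As a concrete illustration, here is a minimal Keras sketch of an LSTM that learns to predict the next value of a simple sequence; the toy sine-wave data, window length, and layer sizes are assumptions for demonstration only.

# A minimal sketch of an LSTM predicting the next value in a numeric sequence.
# The toy sine-wave data, window length and layer sizes are illustrative.
import numpy as np
import tensorflow as tf

series = np.sin(np.linspace(0, 20 * np.pi, 2000)).astype("float32")
window = 30
X = np.stack([series[i:i + window] for i in range(len(series) - window)])
y = series[window:]
X = X[..., np.newaxis]                      # shape: (samples, timesteps, features)

model = tf.keras.Sequential([
    tf.keras.layers.LSTM(32, input_shape=(window, 1)),  # gated memory over the window
    tf.keras.layers.Dense(1),                           # predict the next value
])
model.compile(optimizer="adam", loss="mse")
model.fit(X, y, epochs=2, batch_size=64, verbose=0)

print(model.predict(X[:1]))                 # forecast the value after the first window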

Convolutional Neural Network (CNN)

A convolutional neural network is a particular type of neural network used in machine learning for image classification. Face recognition is a prime illustration.
Assume you take 50 images of a person’s face for each of ten different people. You therefore have 500 images overall, covering 10 distinct people. If you provide a CNN with this data, it will be able to learn the facial features of each of these ten people. Subsequently, using that prior knowledge, it will identify the face in a fresh photo of any one of these ten people.

CNN makes an effort to replicate how people view images with their eyes. If you stop to think about it, you will notice that even when looking at a whole picture, your attention is focused on only one point at a time. You then move your focus to other points, and this is how you memorize the key features of a picture or a face and remember it.

CNN is a series of operations that first extracts the most significant features from a given image and then turns the entire image into a single row of numbers, which a fully connected ANN classifier is capable of learning from.
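
The sequence of operations described above can be sketched in Keras as follows; the 64x64 input size, the layer sizes, and the 10 face classes are illustrative assumptions.

# A minimal sketch of the CNN pipeline: convolution and pooling layers
# extract features, Flatten turns the image into a single row of numbers,
# and dense layers classify. Input size and class count are assumptions.
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(16, (3, 3), activation="relu",
                           input_shape=(64, 64, 3)),   # learn local features
    tf.keras.layers.MaxPooling2D((2, 2)),              # keep the strongest responses
    tf.keras.layers.Conv2D(32, (3, 3), activation="relu"),
    tf.keras.layers.MaxPooling2D((2, 2)),
    tf.keras.layers.Flatten(),                         # image -> single row of numbers
    tf.keras.layers.Dense(64, activation="relu"),      # fully connected ANN classifier
    tf.keras.layers.Dense(10, activation="softmax"),   # one output per person
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()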

What is an Artificial Neural Network (ANN) in layman’s terms?

When we combine multiple neurons together, it creates a vector of neurons called a layer.

When we combine multiple layers together, where all the neurons are interconnected, this network of neurons is called an Artificial Neural Network, abbreviated as ANN.

There are three major types of layers in an ANN, listed below (a minimal code sketch follows the list):

  1. The Input layer (interface to accept data): The data input is handled by a single input layer, which passes it on to the network. Keep in mind that this layer is only the passed data vector. It is merely an interface that receives the data for the hidden layers, which contain the real neurons, and does not consist of an actual neuron layer.
  2. The Hidden layer(s) (Actual Neurons): An ANN may include one or more hidden layers. These are the real neurons that perform computations on the input data. There is no definitive rule for how many hidden layers to use. Determining the ideal number of hidden layers and the number of neurons in each layer is quite difficult, and in practice it is done by comparing the final accuracy of different configurations.
  3. The Output layer (Actual Neurons to output the result): There is only one output layer, and the number of neurons in it depends on the target variable.
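
A minimal Keras sketch of these three kinds of layers is shown below; the 4 input features, the two hidden layers, and the single output neuron are assumptions for a hypothetical binary-classification target.

# A minimal sketch of the three kinds of layers described above, using Keras.
# Input width, hidden-layer sizes and the binary output are assumptions.
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(4,)),               # 1. input layer: just an interface for the data
    tf.keras.layers.Dense(8, activation="relu"),     # 2. hidden layers: the actual computing neurons
    tf.keras.layers.Dense(8, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),  # 3. output layer: one neuron for a binary target
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.summary()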

 

Recurrent Neural Networks (RNN) Explained in Layman’s Terms

The human brain is capable of long-term and short-term memory retention. We can rewind time to recall the sequence of events that occurred and predict what will occur next by using the sequence of events that came before. The goal of Recurrent Neural Networks (RNN) is to imitate this function.

Think about the following situation: when reading a book, you make sense of a chapter’s events by referring to those of earlier chapters. You almost turn back time in your mind to consult the earlier sequence of events that clarifies the current ones. Events from two or three chapters prior are mixed in with those from the previous chapter. Each of these occurrences is imprinted in memory with a temporal sense, indicating whether it occurred recently or long ago.

Because they are still “fresh” in your memory, recent events are easier for you to recall than ones that happened a long time ago. Therefore, the events of the past have an impact on your comprehension of the present situation or aid in your ability to “predict” what will happen next.

Text Preprocessing using NLP techniques

Text must be represented as numerical columns when building a classification model from free-text input, such as user reviews and comments. This process is called text vectorization: representing text as a series of numerical columns.

There are two main methods for doing this; a short sketch combining both follows the list.

  1. Count Vectorization: This is a text preprocessing method in natural language processing (NLP) that creates a matrix of term-frequency counts from a collection of text documents. Each document in the corpus is represented as a row and each unique word as a column in the matrix. The matrix’s cell values show how often each word occurs in a given document. Count vectorization is an easy way to format text data for a variety of NLP tasks, including text classification and clustering.


2. TF-IDF Vectorization: TF-IDF (Term Frequency-Inverse Document Frequency) is another text preparation method frequently employed in NLP. It is a more sophisticated approach that considers a term’s significance across the whole corpus in addition to how frequently it occurs in a document. Every word in a document is given a weight based on its term frequency (how often it appears in the document) and its inverse document frequency, which measures how uncommon it is across all the documents. This produces a matrix in which each document is a row, each term is a column, and the values denote the relative relevance of each term within each document.
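
Here is a minimal sketch of both methods using scikit-learn; the three example documents are made up for illustration.

# A minimal sketch of count vectorization and TF-IDF vectorization with
# scikit-learn. The example documents are made up.
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

docs = [
    "the food was great",
    "the service was slow",
    "great food and great service",
]

# Count vectorization: each cell is the raw frequency of a word in a document.
counts = CountVectorizer().fit_transform(docs)
print(counts.toarray())

# TF-IDF vectorization: frequencies are re-weighted by how rare a word is
# across the whole corpus, so common words contribute less.
tfidf = TfidfVectorizer()
weights = tfidf.fit_transform(docs)
print(tfidf.get_feature_names_out())
print(weights.toarray().round(2))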

What are the types of sampling?

  1. Simple Random Sampling Without Replacement (SRSWOR): This is the most common type of sampling. The idea is that the same value cannot be chosen more than once, hence the name “without replacement”.
  2. Simple Random Sampling With Replacement (SRSWR): This kind of sampling is employed when the total number of values (the population) is small. Repetition among the chosen values is permitted.

  3. Stratified Sampling:

    A stratum is a group.

    Stratified sampling ensures that a small number of randomly chosen values are taken from each category.

    For example, consider a population containing three different kinds of numbers: a 10-series, a 100-series, and a 500-series.

    If you choose five values purely at random, any one of the series may go entirely unnoticed; for instance, the 10-series numbers might be absent from the sample altogether.

  4. Systematic Sampling:

    Systematic sampling involves choosing every i-th value, for instance every fifth or tenth number.

    A straightforward rule, the fixed step size, determines the indices of the selected values.

  5. Biased Sampling:

    As the name suggests, this is when you purposely select values based on your own judgement.

    This type of sampling is also known as purposeful sampling or convenience sampling. A short sketch illustrating these sampling methods follows the list.
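
Here is a minimal sketch of these sampling schemes using NumPy and pandas; the toy population of 20 values split into two groups is an illustrative assumption.

# A minimal sketch of the sampling schemes above using NumPy and pandas.
import numpy as np
import pandas as pd

# A toy population of 20 values split into two groups (strata).
population = pd.DataFrame({
    "value": np.arange(1, 21),
    "group": ["A"] * 10 + ["B"] * 10,
})

# 1. SRSWOR: each value can be picked at most once.
srswor = population.sample(n=5, replace=False, random_state=0)

# 2. SRSWR: the same value may appear more than once.
srswr = population.sample(n=5, replace=True, random_state=0)

# 3. Stratified sampling: a few randomly chosen values from every group.
stratified = population.groupby("group", group_keys=False).apply(
    lambda g: g.sample(n=2, random_state=0))

# 4. Systematic sampling: every 5th value by index.
systematic = population.iloc[::5]

print(srswor, srswr, stratified, systematic, sep="\n\n")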

Sampling Theory

Think about the jar of bubble gum, which has different colored bubble gums.
There’s a good chance that the gums you “randomly” choose will contain gums of all different hues.
As a result, you might conclude that the sample that was chosen at random is representative of all the gums in the jar.
These randomly chosen gums are referred to as the sample in statistical terms, and the jar is referred to as the population.

Effect of Size on Sampling:
Example:
The bubble gum jar contained 200 gums with 6 different colors.
If you select only 10 gums, there is a chance that a few colors may NOT be present.
If you select 50 gums then there is a high chance of all colors being present.
If you select 100 gums then there is a very high chance of all colors being present.
If you select all 200 gums, then it is certain that all colors will be present. This is the case where the sample is the same as the population; that means you simply selected everything!
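
A quick simulation can illustrate this effect; the color counts below (adding up to 200 gums in 6 colors) are an illustrative assumption.

# A minimal sketch simulating the jar example: 200 gums in 6 colors,
# counting how many distinct colors appear at different sample sizes.
import numpy as np

rng = np.random.default_rng(42)
colors = np.repeat(["red", "blue", "green", "yellow", "white", "pink"],
                   [60, 50, 40, 25, 15, 10])          # 200 gums in total

for size in (10, 50, 100, 200):
    sample = rng.choice(colors, size=size, replace=False)
    print(f"sample of {size:3d} gums -> {len(np.unique(sample))} distinct colors")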