
Data Science

Data Science in short

It is the process of extracting insights from data using scientific methods.

Scientific Methods

1. Machine Learning

2. Deep Learning

3. Natural Language Processing

4. Statistics

5. Visualization Tools (Seaborn, Matplotlib)

 

Data in Data Science

1. Structured >> CSV Files, Excel Files

2. Unstructured >> Images, Videos, Text, Audio

3. Semi-structured >> JSON, HTML
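
The difference between structured and semi-structured data shows up as soon as you load it. The snippet below is a minimal sketch using Python's standard `csv` and `json` modules; the sample text and field names (`name`, `age`, `skills`) are made up for illustration.

```python
import csv
import io
import json

# Hypothetical structured data: every row has the same columns.
csv_text = "name,age\nAsha,31\nRavi,27\n"
rows = list(csv.DictReader(io.StringIO(csv_text)))

# Hypothetical semi-structured data: records may nest or omit fields.
json_text = '[{"name": "Asha", "skills": ["python", "sql"]}, {"name": "Ravi"}]'
records = json.loads(json_text)

print(rows[0]["age"])                # CSV values arrive as strings
print(records[1].get("skills", []))  # a missing field needs a default
```

Structured rows can go straight into a table, while semi-structured records usually need flattening and default values first.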

 

Types of Machine Learning

1. Supervised ML

2. Unsupervised ML

3. Reinforcement Learning

 


Supervised Machine Learning

If the data is labelled, use Supervised ML.

 Labelled data has two kinds of columns:

    1. Independent Variables (Input Variables, Predictors, Features)

    2. Dependent Variable (Output, Target)
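
As a small illustration of the two kinds of columns, the sketch below splits a labelled dataset into features and target. The rows, the column meanings, and the names `X` and `y` are assumptions chosen for the example (`X` and `y` are a common convention, not a requirement).

```python
# Hypothetical labelled dataset: each row is (income, credit_score, status).
labelled_rows = [
    (52000, 710, "Approved"),
    (18000, 540, "Declined"),
    (75000, 690, "Approved"),
]

# Independent variables (features): everything except the last column.
X = [list(row[:-1]) for row in labelled_rows]
# Dependent variable (target): the last column.
y = [row[-1] for row in labelled_rows]

print(X)
print(y)
```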

   

1. Classification (categorical data in the target column):

    1. Binary Classification (Two Categories in target column):

        Approved/Declined, 1/0, True/False, Yes/No, Pos/Neg, Good/Bad, Spam/Not-Spam

       

    2. Multiclass Classification (more than 2 categories in target column):

        High Risk - Medium Risk - Low Risk

        Class0-Class1-Class2

        setosa-versicolor-virginica

 

2. Regression (continuous data in the target column):

    1. Car Price

    2. Weather Prediction

    3. House Price

    4. Stock Price, Time Series Analysis   
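
Because a regression target is continuous, the model outputs a number rather than a category. Here is a minimal sketch of simple linear regression (one feature, least-squares fit) in plain Python. The house areas and prices are invented, and `fit_line` is a hypothetical helper, not a library function.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical data: house area (sq ft) vs. price (in thousands).
areas = [500, 1000, 1500, 2000]
prices = [50, 100, 150, 200]

slope, intercept = fit_line(areas, prices)
predicted = slope * 1200 + intercept  # predicted price for a 1200 sq ft house
print(predicted)
```

A classification model would instead return one of the categories listed above, such as Spam or Not-Spam.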


Supervised Machine Learning Algorithms

1. Linear Regression (Regression) 

2. Logistic Regression (Classification)

3. K-Nearest Neighbor (Classification and Regression)

    3.1 KNNClassifier

    3.2 KNNRegressor

   

4. Decision Tree (Classification and Regression)

5. Random Forest (Classification and Regression)

6. AdaBoost (Classification and Regression)

7. Gradient Boosting (Classification and Regression)

8. XGBoost (Classification and Regression)

9. Support Vector Machine (Classification and Regression)

10. Naive Bayes Classifier (Classification)
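
Several algorithms in this list handle both tasks. K-Nearest Neighbor is the easiest one to show: the same neighbour search is followed by a majority vote for classification or an average for regression. This is a rough sketch with made-up data, not a library implementation. The function names (`nearest_targets`, `knn_classify`, `knn_regress`) are hypothetical, and in practice you would normally use a library such as scikit-learn.

```python
from collections import Counter
import math

def nearest_targets(train_X, train_y, query, k):
    """Return the targets of the k training points closest to query."""
    order = sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], query))
    return [train_y[i] for i in order[:k]]

def knn_classify(train_X, train_y, query, k=3):
    # Classification: majority vote among the k nearest labels.
    return Counter(nearest_targets(train_X, train_y, query, k)).most_common(1)[0][0]

def knn_regress(train_X, train_y, query, k=3):
    # Regression: average of the k nearest numeric targets.
    neighbours = nearest_targets(train_X, train_y, query, k)
    return sum(neighbours) / len(neighbours)

# Hypothetical toy data with two features per row.
X = [[1.0, 1.0], [1.2, 0.8], [0.9, 1.1], [5.0, 5.0], [5.2, 4.9], [4.8, 5.1]]
labels = ["Low Risk", "Low Risk", "Low Risk", "High Risk", "High Risk", "High Risk"]
values = [10.0, 12.0, 11.0, 50.0, 52.0, 51.0]

print(knn_classify(X, labels, [1.1, 1.0]))
print(knn_regress(X, values, [5.0, 5.0]))
```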

 

Unsupervised Machine Learning

If the data is unlabelled, use Unsupervised Machine Learning.

 

Unsupervised Machine Learning Algorithms

1. K-Means Clustering

2. Hierarchical Clustering

3. Principal Component Analysis (PCA)
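
To show what learning without labels looks like, here is a minimal sketch of the K-Means idea: assign each point to its nearest centroid, then move each centroid to the mean of its points. The points and the hand-picked starting centroids are assumptions for the example. Real implementations typically choose the starting centroids randomly and stop once the assignments no longer change.

```python
import math

def kmeans(points, centroids, iterations=10):
    """Plain k-means: assign each point to its nearest centroid, then re-average."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Move each centroid to the mean of its cluster (keep it if the cluster is empty).
        centroids = [
            [sum(coord) / len(cluster) for coord in zip(*cluster)] if cluster else centroid
            for cluster, centroid in zip(clusters, centroids)
        ]
    return centroids, clusters

# Hypothetical unlabelled points forming two visible groups.
points = [[1.0, 1.0], [1.5, 2.0], [1.0, 1.5], [8.0, 8.0], [9.0, 8.5], [8.5, 9.0]]
# Starting centroids are chosen by hand here.
centroids, clusters = kmeans(points, centroids=[[0.0, 0.0], [10.0, 10.0]])
print(centroids)
```

No target column is involved anywhere: the groups come only from the distances between the points.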

 

Reinforcement Learning

Reward-based learning: an agent learns by trial and error from the rewards its actions earn. Examples:

1. Self-Driving Cars

2. Temperature Control System
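
The reward idea can be sketched with a toy temperature controller. The agent tries each action, records the reward it gets, and then mostly repeats the action with the best average reward (an epsilon-greedy strategy). Everything here is an assumption made for illustration: the room temperature, the target, the three actions, and the reward formula. A real control system would have many states and steps.

```python
import random

# Hypothetical one-step environment: the room is at 18 degrees, the target is 21.
ROOM_TEMP, TARGET_TEMP = 18, 21
ACTIONS = {"cool": -1, "hold": 0, "heat": 1}

def reward(action):
    """Higher reward (closer to zero) when the action moves the room toward the target."""
    return -abs(ROOM_TEMP + ACTIONS[action] - TARGET_TEMP)

def train(steps=200, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    value = {a: 0.0 for a in ACTIONS}  # running average reward per action
    count = {a: 0 for a in ACTIONS}
    for _ in range(steps):
        untried = [a for a in ACTIONS if count[a] == 0]
        if untried:
            action = untried[0]                 # try every action once first
        elif rng.random() < epsilon:
            action = rng.choice(list(ACTIONS))  # explore
        else:
            action = max(value, key=value.get)  # exploit the best estimate
        count[action] += 1
        value[action] += (reward(action) - value[action]) / count[action]
    return value

value = train()
best_action = max(value, key=value.get)
print(best_action, value)
```

Nobody tells the agent that heating is correct. It works that out from the rewards alone.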


Data Science Project Steps

1. Problem Statement

2. Data Gathering (databases: MySQL, MongoDB)

3. Exploratory Data Analysis (EDA): NumPy, Pandas, Seaborn, Matplotlib

4. Feature Engineering:

    Standardization

    Normalization

    Binning

    One Hot Encoding

    Label Encoding

    Transformation (Log, square root, cube root)

5. Feature Selection (to select the best set of features):

    1. Filter Method

    2. Wrapper Method

    3. Embedded Method

6. Model Training:

    1. Linear Regression

    2. KNN Regression

    3. DT Regression

7. Model Evaluation:

    Regression:

        1. MSE

        2. RMSE

        3. MAE

        4. R-Squared Value

    Classification:

        1. Confusion Matrix

        2. Classification Report (Precision, Recall, and F1-score)

        3. AUC-ROC Curve

        4. Accuracy

8. Web Development Frameworks (Python developer):

    1. Flask (to write APIs)

    2. Django

    3. FastAPI

    4. gRPC

9. Deployment:

    1. AWS

    2. GCP
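
Some of the Feature Engineering techniques from step 4 can be sketched in plain Python as follows. This is a minimal illustration, not how any particular library implements them. The helper names (`standardize`, `normalize`, `label_encode`, `one_hot_encode`) and the sample columns are hypothetical.

```python
import math
import statistics

def standardize(values):
    """Rescale to mean 0 and standard deviation 1 (z-scores)."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return [(v - mean) / std for v in values]

def normalize(values):
    """Min-max scale to the range [0, 1]."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

def label_encode(categories):
    """Map each category to an integer, in sorted order."""
    mapping = {c: i for i, c in enumerate(sorted(set(categories)))}
    return [mapping[c] for c in categories]

def one_hot_encode(categories):
    """Give each category its own 0/1 column, in sorted order."""
    levels = sorted(set(categories))
    return [[int(c == level) for level in levels] for c in categories]

# Hypothetical columns.
ages = [20, 30, 40, 50, 60]
species = ["setosa", "virginica", "versicolor", "setosa"]

print(standardize(ages))
print(normalize(ages))
print(label_encode(species))
print(one_hot_encode(species))
print([math.log(v) for v in ages])  # log transformation
```

Standardization and normalization both rescale numeric columns. Label encoding and one-hot encoding both turn categories into numbers, but one-hot encoding avoids implying an order between the categories.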


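The Model Evaluation metrics from step 7 follow directly from their definitions. The sketch below computes them by hand on invented predictions. The function names are hypothetical, the confusion matrix is laid out as [[TN, FP], [FN, TP]], and the code assumes the data contains at least one positive prediction and one positive label (otherwise precision or recall would divide by zero).

```python
import math

def regression_metrics(actual, predicted):
    n = len(actual)
    errors = [a - p for a, p in zip(actual, predicted)]
    squared_error = sum(e ** 2 for e in errors)
    mse = squared_error / n
    mae = sum(abs(e) for e in errors) / n
    mean_actual = sum(actual) / n
    ss_total = sum((a - mean_actual) ** 2 for a in actual)
    r2 = 1 - squared_error / ss_total
    return {"MSE": mse, "RMSE": math.sqrt(mse), "MAE": mae, "R2": r2}

def classification_metrics(actual, predicted, positive=1):
    pairs = list(zip(actual, predicted))
    tp = sum(a == positive and p == positive for a, p in pairs)
    fp = sum(a != positive and p == positive for a, p in pairs)
    fn = sum(a == positive and p != positive for a, p in pairs)
    tn = len(pairs) - tp - fp - fn
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "confusion_matrix": [[tn, fp], [fn, tp]],
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": 2 * precision * recall / (precision + recall),
    }

# Hypothetical actual values and predictions.
reg = regression_metrics([3.0, 5.0, 7.0, 9.0], [2.0, 5.0, 8.0, 9.0])
clf = classification_metrics([1, 0, 1, 1, 0, 0], [1, 0, 0, 1, 1, 0])
print(reg)
print(clf)
```

For regression, lower MSE, RMSE, and MAE are better, and an R-squared closer to 1 is better. For classification, all four scores range from 0 to 1 and higher is better.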

We will discuss each of these steps in the next session. Stay tuned : )

