Data Science

Data Science in short

Data science is the process of extracting insights from data using scientific methods.

Scientific Methods

1. Machine Learning

2. Deep Learning

3. Natural Language Processing

4. Statistics

5. Visualization Tools (Seaborn, Matplotlib)

 

Data in Data Science

1. Structured >> CSV Files, Excel Files

2. Unstructured >> Images, Videos, Text, Audios

3. Semi-structured >> JSON, HTML
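The difference between structured and semi-structured data is easier to see in code. Below is a minimal sketch using only Python's standard library; the sample CSV and JSON text (names, ages, cities, skills) is made up for illustration.

```python
import csv
import io
import json

# Structured data: every row has the same columns (hypothetical sample).
csv_text = "name,age,city\nAsha,29,Pune\nRavi,35,Mumbai\n"
rows = list(csv.DictReader(io.StringIO(csv_text)))

# Semi-structured data: each record describes itself, and fields
# can be nested or missing (hypothetical sample).
json_text = '[{"name": "Asha", "skills": ["SQL", "Python"]}, {"name": "Ravi"}]'
records = json.loads(json_text)

print(rows[0]["city"])               # CSV values are read as strings
print(records[0]["skills"])          # a nested list inside one record
print(records[1].get("skills", []))  # optional field, absent for this record
```

Unstructured data (images, audio, free text) has no such fields at all, so it usually needs a feature-extraction step before any model can use it.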

 

Types of Machine Learning

1. Supervised ML

2. Unsupervised ML

3. Reinforcement ML

 


Supervised Machine Learning

If the data is labelled, use Supervised ML.

Labelled data consists of:

    1. Independent Variables (Input Variable, Predictors, Features, Parameters)

    2. Dependent Variable (Output, Target)

   

1. Classification: (Categorical data in target column)

    1. Binary Classification (Two Categories in target column):

        Approved/Declined, 1/0, True/False, Yes/No, Pos/Neg, Good/Bad, Spam/Not-Spam

       

    2. Multiclass Classification (more than 2 categories in target column):

        High Risk - Medium Risk - Low Risk

        Class0-Class1-Class2

        setosa-versicolor-virginica

 

2. Regression: (Continuous data in target column)

    1. Car Price

    2. Weather Prediction

    3. House Price

    4. Stock Price, Time Series Analysis   
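To make the idea of labelled data concrete, here is a small sketch that splits a dataset into independent variables (X) and a dependent variable (y), then guesses the task type from the target column. The loan records and the `task_type` helper are hypothetical, and the rule inside it is only a rough heuristic, not a standard definition.

```python
# Hypothetical labelled dataset: input features plus a known target per row.
loans = [
    {"income": 52000, "credit_score": 710, "status": "Approved"},
    {"income": 18000, "credit_score": 540, "status": "Declined"},
    {"income": 75000, "credit_score": 690, "status": "Approved"},
    {"income": 23000, "credit_score": 580, "status": "Declined"},
]

feature_names = ["income", "credit_score"]  # independent variables
target_name = "status"                      # dependent variable

X = [[row[name] for name in feature_names] for row in loans]
y = [row[target_name] for row in loans]


def task_type(target_values):
    """Rough heuristic (an assumption for illustration only):
    numeric targets other than 0/1 suggest regression,
    everything else is treated as classification."""
    distinct = set(target_values)
    numeric = all(
        isinstance(v, (int, float)) and not isinstance(v, bool) for v in distinct
    )
    if numeric and not distinct <= {0, 1}:
        return "regression"
    if len(distinct) == 2:
        return "binary classification"
    return "multiclass classification"


print(task_type(y))
print(task_type(["setosa", "versicolor", "virginica"]))
print(task_type([4500.0, 12000.5, 7300.0]))  # e.g. car prices
```

In practice the choice also depends on the problem statement: a numeric column such as a rating from 1 to 5 can be modelled either way.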


Supervised Machine Learning Algorithms

1. Linear Regression (Regression) 

2. Logistic Regression (Classification)

3. K-Nearest Neighbor (Classification and Regression)

    3.1 KNNClassifier

    3.2 KNNRegressor

   

4. Decision Tree (Classification and Regression)

5. Random Forest (Classification and Regression)

6. AdaBoost (Classification and Regression)

7. Gradient Boosting (Classification and Regression)

8. XGBoost (Classification and Regression)

9. Support Vector Machine (Classification and Regression)

10. Naive Bayes Classifier (Classification)
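Several algorithms in this list handle both classification and regression. K-Nearest Neighbor is the simplest one to show by hand: the classifier takes a majority vote among the nearest points, the regressor takes their average. This is a from-scratch sketch in plain Python, not a library implementation; the function names and the toy study data are made up for illustration.

```python
import math
from collections import Counter


def nearest_targets(train_X, train_y, query, k):
    """Return the targets of the k training points closest to `query`
    (Euclidean distance)."""
    by_distance = sorted(
        zip(train_X, train_y), key=lambda pair: math.dist(pair[0], query)
    )
    return [target for _, target in by_distance[:k]]


def knn_classify(train_X, train_y, query, k=3):
    """Classification: majority vote among the k nearest neighbours."""
    votes = Counter(nearest_targets(train_X, train_y, query, k))
    return votes.most_common(1)[0][0]


def knn_regress(train_X, train_y, query, k=3):
    """Regression: mean target of the k nearest neighbours."""
    neighbours = nearest_targets(train_X, train_y, query, k)
    return sum(neighbours) / len(neighbours)


# Hypothetical toy data: [hours_studied, hours_slept]
X = [[1, 5], [2, 6], [3, 5], [7, 8], [8, 7], [9, 8]]
labels = ["Fail", "Fail", "Fail", "Pass", "Pass", "Pass"]  # categorical target
scores = [35.0, 42.0, 48.0, 78.0, 85.0, 91.0]              # continuous target

print(knn_classify(X, labels, [8, 8]))
print(knn_regress(X, scores, [8, 8]))
```

The same features feed both models; only the target column and the way the neighbours are combined change.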

 

Unsupervised Machine Learning

If the data is Unlabeled, use Unsupervised Machine Learning.

 

Unsupervised Machine Learning Algorithms

1. KMeans Clustering

2. Hierarchical Clustering

3. Principal Component Analysis (PCA)
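With unlabelled data there is no target column, so the algorithm has to find structure on its own. Here is a minimal from-scratch sketch of KMeans clustering in plain Python. The sample points are made up, and starting the centroids at the first k points is a simplifying assumption (real implementations use smarter initialization).

```python
import math


def kmeans(points, k, iterations=10):
    """Plain k-means: assign each point to its nearest centroid,
    then move each centroid to the mean of its cluster."""
    centroids = [list(p) for p in points[:k]]  # naive init: first k points
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        for i, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster is empty
                centroids[i] = [sum(axis) / len(members) for axis in zip(*members)]
    labels = [
        min(range(k), key=lambda i: math.dist(p, centroids[i])) for p in points
    ]
    return centroids, labels


# Hypothetical unlabelled points forming two obvious groups.
points = [[1, 1], [9, 9], [1, 2], [2, 1], [8, 9], [9, 8]]
centroids, labels = kmeans(points, k=2)

print(labels)     # cluster index assigned to each point
print(centroids)  # centre of each cluster
```

The cluster numbers themselves carry no meaning; they only say which points ended up grouped together.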

 

Reinforcement Learning

Reward Based Learning

1. Self-Driving Cars

2. Temperature Control System
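Reward-based learning can be sketched with a toy temperature controller: an agent tries thermostat settings, receives a reward from the environment, and gradually prefers the setting that pays best. Everything here is a hypothetical illustration (the settings, the comfortable temperature, the reward function), and the epsilon-greedy strategy is just one simple way to balance exploring and exploiting.

```python
import random

settings = [18, 21, 24]  # hypothetical thermostat settings (actions), in Celsius
comfortable = 21         # hypothetical temperature the environment rewards


def reward(setting):
    """Environment feedback: 0 at the comfortable temperature,
    more negative the further away the setting is."""
    return -abs(setting - comfortable)


rng = random.Random(0)             # fixed seed so the run is repeatable
estimates = [0.0] * len(settings)  # running estimate of each action's reward
counts = [0] * len(settings)       # how often each action was tried
epsilon = 0.1                      # fraction of steps spent exploring

for _ in range(200):
    if rng.random() < epsilon:
        action = rng.randrange(len(settings))  # explore: random action
    else:
        # exploit: action with the best estimate so far
        action = max(range(len(settings)), key=lambda a: estimates[a])
    r = reward(settings[action])
    counts[action] += 1
    # incremental mean of the rewards seen for this action
    estimates[action] += (r - estimates[action]) / counts[action]

best_setting = settings[max(range(len(settings)), key=lambda a: estimates[a])]
print(best_setting)
```

Unlike supervised learning, nobody tells the agent the right answer; it only learns from the rewards its own actions produce.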


Data Science Project Steps

1. Problem Statement

2. Data Gathering: (Databases - MySQL, MongoDB)

3. Exploratory Data Analysis (EDA) - NumPy, Pandas, Seaborn, Matplotlib

4. Feature Engineering:

    Standardization

    Normalization

    Binning

    One Hot Encoding

    Label Encoding

    Transformation (Log, square root, cube root)

5. Feature Selection: (To select the best set of features)

    1. Filter Method

    2. Wrapper Method

    3. Embedded Method

6. Model Training:

    1. Linear Regression

    2. KNN Regression

    3. DT Regression

7. Model Evaluation:

    Regression:

        1. MSE

        2. RMSE

        3. MAE

        4. R-Squared Value

    Classification:

        1. Confusion Matrix

        2. Classification Report (Precision, Recall, and F1-score)

        3. AUC-ROC Curve

        4. Accuracy

8. Web Development Framework: (Python Developer)

    1. Flask - To write APIs

    2. Django

    3. FastAPI

    4. gRPC

9. Deployment:

    1. AWS

    2. GCP
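The feature engineering techniques from step 4 can each be sketched in a line or two of plain Python. The `ages` and `cities` columns and the bin cut points are made up for illustration; in a real project these steps are usually done with a library rather than by hand.

```python
import math
from statistics import mean, pstdev

ages = [22, 35, 47, 58]                       # hypothetical numeric feature
cities = ["Pune", "Mumbai", "Pune", "Delhi"]  # hypothetical categorical feature

# Standardization: rescale to mean 0 and standard deviation 1 (z-score).
mu, sigma = mean(ages), pstdev(ages)
standardized = [(a - mu) / sigma for a in ages]

# Normalization (min-max): rescale to the range [0, 1].
lowest, highest = min(ages), max(ages)
normalized = [(a - lowest) / (highest - lowest) for a in ages]


# Binning: replace a number with the interval it falls into.
def age_bin(age):
    if age < 30:
        return "young"
    if age < 50:
        return "middle"
    return "senior"


binned = [age_bin(a) for a in ages]

# Label encoding: map each category to an integer.
categories = sorted(set(cities))
label_encoded = [categories.index(c) for c in cities]

# One-hot encoding: one 0/1 column per category.
one_hot = [[int(c == category) for category in categories] for c in cities]

# Transformation: a log transform compresses large values.
log_ages = [math.log(a) for a in ages]

print(normalized)
print(binned)
print(label_encoded)
print(one_hot)
```

Label encoding implies an order between categories, so one-hot encoding is often the safer choice when the categories have no natural ranking.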


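Most of the evaluation metrics from step 7 are short formulas. The sketch below computes them by hand on made-up true and predicted values, so the numbers only illustrate the arithmetic. The AUC-ROC curve is left out because it needs predicted probabilities rather than hard labels.

```python
import math

# Regression: hypothetical true vs. predicted values.
y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.5, 5.0, 8.0, 9.5]

errors = [t - p for t, p in zip(y_true, y_pred)]
mse = sum(e ** 2 for e in errors) / len(errors)   # Mean Squared Error
rmse = math.sqrt(mse)                             # Root Mean Squared Error
mae = sum(abs(e) for e in errors) / len(errors)   # Mean Absolute Error
mean_true = sum(y_true) / len(y_true)
ss_res = sum(e ** 2 for e in errors)
ss_tot = sum((t - mean_true) ** 2 for t in y_true)
r2 = 1 - ss_res / ss_tot                          # R-Squared

# Classification: hypothetical true vs. predicted labels (1 = positive).
labels_true = [1, 0, 1, 1, 0, 0, 1, 0]
labels_pred = [1, 0, 0, 1, 0, 1, 1, 0]

pairs = list(zip(labels_true, labels_pred))
tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

confusion_matrix = [[tn, fp], [fn, tp]]  # rows: actual 0/1, columns: predicted 0/1
accuracy = (tp + tn) / len(pairs)
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)

print(mse, rmse, mae, r2)
print(confusion_matrix, accuracy, precision, recall, f1)
```

For regression, lower MSE, RMSE, and MAE are better, while R-Squared is better the closer it gets to 1.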

We will discuss each of these steps in the next session. Stay tuned! :)

