
Data Science

Data Science in short

It is the process of extracting insights from data using scientific methods.

Scientific Methods

1. Machine Learning

2. Deep Learning

3. Natural Language Processing

4. Statistics

5. Visualization Tools (Seaborn, Matplotlib)

 

Data in Data Science

1. Structured >> CSV Files, Excel Files

2. Unstructured >> Images, Videos, Text, Audio

3. Semi-structured >> JSON, HTML
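The three kinds of data can be illustrated with a minimal sketch that uses only the Python standard library. The sample CSV text, JSON text, and review sentence are invented for illustration.

```python
import csv
import io
import json

# Structured data: rows and columns with a fixed schema (CSV).
# The sample content is made up for this example.
csv_text = "name,age,city\nAsha,31,Pune\nRavi,27,Mumbai\n"
rows = list(csv.DictReader(io.StringIO(csv_text)))

# Semi-structured data: nested, self-describing fields (JSON).
json_text = (
    '{"user": "Asha", "orders": ['
    '{"id": 1, "items": ["pen", "book"]}, {"id": 2, "items": []}]}'
)
record = json.loads(json_text)

# Unstructured data: free text with no predefined fields.
# Any structure has to be derived, here by splitting into tokens.
review = "Great product, fast delivery!"
tokens = review.lower().replace(",", "").replace("!", "").split()

print(rows[0]["city"])
print(record["orders"][0]["items"])
print(tokens)
```

Note that the CSV reader returns every value as a string, so numeric columns such as age still need an explicit conversion.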

 

Types of Machine Learning

1. Supervised ML

2. Unsupervised ML

3. Reinforcement Learning

 


Supervised Machine Learning

If the data is labelled, use supervised ML.

Labelled data has two parts:

    1. Independent Variables (Input Variable, Predictors, Features, Parameters)

    2. Dependent Variable (Output, Target)

   

1. Classification: (categorical data in the target column)

    1. Binary Classification (Two Categories in target column):

        Approved/Declined, 1/0, True/False, Yes/No, Pos/Neg, Good/Bad, Spam/Not-Spam

       

    2. Multiclass Classification (more than 2 categories in target column):

        High Risk - Medium Risk - Low Risk

        Class0 - Class1 - Class2

        setosa-versicolor-virginica

 

2. Regression: (continuous data in the target column)

    1. Car Price

    2. Weather Prediction

    3. House Price

    4. Stock Price, Time Series Analysis   
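As a rough sketch of labelled data, the example below pairs one independent variable (house size) with a dependent variable (price), fits a straight line by ordinary least squares, and predicts a continuous value. The numbers and the `fit_line` helper are invented for illustration. The last lines derive a two-category target from the same data to show what a classification target looks like.

```python
# Hypothetical labelled data. The numbers are invented.
sizes = [50.0, 60.0, 80.0, 100.0, 120.0]      # independent variable (feature)
prices = [150.0, 180.0, 240.0, 300.0, 360.0]  # dependent variable (target)

def fit_line(x, y):
    """Ordinary least squares for one feature: y ~= slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Regression: the target is continuous, so the prediction is a number.
slope, intercept = fit_line(sizes, prices)
predicted_price = slope * 90.0 + intercept
print(slope, intercept, predicted_price)

# Classification: the target is categorical. Here a binary target
# (expensive or not) is derived with an arbitrary threshold of 200.
is_expensive = [price > 200.0 for price in prices]
print(is_expensive)
```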


Supervised Machine Learning Algorithms

1. Linear Regression (Regression) 

2. Logistic Regression (Classification)

3. K-Nearest Neighbors (Classification and Regression)

    3.1 KNNClassifier

    3.2 KNNRegressor

   

4. Decision Tree (Classification and Regression)

5. Random Forest (Classification and Regression)

6. AdaBoost (Classification and Regression)

7. Gradient Boosting (Classification and Regression)

8. XGBoost (Classification and Regression)

9. Support Vector Machine (Classification and Regression)

10. Naive Bayes Classifier (Classification)
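Several algorithms in the list handle both classification and regression. K-Nearest Neighbors makes the difference easy to see: the same neighbor search is followed by a majority vote for classification or by an average for regression. The sketch below is a minimal pure-Python version with invented toy data. The function names are illustrative and are not taken from any library.

```python
from collections import Counter
import math

def nearest_targets(train_X, train_y, query, k):
    """Return the targets of the k training points closest to query."""
    ranked = sorted(zip(train_X, train_y), key=lambda pair: math.dist(pair[0], query))
    return [target for _, target in ranked[:k]]

def knn_classify(train_X, train_y, query, k=3):
    """Classification: majority vote among the k nearest labels."""
    votes = Counter(nearest_targets(train_X, train_y, query, k))
    return votes.most_common(1)[0][0]

def knn_regress(train_X, train_y, query, k=3):
    """Regression: mean of the k nearest target values."""
    neighbors = nearest_targets(train_X, train_y, query, k)
    return sum(neighbors) / len(neighbors)

# Invented toy data with two features per sample.
X = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.0), (6.0, 6.0), (7.0, 7.5), (6.5, 6.0)]
labels = ["low", "low", "low", "high", "high", "high"]  # categorical target
values = [10.0, 12.0, 11.0, 50.0, 58.0, 52.0]           # continuous target

predicted_label = knn_classify(X, labels, (1.2, 1.4))
predicted_value = knn_regress(X, values, (6.4, 6.3))
print(predicted_label, predicted_value)
```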

 

Unsupervised Machine Learning

If the data is unlabelled, use unsupervised machine learning.

 

Unsupervised Machine Learning Algorithms

1. KMeans Clustering

2. Hierarchical Clustering

3. Principal Component Analysis (PCA)
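To show what learning without labels looks like, here is a minimal pure-Python sketch of KMeans clustering. It assumes the first k points are used as the initial centroids, which keeps the example deterministic. Real implementations usually choose the starting centroids more carefully. The data points are invented.

```python
import math

def kmeans(points, k, iterations=10):
    """Plain k-means, using the first k points as initial centroids."""
    centroids = [points[i] for i in range(k)]
    assignments = []
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        assignments = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:  # an empty cluster keeps its old centroid
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return centroids, assignments

# Unlabelled data: only features, no target column.
points = [(1.0, 1.0), (8.0, 8.0), (1.5, 2.0), (9.0, 9.5), (1.0, 0.5), (8.5, 9.0)]
centroids, assignments = kmeans(points, k=2)
print(assignments)
print(centroids)
```

The algorithm never sees a target value. The cluster numbers it returns are arbitrary group ids, not class labels.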

 

Reinforcement Learning

Reward-based learning: an agent learns by trial and error, guided by the rewards it receives. Examples:

1. Self-Driving Cars

2. Temperature Control System
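The temperature control example can be sketched with tabular Q-learning, one common reward-based method. Everything in this sketch is an assumption made for illustration: the temperature range, the target of 20 degrees, the reward (negative distance from the target), and the `step` environment. To keep the result reproducible, the update is swept over every state-action pair instead of exploring at random.

```python
ACTIONS = (-1, 0, 1)    # cool, off, heat (change in degrees per step)
STATES = range(16, 25)  # room temperatures 16..24
TARGET = 20             # desired temperature

def step(state, action):
    """Hypothetical environment: apply the action, return (next_state, reward)."""
    next_state = min(max(state + action, 16), 24)
    reward = -abs(next_state - TARGET)  # closer to the target means a higher reward
    return next_state, reward

def train(sweeps=300, alpha=0.5, gamma=0.9):
    """Apply the Q-learning update repeatedly to every state-action pair."""
    q = {(s, a): 0.0 for s in STATES for a in ACTIONS}
    for _ in range(sweeps):
        for s in STATES:
            for a in ACTIONS:
                next_state, reward = step(s, a)
                best_next = max(q[(next_state, b)] for b in ACTIONS)
                q[(s, a)] += alpha * (reward + gamma * best_next - q[(s, a)])
    return q

q_table = train()
# Greedy policy: in each state, pick the action with the highest learned value.
policy = {s: max(ACTIONS, key=lambda a: q_table[(s, a)]) for s in STATES}
print(policy)
```

The learned policy heats below the target, cools above it, and switches off at the target, although no rule of that kind was ever written into the code.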


Data Science Project Steps

1. Problem Statement

2. Data Gathering: (Databases: MySQL, MongoDB)

3. Exploratory Data Analysis (EDA): NumPy, Pandas, Seaborn, Matplotlib

4. Feature Engineering:

    Standardization

    Normalization

    Binning

    One Hot Encoding

    Label Encoding

    Transformation (Log, square root, cube root)

5. Feature Selection: (to select the best set of features)

    1. Filter Method

    2. Wrapper Method

    3. Embedded Method

6. Model Training:

    1. Linear Regression

    2. KNN Regression

    3. DT Regression

7. Model Evaluation:

    Regression:

        1. MSE

        2. RMSE

        3. MAE

        4. R-Squared Value

    Classification:

        1. Confusion Matrix

        2. Classification Report (Precision, Recall, and f1-score)

        3. AUC-ROC Curve

        4. Accuracy

8. Web Development Framework: (Python developer)

    1. Flask - to write APIs

    2. Django

    3. FastAPI

    4. gRPC

9. Deployment:

    1. AWS

    2. GCP
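The feature engineering techniques from step 4 can be sketched in plain Python as follows. The helper names and sample values are illustrative only. Standardization here uses the population standard deviation, and the encoders order categories alphabetically. Both are assumptions of this sketch.

```python
import math
from statistics import mean, pstdev

def standardize(values):
    """Standardization (z-score): zero mean, unit standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]

def normalize(values):
    """Normalization (min-max scaling): rescale values to the range [0, 1]."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

def bin_value(value, edges, names):
    """Binning: return the name of the first bin whose upper edge covers value."""
    for edge, name in zip(edges, names):
        if value <= edge:
            return name
    return names[-1]  # above every edge: last bin

def label_encode(categories):
    """Label encoding: one integer per category, in sorted order."""
    mapping = {c: i for i, c in enumerate(sorted(set(categories)))}
    return [mapping[c] for c in categories]

def one_hot_encode(categories):
    """One-hot encoding: one 0/1 column per category, in sorted order."""
    levels = sorted(set(categories))
    return [[1 if c == level else 0 for level in levels] for c in categories]

ages = [20.0, 30.0, 40.0, 50.0, 60.0]
colors = ["red", "green", "blue", "green"]

z_scores = standardize(ages)
scaled = normalize(ages)
age_groups = [bin_value(a, [30.0, 50.0], ["young", "middle", "senior"]) for a in ages]
color_codes = label_encode(colors)
color_columns = one_hot_encode(colors)

# Transformations: log, square root, cube root.
log_ages = [math.log(a) for a in ages]
sqrt_ages = [math.sqrt(a) for a in ages]
cbrt_ages = [a ** (1.0 / 3.0) for a in ages]

print(scaled, age_groups, color_codes, color_columns)
```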

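The regression metrics from step 7 follow directly from their definitions. A minimal sketch, with invented actual and predicted values and an illustrative function name:

```python
import math

def regression_metrics(actual, predicted):
    """Compute MSE, RMSE, MAE, and R-squared from paired values."""
    n = len(actual)
    errors = [a - p for a, p in zip(actual, predicted)]
    mse = sum(e ** 2 for e in errors) / n   # mean squared error
    rmse = math.sqrt(mse)                   # same units as the target
    mae = sum(abs(e) for e in errors) / n   # mean absolute error
    mean_actual = sum(actual) / n
    ss_res = sum(e ** 2 for e in errors)                  # residual sum of squares
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)  # total sum of squares
    r2 = 1 - ss_res / ss_tot                # share of variance explained
    return {"MSE": mse, "RMSE": rmse, "MAE": mae, "R2": r2}

actual = [3.0, 5.0, 7.0, 9.0]
predicted = [2.0, 5.0, 8.0, 11.0]
scores = regression_metrics(actual, predicted)
print(scores)
```

MSE and RMSE penalize large errors more heavily than MAE because the errors are squared before averaging.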

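The classification metrics from step 7 are all derived from the four confusion matrix counts. The sketch below covers binary labels only and assumes there is at least one actual positive and one predicted positive, otherwise the divisions fail. The labels are invented. AUC-ROC is left out because it needs predicted scores rather than hard labels.

```python
def confusion_counts(actual, predicted, positive=1):
    """Return (tp, fp, fn, tn) for a binary problem."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == positive and p == positive)
    fp = sum(1 for a, p in pairs if a != positive and p == positive)
    fn = sum(1 for a, p in pairs if a == positive and p != positive)
    tn = sum(1 for a, p in pairs if a != positive and p != positive)
    return tp, fp, fn, tn

def classification_metrics(actual, predicted, positive=1):
    """Precision, recall, F1-score, and accuracy from the confusion counts."""
    tp, fp, fn, tn = confusion_counts(actual, predicted, positive)
    precision = tp / (tp + fp)  # of the predicted positives, how many are right
    recall = tp / (tp + fn)     # of the actual positives, how many were found
    f1 = 2 * precision * recall / (precision + recall)
    accuracy = (tp + tn) / len(actual)
    return {"precision": precision, "recall": recall, "f1": f1, "accuracy": accuracy}

actual = [1, 0, 1, 1, 0, 0, 1, 0]
predicted = [1, 0, 0, 1, 0, 1, 1, 1]
counts = confusion_counts(actual, predicted)
report = classification_metrics(actual, predicted)
print(counts)
print(report)
```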

We will discuss each of these steps in the next session, so stay tuned. See you soon :)

