Data science is the process of extracting insights from data by using scientific methods.
Scientific Methods
1. Machine Learning
2. Deep Learning
3. Natural Language Processing
4. Statistics
5. Visualization Tools (Seaborn, Matplotlib)
Data in Data Science
1. Structured >> CSV Files, Excel Files
2. Unstructured >> Images, Videos, Text, Audio
3. Semi-structured >> JSON, HTML
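To make the difference concrete, here is a small sketch that reads a structured table and a semi-structured record with Python's standard `csv` and `json` modules. The sample names and fields are made up for illustration.

```python
import csv
import io
import json

# Structured data: every row has the same columns (hypothetical sample).
csv_text = "name,age,city\nAsha,29,Pune\nRavi,34,Mumbai\n"
rows = list(csv.DictReader(io.StringIO(csv_text)))

# Semi-structured data: fields can be nested, and keys may differ per record.
json_text = '{"name": "Asha", "skills": ["Python", "SQL"], "address": {"city": "Pune"}}'
record = json.loads(json_text)

print(rows[0]["city"])            # column access on a fixed schema
print(record["address"]["city"])  # nested access
print(len(record["skills"]))      # a list stored inside one field
```

Unstructured data such as images or audio has no such fields at all, which is why it usually needs extra processing before any analysis.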
Types of Machine Learning
1. Supervised ML
2. Unsupervised ML
3. Reinforcement Learning
Supervised Machine Learning
If the data is labelled, use supervised ML.
1. Independent Variables (Input Variable, Predictors, Features, Parameters)
2. Dependent Variable (Output, Target)
1. Classification (categorical data in the target column)
    1. Binary Classification (two categories in the target column):
       Approved/Declined, 1/0, True/False, Yes/No, Pos/Neg, Good/Bad, Spam/Not-Spam
    2. Multiclass Classification (more than two categories in the target column):
       High Risk - Medium Risk - Low Risk
       Class0 - Class1 - Class2
       setosa - versicolor - virginica
2. Regression (continuous data in the target column)
    1. Car Price
    2. Weather Prediction
    3. House Price
    4. Stock Price, Time Series Analysis
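The ideas above can be sketched in a few lines of plain Python: separate the independent variables from the dependent variable, then look at the target column to decide between classification and regression. The loan records and both helper functions are hypothetical, and `task_type` is only a rough heuristic. For example, a target coded as 1/0 would have to be marked as categorical by hand.

```python
# Hypothetical labelled dataset: "status" is the target column.
loans = [
    {"income": 52000, "credit_score": 710, "status": "Approved"},
    {"income": 18000, "credit_score": 540, "status": "Declined"},
    {"income": 75000, "credit_score": 690, "status": "Approved"},
]

def split_features_target(rows, target):
    """Separate independent variables (X) from the dependent variable (y)."""
    X = [{k: v for k, v in row.items() if k != target} for row in rows]
    y = [row[target] for row in rows]
    return X, y

def task_type(y):
    """Rough heuristic (an assumption, not a rule): text or boolean targets
    are treated as categories, anything else as continuous values."""
    if all(isinstance(v, (str, bool)) for v in y):
        if len(set(y)) <= 2:
            return "binary classification"
        return "multiclass classification"
    return "regression"

X, y = split_features_target(loans, "status")
print(task_type(y))                                      # binary classification
print(task_type(["setosa", "versicolor", "virginica"]))  # multiclass classification
print(task_type([4500.0, 12750.5, 8300.0]))              # regression
```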
Supervised Machine Learning Algorithms
1. Linear Regression (Regression)
2. Logistic Regression (Classification)
3. K-Nearest Neighbors (Classification and Regression)
    3.1 KNN Classifier
    3.2 KNN Regressor
4. Decision Tree (Classification and Regression)
5. Random Forest (Classification and Regression)
6. AdaBoost (Classification and Regression)
7. Gradient Boosting (Classification and Regression)
8. XGBoost (Classification and Regression)
9. Support Vector Machine (Classification and Regression)
10. Naive Bayes Classifier (Classification)
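Several algorithms in this list handle both classification and regression. K-Nearest Neighbors is the easiest one to sketch from scratch: the neighbour search is shared, and only the final step differs (a majority vote for classification, an average for regression). This is a minimal illustration with made-up data and function names, not how a library implements it.

```python
import math
from collections import Counter

def nearest_targets(X_train, y_train, x, k):
    """Return the targets of the k training points closest to x (Euclidean)."""
    order = sorted(range(len(X_train)), key=lambda i: math.dist(X_train[i], x))
    return [y_train[i] for i in order[:k]]

def knn_classify(X_train, y_train, x, k=3):
    """Classification: majority vote among the k nearest neighbours."""
    votes = Counter(nearest_targets(X_train, y_train, x, k))
    return votes.most_common(1)[0][0]

def knn_regress(X_train, y_train, x, k=3):
    """Regression: mean target of the k nearest neighbours."""
    targets = nearest_targets(X_train, y_train, x, k)
    return sum(targets) / len(targets)

# Hypothetical training data: two features per row.
X_train = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.5), (6.0, 6.0), (6.5, 7.0), (7.0, 6.5)]
labels = ["Low Risk", "Low Risk", "Low Risk", "High Risk", "High Risk", "High Risk"]
prices = [10.0, 12.0, 11.0, 50.0, 55.0, 54.0]

print(knn_classify(X_train, labels, (1.2, 1.4)))  # Low Risk
print(knn_regress(X_train, prices, (6.4, 6.6)))   # 53.0
```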
Unsupervised Machine Learning
If the data is unlabelled, use unsupervised machine learning.
Unsupervised Machine Learning Algorithms
1. KMeans Clustering
2. Hierarchical Clustering
3. Principal Component Analysis (PCA)
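As an illustration of learning without labels, here is a bare-bones k-means sketch. It alternates between assigning each point to its nearest centroid and moving each centroid to the mean of its points. The data is made up, and starting from the first two points is an arbitrary choice for this example; real implementations pick starting centroids more carefully.

```python
import math

def kmeans(points, centroids, iterations=10):
    """Plain k-means: alternate between assigning points and moving centroids."""
    for _ in range(iterations):
        # Assignment step: index of the nearest centroid for every point.
        labels = [
            min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its assigned points.
        new_centroids = []
        for c, old in enumerate(centroids):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                new_centroids.append(
                    tuple(sum(axis) / len(members) for axis in zip(*members))
                )
            else:
                new_centroids.append(old)  # an empty cluster stays where it was
        centroids = new_centroids
    return labels, centroids

# Hypothetical unlabelled points forming two visible groups.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5), (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]
labels, centroids = kmeans(points, centroids=[points[0], points[1]])
print(labels)  # [0, 0, 0, 1, 1, 1]
```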
Reinforcement Learning
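Reward-based learning: an agent learns which actions to take from the rewards or penalties it receives for them. Examples: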
1. Self-Driving Cars
2. Temperature Control System
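A toy version of the temperature control example can show what "reward-based" means. The sketch below uses the Q-learning update rule on a made-up thermostat: the closer the room is to the target temperature, the higher the reward. To keep the result reproducible it sweeps over every state and action in a fixed order instead of exploring randomly, which is a simplification. The temperature range, target, and constants are all assumptions.

```python
TARGET = 20             # desired temperature (hypothetical)
TEMPS = range(15, 26)   # states: 15 to 25 degrees
ACTIONS = (-1, 0, 1)    # cool, hold, or heat by one degree

def step(temp, action):
    """Environment: apply the action, then reward closeness to the target."""
    new_temp = min(max(temp + action, TEMPS[0]), TEMPS[-1])
    reward = -abs(new_temp - TARGET)
    return new_temp, reward

alpha, gamma = 0.5, 0.9  # learning rate and discount factor
Q = {(t, a): 0.0 for t in TEMPS for a in ACTIONS}

for _ in range(200):
    for t in TEMPS:
        for a in ACTIONS:
            new_t, reward = step(t, a)
            best_next = max(Q[(new_t, b)] for b in ACTIONS)
            # Q-learning update: move the estimate toward
            # reward + discounted value of the best next action.
            Q[(t, a)] += alpha * (reward + gamma * best_next - Q[(t, a)])

# The learned policy picks the highest-valued action in each state.
policy = {t: max(ACTIONS, key=lambda a: Q[(t, a)]) for t in TEMPS}
print(policy[16], policy[20], policy[24])  # 1 0 -1
```

The learned policy heats when the room is too cold, cools when it is too hot, and holds at the target, even though nobody told it those rules directly.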
Data Science Project Steps
1. Problem Statement
2. Data Gathering (Databases: MySQL, MongoDB)
3. Exploratory Data Analysis (EDA): NumPy, Pandas, Seaborn, Matplotlib
4. Feature Engineering:
    Standardization
    Normalization
    Binning
    One Hot Encoding
    Label Encoding
    Transformation (Log, square root, cube root)
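Each feature engineering technique listed above fits in a line or two of plain Python, which makes it easier to see what it actually does to a column. The function names and sample values below are hypothetical, and the functions assume the column has at least two distinct values.

```python
import math
from statistics import mean, pstdev

def standardize(values):
    """Standardization: rescale to mean 0 and standard deviation 1 (z-scores)."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]

def normalize(values):
    """Normalization: min-max scale into the range [0, 1]."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

def bin_values(values, edges, names):
    """Binning: map each value to the first bin whose upper edge it does not exceed."""
    return [next(n for edge, n in zip(edges, names) if v <= edge) for v in values]

def label_encode(categories):
    """Label encoding: one integer per category, in sorted order."""
    codes = {c: i for i, c in enumerate(sorted(set(categories)))}
    return [codes[c] for c in categories]

def one_hot_encode(categories):
    """One hot encoding: one 0/1 column per category, in sorted order."""
    levels = sorted(set(categories))
    return [[int(c == level) for level in levels] for c in categories]

ages = [20, 30, 40, 50, 60]
fuel = ["petrol", "diesel", "petrol", "cng"]

print(standardize(ages))
print(normalize(ages))                         # [0.0, 0.25, 0.5, 0.75, 1.0]
print(bin_values(ages, [30, 50, math.inf], ["young", "middle", "senior"]))
print(label_encode(fuel))                      # [2, 1, 2, 0]
print(one_hot_encode(fuel))
print([math.log(v) for v in [1, 10, 100]])     # log transformation
```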
5. Feature Selection (to select the best set of features):
    1. Filter Method
    2. Wrapper Method
    3. Embedded Method
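Of the three approaches, the filter method is the simplest to sketch: score every feature against the target on its own, then keep the best ones. The example below uses the absolute Pearson correlation as the score, which is one common choice among several. The house data and function names are made up. Wrapper and embedded methods need a model in the loop and are not shown here.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def filter_select(features, target, top_k):
    """Filter method: keep the top_k features with the strongest |correlation|."""
    scores = {name: abs(pearson(col, target)) for name, col in features.items()}
    return sorted(scores, key=scores.get, reverse=True)[:top_k]

# Hypothetical house data: three candidate features and a price target.
features = {
    "area_sqft": [800, 1000, 1200, 1500, 1800],
    "bedrooms": [2, 2, 3, 3, 4],
    "door_color_code": [3, 1, 2, 1, 3],
}
price = [40, 50, 61, 74, 90]

print(filter_select(features, price, top_k=2))  # ['area_sqft', 'bedrooms']
```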
6. Model Training:
    1. Linear Regression
    2. KNN Regression
    3. Decision Tree Regression
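Training a model means estimating its parameters from the data. For linear regression with a single feature this can be done with the ordinary least squares formulas, sketched below on made-up car data. The function names are hypothetical, and a real project would normally use a library rather than hand-written formulas.

```python
def fit_linear_regression(xs, ys):
    """Ordinary least squares for one feature: returns (slope, intercept)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept

def predict(x, slope, intercept):
    """Apply the fitted line to a new input."""
    return slope * x + intercept

# Hypothetical training data: car age in years vs. price.
# The points lie exactly on the line price = 10 - 1.5 * age.
age = [1, 2, 3, 4]
price = [8.5, 7.0, 5.5, 4.0]

slope, intercept = fit_linear_regression(age, price)
print(slope, intercept)              # -1.5 10.0
print(predict(5, slope, intercept))  # 2.5
```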
7. Model Evaluation:
    Regression:
        1. MSE (Mean Squared Error)
        2. RMSE (Root Mean Squared Error)
        3. MAE (Mean Absolute Error)
        4. R-Squared Value
    Classification:
        1. Confusion Matrix
        2. Classification Report (Precision, Recall, and F1-score)
        3. AUC-ROC Curve
        4. Accuracy
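Most of these metrics are short formulas, so a from-scratch sketch can help show what each one measures. The function names and sample values are hypothetical. The confusion matrix is laid out as rows for actual classes and columns for predicted classes, and the code assumes at least one actual positive and one predicted positive. The AUC-ROC curve is left out because it needs predicted probabilities rather than labels.

```python
import math

def regression_metrics(y_true, y_pred):
    """MSE, RMSE, MAE and R-squared for a list of predictions."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    squared_error = sum(e ** 2 for e in errors)
    mean_true = sum(y_true) / n
    total_variation = sum((t - mean_true) ** 2 for t in y_true)
    mse = squared_error / n
    return {
        "MSE": mse,
        "RMSE": math.sqrt(mse),
        "MAE": sum(abs(e) for e in errors) / n,
        "R2": 1 - squared_error / total_variation,
    }

def classification_metrics(y_true, y_pred, positive=1):
    """Confusion matrix, accuracy, precision, recall and F1 for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = len(pairs) - tp - fp - fn
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "confusion_matrix": [[tn, fp], [fn, tp]],
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": 2 * precision * recall / (precision + recall),
    }

reg = regression_metrics([3.0, 5.0, 7.0, 9.0], [2.0, 5.0, 8.0, 9.0])
clf = classification_metrics([1, 0, 1, 1, 0, 0, 1, 0], [1, 0, 0, 1, 0, 1, 1, 0])
print(reg)
print(clf)
```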
8. Web Development Framework (Python Developer):
    1. Flask: to write APIs
    2. Django
    3. FastAPI
    4. gRPC
9. Deployment:
    1. AWS
    2. GCP
We will discuss each and every step in the next session, so stay tuned. Get well soon :)