DATA SCIENTIST
PYTHON
NUMPY
Arrays (creation, indexing, slicing)
Linear algebra (dot, inv, eig, svd)
Random module (rand, randn, seed)
PANDAS
Indexing, filtering, slicing
Merging, joining, concatenation
Missing values (fillna, dropna)
BASICS
Data types (int, float, string, bool)
Lists, tuples, sets, dicts
Loops, conditionals, functions
List comprehensions, generators
OOP (class, object, inheritance, polymorphism)
Error handling (try, except, finally)
File handling (read, write, append)
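A few of the basics above (list comprehensions, generators, try/except/finally) can be sketched in plain Python. The function names here are illustrative only and are not part of the original diagram.

```python
def squares_of_evens(numbers):
    """List comprehension: builds the full result list in memory."""
    return [n * n for n in numbers if n % 2 == 0]


def running_total(numbers):
    """Generator: yields one cumulative sum at a time, lazily."""
    total = 0
    for n in numbers:
        total += n
        yield total


def safe_divide(a, b, log):
    """Error handling: returns None on division by zero and always logs the call."""
    try:
        result = a / b
    except ZeroDivisionError:
        result = None
    finally:
        # The finally clause runs whether or not an exception was raised.
        log.append(f"divide({a}, {b})")
    return result
```

Calling list(running_total([1, 2, 3])) forces the generator to run to completion; without list() no values are computed until something iterates over it.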
MATPLOTLIB/SEABORN
Line, bar, scatter, histogram, boxplot, heatmap
Styling (colors, labels, titles, legends)
MATHS
CALCULUS
Functions, limits, continuity
Differentiation rules (power, product, quotient, chain)
Gradient, Jacobian, Hessian
Integration (definite, indefinite, substitution, by parts)
PROBABILITY
Random variables (discrete, continuous)
Probability axioms, Bayes' theorem
Expectation, variance, covariance
Joint, marginal, conditional distributions
Law of large numbers, central limit theorem
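The central limit theorem can be illustrated with a small simulation: means of repeated samples from a uniform distribution cluster around 0.5 with a spread close to sigma / sqrt(n). This is a minimal sketch using only the standard library; the sample size, trial count, and seed are arbitrary choices, not values from the diagram.

```python
import random
import statistics


def sample_means(sample_size, trials, seed=0):
    """Draw `trials` samples from Uniform(0, 1) and return each sample's mean."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    return [
        statistics.fmean(rng.random() for _ in range(sample_size))
        for _ in range(trials)
    ]


means = sample_means(sample_size=30, trials=2000)

# Uniform(0, 1) has mean 0.5 and standard deviation sqrt(1/12).
# The CLT predicts the sample means have standard deviation sqrt(1/12) / sqrt(n).
theoretical_sd = (1 / 12) ** 0.5 / 30 ** 0.5

print("mean of sample means:", statistics.fmean(means))
print("observed sd:", statistics.stdev(means))
print("predicted sd:", theoretical_sd)
```

Increasing sample_size shrinks the spread of the sample means, which is the law of large numbers seen from the same simulation.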
LINEAR ALGEBRA
Scalars, vectors, matrices, tensors
Vector addition, scalar multiplication
Dot product, cross product
Matrix addition, multiplication, transpose
Determinant, inverse, rank
Eigenvalues, eigenvectors
STATISTICS
Descriptive statistics (mean, median, mode, std, variance, skewness, kurtosis)
Sampling methods (random, stratified, cluster)
Hypothesis testing (null, alternative, p-value, significance)
Z-test, t-test, chi-square test, ANOVA
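As an illustration of the hypothesis-testing items above, here is a minimal sketch of a two-sided one-sample z-test using only the standard library. It assumes the population standard deviation is known; the function name and the sample data are made up for the example.

```python
from statistics import NormalDist, fmean


def one_sample_z_test(sample, population_mean, population_sd):
    """Return (z, two-sided p-value) for H0: the true mean equals population_mean."""
    n = len(sample)
    standard_error = population_sd / n ** 0.5
    z = (fmean(sample) - population_mean) / standard_error
    # Two-sided p-value: probability of a |z| at least this large under H0.
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical data: eight measurements, tested against a mean of 50 with sd 4.
z, p = one_sample_z_test([52, 48, 55, 51, 49, 53, 50, 54], 50, 4)
print(f"z = {z:.3f}, p = {p:.3f}")
```

If p is above the chosen significance level (commonly 0.05), the null hypothesis is not rejected. When the population standard deviation is unknown, a t-test is the usual substitute.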
MACHINE LEARNING
SUPERVISED LEARNING
REGRESSION
Linear regression (OLS, assumptions, residual analysis)
Gradient Boosting (XGBoost, LightGBM, CatBoost)
CLASSIFICATION
Logistic regression (sigmoid, odds ratio, logit)
Naïve Bayes (Gaussian, Multinomial, Bernoulli)
SVM (linear, kernel trick)
Ensemble methods (bagging, boosting, stacking)
ROC, AUC, Precision, Recall, F1-score
UNSUPERVISED LEARNING
Clustering (k-means, hierarchical, DBSCAN)
Dimensionality reduction (PCA, t-SNE, UMAP)
Association rules (Apriori, FP-growth)
MODEL EVALUATION
Train-test split, cross-validation
Confusion matrix, classification report
Regression metrics (MSE, RMSE, MAE, R²)
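The regression metrics listed above can be computed directly from their definitions. The sketch below uses plain Python rather than a library; the function name and sample values are illustrative, and it assumes y_true is not constant (otherwise R² is undefined).

```python
def regression_metrics(y_true, y_pred):
    """Compute MSE, RMSE, MAE and R^2 from paired true and predicted values."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]

    mse = sum(e * e for e in errors) / n          # mean squared error
    rmse = mse ** 0.5                             # same units as the target
    mae = sum(abs(e) for e in errors) / n         # mean absolute error

    mean_true = sum(y_true) / n
    ss_res = sum(e * e for e in errors)           # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    r2 = 1 - ss_res / ss_tot

    return {"mse": mse, "rmse": rmse, "mae": mae, "r2": r2}


metrics = regression_metrics([3, 5, 7, 9], [2, 5, 8, 9])
print(metrics)
```

RMSE penalizes large errors more heavily than MAE, so comparing the two gives a rough sense of whether a few large misses dominate the error.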
DATA PREPROCESSING
Handling missing values (mean, median, mode, interpolation, deletion)
Encoding categorical variables (LabelEncoder, OneHotEncoder, OrdinalEncoder, Target encoding)
Scaling (StandardScaler, MinMaxScaler, RobustScaler, Normalization)
Outlier detection (Z-score, IQR, IsolationForest)
Feature engineering (polynomial features, interaction terms, log transform, binning)
Feature selection (filter, wrapper, embedded methods)
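Two of the preprocessing steps above, scaling and Z-score outlier detection, can be sketched without any library. The functions below are illustrative stand-ins for what MinMaxScaler and StandardScaler compute on a single column; they assume the input is not constant (a constant column would divide by zero), and the threshold of 3 is a common convention rather than a value from the diagram.

```python
import statistics


def min_max_scale(values):
    """Rescale values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


def standardize(values):
    """Rescale values to mean 0 and (population) standard deviation 1."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return [(v - mean) / sd for v in values]


def z_score_outliers(values, threshold=3.0):
    """Return the values whose absolute z-score exceeds the threshold."""
    return [v for v, z in zip(values, standardize(values)) if abs(z) > threshold]


print(min_max_scale([1, 2, 3, 4, 5]))
print(z_score_outliers([10] * 20 + [100]))
```

Because the mean and standard deviation are themselves pulled by extreme values, Z-score detection can miss outliers in small samples; the IQR rule is less sensitive to that.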
SQL
ORDER BY, GROUP BY, HAVING
JOINs (INNER, LEFT, RIGHT, FULL)
Window functions (ROW_NUMBER, RANK, PARTITION BY)
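The JOIN, GROUP BY, HAVING, and ORDER BY items above can be tried out against an in-memory SQLite database from Python. This is a minimal sketch: the customers and orders tables, their columns, and the rows are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
INSERT INTO customers VALUES (1, 'Ada'), (2, 'Grace'), (3, 'Linus');
INSERT INTO orders VALUES (1, 1, 50.0), (2, 1, 70.0), (3, 2, 20.0);
""")

# LEFT JOIN keeps customers with no orders; HAVING filters after aggregation.
rows = conn.execute("""
SELECT c.name,
       COUNT(o.id) AS n_orders,
       COALESCE(SUM(o.amount), 0) AS total
FROM customers AS c
LEFT JOIN orders AS o ON o.customer_id = c.id
GROUP BY c.id, c.name
HAVING COALESCE(SUM(o.amount), 0) < 100
ORDER BY total DESC
""").fetchall()

for name, n_orders, total in rows:
    print(name, n_orders, total)

conn.close()
```

Switching LEFT JOIN to INNER JOIN would drop the customer with no orders, which is the practical difference between the two join types.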