Python ML
Dev. Rules
Main
- Don't be afraid to launch a product w/o ML
- Make metrics design and implementation a priority
- Choose ML over complex heuristics
- Keep the first model simple and get the infrastructure right
- Detect problems before exporting models (check model performance before moving to production)
- Don’t overthink which objective you choose to directly optimize
- Make it Simple, Observable and Attributable
- Rule #29: The best way to make sure that you train like you serve is to save the set of features used at serving time, and then pipe those features to a log to use them at training time.
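A minimal sketch of this feature-logging idea in Python (extract_features, serve_and_log, and the JSON-lines log are illustrative assumptions, not actual production tooling):

import json
import time

def extract_features(raw_request):
    # Hypothetical featurizer; in practice this is the real feature pipeline.
    title = raw_request.get("title", "")
    return {"title_len": len(title), "num_words": len(title.split())}

def serve_and_log(model, raw_request, log_path="serving_features.jsonl"):
    # Compute features once at serving time, log them, then predict,
    # so training can later replay exactly what the server saw.
    features = extract_features(raw_request)
    with open(log_path, "a") as f:
        f.write(json.dumps({"ts": time.time(), **features}) + "\n")
    return model.predict([list(features.values())])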
ML Workflow
- Step 2: Explore Your Data
Metrics
Number of samples per class (topic/category): in a balanced dataset, all classes have a similar number of samples; in an imbalanced dataset, the number of samples per class varies widely.
Frequency distribution of words: the number of occurrences of each word in the dataset.
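Both metrics are quick to compute; a sketch with pandas and collections.Counter (the toy DataFrame and column names are made up):

from collections import Counter
import pandas as pd

df = pd.DataFrame({"text": ["spam spam eggs", "eggs ham", "spam ham ham"],
                   "label": ["spam", "ham", "ham"]})

# Number of samples per class: a skewed distribution signals imbalance.
print(df["label"].value_counts())

# Frequency distribution of words across the whole dataset.
word_counts = Counter(word for doc in df["text"] for word in doc.split())
print(word_counts.most_common(5))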
- Step 2.5: Choose a Model
“How do we present the text data to an algorithm that expects numeric input?” (this is called data preprocessing and vectorization).
Categories
sequence models
- convolutional neural networks (CNNs)
- recurrent neural networks (RNNs)
n-gram models
- logistic regression
- simple multi-layer perceptrons (MLPs / fully-connected neural networks)
- gradient boosted trees
- support vector machines
- Step 3: Prepare Your Data (MLP)
Vectorization
One-hot encoding: [1,0,1,1,0,1,0]
Count encoding: [1,0,1,2,3,1,0]
Tf-idf encoding: term frequency weighted by inverse document frequency, tf(t, d) * log(N / df(t)), where N is the number of samples and df(t) the number of samples containing term t; e.g. [0.33, 0, 0.23, 0.45]
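A sketch of the three encodings with scikit-learn (CountVectorizer with binary=True gives the one-hot-style presence vectors; the two-sentence corpus is made up):

from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

corpus = ["the mouse ran up the clock", "the mouse ran down"]

# One-hot encoding: 1 if the token appears in the sample, else 0.
onehot = CountVectorizer(binary=True).fit_transform(corpus)

# Count encoding: raw occurrence counts per token.
counts = CountVectorizer().fit_transform(corpus)

# Tf-idf encoding: counts re-weighted by inverse document frequency.
tfidf = TfidfVectorizer().fit_transform(corpus)

print(onehot.toarray(), counts.toarray(), tfidf.toarray(), sep="\n")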
Feature selection
Keep the top ~20,000 features; using more than that tends not to improve accuracy (sketched below, together with normalization)
Normalization
Objective: convert all feature/sample values to small and similar ranges. This simplifies gradient descent convergence in learning algorithms
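Both steps can be sketched with scikit-learn (SelectKBest with the f_classif score and unit-norm scaling are assumptions; the 20,000 cap follows the note above):

from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.preprocessing import normalize

TOP_K = 20000  # beyond ~20,000 features, accuracy tends not to improve

def select_and_normalize(x_train, y_train):
    # Keep only the k best features by ANOVA F-score.
    k = min(TOP_K, x_train.shape[1])
    selector = SelectKBest(f_classif, k=k).fit(x_train, y_train)
    x_selected = selector.transform(x_train)
    # Scale each sample to unit norm so values are small and similar.
    return normalize(x_selected)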
Options for sequence (word) models
Word embeddings
As a result, we can represent word tokens in a dense vector space (~a few hundred real numbers), where the location of and distance between words indicate how semantically similar they are. This representation is called word embeddings.
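In Keras this dense vector space is an Embedding layer; a minimal sketch (the vocabulary size, dimension, and toy architecture are placeholders):

from tensorflow import keras

VOCAB_SIZE = 20000  # number of distinct tokens
EMBED_DIM = 200     # "a few hundred" real numbers per word

model = keras.Sequential([
    # Each integer token id becomes a learned 200-dim dense vector;
    # semantically similar words end up close together in this space.
    keras.layers.Embedding(input_dim=VOCAB_SIZE, output_dim=EMBED_DIM),
    keras.layers.GlobalAveragePooling1D(),
    keras.layers.Dense(1, activation="sigmoid"),
])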
- Step 4: Build, Train, and Evaluate Your Model
Fine-tuned embeddings: allow the embedding layer to keep learning, making fine adjustments to all weights in the network
Options for sequence models
sepCNN (depthwise separable CNN) performed best in the tests (the others were CNN, RNN, and CNN-RNN)
Train
Loss function: A function that is used to calculate a loss value that the training process then attempts to minimize by tuning the network weights. For classification problems, cross-entropy loss works well.
Optimizer: A function that decides how the network weights will be updated based on the output of the loss function. We used the popular Adam optimizer in our experiments.
We repeat training over the dataset for a predetermined number of epochs. We can optimize this by stopping early, when the validation accuracy stabilizes between consecutive epochs, showing that the model is no longer improving.
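Wiring loss, optimizer, and early stopping together in Keras might look like this sketch (binary classification and the patience value are assumptions):

from tensorflow import keras

model = keras.Sequential([keras.layers.Dense(1, activation="sigmoid")])
model.compile(optimizer="adam",             # Adam optimizer
              loss="binary_crossentropy",   # cross-entropy loss
              metrics=["accuracy"])

# Stop early once validation accuracy stops improving between epochs.
early_stop = keras.callbacks.EarlyStopping(monitor="val_accuracy",
                                           patience=2,
                                           restore_best_weights=True)

# model.fit(x_train, y_train, epochs=100,
#           validation_data=(x_val, y_val), callbacks=[early_stop])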
- Step 5: Tune Hyperparameters
- Step 6: Deploy Your Model
Algorithms
Regression
Main Algorithms
Linear, Polynomial, Lasso, Stepwise, Ridge
Classification
Main Algorithms
Decision Tree (ID3, C4.5, C5.0)
K-nearest neighbor
Use held-out test data to choose k, increasing it until accuracy stops improving
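A sketch of that search with scikit-learn (the synthetic data and the k range of 1-20 are illustrative):

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=300, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=0)

# Increase k and keep the value with the best held-out accuracy.
best_k, best_acc = 1, 0.0
for k in range(1, 21):
    acc = KNeighborsClassifier(n_neighbors=k).fit(X_train, y_train).score(X_val, y_val)
    if acc > best_acc:
        best_k, best_acc = k, acc
print(best_k, best_acc)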
Evaluation Metrics
Log loss
Used when the model outputs the probability of a class label instead of a hard label
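scikit-learn's log_loss scores those predicted probabilities directly; a tiny example with made-up values:

from sklearn.metrics import log_loss

y_true = [0, 1, 1, 0]
y_prob = [0.1, 0.9, 0.8, 0.3]  # predicted probability of class 1, not hard labels
print(log_loss(y_true, y_prob))  # lower is better; confident mistakes cost the most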
Decision Trees
Decision trees work by testing an attribute and branching the cases based on the result of the test
Entropy: the amount of information disorder, or the amount of randomness in the data
Recursively compute the information gain, e.g. Gain(S, "Sex"), and split on the attribute that best reduces entropy
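A small sketch of entropy and information gain on a toy DataFrame (the Sex/Survived columns are illustrative):

import math
import pandas as pd

def entropy(labels):
    # H(S) = -sum(p * log2(p)) over the class proportions.
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in labels.value_counts())

def information_gain(S, attribute, target="Survived"):
    # Gain(S, A) = H(S) - sum_v (|S_v| / |S|) * H(S_v)
    weighted = sum(len(part) / len(S) * entropy(part[target])
                   for _, part in S.groupby(attribute))
    return entropy(S[target]) - weighted

S = pd.DataFrame({"Sex": ["M", "F", "F", "M"], "Survived": [0, 1, 1, 0]})
print(information_gain(S, "Sex"))  # split on the attribute with the highest gain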
Logistic Regression
Predicts a class, not a number as linear regression does
Binary categories: 0/1, true/false
Applications
- You need the probability of your prediction
- Your data is linearly separable (the decision boundary of logistic regression is a line, a plane, or a hyperplane)
- You need to understand the impact of the features
Definition: logistic regression fits a special S-shaped curve by taking the linear regression output and transforming the numeric estimate into a probability with the sigmoid function:
sigmoid(z) = 1 / (1 + e^(-z)), where z is the regression result
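Written out as code, a minimal NumPy sketch (the coefficients and sample are made up):

import numpy as np

def sigmoid(z):
    # Maps any real number into (0, 1).
    return 1.0 / (1.0 + np.exp(-z))

theta = np.array([0.5, -1.2, 2.0])  # learned coefficients (illustrative)
x = np.array([1.0, 0.3, 0.8])       # one sample's features
probability = sigmoid(theta @ x)    # linear regression result, squashed to P(class = 1 | x)
print(probability)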
SVM
Process
- SVM works by first mapping the data to a high-dimensional feature space so that the data points can be categorized
- A separator is then estimated for the data (a hyperplane; e.g. a plane in 3D space)
Kernelling: the procedure of mapping data into a higher-dimensional space
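With scikit-learn the kernel is one argument to SVC; a sketch on synthetic, non-linearly-separable data:

from sklearn.datasets import make_circles
from sklearn.svm import SVC

# Concentric circles: not separable by a line in 2D.
X, y = make_circles(noise=0.1, factor=0.4, random_state=0)

# The RBF kernel implicitly maps the points into a higher-dimensional
# space where a separating hyperplane exists.
clf = SVC(kernel="rbf").fit(X, y)
print(clf.score(X, y))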
Python
Packages
Pandas
hist(bins=n): set the number of bins to control the interval width (fewer bins reduce granularity)
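For example (the column and bin count are made up):

import pandas as pd

df = pd.DataFrame({"age": [22, 25, 31, 35, 41, 47, 52, 58, 63]})

# Fewer bins -> wider intervals -> a coarser, smoother histogram.
df["age"].hist(bins=3)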