Generative AI

- - - - Transformers are neural network architectures designed to process sequential data such as text or time series
        
        Key features
        
        Self-Attention Mechanism: For each token, assigns an attention weight to every other token in the input, so the model focuses on the parts of the sequence most relevant to that token
        
        Parallelism: Processes all tokens of a sequence at once, which enables faster training than sequential models such as RNNs
        
        Scalability: Works well with very large datasets and models
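
As a rough illustration of the self-attention feature above, here is a minimal sketch of scaled dot-product attention in plain Python. It assumes the token vectors are used directly as queries, keys and values; a real transformer would first apply learned projection matrices, which are omitted here. The function names `softmax`, `dot` and `self_attention` are illustrative, not taken from any library.

```python
import math

def softmax(scores):
    """Turn raw scores into weights that are positive and sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def self_attention(queries, keys, values):
    """Scaled dot-product attention over lists of equal-length vectors."""
    d_k = len(keys[0])
    outputs, all_weights = [], []
    for q in queries:
        # One score per (query token, key token) pair, scaled by sqrt(d_k).
        scores = [dot(q, k) / math.sqrt(d_k) for k in keys]
        weights = softmax(scores)
        # Each output is a weighted sum of the value vectors.
        out = [sum(w * v[i] for w, v in zip(weights, values))
               for i in range(len(values[0]))]
        outputs.append(out)
        all_weights.append(weights)
    return outputs, all_weights

# Toy input: three tokens with 2-dimensional vectors.
# Assumption: the same vectors serve as queries, keys and values.
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
outputs, weights = self_attention(tokens, tokens, tokens)
```

Each row of `weights` holds one token's attention over all tokens, so every row sums to 1.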
        
        Steps
        
        Tokenise the words: Convert words into numbers, for example each word's index in a dictionary (vocabulary)
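
The tokenisation step can be sketched with a toy word-level vocabulary. This is only an illustration: the helpers `build_vocab` and `tokenise` and the `<unk>` placeholder are hypothetical, and production LLMs typically use subword tokenisers rather than whole words.

```python
def build_vocab(corpus):
    """Assign each distinct word an integer ID in order of first appearance."""
    vocab = {"<unk>": 0}  # ID 0 is reserved for unknown words (an assumption)
    for sentence in corpus:
        for word in sentence.lower().split():
            vocab.setdefault(word, len(vocab))
    return vocab

def tokenise(text, vocab):
    """Map each word to its ID, falling back to <unk> for unseen words."""
    return [vocab.get(word, vocab["<unk>"]) for word in text.lower().split()]

vocab = build_vocab(["the cat sat", "the dog ran"])
ids = tokenise("the dog sat quietly", vocab)
print(ids)  # [1, 4, 3, 0]
```

The word "quietly" is not in the toy vocabulary, so it maps to the unknown-word ID 0.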
        
        Encoder
        
        Encodes input "prompts" with contextual understanding and produces one vector per input token.
        
        Embedding layer: Maps each token to a vector in a high-dimensional embedding space, where each token occupies a unique location
        
        These embeddings serve as the initial input to the transformer, capturing semantic and syntactic information about the tokens in a numerical format that the model can process
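
An embedding layer can be pictured as a lookup table with one row per token ID. The sketch below assumes a randomly initialised table purely for illustration; in a trained model the rows are learned parameters. The names `make_embedding_table` and `embed` are hypothetical.

```python
import random

def make_embedding_table(vocab_size, dim, seed=0):
    """Create one vector of length `dim` per token ID (random here, learned in practice)."""
    rng = random.Random(seed)
    return [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(vocab_size)]

def embed(token_ids, table):
    """Look up the vector for each token ID."""
    return [table[i] for i in token_ids]

table = make_embedding_table(vocab_size=6, dim=4)
vectors = embed([1, 4, 3], table)  # one 4-dimensional vector per input token
```

The same token ID always returns the same row, which is what gives each token a fixed location in the embedding space.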
        
        Pass the resulting vectors to the self-attention layers
        
        Multi-head self-attention mechanism: The self-attention is multi-headed, and each head learns a different aspect of language. For example, one head may capture the relationships between the people in a sentence, another may focus on the activity described, and yet another may pick up on some other property, such as whether the words rhyme
        
        Feed-forward network: Applies the same fully connected layers to each position separately and identically
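
To make "each position separately and identically" concrete, the sketch below applies one small two-layer network to every token vector in a sequence, reusing the same weights each time. The layer sizes, the hand-picked weight values and the ReLU activation are assumptions chosen for readability.

```python
def linear(x, weights, bias):
    """Fully connected layer: one output per row of `weights`."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]

def relu(x):
    return [max(0.0, v) for v in x]

def feed_forward(sequence, w1, b1, w2, b2):
    """Apply the same two-layer network to every position independently."""
    return [linear(relu(linear(x, w1, b1)), w2, b2) for x in sequence]

# Toy weights (assumed values): 2 -> 3 hidden units -> 2 outputs.
w1 = [[1.0, -1.0], [0.5, 0.5], [-1.0, 1.0]]
b1 = [0.0, 0.0, 0.0]
w2 = [[1.0, 0.0, 1.0], [0.0, 2.0, 0.0]]
b2 = [0.0, 0.0]

sequence = [[1.0, 2.0], [3.0, 1.0], [1.0, 2.0]]
outputs = feed_forward(sequence, w1, b1, w2, b2)
print(outputs)  # [[1.0, 3.0], [2.0, 4.0], [1.0, 3.0]]
```

Positions 0 and 2 hold the same input vector and therefore get the same output, because the network does not look at neighbouring positions.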
        
        Decoder: Accepts input tokens and generates new tokens
        
        multi-head self-attention mechanism
        
        feed-forward network
        
        Softmax layer: Outputs a probability distribution across the entire dictionary of words that the model uses
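
The softmax step can be sketched as follows. The four-word vocabulary and the raw scores (logits) are made-up values for illustration; a real model produces one score per entry in a vocabulary of many thousands of tokens.

```python
import math

def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

vocab = ["cat", "dog", "sat", "ran"]  # toy vocabulary (assumed)
logits = [2.0, 1.0, 0.1, -1.0]        # toy decoder scores (assumed)

probs = softmax(logits)
best = vocab[probs.index(max(probs))]  # most probable next word
```

Higher logits receive higher probabilities, so `best` is the word with the largest score.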
        
        Types
        
        Encoder only: Sentiment Analysis, Named entity recognition, Word classification
        
        Decoder only: Text generation
        
        Encoder-Decoder: Translation, Summarisation, Question answering.
  - - - classify an image as a cat or a dog
  - - - generate an image of a dog
- - - - Given an input called a prompt, an LLM generates text that completes it, and the completion can be different every time!
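
One common reason completions vary is that the next token is sampled from the model's probability distribution rather than always taking the most likely one. The sketch below assumes a temperature parameter that controls how random the sampling is; the function `sample_next` and the toy logits are hypothetical.

```python
import math
import random

def sample_next(logits, temperature=1.0, rng=random):
    """Sample a token index from softmax(logits / temperature)."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    weights = [math.exp(z - m) for z in scaled]  # unnormalised probabilities
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]

logits = [2.0, 1.0, 0.1, -1.0]  # toy scores for four candidate tokens (assumed)

rng = random.Random(0)
varied = [sample_next(logits, temperature=1.0, rng=rng) for _ in range(200)]
focused = [sample_next(logits, temperature=0.01, rng=rng) for _ in range(200)]
```

At temperature 1.0 the samples spread across several tokens, while a very low temperature makes the sampler pick the highest-scoring token almost every time.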