AI Practitioner AIF-C01
-
-
-
Images
Diffusion Model
-
-
Runs the process in reverse: it starts from heavy noise and then progressively removes noise until the image emerges
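A toy numeric sketch of that reverse (denoising) loop, assuming a made-up 8x8 "image" and a simple blend in place of a learned noise predictor:
```python
import numpy as np

# Toy illustration of the reverse diffusion idea (not a real model):
# start from pure noise and remove a little noise at every step.
rng = np.random.default_rng(0)
target = rng.random((8, 8))      # stands in for the "true" image
x = rng.normal(size=(8, 8))      # step 0: pure noise

steps = 50
for t in range(steps):
    # a real model predicts the noise to remove; here we just blend toward the target
    x = x + (target - x) / (steps - t)

print(np.allclose(x, target))    # True: the noise is gone after the last step
```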
Bedrock
-
-
-
-
security, privacy, governance and responsible AI features
-
Knowledge Bases
Pulls answers from other data sources at query time, without retraining the model; contrast with fine-tuning below
Fine-tuning
-
-
-
-
-
Use cases
Chatbot with a particular persona, a particular tone, or a particular purpose
-
-
-
-
-
Evaluating a Model
Automatic evaluation
You have a set of benchmark questions and you prepare the best (ideal) expected answers for them. Then you evaluate the model with these benchmark questions and the model generates its own answers. Automatic scoring models, called judge models, compare the two sets of answers and produce a grading score
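A minimal sketch of that loop, with difflib as a stand-in scorer instead of a real judge model (the benchmark entry and the model answer are made up):
```python
from difflib import SequenceMatcher

# Automatic evaluation sketch: compare model answers against reference
# answers from a benchmark. A real setup uses a judge model or a metric
# such as BERTScore; SequenceMatcher is only a stand-in scorer.
benchmark = [
    ("What is Amazon Bedrock?", "A managed service to build with foundation models."),
]
model_answers = ["A fully managed service for building apps with foundation models."]

for (question, reference), answer in zip(benchmark, model_answers):
    score = SequenceMatcher(None, reference.lower(), answer.lower()).ratio()
    print(f"{question} -> grading score: {score:.2f}")
```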
-
Benchmark Datasets
-
• Wide range of topics, complexities, linguistic phenomena
• Helpful to measure: accuracy, speed and efficiency, scalability
• Some benchmark datasets allow you to very quickly detect any kind of bias or potential discrimination against a group of people
-
Human evaluation
Same method as automatic evaluation, but at the end there is no judge model; instead, a group of employees or other people evaluate the answers against the benchmark and decide whether they are good or not
-
You are developing a model and want to ensure the outputs are adapted to your users. Which method do you recommend?
-
RAG and Knowledge Bases
-
Vector Databases - GRAB
-
-
-
-
-
How it works
The document sits in S3 split into chunks; each chunk is passed to an embeddings model (Amazon Titan, Cohere) that converts it into vectors, and from those vectors the vector database is built in OpenSearch or another DB. This process makes the documents highly searchable
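A hedged boto3 sketch of the embedding step (model ID, region, and the sample chunk are assumptions; writing the vector into OpenSearch is left out):
```python
import boto3, json

# Turn one document chunk into an embedding with Amazon Titan via Bedrock;
# the resulting vector would then be stored in the vector database.
bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

chunk = "Amazon Bedrock is a managed service for foundation models."
response = bedrock.invoke_model(
    modelId="amazon.titan-embed-text-v1",
    body=json.dumps({"inputText": chunk}),
)
embedding = json.loads(response["body"].read())["embedding"]
print(len(embedding))  # vector dimension returned by the embeddings model
```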
-
-
-
Guardrails
Block topics: the model replies that it won't talk about that; filter harmful or undesirable content
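A hedged sketch of defining such a guardrail with the Bedrock create_guardrail API (field names as I understand them; the topic and the refusal messages below are made up):
```python
import boto3

# Define a guardrail that denies one topic and returns a canned refusal.
# Verify field names against the current boto3 docs before relying on this.
bedrock = boto3.client("bedrock", region_name="us-east-1")

bedrock.create_guardrail(
    name="no-investment-advice",
    topicPolicyConfig={
        "topicsConfig": [
            {
                "name": "InvestmentAdvice",
                "definition": "Requests for personalized financial or investment advice.",
                "type": "DENY",
            }
        ]
    },
    blockedInputMessaging="Sorry, I can't discuss that topic.",
    blockedOutputsMessaging="Sorry, I can't discuss that topic.",
)
```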
Agents
Perform multi-step tasks related to infrastructure, provisioning, application deployment, and operational activities
-
For example, an agent can have tasks such as checking purchase history, giving recommendations, placing an order, etc.
Pricing
-
-
Cost of each technique
-
RAG: a bit more expensive than prompt engineering
Fine-tuning: more expensive than RAG
Domain-adaptation fine-tuning: the most expensive
Continued Pre-training
-
• Also called domain-adaptation fine-tuning, to make a model expert in a specific domain
-
• Good to feed industry-specific terminology into a model (acronyms, etc…)
-
Concepts - GRAB
-
-
Embeddings
Create vectors for text, images, or audio
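A toy sketch of why that matters: similar items end up with nearby vectors (the vectors below are made up, not real embeddings):
```python
import numpy as np

# Embeddings map text (or images/audio) to vectors; similar items land close together.
cat = np.array([0.9, 0.1, 0.0])
kitten = np.array([0.85, 0.15, 0.05])
car = np.array([0.0, 0.2, 0.95])

def cosine(a, b):
    # cosine similarity between two vectors
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

print(cosine(cat, kitten))  # high similarity
print(cosine(cat, car))     # low similarity
```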
What type of generative AI can recognize and interpret various forms of input data, such as text, images, and audio?
-
-
-
-
Prompt Latency
-
This speed depends mostly on the model: how big it is, and the number of input and output tokens
Latency is NOT impacted by Top P, Top K, or Temperature
-
Amazon Q Business
-
-
Plugins: e.g. Jira, ServiceNow, Salesforce, etc.
-
-
Admin controls
Same idea as guardrails. For example, if someone asks about video games, the answer is that it's a restricted topic, not business-related, etc.
-
Amazon Q Developer
You can ask it questions about your current infrastructure, e.g. how many Lambdas do I have
Generates code in Java, JavaScript, Python, TypeScript, C#...
-
-
-
-
Comprehend
-
-
Extracts key phrases, places, people, brands
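A minimal boto3 sketch (the sample sentence and region are assumptions):
```python
import boto3

# Detect entities (people, places, brands) and key phrases with Comprehend.
comprehend = boto3.client("comprehend", region_name="us-east-1")
text = "Amazon opened a new office in Seattle, led by Andy Jassy."

entities = comprehend.detect_entities(Text=text, LanguageCode="en")
phrases = comprehend.detect_key_phrases(Text=text, LanguageCode="en")

print([(e["Text"], e["Type"]) for e in entities["Entities"]])  # people, places, brands
print([p["Text"] for p in phrases["KeyPhrases"]])              # key phrases
```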
-
Custom classification
-
Supports text, PDF, Word, images
-
-
Transcribe
-
-
-
Custom vocabularies
Add specific words and phrases, e.g. technical domain terms and acronyms
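A hedged boto3 sketch of creating one (vocabulary name, language, and phrases are made up):
```python
import boto3

# Create a custom vocabulary so Transcribe recognizes domain-specific
# terms and acronyms during transcription.
transcribe = boto3.client("transcribe", region_name="us-east-1")

transcribe.create_vocabulary(
    VocabularyName="medical-terms",
    LanguageCode="en-US",
    Phrases=["EHR", "tachycardia", "A.W.S."],
)
```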
Polly
Text to speech; the opposite of Transcribe
-
-
Voice engine: whether it's neural, standard, etc.
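A minimal boto3 sketch (voice, engine, region, and output file name are assumptions):
```python
import boto3

# Convert text to speech with Polly and save the audio to a file.
polly = boto3.client("polly", region_name="us-east-1")

response = polly.synthesize_speech(
    Text="Hello from Amazon Polly.",
    OutputFormat="mp3",
    VoiceId="Joanna",
    Engine="neural",
)
with open("speech.mp3", "wb") as f:
    f.write(response["AudioStream"].read())
```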
Amazon Kendra
-
• Extract answers from within a document (text, pdf, HTML, PowerPoint, MS Word, FAQs…)
-
-
• Ability to manually fine-tune search results (importance of data, freshness, custom, …)
Amazon Mechanical Turk
-
-
Example
• You have a dataset of 10,000,000 images and you want to
-
-
-
-
-
-
-
-
Amazon’s Hardware for AI
GPU-based EC2 Instances (P3, P4, P5…, G3…G6…)
-
-
-
Algorithms
-
-
-
Image Processing
classification, detection
-
-
Data Wrangler
-
Uses ML features
e.g. a music dataset: song ratings, listening durations, etc.
feature store
In machine learning, a feature is data that's used as the input for ML models to make predictions.
SageMaker Clarify
-
-
Automatically evaluate FMs for your generative AI use case with metrics such as accuracy, robustness, and toxicity to support your responsible AI initiative.
Ground Truth
Used for RLHF: you have people review the datasets and assign the appropriate labels. With Ground Truth, you can use workers from Amazon Mechanical Turk, a vendor company that you choose, or an internal, private workforce, along with machine learning, to create a labeled dataset.
ML Governance
Model Cards: create and view documentation with information about the model, its intended users, and risks
-
Model Dashboard: centralized repository with information and insights about models
-
SageMaker Pipelines
A workflow that automates the process of building, training, and deploying a ML model
-
Helps you easily build, train, test, and deploy 100s of models automatically
Iterate faster, reduce errors (no manual steps), repeatable mechanisms…
-
SageMaker JumpStart
• ML Hub to find pre-trained Foundation Model (FM), computer vision models, or natural language processing models
• Large collection of models from Hugging Face, Databricks, Meta, Stability AI…
-
-
• Pre-built ML solutions for demand forecasting, credit rate prediction, fraud detection and computer vision
-
Responsible AI, Security, Compliance and Governance for AI Solutions
-
-
Interpretability
Linear regression is highly interpretable: it's easy to understand how it works, but it has low performance (it isn't effective with little data, so you need a lot of it).
Neural networks, by contrast, are poor in interpretability: it's very hard to understand how they work and what each layer does, but they have high performance and work very well.
Explainability
Being able to look at inputs and outputs and explain without understanding exactly how the model came to the conclusion
-
-
Challenges of Gen AI
Toxicity
Generating content that is offensive, disturbing, or inappropriate
-
Hallucinations
Assertions or claims that sound true, but are incorrect
-
Prompt Misuses - GRAB
Poisoning
-
• Leads to the model producing biased, offensive, or harmful outputs (intentionally or unintentionally)
-
-
Prompt Leaking
Prompts that expose protected data or data used by the model; the example was asking it to show a previous prompt, and it revealed a prompt written by another user
Jailbreaking
They ask it many, many questions about how to do this or that, and at the end they ask how to make a bomb and it answers
Governance
-
-
Data Management Concepts
• Data Lifecycles – collection, processing, storage, consumption, archival
• Data Logging – tracking inputs, outputs, performance metrics, system events
• Data Residency – where the data is processed and stored (regulations, privacy requirements, proximity of compute and data)
• Data Monitoring – data quality, identifying anomalies, data drift
• Data Analysis – statistical analysis, data visualization, exploration
• Data Retention – regulatory requirements, historical data for training, cost
Data Lineage
• Source Citation
-
• Datasets, databases, other sources
• Relevant licenses, terms of use, or permissions
-
-
• Helpful for transparency, traceability and accountability
-
-
-
-
AWS CloudTrail
• Provides governance, compliance and audit for your AWS Account
-
Summary
• IAM Users – mapped to a physical user, has a password for AWS Console
-
-
-
-
• AWS Lambda – serverless, Function as a Service, seamless scaling
-
-
-
-
• Inspector – find software vulnerabilities in EC2, ECR Images, and Lambda functions
-
• Artifact – get access to compliance reports such as PCI, ISO, etc…
• Trusted Advisor – to get insights, Support Plan adapted to your needs
Deep Learning
-
Neural Network
-
Connections are created and removed; the nodes communicate with each other to decide which data to pass (or not) to the next layer
-
-
Multi-modal model
Can process and combine text, audio, and images at the same time
Training data
Labeled data, e.g. an image labeled as dog or cat. This is for supervised learning
-
-
Training
-
Validation set: used to tune model parameters and validate performance; 10-20% of the dataset
-
Feature Engineering
Extract and transform raw data; for example, if I have the date of birth, it's better to convert it to an integer holding the age.
For unstructured data: e.g. convert text into numerical features using techniques such as embeddings
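A small pandas sketch of the date-of-birth example (the dates and the reference date are made up):
```python
import pandas as pd

# Simple feature-engineering step: turn a raw date of birth into an age
# feature the model can actually use.
df = pd.DataFrame({"date_of_birth": ["1990-05-01", "1984-11-23"]})
df["date_of_birth"] = pd.to_datetime(df["date_of_birth"])
df["age"] = (pd.Timestamp("2024-01-01") - df["date_of_birth"]).dt.days // 365
print(df[["date_of_birth", "age"]])
```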
-
-
-
-
-
Model Fit - GRAB
• In case your model has poor performance, you need to look at its fit
-
• Underfitting: model performs poorly on training data; could be a problem of a model that is too simple or of poor data features
-
Bias and Variance - GRAB
• Bias: difference or error between predicted and actual value; occurs due to a wrong choice in the ML process
• High bias: the model doesn't closely match the training data; example: a linear regression function on a non-linear dataset; considered underfitting
-
-
Variance
How much the performance of a model changes if trained on a different dataset with a similar distribution
Binary Classification
Confusion matrix
-
Metrics
-
-
• F1 Score – Best when you want a balance between precision and recall, especially in imbalanced datasets
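A quick scikit-learn sketch of the confusion matrix and these metrics on made-up labels:
```python
from sklearn.metrics import confusion_matrix, precision_score, recall_score, f1_score

# Toy binary classification example to show the confusion matrix and the
# precision / recall / F1 trade-off.
y_true = [1, 0, 1, 1, 0, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0, 1, 0]

print(confusion_matrix(y_true, y_pred))  # rows: actual, columns: predicted
print("precision:", precision_score(y_true, y_pred))
print("recall:   ", recall_score(y_true, y_pred))
print("f1:       ", f1_score(y_true, y_pred))
```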
-
-
-
-
-
Hyperparameter
• Hyperparameter:
-
-
• Examples: learning rate, batch size, number of epochs, and regularization
• Hyperparameter tuning:
-
• Improves model accuracy, reduces overfitting, and enhances generalization
• How to do it?
• Grid search, random search
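A minimal scikit-learn sketch of grid search (the dataset and the parameter grid are just for illustration):
```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.linear_model import LogisticRegression

# Try a few hyperparameter values with cross-validation and keep the best combination.
X, y = load_iris(return_X_y=True)
param_grid = {"C": [0.01, 0.1, 1, 10], "max_iter": [200, 500]}

search = GridSearchCV(LogisticRegression(), param_grid, cv=5)
search.fit(X, y)
print(search.best_params_, search.best_score_)
```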
-
-