AI-102
Build a question answering solution
Implement multi-turn conversation
you might need to ask follow-up questions to elicit more information from a user before presenting a definitive answer.
You can enable multi-turn responses when importing questions and answers from an existing web page or document based on its structure
When you define a follow-up prompt, you can link to an existing answer in the knowledge base or define a new answer specifically for the follow-up.
Or you can explicitly define follow-up prompts and responses for existing question and answer pairs.
Test and publish a knowledge base
Testing a knowledge base
You can test your knowledge base interactively in Language Studio, submitting questions and reviewing the answers that are returned. You can inspect the results to view their confidence scores as well as other potential answers.
Deploying a knowledge base
you can deploy it to a REST endpoint that client applications can use to submit questions and receive answers.
Create a knowledge base
you can use the REST API or SDK to write code
it is more common to use the Language Studio web interface to define and manage a knowledge base.
To create a knowledge base:
Create a Language resource in your Azure subscription.
Enable the question answering feature.
Create or select an Azure Cognitive Search resource to host the knowledge base index.
In Language Studio, select the Language resource and create a Custom question answering project
Name the knowledge base.
Add one or more data sources to populate the knowledge base:
URLs for web pages containing FAQs.
Files containing structured text from which questions and answers can be derived
Pre-defined chit-chat datasets that include common conversational questions and responses in a specified style.
Create the knowledge base and edit question and answer pairs in the portal.
Use a knowledge base
To consume the published knowledge base, you can use the REST interface.
The response includes the closest question match that was found in the knowledge base, along with the associated answer, the confidence score, and other metadata
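A minimal sketch of that REST call, using Python and the requests library. The endpoint, key, project name, and api-version shown are placeholder assumptions - confirm the exact route in the question answering REST reference for your resource.

import requests

endpoint = "https://<your-language-resource>.cognitiveservices.azure.com"
key = "<your-resource-key>"

response = requests.post(
    f"{endpoint}/language/:query-knowledgebases",
    params={
        "projectName": "<your-project>",     # assumed project name
        "deploymentName": "production",
        "api-version": "2021-10-01",         # assumed api-version
    },
    headers={"Ocp-Apim-Subscription-Key": key},
    json={"question": "How do I reset my password?", "top": 3},
)

# Each candidate answer carries the matched question, the answer text, and a confidence score.
for answer in response.json().get("answers", []):
    print(answer.get("confidenceScore"), answer.get("answer"))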
Compare question answering to language understanding
when to use question answering
when to use the conversational language understanding capabilities of the Language service
there are some differences in the use cases that they are designed to address
The two services are in fact complementary. You can build comprehensive natural language solutions that combine conversational language understanding models and question answering knowledge bases.
Improve question answering performance
improve its performance with active learning and by defining synonyms.
Use active learning
Active learning can help you make continuous improvements to your knowledge base so that it gets better at answering user questions correctly over time.
Implicit feedback
As incoming requests are processed, the service identifies user-provided questions that have multiple, similarly scored matches in the knowledge base. These are automatically clustered as alternate phrase suggestions for the possible answers that you can accept or reject in the Suggestions page for your knowledge base in Language Studio.
Explicit feedback
you can control the number of possible question matches returned for the user's input by specifying the top parameter
You can implement logic in your client app to compare the score property values for the questions, and potentially present the questions to the user so they can positively identify the question closest to what they intended to ask
your app can use the REST API to send feedback containing suggested alternative phrasing based on the user's original input
The qnaId in the feedback corresponds to the id of the question the user identified as the correct match. The userId parameter is an identifier for the user and can be any value you choose, such as an email address or numeric identifier.
feedback will be presented in the active learning Suggestions page for your knowledge base in Language Studio for you to accept or reject.
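A minimal sketch of posting explicit feedback, assuming a hypothetical project name and a feedback route modelled on the question answering REST API; only the qnaId, userId, and userQuestion fields come from the notes above, so verify the path and api-version before relying on this.

import requests

endpoint = "https://<your-language-resource>.cognitiveservices.azure.com"
key = "<your-resource-key>"

feedback = {
    "records": [
        {
            "userId": "user1@contoso.com",             # any identifier you choose
            "userQuestion": "How do I get a refund?",  # the user's original input
            "qnaId": 12,                               # id of the QnA pair the user confirmed
        }
    ]
}

requests.post(
    f"{endpoint}/language/query-knowledgebases/projects/<your-project>/feedback",
    params={"api-version": "2021-10-01"},              # assumed api-version
    headers={"Ocp-Apim-Subscription-Key": key},
    json=feedback,
)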
Define synonyms
useful when questions submitted by users might include multiple different words to mean the same thing
service can find an appropriate answer regardless of which term an individual customer uses
Understand question answering
The Language service includes a question answering capability,
define a knowledge base of question and answer pairs that can be queried using natural language input.
published to a REST
consumed by client applications, commonly bots.
knowledge base can be created from
Web sites containing frequently asked question (FAQ) documentation
Files containing structured text, such as brochures or user guides.
Built-in chit chat question and answer pairs that encapsulate common conversational exchanges.
Create a question answering bot
A bot is a conversational application that enables users to interact using natural language through one or more channels, such as email, web chat, voice messaging, or social media platforms such as Microsoft Teams.
Question answering is often the starting point for bot development
Language Studio provides the option to easily create a bot that runs in the Azure Bot Service based on your knowledge base
To create a bot from your knowledge base, use Language Studio to deploy the bot and then use the Create Bot button to create a bot in your Azure subscription. You can then edit and customize your bot in the Azure portal.
Intro
enable users to ask questions using natural language, and receive appropriate answers
you will learn how to use the Language service to create a knowledge base of question and answer pairs that can support an application or bot.
Prepare for AI Engineering
Understand capabilities of Azure Machine Learning
cloud-based platform for running experiments at scale to train predictive models from data, and publish the trained models as services.
Automated machine learning. This feature enables non-experts to quickly create an effective machine learning model from data.
Azure Machine Learning designer. A graphical interface enabling no-code development of machine learning solutions.
Data and compute management. Cloud-based data storage and compute resources that professional data scientists can use to run data experiment code at scale.
Pipelines. pipelines to orchestrate model training, deployment, and management tasks.
Data scientists
Ingest and prepare data.
Run experiments to explore data and train predictive models.
Deploy and manage trained models as web services.
Software engineers
Using Automated Machine Learning or Azure Machine Learning designer to train machine learning models and deploy them as REST services that can be integrated into AI-enabled applications.
Collaborating with data scientists to deploy models based on common frameworks such as Scikit-Learn, PyTorch, and TensorFlow as web services, and consume them in applications.
orchestrate DevOps processes that manage versioning, deployment, and testing of machine learning models as part of an overall application delivery solution.
Understand capabilities of Azure Cognitive Services
Cognitive Services
Speech
Text to speech
Speech translation
Speech to text
Speaker Recognition
Vision
Image classification
Object detection
Video Analysis
Facial analysis
Image analysis
OCR
Decision
Anomaly detection
Content Moderation
Content personalization
Language
Text Analysis
Question Answering
Language understanding
Translation
Applied AI Services
Azure Metrics Advisor
Azure Video Analyzer for Media
Azure Form Recognizer
Azure Immersive Reader
Azure Bot Service
Azure Cognitive Search
Understand capabilities of the Azure Bot Service
is an Applied AI service for developing and delivering bot solutions that support conversational interactions across multiple channels, such as web chat, email, Microsoft Teams, and others.
writing code
Bot Framework SDK
Bot Framework Composer to develop complex bots
agents that can engage in conversational interactions
https://azure.microsoft.com/en-us/services/bot-services/#overview
Understand considerations for responsible AI
Reliability and safety
Privacy and Security
Fairness
Inclusiveness
Transparency
Accountability
Understand considerations for AI Engineers
Probability and confidence scores
model can be accurate, but no predictive model is infallible
it's important to understand that predictions reflect statistical likelihood, not absolute truth
predictions have an associated confidence score that reflects the probability on which the prediction is being made
Software developers should make use of confidence score values to evaluate predictions and apply appropriate thresholds to optimize application reliability and mitigate the risk of predictions that may be made based on marginal probabilities
Responsible AI and ethics
Important due to the nature of how AI systems work and inform decisions; often based on probabilistic models, which are in turn dependent on the data with which they were trained.
apply due consideration to mitigate risks and ensure fairness, reliability, and adequate protection from harm or discrimination.
Model training and inferencing
Many AI systems rely on predictive models that must be trained using sample data
relationships between the features in the data (the data values that will generally be present in new observations) and the label (the value that the model is being trained to predict)
Understand AI-related terms
Machine Learning
subset of data science that deals with the training and validation of predictive models. A data scientist prepares the data and then uses it to train a model based on an algorithm that exploits the relationships between the features in the data to predict values for unknown labels.
use the data they have collected to train a model that predicts the annual growth or decline in population of a species based on factors such as the number of nesting sites observed, the area of land designated as protected, the human population in the local area, the daily volume of traffic on local roads, and so on
Artificial Intelligence
usually (but not always) builds on machine learning to create software that emulates one or more characteristics of human intelligence
a predictive model could be trained to analyze image data taken by motion-activated cameras in remote locations, and predict whether a photograph contains a sighting of the animal.
Data science
The data can then be analyzed, using statistical techniques to extrapolate from the samples to understand trends and relationships between human activities and wildlife
Understand capabilities of Azure Cognitive Search
enables you to ingest and index data from various sources
search the index to find, filter
sort information extracted from the source data
is an Applied AI Service
define an enrichment pipeline that uses AI skills to enhance the index with insights derived from the source data
the insights extracted by your enrichment pipeline can be persisted in a knowledge store
Define artificial intelligence
Decision making - The ability to use past experience and learned correlations to assess situations and take appropriate actions
Speech - The ability to recognize speech as input and synthesize spoken output. The combination of speech capabilities together with the ability to apply NLP analysis of text enables a form of human-computer interaction that's become known as conversational AI
Text analysis - The ability to use natural language processing (NLP) to not only "read", but also extract semantic meaning from text-based data.
Visual perception - The ability to use computer vision capabilities to accept, interpret, and process input from images, video streams, and live cameras.
AI as software that exhibits one or more human-like capabilities
Extract text from images and documents
Explore Computer Vision options for reading text
APIs
OCR API
Use this API to read small to medium volumes of text from images.
The API can read text in multiple languages.
Results are returned immediately from a single function call.
Read API
Use this API to read small to large volumes of text from images and PDF documents.
This API uses a newer model than the OCR API, resulting in greater accuracy.
The Read API can read printed text in multiple languages, and handwritten text in English.
The initial function call returns an asynchronous operation ID, which must be used in a subsequent call to retrieve the results.
both technologies via the REST API or a client library.
Use the OCR API
call the OCR REST function (or the equivalent SDK method), passing the image URL or binary image data, specifying the language of the text to be detected (with a default value of en for English), and optionally the detectOrientation parameter to return information about the orientation of the text in the image.
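A minimal sketch of that call from Python with the requests library; the v3.2 route and parameter names are assumptions to verify against the Computer Vision REST reference for your resource.

import requests

endpoint = "https://<your-computer-vision-resource>.cognitiveservices.azure.com"
key = "<your-resource-key>"

response = requests.post(
    f"{endpoint}/vision/v3.2/ocr",
    params={"language": "en", "detectOrientation": "true"},
    headers={"Ocp-Apim-Subscription-Key": key},
    json={"url": "https://example.com/street-sign.jpg"},   # placeholder image URL
)

# Results come back in a single response, organized into regions, lines, and words.
for region in response.json().get("regions", []):
    for line in region["lines"]:
        print(" ".join(word["text"] for word in line["words"]))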
Introduction
OCR allows you to extract text from images, such as photos of street signs and products, as well as from documents—invoices, bills, financial reports, articles, and more.
you need to train machine learning models to cover many use cases.
Use the Read API
call the Read REST function (or equivalent SDK method), passing the image URL or binary data, and optionally specifying the language the text is written in (with a default value of en for English).
The Read function returns an operation ID, which you can use in a subsequent call to the Get Read Results function in order to retrieve details of the text that has been read.
Depending on the volume of text, you may need to poll the Get Read Results function multiple times before the operation is complete.
the text is broken down by page, then line, and then word.
the text values are included at both the line and word levels, making it easier to read entire lines of text if you don't need to extract text at the individual word level.
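A minimal sketch of the call-then-poll pattern above, using Python and the requests library; the v3.2 path and JSON field names are assumptions, so check the Computer Vision REST reference for your resource's supported version.

import time
import requests

endpoint = "https://<your-computer-vision-resource>.cognitiveservices.azure.com"
key = "<your-resource-key>"
headers = {"Ocp-Apim-Subscription-Key": key}

# Start the asynchronous read operation; the operation URL is returned in a header.
start = requests.post(
    f"{endpoint}/vision/v3.2/read/analyze",
    headers=headers,
    json={"url": "https://example.com/scanned-letter.jpg"},   # placeholder image URL
)
operation_url = start.headers["Operation-Location"]

# Poll Get Read Results until the operation completes.
while True:
    result = requests.get(operation_url, headers=headers).json()
    if result.get("status") in ("succeeded", "failed"):
        break
    time.sleep(1)

# Text is organized by page, then line, then word.
if result.get("status") == "succeeded":
    for page in result["analyzeResult"]["readResults"]:
        for line in page["lines"]:
            print(line["text"])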
implement text extraction solutions with images and documents using the Form Recognizer service's Form OCR Test Tool, prebuilt models, and custom models.
Extract data from forms
with Form Recognizer
Train custom models without labels (unsupervised)
The simplest way to train a custom model is to use an unsupervised learning technique in which you train the model using unlabeled sample forms.
This layout and field mapping information is then used to train a model that can extract data from similar forms.
To train a model with unlabeled sample forms
Upload at least 5 sample images or PDF forms to an Azure Storage blob container to use for training.
Generate a shared access signature (SAS) URL for the container.
Use the Train Custom Model REST API function (or equivalent SDK method) to start training using the forms, passing the SAS URL for the container.
Use the Get Custom Model REST API function (or equivalent SDK method) to get the trained model ID.
You can generate a SAS URL for the container through the Azure Portal.
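A minimal sketch of steps 3 and 4 above (start training, then retrieve the model ID), using Python and the requests library; the v2.1 paths and JSON field names are assumptions to verify against the Form Recognizer REST reference.

import requests

endpoint = "https://<your-form-recognizer-resource>.cognitiveservices.azure.com"
key = "<your-resource-key>"
headers = {"Ocp-Apim-Subscription-Key": key}

# Train Custom Model: pass the SAS URL of the blob container holding the sample forms.
train = requests.post(
    f"{endpoint}/formrecognizer/v2.1/custom/models",
    headers=headers,
    json={"source": "<container-SAS-URL>"},          # placeholder SAS URL
)
model_location = train.headers["Location"]            # URL of the new model resource

# Get Custom Model: repeat this GET until training status is "ready", then read the model ID.
model = requests.get(model_location, headers=headers).json()
print(model["modelInfo"]["modelId"], model["modelInfo"]["status"])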
Train custom models with labels (supervised)
Supervised training requires an input of form documents and JSON documents.
To train a custom model using labeled sampled forms
Store sample forms in an Azure blob container, along with JSON files containing layout and label field information.
You can generate an ocr.json file for each sample form using the Form Recognizer's Analyze Layout function. Additionally, you need a single fields.json file describing the fields you want to extract, and a labels.json file for each sample form mapping the fields to their location in that form
Generate a shared access signature (SAS) URL for the container.
Use the Train Custom Model REST API function (or equivalent SDK method) with the useLabelFile parameter set to true to train the model.
Use the Get Custom Model REST API function (or equivalent SDK method) to get the trained model ID.
OR
Use the Sample Labeling Tool to label and train.
if your form is complex, or you need to define explicit field mappings, you can use a supervised learning approach and train your model using labeled forms.
Understand prebuilt models
Receipts
Streamlining business expense reporting processes
Automating auditing and accounting tasks
Analyzing consumer behavior and shopping trends
Invoices (preview)
Processing paperwork in real time
Accelerating access to reliable data
Business Cards (preview)
Extracting contact information from business cards to quickly create phone contacts
Automatically creating contacts from images to integrate data with a CRM
Keeping track of sales leads
Use Form Recognizer models
Understanding confidence scores
Make sure that the form you are analyzing has a similar appearance to forms in the training set if the confidence values of the pageResults are low.
If the confidence values of the readResults are low, try to improve the quality of your input documents.
If the form appearance varies, consider training more than one model, with each model focused on one form format.
Using a prebuilt model
use the Analyze REST API function
Analyze Receipt
Analyze Business Card
Analyze Invoice
This function starts the form analysis and returns a result ID, which you can pass in a subsequent call to the Get Analyze Result function to retrieve the results.
Get Analyze Receipt Result
Get Analyze Business Card Result
Get Analyze Invoice Result
A successful JSON response contains readResults and documentResults nodes.
The readResults node contains all of the recognized text. Text is organized by page, then by line, then by individual words.
The documentResults node contains the form-specific values that the model discovered. This is where you'll find useful key/value pairs like the first name, last name, company name and more.
Depending on the form analyzed, the response may also contain pageResults, which includes the tables extracted.
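A minimal sketch of the Analyze Receipt / Get Analyze Receipt Result pattern described above, using Python and the requests library; the v2.1 paths and field names are assumptions.

import time
import requests

endpoint = "https://<your-form-recognizer-resource>.cognitiveservices.azure.com"
key = "<your-resource-key>"
headers = {"Ocp-Apim-Subscription-Key": key}

# Start the analysis; the result URL is returned in the Operation-Location header.
start = requests.post(
    f"{endpoint}/formrecognizer/v2.1/prebuilt/receipt/analyze",
    headers=headers,
    json={"source": "https://example.com/receipt.jpg"},   # placeholder receipt URL
)
result_url = start.headers["Operation-Location"]

# Poll Get Analyze Receipt Result until the operation completes.
while True:
    result = requests.get(result_url, headers=headers).json()
    if result.get("status") in ("succeeded", "failed"):
        break
    time.sleep(1)

# documentResults holds the receipt-specific key/value pairs; readResults holds the raw text.
for doc in result.get("analyzeResult", {}).get("documentResults", []):
    print(doc.get("fields", {}))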
Using a custom model
This function starts the form analysis and returns a result ID, which you can pass in a subsequent call to the Get Analyze Form Result function to retrieve the results.
with your custom model ID (generated during model training).
To extract form data using a custom model, use the Analyze Form REST API function
If you trained the model using unlabeled sample forms, the results are returned in a pageResults node
If you used labeled forms to train the model, the results are returned in the documentResults node
Get started with Form Recognizer
A resource subscription
A selection of form files for data extraction
Subscribe to a resource
Understand Form Recognizer file input requirements
Format must be JPG, PNG, PDF (text or scanned), or TIFF.
File size must be less than 50 MB.
Image dimensions must be between 50 x 50 pixels and 10000 x 10000 pixels.
The total size of the training data set must be 500 pages or less.
https://docs.microsoft.com/en-us/azure/applied-ai-services/form-recognizer/overview?tabs=v2-1#input-requirements
Decide what component of Form Recognizer to use
Consider what you need to accomplish.
To use OCR capabilities to capture the layout of a form, use the Layout service. The Layout API will accurately extract the structured output from documents.
To create an application that extracts data from receipts, business cards, or invoices use a prebuilt model. These models do not need to be trained. Form Recognizer services analyze the documents and return a JSON output.
To create an application to extract data from your industry-specific forms, create a custom model. This model needs to be trained on sample documents. After training, the custom model can analyze new documents and return a JSON output.
Use the Form OCR Test Tool
FOTT can be used in the browser or deployed in a container.
Use Layout
FOTT's Layout service enables you to upload a file, analyze it, and download the extracted data in a JSON file and table file.
Create a form recognizer or cognitive service resource
Select the "Layout" feature in FOTT.
Analyze your document. You will need your form recognizer or cognitive service endpoint and key.
FOTT currently supports three types of projects:
Use prebuilt model to get data
Use Layout to get text, tables, and selection marks
Use Custom to train a model with labels and get key value pairs
Use prebuilt models
To extract data from common forms with FOTT's prebuilt models
Create a form recognizer or cognitive service resource
Select the "prebuilt models" feature in FOTT.
Analyze your document. You will need your form recognizer or cognitive service endpoint and key.
FOTT can be used to analyze form layouts, extract data from prebuilt models, and train custom models.
Use custom to train a model
When you use FOTT to build custom models, the ocr.json files, labels.json files, and fields.json file needed for supervised training are automatically created and stored in your storage account.
Create a form recognizer or cognitive service resource
Collect at least 5-6 sample forms for training and upload them to your storage account container.
Generate a shared access signature (SAS) URL for the container.
Configure cross-domain resource sharing (CORS). CORS enables FOTT to store labeled files in your storage container.
Select the "custom" feature in FOTT.
Start a new project using your storage container's SAS URL and form recognizer or cognitive service key.
Use FOTT to apply labels to text.
Train your model. Once the model is trained, you'll receive a Model ID and Average Accuracy for tags.
Test your model by analyzing a new form that was not used in training.
You can use FOTT's custom service for the entire process of training and testing custom models.
Form Recognizer services can be accessed through a user interface called the Form OCR Test Tool (FOTT).
What is Form Recognizer?
uses Optical Character Recognition (OCR) capabilities and deep learning models to extract text, key-value pairs, selection marks, and tables from documents.
OCR captures document structure by creating bounding boxes around detected objects in an image
REST APIs and client library SDKs that can be used to build intelligence into your applications.
boxes are recorded as coordinates in relation to the rest of the page
Form Recognizer provides underlying models that have been trained on thousands of form examples. The underlying models enable you to do high-accuracy data extraction from your forms with little to no model training.
Form Recognizer service components
Services
Layout Service: takes an input of JPEG, PNG, PDF, and TIFF files. Returns a JSON file with the location of text in bounding boxes, text content, tables, selection marks (also known as checkboxes or radio buttons), and document structure.
Prebuilt Models: prebuilt models detect and extract information from document images and return the extracted data in a structured JSON output.
Receipts
Business Cards (in preview)
Invoices (in preview)
Custom Models: custom models extract data from forms specific to your business. Custom models can be trained by calling the Train Custom Model API.
Unsupervised learning (with unlabeled forms)
Supervised learning (with labeled forms)
Access services with the client library SDKs or REST API
You can access Form Recognizer services by using a REST API or client library SDKs to integrate the services into your workflow or application.
Form Recognizer services are also supported by a user interface known as the Form OCR Test Tool (FOTT) that can do layout extraction and model training.
https://docs.microsoft.com/en-us/azure/applied-ai-services/form-recognizer/how-to-guides/try-sdk-rest-api?tabs=preview,v2-1&pivots=programming-language-rest-api
Introduction
Many people still manually extract data from forms to exchange information.
Instances include:
When filing claims
When enrolling new patients in an online management system
When entering data from receipts to an expense report
When reviewing an operations report for anomalies
When selecting data from a report to give to a stakeholder
Azure Form Recognizer is a Vision API that extracts key-value pairs and table data from form documents.
Uses of the Form Recognizer service include:
Process automation
Knowledge mining
Industry-specific applications
uses machine learning technology to identify and extract key-value pairs and table data from form documents with accuracy, at scale
Create computer vision solutions
Classify images
Provision Azure resources for custom vision
to build your own computer vision models for image classification or object detection.
Use existing (labeled) images to train a Custom Vision model.
Create a client application that submits new images to your model to generate predictions.
two kinds of Azure resource
A training resource
A Custom Vision (Training) resource.
A Cognitive Services resource.
A prediction resource,
A Cognitive Services resource.
A Custom Vision (Prediction) resource.
You can use a single Cognitive Services resource for both training and prediction, and you can mix-and-match resource types (for example, using a Custom Vision (Training) resource to train a model that you then publish using a Cognitive Services resource).
Understand image classification
Models can be trained for multiclass classification (in other words, there are multiple classes, but each image can belong to only one class) or multilabel classification (in other words, an image might be associated with multiple labels).
a model is trained to predict a class label for an image based on its contents. Usually, the class label relates to the main subject of the image.
Intro
Custom Vision service enables you to build your own computer vision models for image classification.
requires software to analyze an image in order to categorize (or classify) it.
Train an image classifier
you can use the Custom Vision portal, the Custom Vision REST API or SDK, or a combination of both approaches.
portal provides a graphical interface that you can use to
Create an image classification project for your model and associate it with a training resource.
Upload images, assigning class label tags to them.
Review and edit tagged images.
Train and evaluate a classification model.
Test a trained model.
Publish a trained model to a prediction resource.
The REST API and SDKs enable you to perform the same tasks by writing code, which is useful if you need to automate model training and publishing as part of a DevOps process.
Image classification is used to determine the main subject of an image. You can use the Custom Vision services to train a model that classifies images based on your own categorizations.
Detect objects in images
Understand object detection
a model is trained to detect the presence and location of one or more classes of object in an image
components
The class label of each object detected in the image.
The location of each object within the image, indicated as coordinates of a bounding box that encloses the object
Use the Custom Vision service for object detection
A training resource
A Cognitive Services resource.
A Custom Vision (Training) resource.
A prediction resource
A Cognitive Services resource.
A Custom Vision (Prediction) resource.
You can use a single Cognitive Services resource for both training and prediction, and you can mix-and-match resource types (for example, using a Custom Vision (Training) resource to train a model that you then publish using a Cognitive Services resource).
Train an object detector
use the REST API or SDK to write code that performs the training tasks.
to train an object detection model, you can use the Custom Vision portal to upload and label images before training, evaluating, testing, and publishing the model;
The most significant difference between training an image classification model and training an object detection model is the labeling of the images with tags
image classification requires one or more tags that apply to the whole image
object detection requires that each label consists of a tag and a region that defines the bounding box for each object in an image.
Intro
Object detection is a common computer vision problem that requires software to identify the location of specific classes of object in an image
Consider options for labeling images
The Custom Vision portal automatically suggests regions that contain objects, to which you can assign tags or which you can adjust by dragging the bounding box to enclose the object you want to label.
Bounding box measurement units
if you choose to use a labeling tool other than the Custom Vision portal, you may need to adjust the output to match the measurement units expected by the Custom Vision API.
use the interactive interface in the Custom Vision portal.
Subsequent labeling of new images can benefit from the smart labeler tool in the portal, which can suggest not only the regions, but the classes of object they contain.
you can use a labeling tool, such as the one provided in Azure Machine Learning Studio or the Microsoft Visual Object Tagging Tool (VOTT), to take advantage of other features, such as assigning image labeling tasks to multiple team members.
Object detection is used to locate and identify objects in images. You can use Custom Vision to train a model to detect specific classes of object in images.
Analyze video
Azure Video Analyzer for Media is a service to extract insights from video, including face identification, text recognition, object labels, scene segmentations, and more
Intro
a great deal of information is encapsulated in video files, and you may need to extract this information for analysis or to support indexing for searchability
Understand Video Analyzer for Media capabilities
Facial recognition - detecting the presence of individual people in the image.
Optical character recognition - reading text in the video.
Speech transcription - creating a text transcript of spoken dialog in the video.
Topics - identification of key topics discussed in the video.
Sentiment - analysis of how positive or negative segments within the video are.
Labels - label tags that identify key objects or themes throughout the video.
Content moderation - detection of adult or violent themes in the video.
Scene segmentation - a breakdown of the video into its constituent scenes.
You can use a free, standalone version of the Video Analyzer service (with some limitations), or you can connect it to an Azure Media Services resource in your Azure subscription for full functionality.
Extract custom insights
Video Analyzer for Media includes predefined models that can recognize well-known celebrities and brands, and transcribe spoken phrases into text
extend the recognition capabilities of Video Analyzer by creating custom models for
People. Add images of the faces of people you want to recognize in videos, and train a model. Video Indexer will then recognize these people in all of your videos.
Language. If your organization uses specific terminology that may not be in common usage, you can train a custom model to detect and transcribe it.
Brands. You can train a model to recognize specific names as brands, for example to identify products, projects, or companies that are relevant to your business.
Animated characters. In addition to recognizing human individuals, you may want to be able to detect the presence of individual animated characters in a video.
Use Video Analyzer widgets and APIs
While you can perform all video analysis tasks in the Video Analyzer for Media portal, you may want to incorporate the service into custom applications.
Video Analyzer for Media widgets
The widgets used in the Video Analyzer for Media portal to play, analyze, and edit videos can be embedded in your own custom HTML interfaces.
You can use this technique to share insights from specific videos with others without giving them full access to your account.
Video Analyzer for Media API
Video Analyzer for Media provides a REST API that you can subscribe to in order to get a subscription key
Consume the REST API and automate video indexing tasks, such as uploading and indexing videos, retrieving insights, and determining endpoints for Video Analyzer widgets.
Detect, analyze, and recognize faces
Detect faces with the computer vision service
call the Analyze Image REST function (or equivalent SDK method), specifying Faces as one of the visual features to be returned.
For images that contain one or more faces, the response includes details of their location in the image and the predicted age and gender of the detected person
Understand capabilities of the face service
functionality
Face detection - for each detected face, the results include an ID that identifies the face and the bounding box coordinates indicating its location in the image
Face attribute analysis - you can return a wide range of facial attributes, including
Facial landmark location - coordinates for key landmarks in relation to facial features (for example, eye corners, pupils, tip of nose, and so on)
Face comparison - you can compare faces across multiple images for similarity (to find individuals with similar facial features) and verification (to determine that a face in one image is the same person as a face in another image)
Facial recognition - you can train a model with a collection of faces belonging to specific individuals, and use the model to identify those people in new images.
You can provision Face as a single-service resource, or you can use the Face API in a multi-service Cognitive Services resource.
The Face service provides comprehensive facial detection, analysis, and recognition capabilities.
Understand considerations for face analysis
Data privacy and security.
Facial data is personally identifiable, and should be considered sensitive and private. You should ensure that you have implemented adequate protection for facial data used for model training and inferencing.
Transparency
Ensure that users are informed about how their facial data will be used, and who will have access to it.
Fairness and inclusiveness.
Ensure that your face-based system cannot be used in a manner that is prejudicial to individuals based on their appearance, or to unfairly target individuals.
Compare and match detected faces
When a face is detected by the Face service, an ID is assigned to it and retained in the service resource for 24 hours. The ID is a GUID, with no indication of the individual's identity other than their facial features.
While the detected face ID is cached, subsequent images can be used to compare the new faces to the cached identity and determine if they are similar (in other words, they share similar facial features) or to verify that the same person appears in two images.
This ability to compare faces anonymously can be useful in systems where it's important to confirm that the same person is present on two occasions
taking images of people as they enter and leave a secured space to verify that everyone who entered leaves.
Identify options for face detection
analysis and identification
The Computer Vision service
detect human faces in an image, returning a bounding box for its location. It also returns some facial feature information about the detected face; specifically, predictions for:
Gender
Age
The Face service
Face detection (with bounding box).
Comprehensive facial feature analysis (including age, gender, emotional state, head pose, hair color, presence of facial hair, presence of spectacles, and others).
Face comparison and verification.
Facial recognition.
Implement facial recognition
To train a facial recognition model
Create a Person Group that defines the set of individuals you want to identify (for example, employees).
Add a Person to the Person Group for each individual you want to identify.
Add detected faces from multiple images to each person, preferably in various poses. The IDs of these faces will no longer expire after 24 hours (so they're now referred to as persisted faces).
Train the model.
The trained model is stored in your Face (or Cognitive Services) resource, and can be used by client applications to:
Identify individuals in images.
Verify the identity of a detected face.
Analyze new images to find faces that are similar to a known, persisted face.
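A minimal sketch of the first of these tasks (identifying individuals in images) using the Face REST API from Python; the v1.0 routes, request fields, and the "employees" group name are assumptions to verify against the current Face reference.

import requests

endpoint = "https://<your-face-resource>.cognitiveservices.azure.com"
key = "<your-resource-key>"
headers = {"Ocp-Apim-Subscription-Key": key}

# 1. Detect faces in a new image to obtain face IDs (cached for 24 hours).
detect = requests.post(
    f"{endpoint}/face/v1.0/detect",
    params={"returnFaceId": "true"},
    headers=headers,
    json={"url": "https://example.com/office-entrance.jpg"},   # placeholder image URL
)
face_ids = [face["faceId"] for face in detect.json()]

# 2. Identify the detected faces against the trained Person Group.
identify = requests.post(
    f"{endpoint}/face/v1.0/identify",
    headers=headers,
    json={"personGroupId": "employees", "faceIds": face_ids},
)
for result in identify.json():
    print(result["faceId"], result["candidates"])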
Introduction
.
Face detection, analysis, and recognition is a common computer vision challenge for AI systems. The ability to detect when a person is present, identify a person's emotional state, or recognize an individual based on their facial features is a key way in which AI systems can exhibit human-like behavior and build empathy with users
https://docs.microsoft.com/en-us/azure/cognitive-services/face/
Analyze images
Provision a Computer Vision resource
Description and tag generation - determining an appropriate caption for an image, and identifying relevant "tags" that can be used as keywords to indicate its subject.
Object detection - detecting the presence and location of specific objects within the image.
Face detection - detecting the presence, location, and features of human faces in the image.
Image metadata, color, and type analysis - determining the format and size of an image, its dominant color palette, and whether it contains clip art.
Category identification - identifying an appropriate categorization for the image, and if it contains any known celebrities or landmarks.
Brand detection - detecting the presence of any known brands or logos.
Moderation rating - determine if the image includes any adult or violent content.
Optical character recognition - reading text in the image.
Smart thumbnail generation - identifying the main region of interest in the image to create a smaller "thumbnail" version.
You can provision Computer Vision as a single-service resource, or you can use the Computer Vision API in a multi-service Cognitive Services resource
Analyze an image
you can use the Analyze Image REST method or the equivalent method in the SDK for your preferred programming language
specifying the visual features you want to include in the analysis (and if you select categories, whether or not to include details of celebrities or landmarks).
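A minimal sketch of the Analyze Image call from Python with the requests library; the v3.2 route and the selected visual features are assumptions, so adjust them to the analysis you need.

import requests

endpoint = "https://<your-computer-vision-resource>.cognitiveservices.azure.com"
key = "<your-resource-key>"

# visualFeatures selects which analyses to run for this request.
response = requests.post(
    f"{endpoint}/vision/v3.2/analyze",
    params={"visualFeatures": "Description,Tags,Objects,Faces"},
    headers={"Ocp-Apim-Subscription-Key": key},
    json={"url": "https://example.com/photo.jpg"},   # placeholder image URL
)

analysis = response.json()
print(analysis["description"]["captions"])                  # suggested caption(s)
print([tag["name"] for tag in analysis.get("tags", [])])    # relevant tags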
Intro
Computer Vision is a branch of artificial intelligence (AI) in which software interprets visual input, often from images or video feeds.
Use the Computer Vision cognitive service in Microsoft Azure to extract information from images.
Generate a smart-cropped thumbnail
Computer Vision service enables you to create a thumbnail with different dimensions (and aspect ratio) from the source image
You can generate thumbnails with a width and height up to 1024 pixels, with a recommended minimum size of 50x50
Thumbnails are often used to provide smaller versions of images in applications and websites.
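A minimal sketch of requesting a smart-cropped thumbnail from Python; the generateThumbnail route and parameter names are assumptions based on the v3.2 REST API.

import requests

endpoint = "https://<your-computer-vision-resource>.cognitiveservices.azure.com"
key = "<your-resource-key>"

# smartCropping=true centres the thumbnail on the main region of interest
# rather than simply resizing the image.
response = requests.post(
    f"{endpoint}/vision/v3.2/generateThumbnail",
    params={"width": 100, "height": 100, "smartCropping": "true"},
    headers={"Ocp-Apim-Subscription-Key": key},
    json={"url": "https://example.com/photo.jpg"},   # placeholder image URL
)

# The response body is the binary image data for the thumbnail.
with open("thumbnail.jpg", "wb") as thumbnail_file:
    thumbnail_file.write(response.content)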
Provision & manage Azure Cognitive Services
Create and Consume
Cognitive Services
Decision
Content Moderator
Personalizer
Anomaly Detector
Vision
Computer Vision
Custom Vision
Face
Speech
Language
Language
Translator
Applied AI Services
Azure Video Analyzer for Media - A video analysis solution built on the Video Indexer cognitive service.
Azure Immersive Reader - A reading solution that supports people of all ages and abilities.
Azure Bot Service - A cloud service for delivering conversational AI solutions, or bots.
Azure Metrics Advisor - A service built on Anomaly Detector that simplifies real-time monitoring of and response to critical metrics.
Azure Form Recognizer - An optical character recognition (OCR) solution that can extract semantic meaning from forms, such as invoices, receipts, and others.
Azure Cognitive Search - A cloud-scale search solution that uses cognitive services to extract insights from data and documents.
Provision service resource
Multi-service resource
enables you to manage a single set of access credentials to consume multiple services at a single endpoint
a single point of billing for usage of all services.
Single-service resource
enables you to use separate endpoints for each service
manage access credentials for each service independently
billing separately for each service
Single-service resources generally offer a free tier (with usage restrictions)
Identify endpoints and keys
The endpoint URI
A subscription key
The resource location
While most SDKs use the endpoint URI to connect to the service, some require the location.
Use a REST API
service functions can be called by submitting data in JSON format over an HTTP request, which may be a POST, PUT, or GET request depending on the specific function being called
any programming language or tool capable of submitting and receiving JSON over HTTP can be used to consume cognitive services
Use an SDK
Each SDK includes packages that you can install in order to use service-specific libraries in your code, and online documentation to help you determine the appropriate classes
Software development kits (SDKs) for common programming languages abstract the REST interfaces for most cognitive services.
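A minimal sketch of the SDK approach, using the Python Text Analytics client library (azure-ai-textanalytics) as one example; the endpoint and key shown are placeholders.

from azure.ai.textanalytics import TextAnalyticsClient
from azure.core.credentials import AzureKeyCredential

# Create the client from the resource's endpoint and one of its keys.
client = TextAnalyticsClient(
    endpoint="https://<your-language-resource>.cognitiveservices.azure.com",
    credential=AzureKeyCredential("<your-resource-key>"),
)

# The SDK wraps the underlying REST call and returns typed result objects.
results = client.detect_language(documents=["Bonjour tout le monde"])
print(results[0].primary_language.name)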
Secure Cognitive Services
Authentication
Regenerate Keys
Azure portal
az cognitiveservices account keys regenerate
Procedure to regenerate keys:
Configure all production applications to use key 2.
Regenerate key 1
Switch all production applications to use the newly regenerated key 1.
Regenerate key 2.
By default, access to cognitive services resources is restricted by using subscription keys.
Azure Key Vault
store the subscription keys for a cognitive services resource
assign a managed identity to client applications that need to use the service
applications can then retrieve the key as needed from the key vault, without risk of exposing it to unauthorized users
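A minimal sketch of that pattern in Python, assuming the azure-identity and azure-keyvault-secrets packages and placeholder vault/secret names.

from azure.identity import DefaultAzureCredential
from azure.keyvault.secrets import SecretClient

# DefaultAzureCredential picks up the app's managed identity when running in Azure.
credential = DefaultAzureCredential()
secret_client = SecretClient(
    vault_url="https://<your-key-vault>.vault.azure.net",
    credential=credential,
)

# Retrieve the Cognitive Services key at runtime instead of storing it in app configuration.
cognitive_key = secret_client.get_secret("<your-secret-name>").value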
Token-based authentication
When using the REST interface, some Cognitive Services support (or even require) token-based authentication
the subscription key is presented in an initial request to obtain an authentication token, which has a valid period of 10 minutes
Subsequent requests must present the token to validate that the caller has been authenticated
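A minimal sketch of obtaining and using a token from Python; the issueToken route shown is typical of services that support token authentication (for example, Speech), so confirm the exact endpoint for your service and region.

import requests

key = "<your-resource-key>"

# Exchange the subscription key for a short-lived access token.
token_response = requests.post(
    "https://<region>.api.cognitive.microsoft.com/sts/v1.0/issueToken",
    headers={"Ocp-Apim-Subscription-Key": key},
)
access_token = token_response.text   # valid for around 10 minutes

# Subsequent requests present the token rather than the subscription key.
auth_header = {"Authorization": f"Bearer {access_token}"}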
Azure Active Directory
Some Cognitive Services support Azure Active Directory authentication, enabling you to grant access to specific service principals or managed identities for apps and services running in Azure.
Documentation:
https://docs.microsoft.com/en-us/azure/cognitive-services/authentication?tabs=powershell
Network Security
to ensure unauthorized users cannot reach the services that you are protecting
Limiting what users can see is always a great idea
individual Cognitive Services resources can be configured to restrict access to specific network addresses
IP address that is not allowed will receive an Access Denied error
Documentation:
https://docs.microsoft.com/en-us/azure/cognitive-services/cognitive-services-security?tabs=command-line%2Ccsharp
Monitor Cognitive Services
Create alerts
To create an alert rule, you must specify:
A condition on which the alert is triggered
based on a signal type
Activity Log (an entry in the activity log created by an action performed on the resource, such as regenerating its subscription keys)
Metric (a metric threshold such as the number of errors exceeding 10 in an hour).
Optional actions, such as sending an email to an administrator notifying them of the alert, or running an Azure Logic App to address the issue automatically.
The scope of the alert rule - in other words, the resource you want to monitor.
Alert rule details, such as a name for the alert rule and the resource group
creation of alert rules to configure notifications and alerts for your resources based on events or metric thresholds
View metrics
Azure portal - Metrics page
add resource-specific metrics to charts.
an empty chart is created for you
add more charts as required
you can share it by exporting it to Excel or copying a link to it, and you can clone it to create a duplicate chart in the Metrics page
Azure Monitor collects metrics relating to endpoint requests, data submitted and returned, errors, and other useful measurements
Documentation:
https://docs.microsoft.com/en-us/azure/azure-portal/azure-portal-dashboards
Monitor cost
gain cost efficiencies by only paying for services as you use them
The specific billing rate depends on the resource type
Plan Costs
you can estimate costs by using the Azure Pricing Calculator
select the specific cognitive service API you plan to use (for example, Text Analytics), the region where you plan to provision it, and the pricing tier of the instance you plan to use; and fill in the expected usage metrics and support option
View Costs
view overall costs for the subscription by selecting the Cost analysis tab
add a filter that restricts the data to reflect resources with a service name of azure cognitive services
Documentation:
https://docs.microsoft.com/en-us/azure/cognitive-services/plan-manage-costs
Manage diagnostic logging
need a destination for the log data
Event Hub in order to then forward the data on to a custom telemetry solution
connect directly to some third-party solutions
diagnostic log storage
Azure Log Analytics - enables you to query and visualize log data within the Azure portal.
Azure Storage - a cloud-based data store to store log archives (which can be exported for analysis in other tools as needed).
diagnostic settings
A name for your diagnostic settings.
The categories of log event data that you want to capture.
Details of the destinations in which you want to store the log data.
capture rich operational data for a Cognitive Services resource, which can be used to analyze service usage and troubleshoot problems
View log data
It can take an hour or more before diagnostic data starts flowing to the destinations
view it in your Azure Log Analytics workspace by running queries
Documentation:
https://docs.microsoft.com/en-us/azure/cognitive-services/diagnostic-logging
Documentation:
https://docs.microsoft.com/en-us/azure/cognitive-services/what-are-cognitive-services
Deploy cognitive services in containers
data can stay on your local network and not be passed to the cloud. Deploying Cognitive Services in a container on-premises will also decrease the latency between the service and your local data, which can improve performance
Understand containers
Container deployment
A Docker* server.
An Azure Container Instance (ACI).
An Azure Kubernetes Service (AKS) cluster.
Docker is an open source solution for container development and management that includes a server engine that you can use to host containers
Container
Definition: A container comprises an application or service and the runtime components needed to run it, while abstracting the underlying operating system and hardware
Containers are portable across hosts, which may be running different operating systems or use different hardware - making it easier to move an application and all its dependencies
A single container host can support multiple isolated containers, each with its own specific runtime configuration - making it easier to consolidate multiple applications that have different configuration requirements.
container is encapsulated in a container image that defines the software and configuration it must support
Images can be stored in a central registry, such as Docker Hub, or you can maintain a set of images in your own registry.
Use Cognitive Services containers
The container image for the specific Cognitive Services API you want to use is downloaded and deployed to a container host, such as a local Docker server, an Azure Container Instance (ACI), or Azure Kubernetes Service (AKS).
Client applications submit data to the endpoint provided by the containerized service, and retrieve results just as they would from a Cognitive Services cloud resource in Azure.
Periodically, usage metrics for the containerized service are sent to a Cognitive Services resource in Azure in order to calculate billing for the service.
the container must be able to connect to the Cognitive Services resource in Azure periodically to send usage metrics for billing
Cognitive Services
container images
Language service
Key Phrase Extraction. Image: mcr.microsoft.com/azure-cognitive-services/keyphrase
Language Detection: mcr.microsoft.com/azure-cognitive-services/language
Sentiment Analysis v3 (English): mcr.microsoft.com/azure-cognitive-services/sentiment:3.0-en
Language detection, translation, and sentiment analysis are each separate container images.
container configuration.
Deploy image to a host.
Billing
Endpoint URI from your deployed Azure Cognitive Service; used for billing.
Eula
Value of accept to state you accept the license for the container.
ApiKey
Key from your deployed Azure Cognitive Service; used for billing.
Consuming Cognitive Services from a Container
applications consume the containerized Cognitive Services endpoint
client application must be configured with the appropriate endpoint for your container, but does not need to provide a subscription key to be authenticated
You can implement your own authentication solution and apply network security restrictions as appropriate for your specific application scenario.
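A minimal sketch of calling a containerized service from Python; the localhost port and the Text Analytics languages route are assumptions that depend on which container you deployed and how you mapped its port.

import requests

# The client targets the container's endpoint; the container itself does not
# check a subscription key, so apply your own authentication and network controls.
container_endpoint = "http://localhost:5000"

response = requests.post(
    f"{container_endpoint}/text/analytics/v3.0/languages",   # route assumes the Language Detection container
    json={"documents": [{"id": "1", "text": "Bonjour tout le monde"}]},
)
print(response.json())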
Documentation:
https://docs.microsoft.com/en-us/azure/cognitive-services/cognitive-services-container-support
Create a Language Understanding solution
Create a Language Understanding App
Intro
natural language understanding (NLU) deals with the problem of determining semantic meaning from natural language
design pattern
An app accepts natural language input from a user
A language model is used to determine semantic meaning (the user's intent)
The app performs an appropriate action
Language Understanding service enables developers to build apps based on language models that can be trained with a relatively small number of samples to discern a user's intended meaning
Provision Azure resources for Language Understanding
you require two kinds of resource in your Azure subscription
An Authoring resource, which you can use to train your language understanding model.
A Prediction resource (which can be a Language Understanding - Prediction or Cognitive Services resource) to host your trained model and process requests from client applications.
Creating both the Authoring and Prediction resources is recommended when you intend to publish a prediction endpoint for client applications because they each have transaction and request limits.
The Authoring resource prioritizes training the language understanding model by giving you authoring transactions and fewer prediction endpoint requests per month for the purposes of testing
The Prediction resource prioritizes the client requests by giving you more prediction endpoint requests per month than the Authoring resource for the purposes of supporting client applications.
https://docs.microsoft.com/en-us/azure/cognitive-services/luis/luis-how-to-azure-subscription?tabs=portal
Authoring resources can be created in one of three global geographic areas
Asia Pacific (the resource is created in the Australia East Azure region)
Europe (the resource is created in the West Europe Azure region)
US (the resource is created in the West US Azure region)
To deploy a model, your prediction resource must be in an Azure location within the geographical area served by the authoring resource:
Asia Pacific: Australia East
Europe: France Central, North Europe, West Europe, UK South
US: all other locations
Define intents and utterances
You create a model by defining intents and associating them with one or more utterances.
Examples
GetTime
"What time is it?"
"What is the time?"
"Tell me the time"
GetWeather
"What is the weather forecast?"
"Do I need an umbrella?"
"Will it snow?"
TurnOnDevice
"Turn the light on."
"Switch on the light."
"Turn on the fan"
None
"Hello"
"Goodbye"
An intent represents a task or action the user wants to perform, or more simply the meaning of an utterance.
spend some time thinking about the domain your model must support, and the kinds of actions or information that users might request.
Utterances are the phrases that a user might enter when interacting with an application that uses your Language Understanding model
every model includes a None intent that you should use to explicitly identify utterances that a user might submit, but for which there is no specific action required, or that fall outside of the scope of the domain for this model
Collect utterances that you think users will enter; including utterances that mean the same thing, but that are constructed in different ways
guidelines
Capture multiple different examples, or alternative ways of saying the same thing
Vary the length of the utterances from short, to medium, to long
Vary the location of the noun or subject of the utterance. Place it at the beginning, the end, or somewhere in between
Use correct grammar and incorrect grammar in different utterances to offer good training data examples
Follow the good utterances guidance in the Language Understanding documentation.
Define entities
Entity types
List entities are useful when you need an entity with a specific set of possible values
Regular Expression or RegEx entities are useful when an entity can be identified by matching a particular format of string.
Pattern.any() entities are used with patterns
Machine learned entities are the most flexible kind of entity, and should be used in most cases
Entities
Entities are used to add specific context to intents. For example, you might define a TurnOnDevice intent that can be applied to multiple devices, and use entities to define the different devices.
Use patterns to differentiate similar utterances
TurnOnDevice:
"Turn the {DeviceName} on."
"Switch the {DeviceName} on."
"Turn on the {DeviceName}."
GetDeviceStatus: "Is the {DeviceName} on[?]"
TurnOffDevice:
"Turn the {DeviceName} off."
"Switch the {DeviceName} off."
"Turn off the {DeviceName}."
The utterance templates include a placeholder for a Pattern.any() entity named DeviceName, reducing the number of utterances required to train the model.
You could associate utterances for every possible entity with all three intents. However, a more efficient way to train the model is to define patterns that include utterance templates, like this:
The patterns defined in the utterance templates help the model identify the intents and entity values from fewer samples:
"Turn the kitchen light on." (TurnOnDevice)
"Is the kitchen light on?" (GetDeviceStatus)
"Turn the kitchen light off." (TurnOffDevice)
These utterances are syntactically similar, with only a few differences in words or punctuation.
https://docs.microsoft.com/en-us/azure/cognitive-services/LUIS/concepts/patterns-features
a model might contain multiple intents for which utterances are likely to be similar. You can use patterns to disambiguate the intents while minimizing the number of sample utterances.
Use pre-built models
you can use prebuilt model elements that encapsulate common intents and entities.
prebuilt model elements at
three different levels of granularity
Prebuilt Domains define complete language understanding models that include predefined intents, utterances, and entities. Prebuilt domains include Calendar, Email, Weather, RestaurantReservation, HomeAutomation, and others.
Prebuilt Intents include predefined intents and utterances, such as CreateCalendarEntry, SendEmail, TurnOn, AddToDo, and others.
Prebuilt Entities define commonly used entities, such as Age, Email, PersonName, Number, Geography, DateTime, and others.
You can create your own language models by defining all the intents and utterances it requires
Using prebuilt model elements can significantly reduce the time it takes to develop a language understanding solution.
Train test publish and review a Language Understanding app
Train a model to learn intents and entities from sample utterances.
Test the model interactively, or by submitting a batch of utterances with known intent labels and comparing the predicted intents to the known label.
Publish a trained model to a prediction resource and use it from client applications.
Review the predictions made by the model based on user input and apply active learning to correct misidentified intents or entities and improve the model.
Natural language processing (NLP) solutions use language models to interpret the semantic meaning of written or spoken language
Publish and use a Language Understanding app
Set publishing configuration options
Publishing slot
Staging
Use this slot to publish and test new versions of your language model without disrupting production
Production
Use this slot for "live" models that are used by production applications
Publish settings
Sentiment analysis. Enable this to include a sentiment score from 0 (negative) to 1 (positive) in predictions. This score reflects the sentiment of the input utterance.
Spelling correction. Enable this to use the Bing Spell Check service to correct the spelling of input utterances before intent prediction.
Speech priming. Enable this if you plan to use the language model with the Speech service. This option sends the model to the Speech service ahead of prediction to improve intent recognition from spoken input.
Process predictions
To consume your Language Understanding model, client applications can use the REST APIs or programming language-specific SDKs.
Requests are submitted to the endpoint for a published slot (production or staging) and include the following parameters:
query - the utterance text to be analyzed.
show-all-intents - indicates whether to include all identified intents and their scores, or only the most likely intent.
verbose - used to include additional metadata in the results, such as the start index and length of strings identified as entities.
log - used to record queries and results for use in active learning.
Prediction results
consist of a hierarchy of information that your application must parse
With the REST interface, the results are returned in JSON form.
SDKs present the results as an object hierarchy based on the underlying JSON
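As an illustrative sketch (the endpoint, app ID, and key are placeholders, and the URL shape assumes the LUIS v3.0 prediction REST API), a C# client might submit a query like this:
using System;
using System.Net.Http;

// Placeholder values - replace with your prediction resource details
string endpoint = "https://<your-region>.api.cognitive.microsoft.com";
string appId = "<your-app-id>";
string predictionKey = "<your-prediction-key>";
string utterance = "Turn the kitchen light on";

// Query the production slot, requesting all intents, verbose entity metadata, and query logging
string url = $"{endpoint}/luis/prediction/v3.0/apps/{appId}/slots/production/predict" +
             $"?subscription-key={predictionKey}" +
             $"&query={Uri.EscapeDataString(utterance)}" +
             "&show-all-intents=true&verbose=true&log=true";

using var client = new HttpClient();
string json = await client.GetStringAsync(url);
Console.WriteLine(json); // JSON hierarchy containing the top intent, all intents, and entities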
Intro
publish the Language Understanding app for your model and consume it from a client application or bot
Use a container
The Language Understanding service can also be deployed as a container
local Docker host
Azure Container Instance (ACI)
Azure Kubernetes Service (AKS) cluster.
steps you need to perform to use a Language understanding app in a container
Export the model for a container
Run the container
with required parameters
Prediction endpoint for billing
Prediction key
EULA acceptance
Mount points (input for exported model, output for logs)
Download the container image
Use the container to predict intents for client apps
Downloading the container image
use the docker command line tool
docker pull mcr.microsoft.com/azure-cognitive-services/language/luis:latest
Export the Language Understanding app
you can export a published app directly from its endpoint using an HTTP GET request
Example:
GET /luis/api/v2.0/package/{APP_ID}/slot/{SLOT_NAME}/gzip HTTP/1.1
Host: {AZURE_REGION}.api.cognitive.microsoft.com
Ocp-Apim-Subscription-Key: {AUTHORING_KEY}
The exported package is in *.gz (GZIP) format, which is what the container image expects.
You can also export a model from the Language Understanding portal by selecting the Export for container option.
To deploy your Language Understanding app in a container, you must first export it in the appropriate packaged format.
Run the container
Example:
docker run --rm -it -p 5000:5000 ^
--memory 4g ^
--cpus 2 ^
--mount type=bind,src=c:\input,target=/input ^
--mount type=bind,src=c:\output\,target=/output ^
mcr.microsoft.com/azure-cognitive-services/language/luis ^
Eula=accept ^
Billing={ENDPOINT_URI} ^
ApiKey={API_KEY}
To run the container, use the docker run command.
The mount parameters enable the container to access local folders. Specifically, the input mount must reference the folder containing your exported Language Understanding app package, and the output folder is where the service will write logs (including Language Understanding query logs that you can use for active learning).
The Eula, Billing, and ApiKey parameters are used the same way they are for any Cognitive Services container - specifying acceptance of the license agreement, the prediction endpoint to which usage data should be sent for billing, and a valid subscription key for your prediction resource.
Use Language Understanding
with Speech
Understand Language Understanding
and Speech service integration
The Language Understanding service determines semantic intent from natural language input. Often, that input is in the form of spoken language.
Speech SDK
The Speech SDK can be used not only with the Speech service, but also with the Language Understanding service, enabling you to use a language model to predict intents from spoken input.
To use the Speech SDK with a Language Understanding model, enable the Speech priming publishing setting for your Language Understanding endpoint
Perform intent recognition
with the speech SDK
follow this pattern:
Use a SpeechConfig object to encapsulate the information required to connect to your Language Understanding prediction resource (not a Speech resource). Specifically, the SpeechConfig must be configured with the location and key of the Language Understanding prediction resource.
Optionally, use an AudioConfig to define the input source for the speech to be analyzed. By default, this is the default system microphone, but you can also specify an audio file.
Use the SpeechConfig and AudioConfig to create an IntentRecognizer object, and add the model and the intents you want to recognize to its configuration.
Use the methods of the IntentRecognizer object to submit utterances to the Language Understanding prediction endpoint. For example, the RecognizeOnceAsync() method submits a single spoken utterance.
Process the response. In the case of the RecognizeOnceAsync() method, the result is an IntentRecognitionResult object that includes the following properties:
Duration
IntentId
OffsetInTicks
Properties
Reason
ResultId
Text
If the operation was successful, the Reason property has the enumerated value RecognizedIntent, and the IntentId property contains the top intent name.
The Properties property includes the full JSON prediction.
Other possible values for the result's Reason property include RecognizedSpeech, which indicates that the speech was successfully transcribed (the transcription is in the Text property), but no matching intent was identified. If the result is NoMatch, the audio was successfully parsed but no speech was recognized, and if the result is Canceled, an error occurred (in which case, you can check the Properties collection for the CancellationReason property to determine what went wrong).
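A minimal C# sketch of this pattern, assuming the Microsoft.CognitiveServices.Speech SDK and placeholder values for the Language Understanding prediction key, region, app ID, and intent names:
using System;
using Microsoft.CognitiveServices.Speech;
using Microsoft.CognitiveServices.Speech.Intent;

// Configure the connection using the Language Understanding prediction key and location (not a Speech resource)
var speechConfig = SpeechConfig.FromSubscription("<luis-prediction-key>", "<luis-region>");

using var recognizer = new IntentRecognizer(speechConfig); // uses the default system microphone

// Add the Language Understanding model and the intents to recognize
var model = LanguageUnderstandingModel.FromAppId("<luis-app-id>");
recognizer.AddIntent(model, "TurnOnDevice", "on");
recognizer.AddIntent(model, "TurnOffDevice", "off");

// Submit a single spoken utterance for intent prediction
var result = await recognizer.RecognizeOnceAsync();

if (result.Reason == ResultReason.RecognizedIntent)
{
    Console.WriteLine($"Text: {result.Text}");
    Console.WriteLine($"Intent: {result.IntentId}");
}
else if (result.Reason == ResultReason.RecognizedSpeech)
{
    Console.WriteLine($"Speech recognized ({result.Text}), but no matching intent was found.");
}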
Intro
The Language Understanding service determines semantic intent from natural language input. Often, that input is in the form of spoken language.
Implement knowledge mining with Azure Cognitive Search
Create a custom skill for
Azure Cognitive Search
Create a custom skill
A custom skill must implement the schema for input and output data that is expected by skills in an Azure Cognitive Search skillset.
Input Schema
The input schema defines a JSON structure containing a record for each document to be processed. Each record has a unique identifier and a data payload with one or more inputs.
Output schema
the output will contain a record for each input record, with either the results produced by the skill or details of any errors that occurred.
output value in this schema is a property bag that can contain any JSON structure, reflecting the fact that index fields are not necessarily simple data values, but can contain complex types.
Add a custom skill to a skillset
To integrate a custom skill into your indexing solution, you must add a skill for it to a skillset using the Custom.WebApiSkill skill type.
Specify the URI to your web API endpoint, including parameters and headers if necessary.
Set the context to specify at which point in the document hierarchy the skill should be called
Assign input values, usually from existing document fields
Store output in a new field, optionally specifying a target field name (otherwise the output name is used)
Introduction
there may be occasions when you have specific data extraction needs that cannot be met with the predefined skills and require some custom functionality.
Integrate the Form Recognizer service to extract data from forms
Consume an Azure Machine Learning model to integrate predicted values into an index
Any other custom logic
how to implement a custom skill as an Azure Function, and integrate it into an Azure Cognitive Search skillset.
https://docs.microsoft.com/en-us/learn/modules/create-azure-cognitive-search-solution/
Create a knowledge store
with Azure Cognitive Search
Define projections
Using the Shaper skill
The process of indexing incrementally creates a complex document that contains the various output fields from the skills in the skillset. This can result in a schema that is difficult to work with, and which includes collections of primitive data values that don't map easily to well-formed JSON.
To simplify the mapping of these field values to projections in a knowledge store, it's common to use the Shaper skill to create a new field containing a simpler structure for the fields you want to map to projections.
The resulting JSON document is well-formed, and easier to map to a projection in a knowledge store than the more complex document that has been built iteratively by the previous skills in the enrichment pipeline.
The skills in your skillset iteratively build a JSON representation of the enriched data for the documents being indexed, and you can persist some or all of the fields in the document as projections.
based on the document structures generated by the enrichment pipeline in your indexing process
Define a knowledge store
Projections can be object projections, table projections, or file projections.
You must define a separate projection for each type of projection, even though each projection definition contains lists for tables, objects, and files.
create a knowledgeStore object in the skillset that specifies the Azure Storage connection string for the storage account where you want to create projections, and the definitions of the projections themselves.
Projection types are mutually exclusive in a projection definition, so only one of the projection type lists can be populated.
For object and file projections, the specified container will be created if it does not already exist
Azure Storage table will be created for each table projection, with the mapped fields and a unique key field with the name specified in the generatedKeyName property.
Introduction
The data enrichments performed by the skills in the pipeline supplement the source data with insights such as:
The language in which a document is written.
Key phrases that might help determine the main themes or topics discussed in a document.
A sentiment score that quantifies how positive or negative a document is.
Specific locations, people, organizations, or landmarks mentioned in the content.
AI-generated descriptions of images, or image text extracted by optical character recognition (OCR).
Knowledge stores
While the index might be considered the primary output from an indexing process, the enriched data it contains might also be useful in other ways.
Since the index is essentially a collection of JSON objects, each representing an indexed record, it might be useful to export the objects as JSON files for integration into a data orchestration process using tools such as Azure Data Factory.
You may want to normalize the index records into a relational schema of tables for analysis and reporting with tools such as Microsoft Power BI.
Having extracted embedded images from documents during the indexing process, you might want to save those images as files.
Azure Cognitive Search supports these scenarios by enabling you to define a knowledge store in the skillset that encapsulates your enrichment pipeline.
knowledge store consists of projections of the enriched data, which can be JSON objects, tables, or image files
When the indexer runs the pipeline to create or update an index, the projections are generated and persisted in the knowledge store.
Create an Azure
Cognitive Search solution
Enhance the index
Search-as-you-type
By adding a suggester to an index, you can enable two forms of search-as-you-type experience:
Suggestions - retrieve and display a list of suggested results as the user types into the search box, without needing to submit the search query.
Autocomplete - complete partially typed search terms based on values in index fields.
To implement one or both of these capabilities, create or update an index, defining a suggester for one or more fields.
Custom scoring and result boosting
search results are sorted by a relevance score that is calculated based on a term-frequency/inverse-document-frequency (TF/IDF) algorithm
customize the way this score is calculated by defining a scoring profile that applies a weighting value to specific fields - essentially increasing the search score for documents when the search term is found in those fields.
boost results based on field values - for example, increasing the relevancy score for documents based on how recently they were modified or their file size.
After you've defined a scoring profile, you can specify its use in an individual search, or you can modify an index definition so that it uses your custom scoring profile by default.
Azure Cognitive Search supports several ways to enhance an index to provide a better user experience
Synonyms
the same thing can be referred to in multiple ways
*To be accurate, the UK and Great Britain are different entities - but they're commonly confused with one another; so it's reasonable to assume that someone searching for "United Kingdom" might be interested in results that reference "Great Britain".
To help users find the information they need, you can define synonym maps that link related terms together
when a user searches for a particular term, documents with fields that contain the term or any of its synonyms will be included in the results.
https://docs.microsoft.com/en-us/azure/search/search-synonyms
With a basic index and a client that can submit queries and display results, you can achieve an effective search solution.
Apply filtering
and sorting
Filtering results
By including filter criteria in a simple search expression.
By providing an OData filter expression as a $filter parameter with a full syntax search expression.
You can apply a filter to any filterable field in the index.
Users often want to refine query results by filtering and sorting based on field values.
OData $filter expressions are case-sensitive!
Filtering with facets
Facets are a useful way to present users with filtering criteria based on field values in a result set
work best when a field has a small number of discrete values that can be displayed as links or options in the user interface.
you must specify facetable fields for which you want to retrieve the possible values in an initial query
Sorting results
By default, results are sorted based on the relevancy score assigned by the query process.
you can override this sort order by including an OData orderby parameter that specifies one or more sortable fields and a sort order (asc or desc).
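As a hedged sketch using the Azure.Search.Documents SDK (the service name, index name, field names, and filter values are illustrative assumptions, not from the source):
using System;
using Azure;
using Azure.Search.Documents;
using Azure.Search.Documents.Models;

// Placeholder connection details
var searchClient = new SearchClient(
    new Uri("https://<search-service>.search.windows.net"),
    "hotels-index",                                        // assumed index name
    new AzureKeyCredential("<query-key>"));

var options = new SearchOptions
{
    Filter = "category eq 'Luxury' and rating ge 4",       // OData filter expression (case-sensitive)
    OrderBy = { "lastRenovated desc" },                    // override the default relevance ordering
    Facets = { "category" }                                // request facet values for the category field
};

SearchResults<SearchDocument> results = searchClient.Search<SearchDocument>("*", options).Value;

foreach (FacetResult facet in results.Facets["category"])
{
    Console.WriteLine($"{facet.Value} ({facet.Count})");   // facet values for building filter links
}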
https://docs.microsoft.com/en-us/azure/search/search-filters
https://docs.microsoft.com/en-us/azure/search/search-pagination-page-layout
Search an index
While you could retrieve index entries based on simple field value matching, most search solutions use full text search semantics to query an index.
Full text search
describes search solutions that parse text-based document contents to find query terms
based on the Lucene query syntax, which provides a rich set of query operations for searching, filtering, and sorting data in indexes
Simple - An intuitive syntax that makes it easy to perform basic searches that match literal query terms submitted by a user.
Full - An extended syntax that supports complex filtering, regular expressions, and other more sophisticated queries.
Client applications submit queries by specifying a search expression along with other parameters that determine how the expression is evaluated and the results returned (see the sketch after this list).
search - A search expression that includes the terms to be found.
queryType - The Lucene syntax to be evaluated (simple or full).
searchFields - The index fields to be searched.
select - The fields to be included in the results.
searchMode - Criteria for including results based on multiple search terms. For example, suppose you search for comfortable hotel. A searchMode value of Any will return documents that contain "comfortable", "hotel", or both; while a searchMode value of All will restrict results to documents that contain both "comfortable" and "hotel".
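A minimal sketch of submitting such a query with the Azure.Search.Documents SDK (the service, index, and field names are assumptions):
using System;
using Azure;
using Azure.Search.Documents;
using Azure.Search.Documents.Models;

var searchClient = new SearchClient(
    new Uri("https://<search-service>.search.windows.net"),
    "hotels-index",                                        // assumed index name
    new AzureKeyCredential("<query-key>"));

var options = new SearchOptions
{
    QueryType = SearchQueryType.Simple,                    // simple Lucene syntax
    SearchMode = SearchMode.All,                           // require both "comfortable" and "hotel"
    SearchFields = { "description" },                      // assumed index field to search
    Select = { "hotelName", "description" }                // assumed fields to include in the results
};

SearchResults<SearchDocument> results =
    searchClient.Search<SearchDocument>("comfortable hotel", options).Value;

foreach (SearchResult<SearchDocument> result in results.GetResults())
{
    Console.WriteLine($"{result.Score}: {result.Document["hotelName"]}");
}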
Query processing consists of four stages:
Query parsing. The search expression is reconstructed as a tree of appropriate subqueries, which can include:
term queries (finding specific individual words in the search expression - for example, hotel)
phrase queries (finding multi-term phrases specified in quotation marks in the search expression - for example, "free parking")
prefix queries (finding terms with a specified prefix - for example air*, which would match aircon, air-conditioning, and airport).
Lexical analysis. The query terms are analyzed and refined based on linguistic rules.
Document retrieval. The query terms are matched against the indexed terms, and the set of matching documents is identified.
Scoring. A relevance score is assigned to each result based on a term frequency/inverse document frequency (TF/IDF) calculation.
https://docs.microsoft.com/en-us/azure/search/search-query-overview
Understand the
index process
When the documents in the data source contain images, you can configure the indexer to extract the image data and place each image in a normalized_images collection
document
metadata_storage_name
metadata_author
content
normalized_images
image0
image1
Normalizing the image data in this way enables you to use the collection of images as an input for skills that extract information from image data.
The indexing process creates each indexed document as a JSON structure, which initially consists of the index fields you have mapped to fields extracted directly from the source data.
Document
metadata_storage_name
metadata_author
content
document is structured hierarchically, and the skills are applied to a specific context within the hierarchy, enabling you to run the skill for each item at a particular level of the document.
During indexing, an enrichment pipeline iteratively builds the documents that combine metadata from the data source with enriched fields extracted by cognitive skills
The output fields from each skill can be used as inputs for other skills later in the pipeline, which in turn store their outputs in the document structure.
works by creating a document for each indexed entity
The fields in the final document structure at the end of the pipeline are mapped to index fields by the indexer in one of two ways:
Fields extracted directly from the source data are all mapped to index fields. These mappings can be implicit (fields are automatically mapped to fields with the same name in the index) or explicit (a mapping is defined to match a source field to an index field, often to rename the field to something more useful or to apply a function to the data value as it is mapped).
Output fields from the skills in the skillset are explicitly mapped from their hierarchical location in the output to the target field in the index.
Search components
Skillset
apply artificial intelligence (AI) skills as part of the indexing process to enrich the source data with new information, which can be mapped to index fields.
skills used by an indexer are encapsulated in a skillset that defines an enrichment pipeline in which each step enhances the source data with insights obtained by a specific AI skill
The language in which a document is written.
Key phrases that might help determine the main themes or topics discussed in a document.
A sentiment score that quantifies how positive or negative a document is.
Specific locations, people, organizations, or landmarks mentioned in the content.
AI-generated descriptions of images, or image text extracted by optical character recognition.
Custom skills that you develop to meet specific requirements.
the expectations of modern application users have driven a need for richer insights into the data.
when indexing a set of documents, file metadata such as file name, modified date, size, and author might be extracted along with the text content of the document.
when indexing data in a database, the fields in the database tables might be extracted;
Indexer
It takes the outputs extracted using the skills in the skillset, along with the data and metadata values extracted from the original data source, and maps them to fields in the index.
indexer is automatically run when it is created, and can be scheduled to run at regular intervals or run on demand to add more documents to the index
the engine that drives the overall indexing process
when you add new fields to an index or new skills to a skillset, you may need to reset the index before re-running the indexer.
Data source:
Unstructured files in Azure blob storage containers.
Tables in Azure SQL Database.
Documents in Cosmos DB.
Azure Cognitive Search can pull data from these data sources for indexing.
applications can push JSON data directly into an index, without pulling it from an existing data store.
Index
index is the searchable result of the indexing process
consists of a collection of JSON documents, with fields that contain the values extracted during indexing.
Client applications can query the index to retrieve, filter, and sort information.
Each index field can be configured with the following attributes (see the sketch after this list):
key: Fields that define a unique key for index records.
searchable: Fields that can be queried using full-text search.
filterable: Fields that can be included in filter expressions to return only documents that match specified constraints.
sortable: Fields that can be used to order the results.
facetable: Fields that can be used to determine values for facets (user interface elements used to filter the results based on a list of known field values).
retrievable: Fields that can be included in search results (by default, all fields are retrievable unless this attribute is explicitly removed).
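As an illustrative sketch of how these attributes map to field definitions when using the Azure.Search.Documents SDK (the model class and field names are hypothetical):
using Azure.Search.Documents.Indexes;

// Hypothetical model class used with FieldBuilder to generate index field definitions
public class HotelDocument
{
    [SimpleField(IsKey = true)]                                  // key: unique identifier for each record
    public string HotelId { get; set; }

    [SearchableField(IsFilterable = true, IsSortable = true)]    // searchable, filterable, and sortable
    public string HotelName { get; set; }

    [SearchableField]                                            // searchable via full text queries
    public string Description { get; set; }

    [SimpleField(IsFilterable = true, IsFacetable = true)]       // filterable and facetable
    public string Category { get; set; }

    [SimpleField(IsSortable = true)]                             // sortable; retrievable by default
    public double Rating { get; set; }
}
A FieldBuilder can then generate the index field definitions from this class, for example new FieldBuilder().Build(typeof(HotelDocument)).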
Azure resources
you may also need Azure resources for data storage and other application services.
Service tiers and capacity management
The pricing tier you select determines the capacity limitations of your search service and the configuration options available to you.
Free (F). Use this tier to explore the service or try the tutorials in the product documentation.
Basic (B): Use this tier for small-scale search solutions that include a maximum of 15 indexes and 2 GB of index data.
Standard (S): Use this tier for enterprise-scale solutions. There are multiple variants of this tier, including S, S2, and S3; which offer increasing capacity in terms of indexes and storage, and S3HD, which is optimized for fast read performance on smaller numbers of indexes.
Storage Optimized (L): Use a storage optimized tier (L1 or L2) when you need to create large indexes, at the cost of higher query latency.
It's important to select the most suitable pricing tier for your solution, because you can't change it later.
If the pricing tier is no longer suitable for your solution, you must create a new Azure Cognitive Search resource and recreate all indexes and objects.
you need to create an Azure Cognitive Search resource
Replicas and partitions
Replicas are instances of the search service - you can think of them as nodes in a cluster. Increasing the number of replicas can help ensure there is sufficient capacity to service multiple concurrent query requests while managing ongoing indexing operations.
Partitions are used to divide an index into multiple storage locations, enabling you to split I/O operations such as querying or rebuilding an index.
The combination of replicas and partitions you configure determines the search units used by your solution.
the number of search units is the number of replicas multiplied by the number of partitions (R x P = SU).
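For example, a service configured with 3 replicas and 2 partitions would use 3 x 2 = 6 search units.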
https://docs.microsoft.com/en-us/azure/search/search-sku-tier
Intro
Organizations face the challenge of finding and extracting information from the massive set of documents, databases, and other sources in which it is stored.
documents are indexed and made easy to search. This solution enables agents and customers to query the index to find relevant documents and extract information from them
Azure Cognitive Search
A cloud-based solution for indexing and querying a wide range of data sources, and creating comprehensive, high-scale search solutions.
Index documents and data from a range of sources.
Use cognitive skills to enrich index data.
Store extracted insights in a knowledge store for analysis and integration.
https://docs.microsoft.com/en-us/azure/search/
Create conversational
AI solutions
Create a bot with the
Bot Framework SDK.
Features
Activities are events, such as a user joining a conversation or sending a message
Messages can be text, speech, or visual interface elements (such as cards or buttons)
A flow of activities can form a dialog
interact with a bot by initiating activities in turns.
Activities are exchanged across channels, such as web chat, email, Microsoft Teams, and others
Has a conversational interface
bot's success
Is the bot intuitive and easy to use?
Users will not return to a bad user experience
Is the bot available on the devices and platforms that users care about?
If your bot is available on Microsoft Teams but most of your target audience is using Slack, the bot will not be successful.
Is the bot discoverable?
integration with the proper channels
Integrating with the Teams channel will make your bot available in the Teams app.
integrating it directly into a web site
Can users solve their problems with minimal use and bot interaction?
answers to their issues or problems as quickly as possible
Does the bot solve the user issues better than alternative experiences?
If a user can reach an answer with minimal effort through other means, they are less likely to use the bot.
Factors that do not guarantee success
The more complex your bot is, in terms of AI or machine learning features, the more open it may be to issues and problems.
Adding natural language features may not always make the bot experience great
Support for speech
Responsible AI
Ensure a seamless hand-off to a human where the human-bot exchange leads to interactions that exceed the bot's competence.
Design your bot so that it respects relevant cultural norms and guards against misuse.
Be transparent about the fact that you use bots as part of your product or service.
Ensure your bot is reliable.
Articulate the purpose of your bot and take special care if your bot will support consequential use cases.
Ensure your bot treats people fairly
Ensure your bot respects user privacy
Ensure your bot handles data securely.
Ensure your bot is accessible.
Accept responsibility for your bot's operation and how it affects people.
Technologies
Bot Framework Service. A component of Azure Bot Service that provides a REST API for handling bot activities.
Bot Framework SDK. A set of tools and libraries for end-to-end bot development that abstracts the REST interface, enabling bot development in a range of programming languages.
The SDK is an extensive set of tools and libraries that software engineers can use to develop bots, with support for Microsoft C# (.NET Core), Python, and JavaScript (Node.js).
Azure Bot Service. A cloud service that enables bot delivery through one or more channels, and integration with other services.
Templates
Echo Bot - a simple "hello world" sample in which the bot responds to messages by echoing the message text back to the user.
Core Bot - a more comprehensive bot that includes common bot functionality, such as integration with the Language Understanding service.
Empty Bot - a basic bot skeleton.
Classes and Logic
Base Class: Bot
Adapter class: handles communication with the user's channel.
Activities
Events
joining a conversation
message being received
Bot Framework Service notifies your bot's adapter when an activity occurs in a channel by calling its Process Activity method
Adapter creates a context for the turn and calls the bot's Turn Handler method
Testing with the Bot Framework Emulator
The Bot Framework Emulator is an application that enables you to run your bot as a local or remote web application and connect to it from an interactive web chat interface that you can use to test your bot.
Bots developed with the Bot Framework SDK are designed to run as cloud services in Azure
Implement activity
handlers and dialogs
Activity handlers
bots with short, stateless interactions
the events are triggered by activities such as users joining the conversation or a message being received.
Event methods that you can override to handle different kinds of activities.
adapter creates a turn context for the activity and passes it to the bot's turn handler, which calls the individual, event-specific activity handler
ActivityHandler base class includes
Message received
Members joined the conversation
Members left the conversation
Message reaction received
Bot installed
Turn context
Activity handler methods include a parameter for the turn context, which you can use to access relevant information
activity occurs within the context of a turn, which represents a single two-way exchange between the user and the bot.
https://docs.microsoft.com/en-us/azure/bot-service/bot-activity-handler-concept?view=azure-bot-service-4.0
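A minimal C# sketch of an activity handler, assuming the Microsoft.Bot.Builder SDK (the bot class name and reply text are illustrative):
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.Bot.Builder;
using Microsoft.Bot.Schema;

public class EchoBot : ActivityHandler
{
    // Called by the bot's turn handler when a message activity is received
    protected override async Task OnMessageActivityAsync(
        ITurnContext<IMessageActivity> turnContext,
        CancellationToken cancellationToken)
    {
        // The turn context provides access to the incoming activity and the reply channel
        var replyText = $"You said: {turnContext.Activity.Text}";
        await turnContext.SendActivityAsync(MessageFactory.Text(replyText), cancellationToken);
    }

    // Called when members join the conversation
    protected override async Task OnMembersAddedAsync(
        IList<ChannelAccount> membersAdded,
        ITurnContext<IConversationUpdateActivity> turnContext,
        CancellationToken cancellationToken)
    {
        await turnContext.SendActivityAsync(MessageFactory.Text("Hello and welcome!"), cancellationToken);
    }
}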
Dialogs
More complex patterns for handling stateful, multi-turn conversations.
Component dialogs
dialog that can contain other dialogs, defined in its dialog set.
The initial dialog in the component dialog is a waterfall dialog.
Each step must be completed before passing the output onto the next step
Adaptive dialogs
The recognizer analyzes natural language input (usually using the Language Understanding service) and detects intents, which can be mapped to triggers that change the flow of the conversation - often by starting new child dialogs, which contain their own actions, triggers, and recognizers.
bot initiates a root dialog, which contains a flow of actions (which can include branches and loops), and triggers that can be initiated by actions or by a recognizer.
The flow is more flexible, allowing for interruptions, cancellations, and context switches at any point in the conversation.
Deploy a bot
Create a bot application service
bot requires a Bot Channels Registration resource, along with associated application service and application service plan.
run the az deployment group create command, referencing the deployment template and specifying your bot application registration's ID (from the az ad app create command output) and the password you specified.
Prepare your bot for deployment
For Python bots, you must include a requirements.txt file listing any package dependencies that must be installed in the deployment environment.
For C# and JavaScript bots, you can use the az bot prepare-deploy command to ensure your bot is properly configured with the appropriate package dependencies and build files.
Register an Azure app
Register an app using the az ad app create command, specifying a display name and password for your app identity.
Deploy your bot as a web app
The final step is to package your bot application files in a zip archive
use the az webapp deployment source config-zip command to deploy the bot code to the Azure resources you created previously
Create the Azure resources required to support your bot.
give your bot an identity it can use to access resources, and a bot application service to host the bot.
https://docs.microsoft.com/en-us/azure/bot-service/bot-builder-deploy-az-cli?view=azure-bot-service-4.0&tabs=userassigned%2Cnewgroup%2Ccsharp
The details of how the bot is hosted vary depending on the programming language and underlying runtime you have used, but the basic steps for deployment are the same.
https://docs.microsoft.com/en-us/learn/modules/create-bot-with-bot-framework-composer/
Create a Bot with the Bot Framework Composer
Understand adaptive flow
Without the ability to adapt to this kind of interruption to the conversational flow, a bot can become locked in a fixed flow that a user might find frustrating
Managing interruptions with the Bot Framework Composer
user input is provided through actions in a dialog flow, which can be configured to allow interruptions.
An interruption occurs when the recognizer identifies input that fires a trigger, signaling a conversational context change - usually by ending the current dialog flow or starting a child dialog.
he term "cancel" by ending the current dialog flow and resetting all dialog-scope variables.
ready to place an order; and then decide to add another pizza, change the selected pizza size or toppings, or cancel the order altogether and start again.
handle unexpected flow as an interruption to the programmed flow of the conversation.
Design the user experience
Text
It is possible to add natural language understanding to a bot, but careful consideration of how language understanding will be used is important.
For example, a user might respond to a prompt with "My name is Terry". If you want to personalize the conversation with follow-up prompts including the user's name, your bot logic needs to parse the response and isolate the name from the rest of the text.
Text input from users is parsed to determine the intent
A better design option is for the bot to be specific in the prompt, so the expected response is easier to parse.
A bot can integrate different cognitive services to aid in language understanding, keyword or phrase detection, and sentiment analysis. These services can make your bot more "intelligent", but they can also lead to response time delays if too many services are integrated for each response.
recommended considerations
ask specific questions that do not require natural language understanding capabilities to parse the response
Requiring specific commands from the user can often provide a good user experience while also eliminating the need for natural language understanding capability.
If you are designing a bot that will answer questions based on structured or unstructured data from databases, web pages, or documents, consider using technologies like QnA Maker that are designed specifically to address this scenario.
When building natural language models, do not assume that users will provide all the required information in their initial query. Design your bot to specifically request the information it requires, guiding the user to provide that information by asking a series of questions, if necessary.
Speech
Speech support can be important for users with differing abilities to interact with computing devices.
Using speech will require your bot to interact with the Speech cognitive services to transcribe spoken input to text for the bot to act on, and then synthesize the text responses to speech as output.
You may decide that your bot application needs to support speech if it will be accessed from devices that do not contain keyboards or monitors.
features
buttons - presenting the user with buttons from which to select options. In a pizza order bot, you might decide to use buttons to represent the pizza sizes available. They are a visual way to represent choices to users and add more visual appeal when compared to text
images - using images in the bot interaction adds a graphical appearance to the bot and can enhance the user experience
text - a typical interaction that is lightweight and involves presenting text to the user and having the user respond with text input
cards - allow you to present your users with various visual, audio, and/or selectable messages and help to assist conversation flow
Different channels will render each of these components differently. If a channel doesn't support the feature, the user experience can be degraded due to poor rendering or functional impairments.
Rich user controls
provide a more guided experience with the bot.
emulate an application. Users are familiar with using applications on their computers or devices so it makes the bot use more "natural".
presents the user with discrete choices resulting in less ambiguity and misinterpretation by the bot's logic.
ease of use on mobile devices where typing text is not optimal or less-preferred by users.
Cards
Cards are programmable objects containing standardized collections of rich user controls. An advantage of cards is that they are recognized across a wide range of channels.
Adaptive cards: An open card exchange format rendered as a JSON object. Typically used for cross-channel deployment of cards. Cards adapt to the look and feel of each host channel.
Audio cards: A card that can play audio files. This card could be helpful in a bot that interacts with users who have visual impairments.
Animation cards: This type of card can play animated GIFs or short video files, for example to depict actions or status indicators.
Hero cards: A card that contains a single large image, one or more buttons, and text. Typically used to visually highlight a potential user selection.
Thumbnail cards: A card that contains a single thumbnail image, one or more buttons, and text. Typically used to visually highlight the buttons for a potential user selection.
Receipt cards: If users are able to purchase items with your bot, you can use a Receipt card to provide a transaction record for the user. The receipt can contain the items purchased, unit price, taxes, and totals.
SignIn card: A card that enables a bot to request that a user sign-in. It typically contains text and one or more buttons that the user can select to initiate the sign-in process.
SuggestedAction card: The SuggestedAction card gives the user a discrete set of options from which to choose, but is also context aware. The actions presented are related to the next action the user needs to take and not generic in nature. The card disappears once any of the suggested actions is selected.
Video card: A card that can play videos. Typically used to open a URL and stream an available video.
Card carousel: A horizontally scrollable collection of cards that allows your user to easily view a series of possible user choices.
Presenting responses with
the Bot Framework Composer
You use language templates to define responses, which can include multiple phrases for a given type of response, or specific graphical responses.
A response template can include a placeholder for a property value - for example, the name property in the user scope.
All responses that your bot presents to users are created by the language generator for the current (or parent) dialog.
Bot Framework Composer interface includes a Response Editor that can generate the appropriate language generation code for you, making it easier to create conversational responses.
Understand dialogs
bot interaction begins with a main dialog in which the user is welcomed and the initial conversation established, and then child dialogs are triggered.
A flow of dialogs
users tend to think of the interactions as a series of "screens" or "pages".
A bot might follow a similar sequential pattern in which each "screen" is replaced by a dialog that gathers the required information before moving the user along to the next stage.
design a conversation flow based on dialogs that will gather the required information and get to a resolution efficiently.
A bot will likely make use of multiple dialogs to implement multi-turn conversations in which the bot gathers information from the user, storing state between turns.
Implementing dialogs with the Bot Framework Composer
The Bot Framework Composer implements adaptive dialogs, in which the conversation flow is flexible, allowing for interruptions, cancellations, and context switches at any point in the conversation.
adaptive dialog
One or more actions that define the flow of message activities in the dialog
A Language Generator, which formulates the output presented to the user based on templates you define. You can define responses that include graphical elements such as cards or buttons.
A recognizer, which interprets user input to determine semantic intent. Recognizers are based on the Language Understanding service by default, but you can also use other types of recognizer; such as the QnA Service or simple regular expression matches.
Triggers, which are fired by actions or based on the intent detected by the recognizer.
a dialog has memory in which values are stored as properties
Properties can be defined at various scopes
user scope (variables that store information for the lifetime of the user session with the bot)
dialog scope (variables that persist for the lifetime of the dialog).
bot initiates a main dialog, which contains a flow of actions (which can include branches and loops) in which input from users is analyzed by the recognizer, and responses are returned by the language generator.
recognizer analyzes natural language input and detects intents, which can be mapped to triggers that change the flow of the conversation - often by starting new child dialogs, which contain their own actions, triggers, and recognizers.
Get started with the Bot Framework Composer
Visual design surface in Composer eliminates the need for boilerplate code and makes bot development more accessible.
Time saved with fewer steps to set up your environment.
Use of Adaptive Dialogs allow for Language Generation (LG), which can simplify interruption handling and give bots character.
Bots built with Composer contain reusable assets in the form of JSON and Markdown files that can be bundled and packaged with a bot's source code.
You can install the Bot Framework Composer from https://docs.microsoft.com/composer/install-composer.
Intro
an open-source tool that presents a visual canvas for building bots
latest SDK features so you can build sophisticated bots with relative ease
Bot Framework Composer is a visual designer that lets you quickly and easily build sophisticated conversational bots without writing code
https://docs.microsoft.com/en-us/composer/
https://microsoft.github.io/botframework-solutions/overview/virtual-assistant-solution/
Process and Translate Speech with Azure Cognitive Speech Services
Create speech-enabled apps
with the speech service
Use the text-to-speech API
The Text-to-speech API, which is the primary way to perform speech synthesis.
The Text-to-speech Long Audio API, which is designed to support batch operations that convert large volumes of text to audio
Speech service offers two REST APIs for speech synthesis
Documentation:
https://docs.microsoft.com/en-us/azure/cognitive-services/speech-service/rest-text-to-speech
Text-to-speech SDK
speech-enabled applications are built using the Speech SDK.
Use a SpeechConfig object to encapsulate the information required to connect to your Speech resource. Specifically, its location and key.
Optionally, use an AudioConfig to define the output device for the speech to be synthesized. By default, this is the default system speaker, but you can also specify an audio file, or by explicitly setting this value to a null value, you can process the audio stream object that is returned directly.
Use the SpeechConfig and AudioConfig to create a SpeechSynthesizer object. This object is a proxy client for the Text-to-speech API.
Use the methods of the SpeechSynthesizer object to call the underlying API functions. For example, the SpeakTextAsync() method uses the Speech service to convert text to spoken audio.
Process the response from the Speech service. In the case of the SpeakTextAsync method, the result is a SpeechSynthesisResult object that contains the following properties:
AudioData
Properties
Reason
ResultId
If the operation was successful, the Reason property is set to the SynthesizingAudioCompleted enumeration and the AudioData property contains the audio stream (which, depending on the AudioConfig, may have been automatically sent to a speaker or file).
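A minimal C# sketch of this pattern, assuming the Microsoft.CognitiveServices.Speech SDK and placeholder key and region values:
using System;
using Microsoft.CognitiveServices.Speech;
using Microsoft.CognitiveServices.Speech.Audio;

// Connect to the Speech resource using its key and location
var speechConfig = SpeechConfig.FromSubscription("<speech-key>", "<speech-region>");

// Send synthesized speech to the default system speaker
using var audioConfig = AudioConfig.FromDefaultSpeakerOutput();
using var speechSynthesizer = new SpeechSynthesizer(speechConfig, audioConfig);

// Convert text to spoken audio
var result = await speechSynthesizer.SpeakTextAsync("Hello, and welcome!");

if (result.Reason == ResultReason.SynthesizingAudioCompleted)
{
    Console.WriteLine("Speech synthesized successfully.");   // AudioData holds the audio stream
}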
Use the speech-to-text API
Rest API
The Speech-to-text API, which is the primary way to perform speech recognition.
Batch transcription - transcribing multiple audio files to text.
https://docs.microsoft.com/en-us/azure/cognitive-services/speech-service/rest-speech-to-text
The Speech-to-text Short Audio API, which is optimized for short streams of audio (up to 60 seconds).
most interactive speech-enabled applications use the Speech service through a (programming) language-specific SDK
Using the Speech-to-text SDK
Use a SpeechConfig object to encapsulate the information required to connect to your Speech resource. Specifically, its location and key.
Optionally, use an AudioConfig to define the input source for the audio to be transcribed. By default, this is the default system microphone, but you can also specify an audio file.
Use the SpeechConfig and AudioConfig to create a SpeechRecognizer object. This object is a proxy client for the Speech-to-text API.
Use the methods of the SpeechRecognizer object to call the underlying API functions. For example, the RecognizeOnceAsync() method uses the Speech service to asynchronously transcribe a single spoken utterance.
Process the response from the Speech service. In the case of the RecognizeOnceAsync() method, the result is a SpeechRecognitionResult object that includes the following properties:
Duration
OffsetInTicks
Properties
Reason
ResultId
Text
If the operation was successful, the Reason property has the enumerated value RecognizedSpeech and the Text property contains the transcription.
Other possible values for the result's Reason property include NoMatch (indicating that the audio was successfully parsed but no speech was recognized).
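A minimal C# sketch of the speech-to-text pattern, assuming the Microsoft.CognitiveServices.Speech SDK with placeholder key and region values:
using System;
using Microsoft.CognitiveServices.Speech;
using Microsoft.CognitiveServices.Speech.Audio;

var speechConfig = SpeechConfig.FromSubscription("<speech-key>", "<speech-region>");

// Use the default system microphone as the input source
using var audioConfig = AudioConfig.FromDefaultMicrophoneInput();
using var speechRecognizer = new SpeechRecognizer(speechConfig, audioConfig);

// Transcribe a single spoken utterance
var result = await speechRecognizer.RecognizeOnceAsync();

if (result.Reason == ResultReason.RecognizedSpeech)
{
    Console.WriteLine($"Transcription: {result.Text}");
}
else if (result.Reason == ResultReason.NoMatch)
{
    Console.WriteLine("Audio was parsed, but no speech was recognized.");
}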
Provision an Azure resource for speech
The location in which the resource is deployed (for example, eastus).
One of the keys assigned to your resource.
You can use either a dedicated Speech resource or a multi-service Cognitive Services resource.
Configure audio format and voices
Audio format
Audio file type
Sample-rate
Bit-depth
The supported formats are indicated in the SDK using the SpeechSynthesisOutputFormat enumeration.
To specify the required output format, use the SetSpeechSynthesisOutputFormat method of the SpeechConfig object:
speechConfig.SetSpeechSynthesisOutputFormat(SpeechSynthesisOutputFormat.Riff24Khz16BitMonoPcm);
Documentation:
https://docs.microsoft.com/en-us/dotnet/api/microsoft.cognitiveservices.speech.speechsynthesisoutputformat?view=azure-dotnet
you can use a SpeechConfig object to customize the audio that is returned
Voices
provides multiple voices that you can use to personalize your speech-enabled applications
Standard voices - synthetic voices created from audio samples.
Neural voices - more natural sounding voices created using deep neural networks.
To specify a voice for speech synthesis in the SpeechConfig, set its SpeechSynthesisVoiceName property to the desired voice:
speechConfig.SpeechSynthesisVoiceName = "en-GB-George";
Intro
Speech-to-Text: An API that enables speech recognition in which your application can accept spoken input.
Text-to-Speech: An API that enables speech synthesis in which your application can provide spoken output.
Speech Translation: An API that you can use to translate spoken input into multiple languages.
Speaker Recognition: An API that enables your application to recognize individual speakers based on their voice.
Intent Recognition: An API that integrates with the Language Understanding service to determine the semantic meaning of spoken input.
Use Speech Synthesis Markup Language
Speech SDK enables you to submit plain text to be synthesized into speech
the service also supports an XML-based syntax for describing characteristics of the speech you want to generate
Speech Synthesis Markup Language (SSML)
The SSML syntax offers greater control over how the spoken output sounds.
Specify a speaking style, such as "excited" or "cheerful" when using a neural voice.
Insert pauses or silence.
Specify phonemes (phonetic pronunciations), for example to pronounce the text "SQL" as "sequel".
Adjust the prosody of the voice (affecting the pitch, timbre, and speaking rate).
Use common "say-as" rules, for example to specify that a given string should be expressed as a date, time, telephone number, or other form.
Insert recorded speech or audio, for example to include a standard recorded message or simulate background noise.
To submit an SSML description to the Speech service, you can use the SpeakSsmlAsync() method, like this:
speechSynthesizer.SpeakSsmlAsync(ssml_string);
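As an illustrative sketch, the ssml_string value might be defined like this (the voice name and content are assumptions, not from the source):
// Hypothetical SSML document: a neural voice with a 500 ms pause inserted
string ssml_string = @"
<speak version='1.0' xmlns='http://www.w3.org/2001/10/synthesis' xml:lang='en-US'>
    <voice name='en-GB-RyanNeural'>
        Welcome! <break time='500ms' /> How can I help you today?
    </voice>
</speak>";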
Documentation:
https://docs.microsoft.com/en-us/azure/cognitive-services/speech-service/
Translate speech with
the speech service
Intro
how to use its API through one of the supported software development kits (SDKs)
builds on speech recognition by recognizing and transcribing spoken input in a specified language, and returning translations of the transcription in one or more other languages
Provision an Azure resource
for speech translation
You can use either a dedicated Speech resource or a multi-service Cognitive Services resource
The location in which the resource is deployed (for example, eastus)
One of the keys assigned to your resource.
enabling developers to add end-to-end, real-time, speech translations to their applications or services
Migrate from Bing Speech
Migrate from Translator Speech API
Translate speech to text
Use a SpeechConfig object to encapsulate the information required to connect to your Speech resource. Specifically, its location and key.
Use a SpeechTranslationConfig object to specify the speech recognition language (the language in which the input speech is spoken) and the target languages into which it should be translated.
Optionally, use an AudioConfig to define the input source for the audio to be transcribed. By default, this is the default system microphone, but you can also specify an audio file.
Use the SpeechConfig, SpeechTranslationConfig, and AudioConfig to create a TranslationRecognizer object. This object is a proxy client for the Speech service translation API.
Use the methods of the TranslationRecognizer object to call the underlying API functions. For example, the RecognizeOnceAsync() method uses the Speech service to asynchronously translate a single spoken utterance.
Process the response from the Speech service. In the case of the RecognizeOnceAsync() method, the result is a SpeechRecognitionResult object.
If the operation was successful, the Reason property has the enumerated value RecognizedSpeech, the Text property contains the transcription in the original language, and the Translations property contains a dictionary of the translations (using the two-character ISO language code, such as "en" for English, as a key).
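A minimal C# sketch of this pattern, assuming the Microsoft.CognitiveServices.Speech SDK, placeholder key and region values, and example target languages:
using System;
using Microsoft.CognitiveServices.Speech;
using Microsoft.CognitiveServices.Speech.Audio;
using Microsoft.CognitiveServices.Speech.Translation;

// The translation config carries the Speech resource key/location,
// the recognition language, and the target languages
var translationConfig = SpeechTranslationConfig.FromSubscription("<speech-key>", "<speech-region>");
translationConfig.SpeechRecognitionLanguage = "en-US";
translationConfig.AddTargetLanguage("fr");
translationConfig.AddTargetLanguage("es");

using var audioConfig = AudioConfig.FromDefaultMicrophoneInput();
using var translationRecognizer = new TranslationRecognizer(translationConfig, audioConfig);

// Translate a single spoken utterance
var result = await translationRecognizer.RecognizeOnceAsync();

Console.WriteLine($"Recognized: {result.Text}");
foreach (var translation in result.Translations)
{
    Console.WriteLine($"{translation.Key}: {translation.Value}");   // e.g. "fr" -> French translation
}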
You can in fact use the SpeechTranslationConfig to synthesize the translation directly, but this only works when translating to a single language, and results in an audio stream that is typically saved as a file rather than sent directly to a speaker.
Synthesize translations
synthesize the translation as speech to create speech-to-speech translation solutions
Event-based synthesis
When you want to perform 1:1 translation (translating from one source language into a single target language)
Specify the desired voice for the translated speech in the TranslationConfig.
Create an event handler for the TranslationRecognizer object's Synthesizing event.
In the event handler, use the GetAudio() method of the Result parameter to retrieve the byte stream of translated audio.
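A hedged C# sketch of event-based synthesis, assuming a single target language and an illustrative voice name:
using System;
using System.IO;
using Microsoft.CognitiveServices.Speech;
using Microsoft.CognitiveServices.Speech.Translation;

var translationConfig = SpeechTranslationConfig.FromSubscription("<speech-key>", "<speech-region>");
translationConfig.SpeechRecognitionLanguage = "en-US";
translationConfig.AddTargetLanguage("fr");
translationConfig.VoiceName = "fr-FR-DeniseNeural";   // assumed voice for the translated speech

using var recognizer = new TranslationRecognizer(translationConfig);

// The Synthesizing event fires as translated audio becomes available
recognizer.Synthesizing += (sender, e) =>
{
    byte[] audio = e.Result.GetAudio();
    if (audio.Length > 0)
    {
        File.WriteAllBytes("translation.wav", audio);  // illustrative: persist the translated audio
    }
};

var result = await recognizer.RecognizeOnceAsync();
Console.WriteLine($"Recognized: {result.Text}");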
The TranslationRecognizer returns translated transcriptions of spoken input - essentially translating audible speech to text.
Manual synthesis
You can use manual synthesis to generate audio translations for one or more target languages
Use a TranslationRecognizer to translate spoken input into text transcriptions in one or more target languages.
Iterate through the Translations dictionary in the result of the translation operation, using a SpeechSynthesizer to synthesize an audio stream for each language.
Process and translate text
with Azure Cognitive Services
Extract Insights from text with Language service
Analyze sentiment
evaluate how positive or negative a text document is
Evaluating a movie, book, or product by quantifying sentiment based on reviews
Prioritizing customer service responses to correspondence received through email or social media messaging
The response includes overall document sentiment and individual sentence sentiment for each document submitted to the service (see the sketch after the list below).
Overall document sentiment is based on the sentence classifications:
If all sentences are neutral, the overall sentiment is neutral.
If sentence classifications include only positive and neutral, the overall sentiment is positive.
If the sentence classifications include only negative and neutral, the overall sentiment is negative.
If the sentence classifications include positive and negative, the overall sentiment is mixed.
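A minimal sketch using the Azure.AI.TextAnalytics client library (the endpoint, key, and input text are placeholders):
using System;
using Azure;
using Azure.AI.TextAnalytics;

// Placeholder endpoint and key for a Language resource
var client = new TextAnalyticsClient(
    new Uri("https://<language-resource>.cognitiveservices.azure.com/"),
    new AzureKeyCredential("<language-key>"));

DocumentSentiment sentiment = client.AnalyzeSentiment("The hotel was great, but the parking was terrible.");

Console.WriteLine($"Overall sentiment: {sentiment.Sentiment}");          // e.g. Mixed
foreach (SentenceSentiment sentence in sentiment.Sentences)
{
    Console.WriteLine($"\"{sentence.Text}\" -> {sentence.Sentiment}");   // per-sentence sentiment
}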
Extract key phrases
is the process of evaluating the text of a document, or documents, and then identifying the main points around the context of the document(s)
works best for larger documents (the maximum size that can be analyzed is 5,120 characters).
the REST interface enables you to submit one or more documents for analysis.
Extract entities
Person
Location
DateTime
Organization
Address
Email
URL
Entities are grouped into categories and subcategories
Detect language
Scenarios
content stores that collect arbitrary text, where language is unknown
A chat bot scenario could involve detecting which language the user is using, so you can configure your bot to respond in the appropriate language.
The response also returns a score, which reflects the confidence of the model (a value between 0 and 1).
can work with documents or single phrases
document size must be under 5,120 characters per document and each collection is restricted to 1,000 items (IDs)
Mixed language content within the same document returns the language with the largest representation in the content, but with a lower positive rating, reflecting the marginal strength of that assessment.
When there is ambiguity as to the language of the content, the response for the language name and ISO code will indicate (unknown) and the score value will be returned as NaN, or Not a Number.
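A minimal sketch of language detection with the Azure.AI.TextAnalytics client library (endpoint, key, and input text are placeholders):
using System;
using Azure;
using Azure.AI.TextAnalytics;

var client = new TextAnalyticsClient(
    new Uri("https://<language-resource>.cognitiveservices.azure.com/"),
    new AzureKeyCredential("<language-key>"));

DetectedLanguage language = client.DetectLanguage("Bonjour tout le monde");

Console.WriteLine($"Language: {language.Name} ({language.Iso6391Name})");   // e.g. French (fr)
Console.WriteLine($"Confidence: {language.ConfidenceScore}");               // value between 0 and 1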
Extract linked entities
Entity linking can be used to disambiguate entities of the same name by referencing an article in a knowledge base
As with all Language service functions, you can submit one or more documents for analysis
Provision a Language resource
Language service
Sentiment analysis - quantifying how positive or negative the text is.
Key phrase extraction - identifying important words and phrases in the text that indicate the main points.
Named entity recognition - detecting references to entities, including people, locations, time periods, organizations
Language detection
Entity linking - identifying specific entities by providing reference links to Wikipedia articles.
Azure resources for text analysis
single-service Language resource
multi-service Cognitive Services resource
call the Language APIs by submitting requests in JSON format to the REST interface
any of the available programming language-specific SDKs
Documentation:
https://docs.microsoft.com/en-us/azure/cognitive-services/language-service/
Translate Text with
the Translator Service
Intro
ability to exchange information between speakers of different languages is often a critical requirement for global solutions
Translator Azure Cognitive Service provides an API for translating text between 90 supported languages
Provision a Translator resource
Used for
Language detection
One-to-many translation
Script transliteration (converting text from its native script to an alternative script)
Azure resources for Translator
You can provision a single-service Translator resource, or you can use the Translator API in a multi-service Cognitive Services resource.
You can use the location where you deployed the resource and one of its subscription keys to call the Translator APIs from your code.
submitting requests in JSON format to the REST interface, or by using any of the available programming language-specific SDKs.
Understand language detection, translation and transliteration
Language detection
You can use the detect REST function to detect the language in which text is written.
Translation
Use the translate function, specifying a single from parameter to indicate the source language, and one or more to parameters to specify the languages into which you want the text translated.
Transliteration
In some cases, rather than translate text to a different language, you may want to transliterate it to a different script - for example, to render the text in Latin script (as used by English language text).
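A hedged C# sketch of calling the translate function through the REST interface (the key, region, and text are placeholders; the URL shape assumes the Translator v3.0 API):
using System;
using System.Net.Http;
using System.Text;
using System.Text.Json;

string key = "<translator-key>";
string region = "<translator-region>";
string route = "https://api.cognitive.microsofttranslator.com/translate?api-version=3.0&from=en&to=fr&to=es";

// The request body is a JSON array of objects with a Text property
string body = JsonSerializer.Serialize(new[] { new { Text = "Hello, how are you?" } });

using var client = new HttpClient();
using var request = new HttpRequestMessage(HttpMethod.Post, route)
{
    Content = new StringContent(body, Encoding.UTF8, "application/json")
};
request.Headers.Add("Ocp-Apim-Subscription-Key", key);
request.Headers.Add("Ocp-Apim-Subscription-Region", region);

HttpResponseMessage response = await client.SendAsync(request);
Console.WriteLine(await response.Content.ReadAsStringAsync());   // JSON array of translations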
Specify translation options
Word alignment
In written English (using Latin script), spaces are used to separate words. However, in some other languages (and more specifically, scripts) this is not always the case.
When this is the case, it can be difficult to understand the relationship between the characters in the source text and the corresponding characters in the translation. To resolve this problem, you can specify the includeAlignment parameter with a value of true.
Sentence length
It can be useful to know the length of a translation, for example to determine how best to display it in a user interface. You can get this information by setting the includeSentenceLength parameter to true.
Profanity filtering
profanityAction parameter:
NoAction: Profanities are translated along with the rest of the text.
Deleted: Profanities are omitted in the translation.
Marked: Profanities are indicated using the technique indicated in the profanityMarker parameter (if supplied).
Documentation:
https://docs.microsoft.com/en-us/azure/cognitive-services/translator/reference/v3-0-translate
Define custom translation
create a custom model
use the Custom Translator portal
Create a workspace linked to your Translator resource
Create a project
Upload training data files
Train a model
Your custom model is assigned a unique category Id, which you can specify in translate calls to your Translator resource by using the category parameter, causing translation to be performed by your custom model instead of the default model.
https://docs.microsoft.com/en-us/azure/cognitive-services/translator/custom-translator/overview
You may need to develop a translation solution for businesses or industries that have specific vocabularies of terms requiring custom translation.
Documentation:
https://docs.microsoft.com/en-us/azure/cognitive-services/translator/