DL and NLP - Coggle Diagram
DL and NLP
Common techniques
Vectorization: Vectorization is the classic approach of converting input data from its raw format (e.g. text) into vectors of real numbers, the format that ML models accept.
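A minimal sketch of one common vectorization scheme, bag-of-words counting (illustrative only; real pipelines typically use library implementations such as scikit-learn's CountVectorizer):

```python
from collections import Counter

def vectorize(texts):
    """Map each text to a vector of word counts over a shared vocabulary.

    Bag-of-words sketch: the vocabulary is the sorted set of all words
    seen in the corpus, and each text becomes one count per vocab word.
    """
    vocab = sorted({word for text in texts for word in text.lower().split()})
    vectors = [
        [Counter(text.lower().split())[word] for word in vocab]
        for text in texts
    ]
    return vocab, vectors

vocab, vectors = vectorize(["the cat sat", "the cat ate the fish"])
# Each text is now a fixed-length vector of real numbers a model can consume.
```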
Tokenization: Tokenization is the process of breaking a given text down into the smallest units of a sentence, called tokens.
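A simple regex-based tokenizer sketch (real tokenizers such as those in NLTK or spaCy handle many more edge cases, e.g. contractions and abbreviations):

```python
import re

def tokenize(sentence):
    """Split a sentence into tokens: runs of word characters, or
    single punctuation marks."""
    return re.findall(r"\w+|[^\w\s]", sentence)

tokens = tokenize("The cat sat on the mat.")
```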
• Lemmatization: Lemmatization is the process of finding a word's dictionary form (its lemma). It differs from stemming: rather than clipping suffixes by rule, it uses vocabulary and morphological analysis, so it is more expensive to compute than stemming.
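A toy dictionary-based lemmatizer sketch; the lookup table here is a made-up assumption for illustration, whereas real lemmatizers (e.g. NLTK's WordNetLemmatizer or spaCy) use full dictionaries plus part-of-speech information:

```python
# Toy lemma table (assumed for illustration, not a real resource).
LEMMA_TABLE = {
    "studies": "study",
    "studying": "study",
    "mice": "mouse",
    "ran": "run",
}

def lemmatize(word):
    """Return the dictionary form (lemma) of a word, falling back to
    the lowercased word itself when it is not in the table."""
    return LEMMA_TABLE.get(word.lower(), word.lower())

lemma = lemmatize("studies")
# Contrast with stemming: a rule-based stemmer would clip "studies"
# to "studi", while lemmatization maps it to the dictionary word "study".
```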
Challenges
Lack of de-identification of medical corpora: identifying and removing personal identity information