Lexemes
Source code is written by programmers using ASCII characters. The compiler breaks this stream of characters down into its component parts, called "lexemes".
Lexemes - the smallest units of the language. Lexemes cannot be broken down further without losing meaning.
The compiler scans the source code character by character, looking for keywords that it recognises from its lexicon. These are labelled as lexemes, since they cannot be broken down without losing their meaning, e.g. it may find keywords such as IF, WHILE, FOR, END and so on.
After the compiler finds the keywords, it connects the characters between them to form new lexemes, looking for a character that separates one item from the next - typically this separator is a space.
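The scan described above can be sketched in a few lines of Python. The keyword lexicon and the sample input here are invented for illustration, and splitting on spaces stands in for the character-by-character scan:

```python
# Assumed lexicon for illustration only - a real compiler's is far larger.
KEYWORDS = {"IF", "WHILE", "FOR", "END"}

def find_lexemes(source: str) -> list[str]:
    # Splitting on whitespace yields the candidate lexemes; a real scanner
    # walks the text character by character, but the result is the same here.
    return source.split()

lexemes = find_lexemes("IF count END")
keywords_found = [lx for lx in lexemes if lx in KEYWORDS]
```

Anything in `lexemes` that is not in `keywords_found` is an item the compiler will have to classify in a later step.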
Exceptions
After the analyser has separated all the items, it removes redundant characters such as tabs and whitespace. Almost all compilers ignore whitespace (Python, where indentation is significant, is a notable exception) and remove it from the code during lexical analysis.
Source code often contains comments, used to explain the code to other programmers who might read it. Comments have no purpose within the program itself.
The start of a comment is marked using comment symbols (often two slashes, //). Anything between the comment symbols and the end-of-line character is ignored by the compiler and is removed during lexical analysis.
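This clean-up pass can be sketched as a small Python function. The // comment marker follows the text above; the sample line is invented:

```python
def strip_comments_and_whitespace(line: str) -> str:
    # Drop everything from the comment marker to the end of the line...
    code = line.split("//", 1)[0]
    # ...then collapse tabs and repeated spaces into single spaces.
    return " ".join(code.split())

cleaned = strip_comments_and_whitespace("total = a +\tb   // add the values")
# cleaned == "total = a + b"
```

A real lexer does this as part of its single scan rather than as a separate pass, but the effect on the character stream is the same.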
Tokens
The compiler converts each lexeme into a binary sequence of fixed length, called a token.
As the compiler scans the source code, it finds lexemes, converts them into tokens, and notes what each token does. By the end of the scan, the whole of the source code has been converted into a token table in memory that connects all of this information together.
Symbols such as "=", "+" and ";" are recognised by the compiler from its lexicon (two operators and an end-of-line marker respectively).
After finding these, the compiler joins the remaining unrecognised characters between them into a lexeme. Since this lexeme is not a recognised keyword, the compiler assumes it is a variable or constant and labels it as such. As lexemes are found, they are converted into tokens and recorded in the token table alongside their assumed purpose. Lexical analysis therefore strips the source code into its component parts, ready for the next stage: the source code has been broken down into equally sized tokens held as a token table in memory.
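The token table described above can be sketched as follows. The token codes, the 8-bit token width and the sample statement are all invented for illustration; real compilers choose their own encodings:

```python
# Assumed codes for the recognised symbols - illustrative only.
SYMBOL_CODES = {"=": 0b0001, "+": 0b0010, ";": 0b0011}
TOKEN_WIDTH = 8  # assumed fixed token length in bits

def tokenise(lexemes: list[str]) -> list[tuple[str, str, str]]:
    table = []
    identifiers: dict[str, int] = {}
    next_code = 0b1000  # codes from here upward are assumed identifier IDs
    for lx in lexemes:
        if lx in SYMBOL_CODES:
            code, kind = SYMBOL_CODES[lx], "keyword"
        else:
            # Unrecognised lexeme: assume a variable/constant and give it
            # a new code, reusing the code if it has been seen before.
            if lx not in identifiers:
                identifiers[lx] = next_code
                next_code += 1
            code, kind = identifiers[lx], "identifier"
        # Every entry is the same fixed width, as the notes describe.
        table.append((format(code, f"0{TOKEN_WIDTH}b"), lx, kind))
    return table

token_table = tokenise(["total", "=", "a", "+", "b", ";"])
```

Each row pairs the fixed-length token with its lexeme and its assumed purpose, which is exactly the information the later stages of compilation look up.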