Grokking algorithms 3
Chapter 11: Where to go next
RECAP
Trees
A binary search tree: For every node, the nodes to its left are smaller in value, and the nodes to the right are larger in value
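A minimal sketch of binary-search-tree lookup in Python (the tree and its values are made up for illustration):
```python
class Node:
    def __init__(self, value, left=None, right=None):
        self.value, self.left, self.right = value, left, right

def search(node, target):
    # In a BST, everything smaller is to the left, everything larger to the right.
    while node is not None:
        if target == node.value:
            return True
        node = node.left if target < node.value else node.right
    return False

#      10
#     /  \
#    5    15
root = Node(10, Node(5), Node(15))
print(search(root, 5))   # True
print(search(root, 7))   # False
```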
If you’re interested in databases or more-advanced data structures, check these out:
• B-trees
• Red-black trees
• Heaps
• Splay trees
Inverted indexes
A hash that maps words to places where they appear. This data structure is called an inverted index. It's commonly used to build search engines.
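A minimal inverted-index sketch in Python (the pages and their contents are made up for illustration):
```python
# Map each word to the set of pages it appears on.
pages = {
    "page1": "hi there",
    "page2": "hi adit",
    "page3": "there we go",
}

inverted_index = {}
for page, text in pages.items():
    for word in text.split():
        # setdefault creates an empty set the first time a word is seen
        inverted_index.setdefault(word, set()).add(page)

# A search engine can now look up a word directly instead of scanning pages.
print(inverted_index["hi"])     # {'page1', 'page2'}
print(inverted_index["there"])  # {'page1', 'page3'}
```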
The Fourier transform
Given a smoothie, the Fourier transform will tell you the ingredients in the smoothie.
The Fourier transform is great for processing signals.
You can also use it to compress music, try to predict upcoming earthquakes, and analyze DNA.
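A small sketch of the smoothie idea using NumPy's FFT (the two frequencies here are made-up "ingredients"):
```python
import numpy as np

# Build a signal that mixes two "ingredients": a 3 Hz and a 7 Hz sine wave.
t = np.linspace(0, 1, 1000, endpoint=False)
signal = np.sin(2 * np.pi * 3 * t) + 0.5 * np.sin(2 * np.pi * 7 * t)

# The Fourier transform tells us which frequencies the signal contains.
spectrum = np.abs(np.fft.rfft(signal))
freqs = np.fft.rfftfreq(len(signal), d=t[1] - t[0])

# The two largest peaks sit at the ingredient frequencies.
top = freqs[np.argsort(spectrum)[-2:]]
print(sorted(top.tolist()))  # [3.0, 7.0]
```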
MapReduce
why are distributed algorithms useful?
The map function
The reduce function
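A quick sketch of both functions in Python (doubling and summing a small list are standard illustrations):
```python
from functools import reduce

arr = [1, 2, 3, 4, 5]

# map applies a function to every element independently — easy to distribute
# across many machines, since the items don't depend on each other.
doubled = list(map(lambda x: 2 * x, arr))  # [2, 4, 6, 8, 10]

# reduce combines a whole list down into a single value.
total = reduce(lambda a, b: a + b, arr)    # 15

print(doubled, total)
```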
Bloom filters and HyperLogLog
Bloom filters
HyperLogLog
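A toy Bloom filter sketch in Python. The size, number of hashes, and the trick of salting SHA-256 with an index to get several hash functions are all arbitrary illustration choices, not a production design:
```python
import hashlib

class BloomFilter:
    """Probabilistic set membership: false positives are possible,
    false negatives are not."""

    def __init__(self, size=1000, num_hashes=3):
        self.size = size
        self.num_hashes = num_hashes
        self.bits = [False] * size

    def _positions(self, item):
        # Derive several bit positions by salting one hash function.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).hexdigest()
            yield int(digest, 16) % self.size

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = True

    def might_contain(self, item):
        return all(self.bits[pos] for pos in self._positions(item))

bf = BloomFilter()
bf.add("dmoz.org")
print(bf.might_contain("dmoz.org"))     # True
print(bf.might_contain("example.com"))  # False (probably)
```
The filter can say "probably yes" when the real answer is no, but it never says no when the answer is yes, and it uses far less space than storing every item.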
Parallel algorithms
The SHA algorithms
Comparing files
Checking passwords
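A sketch of both uses with Python's hashlib. (Note: real password storage should use a salted, slow hash such as bcrypt; plain SHA-256 and the made-up password here only illustrate the idea.)
```python
import hashlib

# Comparing files: if two files have the same SHA-256 hash,
# their contents are (almost certainly) identical.
def file_hash(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

print(file_hash(b"hello world") == file_hash(b"hello world"))  # True
print(file_hash(b"hello world") == file_hash(b"hello w0rld"))  # False

# Checking passwords: store only the hash, never the password itself.
stored_hash = hashlib.sha256(b"hunter2").hexdigest()  # made-up password
attempt = b"hunter2"
print(hashlib.sha256(attempt).hexdigest() == stored_hash)  # True
```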
Locality-sensitive hashing
Diffie-Hellman key exchange
Linear programming
Epilogue
Chapter 9: Dynamic programming
The knapsack problem
Dynamic programming
Dynamic programming starts by solving subproblems and builds up to solving the big problem.
Every dynamic-programming algorithm starts with a grid
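A grid-filling sketch in Python, using the chapter's guitar ($1500, 1 lb), stereo ($3000, 4 lb), and laptop ($2000, 3 lb) example with a 4 lb knapsack:
```python
def knapsack(items, capacity):
    """One row per item, one column per sub-capacity from 0 to `capacity`
    (whole pounds). Each cell holds the best value so far."""
    grid = [[0] * (capacity + 1) for _ in items]
    for i, (name, weight, value) in enumerate(items):
        for c in range(1, capacity + 1):
            best_without = grid[i - 1][c] if i > 0 else 0
            if weight <= c:
                # Either skip this item, or take it plus the best
                # fill of the remaining space from the row above.
                rest = grid[i - 1][c - weight] if i > 0 else 0
                grid[i][c] = max(best_without, value + rest)
            else:
                grid[i][c] = best_without
    return grid[-1][capacity]

items = [("guitar", 1, 1500), ("stereo", 4, 3000), ("laptop", 3, 2000)]
print(knapsack(items, 4))  # 3500 (guitar + laptop)
```
Each cell only looks at the row above it, which is why (as the FAQ below notes) changing the order of the rows doesn't change the answer.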
Knapsack problem FAQ
What happens if you add an item?
What happens if you change the order of the rows?
The answer doesn’t change. The order of the rows doesn’t matter.
Can you fill in the grid columns-wise instead of row-wise?
What happens if you add a smaller item?
Can you steal fractions of an item?
You can't. With the dynamic-programming solution, you either take the item or not. There's no way for it to figure out that you should take half an item.
Longest common substring
Making the grid
Filling in the grid
The solution
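A Python sketch of the substring grid, using the chapter's hish/fish and hish/vista comparisons:
```python
def longest_common_substring(a, b):
    # cell[i][j] = length of the common substring ending at a[i] and b[j]
    cell = [[0] * len(b) for _ in a]
    best = 0
    for i, ca in enumerate(a):
        for j, cb in enumerate(b):
            if ca == cb:
                # Extend the streak from the cell diagonally up-left.
                above_left = cell[i - 1][j - 1] if i > 0 and j > 0 else 0
                cell[i][j] = above_left + 1
                best = max(best, cell[i][j])
            # else: the cell stays 0 — the streak is broken
    return best

print(longest_common_substring("hish", "fish"))   # 3 ("ish")
print(longest_common_substring("hish", "vista"))  # 2 ("is")
```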
Longest common subsequence
Longest common subsequence: solution
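A Python sketch of the subsequence grid; the only change from the substring version is what happens when the letters don't match:
```python
def longest_common_subsequence(a, b):
    # cell[i][j] = length of the LCS of a[:i] and b[:j]
    cell = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                cell[i][j] = cell[i - 1][j - 1] + 1
            else:
                # Letters differ: carry over the best seen so far.
                cell[i][j] = max(cell[i - 1][j], cell[i][j - 1])
    return cell[len(a)][len(b)]

print(longest_common_subsequence("fosh", "fish"))  # 3 ("fsh")
print(longest_common_subsequence("fosh", "fort"))  # 2 ("fo")
```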
RECAP
• Dynamic programming is useful when you’re trying to optimize something given a constraint.
• You can use dynamic programming when the problem can be broken into discrete subproblems.
• Every dynamic-programming solution involves a grid.
• The values in the cells are usually what you’re trying to optimize.
• Each cell is a subproblem, so think about how you can divide your problem into subproblems.
• There’s no single formula for calculating a dynamic-programming solution.
Chapter 10: K-nearest neighbors
Classifying oranges vs. grapefruit
If you're trying to classify something, you might want to try KNN first.
More neighbors are oranges than grapefruit, so this fruit is probably an orange.
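A minimal KNN classification sketch in Python (the size/redness numbers are made up for illustration):
```python
import math

def distance(a, b):
    # Straight-line (Euclidean) distance between two feature vectors.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def classify_knn(fruits, new_fruit, k=3):
    # fruits: list of ((size, redness), label) pairs.
    neighbors = sorted(fruits, key=lambda f: distance(f[0], new_fruit))[:k]
    labels = [label for _, label in neighbors]
    return max(set(labels), key=labels.count)  # majority vote

fruits = [
    ((6, 3), "orange"), ((7, 4), "orange"), ((6, 4), "orange"),
    ((9, 8), "grapefruit"), ((10, 9), "grapefruit"),
]
print(classify_knn(fruits, (7, 3)))  # orange
```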
building a recommendations system
Feature extraction
In the grapefruit example, you compared fruit based on how big they are and how red they are. Size and color are the features you're comparing.
Regression
• Classification = categorization into a group
• Regression = predicting a response (like a number)
You're trying to predict how many loaves to make for today. You have a set of features:
• Weather on a scale of 1 to 5 (1 = bad, 5 = great).
• Weekend or holiday? (1 if it’s a weekend or a holiday, 0 otherwise.)
• Is there a game on? (1 if yes, 0 if no.)
Today is a weekend day with good weather. Days a, b, d, and e are the closest to today.
Take an average of the loaves sold on those days, and you get 218.75.
That’s how many loaves you should make for today!
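A regression sketch in Python, with day features chosen to reproduce the 218.75 from the notes (the exact feature values and sales figures are assumptions for illustration):
```python
import math

def distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# (weather 1-5, weekend/holiday?, game on?) -> loaves sold that day
days = {
    "a": ((5, 1, 0), 300), "b": ((3, 1, 1), 225), "c": ((1, 1, 0), 75),
    "d": ((4, 0, 1), 200), "e": ((4, 0, 0), 150), "f": ((2, 0, 0), 50),
}

today = (4, 1, 0)  # good weather, weekend, no game

# Regression: take the 4 nearest days and average their sales.
nearest = sorted(days.values(), key=lambda d: distance(d[0], today))[:4]
print(sum(sales for _, sales in nearest) / len(nearest))  # 218.75
```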
Picking good features
Picking the right features means:
• Features that directly correlate to the movies you're trying to recommend
• Features that don’t have a bias (for example, if you ask the users to only rate comedy movies, that doesn’t tell you whether they like action movies)
Introduction to machine learning
OCR
OCR stands for optical character recognition. It means you can take a photo of a page of text, and your computer will automatically read the text for you
You can use KNN for this:
Go through a lot of images of numbers, and extract features of those numbers.
When you get a new image, extract the features of that image, and see what its nearest neighbors are!
The first step of OCR, where you go through images of numbers and extract features, is called training
Building a spam filter
Spam filters use another simple algorithm called the Naive Bayes classifier.
Suppose you get an email with the subject “collect your million dollars now!” Is it spam?
You can break this sentence into words. Then, for each word, see what the probability is for that word to show up in a spam email
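A toy Naive Bayes sketch in Python (all the word probabilities and priors are made up; a real filter would estimate them from a corpus of labeled emails):
```python
# P(word | spam) and P(word | not spam) — invented numbers for illustration.
p_word_given_spam = {"collect": 0.30, "your": 0.10, "million": 0.40,
                     "dollars": 0.35, "now": 0.15}
p_word_given_ham = {"collect": 0.01, "your": 0.10, "million": 0.01,
                    "dollars": 0.02, "now": 0.05}
p_spam, p_ham = 0.5, 0.5  # prior probabilities (assumed equal)

subject = "collect your million dollars now"

# "Naive" assumption: words are independent, so probabilities multiply.
score_spam, score_ham = p_spam, p_ham
for word in subject.split():
    score_spam *= p_word_given_spam.get(word, 0.01)
    score_ham *= p_word_given_ham.get(word, 0.01)

print("spam" if score_spam > score_ham else "not spam")  # spam
```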
Predicting the stock market
Predicting the future is hard, and it's almost impossible when there are so many variables involved.
RECAP
• KNN is used for classification and regression and involves looking at the k-nearest neighbors.
• Classification = categorization into a group.
• Regression = predicting a response (like a number).
• Feature extraction means converting an item (like a fruit or a user) into a list of numbers that can be compared.
• Picking good features is an important part of a successful KNN algorithm.