layout |
---|
page |
Here are a few algorithms that exist that can be implemented in Big Data. The choice depends on what you are looking for out of the data and what kind of information you have/can use (user privacy).
-
A priori Suggestions/recommendations of goods, what goods usually appear together in sets : Amazon, book purchase ...
-
C4.5 Algorithm Improvement (and easier to implement then ID3)
- Helps create a decision tree around a variable.
- Useful if doing most likely situation (predicting behaviour statistics)
-
Clustering: Trying to organize a set of information into subsets that exhibit the same properties. There are lots of algorithms for clustering.
-
K Means Clustering: One seemingly easy algorithm. See this visualization for more on k means clustering
-
K Nearest Neighbors: Good for ranked user suggestions/ feature extraction
-
PageRank: Finding critical nodes in a graph, famously used by Google
- Useful for Wikipedia indexing project for instance
- Works by creating a graph and generating random walks in the connections.
- The data must be all connected together (!)
- If you want to recommend movies, you can either go by doing user/user relations, find neighbors, and recommend based on neighbors' histories: collaborative filtering
- The other way is to find information about the movie: actors, type of movie, or director, and create an algorithm to map these features in numbers to recommend movies from there: feature based filtering
This is applicable to a number of algorithms.