Thursday, April 23, 2020

Lintcode 550. Top K Frequent Words II

ind top k frequent words in realtime data stream.
Implement three methods for Topk Class:
  1. TopK(k). The constructor.
  2. add(word). Add a new word.
  3. topk(). Get the current top k frequent words.

Example

Example 1:
Input:
TopK(2)
add("lint")
add("code")
add("code")
topk()
Output:["code", "lint"]
Explanation:
"code" appears twice and "lint" appears once, they are the two most frequent words.
Example 2:
Input:
TopK(1)
add("aa")
add("ab")
topk()
Output:["aa"]
Explanation:
"aa" and "ab" appear once , but aa's dictionary order is less than ab's.

Notice

If two words have the same frequency, rank them by dictionary order.
Analysis:
The main difference between the top k frequent word I is now we keep adding new words. So the count of each word is changing dynamically. 

It means in the minHeap, we need to support an update operation update(E e), which is a O(N) operation. Why? It first needs to find the node. Since there is no order in priority queue, the search operation takes O(N) time. Once we find the element, we need to first remove it (O(logn)) and insert a new node O(logn). 

So do we have some other data structure providing similar operations but the search can be done in O(log(N)) time? Yes, we can use a TreeSet/TreeMap which is literally a red-black tree. Since it's a ordered tree, the search now takes O(log n) time and update takes O(logn) as well. 

Code (Java):
public class TopK {
    private Map<String, Integer> wordCountMap;
    private TreeSet<String> topKTreeSet;
    private int k;
    
    /*
    * @param k: An integer
    */public TopK(int k) {
        // do intialization if necessary
        this.k = k;
        this.wordCountMap = new HashMap<>();
        this.topKTreeSet = new TreeSet<>(new MyTreeSetComparator());
    }

    /*
     * @param word: A string
     * @return: nothing
     */
    public void add(String word) {
        // write your code here
        int count = 0;
        if (wordCountMap.containsKey(word)) {
            count = wordCountMap.get(word);
            if (topKTreeSet.contains(word)) {
                topKTreeSet.remove(word);
            }
        }
        
        count += 1;
        wordCountMap.put(word, count);
        topKTreeSet.add(word); // count has been bumped up by 1
        
        if (topKTreeSet.size() > k) {
            topKTreeSet.pollFirst();
        }
    }

    /*
     * @return: the current top k frequent words.
     */
    public List<String> topk() {
        // write your code here
        List<String> ans = new ArrayList<>(k);
        Iterator<String> iter = topKTreeSet.iterator();
        while (iter.hasNext()) {
            String word = iter.next();
            ans.add(word);
        }
        
        Collections.reverse(ans);
        
        return ans;
    }
    
    private class MyTreeSetComparator implements Comparator<String> {
        @Override
        public int compare(String a, String b) {
            int freqA = wordCountMap.get(a);
            int freqB = wordCountMap.get(b);
            
            if (freqA != freqB) {
                return freqA - freqB;
            }
            
            return b.compareTo(a);
        }
    }
}

No comments:

Post a Comment