Friday, September 5, 2014

Leetcode: Substring with Concatenation of All Words

You are given a string, S, and a list of words, L, that are all of the same length. Find all starting indices of substring(s) in S that is a concatenation of each word in L exactly once and without any intervening characters.
For example, given:
S"barfoothefoobarman"
L["foo", "bar"]
You should return the indices: [0,9].
(order does not matter).
Understand the problem:
The problem gives a string S and a list of words L, find all string indices of substrings in S such that the substring is a concatenation of each word in L exactly once without any intervening characters. 

Native Solution:
A very straight-forward solution is check each substring with the length of the word in L existed in the dictionary. If yes, we continue to search next until find all words in the dictionary. But how to guarantee that each word appears only once at the concatenation. We can use a hash set to add all visited words into the set. Then the next question is how could we know that we visited all words in the dictionary. We can use a counter, each time wee see a substring in the dictionary, we increase the counter. If the counter equals to the number of elements of the dictionary, we know that we visited all words. 


Wrong Answer:
public class Solution {
    public ArrayList<Integer> findSubstring(String S, String[] L) {
        ArrayList<Integer> result = new ArrayList<Integer>();
        if (S == null || S.length() == 0 || L == null || L.length == 0)
            return result;
        
        HashSet<String> hashSet = new HashSet<String>();
        
        int num = L.length;
        int len = L[0].length();
        
        int start = 0;
        int count = 0;
        for (int i = 0; i < S.length(); i += len) {
            if (i + len > S.length()) break;
            if (!hashSet.contains(S.substring(i, i + len)) && isDictWords(S.substring(i, i + len), L)) {
                hashSet.add(S.substring(i, i + len));
                count++;
                if (count == num) {
                    result.add(start);
                    hashSet.clear();
                    count = 0;
                    start = i;
                }
            } else if (hashSet.contains(S.substring(i, i + len)) && isDictWords(S.substring(i, i + len), L)) {
                start = i;
            } else {
                start = i + len;
                count = 0;
                hashSet.clear();
            }
        }
        return result;
    }
    
    private boolean isDictWords(String str, String[] dict) {
        for (int i = 0; i < dict.length; i++) {
            for (int j = 0; j < str.length(); j++) {
                if (str.charAt(j) != dict[i].charAt(j)) {
                    break;
                } else if (str.charAt(j) == dict[i].charAt(j) && j == str.length() - 1) {
                    return true;
                }
            }
        }
        return false;
    }
}

So why the solution above is wrong? That is because we iterate through the input string with stride equals to the length of the word in the dictionary. However, the matched substring may not start from that strided point, e.g. 
Input:"lingmindraboofooowingdingbarrwingmonkeypoundcake", ["fooo","barr","wing","ding","wing"]
Output:[]
Expected:[13]

So after fixing that bug, the code becomes below:
public class Solution {
    public ArrayList<Integer> findSubstring(String S, String[] L) {
        ArrayList<Integer> result = new ArrayList<Integer>();
        if (S == null || S.length() == 0 || L == null || L.length == 0)
            return result;
        
        HashSet<String> hashSet = new HashSet<String>();
        
        int num = L.length;
        int len = L[0].length();
        
        int start = 0;
        int count = 0;
        int i = 0;
        while (i < S.length() - len + 1) {
            if (!hashSet.contains(S.substring(i, i + len)) && isDictWords(S.substring(i, i + len), L)) {
                hashSet.add(S.substring(i, i + len));
                count++;
                if (count == num) {
                    result.add(start);
                    hashSet.clear();
                    count = 0;
                    start = i + len;
                }
                i += len;
            } else if (hashSet.contains(S.substring(i, i + len)) && isDictWords(S.substring(i, i + len), L)) {
                start = i;
                i += len;
            } else if (!isDictWords(S.substring(i, i + len), L)) {
                start = i + 1;
                count = 0;
                hashSet.clear();
                i++;
            }
        }
        return result;
    }
    
    private boolean isDictWords(String str, String[] dict) {
        for (int i = 0; i < dict.length; i++) {
            for (int j = 0; j < str.length(); j++) {
                if (str.charAt(j) != dict[i].charAt(j)) {
                    break;
                } else if (str.charAt(j) == dict[i].charAt(j) && j == str.length() - 1) {
                    return true;
                }
            }
        }
        return false;
    }
}


But the code above still failed the test case above. Why? Look carefully of the dictionary in the test case above, it contains duplicated word "wing", although I don't think a dictionary should contain duplicated words. So we must modify the code to reflect this property. 

Correct Code:
public class Solution {
    public ArrayList<Integer> findSubstring(String S, String[] L) {
        ArrayList<Integer> result = new ArrayList<Integer>();
        
        if (S == null || S.length() == 0 || L == null || L.length == 0) {
            return result;
        }
        
        HashMap<String, Integer> hashMap = new HashMap<String, Integer>();
        
        for (int i = 0; i < L.length; i++) {
            if (hashMap.containsKey(L[i])) {
                hashMap.put(L[i], hashMap.get(L[i]) + 1);
            } else {
                hashMap.put(L[i], 1);
            }
        }
        
        int i = 0;
        int size = L[0].length();
        while (S.length() - i >= L.length * size) {
            HashMap<String, Integer> dict = new HashMap<String, Integer>(hashMap);
            
            for (int j = 0; j < L.length * size; j += size) {
                if (j + size > L.length * size) break;
                String sub = S.substring(i + j, i + j + size);
                if (dict.containsKey(sub)) {
                    if (dict.get(sub) == 1) {
                        dict.remove(sub);
                    } else {
                        dict.put(sub, dict.get(sub) - 1);
                    }
                } else {
                    break;
                }
            }
            if (dict.isEmpty()) {
                result.add(i);
            }
            i++;
        }
        return result;
    }
}

Discussion:
The pitfall of this problem is dictionary may contain duplicated keys. To avoid this, we used a hash map to note the number of times each dictionary word occurs. If we found one word, we decrease the counter for the word in dictionary. If the counter is 1, we delete the word. So the time complexity is O(n * m * l), where n is the length of the string, m is the number of elements in the dictionary, and l is the length of the dictionary word. The space complexity is O(m * l). 

Summary:
This problem can be categorized as the same group of substring problem, where the basic idea is to maintain two pointers, one points to the string, another one points to the dictionary. The mainly thing to note is the pointers points to the string can move only one step at a time. 

Update on 11/19/2014:
public class Solution {
 public List<Integer> findSubstring(String S, String[] L) {
  List<Integer> result = new ArrayList<Integer>();
  if (S == null || S.length() == 0 || L == null || L.length == 0) {
   return result;
        }

        int m = L[0].length();
        Map<String, Integer> map = new HashMap<String, Integer>();
        for (String str : L) {
         if (map.containsKey(str)) {
             int count = map.get(str) + 1;
             map.put(str, count);
            } else {
             map.put(str, 1);
            }
        }

        for (int start = 0; start < S.length(); start++) {
         if (S.length() - start < (L.length * m)) {
             break;
            }
            
            Map<String, Integer> curMap = new HashMap<String, Integer>(map);
            for (int end = start; end < S.length(); end += m) {
                if (S.length() - end < (curMap.size() * m)) {
                    break;
                }
                String token = S.substring(end, end + m);
                if (curMap.containsKey(token)) {
              int count = curMap.get(token);
              if (count == 1) {
                     curMap.remove(token);
                    } else {
                     count--;
                     curMap.put(token, count);
                    }
                } else {
                 break;
                }
                if (curMap.isEmpty()) {
                 result.add(start);
                 break;
                }
            } 
        }
        return result;
    }
}

No comments:

Post a Comment