Tuesday, August 5, 2014

Leetcode: Longest Substring Without Repeating Characters

Given a string, find the length of the longest substring without repeating characters. For example, the longest substring without repeating letters for "abcabcbb" is "abc", which the length is 3. For "bbbbb" the longest substring is "b", with the length of 1.

Analysis:
The question asks for finding the longest substring without repeating characters, and return its length. So the problem has two requirements: 

1. Find a substring without repeating characters, and keep its length
2. Return the longest length

We can give several examples from empty string to common cases:
a. For empty string, its substring is empty as well, so return the length 0
b. For string length equals to 1, the longest substring is itself, return 1;
c. For string "aab", its longest substring is "ab", return 2;
d. For string "abcabcbb", the longest substring is "abc", return 3.

Naive Solution:
The crux of this question is to find longest substring without repeating characters. So for each character in the string, we need to check if the character is repeated before. How to check efficiently? If we compare the character with all its previous characters, the time complexity would be O(n^2). That is, in worst case, the string contains no repeated characters, so the longest substring is the string itself. 

How about using hash table? Specifically, we can use Java HashMap. The key stores the character while value stores the index. Why we need to store the index? Consider the string "abcdaef". When we encounter the second a, we need to reverse the index back starting from b. So the length of the longest substring is "bcdaef", i.e, 6.

Code (Java):
public class Solution {
    public int lengthOfLongestSubstring(String s) {
        // check if string is empty or length equals to 1
        if (s.length() < 2) return s.length(); 
        
        HashMap<Character, Integer> hashMap = new HashMap<Character, Integer>();
        int maxLen = 1;
        int len = 0;
        
        for (int i = 0; i < s.length(); i++) {
            if (!hashMap.containsKey(s.charAt(i))) {
                hashMap.put(s.charAt(i), i);
                len++;
            } else { // found repeated
                i = hashMap.get(s.charAt(i));
                maxLen = Math.max(maxLen, len);
                hashMap.clear();
                len = 0;
            }
        }
        return Math.max(maxLen, len);
     }
}

Discussion:
The solution looks straight-forward. There are several points need to take note. First of all, when we found a repeated character, we need to move the index back to the first occurrence of the character. In addition, we must clean the HashMap and the counter. That are tiny things likely to forget.  

Now let's take a look at the complexity. It is a little bit complicated since we move back the index once we found a duplicate. In the best case, where string has no duplicates. There is no need to move back the index, and we only need to iterate the string once, so the time complexity is O(n). In average case, the time complexity is O(n * m), where m is the length of the longest substring. Why? For a string "abcabc", for each character, it has to move m steps until meets the duplicated character. Thus the complexity is O(nm). Regarding to the space complexity, since the hash table stores at most m key-value pairs, the space complexity is O(m). 

A Better Solution:
The native solution has O(nm) complexity since we always moved back the index to the position next to the first occurrence of the duplicated character. Do we have a method that never move back the index? Then the time complexity would become O(n). 

Remember we have two pointers solution. In the first round, move one pointer until we reach a duplicated character, then use another pointer points to the first occurrence of the duplicated element. Then instead of moving back the frond pointer, we continually iterate the first pointer in the second round and update the back pointer to the position next to the first occurrence of the duplicated element. 


Code (Java):
public class Solution {
    public int lengthOfLongestSubstring(String s) {
        // if the string is empty or has only one character
        if (s.length() < 2) return s.length();
        
        HashMap<Character, Integer> hashMap = new HashMap<Character, Integer>();
        int j = 0;
        int maxLen = 1;
        
        for (int i = 0; i < s.length(); i++) {
            if (!hashMap.containsKey(s.charAt(i))) {
                hashMap.put(s.charAt(i), i);
            } else {
                maxLen = Math.max(maxLen, i - j);
                for (int k = j; k < i; k++) {
                    if (s.charAt(k) == s.charAt(i)) {
                        j = k + 1;
                        break;
                    }
                }
            }    
        }
        return Math.max(maxLen, s.length() - j);
    }
}

Discussion:
In  this solution, since we never revert the front pointer back, the time complexity is O(n) and the space complexity is O(m). Note that When we found a duplicated element, we must update the corresponding j pointer. But why not just use hashMap.get(charAt(i)) + 1 ? 
That is because we never update the position of the duplicated keys in the hash table. For example, for the string "abababc", when we meet the third a, the j pointer will be moved back to the first b, which is wrong. 

A Wrong Solution:
public class Solution {
    public int lengthOfLongestSubstring(String s) {
        // if the string is empty or has only one character
        if (s.length() < 2) return s.length();
        
        HashMap<Character Integer> hashMap = new HashMap<Character Integer>();
        int j = 0;
        int maxLen = 1;
        
        for (int i = 0; i < s.length(); i++) {
            if (!hashMap.containsKey(s.charAt(i))) {
                hashMap.put(s.charAt(i), i);
            } else {
                maxLen = Math.max(maxLen, i - j);
                j = hashMap.get(s.charAt(i)) + 1;
                hashMap.put(s.charAt(i), i);
            }    
        }
        return Math.max(maxLen, s.length() - j);
    }
}

Why this solution is wrong? It updates the value for the duplicated key. Honestly, I haven't figured out the reason. I will come back later.

Update on 10/14:
The problem of the wrong solution is the left boundary might move back. For example,
abcdba, for the second a, the left boundary will move back to the first b, which wrongly get a longer substring. 

Another Thought:
The bast solution above would have O(n) time and O(m) space. It seems that time complexity has reached the optimal, how about the space complexity? Can we have a constant space solution? 

The answer is yes. Actually if we can assume that the string is encoded by ASCII code, it has at most 256 different characters in the string. Therefore, we could create a bit map with size of 256. If a character occurred before, we mark the corresponding map as true. 

In this solution, the space complexity is constant, i.e. 256, no matter how long the string is. 

Summary:
In this post, we learned how to find a longest substring without repeating characters. We introduced three methods: naive solution with O(nm) time complexity, linear solution with O(m) space complexity and linear solution with constant space complexity. 

The problem itself is not very difficult. But it is tricky to devise the two pointers approach. It is very important to deduce your idea to interviewers instead of memorizing solution. 

Update on 11/16/14:
Actually we don't need to maintain a hash map, but just a set is fine. That is because we only need to save the visited characters in the set.

Code (Java):
public class Solution {
    public int lengthOfLongestSubstring(String s) {
        if (s == null || s.length() == 0) {
            return 0;
        }
        
        int start = 0;
        int end = 0;
        int len = 1;
        Set<Character> set = new HashSet<Character>();
        
        for (end = 0; end < s.length(); end++) {
            if (!set.contains(s.charAt(end))) {
                set.add(s.charAt(end));
            } else {
                len = Math.max(len, end - start);
                
                // move the start pointer
                while (start < end) {
                    if (s.charAt(start) != s.charAt(end)) {
                        set.remove(s.charAt(start));
                    } else {
                        break;
                    }
                    start++;
                }
                
                start++;
            }
        }
        len = Math.max(len, end - start);
        return len;
    }
}






1 comment:

  1. Great article. This is a very popular interview question. I also wrote an in-depth analysis at https://goo.gl/UaLw1E.

    ReplyDelete