Buttercola: subString

Given a string, find the length of the longest substring without repeating characters. For example, the longest substring without repeating letters for "abcabcbb" is "abc", which the length is 3. For "bbbbb" the longest substring is "b", with the length of 1.

Analysis:
The question asks for finding the longest substring without repeating characters, and return its length. So the problem has two requirements:

1. Find a substring without repeating characters, and keep its length
2. Return the longest length

We can give several examples from empty string to common cases:
a. For empty string, its substring is empty as well, so return the length 0
b. For string length equals to 1, the longest substring is itself, return 1;
c. For string "aab", its longest substring is "ab", return 2;
d. For string "abcabcbb", the longest substring is "abc", return 3.

Naive Solution:
The crux of this question is to find longest substring without repeating characters. So for each character in the string, we need to check if the character is repeated before. How to check efficiently? If we compare the character with all its previous characters, the time complexity would be O(n^2). That is, in worst case, the string contains no repeated characters, so the longest substring is the string itself.

How about using hash table? Specifically, we can use Java HashMap. The key stores the character while value stores the index. Why we need to store the index? Consider the string "abcdaef". When we encounter the second a, we need to reverse the index back starting from b. So the length of the longest substring is "bcdaef", i.e, 6.

Code (Java):

public class Solution {
    public int lengthOfLongestSubstring(String s) {
        // check if string is empty or length equals to 1
        if (s.length() < 2) return s.length(); 
        
        HashMap<Character, Integer> hashMap = new HashMap<Character, Integer>();
        int maxLen = 1;
        int len = 0;
        
        for (int i = 0; i < s.length(); i++) {
            if (!hashMap.containsKey(s.charAt(i))) {
                hashMap.put(s.charAt(i), i);
                len++;
            } else { // found repeated
                i = hashMap.get(s.charAt(i));
                maxLen = Math.max(maxLen, len);
                hashMap.clear();
                len = 0;
            }
        }
        return Math.max(maxLen, len);
     }
}

Discussion:
The solution looks straight-forward. There are several points need to take note. First of all, when we found a repeated character, we need to move the index back to the first occurrence of the character. In addition, we must clean the HashMap and the counter. That are tiny things likely to forget.

Now let's take a look at the complexity. It is a little bit complicated since we move back the index once we found a duplicate. In the best case, where string has no duplicates. There is no need to move back the index, and we only need to iterate the string once, so the time complexity is O(n). In average case, the time complexity is O(n * m), where m is the length of the longest substring. Why? For a string "abcabc", for each character, it has to move m steps until meets the duplicated character. Thus the complexity is O(nm). Regarding to the space complexity, since the hash table stores at most m key-value pairs, the space complexity is O(m).

A Better Solution:
The native solution has O(nm) complexity since we always moved back the index to the position next to the first occurrence of the duplicated character. Do we have a method that never move back the index? Then the time complexity would become O(n).

Remember we have two pointers solution. In the first round, move one pointer until we reach a duplicated character, then use another pointer points to the first occurrence of the duplicated element. Then instead of moving back the frond pointer, we continually iterate the first pointer in the second round and update the back pointer to the position next to the first occurrence of the duplicated element.

Code (Java):

public class Solution {
    public int lengthOfLongestSubstring(String s) {
        // if the string is empty or has only one character
        if (s.length() < 2) return s.length();
        
        HashMap<Character, Integer> hashMap = new HashMap<Character, Integer>();
        int j = 0;
        int maxLen = 1;
        
        for (int i = 0; i < s.length(); i++) {
            if (!hashMap.containsKey(s.charAt(i))) {
                hashMap.put(s.charAt(i), i);
            } else {
                maxLen = Math.max(maxLen, i - j);
                for (int k = j; k < i; k++) {
                    if (s.charAt(k) == s.charAt(i)) {
                        j = k + 1;
                        break;
                    }
                }
            }    
        }
        return Math.max(maxLen, s.length() - j);
    }
}

Discussion:
In this solution, since we never revert the front pointer back, the time complexity is O(n) and the space complexity is O(m). Note that When we found a duplicated element, we must update the corresponding j pointer. But why not just use hashMap.get(charAt(i)) + 1 ?
That is because we never update the position of the duplicated keys in the hash table. For example, for the string "abababc", when we meet the third a, the j pointer will be moved back to the first b, which is wrong.

A Wrong Solution:

public class Solution {
    public int lengthOfLongestSubstring(String s) {
        // if the string is empty or has only one character
        if (s.length() < 2) return s.length();
        
        HashMap<Character Integer> hashMap = new HashMap<Character Integer>();
        int j = 0;
        int maxLen = 1;
        
        for (int i = 0; i < s.length(); i++) {
            if (!hashMap.containsKey(s.charAt(i))) {
                hashMap.put(s.charAt(i), i);
            } else {
                maxLen = Math.max(maxLen, i - j);
                j = hashMap.get(s.charAt(i)) + 1;
                hashMap.put(s.charAt(i), i);
            }    
        }
        return Math.max(maxLen, s.length() - j);
    }
}

Why this solution is wrong? It updates the value for the duplicated key. Honestly, I haven't figured out the reason. I will come back later.

Update on 10/14:
The problem of the wrong solution is the left boundary might move back. For example,
abcdba, for the second a, the left boundary will move back to the first b, which wrongly get a longer substring.

Another Thought:
The bast solution above would have O(n) time and O(m) space. It seems that time complexity has reached the optimal, how about the space complexity? Can we have a constant space solution?

The answer is yes. Actually if we can assume that the string is encoded by ASCII code, it has at most 256 different characters in the string. Therefore, we could create a bit map with size of 256. If a character occurred before, we mark the corresponding map as true.

In this solution, the space complexity is constant, i.e. 256, no matter how long the string is.

Summary:
In this post, we learned how to find a longest substring without repeating characters. We introduced three methods: naive solution with O(nm) time complexity, linear solution with O(m) space complexity and linear solution with constant space complexity.

The problem itself is not very difficult. But it is tricky to devise the two pointers approach. It is very important to deduce your idea to interviewers instead of memorizing solution.

Update on 11/16/14:
Actually we don't need to maintain a hash map, but just a set is fine. That is because we only need to save the visited characters in the set.

Code (Java):

public class Solution {
    public int lengthOfLongestSubstring(String s) {
        if (s == null || s.length() == 0) {
            return 0;
        }
        
        int start = 0;
        int end = 0;
        int len = 1;
        Set<Character> set = new HashSet<Character>();
        
        for (end = 0; end < s.length(); end++) {
            if (!set.contains(s.charAt(end))) {
                set.add(s.charAt(end));
            } else {
                len = Math.max(len, end - start);
                
                // move the start pointer
                while (start < end) {
                    if (s.charAt(start) != s.charAt(end)) {
                        set.remove(s.charAt(start));
                    } else {
                        break;
                    }
                    start++;
                }
                
                start++;
            }
        }
        len = Math.max(len, end - start);
        return len;
    }
}

Update on 5/2/18:

class Solution {
    public int lengthOfLongestSubstring(String s) {
        if (s == null || s.length() == 0) {
            return 0;
        }
        
        int ans = 0;
        Set<Character> set = new HashSet<>();
        int j = 0;
        int i = 0;
        
        for (i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            
            if (!set.contains(c)) {
                set.add(c);
            } else {
                // First note the current length
                //
                ans = Math.max(ans, i - j);
                
                // Then move forward the j
                //
                while (s.charAt(j) != c) {
                    set.remove(s.charAt(j));
                    j++;
                }
                
                // move one more step for j
                //
                j++;
            }
        }
        
        ans = Math.max(ans, i - j);
        
        return ans;
    }
}

Implement strStr().

Returns a pointer to the first occurrence of needle in haystack, or null if needle is not part of haystack.

Analysis:
As usual, before we come out a solution, we must understand the question very well. Ask questions if there is any ambiguities in it.

In this question, we are actually asked for finding a substring from a string.
The question looks simple at first glance but very tricky.

First of all, let's look at the return value. The question asks for returning a pointer to the first occurrence of needle in haystack. And the method of the question defines a return value as a string. Therefore, we can return a substring(i) where i points to the first occurrence of the substring. For instance, for haystack = "abcde", needle = "cd", the return value is "cde"

Then we can take several examples. The crux of the question is to handle boundary condition well. Let's take a look at it from null string.

1. If either of the string null. We should return null. Note that you should always check null string before calling length() else it will return NullPointerException.

2. If both strings are empty string, we should return empty string instead of null. That is because empty string is substring of any string

3. If haystack is empty, while needle string is not, we should return null. To make it more generic, if length of haystack is less than needle, it must return null.

4. If haystack is not empty while needle is empty, we should return haystack. Again, that is because empty string is a substring of all strings. Therefore, we may merge the condition check 2 and 4 into one together, i.e, only check if needle is empty, if yes, return haystack.

Solution:
Use two pointers, one points i, the haystack string, while the other points j, to the need string. For each i, we check if charAt(i) == charAt(j) and its subsequent characters in j matches. Note that you should check if i is overflowed. For instance, in the condition haystack = "abcde", while needle = "efg", the first match happens at end of the haystack string, when we increase the pointer of i and j, we should make sure i is not overflow.

Code (Java):

public class Solution {
    public String strStr(String haystack, String needle) {
        // if two strings are null,  return null
        if (haystack == null || needle == null) return null;
        
        // if two strings are empty, return empty instead of NULL
        //if (haystack.isEmpty() && needle.isEmpty()) return "";
        
        // if needle string is empty, return haystack
        if (needle.isEmpty()) return haystack;
        
        int lenHaystack = haystack.length();
        int lenNeedle = needle.length();
        
        // if length of haystack less than needle, return empty string
        if (lenHaystack < lenNeedle) return null;
        
        for (int i = 0; i < lenHaystack; i++) {
             for (int j = 0; j < lenNeedle; j++) {
                // check if haystack is overflow
                if (i + j >= lenHaystack) return null;
                if (haystack.charAt(i + j) != needle.charAt(j)) break;
                else if (j == lenNeedle - 1) return haystack.substring(i);
            }
        }
        return null;
    }
}

Discussion:
For haystack length is n and needle length m, the worst case time complexity is O(mn). That is because for each character in haystack, we should check every character in needle until we find a match. In the worst case there is no matched substring, so the complexity in worst case is O(mn). Since we did not allocate additional space the space complexity is constant.

Summary:
This question is very simple but quite error-prone. It is actually very hard to write a bug-free implementation at once. The crux is you should consider every possible conditions and check. The rule of thumb for string problems is first to check if the input string(s) is null, and consider the return values. In most cases, null string is the most unacceptable string because it points to a null address. Then check if it is an empty string and consider the return value.
Take care if the string if is overflowed and make a condition to check.

Update on 11/19/2014:

Implement strStr().

Returns the index of the first occurrence of needle in haystack, or -1 if needle is not part of haystack.

Update (2014-11-02):
The signature of the function had been updated to return the index instead of the pointer. If you still see your function signature returns a char * or String, please click the reload button to reset your code definition.

Code (Java):

public class Solution {
    public int strStr(String haystack, String needle) {
     if (needle == null || needle.length() == 0) {
         return 0;
        }

        if (haystack == null || haystack.length() == 0) {
         return -1;
        }

        for (int i = 0; i < haystack.length() - needle.length() + 1; i++) {
         int j = 0;
         for (j = 0; j < needle.length(); j++) {
          if (haystack.charAt(i + j) != needle.charAt(j)) {
           break;
                } 
            }
            if (j == needle.length()) {
             return i;
            }
        }

        return -1;
    }
}

Update on 9/28/15:

public class Solution {
    public int strStr(String haystack, String needle) {
        for (int i = 0; i <= haystack.length() - needle.length(); i++) {
            int j = 0;
            int start = i;
            for (j = 0; j < needle.length(); j++) {
                if (start < haystack.length() && haystack.charAt(start) == needle.charAt(j)) {
                    start++;
                } else {
                    break;
                }
            }
            if (j == needle.length()) {
                return i;
            }
        }
        
        return -1;
    }
}

Update on 4/14/19:

public class Solution {
    /**
     * @param source: 
     * @param target: 
     * @return: return the index
     */
    public int strStr(String source, String target) {
        for (int i = 0; i < source.length() - target.length() + 1; i++) {
            int j = 0;
            while (j < target.length() && source.charAt(i + j) == target.charAt(j)) {
                j++;
            }
            
            if (j == target.length()) {
                return i;
            }
        }
        
        return -1;
    }
}

Buttercola

Tuesday, August 5, 2014

Leetcode: Longest Substring Without Repeating Characters

Monday, August 4, 2014

Leetcode: Implement strStr()