Thursday, August 14, 2014

Leetcode: Word Ladder

Given two words (start and end), and a dictionary, find the length of shortest transformation sequence from start to end, such that:
  1. Only one letter can be changed at a time
  2. Each intermediate word must exist in the dictionary
For example,
Given:
start = "hit"
end = "cog"
dict = ["hot","dot","dog","lot","log"]
As one shortest transformation is "hit" -> "hot" -> "dot" -> "dog" -> "cog",
return its length 5.
Note:
  • Return 0 if there is no such transformation sequence.
  • All words have the same length.
  • All words contain only lowercase alphabetic characters.
Understand the problem:
The question gives two words and a dictionary, find the length of  the shortest transformation sequence from  start to end. Note the two requirements. 

Naive Solution:
One straight-forward solution is to use BFS + a queue. The idea is to add the start into the queue, dequeue, and check if it is one letter difference between the end. If yes, we are done, return the length 2; otherwise, we add all neighbors into the queue. The neighbors are defined as one letter difference between the element just dequeued. Then repeat to check if all its neighbours is one letter difference between the end. If not, add the neighbours again. 
One thing needs to handle is to mark the words in the dictionary as visited. The easiest way is just to remove the visited words in the dictionary. 

Code (Java):
public class Solution {
    public int ladderLength(String start, String end, Set<String> dict) {
        if (start == null || start.isEmpty() || end == null || end.isEmpty())
            return 0;
        
        int length = 0;
        Queue<String> queue = new LinkedList<String>();
        queue.offer(start);
        
        HashSet<String> shadow = new HashSet<String>();
        
        while (!queue.isEmpty()) {
            int size = queue.size();
            for (int i = 0; i < size; i++) {
                String curr = queue.poll();
                if (isAdj(curr, end)) {
                    return length + 1;
                } else {
                    for (String str : dict) {
                        if (isAdj(curr, str) && !shadow.contains(str)) {
                            queue.offer(str);
                            //dict.remove(str);
                            shadow.add(str);
                        }
                    }
                    length++;
                }
            }
        }
        return 0;
    }
    
    // determine if two strings have only one different letter
    private boolean isAdj(String str1, String str2) {
        int count = 0;
        for (int i = 0; i < str1.length(); i++) {
            if (str1.charAt(i) != str2.charAt(i)) {
                count++;
            }
        }
        
        return count == 1;
    }
}

Discussion:
Now let's analyze this solution. Note that instead of removing the dictionary, we used a shadow hash set to store the visited nodes. So we don't have to modify the input parameters. For each node dequeued in the queue, it takes O(n) time to find its neighbor. Since there will be totally n nodes enqueued into the queue, it takes totally O(n^2) time. Also consider the complexity of determining if two strings are adjacent, which has time of O(m), where m is the length of the string. The total time complexity is O(n^2 * m). The space complexity is O(n) + O(n) = O(n). 

A Better Solution:
In the naive solution, for each node which is not adjacent to the end, we look for all its neighbors in the dictionary.   This will take O(n * m) time, where n is the number of strings in the dictionary, and m is the length of a string. If we can do this in constant time, or at least proportional to m, the overall time complexity would be O(n * m). Note that 
  • All words contain only lowercase alphabetic characters.
Does it give us a hint? We can change a character of the string at a time and traverse all possible changes, then compare with the end string. If yes, we are done and simply return the length. Else if the changed string equals to the dictionary, add the string into the queue. Since lowercase alphabetic characters have exactly 26 different characters,  it takes O(26 * m) time to traverse all possible changes for a given string. 

Code (Java):
public class Solution {
    public int ladderLength(String start, String end, Set<String> dict) {
        if (start == null || start.isEmpty() || end == null || end.isEmpty())
            return 0;
        
        int length = 1;
        Queue<String> queue = new LinkedList<String>();
        queue.offer(start);
        
        HashSet<String> visited = new HashSet<String>();
        
        while (!queue.isEmpty()) {
            int size = queue.size();
            for (int i = 0; i < size; i++) {
                String curr = queue.poll();
                for (int j = 0; j < curr.length(); j++) {
                    char[] charCurr = curr.toCharArray();
                    for (char c = 'a'; c < 'z'; c++) {
                        charCurr[j] = c;  // change one character at a time
                        String strCurr = new String(charCurr);
                        if (strCurr.equals(end)) {
                            return length + 1;
                        } else {
                            if (dict.contains(strCurr) && !visited.contains(strCurr)) {
                                queue.offer(strCurr);
                                visited.add(strCurr);
                            }
                        }
                    }
                }
            }
            length++;
        }
        return 0;
    }
}

Discussion:
Note the line  17 should be in the outer loop, since we can change at most a character at a time. So if we traversed all possible characters for a position, we should recover it back to the original string. 

As we discussed before, the time complexity is O(n * m). The space complexity is O(n). 

Summary:
This question can be categorized into the graph theory, where each node represents a word, and each edge connects two neighbors. For this question, it actually asks us to find the shortest path in a graph. In such case a BFS would be the best solution. Also note that BFS will usually come with a queue. 

2 comments:

  1. your first solution wont pass this example:

    "hot", "dot", "dat", "lot", "log", "dog"

    ReplyDelete