Thursday, November 26, 2015

Airbnb: K Edit Distance

Question:
Given a list of word and a target word, output all the words for each the edit distance with the target no greater than k.

e.g. [abc, abd, abcd, adc], target "ac", k = 1,

output = [abc, adc]

Naive Solution:
A naive solution would be, for each word in the list, calculate the edit distance with the target word. If it is equal or less than k, output to the result list. 

If we assume that the average length of the words is m, and the total number of words in the list is n, the total time complexity is O(n * m^2). 

A better solution with a trie:
The problem with the previous solution is if the given list of the words is like ab, abc, abcd, each time we need to repeatedly calculate the edit distance with the target word. If we can combine the prefix of all words together, we can save lots of time. 

Code (Java):
import java.io.*;
import java.util.*;

public class Solution{
  public List<String> getKEditDistance(String[] words, String target, int k) {
    List<String> result = new ArrayList<>();
    
    if (words == null || words.length == 0 || 
        target == null || target.length() == 0 || 
        k < 0) {
      return result;
    }
    
    Trie trie = new Trie();
    for (String word : words) {
      trie.add(word);
    }
    
    TrieNode root = trie.root;
    
    int[] prev = new int[target.length() + 1];
    for (int i = 0; i < prev.length; i++) {
      prev[i] = i;
    }
    
    getKEditDistanceHelper("", target, k, root, prev, result);
    
    return result;
  }
  
  private void getKEditDistanceHelper(String curr, String target, int k, TrieNode root, 
                                      int[] prevDist, List<String> result) {
    if (root.isLeaf) {
      if (prevDist[target.length()] <= k) {
        result.add(curr);
      } else {
        return;
      }
    }
    
    for (int i = 0; i < 26; i++) {
      if (root.children[i] == null) {
        continue;
      }
      
      int[] currDist = new int[target.length() + 1];
      currDist[0] = curr.length() + 1;
      for (int j = 1; j < prevDist.length; j++) {
        if (target.charAt(j - 1) == (char) (i + 'a')) {
          currDist[j] = prevDist[j - 1];
        } else {
          currDist[j] = Math.min(Math.min(prevDist[j - 1], prevDist[j]), 
                                 currDist[j - 1]) + 1;
        }
      }
      
      getKEditDistanceHelper(curr + (char) (i + 'a'), target, k, 
         root.children[i], currDist, result);
    }
  }
  
  public static void main(String[] args) {
    Solution solution = new Solution();
    String[] input = new String[]{"abcd", "abc", "abd", "ad"};
    String target = "ac";
    int k = 1;
    
    List<String> result = solution.getKEditDistance(input, target, k);
    
    for (String s : result) {
      System.out.println(s);
    }
  }
  
  class TrieNode {
    TrieNode[] children;
    boolean isLeaf;
    
    public TrieNode() {
      children = new TrieNode[26];
      
    }
  }
  
  class Trie {
    TrieNode root;
    
    public Trie() {
      root = new TrieNode();
    }
    
    // Add a word into trie
    public void add(String s) {
      if (s == null || s.length() == 0) {
        return;
      }
      
      TrieNode p = root;
      for (int i = 0; i < s.length(); i++) {
        char c = s.charAt(i);
        if (p.children[c - 'a'] == null) {
          p.children[c - 'a'] = new TrieNode();
        }
        
        if (i == s.length() - 1) {
          p.children[c - 'a'].isLeaf = true;
        }
        
        p = p.children[c - 'a'];
      }
    }
  }
}


6 comments:

  1. This comment has been removed by the author.

    ReplyDelete
  2. In the line 47, why currDist[0] = curr.length() + 1? I think it should be currDist[0] = curr.length().

    ReplyDelete
    Replies
    1. The current character hasn't been added to curr yet, so need to +1.

      Delete
  3. What's the time complexity of the solution?

    ReplyDelete
  4. can you explain on what's your logic in Kdistance function ? I know all the building trie thing, but i dont understand what exactly need to do in k-edit distance ...
    are you using dynamic programming ?

    ReplyDelete
  5. can you explain on what's your logic in Kdistance function ? I know all the building trie thing, but i dont understand what exactly need to do in k-edit distance ...
    are you using dynamic programming ?

    ReplyDelete