Wednesday, November 12, 2014

Facebook: Binary tree serialization and deserialization

Source:
http://www.geeksforgeeks.org/serialize-deserialize-binary-tree/
http://www.ninechapter.com/problem/51/


Understand the problem:
Serialization is to store tree in a file so that it can be later restored. The structure of tree must be maintained. Deserialization is reading tree back from file.

Following are some simpler versions of the problem:


If given Tree is Binary Search Tree?
If the given Binary Tree is Binary Search Tree, we can store it by either storing preorder or postorder traversal. In case of Binary Search Trees, only preorder or postorder traversal is sufficient to store structure information.

If given Binary Tree is Complete Tree?
A Binary Tree is complete if all levels are completely filled except possibly the last level and all nodes of last level are as left as possible (Binary Heaps are complete Binary Tree). For a complete Binary Tree, level order traversal is sufficient to store the tree. We know that the first node is root, next two nodes are nodes of next level, next four nodes are nodes of 2nd level and so on.

If given Binary Tree is Full Tree?
A full Binary is a Binary Tree where every node has either 0 or 2 children. It is easy to serialize such trees as every internal node has 2 children. We can simply store preorder traversal and store a bit with every node to indicate whether the node is an internal node or a leaf node.

How to store a general Binary Tree?
1. Use inorder and postorder traversal, or inorder and preorder traversals.
A simple solution is to store both Inorder and Preorder traversals. This solution requires requires space twice the size of Binary Tree.We can save space by storing Preorder traversal and a marker for NULL pointers.
Leetcode Link:
http://buttercola.blogspot.com/2014/08/leetcode-construct-binary-tree-from_19.html
http://buttercola.blogspot.com/2014/08/leetcode-construct-binary-tree-from.html

Discussion:
The space complexity is O(2 * n) since you need to save two copies of the traversals. 

2. DFS Solution:
The DFS solution is based on preorder traversal. 
For the binary tree serialization, if we see a null child, just save it as a "#" marker. 
For e.g. 
                     1
                    /   \               
                  2     3
                /   \
              4     5
So the output is 1 2 4  # # 5 # # 3 # #.

For the deserialization, if we see a # key, return null. So the process is still like the preorder traversal. 

Code (Java):

public class Solution2 {
    public String serialdfs(TreeNode root) {
        if (root == null) {
            return "";
        }
        
        StringBuilder sb = new StringBuilder();
        
        serialdfsHelper(root, sb);
        
        return sb.toString();
    }
    
    private void serialdfsHelper(TreeNode root, StringBuilder sb) {
        if (root == null) {
            sb.append("#");
            return;
        }
        
        sb.append(root.val);
        
        serialdfsHelper(root.left, sb);
        serialdfsHelper(root.right, sb);
    }
    
    public int i = 0;
    public TreeNode deserialdfs(String s) {
        if (s == null || s.length() == 0) {
            return null;
        }
        
        return deserialdfsHelper(s);
    }
    
    private TreeNode deserialdfsHelper(String s) {
        if (i >= s.length() || s.charAt(i) == '#') {
            return null;
        }
        
        TreeNode root = new TreeNode(s.charAt(i) - '0');
        i++;
        root.left = deserialdfsHelper(s);
        i++;
        root.right = deserialdfsHelper(s);
        
        return root;
    }
    
    public static void main(String[] args) {
      Solution2 sol = new Solution2();

      TreeNode root = new TreeNode(1);
      root.left = new TreeNode(2);
      root.right = new TreeNode(3);
      root.left.left = new TreeNode(4);
      root.left.right = new TreeNode(5);


      String result = sol.serialdfs(root);
      
      TreeNode root2 = sol.deserialdfs(result);

      String result2 = sol.serialdfs(root2);

      System.out.println(result2);
    }
}

Discussion:
Now let's discuss the space complexity. How much additional space is needed? Suppose that the tree is complete binary tree, and each internal node has two children. The answer is n + 1 additional nodes needed, where n is the number of nodes in the binary tree. Why? 
That is because for a complete binary tree, the number of nodes in each level is a geometric sequence. So if the tree has n nodes in total, the n + 1 level has n + 1 nodes. It is kind of wasting space because we need to add two # characters for each leaf node. 

3. BFS Solution:
For the example above, the DFS serialization of the tree is 
1 2 4 # # 5 # # 3 # #
How about BFS traversal, i.e, level-order traversal? Then the serialization would be:
1 2 3 4 5 # #
Note that we could truncate the trialing # characters to save additional spaces. So you can see that using BFS could greatly save the space. 

Serailization of the binary tree using BFS is simple. It is just like the binary tree level-order traversal by using a queue. The only trick is if the tree node dequeued is a null node, append a # character to the result list. 

Deserialization is kind of tricky. For a complete tree with node index i, i starts from 0, we know that its left child's index is 2 * i + 1, and right child's index is 2 * i + 2. So we could use this property to find a node's left and right children for a given root index i.

Code (Java):
import java.util.*;

class Solution {
    /**
     * This method will be invoked first, you should design your own algorithm 
     * to serialize a binary tree which denote by a root node to a string which
     * can be easily deserialized by your own "deserialize" method later.
     */
    public String serialize(TreeNode root) {
        // write your code here
        if (root == null) {
            return "";
        }
        
        StringBuilder sb = new StringBuilder();
        Queue<TreeNode> queue = new LinkedList<TreeNode>();
        queue.offer(root);
        
        while(!queue.isEmpty()) {
            TreeNode curr = queue.poll();
            if (curr != null) {
                sb.append(curr.val);
            } else {
                sb.append("#");
            }
            
            if (curr != null) {
                queue.offer(curr.left);
                queue.offer(curr.right);
            }
        }
        
        return sb.toString();
    }

    public TreeNode deserialize(String data) {
        if (data == null || data.length() == 0) {
            return null;
        }
        
        return deserializeHelper(data, 0);
    }
    
    private TreeNode deserializeHelper(String data, int start) {
        if (start >= data.length() || data.charAt(start) == '#') {
                return null;
        }
        
        TreeNode root = new TreeNode(data.charAt(start) - '0');
        root.left = deserializeHelper(data, 2 * start + 1);
        root.right = deserializeHelper(data, 2 * start + 2);
        
        return root;
        
    }

    public static void main(String[] args) {
      Solution sol = new Solution();

      TreeNode root = new TreeNode(1);
      root.right = new TreeNode(2);

      String result = sol.serialize(root);
      
      TreeNode root2 = sol.deserialize(result);

      String result2 = sol.serialize(root2);

      System.out.println(result2);
    }
}

No comments:

Post a Comment