Buttercola: 2014

Friday, December 12, 2014

Data Structure & Algorithms : Hash Tables

Hash Function: the hash function that we use uniformly distributes keys among the integer values between 0 and M-1.

How to resolve Hash Collision?

Hashing with separate chaining.
A hash function converts keys into array indices. The second component of a hashing algorithm is collision resolution: a strategy for handling the case when two or more keys to be inserted hash to the same index. A straightforward approach to collision resolution is to build, for each of the M array indices, a linked list of the key-value pairs whose keys hash to that index. The basic idea is to choose M to be sufficiently large that the lists are sufficiently short to enable efficient search through a two-step process: hash to find the list that could contain the key, then sequentially search through that list for the key.

Code (Java):

/*************************************************************************
 *  Compilation:  javac SeparateChainingHashST.java
 *  Execution:    java SeparateChainingHashST
 *
 *  A symbol table implemented with a separate-chaining hash table.
 * 
 *  % java SeparateChainingHashST
 *
 *************************************************************************/

public class SeparateChainingHashST＆lt;Key, Value> {
    private static final int INIT_CAPACITY = 4;

    // largest prime <= 2^i for i = 3 to 31
    // not currently used for doubling and shrinking
    // private static final int[] PRIMES = {
    //    7, 13, 31, 61, 127, 251, 509, 1021, 2039, 4093, 8191, 16381,
    //    32749, 65521, 131071, 262139, 524287, 1048573, 2097143, 4194301,
    //    8388593, 16777213, 33554393, 67108859, 134217689, 268435399,
    //    536870909, 1073741789, 2147483647
    // };

    private int N;                                // number of key-value pairs
    private int M;                                // hash table size
    private SequentialSearchST<Key, Value>[] st;  // array of linked-list symbol tables


    // create separate chaining hash table
    public SeparateChainingHashST() {
        this(INIT_CAPACITY);
    } 

    // create separate chaining hash table with M lists
    public SeparateChainingHashST(int M) {
        this.M = M;
        st = (SequentialSearchST<Key, Value>[]) new SequentialSearchST[M];
        for (int i = 0; i < M; i++)
            st[i] = new SequentialSearchST<Key, Value>();
    } 

    // resize the hash table to have the given number of chains b rehashing all of the keys
    private void resize(int chains) {
        SeparateChainingHashST<Key, Value> temp = new SeparateChainingHashST<Key, Value>(chains);
        for (int i = 0; i < M; i++) {
            for (Key key : st[i].keys()) {
                temp.put(key, st[i].get(key));
            }
        }
        this.M  = temp.M;
        this.N  = temp.N;
        this.st = temp.st;
    }

    // hash value between 0 and M-1
    private int hash(Key key) {
        return (key.hashCode() & 0x7fffffff) % M;
    } 

    // return number of key-value pairs in symbol table
    public int size() {
        return N;
    } 

    // is the symbol table empty?
    public boolean isEmpty() {
        return size() == 0;
    }

    // is the key in the symbol table?
    public boolean contains(Key key) {
        return get(key) != null;
    } 

    // return value associated with key, null if no such key
    public Value get(Key key) {
        int i = hash(key);
        return st[i].get(key);
    } 

    // insert key-value pair into the table
    public void put(Key key, Value val) {
        if (val == null) {
            delete(key);
            return;
        }

        // double table size if average length of list >= 10
        if (N >= 10*M) resize(2*M);

        int i = hash(key);
        if (!st[i].contains(key)) N++;
        st[i].put(key, val);
    } 

    // delete key (and associated value) if key is in the table
    public void delete(Key key) {
        int i = hash(key);
        if (st[i].contains(key)) N--;
        st[i].delete(key);

        // halve table size if average length of list <= 2
        if (M > INIT_CAPACITY && N <= 2*M) resize(M/2);
    } 

    // return keys in symbol table as an Iterable
    public Iterable<Key> keys() {
        Queue<Key> queue = new Queue<Key>();
        for (int i = 0; i < M; i++) {
            for (Key key : st[i].keys())
                queue.enqueue(key);
        }
        return queue;
    } 


   /***********************************************************************
    *  Unit test client.
    ***********************************************************************/
    public static void main(String[] args) { 
        SeparateChainingHashST<String, Integer> st = new SeparateChainingHashST<String, Integer>();
        for (int i = 0; !StdIn.isEmpty(); i++) {
            String key = StdIn.readString();
            st.put(key, i);
        }

        // print keys
        for (String s : st.keys()) 
            StdOut.println(s + " " + st.get(s)); 

    }

}

Hashing with linear probing.

Another approach to implementing hashing is to store N key-value pairs in a hash table of size M > N, relying on empty entries in the table to help with with collision resolution. Such methods are called open-addressing hashing methods. The simplest open-addressing method is called linear probing: when there is a collision (when we hash to a table index that is already occupied with a key different from the search key), then we just check the next entry in the table (by incrementing the index). There are three possible outcomes:

key equal to search key: search hit
empty position (null key at indexed position): search miss
key not equal to search key: try next entry

Code (Java):

/*************************************************************************
 *  Compilation:  javac LinearProbingHashST.java
 *  Execution:    java LinearProbingHashST
 *  
 *  Symbol table implementation with linear probing hash table.
 *
 *  % java LinearProbingHashST
 *  128.112.136.11
 *  208.216.181.15
 *  null
 *
 *
 *************************************************************************/


public class LinearProbingHashST<Key, Value> {
    private static final int INIT_CAPACITY = 4;

    private int N;           // number of key-value pairs in the symbol table
    private int M;           // size of linear probing table
    private Key[] keys;      // the keys
    private Value[] vals;    // the values


    // create an empty hash table - use 16 as default size
    public LinearProbingHashST() {
        this(INIT_CAPACITY);
    }

    // create linear proving hash table of given capacity
    public LinearProbingHashST(int capacity) {
        M = capacity;
        keys = (Key[])   new Object[M];
        vals = (Value[]) new Object[M];
    }

    // return the number of key-value pairs in the symbol table
    public int size() {
        return N;
    }

    // is the symbol table empty?
    public boolean isEmpty() {
        return size() == 0;
    }

    // does a key-value pair with the given key exist in the symbol table?
    public boolean contains(Key key) {
        return get(key) != null;
    }

    // hash function for keys - returns value between 0 and M-1
    private int hash(Key key) {
        return (key.hashCode() & 0x7fffffff) % M;
    }

    // resize the hash table to the given capacity by re-hashing all of the keys
    private void resize(int capacity) {
        LinearProbingHashST<Key, Value> temp = 
            new LinearProbingHashST<Key, Value>(capacity);
        for (int i = 0; i < M; i++) {
            if (keys[i] != null) {
                temp.put(keys[i], vals[i]);
            }
        }
        keys = temp.keys;
        vals = temp.vals;
        M    = temp.M;
    }

    // insert the key-value pair into the symbol table
    public void put(Key key, Value val) {
        if (val == null) {
            delete(key);
            return;
        }

        // double table size if 50% full
        if (N >= M/2) resize(2*M);

        int i;
        for (i = hash(key); keys[i] != null; i = (i + 1) % M) {
            if (keys[i].equals(key)) { vals[i] = val; return; }
        }
        keys[i] = key;
        vals[i] = val;
        N++;
    }

    // return the value associated with the given key, null if no such value
    public Value get(Key key) {
        for (int i = hash(key); keys[i] != null; i = (i + 1) % M) 
            if (keys[i].equals(key))
                return vals[i];
        return null;
    }

    // delete the key (and associated value) from the symbol table
    public void delete(Key key) {
        if (!contains(key)) return;

        // find position i of key
        int i = hash(key);
        while (!key.equals(keys[i])) {
            i = (i + 1) % M;
        }

        // delete key and associated value
        keys[i] = null;
        vals[i] = null;

        // rehash all keys in same cluster
        i = (i + 1) % M;
        while (keys[i] != null) {
            // delete keys[i] an vals[i] and reinsert
            Key   keyToRehash = keys[i];
            Value valToRehash = vals[i];
            keys[i] = null;
            vals[i] = null;
            N--;  
            put(keyToRehash, valToRehash);
            i = (i + 1) % M;
        }

        N--;        

        // halves size of array if it's 12.5% full or less
        if (N > 0 && N <= M/8) resize(M/2);

        assert check();
    }

    // return all of the keys as in Iterable
    public Iterable<Key> keys() {
        Queue<Key> queue = new Queue<Key>();
        for (int i = 0; i < M; i++)
            if (keys[i] != null) queue.enqueue(keys[i]);
        return queue;
    }

    // integrity check - don't check after each put() because
    // integrity not maintained during a delete()
    private boolean check() {

        // check that hash table is at most 50% full
        if (M < 2*N) {
            System.err.println("Hash table size M = " + M + "; array size N = " + N);
            return false;
        }

        // check that each key in table can be found by get()
        for (int i = 0; i < M; i++) {
            if (keys[i] == null) continue;
            else if (get(keys[i]) != vals[i]) {
                System.err.println("get[" + keys[i] + "] = " 
                  + get(keys[i]) + "; vals[i] = " + vals[i]);
                return false;
            }
        }
        return true;
    }


/***********************************************************************
    *  Unit test client.
    ***********************************************************************/
    public static void main(String[] args) { 
        LinearProbingHashST<String, Integer> st = new LinearProbingHashST<String, Integer>();
        for (int i = 0; !StdIn.isEmpty(); i++) {
            String key = StdIn.readString();
            st.put(key, i);
        }

        // print keys
        for (String s : st.keys()) 
            StdOut.println(s + " " + st.get(s)); 
    }
}

Data Structure & Algorithms : Sorting

Quick Sort:

public class Quick {
    public void quickSort(int[] A) {
        if (A == null || A.length <= 1) {
            return;
        }
        
        //Collections.shuffle(A);
        quickSortHelper(A, 0, A.length - 1);
    }

    private void quickSortHelper(int[] A, int lo, int hi) {
        if (lo >= hi) {
            return;
        }

        int j = partition(A, lo, hi);
        quickSortHelper(A, lo, j - 1);
        quickSortHelper(A, j + 1, hi);
    }
    
     //  partition the subarray a[lo..hi] so that a[lo..j-1] <= a[j] <=
     //  a[j+1..hi]
     //  and return the index j.
    private int partition(int[] A, int lo, int hi) {
        int pivot = A[lo];
        int i = lo + 1;
        int j = hi;

        while (i <= j) {
            while (i <= j && A[i] < pivot) {
                i++;
            }

            while (i <= j && A[j] > pivot) {
                j--;
            }
            if (i <= j) {
                swap(A, i, j);
            }
        }

        // swap the pivot 
        swap(A, lo, j);

        return j;
    }

    private void swap(int[] A, int i, int j) {
        int temp = A[i];
        A[i] = A[j];
        A[j] = temp; 
    }
    
    public static void main(String[] args) {
         int[] A = new int[]{-5,4,2,-6};

         Quick quick = new Quick();
         quick.quickSort(A);

         for (int i = 0; i < A.length; i++) {
            System.out.print(A[i] + " ");
         }

         System.out.println();
    }
}

Follow-up:
When is the quick sort has time complexity of O(n^2)?

The Quick sort performs worst ie, at O(n^2) when all the values of the pivot chosen is either the largest or smallest of the taken set. Consider this example.

1 2 3 4 5

The pivot chosen say is 1, you will have 4 elements on the right side of the pivot and no elements on the left side. Applying this same logic recursively and the pivot chosen is 2, 3, 4, 5 respectively, we have attained a situation where this sort has performed at its worst possible time.

It has been recommended and proven that Quicksort performs well if the input is shuffled well.

Moreover, selection of a sort usually depends on a clear knowledge about the input domain. For example, if the input is huge, then there is something called as external sort which may use external memory. If the input size is very small, we may go for a merge sort but not for medium and huge input sets since it uses extra memory. The main advantage of Quick sort is its "in-place"ness meaning, no extra memory is being used for the input data. Its worst case time on paper is O(n^2) but still is widely preferred and used. My point here is, sorting algorithms can be changed based on the knowledge on the input set and its a matter of preference.

Merge Sort:

public class MergeSort {
    public void mergeSort(int[] A) {
        if (A == null || A.length <= 1) {
            return;
        }
        
        int[] Aux = new int[A.length];

        divide(A, Aux, 0, A.length - 1);
    }

    private void divide(int[] A, int[] Aux, int lo, int hi) {
        if (lo >= hi) {
            return;
        }

        int mid = lo + (hi - lo) / 2;

        divide(A, Aux, lo, mid);
        divide(A, Aux, mid + 1, hi);

        merge(A, Aux, lo, mid, hi);
    }

    private void merge(int[] A, int[] Aux, int lo, int mid, int hi) {
        // copy A to Aux
        for (int k = lo; k <= hi; k++) {
            Aux[k] = A[k];
        }

        int i = lo;
        int j = mid + 1;

        for (int k = lo; k <= hi; k++) {
            if (i > mid) {
                A[k] = Aux[j++];
            } else if (j > hi) {
                A[k] = Aux[i++];
            } else if (Aux[i] <= Aux[j]) {
                A[k] = Aux[i++];
            } else {
                A[k] = Aux[j++];
            }
        }
    }

    public static void main(String[] args) {
         int[] A = new int[]{3,1};

         MergeSort merge = new MergeSort();
         merge.mergeSort(A);

         for (int i = 0; i < A.length; i++) {
            System.out.print(A[i] + " ");
         }

         System.out.println();
    }
}

Data Structure & Algorithms: Binary Search Tree (BST)

http://algs4.cs.princeton.edu/32bst/

A complete Java implementation:

/*************************************************************************
 *  Compilation:  javac BST.java
 *  Execution:    java BST
 *  Dependencies: StdIn.java StdOut.java Queue.java
 *  Data files:   http://algs4.cs.princeton.edu/32bst/tinyST.txt  
 *
 *  A symbol table implemented with a binary search tree.
 * 
 *  % more tinyST.txt
 *  S E A R C H E X A M P L E
 *  
 *  % java BST < tinyST.txt
 *  A 8
 *  C 4
 *  E 12
 *  H 5
 *  L 11
 *  M 9
 *  P 10
 *  R 3
 *  S 0
 *  X 7
 *
 *************************************************************************/

import java.util.NoSuchElementException;

public class BST<Key extends Comparable<Key>, Value> {
    private Node root;             // root of BST

    private class Node {
        private Key key;           // sorted by key
        private Value val;         // associated data
        private Node left, right;  // left and right subtrees
        private int N;             // number of nodes in subtree

        public Node(Key key, Value val, int N) {
            this.key = key;
            this.val = val;
            this.N = N;
        }
    }

    // is the symbol table empty?
    public boolean isEmpty() {
        return size() == 0;
    }

    // return number of key-value pairs in BST
    public int size() {
        return size(root);
    }

    // return number of key-value pairs in BST rooted at x
    private int size(Node x) {
        if (x == null) return 0;
        else return x.N;
    }

   /***********************************************************************
    *  Search BST for given key, and return associated value if found,
    *  return null if not found
    ***********************************************************************/
    // does there exist a key-value pair with given key?
    public boolean contains(Key key) {
        return get(key) != null;
    }

    // return value associated with the given key, or null if no such key exists
    public Value get(Key key) {
        return get(root, key);
    }

    private Value get(Node x, Key key) {
        if (x == null) return null;
        int cmp = key.compareTo(x.key);
        if      (cmp < 0) return get(x.left, key);
        else if (cmp > 0) return get(x.right, key);
        else              return x.val;
    }

   /***********************************************************************
    *  Insert key-value pair into BST
    *  If key already exists, update with new value
    ***********************************************************************/
    public void put(Key key, Value val) {
        if (val == null) { delete(key); return; }
        root = put(root, key, val);
        assert check();
    }

    private Node put(Node x, Key key, Value val) {
        if (x == null) return new Node(key, val, 1);
        int cmp = key.compareTo(x.key);
        if      (cmp < 0) x.left  = put(x.left,  key, val);
        else if (cmp > 0) x.right = put(x.right, key, val);
        else              x.val   = val;
        x.N = 1 + size(x.left) + size(x.right);
        return x;
    }

   /***********************************************************************
    *  Delete
    ***********************************************************************/

    public void deleteMin() {
        if (isEmpty()) throw new NoSuchElementException("Symbol table underflow");
        root = deleteMin(root);
        assert check();
    }

    private Node deleteMin(Node x) {
        if (x.left == null) return x.right;
        x.left = deleteMin(x.left);
        x.N = size(x.left) + size(x.right) + 1;
        return x;
    }

    public void deleteMax() {
        if (isEmpty()) throw new NoSuchElementException("Symbol table underflow");
        root = deleteMax(root);
        assert check();
    }

    private Node deleteMax(Node x) {
        if (x.right == null) return x.left;
        x.right = deleteMax(x.right);
        x.N = size(x.left) + size(x.right) + 1;
        return x;
    }

    public void delete(Key key) {
        root = delete(root, key);
        assert check();
    }

    private Node delete(Node x, Key key) {
        if (x == null) return null;
        int cmp = key.compareTo(x.key);
        if      (cmp < 0) x.left  = delete(x.left,  key);
        else if (cmp > 0) x.right = delete(x.right, key);
        else { 
            if (x.right == null) return x.left;
            if (x.left  == null) return x.right;
            Node t = x;
            x = min(t.right);
            x.right = deleteMin(t.right);
            x.left = t.left;
        } 
        x.N = size(x.left) + size(x.right) + 1;
        return x;
    } 


   /***********************************************************************
    *  Min, max, floor, and ceiling
    ***********************************************************************/
    public Key min() {
        if (isEmpty()) return null;
        return min(root).key;
    } 

    private Node min(Node x) { 
        if (x.left == null) return x; 
        else                return min(x.left); 
    } 

    public Key max() {
        if (isEmpty()) return null;
        return max(root).key;
    } 

    private Node max(Node x) { 
        if (x.right == null) return x; 
        else                 return max(x.right); 
    } 

    public Key floor(Key key) {
        Node x = floor(root, key);
        if (x == null) return null;
        else return x.key;
    } 

    private Node floor(Node x, Key key) {
        if (x == null) return null;
        int cmp = key.compareTo(x.key);
        if (cmp == 0) return x;
        if (cmp <  0) return floor(x.left, key);
        Node t = floor(x.right, key); 
        if (t != null) return t;
        else return x; 
    } 

    public Key ceiling(Key key) {
        Node x = ceiling(root, key);
        if (x == null) return null;
        else return x.key;
    }

    private Node ceiling(Node x, Key key) {
        if (x == null) return null;
        int cmp = key.compareTo(x.key);
        if (cmp == 0) return x;
        if (cmp < 0) { 
            Node t = ceiling(x.left, key); 
            if (t != null) return t;
            else return x; 
        } 
        return ceiling(x.right, key); 
    } 

   /***********************************************************************
    *  Rank and selection
    ***********************************************************************/
    public Key select(int k) {
        if (k < 0 || k >= size()) return null;
        Node x = select(root, k);
        return x.key;
    }

    // Return key of rank k. 
    private Node select(Node x, int k) {
        if (x == null) return null; 
        int t = size(x.left); 
        if      (t > k) return select(x.left,  k); 
        else if (t < k) return select(x.right, k-t-1); 
        else            return x; 
    } 

    public int rank(Key key) {
        return rank(key, root);
    } 

    // Number of keys in the subtree less than key.
    private int rank(Key key, Node x) {
        if (x == null) return 0; 
        int cmp = key.compareTo(x.key); 
        if      (cmp < 0) return rank(key, x.left); 
        else if (cmp > 0) return 1 + size(x.left) + rank(key, x.right); 
        else              return size(x.left); 
    } 

   /***********************************************************************
    *  Range count and range search.
    ***********************************************************************/
    public Iterable<Key> keys() {
        return keys(min(), max());
    }

    public Iterable<Key> keys(Key lo, Key hi) {
        Queue queue = new Queue();
        keys(root, queue, lo, hi);
        return queue;
    } 

    private void keys(Node x, Queue<Key> queue, Key lo, Key hi) { 
        if (x == null) return; 
        int cmplo = lo.compareTo(x.key); 
        int cmphi = hi.compareTo(x.key); 
        if (cmplo < 0) keys(x.left, queue, lo, hi); 
        if (cmplo <= 0 && cmphi >= 0) queue.enqueue(x.key); 
        if (cmphi > 0) keys(x.right, queue, lo, hi); 
    } 

    public int size(Key lo, Key hi) {
        if (lo.compareTo(hi) > 0) return 0;
        if (contains(hi)) return rank(hi) - rank(lo) + 1;
        else              return rank(hi) - rank(lo);
    }


    // height of this BST (one-node tree has height 0)
    public int height() { return height(root); }
    private int height(Node x) {
        if (x == null) return -1;
        return 1 + Math.max(height(x.left), height(x.right));
    }


    // level order traversal
    public Iterable<Key> levelOrder() {
        Queue<Key> keys = new Queue<Key>();
        Queue<Node> queue = new Queue<Node>();
        queue.enqueue(root);
        while (!queue.isEmpty()) {
            Node x = queue.dequeue();
            if (x == null) continue;
            keys.enqueue(x.key);
            queue.enqueue(x.left);
            queue.enqueue(x.right);
        }
        return keys;
    }

  /*************************************************************************
    *  Check integrity of BST data structure
    *************************************************************************/
    private boolean check() {
        if (!isBST())            StdOut.println("Not in symmetric order");
        if (!isSizeConsistent()) StdOut.println("Subtree counts not consistent");
        if (!isRankConsistent()) StdOut.println("Ranks not consistent");
        return isBST() && isSizeConsistent() && isRankConsistent();
    }

    // does this binary tree satisfy symmetric order?
    // Note: this test also ensures that data structure is a binary tree since order is strict
    private boolean isBST() {
        return isBST(root, null, null);
    }

    // is the tree rooted at x a BST with all keys strictly between min and max
    // (if min or max is null, treat as empty constraint)
    // Credit: Bob Dondero's elegant solution
    private boolean isBST(Node x, Key min, Key max) {
        if (x == null) return true;
        if (min != null && x.key.compareTo(min) <= 0) return false;
        if (max != null && x.key.compareTo(max) >= 0) return false;
        return isBST(x.left, min, x.key) && isBST(x.right, x.key, max);
    } 

    // are the size fields correct?
    private boolean isSizeConsistent() { return isSizeConsistent(root); }
    private boolean isSizeConsistent(Node x) {
        if (x == null) return true;
        if (x.N != size(x.left) + size(x.right) + 1) return false;
        return isSizeConsistent(x.left) && isSizeConsistent(x.right);
    } 

    // check that ranks are consistent
    private boolean isRankConsistent() {
        for (int i = 0; i < size(); i++)
            if (i != rank(select(i))) return false;
        for (Key key : keys())
            if (key.compareTo(select(rank(key))) != 0) return false;
        return true;
    }


   /*****************************************************************************
    *  Test client
    *****************************************************************************/
    public static void main(String[] args) { 
        BST<String, Integer> st = new BST<String, Integer>();
        for (int i = 0; !StdIn.isEmpty(); i++) {
            String key = StdIn.readString();
            st.put(key, i);
        }

        for (String s : st.levelOrder())
            StdOut.println(s + " " + st.get(s));

        StdOut.println();

        for (String s : st.keys())
            StdOut.println(s + " " + st.get(s));
    }
}

Another BST deletion:
http://buttercola.blogspot.com/2014/12/data-structure-algorithms-binary-search.html

Data Structure & Algorithm: Heap

http://algs4.cs.princeton.edu/24pq/

Priority Queue: Remove the largest / smallest item.

APIs:
Priority queue API

PQ elementary implementations: (PQ with N items)
insert delete max max
Unordered array 1 N N
Ordered array N 1 1
Goal logN logN logN

Binary heap: Array representation of a heap-ordered complete binary tree.
Heap-ordered binary tree: Parent's key is no smaller than children's keys.

Binary heap properties:

Largest key is A[1], which is the root of the binary tree (Don't use A[0]).
Parent of node k is k / 2
Children of node k is 2 * k and k * k + 1

Algorithms on heaps:
We represent a heap of size N in private array pq[] of length N+1, with pq[0] unused and the heap in pq[1] through pq[N]. We access keys only through private helper functions less() and exch(). The heap operations that we consider work by first making a simple modification that could violate the heap condition, then traveling through the heap, modifying the heap as required to ensure that the heap condition is satisfied everywhere. We refer to this process as reheapifying, or restoring heap order.

Bottom-up reheapify (swim). If the heap order is violated because a node's key becomes larger than that node's parents key, then we can make progress toward fixing the violation by exchanging the node with its parent. After the exchange, the node is larger than both its children (one is the old parent, and the other is smaller than the old parent because it was a child of that node) but the node may still be larger than its parent. We can fix that violation in the same way, and so forth, moving up the heap until we reach a node with a larger key, or the root.

private void swim(int k) {
   while (k > 1 && less(k/2, k)) {
      exch(k, k/2);
      k = k/2;
   }
}

insert. We add the new item at the end of the array, increment the size of the heap, and then swim up through the heap with that item to restore the heap condition.

Complexity O(logn).

    /**
     * Adds a new key to the priority queue.
     * @param x the new key to add to the priority queue
     */
    public void insert(Key x) {

        // double size of array if necessary
        if (N >= pq.length - 1) resize(2 * pq.length);

        // add x, and percolate it up to maintain heap invariant
        pq[++N] = x;
        swim(N);
        assert isMaxHeap();
    }

Top-down heapify (sink). If the heap order is violated because a node's key becomes smaller than one or both of that node's children's keys, then we can make progress toward fixing the violation by exchanging the node with the larger of its two children. This switch may cause a violation at the child; we fix that violation in the same way, and so forth, moving down the heap until we reach a node with both children smaller, or the bottom.

private void sink(int k) {
   while (2*k <= N) {
      int j = 2*k;
      if (j < N && less(j, j+1)) j++;
      if (!less(k, j)) break;
      exch(k, j);
      k = j;
   }
}

Remove the maximum. Get the root. Exchange the root with the node at end. Then sink it down.

Complexity: O(logn).

    /**
     * Removes and returns a largest key on the priority queue.
     * @return a largest key on the priority queue
     * @throws java.util.NoSuchElementException if priority queue is empty.
     */
    public Key delMax() {
        if (isEmpty()) throw new NoSuchElementException("Priority queue underflow");
        Key max = pq[1];
        exch(1, N--);
        sink(1);
        pq[N+1] = null;     // to avoid loiterig and help with garbage collection
        if ((N > 0) && (N == (pq.length - 1) / 4)) resize(pq.length / 2);
        assert isMaxHeap();
        return max;
    }

Entire algorithms to build a binary heap:

 
/*************************************************************************
 *  Compilation:  javac MaxPQ.java
 *  Execution:    java MaxPQ < input.txt
 *  
 *  Generic max priority queue implementation with a binary heap.
 *  Can be used with a comparator instead of the natural order,
 *  but the generic Key type must still be Comparable.
 *
 *  % java MaxPQ < tinyPQ.txt 
 *  Q X P (6 left on pq)
 *
 *  We use a one-based array to simplify parent and child calculations.
 *
 *  Can be optimized by replacing full exchanges with half exchanges 
 *  (ala insertion sort).
 *
 *************************************************************************/

import java.util.Comparator;
import java.util.Iterator;
import java.util.NoSuchElementException;

/**
 *  The MaxPQ class represents a priority queue of generic keys.
 *  It supports the usual insert and delete-the-maximum
 *  operations, along with methods for peeking at the maximum key,
 *  testing if the priority queue is empty, and iterating through
 *  the keys.
 *  *  This implementation uses a binary heap.
 *  The insert and delete-the-maximum operations take
 *  logarithmic amortized time.
 *  The max, size, and is-empty operations take constant time.
 *  Construction takes time proportional to the specified capacity or the number of
 *  items used to initialize the data structure.
 *  

*  For additional documentation, see Section 2.4 of
 *  Algorithms, 4th Edition by Robert Sedgewick and Kevin Wayne.
 *
 *  @author Robert Sedgewick
 *  @author Kevin Wayne
 */

public class MaxPQ<Key> implements Iterable<Key> {
    private Key[] pq;                    // store items at indices 1 to N
    private int N;                       // number of items on priority queue
    private Comparator<Key> comparator;  // optional Comparator

    /**
     * Initializes an empty priority queue with the given initial capacity.
     * @param initCapacity the initial capacity of the priority queue
     */
    public MaxPQ(int initCapacity) {
        pq = (Key[]) new Object[initCapacity + 1];
        N = 0;
    }

    /**
     * Initializes an empty priority queue.
     */
    public MaxPQ() {
        this(1);
    }

    /**
     * Initializes an empty priority queue with the given initial capacity,
     * using the given comparator.
     * @param initCapacity the initial capacity of the priority queue
     * @param comparator the order in which to compare the keys
     */
    public MaxPQ(int initCapacity, Comparator<Key> comparator) {
        this.comparator = comparator;
        pq = (Key[]) new Object[initCapacity + 1];
        N = 0;
    }

    /**
     * Initializes an empty priority queue using the given comparator.
     * @param comparator the order in which to compare the keys
     */
    public MaxPQ(Comparator<Key> comparator) {
        this(1, comparator);
    }

    /**
     * Initializes a priority queue from the array of keys.
     * Takes time proportional to the number of keys, using sink-based heap construction.
     * @param keys the array of keys
     */
    public MaxPQ(Key[] keys) {
        N = keys.length;
        pq = (Key[]) new Object[keys.length + 1]; 
        for (int i = 0; i < N; i++)
            pq[i+1] = keys[i];
        for (int k = N/2; k >= 1; k--)
            sink(k);
        assert isMaxHeap();
    }
      


    /**
     * Is the priority queue empty?
     * @return true if the priority queue is empty; false otherwise
     */
    public boolean isEmpty() {
        return N == 0;
    }

    /**
     * Returns the number of keys on the priority queue.
     * @return the number of keys on the priority queue
     */
    public int size() {
        return N;
    }

    /**
     * Returns a largest key on the priority queue.
     * @return a largest key on the priority queue
     * @throws java.util.NoSuchElementException if the priority queue is empty
     */
    public Key max() {
        if (isEmpty()) throw new NoSuchElementException("Priority queue underflow");
        return pq[1];
    }

    // helper function to double the size of the heap array
    private void resize(int capacity) {
        assert capacity > N;
        Key[] temp = (Key[]) new Object[capacity];
        for (int i = 1; i <= N; i++) temp[i] = pq[i];
        pq = temp;
    }


    /**
     * Adds a new key to the priority queue.
     * @param x the new key to add to the priority queue
     */
    public void insert(Key x) {

        // double size of array if necessary
        if (N >= pq.length - 1) resize(2 * pq.length);

        // add x, and percolate it up to maintain heap invariant
        pq[++N] = x;
        swim(N);
        assert isMaxHeap();
    }

    /**
     * Removes and returns a largest key on the priority queue.
     * @return a largest key on the priority queue
     * @throws java.util.NoSuchElementException if priority queue is empty.
     */
    public Key delMax() {
        if (isEmpty()) throw new NoSuchElementException("Priority queue underflow");
        Key max = pq[1];
        exch(1, N--);
        sink(1);
        pq[N+1] = null;     // to avoid loiterig and help with garbage collection
        if ((N > 0) && (N == (pq.length - 1) / 4)) resize(pq.length / 2);
        assert isMaxHeap();
        return max;
    }


   /***********************************************************************
    * Helper functions to restore the heap invariant.
    **********************************************************************/

    private void swim(int k) {
        while (k > 1 && less(k/2, k)) {
            exch(k, k/2);
            k = k/2;
        }
    }

    private void sink(int k) {
        while (2*k <= N) {
            int j = 2*k;
            if (j < N && less(j, j+1)) j++;
            if (!less(k, j)) break;
            exch(k, j);
            k = j;
        }
    }

   /***********************************************************************
    * Helper functions for compares and swaps.
    **********************************************************************/
    private boolean less(int i, int j) {
        if (comparator == null) {
            return ((Comparable) pq[i]).compareTo(pq[j]) < 0;
        }
        else {
            return comparator.compare(pq[i], pq[j]) < 0;
        }
    }

    private void exch(int i, int j) {
        Key swap = pq[i];
        pq[i] = pq[j];
        pq[j] = swap;
    }

    // is pq[1..N] a max heap?
    private boolean isMaxHeap() {
        return isMaxHeap(1);
    }

    // is subtree of pq[1..N] rooted at k a max heap?
    private boolean isMaxHeap(int k) {
        if (k > N) return true;
        int left = 2*k, right = 2*k + 1;
        if (left  <= N && less(k, left))  return false;
        if (right <= N && less(k, right)) return false;
        return isMaxHeap(left) && isMaxHeap(right);
    }


   /***********************************************************************
    * Iterator
    **********************************************************************/

    /**
     * Returns an iterator that iterates over the keys on the priority queue
     * in descending order.
     * The iterator doesn't implement remove() since it's optional.
     * @return an iterator that iterates over the keys in descending order
     */
    public Iterator<Key> iterator() { return new HeapIterator(); }

    private class HeapIterator implements Iterator<Key> {

        // create a new pq
        private MaxPQ<Key> copy;

        // add all items to copy of heap
        // takes linear time since already in heap order so no keys move
        public HeapIterator() {
            if (comparator == null) copy = new MaxPQ<Key>(size());
            else                    copy = new MaxPQ<Key>(size(), comparator);
            for (int i = 1; i <= N; i++)
                copy.insert(pq[i]);
        }

        public boolean hasNext()  { return !copy.isEmpty();                     }
        public void remove()      { throw new UnsupportedOperationException();  }

        public Key next() {
            if (!hasNext()) throw new NoSuchElementException();
            return copy.delMax();
        }
    }

    /**
     * Unit tests the MaxPQ data type.
     */
    public static void main(String[] args) {
        MaxPQ<String> pq = new MaxPQ<String>();
        while (!StdIn.isEmpty()) {
            String item = StdIn.readString();
            if (!item.equals("-")) pq.insert(item);
            else if (!pq.isEmpty()) StdOut.print(pq.delMax() + " ");
        }
        StdOut.println("(" + pq.size() + " left on pq)");
    }

}

Thursday, December 11, 2014

Java: Static class / static variable / static method / static block

http://www.geeksforgeeks.org/static-class-in-java/
http://www.geeksforgeeks.org/static-keyword-in-java/
http://www.geeksforgeeks.org/g-fact-79/

1. Static class in Java:
Can a class be static in Java ?
The answer is YES, we can have static class in java. In java, we have static instance variables as well as static methods and also static block. Classes can also be made static in Java.

Java allows us to define a class within another class. Such a class is called a nested class. The class which enclosed nested class is known as Outer class. In java, we can’t make Top level class static. Only nested classes can be static.

What are the differences between static and non-static nested classes?
Following are major differences between static nested class and non-static nested class. Non-static nested class is also called Inner Class.

Nested static class doesn’t need reference of Outer class, but Non-static nested class or Inner class requires Outer class reference.
Inner class(or non-static nested class) can access both static and non-static members of Outer class. A static class cannot access non-static members of the Outer class. It can access only static members of Outer class.
An instance of Inner class cannot be created without an instance of outer class and an Inner class can reference data and methods defined in Outer class in which it nests, so we don’t need to pass reference of an object to the constructor of the Inner class. For this reason Inner classes can make program simple and concise.

Code example:

/* Java program to demonstrate how to implement static and non-static
   classes in a java program. */
class OuterClass{
   private static String msg = "GeeksForGeeks";
    
   // Static nested class
   public static class NestedStaticClass{
      
       // Only static members of Outer class is directly accessible in nested 
       // static class 
       public void printMessage() {
 
         // Try making 'message' a non-static variable, there will be 
         // compiler error  
         System.out.println("Message from nested static class: " + msg); 
       }
    }
    
    // non-static nested class - also called Inner class
    public class InnerClass{
        
       // Both static and non-static members of Outer class are accessible in 
       // this Inner class
       public void display(){
          System.out.println("Message from non-static nested class: "+ msg);
       }
    }
} 
class Main
{
    // How to create instance of static and non static nested class?
    public static void main(String args[]){
        
       // create instance of nested Static class
       OuterClass.NestedStaticClass printer = new OuterClass.NestedStaticClass();
        
       // call non static method of nested static class
       printer.printMessage();   
  
       // In order to create instance of Inner class we need an Outer class 
       // instance. Let us create Outer class instance for creating 
       // non-static nested class
       OuterClass outer = new OuterClass();        
       OuterClass.InnerClass inner  = outer.new InnerClass();
        
       // calling non-static method of Inner class
       inner.display();
        
       // we can also combine above steps in one step to create instance of 
       // Inner class
       OuterClass.InnerClass innerObject = new OuterClass().new InnerClass();
        
       // similarly we can now call Inner class method
       innerObject.display();
    }
}

2. Static data members
Static keyword is used for almost same purpose in both C++ and Java. There are some differences though. This post covers similarities and differences of static keyword in C++ and Java.

Like C++, static data members in Java are class members and shared among all objects. For example, in the following Java program, static variable count is used to count the number of objects created.

Example:

class Test {
    static int count = 0;
 
    Test() { 
       count++; 
    }    
    public static void main(String arr[]) {
       Test t1 = new Test();
       Test t2 = new Test();
       System.out.println("Total " + count + " objects created");        
    }
}

Output:

Total 2 objects created

3. Static member methods
Like C++, static data members and static methods can be accessed without creating an object. They can be accessed using class name. For example, in the following program, static data member count and static method fun() are accessed without any object.

Code (Java):

class Test {
    static int count = 0;
    public static void fun() { 
       System.out.println("Static fun() called"); 
    }    
}   
      
class Main
{
    public static void main(String arr[]) {
       System.out.println("Test.count = " + Test.count);        
       Test.fun();
    }
}

So the interesting question is: when to use static member methods?
From http://stackoverflow.com/questions/2671496/java-when-to-use-static-methods

One rule-of-thumb: ask yourself "does it make sense to call this method, even if no Obj has been constructed yet?" If so, it should definitely be static.

(Btw, the converse isn't always true: you might sometimes have a method which involves two Car objects, and still want it to be static. E.g. Car theMoreEfficientOf( Car c1, Car c2 ). Although this could be converted to a non-static version, some would argue that since there isn't a "privileged" choice of which Car is more important, you shouldn't force a caller to choose one Car as the object you'll invoke the method on. This situation accounts for a fairly small fraction of all static methods, though.)

So in a class Car you might have a method double convertMpgToKpl(double mpg) which would be static, because one might want to know what 35mpg converts to, even if nobody has ever built a Car. But void setMileage(double mpg) (which sets the efficiency of one particular Car) can't be static since it's inconceivable to call the method before any Car has been constructed.

In addition, static methods have some constraints.
1) They can only call other static methods. For example, the following program fails in compilation. fun() is non-static and it is called in static main()

Code (Java):

class Main {
    public static void main(String args[]) {   
        System.out.println(fun());
    } 
    int fun() {
        return 20;
    } 
}

2) They must only access static data.
3) They cannot access this or super . For example, the following program fails in compilation.

Code (Java):

class Base {
    static int x = 0;       
}   
 
class Derived extends Base 
{
   public static void fun() {
       System.out.println(super.x); // Compiler Error: non-static variable 
                                  // cannot be referenced from a static context
   }   
}

4. Static local variables
Unlike C++, Java doesn’t support static local variables. For example, the following Java program fails in compilation.

Example (Java):

class Test {
   public static void main(String args[]) { 
     System.out.println(fun());
   } 
   static int fun()  {
     static int x= 10; //Compiler Error: Static local variables are not allowed
     return x--;
   }
}

5. Static blocks in Java:
http://www.geeksforgeeks.org/g-fact-79/

Unlike C++, Java supports a special block, called static block (also called static clause) which can be used for static initializations of a class. This code inside static block is executed only once: the first time you make an object of that class or the first time you access a static member of that class (even if you never make an object of that class). For example, check output of following Java program.

// filename: Main.java
class Test {
    static int i;
    int j;
     
    // start of static block 
    static {
        i = 10;
        System.out.println("static block called ");
    }
    // end of static block 
}
 
class Main {
    public static void main(String args[]) {
 
        // Although we don't have an object of Test, static block is 
        // called because i is being accessed in following statement.
        System.out.println(Test.i); 
    }
}

Output:
static block called
10

Also, static blocks are executed before constructors. For example, check output of following Java program.

// filename: Main.java
class Test {
    static int i;
    int j;
    static {
        i = 10;
        System.out.println("static block called ");
    }
    Test(){
        System.out.println("Constructor called");
    }
}
 
class Main {
    public static void main(String args[]) {
 
       // Although we have two objects, static block is executed only once.
       Test t1 = new Test();
       Test t2 = new Test();
    }
}

Output:
static block called
Constructor called
Constructor called

Wednesday, December 10, 2014

System Design: What happened if you tap a URL at a browser?

原文:http://igoro.com/archive/what-really-happens-when-you-navigate-to-a-url/

作为一个软件开发者，你一定会对网络应用如何工作有一个完整的层次化的认知，同样这里也包括这些应用所用到的技术：像浏览器，HTTP，HTML，网络服务器，需求处理等等。

本文将更深入的研究当你输入一个网址的时候，后台到底发生了一件件什么样的事～

1. 首先嘛，你得在浏览器里输入要网址:

2. 浏览器查找域名的IP地址

导航的第一步是通过访问的域名找出其IP地址。DNS查找过程如下：

浏览器缓存 – 浏览器会缓存DNS记录一段时间。有趣的是，操作系统没有告诉浏览器储存DNS记录的时间，这样不同浏览器会储存个自固定的一个时间（2分钟到30分钟不等）。
系统缓存 – 如果在浏览器缓存里没有找到需要的记录，浏览器会做一个系统调用（windows里是gethostbyname）。这样便可获得系统缓存中的记录。
路由器缓存 – 接着，前面的查询请求发向路由器，它一般会有自己的DNS缓存。
ISP DNS 缓存 – 接下来要check的就是ISP缓存DNS的服务器。在这一般都能找到相应的缓存记录。
递归搜索 – 你的ISP的DNS服务器从跟域名服务器开始进行递归搜索，从.com顶级域名服务器到Facebook的域名服务器。一般DNS服务器的缓存中会有.com域名服务器中的域名，所以到顶级服务器的匹配过程不是那么必要了。

DNS递归查找如下图所示：

500px-An_example_of_theoretical_DNS_recursion_svg

DNS有一点令人担忧，这就是像wikipedia.org 或者 facebook.com这样的整个域名看上去只是对应一个单独的IP地址。还好，有几种方法可以消除这个瓶颈：

循环 DNS 是DNS查找时返回多个IP时的解决方案。举例来说，Facebook.com实际上就对应了四个IP地址。
负载平衡器 是以一个特定IP地址进行侦听并将网络请求转发到集群服务器上的硬件设备。一些大型的站点一般都会使用这种昂贵的高性能负载平衡器。
地理 DNS 根据用户所处的地理位置，通过把域名映射到多个不同的IP地址提高可扩展性。这样不同的服务器不能够更新同步状态，但映射静态内容的话非常好。
Anycast 是一个IP地址映射多个物理主机的路由技术。美中不足，Anycast与TCP协议适应的不是很好，所以很少应用在那些方案中。

大多数DNS服务器使用Anycast来获得高效低延迟的DNS查找。

3. 浏览器给web服务器发送一个HTTP请求

因为像Facebook主页这样的动态页面，打开后在浏览器缓存中很快甚至马上就会过期，毫无疑问他们不能从中读取。

所以，浏览器将把一下请求发送到Facebook所在的服务器：

GET http://facebook.com/ HTTP/1.1
 Accept: application/x-ms-application, image/jpeg, application/xaml+xml, [...]
 User-Agent: Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; WOW64; [...]
 Accept-Encoding: gzip, deflate
 Connection: Keep-Alive
 Host: facebook.com
 Cookie: datr=1265876274-[...]; locale=en_US; lsd=WW[...]; c_user=2101[...]

GET 这个请求定义了要读取的URL： “http://facebook.com/”。浏览器自身定义 (User-Agent 头)，和它希望接受什么类型的相应 (Accept andAccept-Encoding 头). Connection头要求服务器为了后边的请求不要关闭TCP连接。

请求中也包含浏览器存储的该域名的cookies。可能你已经知道，在不同页面请求当中，cookies是与跟踪一个网站状态相匹配的键值。这样cookies会存储登录用户名，服务器分配的密码和一些用户设置等。Cookies会以文本文档形式存储在客户机里，每次请求时发送给服务器。

用来看原始HTTP请求及其相应的工具很多。作者比较喜欢使用fiddler，当然也有像FireBug这样其他的工具。这些软件在网站优化时会帮上很大忙。

除了获取请求，还有一种是发送请求，它常在提交表单用到。发送请求通过URL传递其参数(e.g.: http://robozzle.com/puzzle.aspx?id=85)。发送请求在请求正文头之后发送其参数。

像“http://facebook.com/”中的斜杠是至关重要的。这种情况下，浏览器能安全的添加斜杠。而像“http: //example.com/folderOrFile”这样的地址，因为浏览器不清楚folderOrFile到底是文件夹还是文件，所以不能自动添加斜杠。这时，浏览器就不加斜杠直接访问地址，服务器会响应一个重定向，结果造成一次不必要的握手。

4. facebook服务的永久重定向响应

图中所示为Facebook服务器发回给浏览器的响应：

HTTP/1.1 301 Moved Permanently
 Cache-Control: private, no-store, no-cache, must-revalidate, post-check=0,
 pre-check=0
 Expires: Sat, 01 Jan 2000 00:00:00 GMT
 Location: http://www.facebook.com/
 P3P: CP="DSP LAW"
 Pragma: no-cache
 Set-Cookie: made_write_conn=deleted; expires=Thu, 12-Feb-2009 05:09:50 GMT;
 path=/; domain=.facebook.com; httponly
 Content-Type: text/html; charset=utf-8
 X-Cnection: close
 Date: Fri, 12 Feb 2010 05:09:51 GMT
 Content-Length: 0

服务器给浏览器响应一个301永久重定向响应，这样浏览器就会访问“http://www.facebook.com/” 而非“http://facebook.com/”。

为什么服务器一定要重定向而不是直接发会用户想看的网页内容呢？这个问题有好多有意思的答案。

其中一个原因跟搜索引擎排名有关。你看，如果一个页面有两个地址，就像http://www.igoro.com/ 和http://igoro.com/，搜索引擎会认为它们是两个网站，结果造成每一个的搜索链接都减少从而降低排名。而搜索引擎知道301永久重定向是什么意思，这样就会把访问带www的和不带www的地址归到同一个网站排名下。

还有一个是用不同的地址会造成缓存友好性变差。当一个页面有好几个名字时，它可能会在缓存里出现好几次。

5. 浏览器跟踪重定向地址

现在，浏览器知道了“http://www.facebook.com/”才是要访问的正确地址，所以它会发送另一个获取请求：

GET http://www.facebook.com/ HTTP/1.1
 Accept: application/x-ms-application, image/jpeg, application/xaml+xml, [...]
 Accept-Language: en-US
 User-Agent: Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; WOW64; [...]
 Accept-Encoding: gzip, deflate
 Connection: Keep-Alive
 Cookie: lsd=XW[...]; c_user=21[...]; x-referer=[...]
 Host: www.facebook.com

头信息以之前请求中的意义相同。

6. 服务器“处理”请求

服务器接收到获取请求，然后处理并返回一个响应。

这表面上看起来是一个顺向的任务，但其实这中间发生了很多有意思的东西- 就像作者博客这样简单的网站，何况像facebook那样访问量大的网站呢！

Web 服务器软件web服务器软件（像IIS和阿帕奇）接收到HTTP请求，然后确定执行什么请求处理来处理它。请求处理就是一个能够读懂请求并且能生成HTML来进行响应的程序（像ASP.NET,PHP,RUBY...）。
举个最简单的例子，需求处理可以以映射网站地址结构的文件层次存储。像http://example.com/folder1/page1.aspx这个地址会映射/httpdocs/folder1/page1.aspx这个文件。web服务器软件可以设置成为地址人工的对应请求处理，这样 page1.aspx的发布地址就可以是http://example.com/folder1/page1。
请求处理请求处理阅读请求及它的参数和cookies。它会读取也可能更新一些数据，并讲数据存储在服务器上。然后，需求处理会生成一个HTML响应。

所有动态网站都面临一个有意思的难点 -如何存储数据。小网站一半都会有一个SQL数据库来存储数据，存储大量数据和/或访问量大的网站不得不找一些办法把数据库分配到多台机器上。解决方案有：sharding （基于主键值讲数据表分散到多个数据库中），复制，利用弱语义一致性的简化数据库。

委托工作给批处理是一个廉价保持数据更新的技术。举例来讲，Fackbook得及时更新新闻feed，但数据支持下的“你可能认识的人”功能只需要每晚更新（作者猜测是这样的，改功能如何完善不得而知）。批处理作业更新会导致一些不太重要的数据陈旧，但能使数据更新耕作更快更简洁。

7. 服务器发回一个HTML响应

图中为服务器生成并返回的响应：

HTTP/1.1 200 OK
 Cache-Control: private, no-store, no-cache, must-revalidate, post-check=0,
 pre-check=0
 Expires: Sat, 01 Jan 2000 00:00:00 GMT
 P3P: CP="DSP LAW"
 Pragma: no-cache
 Content-Encoding: gzip
 Content-Type: text/html; charset=utf-8
 X-Cnection: close
 Transfer-Encoding: chunked
 Date: Fri, 12 Feb 2010 09:05:55 GMT
 
 2b3Tn@[...]

整个响应大小为35kB，其中大部分在整理后以blob类型传输。

内容编码头告诉浏览器整个响应体用gzip算法进行压缩。解压blob块后，你可以看到如下期望的HTML：

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"    
 "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
 <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en"
 lang="en" id="facebook" class=" no_js">
 <head>
 <meta http-equiv="Content-type" content="text/html; charset=utf-8" />
 <meta http-equiv="Content-language" content="en" />
 ...

关于压缩，头信息说明了是否缓存这个页面，如果缓存的话如何去做，有什么cookies要去设置（前面这个响应里没有这点）和隐私信息等等。

请注意报头中把Content-type设置为“text/html”。报头让浏览器将该响应内容以HTML形式呈现，而不是以文件形式下载它。浏览器会根据报头信息决定如何解释该响应，不过同时也会考虑像URL扩展内容等其他因素。

8. 浏览器开始显示HTML

在浏览器没有完整接受全部HTML文档时，它就已经开始显示这个页面了：

9. 浏览器发送获取嵌入在HTML中的对象

在浏览器显示HTML时，它会注意到需要获取其他地址内容的标签。这时，浏览器会发送一个获取请求来重新获得这些文件。

下面是几个我们访问facebook.com时需要重获取的几个URL：

图片http://static.ak.fbcdn.net/rsrc.php/z12E0/hash/8q2anwu7.gif
http://static.ak.fbcdn.net/rsrc.php/zBS5C/hash/7hwy7at6.gif
…
CSS 式样表http://static.ak.fbcdn.net/rsrc.php/z448Z/hash/2plh8s4n.css
http://static.ak.fbcdn.net/rsrc.php/zANE1/hash/cvtutcee.css
…
JavaScript 文件
http://static.ak.fbcdn.net/rsrc.php/zEMOA/hash/c8yzb6ub.js
http://static.ak.fbcdn.net/rsrc.php/z6R9L/hash/cq2lgbs8.js
…

这些地址都要经历一个和HTML读取类似的过程。所以浏览器会在DNS中查找这些域名，发送请求，重定向等等...

但不像动态页面那样，静态文件会允许浏览器对其进行缓存。有的文件可能会不需要与服务器通讯，而从缓存中直接读取。服务器的响应中包含了静态文件保存的期限信息，所以浏览器知道要把它们缓存多长时间。还有，每个响应都可能包含像版本号一样工作的ETag头（被请求变量的实体值），如果浏览器观察到文件的版本 ETag信息已经存在，就马上停止这个文件的传输。

试着猜猜看“fbcdn.net”在地址中代表什么？聪明的答案是"Facebook内容分发网络"。Facebook利用内容分发网络（CDN）分发像图片，CSS表和JavaScript文件这些静态文件。所以，这些文件会在全球很多CDN的数据中心中留下备份。

静态内容往往代表站点的带宽大小，也能通过CDN轻松的复制。通常网站会使用第三方的CDN。例如，Facebook的静态文件由最大的CDN提供商Akamai来托管。

举例来讲，当你试着ping static.ak.fbcdn.net的时候，可能会从某个akamai.net服务器上获得响应。有意思的是，当你同样再ping一次的时候，响应的服务器可能就不一样，这说明幕后的负载平衡开始起作用了。

10. 浏览器发送异步（AJAX）请求

在Web 2.0伟大精神的指引下，页面显示完成后客户端仍与服务器端保持着联系。

以 Facebook聊天功能为例，它会持续与服务器保持联系来及时更新你那些亮亮灰灰的好友状态。为了更新这些头像亮着的好友状态，在浏览器中执行的 JavaScript代码会给服务器发送异步请求。这个异步请求发送给特定的地址，它是一个按照程式构造的获取或发送请求。还是在Facebook这个例子中，客户端发送给http://www.facebook.com/ajax/chat/buddy_list.php一个发布请求来获取你好友里哪个在线的状态信息。

提起这个模式，就必须要讲讲"AJAX"-- “异步JavaScript 和 XML”，虽然服务器为什么用XML格式来进行响应也没有个一清二白的原因。再举个例子吧，对于异步请求，Facebook会返回一些JavaScript的代码片段。

除了其他，fiddler这个工具能够让你看到浏览器发送的异步请求。事实上，你不仅可以被动的做为这些请求的看客，还能主动出击修改和重新发送它们。AJAX请求这么容易被蒙，可着实让那些计分的在线游戏开发者们郁闷的了。（当然，可别那样骗人家~）

Facebook聊天功能提供了关于AJAX一个有意思的问题案例：把数据从服务器端推送到客户端。因为HTTP是一个请求-响应协议，所以聊天服务器不能把新消息发给客户。取而代之的是客户端不得不隔几秒就轮询下服务器端看自己有没有新消息。

这些情况发生时长轮询是个减轻服务器负载挺有趣的技术。如果当被轮询时服务器没有新消息，它就不理这个客户端。而当尚未超时的情况下收到了该客户的新消息，服务器就会找到未完成的请求，把新消息做为响应返回给客户端。

总结一下

希望看了本文，你能明白不同的网络模块是如何协同工作的

分享是最好的记忆

Wednesday, December 3, 2014

Data Structure and Algorithm: Trie (search and insert)

http://www.geeksforgeeks.org/trie-insert-and-search/

Trie is an efficient information retrieval data structure. Using trie, search complexities can be brought to optimal limit (key length). If we store keys in binary search tree, a well balanced BST will need time proportional to M * log N, where M is maximum string length and N is number of keys in tree. Using trie, we can search the key in O(M) time. However the penalty is on trie storage requirements.

Every node of trie consists of multiple branches. Each branch represents a possible character of keys. We need to mark the last node of every key as leaf node. A trie node field value will be used to distinguish the node as leaf node (there are other uses of the value field). A simple structure to represent nodes of English alphabet can be as following,

struct trie_node

{

int value; /* Used to mark leaf nodes */

trie_node_t *children[ALPHABET_SIZE];

};

Inserting a key into trie is simple approach. Every character of input key is inserted as an individual trie node. Note that the children is an array of pointers to next level trie nodes. The key character acts as an index into the array children. If the input key is new or an extension of existing key, we need to construct non-existing nodes of the key, and mark leaf node. If the input key is prefix of existing key in trie, we simply mark the last node of key as leaf. The key length determines trie depth.

Searching for a key is similar to insert operation, however we only compare the characters and move down. The search can terminate due to end of string or lack of key in trie. In the former case, if the value field of last node is non-zero then the key exists in trie. In the second case, the search terminates without examining all the characters of key, since the key is not present in trie.

The following picture explains construction of trie using keys given in the example below,

                       root
                    /   \    \
                    t   a     b
                    |   |     |
                    h   n     y
                    |   |  \  |
                    e   s  y  e
                 /  |   |
                 i  r   w
                 |  |   |
                 r  e   e
                        |
                        r

In the picture, every character is of type trie_node_t. For example, the root is of type trie_node_t, and it’s children a, b and t are filled, all other nodes of root will be NULL. Similarly, “a” at the next level is having only one child (“n”), all other children are NULL. The leaf nodes are in blue.

Insert and search costs O(key_length), however the memory requirements of trie isO(ALPHABET_SIZE * key_length * N) where N is number of keys in trie. There are efficient representation of trie nodes (e.g. compressed trie, ternary search tree, etc.) to minimize memory requirements of trie.

Code (Java):
From this website: http://pathakalgo.blogspot.com/2012/11/trie-data-structure-implementation-in.html

public class PrefixTree
{
    static TrieNode createTree()
    {
        return(new TrieNode('\0', false));
    }
    
    static void insertWord(TrieNode root, String word)
    {
        int offset = 97;
        int l = word.length();
        char[] letters = word.toCharArray();
        TrieNode curNode = root;
        
        for (int i = 0; i < l; i++)
        {
            if (curNode.links[letters[i]-offset] == null)
                curNode.links[letters[i]-offset] = new TrieNode(letters[i], i == l-1 ? true : false);
            curNode = curNode.links[letters[i]-offset];
        }
    }

    static boolean find(TrieNode root, String word)
    {
        char[] letters = word.toCharArray();
        int l = letters.length;
        int offset = 97;
        TrieNode curNode = root;
        
        int i;
        for (i = 0; i < l; i++)
        {
            if (curNode == null)
                return false;
            curNode = curNode.links[letters[i]-offset];
        }
        
        if (i == l && curNode == null)
            return false;
        
        if (curNode != null && !curNode.fullWord)
            return false;
        
        return true;
    }
    
    static void printTree(TrieNode root, int level, char[] branch)
    {
        if (root == null)
            return;
        
        for (int i = 0; i < root.links.length; i++)
        {
            branch[level] = root.letter;
            printTree(root.links[i], level+1, branch);    
        }
        
        if (root.fullWord)
        {
            for (int j = 1; j <= level; j++)
                System.out.print(branch[j]);
            System.out.println();
        }
    }
    
    public static void main(String[] args)
    {
        TrieNode tree = createTree();
        
        String[] words = {"an", "ant", "all", "allot", "alloy", "aloe", "are", "ate", "be".};
        for (int i = 0; i < words.length; i++)
            insertWord(tree, words[i]);
        
        char[] branch = new char[50];
        printTree(tree, 0, branch);
        
        String searchWord = "all";
        if (find(tree, searchWord))
        {
            System.out.println("The word was found");
        }
        else
        {
            System.out.println("The word was NOT found");
        }
    }
}

class TrieNode
{
    char letter;
    TrieNode[] links;
    boolean fullWord;
    
    TrieNode(char letter, boolean fullWord)
    {
        this.letter = letter;
        links = new TrieNode[26];
        this.fullWord = fullWord;
    }
}

Tuesday, December 2, 2014

Data Structure and Algorithm: Back pack problem

Backpack
1. 0 / 1 back pack problem.
Given N non-negative integers and a target, Can we pick up any numbers from the N integers and its sum equals to the target. Return true of false

f[n + 1][Target]
f[i][S] Given First i numbers, can we pick up some numbers and their sums equal to S
Transit function: f[i][S] = f[i - 1][S- a[i]] (picked A[i])or f[i - 1][S] (not picked A[i])
Initial state: f[0][0] = true; f[0][1 ... SUM] = false;
Final state: f[n][target] .

Code (Java):

public class Solution {
    public boolean backpack(int[] A, int target) {
        if (A == null || A.length == 0 || target < 0) {
            return false;
        }
        
        boolean[][] dp = new boolean[A.length + 1][target + 1];
        dp[0][0] = true;
        
        for (int i = 1; i <= A.length; i++) {
            for (int j = 0; j <= target; j++) {
                dp[i][j] = dp[i - 1][j];
                if (j - A[i - 1] >= 0) {
                    dp[i][j] = dp[i][j] || dp[i - 1][j - A[i - 1]];
                }
            }
        }
        return dp[A.length][target];
    }

    public static void main(String[] args) {
        Solution sol = new Solution();
        int[] A = new int[]{2,3,5,7};
        int target = 17;

        System.out.println(sol.backpack(A, target));
    
    }
}

Note that target could equal to 0.

2. Lintcode: 0/1 Back pack II

13%

Accepted

Given n items with size A[i], an integer m denotes the size of a backpack. How full you can fill this backpack?

Note

You can not divide any item into small pieces.

Example

If we have 4 items with size [2, 3, 5, 7], the backpack size is 11, we can select 2, 3 and 5, so that the max size we can fill this backpack is 10. If the backpack size is 12. we can select [2, 3, 7] so that we can fulfill the backpack.

You function should return the max size we can fill in the given backpack.

Understand the problem:
It is another 0/1 back pack problem. The crux of the problem is to pick up the item from the array of which the sum is cloest to the target value.

So it is another DP problem, and we could define the DP functions.

dp[A.length() + 1][m + 1], where dp[i][j] means the sum that pick some from first i items of which the sum are cloest to j.
Initial state: all 0s, so no need to initialize.
Transit function: dp[i][j] = Math.max(dp[i - 1][j], dp[i - 1][j - A[i]] + A[i]);
Final state: dp[A.length()][m].

Code (Java):

public class Solution {
    /**
     * @param m: An integer m denotes the size of a backpack
     * @param A: Given n items with size A[i]
     */
    public int backPack(int m, ArrayList<integer> A) {
        // write your code here
        if (A == null || A.size() == 0 || m < 0) {
            return 0;
        }
        
        int[][] dp = new int[A.size() + 1][m + 1];
        
        
        // initialization
        // no needed since all 0s
        
        for (int i = 1; i <= A.size(); i++) {
            for (int j = 0; j <= m; j++) {
                int noPick = dp[i - 1][j];
                int pick = 0;
                
                if (j - A.get(i - 1) >= 0) {
                    pick = dp[i - 1][j - A.get(i - 1)] + A.get(i - 1);
                }
                
                dp[i][j] = Math.max(pick, noPick);
            }
        }
        
        return dp[A.size()][m];
    }
}

Follow-up: How about we want to find a solution?
The idea is to use back track from the end, each time we choose the optimal solution until we reached the beginning.

Code (Java):

import java.util.*;
public class Solution {
    public List<Integer> backPack(int[] A, int target) {
        List<Integer> result = new ArrayList<Integer>();
        
        if (A == null || A.length == 0 || target <= 0) {
            return result;
        }
        
        int[][] dp = new int[A.length + 1][target + 1];
        
        for (int i = 1; i <= A.length; i++) {
            for (int j = 0; j <= target; j++) {
                int noChoose = dp[i - 1][j];
                int choose = 0;
                if (A[i - 1] <= j) {
                    choose = dp[i - 1][j - A[i - 1]] + A[i - 1];
                }
                
                dp[i][j] = Math.max(choose, noChoose);
            }
        }
        
        findSolution(dp, A, result, A.length, target);
        
        return result;
    }
    
    private void findSolution(int[][] dp, int[] A, List<Integer> 
        result, int i, int j) {
        if (i == 0 || j == 0) {
            return;
        }
        
        if (j - A[i - 1] >= 0 && (dp[i - 1][j - A[i - 1]] + A[i - 1] > 
          dp[i - 1][j])) {
            result.add(A[i - 1]);
            findSolution(dp, A, result, i - 1, j - A[i - 1]);
        } else {
            findSolution(dp, A, result, i - 1, j);
        }
    }

    public static void main(String[] args) {
        Solution sol = new Solution();

        int[] A = new int[]{2,3,5,7};
        int target = 11;

        List<Integer> result = sol.backPack(A, target);

        for (Integer elem : result) {
            System.out.print(elem + " ");
        }

        System.out.println();
    }
}

A 1D space DP solution:
Note that for back pack solution, the DP only relies on dp[i - 1], so we may reduce the space cost by using a 1D array.

Code (Java):

public class Solution {
    /**
     * @param m: An integer m denotes the size of a backpack
     * @param A: Given n items with size A[i]
     */
    public int backPack(int m, ArrayList&ltInteger&gt; A) {
        // write your code here
        if (A == null || A.size() == 0 || m < 0) {
            return 0;
        }
        
        //int[][] dp = new int[A.size() + 1][m + 1];
        int[] dp = new int[m + 1];
        
        // initialization
        // no needed since all 0s
        
        for (int i = 0; i < A.size(); i++) {
            //dp[0] = 0;
            for (int j = m; j >= 0; j--) {
                //int noPick = dp[i - 1][j];
                int noPick = dp[j];
                int pick = 0;
                
                if (j - A.get(i) >= 0) {
                    //pick = dp[i - 1][j - A.get(i - 1)] + A.get(i - 1);
                    pick = dp[j - A.get(i)] + A.get(i);
                }
                
                //dp[i][j] = Math.max(pick, noPick);
                dp[j] = Math.max(pick, noPick);
            }
        }
        
        return dp[m];
    }
}

Noted that the inner loop should start from the end in backward order. That is because we the result on dp[j] depends on dp[i - 1][j - A[i]]. So if we update the dp[j] from left to right, the previous results will be overwritten.

3. A general 0/1 back pack problem:
http://www.geeksforgeeks.org/dynamic-programming-set-10-0-1-knapsack-problem/

Given weights and values of n items, put these items in a knapsack of capacity W to get the maximum total value in the knapsack. In other words, given two integer arrays val[0..n-1] and wt[0..n-1] which represent values and weights associated with n items respectively. Also given an integer W which represents knapsack capacity, find out the maximum value subset of val[] such that sum of the weights of this subset is smaller than or equal to W.

Understand the problem:
The problem has two conditions to satisfy:
1. Pick up the some items from the array, such that 1. the sum of the value could reach the maximum, and 2. its weight should be less than the capacity W.

So we can solve the dp problem by defining the following:

dp[n + 1][W + 1], where dp[i][j] means from the first i items we choose some, and its weight sum is less than the j, as well as the value sum is maximal.
Initial state: 0.
Transit function: There should have two cases.

If wt[i] > j, we could not choose this item since it is over-weighted. In this case, dp[i][j] = dp[i - 1][j];
Else, dp[i][j] = Math.max(dp[i - 1][j] , dp[i - 1][j - wt[i]] + val[i]);

Final state: return dp[n][W]

Code (Java):

public class Solution {
    public int knapSack(int W, int[] val, int[] wt) {
        if (val == null || val.length == 0 || wt == null || wt.length == 0 || W <= 0) {
            return 0;
        }
        
        int[][] dp = new int[val.length + 1][W + 1];
        
        for (int i = 1; i <= val.length; i++) {
            for (int j = 0; j <= W; j++) {
                dp[i][j] = dp[i - 1][j]; // no pickup
                
                if (wt[i - 1] <= j) {
                    dp[i][j] = Math.max(dp[i][j], 
                      dp[i - 1][j - wt[i - 1]] + val[i - 1]);
                }
            }
        }
        
        return dp[val.length][W];
    }

    public static void main(String[] args) {
        Solution sol = new Solution();
        int[] val = new int[]{60, 100, 120};
        int[] wt = new int[]{10, 20, 30};
        int W = 50;

        System.out.println(sol.knapSack(W, val, wt));
    }
}

1D space Solution:

public class Solution {
    public int knapSack(int W, int[] val, int[] wt) {
        if (val == null || val.length == 0 || wt == null || wt.length == 0 || W <= 0) {
            return 0;
        }
        
        int[] dp = new int[W + 1];
        
        for (int i = 1; i <= val.length; i++) {
            for (int j = W; j >= 0; j--) {
                if (wt[i - 1] <= j) {
                    dp[j] = Math.max(dp[j], dp[j - wt[i - 1]] + val[i - 1]);
                }
            }
        }
        
        return dp[W];
    }

    public static void main(String[] args) {
        Solution sol = new Solution();
        int[] val = new int[]{60, 100, 120};
        int[] wt = new int[]{10, 20, 30};
        int W = 50;

        System.out.println(sol.knapSack(W, val, wt));
    }
}

4. A multiple DP problem -- Each item can be picked up by k times.
In this question, each element could be selected at most n[i] times. Find the maximum value with the sum of the weight less or equal to W.

Again, the definition of the DP matrix is the same as above, ie,
dp[n + 1][W + 1], whereas dp[i][j] means picking the first i items with the sum of the weight less or equal to j, dp[i][j] is the maximum sum of the value.

The transit function could be as follows:
dp[i][j] = Math.max(dp[i - 1][j - k * wt[i]] + k * val[i]), where as 0<= k <= n[i]

Code (Java):

public class Solution {
    public int knapSack(int W, int[] val, int[] wt, int[] n) {
        if (val == null || val.length == 0 || 
            wt == null || wt.length == 0 || 
            n == null || n.length == 0 || W <= 0) {
            
            return 0;
        }
        
        int[][] dp = new int[val.length + 1][W + 1];
        
        for (int i = 1; i <= val.length; i++) {
            for (int j = 0; j <= W; j++) {
                for (int k = 0; k <= n[i - 1]; k++) {
                    if (k * wt[i - 1] <= j) {
                        dp[i][j] = Math.max(dp[i][j], 
                          dp[i - 1][j - k * wt[i - 1]] + k * val[i - 1]);
                    }
                }
            }
        }
        
        return dp[val.length][W];       
    }

    public static void main(String[] args) {
        Solution sol = new Solution();
        int[] val = new int[]{60, 100, 120};
        int[] wt = new int[]{10, 20, 30};
        int[] n = new int[]{100,100,100,100};
        int W = 50;

        System.out.println(sol.knapSack(W, val, wt, n));
    }
}

5. A multiple back pack problem -- each item can be picked up by unlimited number of times
In this case, we only need to change little of the code above. We change the inner most loop from k <= n[i - 1] to k * wt[i - 1] <= j.

Code (Java):

public class Solution {
    public int knapSack(int W, int[] val, int[] wt) {
        if (val == null || val.length == 0 || 
            wt == null || wt.length == 0 || 
            W <= 0) {
            
            return 0;
        }
        
        int[][] dp = new int[val.length + 1][W + 1];
        
        for (int i = 1; i <= val.length; i++) {
            for (int j = 0; j <= W; j++) {
                for (int k = 0; k * wt[i - 1] <= j; k++) {
                    dp[i][j] = Math.max(dp[i][j], 
                      dp[i - 1][j - k * wt[i - 1]] + k * val[i - 1]);
                }
            }
        }
        
        return dp[val.length][W];       
    }

    public static void main(String[] args) {
        Solution sol = new Solution();
        int[] val = new int[]{60, 100, 120};
        int[] wt = new int[]{10, 20, 30};
        int[] n = new int[]{100,100,100,100};
        int W = 50;

        System.out.println(sol.knapSack(W, val, wt));
    }
}

Other back pack DP problems:
1. k sum
http://buttercola.blogspot.com/2014/11/lintcode-k-sum.html
http://www.lintcode.com/en/problem/k-sum/#

Given n distinct positive integers, integer k (k <= n) and a number target.

Find k numbers where sum is target. Calculate how many solutions there are?

Example

Given [1,2,3,4], k=2, target=5. There are 2 solutions:

[1,4] and [2,3], return 2.

Understand the problem:
It is another back pack problem, choosing k items out of n items, where its sum equals to k.
So we could still use back pack DP solution.

Solution:
Idea: Using back pack dp

Definition: dp[n + 1][k + 1][target + 1], where dp[i][j][m] means choosing j elements out of the first i elements, and its sum equals to m
Initial state: dp[0][0][0] = 1, dp[i][0][0] = 1, 1 <= i <= n
Transit function:

dp[i][j][m] = dp[i - 1][j][m] // no choose item i
if (A[i - 1] <= m) dp[i][j][m] += dp[i - 1][j - 1][m - A[i - 1]] // choose item i

Final state: dp[n][k][target];

Code (Java):

public class Solution {
    /**
     * @param A: an integer array.
     * @param k: a positive integer (k <= length(A))
     * @param target: a integer
     * @return an integer
     */
    public int solutionKSum(int A[],int k,int target) {
        // write your code here
        if ((A == null || A.length == 0) && k == 0 && target == 0) {
            return 1;
        }
        
        if (A == null || A.length == 0 || k <= 0 || target <= 0 || k > A.length) {
            return 0;
        }
        
        int[][][] dp = new int[A.length + 1][k + 1][target + 1];
        dp[0][0][0] = 1;
        for (int i = 1; i <= A.length; i++) {
            dp[i][0][0] = 1;
        }
        
        for (int i = 1; i <= A.length; i++) {
            for (int j = 1; j <= k; j++) {
                for (int m = 1; m <= target; m++) {
                    dp[i][j][m] = dp[i - 1][j][m]; // no choose item i
                    
                    if (A[i - 1] <= m) {
                        dp[i][j][m] += dp[i - 1][j - 1][m - A[i - 1]]; // chose item i
                    }
                }
            }
        }
        return dp[A.length][k][target];
    }
}