Sunday, November 30, 2014

Lintcode: Minimum adjustment cost

Given an integer array, adjust each integers so that the difference of every adjcent integers are not greater than a given number target.
If the array before adjustment is A, the array after adjustment is B, you should minimize the sum of |A[i]-B[i]| 
Note
You can assume each number in the array is a positive integer and not greater than 100
Example
Given [1,4,2,3] and target=1, one of the solutions is [2,3,2,3], the adjustment cost is 2 and it's minimal. Return 2.

Understand the problem:

Lintcode: K sum

Given n distinct positive integers, integer k (k <= n) and a number target.
Find k numbers where sum is target. Calculate how many solutions there are?
Example
Given [1,2,3,4], k=2, target=5. There are 2 solutions:
[1,4] and [2,3], return 2.

Understand the problem:
It is another back pack problem, choosing k items out of n items, where its sum equals to k. 
So we could still use back pack DP solution.

Solution:
Idea: Using back pack dp

  • Definition:    dp[n + 1][k + 1][target + 1], where dp[i][j][m] means choosing j elements out of the first i elements, and its sum equals to m
  • Initial state:  dp[0][0][0] = 1, dp[i][0][0] = 1, 1 <= i <= n
  • Transit function:  
    • dp[i][j][m] = dp[i - 1][j][m]   // no choose item i
    • if (A[i - 1] <= m) dp[i][j][m] += dp[i - 1][j - 1][m - A[i - 1]]  // choose item i
  • Final state:  dp[n][k][target];
Code (Java):
public class Solution {
    /**
     * @param A: an integer array.
     * @param k: a positive integer (k <= length(A))
     * @param target: a integer
     * @return an integer
     */
    public int solutionKSum(int A[],int k,int target) {
        // write your code here
        if ((A == null || A.length == 0) && k == 0 && target == 0) {
            return 1;
        }
        
        if (A == null || A.length == 0 || k <= 0 || target <= 0 || k > A.length) {
            return 0;
        }
        
        int[][][] dp = new int[A.length + 1][k + 1][target + 1];
        dp[0][0][0] = 1;
        for (int i = 1; i <= A.length; i++) {
            dp[i][0][0] = 1;
        }
        
        for (int i = 1; i <= A.length; i++) {
            for (int j = 1; j <= k; j++) {
                for (int m = 1; m <= target; m++) {
                    dp[i][j][m] = dp[i - 1][j][m]; // no choose item i
                    
                    if (A[i - 1] <= m) {
                        dp[i][j][m] += dp[i - 1][j - 1][m - A[i - 1]]; // chose item i
                    }
                }
            }
        }
        return dp[A.length][k][target];
    }
}

Thursday, November 27, 2014

Lintcode: Longest Common Substring (LCS)

Given two strings, find the longest common substring.
Return the length of it.
Note
The characters in substring should occur continiously in original string. This is different with subsequnce.
Understand the problem:
In computer science, the longest common substring problem is to find the longest string (or strings) that is a substring (or are substrings) of two or more strings.

For example, ABCD, and EBCA, the LCS is BC, which has the length of 2. 

A DP Solution:

  • Definition: dp[A.length() + 1][B.length() + 1] , where as dp[i][j] means the LCS ended with i and j
  • Initial state: all 0s
  • Transit function: if (A.charAt(i) == B.charAt(j)), dp[i][j] = dp[i - 1][j - 1] + 1. Else, dp[i][j] = 0
  • Final state: Math.max(dp[0 ... A.length()][0 ... B.length()]);

            j
       #  E  B  C  A
  #   0  0   0  0  0
  A  0   0  0  0  1
i B  0   0  1  0  0
  C  0   0  0  2  0
  D  0   0  0  0  0

Code (Java):
public class Solution {
    /**
     * @param A, B: Two string.
     * @return: the length of the longest common substring.
     */
    public int longestCommonSubstring(String A, String B) {
        if (A == null || A.length() == 0 || B == null || B.length() == 0) {
            return 0;
        }
        
        int[][] dp = new int[A.length() + 1][B.length() + 1];
        int lcs = 0;
        
        for (int i = 1; i <= A.length(); i++) {
            for (int j = 1; j <= B.length(); j++) {
                if (A.charAt(i - 1) == B.charAt(j - 1)) {
                    dp[i][j] = dp[i - 1][j - 1] + 1;
                } else {
                    dp[i][j] = 0;
                }
                lcs = Math.max(lcs, dp[i][j]);
            }
        }
        return lcs;
    }
}

Lintcode: Longest Common Subsequence (LCS)

Given two strings, find the longest comment subsequence (LCS).
Your code should return the length of LCS.
Example
For "ABCD" and "EDCA", the LCS is "A" (or D or C), return 1
For "ABCD" and "EACB", the LCS is "AC", return 2

Understand the problem:
The longest common subsequence (LCS) problem is the problem of finding the longest subsequence common to all sequences in a set of sequences (often just two sequences). It differs from problems of finding common substrings: unlike substrings, subsequences are not required to occupy consecutive positions within the original sequences. The longest common subsequence problem is a classic computer science problem, the basis of data comparison programs such as the diff utility, and has applications in bioinformatics. It is also widely used by revision control systems such as Git for reconciling multiple changes made to a revision-controlled collection of files.

A recursive Solution:
http://www.geeksforgeeks.org/dynamic-programming-set-4-longest-common-subsequence/

A DP Solution:
  • Definition: dp[A.length() +  1][B.length() + 1], where dp[i][j] means the LCS between string A[0, ... i] to B[0, ..., j]. 
  • Initialization: dp[0][0] = 0; dp[0][j] = 0; dp[i][0] = 0;
  • Transit function: 
if (A.charAt(i) == B.charAt(j)) {
    dp[i][j] = dp[i - 1][j - 1] + 1;
} else {
    dp[i][j] = Math.max(dp[i - 1][j], dp[i][j - 1]);
}
  • Final state: dp[A.length()][B.length()]

Code (Java):
public class Solution {
    /**
     * @param A, B: Two strings.
     * @return: The length of longest common subsequence of A and B.
     */
    public int longestCommonSubsequence(String A, String B) {
        if (A == null || A.length() == 0 || B == null || B.length() == 0) {
            return 0;
        }
        
        int[][] dp = new int[A.length() + 1][B.length() + 1];
        
        for (int i = 1; i <= A.length(); i++) {
            for (int j = 1; j <= B.length(); j++) {
                if (A.charAt(i - 1) == B.charAt(j - 1)) {
                    dp[i][j] = dp[i - 1][j - 1] + 1;
                } else {
                    dp[i][j] = Math.max(dp[i - 1][j], dp[i][j - 1]);
                }
            }
        }
        
        return dp[A.length()][B.length()];
    }
}



Lintcode: Longest increasing subsequence (LIS)


Given a sequence of integers, find the longest increasing subsequence (LIS).You code should return the length of the LIS.Example
For [5, 4, 1, 2, 3], the LIS is [1, 2, 3], return 3For [4, 2, 4, 5, 3, 7], the LIS is [4, 4, 5, 7], return 4

Understand the problem:
The longest Increasing Subsequence (LIS) problem is to find the length of the longest subsequence of a given sequence such that all elements of the subsequence are sorted in increasing order. For example, length of LIS for { 10, 22, 9, 33, 21, 50, 41, 60, 80 } is 6 and LIS is {10, 22, 33, 50, 60, 80}.

The problem is a classical computer science problem, LIS. Note that the problem asks to find out the subsequence, not substring. 

A DP Solution:
This is a sequence DP problem, so we define 
  • dp[N], whereas dp[i] denotes the length of the LIS including the array element arr[i]. 
  • Initial state: dp[i] = 1
  • Transit function: for each j, where 0 <= j < i, if (A[i] >= A[j]), dp[i] = Math.max(dp[i], dp[j] + 1);
  • Final state: result = 1, Math.max(result, dp[i]);

Code (Java):
public class Solution {
    /**
     * @param nums: The integer array
     * @return: The length of LIS (longest increasing subsequence)
     */
    public int longestIncreasingSubsequence(int[] nums) {
        // write your code here
        if (nums == null || nums.length == 0) {
            return 0;
        }
        
        if (nums.length == 1) {
            return 1;
        }
        
        int[] dp = new int[nums.length];
        for (int i = 0; i < nums.length; i++) {
            dp[i] = 1;
        }
        
        for (int i = 0; i < nums.length; i++) {
            for (int j = 0; j < i; j++) {
                if (nums[i] >= nums[j]) {
                    dp[i] = Math.max(dp[i], dp[j] + 1);
                }
            }
        }
        
        int result = 1;
        for (int i = 0; i < nums.length; i++) {
            result = Math.max(dp[i], result);
        }
        
        return result;
    }
}

A wrong Solution:
Here is a wrong solution using the greedy idea. The basic idea is to use two loops, the outer loop i, starts from 0, and the inner loop starts from i + 1. For each inner loop, we scan through the array and compare the current value with the last one. The code is as follow:

Code (Java):
public class Solution {
    /**
     * @param nums: The integer array
     * @return: The length of LIS (longest increasing subsequence)
     */
    public int longestIncreasingSubsequence(int[] nums) {
        // write your code here
        if (nums == null || nums.length == 0) {
            return 0;
        }
        
        if (nums.length == 1) {
            return 1;
        }
        
        int maxLen = 1;
        for (int i = 0; i < nums.length; i++) {
            int prev = nums[i];
            int len = 1;
            for (int j = i + 1; j < nums.length; j++) {
                if (nums[j] >= prev) {
                    len++;
                    prev = nums[j];
                }
            }
            maxLen = Math.max(maxLen, len);
        }
        
        return maxLen;
    }
}

The code cannot pass the test case :
[10,1,11,2,12,3,11]
where the correct result is 4 (1, 2, 3, 11), but the solution above will generate 3, (1, 11, 12). That is because when j is at 11, it will ignore 2 and generate the incorrect results.

Thursday, November 20, 2014

Knowledge: Object Oriented (OO)

1. What is polymorphism? difference between inheritance and polymorphism
Inheritance refers to using the structure and behavior of a superclass in a subclass. Polymorphism refers to changing the behavior of a superclass in the subclass.

Polymorphism is the ability of one method to have different behaviors depending on the type of the object it is being called on or the type of object passed as a parameter.
Inheritance is when a 'class' derives from an existing 'class'. So if you have a Person class, then you have a Student class that extends Person, Student inherits all the things that Person has. There are some details around the access modifiers you put on the fields/methods in Person, but that's the basic idea. For example, if you have a private field on Person, Student won't see it because its private, and private fields are not visible to subclasses.
Polymorphism deals with how the program decides which methods it should use, depending on what type of thing it has. If you have a Person, which has a read method, and you have a Student which extends Person, which has its own implementation of read, which method gets called is determined for you by the runtime, depending if you have a Person or a Student. It gets a bit tricky, but if you do something like
Person p = new Student();
p.read();
the read method on Student gets called. Thats the polymorphism in action. You can do that assignment because a Student is a Person, but the runtime is smart enough to know that the actual type of p isStudent.
2. Java Overloading vs Overriding
Overriding vs. Overloading are two confusing concepts for Java novice programmers.
Overloading is the situation that two or more methods in the same class have the same name but different arguments.
Overriding means having two methods with the same method name and arguments (i.e., method signature). One of them is in the Parent class and the other is in the Child class.

Override example:
class Dog{
    public void bark(){
        System.out.println("woof ");
    }
}
class Hound extends Dog{
    public void sniff(){
        System.out.println("sniff ");
    }
 
    public void bark(){
        System.out.println("bowl");
    }
}
 
public class OverridingTest{
    public static void main(String [] args){
        Dog dog = new Hound();
        dog.bark();
    }
}

Knowledge: Parallelism

In this post, we will cover some simple concepts about parallelism.

1. Process v.s Thread
1. Threads are easier to create than processes since they
don't require a separate address space.

2. Multithreading requires careful programming since threads
share data strucures that should only be modified by one thread
at a time.  Unlike threads, processes don't share the same
address space.

3.  Threads are considered lightweight because they use far
less resources than processes.

4.  Processes are independent of each other.  Threads, since they
share the same address space are interdependent, so caution
must be taken so that different threads don't step on each other.  
This is really another way of stating #2 above.

5.  A process can consist of multiple threads.




2. mutex vs. semaphores?
A mutex is like a lock. Mutexes are used in parallel programming to ensure that only one thread can access a shared resource at a time.


Semaphores is in computer science, particularly in operating systems, a semaphore is a variable or abstract data type that is used for controlling access, by multiple processes, to a common resource in a parallel programming or a multi user environment.

A useful way to think of a semaphore is as a record of how many units of a particular resource are available, coupled with operations to safely (i.e., without race conditions) adjust that record as units are required or become free, and, if necessary, wait until a unit of the resource becomes available. Semaphores are a useful tool in the prevention of race conditions; however, their use is by no means a guarantee that a program is free from these problems. Semaphores which allow an arbitrary resource count are called counting semaphores, while semaphores which are restricted to the values 0 and 1 (or locked/unlocked, unavailable/available) are called binary semaphores.

3. How to implement a mutex?

void acquire_lock( Lock lock1) {
    while (test_and_set(lock1));
}

int test_and_set(int x) {
     if (x) {
         return 1;
     } else {
         x = 1;
         return 0;
     }
}

void acquire_unlokc(Lock lock1) {
    lock1 = 0;   // we need it to be atomic instruction
}


4. What is synchronized  method in Java?
To make a method synchronized, simply add the synchronized keyword to its declaration:
public class SynchronizedCounter {
    private int c = 0;

    public synchronized void increment() {
        c++;
    }

    public synchronized void decrement() {
        c--;
    }

    public synchronized int value() {
        return c;
    }
}

If count is an instance of SynchronizedCounter, then making these methods synchronized has two effects:
  • First, it is not possible for two invocations of synchronized methods on the same object to interleave. When one thread is executing a synchronized method for an object, all other threads that invoke synchronized methods for the same object block (suspend execution) until the first thread is done with the object.
  • Second, when a synchronized method exits, it automatically establishes a happens-before relationship with any subsequent invocation of a synchronized method for the same object. This guarantees that changes to the state of the object are visible to all threads.
Note that constructors cannot be synchronized — using the synchronized keyword with a constructor is a syntax error. Synchronizing constructors doesn't make sense, because only the thread that creates an object should have access to it while it is being constructed.


Whenever a synchronized method is called, the mutex is locked. When the method is finished, the mutex is unlocked. This ensures that only one synchronized method is called on a given object.


5. How to prevent deadlocks?
Lock Ordering
Deadlock occurs when multiple threads need the same locks but obtain them in different order.

If you make sure that all locks are always taken in the same order by any thread, deadlocks cannot occur. Look at this example:

Thread 1:
lock A
lock B

Thread 2:
wait for A lock C (when A locked)

Thread 3:
wait for A wait for B wait for C


If a thread, like Thread 3, needs several locks, it must take them in the decided order. It cannot take a lock later in the sequence until it has obtained the earlier locks.

For instance, neither Thread 2 or Thread 3 can lock C until they have locked A first. Since Thread 1 holds lock A, Thread 2 and 3 must first wait until lock A is unlocked. Then they must succeed in locking A, before they can attempt to lock B or C.

Lock ordering is a simple yet effective deadlock prevention mechanism. However, it can only be used if you know about all locks needed ahead of taking any of the locks. This is not always the case.


Lock Timeout
Another deadlock prevention mechanism is to put a timeout on lock attempts meaning a thread trying to obtain a lock will only try for so long before giving up. If a thread does not succeed in taking all necessary locks within the given timeout, it will backup, free all locks taken, wait for a random amount of time and then retry. The random amount of time waited serves to give other threads trying to take the same locks a chance to take all locks, and thus let the application continue running without locking.

Here is an example of two threads trying to take the same two locks in different order, where the threads back up and retry:

Thread 1 locks AThread 2 locks B
Thread 1 attempts to lock B but is blockedThread 2 attempts to lock A but is blocked
Thread 1's lock attempt on B times outThread 1 backs up and releases A as wellThread 1 waits randomly (e.g. 257 millis) before retrying.
Thread 2's lock attempt on A times outThread 2 backs up and releases B as wellThread 2 waits randomly (e.g. 43 millis) before retrying.


In the above example Thread 2 will retry taking the locks about 200 millis before Thread 1 and will therefore likely succeed at taking both locks. Thread 1 will then wait already trying to take lock A. When Thread 2 finishes, Thread 1 will be able to take both locks too (unless Thread 2 or another thread takes the locks in between).

An issue to keep in mind is, that just because a lock times out it does not necessarily mean that the threads had deadlocked. It could also just mean that the thread holding the lock (causing the other thread to time out) takes a long time to complete its task.

Additionally, if enough threads compete for the same resources they still risk trying to take the threads at the same time again and again, even if timing out and backing up. This may not occur with 2 threads each waiting between 0 and 500 millis before retrying, but with 10 or 20 threads the situation is different. Then the likeliness of two threads waiting the same time before retrying (or close enough to cause problems) is a lot higher.

A problem with the lock timeout mechanism is that it is not possible to set a timeout for entering a synchronized block in Java. You will have to create a custom lock class or use one of the Java 5 concurrency constructs in the java.util.concurrency package. Writing custom locks isn't difficult but it is outside the scope of this text. Later texts in the Java concurrency trails will cover custom locks.


Deadlock Detection
Deadlock detection is a heavier deadlock prevention mechanism aimed at cases in which lock ordering isn't possible, and lock timeout isn't feasible.

Every time a thread takes a lock it is noted in a data structure (map, graph etc.) of threads and locks. Additionally, whenever a thread requests a lock this is also noted in this data structure.

When a thread requests a lock but the request is denied, the thread can traverse the lock graph to check for deadlocks. For instance, if a Thread A requests lock 7, but lock 7 is held by Thread B, then Thread A can check if Thread B has requested any of the locks Thread A holds (if any). If Thread B has requested so, a deadlock has occurred (Thread A having taken lock 1, requesting lock 7, Thread B having taken lock 7, requesting lock 1).

Of course a deadlock scenario may be a lot more complicated than two threads holding each others locks. Thread A may wait for Thread B, Thread B waits for Thread C, Thread C waits for Thread D, and Thread D waits for Thread A. In order for Thread A to detect a deadlock it must transitively examine all requested locks by Thread B. From Thread B's requested locks Thread A will get to Thread C, and then to Thread D, from which it finds one of the locks Thread A itself is holding. Then it knows a deadlock has occurred.

Below is a graph of locks taken and requested by 4 threads (A, B, C and D). A data structure like this that can be used to detect deadlocks.




So what do the threads do if a deadlock is detected?

One possible action is to release all locks, backup, wait a random amount of time and then retry. This is similar to the simpler lock timeout mechanism except threads only backup when a deadlock has actually occurred. Not just because their lock requests timed out. However, if a lot of threads are competing for the same locks they may repeatedly end up in a deadlock even if they back up and wait.

A better option is to determine or assign a priority of the threads so that only one (or a few) thread backs up. The rest of the threads continue taking the locks they need as if no deadlock had occurred. If the priority assigned to the threads is fixed, the same threads will always be given higher priority. To avoid this you may assign the priority randomly whenever a deadlock is detected.

Saturday, November 1, 2014

System Design: Design a Tiny URL

1. What is a Tiny URL?
http://n00tc0d3r.blogspot.com/2013/09/big-data-tinyurl.html

tinyurl is a URL service that users enter a long URL and then the service return a shorter and unique url such as "http://tiny.me/5ie0V2". The highlight part can be any string with 6 letters containing [0-9, a-z, A-Z]. That is, 62^6 ~= 56.8 billions unique strings.

Step 1: Constraints and use cases
Use cases:
  1. shortening: take a URL => return a much shorter URL
  2. Redirection: take a short URL => redirect to the original URL
  3. Custom URL
  4. High availability of the system
Out of scope:
  1. Analytics
  2. Automatic link expiration
  3. Manual link removal
  4. UI v.s. API

System contraints
1. New URLs per month: 100 MLN
2. 1BN requests per month
3. 10% from shortening and 90% from redirection
4. Request per second: 400 + (40 shortening, 360 from redirection)
5. 6BN URLs in 5 years
6. 500 bytes per URL
7. 5 bytes per hash
8. 3 TB for all URLs, 36 GB for hashes
9. New data written per second (40 * (500 + 5)) = 20 K
10. New data read per second 360 * (500 + 5) = 180 K

Step 2: Abstract Design

  1. Application service layer
    1. Shortening service
    2. Redirection service
  2. Data storage layer (keep track of hash <=> url mapping)
    1. Acts like a big hash table


hashed_url = convert_to_base64(md5(original_url + random_salt))[:6]


Step 3: Understand the bottlenecks
Traffic:
Lot of data:

Bottleneck:
Traffic is not probably very hard
Data is interesting

Step 4: Scaling your abstract design




Useful links:
http://en.wikipedia.org/wiki/URL_shortening
http://en.wikipedia.org/wiki/TinyURL
http://en.wikipedia.org/wiki/Bitly

Useful concepts:
1. MD5 http://en.wikipedia.org/wiki/MD5
2. Base64 http://en.wikipedia.org/wiki/Base64
3. mysql, nosql http://www.thegeekstuff.com/2014/01/sql-vs-nosql-db/ 
5. consistent hashing http://en.wikipedia.org/wiki/Consistent_hashing
6. MongoDB http://en.wikipedia.org/wiki/MongoDB

System Design: Introduction to System Design

http://www.hiredintech.com/app#system-design

I. System Design process:
Step 1: Constraints and use cases
The very first thing you should do with any system design question is to clarify the system's constraints and to identify what use cases the system needs to satisfy. Spend a few minutes questioning your interviewer and agreeing on the scope of the system. Many of the same rules we discussed while talking about algorithm design apply here as well.

Step 2: Abstract Design

Step 3: Understand the bottleneck
Step 4: Scale your abstract design

2. Scalability fundamental:

  • Vertical scaling

  • Horizontal scaling
  • Caching
  • Load balancing
  • Database replication
  • Database partitioning