Monday, September 7, 2015

Leetcode: H-Index

Given an array of citations (each citation is a non-negative integer) of a researcher, write a function to compute the researcher's h-index.
According to the definition of h-index on Wikipedia: "A scientist has index h if h of his/her N papers have at least h citations each, and the other N − h papers have no more thanh citations each."
For example, given citations = [3, 0, 6, 1, 5], which means the researcher has 5 papers in total and each of them had received 3, 0, 6, 1, 5 citations respectively. Since the researcher has 3 papers with at least 3 citations each and the remaining two with no more than 3 citations each, his h-index is 3.
Note: If there are several possible values for h, the maximum one is taken as the h-index.
Hint:
  1. An easy approach is to sort the array first.
  2. What are the possible values of h-index?
  3. A faster approach is to use extra space.
Understand the problem:
The key of the problem is to understand the definition of the H-index.
H-index is the number of h papers, in which each paper has at least h citations. 
e.g. If there are 2 papers, and each of the 2 papers has at least 2 citations, the h-index is 2. 

So the naive solution is to sort the array in reverse order, i.e, citations from high to low. 
If citation[i] >= i + 1, continue to next. 
Else, return i. 

There is one corner case is if we cannot stop in the loop iteration, we return the number of papers. 

Code (Java):


public class Solution {
    public int hIndex(int[] citations) {
        if (citations == null || citations.length == 0) {
            return 0;
        }
        
        Arrays.sort(citations);
        // Reverse the array
        int m = 0;
        int n = citations.length - 1;
        while (m < n) {
            swap(citations, m, n);
            m++;
            n--;
        }
        
        int len = citations.length;
        
        int i = 0;
        for (i = 0; i < len; i++) {
            if (i + 1 > citations[i]) {
                return i;
            }
        }
        
        return i;
    }
    
    private void swap(int[] array, int i, int j) {
        int tmp = array[i];
        array[i] = array[j];
        array[j] = tmp;
    }
}

A O(n) time solution:
Since the problem requires for a O(n) time solution, we can consider using the bucket sort. 
The number of buckets is N + 1, in which each bucket holds the number of papers of which the citation is i. Note the last bucket will store the number of papers in which citations is greater or equal to N. 

Code (Java):
public class Solution {
    public int hIndex(int[] citations) {
        if (citations == null || citations.length == 0) {
            return 0;
        }
        
        int n = citations.length;
        int[] buckets = new int[n + 1];
        
        // Step 1: Accmulate the number of citations for each bucket
        for (int i = 0; i < n; i++) {
            if (citations[i] <= n) {
                buckets[citations[i]]++; 
            } else {
                buckets[n]++;
            }
        }
        
        // Step 2: iterate the citations from right to left.
        int numPapers = 0;
        for (int i = n; i >= 0; i--) {
            numPapers += buckets[i];
            if (numPapers >= i) {
                return i;
            }
        }
        
        return 0;
    }
}


2 comments:

  1. How come the h index is 5 for int[] citations = {40,30,10,20,50}; ?

    ReplyDelete
  2. This comment has been removed by the author.

    ReplyDelete