Monday, September 21, 2015

H-Index

Given an array of citations (each citation is a non-negative integer) of a researcher, write a function to compute the researcher's h-index.

According to the definition of h-index on Wikipedia: "A scientist has index h if h of his/her N papers have at least h citations each, and the other N − h papers have no more than h citations each."

For example, given citations = [3, 0, 6, 1, 5], which means the researcher has 5 papers in total and each of them had received 3, 0, 6, 1, 5 citations respectively. Since the researcher has 3 papers with at least 3 citations each and the remaining two with no more than 3 citations each, his h-index is 3.


Note: If there are several possible values for h, the maximum one is taken as the h-index.


This is a very interesting problem. 
1. Sort the array in descending order.
2. Scan the array from the biggest element. At each position, there are two values.
    n - Number of elements scanned so far
    k - The value of the last scanned element
n starts from 1 and k starts from the largest value. So in the beginning, k should always be greater than n. We can try to take the p value now, which equals Math.min(n, k). Since the p actually takes the value of n, which is smaller, we have to check the following element's value. If its value is greater than p, this p is invalid. 
3. While the scan keeps going, we might hit a point where n==k. That means we have n elements, there values are all >= k. For the remaining elements, their values are smaller than k. So p=n at this moment is the answer we are looking for.
4. Sometimes, if there is no point where n==k, we might be in this situation. At n', array value is k'. Math.min(n', k')=n' and the value of the next element k >= n'. So we need to keep going. For the next element, there are n elements in total, the value at n is k. Math.min(n, k)=k. Because we already know k >= n'. Here we know k <= n'+1. k can have only two possible values, n' or n1+1. So we can safely say there are k elements with values >= k. For the remaining element, their value <= k.

To summarize, starting from the biggest element, I'm looking for a point where n==k. If this position doesn't exist, then the transition point between k and k' is what I'm looking for.

 public class Solution {  
   public int hIndex(int[] citations) {  
     Arrays.sort(citations);  
     for(int i = citations.length-1; i >= 0; i--) {  
       int n = citations.length-i;  
       int k = citations[i];  
       int h = Math.min(n, k);  
       if(i==0 || citations[i-1]<h)  
         return h;  
     }  
     return 0;  
   }  
 }  

No comments:

Post a Comment