My collections.

Sunday, September 20, 2015

Implement strStr() using KMP

I'm reading "Introduction to Algorithms" to review the KMP algorithm. I feel the most important concept in the algorithm is k=pi[k]. It's very tricky to understand. The key to understand this is to remember for the matched part between target string and pattern string, checking target string is the same as checking pattern string. When constructing pi[], pattern string is both target string and pattern string. So apply the above rule again. Some notes per my understanding.

Code for strstr, which is also a leetcode question.

1:    public int strStr(String haystack, String needle) {  
2:      int l = needle.length();  
3:      if(l == 0)  
4:        return 0;  
5:      int[] pie = new int[l];  
6:      pie[0] = 0;  
7:      int k = 0;  
8:      for(int q = 1; q < l; q++) {  
9:        while(k>0 && needle.charAt(k)!=needle.charAt(q))  
10:          k = pie[k-1];  
11:        if(needle.charAt(k) == needle.charAt(q))  
12:          k++;  
13:        pie[q] = k;  
14:      }  
15:      int i = 0;  
16:      int j = 0;  
17:      while(i < haystack.length()) {  
18:        while(j>0 && needle.charAt(j)!=haystack.charAt(i))  
19:          j = pie[j-1];  
20:        if(needle.charAt(j) == haystack.charAt(i))  
21:          j++;  
22:        if(j == needle.length())  
23:          return i-j+1;  
24:        i++;  
25:      }  
26:      return -1;  
27:    }

The Skyline Problem

When I first started working on the problem, my idea is to insert and merge new intervals into the result list while loop through buildings. For the result list, I only need to grab the last two elements and compare with the new interval and do all the merging and insertion stuff. The solution can only pass part of the test cases. The reason is the new segment might have a start time which is earlier than the last two points. It's not enough to only compare it against the last two elements of the result list.

My solution fails in the above cases. When calculating the skyline of 1, 2 and 3, 3's starting point is before p3 and p4. The correct result after inserting 3 is create a new p3 and remove the original p3 and p4. It tells me that for each new interval, I need to go through all the existing points in result list, find the correct position to insert the new interval and than possibly modify many subsequent points in the result list.

I started my search of correct answer and this one looks very interesting. https://leetcode.com/discuss/37630/my-c-code-using-one-priority-queue-812-ms

Copy the explanation from the link above here,

"The idea is to do line sweep and just process the buildings only at the start and end points. The key is to use a priority queue to save all the buildings that are still "alive". The queue is sorted by its height and end time (the larger height first and if equal height, the one with a bigger end time first). For each iteration, we first find the current process time, which is either the next new building start time or the end time of the top entry of the live queue. If the new building start time is larger than the top one end time, then process the one in the queue first (pop them until it is empty or find the first one that ends after the new building); otherwise, if the new building starts before the top one ends, then process the new building (just put them in the queue). After processing, output it to the resulting vector if the height changes."

The explanation is very useful, but you may need to read it a couple of times before it makes sense to you. The fundamental principle is that at any time, the last element in result list is the starting point of the last segment. This point extends until the top point of the priority queue.

Try the code with the example below is the best way to understand the algorithm.

1:  public class Solution {  
2:    public class Pos {  
3:      int h;  
4:      int x;  
5:      public Pos(int hh, int xx) {  
6:        h = hh;  
7:        x = xx;  
8:      }  
9:    }  
10:    public List<int[]> getSkyline(int[][] buildings) {  
11:      LinkedList<int[]> res = new LinkedList<int[]> ();  
12:      if(buildings.length == 0)  
13:        return res;  
14:      PriorityQueue<Pos> queue = new PriorityQueue<Pos> (1, new Comparator<Pos>() {  
15:        @Override  
16:        public int compare(Pos p1, Pos p2) {  
17:          if(p1.h != p2.h)  
18:            return p2.h-p1.h;  
19:          else  
20:            return p1.x-p2.x;  
21:        }  
22:      });  
23:      int curr = 0;  
24:      int currX = 0;  
25:      int currH = 0;  
26:      while(curr<buildings.length || !queue.isEmpty()) {  
27:        //when buildings[curr] overlaps with segments already inserted  
28:        if(queue.isEmpty() || (curr<buildings.length && buildings[curr][0] <= queue.peek().x)) {  
29:          currX = buildings[curr][0];  
30:          //Insert the end point of this segment and the following segments' end point having the same start point into the queue  
31:          while(curr<buildings.length && buildings[curr][0]==currX) {  
32:            queue.add(new Pos(buildings[curr][2], buildings[curr][1]));  
33:            curr++;  
34:          }  
35:        }  
36:        //buildings[curr] has no overlap with already inserted segments  
37:        //Finish all the inserted segments before dealing with buildings[curr]  
38:        else {  
39:          currX = queue.peek().x;  
40:          //For end points which terminate earlier than currX and not has high as currX, they will be invisible in the skyline  
41:          //So we should directly pop them away  
42:          while(!queue.isEmpty() && queue.peek().x <= currX)  
43:            queue.poll();  
44:        }  
45:        //If queue is empty, it's the last point. The height should be zero  
46:        currH = queue.isEmpty() ? 0 : queue.peek().h;  
47:        //If the newly inserted point has the same height as the last element of result list, there is no need to insert it. Because the last element of result list will extend until the top of priority list, which must have a larger end point  
48:        if(res.isEmpty() || res.getLast()[1]!=currH) {  
49:          res.add(new int[]{currX, currH});  
50:        }  
51:      }  
52:      return res;  
53:    }  
54:  }

Sunday, September 13, 2015

Best Time to Buy and Sell Stock IV

Best Time to Buy and Sell Stock IV

Say you have an array for which the ith element is the price of a given stock on day i.

Design an algorithm to find the maximum profit. You may complete at most k transactions.

Note:

You may not engage in multiple transactions at the same time (ie, you must sell the stock before you buy again).

刚拿到这道题目的时候完全没有思路，想借鉴 Best Time to Buy and Sell Stock III 的思路，但是复杂度太高，过不了大的test。看了许多大牛的文章，大家的思路基本上都来源于这篇博客，http://blog.csdn.net/linhuanmars/article/details/23236995。文章的思路很好，用local和global两个数组，基于DP来解决这个问题，但是我一直看不懂关于local的递推关系式，

local[i][j]=max(global[i-1][j-1]+max(diff,0), local[i-1][j]+diff) where diff=prices[i]-prices[i-1]

我觉得这个关系大致正确，但仔细推敲却不make sense。

local[i-1][j]+diff这一部分可以理解，local[i-1][j]就是进行了j次transaction，其中最后一次是在(i-1)的位置卖出。local[i-1][j]+diff就是把最后一次卖出的位置从i-1移到了i，因为相当于合并了两次交易，所以总共的交易次数还是j。

global[i-1][j-1]+max(diff,0)的含义就很confusing了，global[i-1][j-1]意思是到(i-1)为止，在交易了最多(j-1)次的情况下的最大收益，要从这个得到local[i][j]，就要保证在i的位置卖一次，但是global[i-1][j-1]有可能会包含在(i-1)的位置卖股票的情况，这种情况下，在i的位置再卖一次显然没有任何意义。

基于这些大牛的思路，我自己总结的更加容易理解的关系是这样的。

在计算local[i][j]的时候，这里的关键因素是多了一个元素，prices[i]。local[i][j]的意思是交易了j次，并且最后一次的卖发生在i。这个元素究竟会怎样影响最大收益呢？它有两种方式影响最大收益。

1. 把(i-1)处的卖移到i处，由于这相当于合并了两次交易，所以并不会影响总的交易次数。这种情况下的收益是local[i-1][j]+diff，diff=prices[i]-prices[i-1]。

2. 在(i-1)处买，在i处卖，得到的收益加上global[i-2][j-1]，就是local[i][j]可能的最大值。至于为什么不用再考虑其他的情况，比如说在(i-2)处买，在i处卖，这种情况相当于在(i-2)处买，在(i-1)处卖，在i处再卖。这种情况已经被case 1 cover了，所以无需再考虑。总之，在case 2，收益等于global[i-2][j-1]+diff。

所以，综合两种情况，递推关系应该如下，

local[i][j] = local[i-1][j] + diff

if (i-2 >= 0) local[i]][j] = max(local[i][j], global[i-2][j-1]+diff)

global[i][j] = max(global[i-1][j], local[i][j])

这道题的思路其实还是主要源于各位大牛的博客，只是我实在理解不了其中的一个递推关系，所以在这里写下了自己的思路。

1:  public class Solution {  
2:    public int maxProfit(int k, int[] prices) {  
3:      if (k < 1 || prices.length <= 1)  
4:        return 0;  
5:      if (k >= prices.length/2)  
6:        return helper(prices);  
7:      int[][] local = new int[prices.length][k+1];  
8:      int[][] global = new int[prices.length][k+1];  
9:      for(int i = 1; i < prices.length; i++) {  
10:        for(int j = 1; j <= k; j++) {  
11:          int diff = prices[i] - prices[i-1];  
12:          local[i][j] = local[i-1][j]+diff;  
13:          if(i-2 >= 0)  
14:            local[i][j] = Math.max(local[i][j], global[i-2][j-1]+diff);  
15:          global[i][j] = Math.max(global[i-1][j], local[i][j]);  
16:        }  
17:      }  
18:      return global[prices.length-1][k];  
19:    }  
20:    public int helper(int[] prices) {  
21:      int profit = 0;  
22:      for(int i = 1; i < prices.length; i++) {  
23:        profit += Math.max(prices[i]-prices[i-1], 0);  
24:      }  
25:      return profit;  
26:    }  
27:  }

Sunday, September 14, 2014

Inheritance, polymorphism, abstract class and interface (Notes of reading Head First Java)

Inheritance

subclass inherits from superclass by using keyword extends

Ex: public class subclass extends superclass {}

A subclass inherits all public instance variables and methods of superclass, but does not inherit the private instance variables and methods
A nonpublic class can only be inherited by classes from the same package
Inherited methods can be overridden. The method cannot be overridden if it’s marked with final modifier
To invoke the superclass version of a method from a subclass that’s overridden the method, use the super keyword

Ex: super.method()

Polymorphism

superclass obj = new subclass() //The left is called reference type. The right is called object type
With polymorphism, the reference type can be a superclass of the actual object type.
obj.method() // If method is overridden in subclass, then method of subclass is called though the reference type is superclass
Besides assignment, we can also have polymorphic arguments and return types.
Rules for overriding

Arguments must be the same types and return types must be compatible.
The method can’t be less accessible.

Overloading vs overriding

Overloading is having two methods with the same name but different argument lists
The return types can be different
You can’t change only the return type
You can vary the access levels in any direction

Abstract class

The main reason for abstract class is that some classes just should not be instantiated

Ex: abstract public class superClass{}

The abstract method means the method must be overridden.

Ex: public abstract void method(); // There is no body for abstract method

Abstract class can contain both abstract and non-abstract methods
The first concrete class in the inheritance tree must implement all abstract methods
You cannot make a new instance of an abstract type, but you can make an array object declared to hold that type
Ex:

superClass obj = new superClass() // illegal
superClass[] objs = new superClass[5] // legal, each element inside objs can be either superClass or subClass

You can call a method on an object reference only if the class of the reference type actually has the method
Ex:

public class subClass extends superClass{}
There is a method METHOD which is only defined in subClass
superClass obj = new subClass()
obj.METHOD() // illegal, cannot pass compiler
The way to workaround is:

obj_cast = (subClass) obj
obj_cast.METHOD() //legal

The compiler decides whether you can call a method based on the reference type, not the actual object type. The method you are calling on a reference must be in the class of that reference type. Doesn’t matter what the actual object is.

If a class contains abstract methods, the class must be declared as abstract
Every class in Java extends class Object

Interface

A Java interface is like a 100% pure abstract class
To define an interface

public interface intClass{}

To implement an interface

public class subClass extends superClass implements intClass{}

Interface methods are implicitly public and abstract, so typing ‘public’ and ‘abstract’ is optional
Benefits of interface

If you use interfaces instead of concrete subclasses as arguments and return types, you can pass anything that implements that interface
A class can implement multiple interfaces

A class that implements an interface must implement all the methods of the interface, since all interface methods are implicitly public and abstract

Saturday, September 13, 2014

Organize your code with package and jar (Notes of reading Head First Java)

I'm reading 'Head First Java' recently and here are some notes about package and jar.

Jar usage

Use jar file to compact your class files

/projects/class contains all classes files
/projects/source contains all java files
create manifest.txt under class
content of manifest.txt:
Main-Class: <main_class> // don’t put .class on the end
Press the return key after typing the Main-Class line

Create jar file

cd /projects/class
jar -cvmf manifest.txt <file_name>.jar *.class

Execute jar file

java -jar <file_name>.jar

Organize your code in packages

Organize source code

/projects/source/com/shunrang/ contains all java files
Add this as the first line of all java files: package com.shunrang

Compile source code

mkdir /projects/class
cd /projects/souce
javac -d ../class com/shunrang/*.java

Run your code

cd /projects/class
java com.shunrang.<main_class>
The -d flag in javac will create the same structure under class folder as what is inside source folder

Combine Jar and package

Create manifest.txt file

Put manifest.txt under /projects/class
Content of manifest.txt: Main-Class: com.shunrang.<main_class>

Create Jar file

cd /projects/class
jar -cvmf manifest.txt <file_name>.jar com //All you specify is the com directory

Run Jar file

java -jar <file_name>.jar

Useful jar command

jar -tf <file_name>.jar //List the contents of a JAR, -tf stands for ‘Table File’
jar -xf <file_name>.jar //Extract the contents of a JAR, -xf stands for ‘Extract File'

Tuesday, August 26, 2014

Run wordcount on Hadoop 2.5

Wordcount is the hello_world program for mapreduce. When actually running it, I met with some problems. I'm using hadoop 2.5. Most tutorials are designed for Hadoop older than 2.0 and the code is slightly different for 2.5.

package org.myorg;
import java.io.IOException;
import java.util.*;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.conf.*;
import org.apache.hadoop.io.*;
import org.apache.hadoop.util.*;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapred.JobConf;
public class myWordCount {
    public static class Map extends Mapper
        <LongWritable, Text, Text, IntWritable> {
            private final static IntWritable one = new IntWritable(1);
            private Text word = new Text();

            public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
                String line = value.toString();
                StringTokenizer tokenizer = new StringTokenizer(line);
                while (tokenizer.hasMoreTokens()) {
                    word.set(tokenizer.nextToken());
                    context.write(word, one);
                }
            }
        }
    public static class Reduce extends Reducer
        <Text, IntWritable, Text, IntWritable> {
            public void reduce(Text key, Iterator<IntWritable> values, Context context) throws IOException, InterruptedException {
                int sum = 0;
                while (values.hasNext()) {
                    sum += values.next().get();
                }
                context.write(key, new IntWritable(sum));
            }
        }
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            conf.set("mapreduce.job.queuename", "apg_p7");
            System.out.println("This is a new version");
            Job job = new Job(conf);
            job.setJarByClass(myWordCount.class);
            job.setJobName("myWordCount");
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(IntWritable.class);
            job.setMapperClass(myWordCount.Map.class);
            job.setCombinerClass(myWordCount.Reduce.class);
            job.setReducerClass(myWordCount.Reduce.class);
            FileInputFormat.addInputPath(job, new Path(args[0]));
            FileOutputFormat.setOutputPath(job, new Path(args[1]));
            job.waitForCompletion(true);
        }
}

There are some changes in the arguments of map and reduce, as well as the settings in main. To run the program under mapreduce, the following steps needs to be done.

1. Put source code under this location

/project/src/org/myorg/myWordCount.java

2. Compile java code

mkdir /project/class;
cd /project;
javac -classpath `yarn classpath` -d ./class ./src/org/myorg/*.java

3. Create manifest.txt file

cd project/class;
vim manifest.txt;

The content of manifest.txt is

Main-Class: org.myorg.myWordCount

Leave an empty line at the end of manifest.txt

3. Generate jar file

jar -cvmf manifest.txt myWordCount.jar org

flag meaning:
c: Indicates that you want to create a jar file.
v: Produces verbose output on stdout while the JAR file is being built. The verbose output tells you the name of each file as it's added to the JAR file.
m: Used to include manifest information from an existing manifest file. The format for using this option is: jar cmf existing-manifest jar-file input-file(s)
f: The f option indicates that you want the output to go to a jar file rather than to stdout.

4. Put input data on HDFS

mkdir input
echo "hadoop is fast" > input/file1
echo "Hadoop is amazing" > input/file2
hadoop fs -put input /user/hadoop

5. Run the program

hadoop jar myWordCount.jar /user/hadoop/input /user/hadoop/output

Note:
Sometimes I met with this error:
14/12/05 03:59:03 WARN mapreduce.JobSubmitter: Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
14/12/05 03:59:03 WARN mapreduce.JobSubmitter: No job jar file set. User classes may not be found. See Job or Job#setJar(String).

This is because previously I need to run some Hadoop java class files directly. In order to run them, I have to `export HADOOP_CLASSPATH=<Location of java class file>`. When run jar files, I need to `unset HADOOP_CLASSPATH`, then the error is gone.

References:
Mapreduce 1.2 tutorial
Mapreduce 2.5 tutorial
Stackoverflow

Sunday, August 17, 2014

Git study notes

I'm studying Git recently. The website I'm following is GitGuys. It took me about 10 hours to finish all the topics there and it's really helpful. Some notes taken for future reference.

1. Tracking branch vs Remote tracking branch
http://www.gitguys.com/topics/tracking-branches-and-remote-tracking-branches/
2. Reset local master to track remote master
Method 1:
  #Rename your local master branch
               git branch -m master _old_master_branch_
          #Create a new master branch from a remote source
               git checkout -b master origin/master
Method 2:
          git fetch remoteSource
          git reset --hard remoteSource/master
   For method 2, you are throwing away all your changes in current master branch.

—————————————Notes for commands—————————————————————

git hash-object [file_name] // See the full sha1 hash of the file
git ls-files --stage // show list of files in (staging area)/(git index)
git ls-files --stage --abbrev // abbreviate the hash
git show [file_hash_code] //show the content of the file
git cat-file -p HEAD // display the most recent commit and find the git tree it refers to
git ls-tree [tree_hash_code] - -abbrev // display the content of the tree
git tag [tag_name] -m “Input your tag message here”// tag the current state of the repository
git cat-file -p [tag_name] // display details of a tag
git tag -m “Input your tag message here” [tag_name] [hash_code_for_commit_you_wanna_tag]
git tag -l // get a list of tags
git checkout [tag_name] // check out files that were tag with [tag_name]
git ls-files // Show what files are in the git index
git mv README readme // rename a file in the index and the working directory from README to readme
git rm [file_name] --cached// Remove a file from the index while keep it at the actual location
git diff --cached // Differences between the index and the most recent commit
gt diff // Differences between the working directory and the index
git diff HEAD // Differences between the working directory and the most recent commit
git show // Shows both the commit and the difference between the commit and any parent commit
git show HEAD~ //Show the commit from the parent of the HEAD
git show HEAD~2 //Show the commit from the grandparent of the HEAD
git show [hash_code_for_commit]
git branch -d [branch_name] // Delete a branch
git show-branch // Show the branches and the commit history. A “*”begins the line if this branch is the current branch. A “!”begins the line if this branch is not the current branch. The branch name is reported, in [brackets]. The branch’s most recent commit is shown. Below the “—“ line is the commit history. Commits before the common ancestor are shown, until the point where the branches have a common ancestor.
git stash // Temporarily stashing your work
git show [branch_name]:[file_name] // See the content of [file_name] from branch [branch_name]
When we have conflicts when running git merge, command 27 and 28 are there to help.
git ls-files -u // Show which files need merging. 1: The “common ancestor”of the file. 2: The version from the current branch. 3: The version from the other branch.
git show :1:[file_name] // show file in stage 1
git remote // List the names of remote repositories
git push origin master // push the branch names master to the remote repository named origin
git pull origin [branch_name] // When you are on branch [branch_name], you can pull the newest changes from remote repo by this command
git branch --set-upstream [branch_name] origin/[branch_name] // Do this once to avoid typing command 31 every time for pull
git remote prune origin // remove all tracking branches that have been deleted in the remote repo. But it won’t delete the actual local branch
git branch --track [branch_name] origin/[branch_name] // Same result as 34
git branch -r // Show remote tracking branches
git branch -a //Show all branches
git remote -v //Show basic info about the default remote
git remote show origin //Show a lot about a remote