Sunday, September 14, 2014

Inheritance, polymorphism, abstract class and interface (Notes of reading Head First Java)

Inheritance
  • subclass inherits from superclass by using keyword extends
    • Ex: public class subclass extends superclass {}
  • A subclass inherits the public and protected instance variables and methods of its superclass (and package-private ones when both classes are in the same package), but does not inherit the private instance variables and methods
  • A nonpublic class can only be inherited by classes from the same package
  • Inherited methods can be overridden. The method cannot be overridden if it’s marked with final modifier
  • To invoke the superclass version of a method from a subclass that’s overridden the method, use the super keyword
    • Ex: super.method()
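The inheritance rules above can be sketched in a few lines. This is a minimal illustration, not code from the book; the class names Animal, Dog and InheritanceDemo are hypothetical.

```java
// Hypothetical classes for illustration only.
class InheritanceDemo {
    public static void main(String[] args) {
        Dog d = new Dog();
        System.out.println(d.speak()); // overridden version, which also calls super.speak()
        System.out.println(d.kind());  // inherited final method
        System.out.println(d.name);    // inherited protected field
    }
}

class Animal {
    private String secret = "hidden";   // private: not inherited by Dog
    protected String name = "animal";   // protected: inherited by Dog

    public String speak() {
        return "generic noise";
    }

    // final: a subclass cannot override this method
    public final String kind() {
        return "animal";
    }
}

class Dog extends Animal {
    @Override
    public String speak() {
        // super.speak() invokes the superclass version of the overridden method
        return super.speak() + ", then woof";
    }
}
```

Trying to declare kind() again inside Dog, or to read secret from Dog, would be rejected by the compiler.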
Polymorphism
  • superclass obj = new subclass() //The left is called reference type. The right is called object type
  • With polymorphism, the reference type can be a superclass of the actual object type.
  • obj.method() // If method is overridden in subclass, then method of subclass is called though the reference type is superclass
  • Besides assignment, we can also have polymorphic arguments and return types.
  • Rules for overriding
    • Arguments must be the same types and return types must be compatible.
    • The method can’t be less accessible.
  • Overloading vs overriding
    • Overloading is having two methods with the same name but different argument lists
    • The return types can be different
    • You can’t change only the return type
    • You can vary the access levels in any direction
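A short sketch of the polymorphism and overloading notes above. The names Shape, Circle and Printer are made up for the example.

```java
// Hypothetical classes for illustration only.
class PolymorphismDemo {
    // Polymorphic argument: any subclass of Shape can be passed in
    static String describe(Shape s) {
        return s.name();
    }

    public static void main(String[] args) {
        Shape s = new Circle();            // reference type Shape, object type Circle
        System.out.println(describe(s));   // the Circle version of name() runs

        Printer p = new Printer();
        System.out.println(p.print(3));    // picks print(int)
        System.out.println(p.print("x"));  // picks print(String)
    }
}

class Shape {
    public String name() {
        return "shape";
    }
}

class Circle extends Shape {
    // Overriding: same arguments, compatible return type, not less accessible
    @Override
    public String name() {
        return "circle";
    }
}

class Printer {
    // Overloading: same method name, different argument lists
    public String print(int n) {
        return "int " + n;
    }

    // The access level of an overload may differ from the other version
    String print(String s) {
        return "string " + s;
    }
}
```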
Abstract class
  • The main reason for abstract class is that some classes just should not be instantiated
    • Ex: abstract public class superClass{}
  • The abstract method means the method must be overridden.
    • Ex: public abstract void method(); // There is no body for abstract method
  • Abstract class can contain both abstract and non-abstract methods
  • The first concrete class in the inheritance tree must implement all abstract methods
  • You cannot make a new instance of an abstract type, but you can make an array object declared to hold that type
  • Ex:
    • superClass obj = new superClass() // illegal
    • superClass[] objs = new superClass[5] // legal, each element inside objs can be either superClass or subClass
  • You can call a method on an object reference only if the class of the reference type actually has the method
  • Ex: 
    • public class subClass extends superClass{}
    • There is a method METHOD which is only defined in subClass
    • superClass obj = new subClass()
    • obj.METHOD() // illegal, cannot pass compiler
    • The way to workaround is:
      • subClass obj_cast = (subClass) obj
      • obj_cast.METHOD() //legal
    • The compiler decides whether you can call a method based on the reference type, not the actual object type. The method you are calling on a reference must be in the class of that reference type. Doesn’t matter what the actual object is.
  • If a class contains abstract methods, the class must be declared as abstract
  • Every class in Java extends class Object
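The abstract-class notes above, put together as one small sketch. Vehicle and Car are hypothetical names; the instanceof check before the cast is an extra safety step that the notes do not mention.

```java
// Hypothetical classes for illustration only.
class AbstractDemo {
    public static void main(String[] args) {
        // Vehicle v = new Vehicle();       // illegal: Vehicle is abstract
        Vehicle[] garage = new Vehicle[2];  // legal: an array that holds Vehicle references
        garage[0] = new Car();
        garage[1] = new Car();

        Vehicle v = garage[0];
        System.out.println(v.describe());   // describe() is declared in Vehicle

        // v.honk();                        // would not compile: Vehicle has no honk()
        if (v instanceof Car) {             // check first so the cast cannot fail at runtime
            Car c = (Car) v;
            System.out.println(c.honk());
        }
    }
}

abstract class Vehicle {
    // Abstract method: no body, must be implemented by the first concrete subclass
    public abstract int wheels();

    // Non-abstract method living in the same abstract class
    public String describe() {
        return "vehicle with " + wheels() + " wheels";
    }
}

class Car extends Vehicle {
    @Override
    public int wheels() {
        return 4;
    }

    // Defined only in Car, so it is reachable only through a Car reference
    public String honk() {
        return "beep";
    }
}
```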
Interface
  • A Java interface is like a 100% pure abstract class
  • To define an interface
    • public interface intClass{}
  • To implement an interface
    • public class subClass extends superClass implements intClass{}
  • Interface methods are implicitly public and abstract, so typing ‘public’ and ‘abstract’ is optional
  • Benefits of interface
    • If you use interfaces instead of concrete subclasses as arguments and return types, you can pass anything that implements that interface
    • A class can implement multiple interfaces
  • A class that implements an interface must implement all the methods of the interface, since all interface methods are implicitly public and abstract
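A minimal sketch of the interface notes above. Pet, Trackable, Creature and Cat are hypothetical names.

```java
// Hypothetical types for illustration only.
class InterfaceDemo {
    // Argument typed by the interface: anything that implements Pet can be passed
    static String playWith(Pet p) {
        return p.play();
    }

    public static void main(String[] args) {
        Cat c = new Cat();
        System.out.println(playWith(c));
        Trackable t = c;                   // the same object seen through a second interface
        System.out.println(t.location());
    }
}

interface Pet {
    String play();       // implicitly public and abstract
}

interface Trackable {
    String location();   // implicitly public and abstract
}

class Creature {
    public String breathe() {
        return "breathing";
    }
}

// One superclass, several interfaces
class Cat extends Creature implements Pet, Trackable {
    // Implementations must be public because the interface methods are public
    @Override
    public String play() {
        return "chasing string";
    }

    @Override
    public String location() {
        return "on the sofa";
    }
}
```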

Saturday, September 13, 2014

Organize your code with package and jar (Notes of reading Head First Java)

I've been reading 'Head First Java' recently, and here are some notes about packages and jar files.

Jar usage
  • Use a jar file to bundle your class files
    • /projects/class contains all class files
    • /projects/source contains all java files
    • create manifest.txt under class
    • content of manifest.txt: 
    • Main-Class: <main_class> // don’t put .class on the end
    • Press the return key after typing the Main-Class line
  • Create jar file
    • cd /projects/class
    • jar -cvmf manifest.txt <file_name>.jar *.class
  • Execute jar file
    • java -jar <file_name>.jar
Organize your code in packages
  • Organize source code
    • /projects/source/com/shunrang/ contains all java files
    • Add this as the first line of all java files: package com.shunrang;
  • Compile source code
    • mkdir /projects/class
    • cd /projects/source
    • javac -d ../class com/shunrang/*.java
  • Run your code
    • cd /projects/class
    • java com.shunrang.<main_class>
    • The -d flag in javac will create the same structure under class folder as what is inside source folder
Combine Jar and package
  • Create manifest.txt file
    • Put manifest.txt under /projects/class
    • Content of manifest.txt: Main-Class: com.shunrang.<main_class>
  • Create Jar file
    • cd /projects/class
    • jar -cvmf manifest.txt <file_name>.jar com //All you specify is the com directory
  • Run Jar file
    • java -jar <file_name>.jar
  • Useful jar command
    • jar -tf <file_name>.jar //List the contents of a JAR, -tf stands for ‘Table File’
    • jar -xf <file_name>.jar //Extract the contents of a JAR, -xf stands for ‘Extract File’
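The same information that jar -tf and the manifest give you can also be read from Java with the standard java.util.jar classes. The sketch below builds a tiny throwaway jar first so that it runs on its own; the entry name com/example/Main.class and the main class com.example.Main are placeholders, not the names used in these notes.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.jar.Attributes;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;
import java.util.jar.JarOutputStream;
import java.util.jar.Manifest;

class JarListDemo {
    // Builds a temporary jar with a Main-Class manifest entry and one empty placeholder entry
    static File buildSampleJar() throws IOException {
        Manifest manifest = new Manifest();
        Attributes attrs = manifest.getMainAttributes();
        attrs.put(Attributes.Name.MANIFEST_VERSION, "1.0"); // without a version the attributes are not written
        attrs.put(Attributes.Name.MAIN_CLASS, "com.example.Main"); // placeholder name, no .class suffix
        File jar = File.createTempFile("demo", ".jar");
        jar.deleteOnExit();
        try (JarOutputStream out = new JarOutputStream(new FileOutputStream(jar), manifest)) {
            out.putNextEntry(new JarEntry("com/example/Main.class"));
            out.closeEntry();
        }
        return jar;
    }

    // Rough equivalent of: jar -tf <file_name>.jar
    static List<String> listEntries(File jar) throws IOException {
        List<String> names = new ArrayList<>();
        try (JarFile jarFile = new JarFile(jar)) {
            Enumeration<JarEntry> entries = jarFile.entries();
            while (entries.hasMoreElements()) {
                names.add(entries.nextElement().getName());
            }
        }
        return names;
    }

    // Reads the Main-Class value that java -jar would use
    static String mainClass(File jar) throws IOException {
        try (JarFile jarFile = new JarFile(jar)) {
            return jarFile.getManifest().getMainAttributes().getValue(Attributes.Name.MAIN_CLASS);
        }
    }

    public static void main(String[] args) throws IOException {
        File jar = buildSampleJar();
        System.out.println(listEntries(jar));
        System.out.println(mainClass(jar));
    }
}
```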

Tuesday, August 26, 2014

Run wordcount on Hadoop 2.5

Wordcount is the hello-world program of MapReduce. When I actually ran it, I hit some problems. I'm using Hadoop 2.5, while most tutorials target Hadoop versions older than 2.0, and the code is slightly different for 2.5.

package org.myorg;
import java.io.IOException;
import java.util.*;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.conf.*;
import org.apache.hadoop.io.*;
import org.apache.hadoop.util.*;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapred.JobConf;
public class myWordCount {
    public static class Map extends Mapper
        <LongWritable, Text, Text, IntWritable> {
            private final static IntWritable one = new IntWritable(1);
            private Text word = new Text();

            public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
                String line = value.toString();
                StringTokenizer tokenizer = new StringTokenizer(line);
                while (tokenizer.hasMoreTokens()) {
                    word.set(tokenizer.nextToken());
                    context.write(word, one);
                }
            }
        }
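    public static class Reduce extends Reducer
        <Text, IntWritable, Text, IntWritable> {
            // The new API passes an Iterable, not an Iterator. With an Iterator parameter
            // this method would not override Reducer.reduce and would never be called.
            @Override
            public void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException {
                int sum = 0;
                for (IntWritable value : values) {
                    sum += value.get();
                }
                context.write(key, new IntWritable(sum));
            }
        }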
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            conf.set("mapreduce.job.queuename", "apg_p7");
            System.out.println("This is a new version");
            Job job = Job.getInstance(conf); // the new Job(conf) constructor is deprecated in Hadoop 2.x
            job.setJarByClass(myWordCount.class);
            job.setJobName("myWordCount");
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(IntWritable.class);
            job.setMapperClass(myWordCount.Map.class);
            job.setCombinerClass(myWordCount.Reduce.class);
            job.setReducerClass(myWordCount.Reduce.class);
            FileInputFormat.addInputPath(job, new Path(args[0]));
            FileOutputFormat.setOutputPath(job, new Path(args[1]));
            job.waitForCompletion(true);
        }
}


There are some changes in the arguments of map and reduce, as well as in the settings in main. To run the program under MapReduce, follow these steps.

1. Put source code under this location
/project/src/org/myorg/myWordCount.java

2. Compile java code
mkdir /project/class;
cd /project;
javac -classpath `yarn classpath` -d ./class ./src/org/myorg/*.java

3. Create manifest.txt file
cd /project/class;
vim manifest.txt;

The content of manifest.txt is
Main-Class: org.myorg.myWordCount

Leave an empty line at the end of manifest.txt

4. Generate jar file
jar -cvmf manifest.txt myWordCount.jar org
Flag meanings:
c: Indicates that you want to create a jar file.
v: Produces verbose output on stdout while the JAR file is being built. The verbose output tells you the name of each file as it's added to the JAR file.
m: Used to include manifest information from an existing manifest file. The format for using this option is: jar cmf existing-manifest jar-file input-file(s)
f: The f option indicates that you want the output to go to a jar file rather than to stdout.


5. Put input data on HDFS
mkdir input
echo "hadoop is fast" > input/file1
echo "Hadoop is amazing" > input/file2
hadoop fs -put input /user/hadoop

6. Run the program
hadoop jar myWordCount.jar /user/hadoop/input /user/hadoop/output
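To check what the job should produce for the two input files, the map and reduce logic can be mimicked in plain Java with no Hadoop dependency. This is only a local sketch of the counting logic, not how Hadoop runs the job; the class name WordCountSketch is made up.

```java
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

class WordCountSketch {
    // Tokenizes each line the same way the mapper does and sums a 1 per token, as the reducer does
    static Map<String, Integer> count(String... lines) {
        Map<String, Integer> totals = new TreeMap<>();
        for (String line : lines) {
            StringTokenizer tokenizer = new StringTokenizer(line);
            while (tokenizer.hasMoreTokens()) {
                totals.merge(tokenizer.nextToken(), 1, Integer::sum);
            }
        }
        return totals;
    }

    public static void main(String[] args) {
        // Same text as input/file1 and input/file2
        System.out.println(count("hadoop is fast", "Hadoop is amazing"));
        // prints {Hadoop=1, amazing=1, fast=1, hadoop=1, is=2}
    }
}
```

Note that "hadoop" and "Hadoop" are counted separately, because the tokens are compared case-sensitively.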

Note:
Sometimes I saw these warnings:
14/12/05 03:59:03 WARN mapreduce.JobSubmitter: Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
14/12/05 03:59:03 WARN mapreduce.JobSubmitter: No job jar file set.  User classes may not be found. See Job or Job#setJar(String).

This happened because I had earlier needed to run some Hadoop Java class files directly, which required `export HADOOP_CLASSPATH=<Location of java class file>`. Before running jar files, I need to `unset HADOOP_CLASSPATH`; then the warnings are gone.




Sunday, August 17, 2014

Git study notes

I've been studying Git recently. The website I'm following is GitGuys. It took me about 10 hours to finish all the topics there, and it's really helpful. Here are some notes for future reference.

1. Tracking branch vs Remote tracking branch
http://www.gitguys.com/topics/tracking-branches-and-remote-tracking-branches/
2. Reset local master to track remote master
     Method 1: 
          #Rename your local master branch
               git branch -m master _old_master_branch_ 
          #Create a new master branch from a remote source
               git checkout -b master origin/master
     Method 2:
          git fetch remoteSource
          git reset --hard remoteSource/master
     With method 2, you throw away all local changes on your current master branch.

—————————————Notes for commands—————————————————————
  1. git hash-object [file_name] // See the full sha1 hash of the file
  2. git ls-files --stage // show list of files in (staging area)/(git index)
  3. git ls-files --stage --abbrev // abbreviate the hash
  4. git show [file_hash_code] //show the content of the file
  5. git cat-file -p HEAD // display the most recent commit and find the git tree it refers to
  6. git ls-tree [tree_hash_code] --abbrev // display the content of the tree
  7. git tag [tag_name] -m "Input your tag message here" // tag the current state of the repository
  8. git cat-file -p [tag_name] // display details of a tag
  9. git tag -m "Input your tag message here" [tag_name] [hash_code_for_commit_you_wanna_tag]
  10. git tag -l // get a list of tags
  11. git checkout [tag_name] // check out files that were tagged with [tag_name]
  12. git ls-files // Show what files are in the git index
  13. git mv README readme // rename a file in the index and the working directory from README to readme
  14. git rm [file_name] --cached // Remove a file from the index while keeping it in the working directory
  15. git diff --cached // Differences between the index and the most recent commit
  16. git diff // Differences between the working directory and the index
  17. git diff HEAD // Differences between the working directory and the most recent commit
  18. git show // Shows both the commit and the difference between the commit and any parent commit
  19. git show HEAD~ //Show the commit from the parent of the HEAD
  20. git show HEAD~2 //Show the commit from the grandparent of the HEAD
  21. git show [hash_code_for_commit]
  22. git branch -d [branch_name] // Delete a branch
  23. git show-branch // Show the branches and the commit history. A “*” begins the line if this branch is the current branch. A “!” begins the line if this branch is not the current branch. The branch name is reported in [brackets], followed by the branch’s most recent commit. Below the “--” line is the commit history, shown back to the point where the branches have a common ancestor.
  24. git stash // Temporarily stashing your work
  25. git show [branch_name]:[file_name] // See the content of [file_name] from branch [branch_name]
  26. When git merge reports conflicts, commands 27 and 28 are there to help.
  27. git ls-files -u // Show which files need merging. Stage 1: the “common ancestor” of the file. Stage 2: the version from the current branch. Stage 3: the version from the other branch.
  28. git show :1:[file_name] // show file in stage 1
  29. git remote // List the names of remote repositories
  30. git push origin master // push the branch named master to the remote repository named origin
  31. git pull origin [branch_name] // When you are on branch [branch_name], you can pull the newest changes from remote repo by this command
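  32. git branch --set-upstream-to=origin/[branch_name] [branch_name] // Do this once so that a plain git pull works without typing the arguments of command 31 every time (older Git versions used --set-upstream [branch_name] origin/[branch_name], which is deprecated)
  33. git remote prune origin // remove remote tracking branches whose branches have been deleted in the remote repo. But it won’t delete the actual local branch
  34. git branch --track [branch_name] origin/[branch_name] // Same result as 32, set up when the local branch is created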
  35. git branch -r // Show remote tracking branches
  36. git branch -a //Show all branches
  37. git remote -v //List the remotes together with their fetch and push URLs
  38. git remote show origin //Show a lot about a remote
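
As a side note on command 1: for a file, the hash that git hash-object reports is the SHA-1 of a short header ("blob", the content length in bytes, and a NUL byte) followed by the file content. The sketch below reproduces that in plain Java, assuming a repository that uses Git's default SHA-1 object format; the class and method names are made up.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

class GitBlobHash {
    // SHA-1 over "blob <length>\0" followed by the raw content bytes
    static String hashObject(byte[] content) throws NoSuchAlgorithmException {
        MessageDigest sha1 = MessageDigest.getInstance("SHA-1");
        sha1.update(("blob " + content.length + "\0").getBytes(StandardCharsets.US_ASCII));
        sha1.update(content);
        StringBuilder hex = new StringBuilder();
        for (byte b : sha1.digest()) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }

    public static void main(String[] args) throws NoSuchAlgorithmException {
        // An empty file; prints e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
        System.out.println(hashObject(new byte[0]));
        // The bytes of a file containing "hello world" and a trailing newline
        System.out.println(hashObject("hello world\n".getBytes(StandardCharsets.UTF_8)));
    }
}
```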