Hanumant’s Java Workshop

Turbo Charged Java Development!

Working with GridGain

From what I understand about GridGain, you identify a task (called a GridTask) that can be split up, and the splits (called GridJobs) can then be distributed to multiple machines (called GridNodes). The original task can wait for the results of all the GridJobs and, once they are ready, combine them and return the final output. If you already have something running that you want to spread across multiple machines, this requires a fair amount of code change compared to Terracotta. Of course, with Terracotta you would probably spend that time on configuration instead of Java code changes, so I believe that in terms of effort there isn't much difference.
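GridGain's own classes are not used here, because their exact signatures are not covered in this post. As a rough, single-machine analogy of the split, run and combine flow, here is a plain-Java sketch in which the threads of an ExecutorService stand in for GridNodes. The class name SplitCombineSketch and the "sum 1..n" task are hypothetical illustrations, not GridGain's API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Plain-Java analogy of split -> run -> combine. NOT the GridGain API:
// pool threads play the role of grid nodes on a single machine.
public class SplitCombineSketch {

    // The "task": sum the numbers 1..n by splitting the range into `parts` jobs.
    static long sumInParallel(final long n, int parts) throws Exception {
        ExecutorService nodes = Executors.newFixedThreadPool(parts);
        try {
            List<Future<Long>> results = new ArrayList<Future<Long>>();
            long chunk = n / parts;
            for (int i = 0; i < parts; i++) {
                final long from = i * chunk + 1;
                // the last split also picks up any remainder
                final long to = (i == parts - 1) ? n : (i + 1) * chunk;
                // Each "job" sums its own sub-range independently.
                results.add(nodes.submit(new Callable<Long>() {
                    public Long call() {
                        long s = 0;
                        for (long k = from; k <= to; k++) s += k;
                        return s;
                    }
                }));
            }
            // Combine: wait for every job and merge the partial results.
            long total = 0;
            for (Future<Long> f : results) total += f.get();
            return total;
        } finally {
            nodes.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(sumInParallel(1000, 4)); // prints 500500
    }
}
```

In a real grid the jobs would be serialized and shipped to other machines; the waiting and combining step is the part that maps most directly onto the description above.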

Based on this understanding, I am still trying to figure out how to use GridGain in our producer-consumer scenario. It was fairly intuitive to do with Terracotta, but I am not yet sure how to proceed with GridGain. Maybe this scenario is better suited to Terracotta.

Let’s see…

September 27, 2007 | Java

Working with Terracotta

Terracotta allows us to specify, through an XML-based configuration file, which objects we want to share across JVMs. In our application, we want a single Jobs instance to be shared across JVMs: our Producers will add jobs to this Jobs instance, and our Consumers will take jobs out of the same instance.

We will create a new starter class called Main that can spawn either a Producer or a Consumer thread depending on the command line argument:

The Main class:

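public class Main {
    // Declared as a shared root (Main.jobs) in tc-config-pc.xml
    Jobs jobs = new Jobs();

    public Main(boolean isProducer){
        if(isProducer) new Producer(jobs).start();
        else new Consumer(jobs).start();
    }

    public static void main(String[] args){
        // "producer" as the first argument starts a Producer; anything else starts a Consumer
        new Main(args.length > 0 &&
                "producer".equals(args[0]));
    }
}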


All we need to do now is tell Terracotta to share the jobs field of this Main class, and make sure that all the Producers and Consumers call the methods of the Jobs class in a mutually exclusive manner. The config file to do this is really simple:

<?xml version="1.0" encoding="UTF-8"?>
<tc:tc-config xmlns:tc="http://www.terracotta.org/config">
  <application>
    <dso>
      <!-- share the jobs field of Main across all client JVMs -->
      <roots>
        <root>
          <field-name>Main.jobs</field-name>
        </root>
      </roots>
      <!-- make the synchronization inside Jobs methods use cluster-wide write locks -->
      <locks>
        <autolock>
          <method-expression>* Jobs*.*(..)</method-expression>
          <lock-level>write</lock-level>
        </autolock>
      </locks>
      <!-- instrument the application classes so shared objects can be tracked -->
      <instrumented-classes>
        <include><class-expression>.*</class-expression></include>
      </instrumented-classes>
    </dso>
  </application>
</tc:tc-config>

That’s all we need to do in terms of coding!!!

Running the application:

To run the application:

1. Start the Terracotta server using start-tc-server.bat, available in the bin directory of the Terracotta installation.

2. Launch our Main class using the dso-java.bat script (also in bin) instead of calling java directly. We pass the config file name as the tc.config system property, followed by our class name and the role:

c:\works\test\>dso-java -Dtc.config=tc-config-pc.xml Main consumer

I started the consumer first, and it blocks in wait() because Jobs does not contain any Job yet.

3. Start the producer in another cmd window:
c:\works\test\>dso-java -Dtc.config=tc-config-pc.xml Main producer

That’s it!!! As soon as the producer starts putting Job instances into jobs, the consumer starts picking them up. No RMI, no EJB, no CORBA, and yet we have shared an object across multiple JVMs. No changes were required in the main functionality to make it work on multiple JVMs.

To understand how it works under the hood, please read the documentation on the Terracotta website.

September 26, 2007 | Java

Parallel Computing in Java

I have been reading a bit about how to make our Java applications scalable. Besides the standard performance techniques one can apply to fine-tune an application, I was also trying to find out what I can do if my Java application is already performing at its best and that is still not enough. What if one server simply cannot perform a task within the time required by an SLA? How can multiple machines be employed to perform such a task? This is different from clustering, which is done at the application server level and seems better suited to load-balancing requirements: it can be used to do multiple tasks in multiple places, but not to do one task in multiple places.

Enter Terracotta and GridGain.
While Terracotta allows you to share your objects across JVMs, GridGain seems to be more of a pure parallel computing environment. Apparently, both tools can be used to make your application take advantage of multiple machines.

The Approach
To understand how they work, I am going to implement a simple producer-consumer scenario, in which producers create Jobs and consumers take those Jobs up. It is the standard multi-threaded producer-consumer scenario, except that the consumers (and the producers as well, if required) will run on multiple machines instead of as multiple threads on one machine.

Let’s see some code now …

The basic producer-consumer scenario:

First, let’s define what our producers and consumers will work on.

A Job:

//imports 
public class Job { 
    int jobduration = (int) (Math.random()*5000); 
    public void run(){ 
        try { 
            Thread.sleep(jobduration); 
            System.out.println("Job finished in " + jobduration + " millis."); 
        } catch (InterruptedException ex) { 
            ex.printStackTrace(); 
        } 
    } 
}

Jobs, a container to hold all the jobs that need to be done:

import java.util.LinkedList;
import java.util.Queue;

public class Jobs {
    //we can also use BlockingQueue and avoid writing our own synchronization logic
    //Come to think of it, if we use BlockingQueue, we won't need Jobs class at all.
    //But this is good for learning the Terracotta stuff.
    // Typed as Queue<Job> so that remove() returns a Job without a cast
    private Queue<Job> list = new LinkedList<Job>();

    // Blocks until a Job is available, then removes and returns it
    public Job getJob() {
        while(true)
        {
            synchronized(this)
            {
                try{
                    if(!list.isEmpty()) return list.remove();
                    else this.wait();
                }catch(InterruptedException e){
                    e.printStackTrace();
                }
            }
        }
    }

    // Adds a Job and wakes up any consumers waiting in getJob()
    public void addJob(Job job) {
        synchronized(this){
            list.offer(job);
            this.notifyAll();
        }
    }
}
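The comment in Jobs points out that a BlockingQueue would make the class unnecessary. Here is a single-JVM sketch of that alternative (whether Terracotta shares a BlockingQueue the same way is not covered here). The article's Job is replaced by a simplified stand-in with a hypothetical id field, and BlockingQueueDemo and runDemo are illustrative names, so the demo finishes quickly and deterministically.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class BlockingQueueDemo {
    // Simplified stand-in for the article's Job: a hypothetical id instead of a random sleep.
    static class Job {
        final int id;
        Job(int id) { this.id = id; }
    }

    // Produces `total` jobs on the calling thread and consumes them on a second thread.
    static List<Integer> runDemo(final int total) throws InterruptedException {
        final BlockingQueue<Job> jobs = new LinkedBlockingQueue<Job>();
        final List<Integer> consumed = new ArrayList<Integer>();

        Thread consumer = new Thread() {
            public void run() {
                try {
                    for (int i = 0; i < total; i++) {
                        // take() blocks while the queue is empty, like Jobs.getJob()
                        consumed.add(jobs.take().id);
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }
        };
        consumer.start();

        for (int i = 1; i <= total; i++) {
            // put() replaces Jobs.addJob(): no hand-written wait()/notifyAll()
            jobs.put(new Job(i));
        }
        consumer.join(); // join() also makes the consumer's writes to `consumed` visible here
        return consumed;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(runDemo(5)); // prints [1, 2, 3, 4, 5]
    }
}
```

take() and put() do the waiting and signalling that Jobs implements by hand with wait() and notifyAll().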

Let’s now look at the producer and the consumer code.

The producer:

//imports... 
public class Producer extends Thread { 
    Jobs jobs; 
    public Producer(Jobs j){ 
        jobs = j; 
    }   

    public void run(){ 
        while(true){ 
            try { 
                //sleep randomly for up to 5 seconds. 
                Thread.sleep((int) (Math.random()*5000)); 
                jobs.addJob(new Job()); 
            } catch (InterruptedException ex) { 
                ex.printStackTrace(); 
            } 
        } 
    } 
}


The consumer:

//imports... 
public class Consumer extends Thread{ 
    Jobs jobs; 
    public Consumer(Jobs j){ 
        jobs = j; 
    }   

    public void run(){ 
        while(true){ 
            Job job = jobs.getJob(); 
            if(job!=null) job.run(); 
        } 
    } 
}

In a regular, single-JVM application, we would create one shared Jobs instance and start as many Producer and Consumer threads as we want, passing each of them the same Jobs instance.

For example:

public class OldMain { 
    Jobs jobs = new Jobs(); 
    public OldMain(){ 
        new Producer(jobs).start(); 
        new Consumer(jobs).start(); 
        new Consumer(jobs).start(); 
    }   

    public static void main(String[] args){ 
        new OldMain(); 
    } 
}

Here, we are running two Consumers in the same JVM, which doesn’t add much value unless the machine has multiple CPUs. What we want instead is to run Consumers on multiple machines, all picking up jobs from the same Jobs instance.
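To see several consumers sharing one job container in a single JVM, with a run that actually terminates, here is a self-contained sketch. It uses a cut-down copy of the Jobs wait()/notifyAll() logic holding Integer ids (a simplification, not the article's Job), drained by one consumer thread per available processor. The names SharedQueueDemo and drain are hypothetical.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.LinkedList;
import java.util.Queue;
import java.util.Set;
import java.util.concurrent.atomic.AtomicInteger;

// Single-JVM sketch: several consumers drain one shared container; each job is taken once.
public class SharedQueueDemo {
    private final Queue<Integer> list = new LinkedList<Integer>();

    // Same wait()/notifyAll() pattern as Jobs, but holding Integer job ids.
    synchronized Integer getJob() throws InterruptedException {
        while (list.isEmpty()) wait();
        return list.remove();
    }

    synchronized void addJob(Integer id) {
        list.offer(id);
        notifyAll();
    }

    // Starts `consumers` threads, produces `jobs` ids, and returns every id that was taken.
    static Set<Integer> drain(int consumers, final int jobs) throws InterruptedException {
        final SharedQueueDemo shared = new SharedQueueDemo();
        final Set<Integer> taken = Collections.synchronizedSet(new HashSet<Integer>());
        final AtomicInteger remaining = new AtomicInteger(jobs);
        Thread[] threads = new Thread[consumers];
        for (int c = 0; c < consumers; c++) {
            threads[c] = new Thread() {
                public void run() {
                    try {
                        // Claim a slot before blocking, so exactly `jobs` getJob() calls happen overall.
                        while (remaining.getAndDecrement() > 0) taken.add(shared.getJob());
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                }
            };
            threads[c].start();
        }
        for (int i = 0; i < jobs; i++) shared.addJob(i);
        for (Thread t : threads) t.join();
        return taken;
    }

    public static void main(String[] args) throws InterruptedException {
        int cpus = Runtime.getRuntime().availableProcessors();
        System.out.println(drain(cpus, 20).size()); // prints 20
    }
}
```

Whichever thread takes a given id, every id is taken exactly once. The Terracotta setup above aims to keep that behaviour when the consumers run in different JVMs instead of different threads.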

September 26, 2007 | Java