Creating an RPM for a Java Application
Ok, so why would I want to do that anyway?
Well, there can be several reasons. I wanted to deploy my application on several CentOS boxes. All these boxes are hooked up to a central repository server. To deploy any application to these appliances, the RPM for that application needs to be added to this repo server. The boxes are synced with the repo server automatically. So basically, if I add an RPM to this repo server, the boxes can grab it just like any other application. No user intervention is required in the whole process.
The ground work
There are several online guides and tutorials that describe how to build an RPM. The ones that I read were -
- https://pmc.ucsc.edu/~dmk/notes/RPMs/Creating_RPMs.html
- http://docs.fedoraproject.org/drafts/rpm-guide-en/ch-creating-rpms.html
- http://genetikayos.com/code/repos/rpm-tutorial/trunk/rpm-tutorial.html
- http://www.ibm.com/developerworks/library/l-rpm1/
While all of the above gave me the basic concepts and the general idea of how to build an RPM package, I found several key pieces of information missing, which caused confusion. Also, none of them explained it from a Java developer’s perspective. My goal in this blog is to document my findings (the information that would have saved me a week of effort had it been given in the articles above) for myself [it's amazing how soon you forget things], and for any other poor soul who is banging his head trying to build an RPM for a Java application.
I advise you to go through all of the above articles before reading further.
The Standard Steps
All of the following steps are to be done on a Linux box, the one where you want to build the RPM.
Step 1: Install the package rpm-build
yum install rpm-build
or if you are not logged in as root but your account is in the sudo list -
sudo yum install rpm-build
Notice that the package name is rpm-build but the command that you use for building is rpmbuild (not rpm-build) :rolleyes:
Step 2: The .rpmmacros file
Create this file, containing the following two lines, in your home directory:
%_topdir /home/hdeshmukh/rpm
%_tmppath /home/hdeshmukh/rpm/tmp
This file basically tells rpmbuild to use your personal account space for building the rpm instead of using shared space.
Step 3: Create the directory structure
rpmbuild requires a specific directory structure under the %_topdir directory. The following command creates it:
$ mkdir ~/rpm ~/rpm/BUILD ~/rpm/RPMS ~/rpm/RPMS/noarch ~/rpm/SOURCES ~/rpm/SPECS ~/rpm/SRPMS ~/rpm/tmp
Key Information – rpmbuild builds rpms for various CPU architectures such as i386 and i686. For a Java application, you don’t care about that. So you just need to create one directory named ~/rpm/RPMS/noarch instead of multiple directories such as ~/rpm/RPMS/i386, one for each architecture.
Key information for bundling a Java Application
Impedance Mismatch
As you know (I am assuming that you have gone through the URLs mentioned above), rpmbuild 1. “builds” the sources and 2. “installs” the output of the build process. In the Linux world, this basically means compiling the sources using “make”. Where I got stuck and frustrated was the liberal references to “make” in those articles. I know it is a build tool, but honestly, I have never used make and I do not care about it. I don’t know exactly what it does, which config file in which path it needs, what it creates, and where it puts what it creates. The next stumbling block for me was the “install” process. All the articles refer to “make install”. Again, I have no clue what it installs, or how, or where. Maybe there is a config file somewhere that it reads, but I don’t know.
Build Process in the Java World: In the Java world, you have, for “sources”:
- a set of .java and .properties files (organized in some application specific directory structure)
- a set of third party jars
- an ant build.xml file that describes the build process
The build process uses ant (instead of make), i.e. ant <targetname>, and ant uses build.xml to generate the final output, usually in the form of a jar file.
Install Process in the Java World: For a simple application, you can just drop your final jar anywhere you like and run java -jar <jarname>. For a complex application, such as an enterprise application (a war or an ear), you might have to drop the war or ear into the appropriate directory of the application server. In some cases, you might even want to explode the jar into an appropriate directory on the machine and then execute a command like java -classpath ./lib/a.jar:./lib/b.jar -Dx=1 -Dy=2 com.mycomp.myapp.AppStarter to run your application.
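You can also assemble the java command from the last example programmatically, which makes its parts explicit. The classpath entries are joined with the platform path separator, each system property becomes one -D flag, and the main class comes last. This is a minimal sketch. LaunchCommand and build are hypothetical names, and the jar names, properties and main class are the placeholders from the paragraph above.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class LaunchCommand {

    /** Builds: java -classpath <cp> -Dk=v ... <mainClass> */
    static List<String> build(List<String> classpath, Map<String, String> props, String mainClass) {
        List<String> cmd = new ArrayList<>();
        cmd.add("java");
        cmd.add("-classpath");
        // File.pathSeparator is ":" on Linux and ";" on Windows
        cmd.add(String.join(File.pathSeparator, classpath));
        for (Map.Entry<String, String> p : props.entrySet()) {
            cmd.add("-D" + p.getKey() + "=" + p.getValue());
        }
        cmd.add(mainClass);
        return cmd;
    }

    public static void main(String[] args) {
        Map<String, String> props = new LinkedHashMap<>(); // keeps -Dx before -Dy
        props.put("x", "1");
        props.put("y", "2");
        List<String> cmd = build(Arrays.asList("./lib/a.jar", "./lib/b.jar"), props,
                "com.mycomp.myapp.AppStarter");
        // On Linux: java -classpath ./lib/a.jar:./lib/b.jar -Dx=1 -Dy=2 com.mycomp.myapp.AppStarter
        System.out.println(String.join(" ", cmd));
        // To launch it instead of printing it: new ProcessBuilder(cmd).inheritIO().start();
    }
}
```

Passing the list to ProcessBuilder, rather than one concatenated string, avoids quoting problems when a path contains spaces.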
With the above discussion in mind, I will now introduce the heart of the build process, the spec file :drumroll: In the listing below, each line of the spec file is followed by my annotation where one is needed.
The .spec file
Summary: Lease Alert Monitor
Name: LeaseAlert
# The name should not contain any spaces.
Version: 1
Release: 1
# Name, version and release become part of the name of the output RPM file.
# In this case it will be LeaseAlert-1-1.noarch.rpm.
License: Restricted
Group: Applications/System
BuildRoot: %{_builddir}/%{name}-root
# BuildRoot is the directory where rpmbuild "installs" the output of the "build" process
# (whatever that process is). %{_builddir} points to %_topdir/BUILD, and %_topdir is set
# to ~/rpm in ~/.rpmmacros, so our BuildRoot will be ~/rpm/BUILD/LeaseAlert-root.
URL: http://mycompany.net/
Vendor: Mycompany
Packager: Hanumant Deshmukh
Prefix: /usr/local
# In the install section below, you specify the exact directory path where the files are
# installed (basically, copied) on the machine where the application is being installed.
# You may intend the application to live in, say, /usr/local/javaapps/leasealert, but the
# target machine may not have /usr/local, or the user may not want it there. Maybe the user
# wants it in /home/hdeshmukh/javaapps/leasealert.
# Prefix specifies which part of your install path the user may change. When installing
# the RPM, the user can give a different prefix (rpm -i --prefix <dir>), and the application
# is then installed under <prefix>/javaapps/leasealert instead of /usr/local/javaapps/leasealert.
BuildArchitectures: noarch
# For Java apps, you don't care about the CPU architecture.
%description
Lease Alert Monitor
%prep
# For Java apps, there is nothing to prepare. However, in some cases you might want to pull
# the sources from a source code control system. The commands that pull the sources (and all
# the other files required for building) into the SOURCES directory should go here.
%build
pwd
# This just shows where rpmbuild is executing from.
cd %{_sourcedir}
# When rpmbuild reaches the build section, the current directory is the build directory,
# %{_builddir} (~/rpm/BUILD), but our sources are in the SOURCES directory. %{_sourcedir} is
# a standard rpmbuild macro; in our case it points to ~/rpm/SOURCES. The goal here is to cd
# to the directory where ant can find build.xml and execute the build target.
ant tar
# My build.xml is at the top of the SOURCES directory and the name of my target is tar.
# The output of this ant command is a tar file named leasealert.tar in the ~/rpm/BUILD
# directory. See the description of my ant file below for the contents and structure of
# this tar file.
%install
pwd
# Here too, the current directory is the build directory (~/rpm/BUILD). $RPM_BUILD_ROOT,
# the BuildRoot, is where the application is installed on YOUR machine (the machine on which
# you are creating the RPM) and NOT on the end user's machine. When the RPM is installed on
# the end user's machine, the BuildRoot part of the path is dropped. So if you install your
# app (on your RPM build machine) in <BuildRoot>/usr/local/javaapps/leasealert, the user gets
# it in /usr/local/javaapps/leasealert. Of course, if the user specifies the prefix
# /home/hdeshmukh, the app will be installed in /home/hdeshmukh/javaapps/leasealert.
rm -rf $RPM_BUILD_ROOT
# This just makes sure that the BuildRoot is empty. $RPM_BUILD_ROOT points to
# ~/rpm/BUILD/LeaseAlert-root in my case. Note that this command is NOT executed on the end
# user's machine.
mkdir -p $RPM_BUILD_ROOT/usr/local/edns/standalonejava/leasealert
# The path after $RPM_BUILD_ROOT is where you want the application to be installed on the end
# user's machine (subject to the prefix change); in my case, /usr/local/edns/standalonejava/leasealert.
# This directory will be created on the end user's machine if it does not already exist, not
# because of this mkdir command, but because the RPM contains the application in that directory
# and explodes the contents there. For a web application, give the path to your application
# server's document root directory instead.
cd $RPM_BUILD_ROOT/usr/local/edns/standalonejava/leasealert
tar -xf $RPM_BUILD_ROOT/../leasealert.tar
# My install process just requires exploding the tar file in this directory. For a web app,
# you may want to explode it in the application server's document root directory instead, so
# you have to create that directory (in the mkdir step above) before exploding the tar.
%clean
rm -rf $RPM_BUILD_ROOT
# This is executed after the RPM file has been built, so there is no need to keep this stuff anymore.
%files
# List ALL the files that you want copied to the end user's machine. The paths follow the
# deployment structure used under BuildRoot, so there is no need to write $RPM_BUILD_ROOT/usr/local…
%defattr(-,root,root)
/usr/local/edns/standalonejava/leasealert/leasealert.jar
/usr/local/edns/standalonejava/leasealert/leasecapacitymonitor.properties
%attr(755,root,root) /usr/local/edns/standalonejava/leasealert/run.sh
# run.sh is the shell script containing the java command that executes my main class along
# with some properties, so I want to make it executable.
/usr/local/edns/standalonejava/leasealert/lib/mail.jar
%changelog
* Mon Oct 20 2008 Hanumant
- Created initial spec file
I hope the contents of the spec file are clear. Now, run $ rpmbuild -ba ~/rpm/SPECS/leasealert.spec to build the RPM (-ba builds both the binary and the source RPM). LeaseAlert-1-1.noarch.rpm should be created in the ~/rpm/RPMS/noarch directory.
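A few lines of Java can check the output file name and the prefix relocation described in the spec annotations. This sketch rests on two assumptions. First, binary packages follow rpmbuild's default name-version-release.arch.rpm pattern. Second, relocation simply swaps the Prefix part of each packaged path for the value the user passes to rpm --prefix. RpmPaths and its methods are hypothetical helpers, not part of any RPM tooling.

```java
public class RpmPaths {

    /** Default binary package name: name-version-release.arch.rpm */
    static String rpmFileName(String name, String version, String release, String arch) {
        return name + "-" + version + "-" + release + "." + arch + ".rpm";
    }

    /**
     * Models relocation of one packaged path: the leading Prefix from the spec
     * is replaced by the user's prefix; the rest of the path is kept.
     */
    static String relocate(String packagedPath, String specPrefix, String userPrefix) {
        if (!packagedPath.startsWith(specPrefix + "/")) {
            throw new IllegalArgumentException(packagedPath + " is not under " + specPrefix);
        }
        return userPrefix + packagedPath.substring(specPrefix.length());
    }

    public static void main(String[] args) {
        // prints LeaseAlert-1-1.noarch.rpm
        System.out.println(rpmFileName("LeaseAlert", "1", "1", "noarch"));
        // prints /home/hdeshmukh/edns/standalonejava/leasealert/run.sh
        System.out.println(relocate("/usr/local/edns/standalonejava/leasealert/run.sh",
                "/usr/local", "/home/hdeshmukh"));
    }
}
```

Relocation only applies to paths under the declared Prefix. Every entry in %files therefore has to start with /usr/local for the whole package to be relocatable.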
The build.xml file
The deployment structure of my standalone Java application is as follows:
<deploydirectory>/leasealert.jar <– contains application class files
<deploydirectory>/leasecapacitymonitor.properties
<deploydirectory>/run.sh <– shell script to run the application
<deploydirectory>/lib/mail.jar <– third party jar files
In case you are wondering about the contents of run.sh, it is quite simple:
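#!/bin/sh
# cd to the install directory first so that the relative classpath entries resolve
cd "$(dirname "$0")"
java -classpath .:leasealert.jar:./lib/mail.jar com.mycompany.MyClass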
As you have probably already noticed in the spec file, I want the deploy directory to be /usr/local/edns/standalonejava/leasealert
Now, the goal of my build.xml is to compile the sources and bundle all the stuff (jar file, lib, props, and run.sh) into a single tar file named leasealert.tar. The directory structure contained within the tar file should be such that, when exploded during the “install” process of rpmbuild, it should reflect the deployment structure. The following is how I achieved it -
<project name="leasealert" basedir="/home/hdeshmukh/rpm/" default="main">
<property name="rpmroot.dir" value="/home/hdeshmukh/rpm"/>
<!-- All my third party jars are here. -->
<property name="lib.dir" value="${rpmroot.dir}/SOURCES/lib"/>
<!-- All .java files and property files are here. -->
<property name="src.dir" value="${rpmroot.dir}/SOURCES/src"/>
<!-- This is where ant generates and puts the final leasealert.tar file. -->
<property name="build.dir" value="${rpmroot.dir}/BUILD"/>
<!-- This is where ant generates the .class files. -->
<property name="classes.dir" value="${build.dir}/classes"/>
<!-- This is where ant generates and puts the leasealert.jar file. -->
<property name="jar.dir" value="${build.dir}/jar"/>
<property name="main-class" value="com.mycompany.myapp.MyClass"/>
<!-- Standard ant stuff -->
<path id="classpath">
<fileset dir="${lib.dir}" includes="**/*.jar"/>
</path>
<target name="clean">
<delete dir="${build.dir}"/>
</target>
<target name="compile">
<mkdir dir="${classes.dir}"/>
<javac srcdir="${src.dir}" destdir="${classes.dir}" classpathref="classpath" source="1.6" target="1.6"/>
</target>
<target name="jar" depends="compile">
<mkdir dir="${jar.dir}"/>
<jar destfile="${jar.dir}/${ant.project.name}.jar" basedir="${classes.dir}">
<manifest>
<attribute name="Main-Class" value="${main-class}"/>
</manifest>
</jar>
</target>
<!-- This generates the final leasealert.tar -->
<target name="tar" depends="jar">
<copy file="${src.dir}/run.sh" tofile="${jar.dir}/run.sh"/>
<copy todir="${jar.dir}/">
<fileset dir="${src.dir}">
<include name="**/*.properties"/>
</fileset>
</copy>
<mkdir dir="${jar.dir}/lib"/>
<copy todir="${jar.dir}/lib">
<fileset dir="${lib.dir}">
<include name="**/*.*"/>
</fileset>
</copy>
<tar destfile="${build.dir}/leasealert.tar" basedir="${jar.dir}"/>
</target>
<target name="clean-build" depends="clean,jar"/>
<target name="main" depends="clean, tar"/>
</project>
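The jar target writes a Main-Class attribute into the manifest, which lets java -jar leasealert.jar find the entry point. The manifest in this build.xml has no Class-Path attribute, though, so java -jar alone would not see lib/mail.jar. That is one reason run.sh passes -classpath explicitly. If you add Class-Path entries (for example with another <attribute> in the <manifest> element), they are resolved relative to the jar's own location. The sketch below uses java.util.jar to write a manifest-only jar and read both attributes back. ManifestCheck is a hypothetical name, and the Class-Path value is an illustration that the build.xml above does not set.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.util.jar.Attributes;
import java.util.jar.JarFile;
import java.util.jar.JarOutputStream;
import java.util.jar.Manifest;

public class ManifestCheck {

    /** Writes a jar that contains nothing but a manifest with the given attributes. */
    static void writeJar(File jar, String mainClass, String classPath) throws IOException {
        Manifest mf = new Manifest();
        Attributes main = mf.getMainAttributes();
        // Manifest-Version is required by the JAR file specification
        main.put(Attributes.Name.MANIFEST_VERSION, "1.0");
        main.put(Attributes.Name.MAIN_CLASS, mainClass);
        // Hypothetical addition: lets "java -jar" find lib/mail.jar next to the jar
        main.put(Attributes.Name.CLASS_PATH, classPath);
        // The constructor writes META-INF/MANIFEST.MF; no class entries are needed here
        try (JarOutputStream out = new JarOutputStream(new FileOutputStream(jar), mf)) {
            out.flush();
        }
    }

    /** Returns {Main-Class, Class-Path} from the jar's manifest. */
    static String[] readEntryPoint(File jar) throws IOException {
        try (JarFile jf = new JarFile(jar)) {
            Attributes main = jf.getManifest().getMainAttributes();
            return new String[] {
                main.getValue(Attributes.Name.MAIN_CLASS),
                main.getValue(Attributes.Name.CLASS_PATH)
            };
        }
    }

    public static void main(String[] args) throws IOException {
        File jar = File.createTempFile("leasealert", ".jar");
        jar.deleteOnExit();
        writeJar(jar, "com.mycompany.myapp.MyClass", "lib/mail.jar");
        String[] entry = readEntryPoint(jar);
        System.out.println("Main-Class: " + entry[0] + ", Class-Path: " + entry[1]);
    }
}
```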
I am assuming that you already have ant installed on your build machine. If not, you can build your stuff elsewhere and just copy the output into the BUILD directory of rpm. You can install ant using the command sudo yum install apache-ant
Closing Remarks
The approach that I have chosen here is to use the build mechanism of rpmbuild to kick off the ant build process. It seems ant 1.7 has an rpm task that lets you kick off rpmbuild from an ant build process. Don’t get too excited though, because this approach requires the same spec file as well.
I have ant 1.6, which does not have that task, and I was too exhausted to upgrade it and generate an RPM this way. So I leave the details of how to do that as an exercise for the reader.
As always, comments welcome!
Firing Up LAMP Update 2
It has been a long time since I investigated this. Here is a quick recap of what happened while working on this earlier: I tried installing Quercus 3.1.3 on Resin 3.0.24 and the sample programs worked fine. Then I tried installing phpBB3 (which was at the RC6 stage) and ran into a bug in Quercus that stalled the installation. I googled the issue and found that other people had encountered the same thing. Folks at Caucho mentioned on their forum that they were working on it. Since all of the pieces involved were in a beta/non-production stage, I wasn’t too interested in pursuing this further.
Well, the situation has now changed. phpBB3 has been released as a production version and the Caucho folks have fixed the issue in Resin 3.1.4. So I decided to give it another try, and the following are my findings/observations.
1. Installing Quercus
Quercus is implemented as a servlet and is bundled in quercus.jar. It also depends on resin-util.jar and script10.jar. The important thing is that all these jar files are already bundled with Resin and are present in the <resin>/lib directory. So there is no explicit “installation” of Quercus as such; it is already there. For other app servers, these files should be added to their lib folder.
Since Quercus is exposed as a servlet, any webapp that wishes to serve PHP pages must configure QuercusServlet to service .php requests. This is done by putting the following entry in the webapp’s web.xml file:
<servlet>
<servlet-name>Quercus Servlet</servlet-name>
<servlet-class>com.caucho.quercus.servlet.QuercusServlet</servlet-class>
<!-- Tells Quercus to use the following JDBC database and to ignore the
arguments of mysql_connect().
-->
<init-param>
<param-name>database</param-name>
<param-value>jdbc/phpbb3</param-value>
</init-param>
</servlet>
<servlet-mapping>
<servlet-name>Quercus Servlet</servlet-name>
<url-pattern>*.php</url-pattern>
</servlet-mapping>
As you can see, I have also configured a database connection for Quercus to use. The free version of Quercus cannot use arbitrary database connections from a PHP script: regardless of the connection parameters specified in the PHP code, the connection specified by this entry is used. In this case, I have defined a JDBC connection in the conf/resin.conf file and mapped it to jdbc/phpbb3:
<database>
<jndi-name>jdbc/phpbb3</jndi-name>
<driver type="com.mysql.jdbc.Driver">
<url>jdbc:mysql://localhost:3306/phpbb3</url>
<user>abc</user>
<password>abcpass</password>
</driver>
<prepared-statement-cache-size>8</prepared-statement-cache-size>
<max-connections>20</max-connections>
<max-idle-time>30s</max-idle-time>
</database>
This completes the configuration required to use Quercus.
2. Installing phpBB3
I just exploded the phpBB3 distribution into the resin\webapps\phpbb3 folder and created a web.xml file (web.xml for phpbb3) containing the entries from step 1 in its WEB-INF folder. That’s it.
I started up the Resin server and accessed http://localhost:8080/phpbb3. I got the phpBB3 installation screen, followed the prompts, and everything went smoothly. No issues. After installation, I was able to create and access forums, topics, and posts. I must say that at this point I haven’t checked out all the functionality of phpBB3.
So overall, everything seems to be working fine. I will now play with this set up and try to integrate it with some JEE application.
Firing Up LAMP
I am currently facing a problem with an enterprise website. This website contains some webapps that are based on the JEE stack. Some of these webapps are custom developed and some are opensource apps. Some of the webapps are also hooked up to backend legacy systems through an ESB. So far so good!
Now, I want to leverage another opensource webapp, and this webapp is based on LAMP. While LAM is OK, it is the P that I have a problem with. Getting this webapp up and running would have been no issue had it been the only webapp I was interested in. But I need to integrate it with some backend components that are written in Java. Yes, PHP does have some modules that can be used to do so, but I feel that such an integration is not seamless. Also, I am not using Apache; my application server itself is my webserver, so I would need to figure out how to plug a PHP engine into it. Then there is a personal reason as well … I like to develop in Java and I would like to keep the PHP stuff to a minimum. I could have coded up the PHP app in JEE, but it feels like such a waste of time to reinvent the wheel.
So basically, I was looking for something that would let me easily integrate PHP with my JEE environment, and apparently I have hit the jackpot!!! Folks from Caucho, who are well known for their high performance servlet engine called Resin, have developed a cool technology called Quercus that implements a PHP engine in pure Java. Here is why I am drooling…
1. Non intrusive – It is just a war file that you can install in any servlet container. So I can keep my existing set up as it is. No messy mod settings.
2. Fast – PHP files are compiled to Java bytecode (just like JSP files are) and, as per their benchmark results, it runs up to 6x faster than the Apache-PHP combination.
3. Seamless Integration with Java – Take a look at this :
<?php
$my_bean = jndi("java:comp/env/ejb/my-session-bean");
$my_bean->doStuff("my-argument");
?>
You can get hold of any of your existing Java components and use them right from PHP. Not that you would want to do this on a regular basis, but you can if you need to as a tactical solution!
4. Breaks the barrier – Most importantly, I think it breaks the barrier between the Java and PHP worlds and lets their waters mix. For example, I can hook up an opensource CRM system based on PHP with an existing Java based OMS. PHP is well known to be excellent for quick prototyping of webpages, and with Quercus I can take advantage of that while still using JEE to develop complex enterprise applications.
Well, to me, it does sound really good on paper and in the next couple of weeks I am going to try this thing out and see if it really delivers what it promises. So here is what I am going to do …
1. Set up Quercus – first on Resin and then on Tomcat.
2. Make phpBB3 work on Quercus.
3. Hook up certain phpBB functionality with a JEE based webapp and with some session beans.
Stay tuned …
Terracotta and GridGain comparison…
One of my objectives with this exercise was to understand which of these tools we should use in which situations. It looks like we can draw some inferences with the help of this application.
Initially, before implementing this app on GridGain, I was not too sure how to implement it so that it would be similar to the implementation on Terracotta. In Terracotta, we shared a Jobs class instance across JVMs and started multiple Producers and Consumers that would look at the same Jobs instance and add/remove a Job to/from that instance. In effect, we were able to take advantage of multiple nodes by starting up either a Consumer or a Producer, as required, and achieved better performance. In other words, “sharing” enabled us to take advantage of multiple machines. So I was hung up on finding out how to share things on GridGain.
After discussing this with Dmitriy, I learned that it would not be correct to look at GridGain from a “sharing” perspective. We should look at it from a “task” perspective: which task can be made a unit of work and executed on other machines. In this application, Job.run() is such a task. In Terracotta, we isolated Producers and Consumers, while in GridGain, we isolated the Job.run() method.
However, one drawback of this application is that Job.run() is a completely independent task and does not depend on anything, so no sharing or coordination between two JVMs is required. In the Terracotta solution we were able to see how such coordination can be done among threads running on multiple machines, but our GridGain application doesn’t touch this aspect. I will try to modify it so that we can see how coordination can be achieved on GridGain. Any suggestions would be welcome!
Another important aspect of GridGain that we haven’t touched upon in this application is how to split a task, execute the parts on multiple nodes, bring back the results, join the results, and return the final output. I think this kind of a situation will take care of our sharing scenario as well.
Later…
Off my mark with GridGain…
Dmitriy from GridGain was kind enough to point out that for a simple application like this, not much code needs to be written or modified. I followed the suggestions in his comment on my previous post and was able to run the application.
All I did was the following -
1. Made Job class implement Serializable.
2. Used @Gridify annotation on Job.run() method. (I think I should have named it execute instead of run to avoid unnecessary confusion with Thread.run()).
3. In the Main, inserted GridFactory.start(), Thread.sleep() and GridFactory.stop().
public Main() {
try {
GridFactory.start();
new Producer(jobs).start();
new Consumer(jobs).start();//not sure how many Consumers I should create.
//new Consumer(jobs).start();
Thread.sleep(Long.MAX_VALUE);
} catch (Exception e) {
e.printStackTrace();
} finally {
GridFactory.stop(true);
}
}
4. Added libraries (gridgain jar, other supporting jars, aspectjweaver jar) to the project.
5. Added -DGRIDGAIN_HOME and -javaagent to the VM parameters. BTW, for some reason, GridGain refused to start (gridgain.bat from the cmd line) when GRIDGAIN_HOME was set to “C:\Program Files\gridgain-1.5.1\bin”. But when I changed the backslashes to forward slashes, it worked! This is on WinXP.
So after these steps, I was able to see that Job.run() was being shipped off to different nodes. At this time, I have a few questions -
1. What happens when a Consumer picks up a Job from Jobs and calls job.run()? Since the run() method is gridified, when is that Consumer ready to pick up the next job: immediately after run() is shipped off to be executed on another node, or after run() has finished executing on the other node? What I am expecting is that since the execution of run() is shipped off to another node, the consumer should pick up the next available job and ship it off to another node. Is that a valid expectation? Is this happening in this sample application?
2. This question depends on the answer to the first one. How many Consumers should I start from Main? Starting a consumer is done in code, while the number of grid nodes can be changed (by killing or starting nodes) at any time. So how do I make sure all the nodes of the grid are utilized? If a Consumer becomes ready to pick up a new Job as soon as it picks up one Job and sends it off to another node, then I need just one Consumer. If Consumer.run() on the Main node waits until Job.run() finishes executing on the remote node, then I need to start as many consumers as I have grid nodes.
Maybe Dmitriy can shed some light on these questions.
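Question 1 can at least be explored on a single machine. The sketch below is an analogy that assumes a gridified call behaves either like a plain blocking call or like an asynchronous hand-off. A java.util.concurrent thread pool stands in for the grid nodes. It does not use GridGain, so it cannot tell which of the two @Gridify actually does. DispatchDemo and its methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

public class DispatchDemo {

    /** A stand-in for Job.run(): it just sleeps for the given time. */
    static Runnable sleepyJob(long millis) {
        return () -> {
            try {
                Thread.sleep(millis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        };
    }

    /** Consumer that calls run() itself: it cannot take the next job until this one ends. */
    static long runBlocking(int jobs, long millis) {
        long start = System.nanoTime();
        for (int i = 0; i < jobs; i++) {
            sleepyJob(millis).run();
        }
        return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
    }

    /** Consumer that hands each job to a pool of "nodes" and moves on at once. */
    static long runDispatched(int jobs, long millis, int nodes) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(nodes);
        try {
            long start = System.nanoTime();
            List<Future<?>> pending = new ArrayList<>();
            for (int i = 0; i < jobs; i++) {
                pending.add(pool.submit(sleepyJob(millis))); // returns without waiting
            }
            for (Future<?> f : pending) {
                f.get(); // wait only at the end, for all jobs
            }
            return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("blocking:   " + runBlocking(8, 100) + " ms");
        System.out.println("dispatched: " + runDispatched(8, 100, 4) + " ms");
    }
}
```

If the gridified run() behaves like runBlocking, one Consumer keeps only one node busy at a time, and you would need roughly one Consumer per node. If it behaves like runDispatched, a single Consumer can keep all the nodes busy.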
Working with GridGain
From what I understand about GridGain, you basically have to identify a task (called a GridTask) that can be split up, and the splits (called GridJobs) can then be thrown onto multiple machines (called GridNodes). The original task (the one that we split up) can wait for the results of all the GridJobs and, once they are ready, combine them and return the final output. This requires a fair amount of code change (compared to Terracotta) if you already have something running that you want to run on multiple machines. Of course, in Terracotta you would probably spend that time on configuration instead of Java code changes. So, I believe, in terms of effort, there isn’t much difference.
Based on this understanding, I am still trying to figure out how to make use of GridGain in our producer-consumer scenario. It was fairly intuitive on Terracotta, but I am not yet sure how to proceed with GridGain. Maybe this scenario is more suitable for Terracotta.
Let’s see…
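Here is a single-JVM sketch of the split / execute / join idea described above, with a thread pool standing in for the grid nodes. It is only an analogy built on java.util.concurrent and does not use GridGain's GridTask or GridJob types. SplitJoinSketch is a hypothetical name, and summing a range of numbers stands in for real work.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class SplitJoinSketch {

    /** "Split": cut [from, to) into roughly equal chunks, one job per chunk. */
    static List<Callable<Long>> split(long from, long to, int parts) {
        List<Callable<Long>> jobs = new ArrayList<>();
        long size = (to - from + parts - 1) / parts; // ceiling division
        for (long lo = from; lo < to; lo += size) {
            final long start = lo;
            final long end = Math.min(lo + size, to);
            jobs.add(() -> {
                long sum = 0;
                for (long i = start; i < end; i++) sum += i;
                return sum;
            });
        }
        return jobs;
    }

    /** "Execute" every job on the pool, then "join" the partial results. */
    static long sum(long from, long to, int parts) throws Exception {
        ExecutorService nodes = Executors.newFixedThreadPool(parts);
        try {
            long total = 0;
            for (Future<Long> partial : nodes.invokeAll(split(from, to, parts))) {
                total += partial.get();
            }
            return total;
        } finally {
            nodes.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(sum(0, 1_000_000, 4)); // prints 499999500000
    }
}
```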
Working with Terracotta
Terracotta allows us to specify, through an XML based configuration file, which objects we want to share across JVMs. In our application, we want a single Jobs instance to be shared across JVMs. Our Producers will add jobs to the same Jobs instance and our Consumers will take jobs out of the same Jobs instance.
We will create a new starter class called Main that can spawn either a Producer or a Consumer thread depending on the command line argument:
The Main class:
public class Main {
Jobs jobs = new Jobs();
public Main(boolean isProducer){
if(isProducer) new Producer(jobs).start();
else new Consumer(jobs).start();
}
public static void main(String[] args){
new Main(args.length > 0 &&
"producer".equals(args[0]));
}
}
All we want to do now is to tell Terracotta to share jobs field of this Main class and make sure that all the Producers and Consumers use the methods of Jobs class in a mutually exclusive manner. The config file to do this is really simple -
<?xml version="1.0" encoding="UTF-8"?>
<tc:tc-config xmlns:tc="http://www.terracotta.org/config">
<application>
<dso>
<roots>
<root>
<field-name>Main.jobs</field-name>
</root>
</roots>
<locks>
<autolock>
<method-expression>* Jobs*.*(..)</method-expression>
<lock-level>write</lock-level>
</autolock>
</locks>
<instrumented-classes>
<include><class-expression>.*</class-expression></include>
</instrumented-classes>
</dso>
</application>
</tc:tc-config>
That’s all we need to do in terms of coding!!!
Running the application:
To run the application,
1. First we need to run the Terracotta Server using start-tc-server.bat available in the bin directory of Terracotta installation.
2. Launch our Main class using the dso-java.bat script (available in bin) instead of directly using java. We pass the config file name as a system property, followed by our class name:
c:\works\test\>dso-java -Dtc.config=tc-config-pc.xml Main consumer
I started up the consumer first and I can see that it gets stuck on the wait() because there is no Job in Jobs.
3. Start the producer in another cmd window:
c:\works\test\>dso-java -Dtc.config=tc-config-pc.xml Main producer
That’s it!!! As soon as the producer starts putting Job instances in jobs, the consumer starts getting them. No RMI, no EJB, no CORBA, and we have shared an object among multiple JVMs. No code changes were required in the main functionality to make it work on multiple JVMs.
To understand how it works under the hood, please do read the documentation at Terracotta website.
Parallel Computing in Java
I have been reading a bit about how to make our Java applications scalable. Besides the standard performance techniques one can apply to fine-tune an application, I was also trying to find out what I can do if my Java application is performing at its best but that is not enough. What if one server is just not enough to perform a task in the time required by an SLA? How can multiple machines be employed to perform such a task? This is different from clustering, which is done at the application server level and seems better suited to “load balancing” kinds of requirements. Clustering can be used to do multiple tasks at multiple places, but not one task at multiple places.
Enter, Terracotta and GridGain.
While Terracotta allows you to share your objects across JVMs, GridGain seems to be more of a pure parallel computing environment. Apparently, both of these tools can be used to make your application take advantage of multiple machines.
The Approach
To understand how they work, I am going to implement a simple producer-consumer scenario where there are producers of Jobs and consumers that take up those Jobs. It is the standard multi-threaded producer-consumer scenario, except that we are going to have the consumers (and the producers as well, if required) running on multiple machines instead of as multiple threads on one machine. The idea is to employ multiple machines to do the jobs instead of multiple threads working on the same machine.
Let’s see some code now …
The basic producer consumer scenario -
First let’s define what our producers and consumers will work on –
A Job :
//imports
public class Job {
int jobduration = (int) (Math.random()*5000);
public void run(){
try {
Thread.sleep(jobduration);
System.out.println("Job finished in " + jobduration + " millis.");
} catch (InterruptedException ex) {
ex.printStackTrace();
}
}
}
Jobs – a container to hold all the jobs that need to be done :
import java.util.LinkedList;
import java.util.Queue;
public class Jobs {
//we can also use BlockingQueue and avoid writing our own synchronization logic
//Come to think of it, if we use BlockingQueue, we won't need Jobs class at all.
//But this is good for learning the Terracotta stuff.
//Typed as Queue<Job> so that list.remove() returns a Job and getJob() compiles.
private Queue<Job> list = new LinkedList<Job>();
public Job getJob() {
while(true)
{
synchronized(this)
{
try{
if(!list.isEmpty()) return list.remove();
else this.wait();
}catch(Exception e){
e.printStackTrace();
}
}
}
}
public void addJob(Job job) {
synchronized(this){
list.offer(job);
this.notifyAll();
}
}
}
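As the comment in Jobs says, a BlockingQueue can replace the hand-written wait()/notifyAll() logic. Its take() blocks while the queue is empty, and put() wakes up a waiting taker. The sketch below shows this for a single JVM. To stay self-contained, it queues integer job ids instead of Job objects. The NO_MORE_JOBS sentinel that stops the consumer is a hypothetical convention. The sketch does not show how Terracotta would share such a queue across JVMs.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class BlockingQueueJobs {

    /** Hypothetical sentinel telling the consumer there is no more work. */
    static final int NO_MORE_JOBS = -1;

    /** Produces jobCount job ids, consumes them on another thread, returns what was consumed. */
    static List<Integer> runOnce(int jobCount) throws InterruptedException {
        BlockingQueue<Integer> jobs = new LinkedBlockingQueue<>();
        List<Integer> done = Collections.synchronizedList(new ArrayList<>());

        Thread consumer = new Thread(() -> {
            try {
                while (true) {
                    int job = jobs.take();          // blocks while the queue is empty
                    if (job == NO_MORE_JOBS) break;
                    done.add(job);                  // stand-in for job.run()
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        consumer.start();

        for (int i = 0; i < jobCount; i++) {
            jobs.put(i);                            // wakes the consumer if it is waiting
        }
        jobs.put(NO_MORE_JOBS);
        consumer.join();                            // all consumed jobs are visible after join
        return done;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(runOnce(5)); // prints [0, 1, 2, 3, 4]
    }
}
```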
Let’s now look at the producer and the consumer code.
The producer :
//imports...
public class Producer extends Thread {
Jobs jobs;
public Producer(Jobs j){
jobs = j;
}
public void run(){
while(true){
try {
//sleep randomly for up to 5 seconds.
Thread.sleep((int) (Math.random()*5000));
jobs.addJob(new Job());
} catch (InterruptedException ex) {
ex.printStackTrace();
}
}
}
}
The consumer :
//imports...
public class Consumer extends Thread{
Jobs jobs;
public Consumer(Jobs j){
jobs = j;
}
public void run(){
while(true){
Job job = jobs.getJob();
if(job!=null) job.run();
}
}
}
In a regular, single JVM application, we would create a shared Jobs instance and as many Producer and Consumer threads as we want, passing them all the same Jobs instance.
For example:
public class OldMain {
Jobs jobs = new Jobs();
public OldMain(){
new Producer(jobs).start();
new Consumer(jobs).start();
new Consumer(jobs).start();
}
public static void main(String[] args){
new OldMain();
}
}
Here, we are running two Consumers in the same JVM, which doesn’t really add much value unless the JVM is running on multiple CPUs. So, what we want to do is to run Consumers on multiple machines, all picking up jobs from the same Jobs instance.