rpm packaging java software
Saturday, January 5th, 2008
You may know that rpm supports “named” dependencies (is that the right name?). That is, a package can require an arbitrary string that some other package provides, whether or not that string is a package name.
Take curl as an example. Let's see what its library package, libcurl4, provides:
# rpm -q --provides libcurl4
libcurl.so.4()(64bit)
libcurl4 = 7.16.4-16.2
You see the package provides the libcurl.so.4()(64bit) symbol. Now let's explore the requires:
# rpm -q --requires libcurl4
curl-ca-bundle
/sbin/ldconfig
/sbin/ldconfig
rpmlib(PayloadFilesHavePrefix) <= 4.0-1
rpmlib(CompressedFileNames) <= 3.0.4-1
libc.so.6()(64bit)
libc.so.6(GLIBC_2.2.5)(64bit)
libc.so.6(GLIBC_2.3)(64bit)
libc.so.6(GLIBC_2.3.4)(64bit)
libc.so.6(GLIBC_2.4)(64bit)
libcrypto.so.0.9.8()(64bit)
libdl.so.2()(64bit)
libdl.so.2(GLIBC_2.2.5)(64bit)
libidn.so.11()(64bit)
libssl.so.0.9.8()(64bit)
libz.so.1()(64bit)
rpmlib(PayloadIsBzip2) <= 3.0.5-1
curl depends on zlib, but the package named zlib does not have to be installed. Any package “providing” the symbol “libz.so.1()(64bit)” fulfills the requirement. A mega-bundle rpm that includes lots of libraries could fulfill curl’s requirement just as well.
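To make the matching concrete, here is a minimal sketch of a capability lookup. It is a toy model, not how rpm is implemented: the who_provides function and the tab-separated "package, capability" input format are made up for this illustration. On a real system, rpm -q --whatprovides 'libz.so.1()(64bit)' answers the same question.

```shell
# Toy model: read "package<TAB>capability" lines on stdin and print every
# package that provides the requested capability (exact string match).
# who_provides and the input format are hypothetical, for illustration only.
who_provides() {
    awk -F '\t' -v want="$1" '$2 == want { print $1 }' | sort -u
}

# Two packages provide the same symbol; either one satisfies the requirement.
printf '%s\t%s\n' \
    zlib 'libz.so.1()(64bit)' \
    megabundle 'libz.so.1()(64bit)' \
    megabundle 'libidn.so.11()(64bit)' \
    libcurl4 'libcurl.so.4()(64bit)' |
    who_provides 'libz.so.1()(64bit)'
```

The lookup prints both megabundle and zlib, because the requirement names the symbol and not a package.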
Advantages:
They are generated automatically from ELF binaries, Perl modules, etc. AFAIK Debian dependencies name only packages, so if the package you depend on stops providing a library, you will not know until you get an error.
You don’t need to maintain the requirements manually; the spec file only lists what you need to build the package.
The requirements are more granular. If you depend on a big package, you rarely need everything in it. With named dependencies there is no need to split packages into thousands of smaller pieces.
Libraries are not coupled to package names. AFAIK Debian does not support this: a virtual package called “zengine” or whatever would have to be created for packages to depend on.
Disadvantages:
More noise. Metadata gets bigger.
More processing is needed by the solver, as there are many more provides and requires.
This technique is used not only for native libraries but also for other languages. Look at mono:
# rpm -q --provides mono-web
mono-web-forms
mono-web-services
mono-remoting
mono(Mono.Http) = 1.0.5000.0
mono(Mono.Http) = 2.0.0.0
mono(System.Runtime.Remoting) = 1.0.5000.0
mono(System.Runtime.Remoting) = 2.0.0.0
mono(System.Runtime.Serialization.Formatters.Soap) = 1.0.5000.0
mono(System.Runtime.Serialization.Formatters.Soap) = 2.0.0.0
mono(System.Web) = 1.0.5000.0
mono(System.Web) = 2.0.0.0
mono(System.Web.Services) = 1.0.5000.0
mono(System.Web.Services) = 2.0.0.0
Did I mention that they are also used in kernel modules to match driver compatibility? The kernel package provides certain interface names, with a version which is actually some kind of hash of the signature of those interfaces. So a driver will depend on those.
# rpm -q --provides kernel-default
kernel-default-nongpl
kernel = 2.6.22.9-0.4
k_deflt
k_numa
k_smp
smp
kernel-smp
kernel(drivers_ata) = 8c8c26cd48be2c29
kernel(drivers_char_tpm) = c2f46bb4192faaf6
kernel(fs_nfsd) = 5302e5e83fcef713
kernel(drivers_media_dvb_frontends) = 6c12beb7312724c0
kernel(drivers_video_matrox) = 9d4717fb264df90d
kernel(drivers_block_paride) = c3185e6447e90578
...
Now, let's get back to Java. I tried to package jruby, microemu and other small things. The number of jars you need for the build that are not yet available in the distribution is amazing. You give up after three iterations, and the binary tarball with the jars sits there as a temptation.
Binary jars, however, sometimes include everything you need to run the program, including many Java packages (namespaces) from external projects. For example, microemu includes nanoxml and asm.
Meanwhile, the dependency information that current Java rpm packages carry is minimal:
# rpm -q --provides ant
apache-ant
ant = 1.7.0-30
Even source packages usually ship, besides the full source, all the binary jars required to build it in a lib directory. We don’t want to include these jars in the rpm package (our goal is to include only what we compiled). But if these jars are not available in the distribution, we also don’t know the name of the package that would provide them.
However, we could build the source using those jars, examine the resulting jar, and make the rpm require the Java namespaces that jar requires. The spec file could also remove from the resulting jar the namespaces that don’t belong to the package, so that the rpm does not provide its own copy of those namespaces.
So, I discussed this with Pascal. He hacked up a bash line to find the provides quicker than I did:
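for jar in *.jar; do
    # List the class files, cut each path down to its directory, de-duplicate
    jar tf "$jar" | grep '\.class$' | sed 's|/[^/]*\.class$||' | sort -u |
    while read -r package; do
        # Turn the directory path into a dotted namespace: org/foo -> org.foo
        echo "Provides: java(${package//\//.}) = %{version}-%{release}"
    done
done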
Which for ant, results in:
Provides: java(org.apache.tools.ant) = %{version}-%{release}
Provides: java(org.apache.tools.ant.dispatch) = %{version}-%{release}
Provides: java(org.apache.tools.ant.filters) = %{version}-%{release}
Provides: java(org.apache.tools.ant.filters.util) = %{version}-%{release}
Provides: java(org.apache.tools.ant.helper) = %{version}-%{release}
Provides: java(org.apache.tools.ant.input) = %{version}-%{release}
Provides: java(org.apache.tools.ant.listener) = %{version}-%{release}
... much more
However, we still need the requires information. We discussed using jarjar, but Pascal hacked up a better way using jdepend. The output looks like:
Provides: java(org.springframework.dao)
Requires: java(org.springframework.core)
Provides: java(org.springframework.dao.support)
Requires: java(org.apache.commons.logging)
Requires: java(org.springframework.beans.factory)
Requires: java(org.springframework.dao)
Requires: java(org.springframework.util)
Provides: java(org.springframework.transaction)
Requires: java(org.springframework.core)
Provides: java(org.springframework.transaction.annotation)
Requires: java(org.springframework.transaction.interceptor)
Provides: java(org.springframework.transaction.interceptor)
Requires: java(org.aopalliance.aop)
Requires: java(org.aopalliance.intercept)
Requires: java(org.apache.commons.logging)
Requires: java(org.springframework.aop)
Requires: java(org.springframework.aop.framework)
Requires: java(org.springframework.aop.framework.adapter)
Requires: java(org.springframework.aop.support)
Requires: java(org.springframework.aop.target)
Requires: java(org.springframework.beans.factory)
Requires: java(org.springframework.beans.propertyeditors)
...
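Some of the Requires entries in such a listing are satisfied by the jar itself (org.springframework.dao is both provided and required in the listing). A minimal sketch of the filtering step, assuming the output has been split to one Provides: or Requires: entry per line; the external_requires function name is hypothetical and not part of Pascal's script:

```shell
# Print each "Requires:" entry that has no matching "Provides:" entry in the
# same input, once each. Input: one "Provides: X" or "Requires: X" per line.
# external_requires is a hypothetical helper name.
external_requires() {
    awk '
        $1 == "Provides:" { provided[$2] = 1 }
        $1 == "Requires:" { required[$2] = 1 }
        END {
            for (cap in required)
                if (!(cap in provided))
                    print "Requires: " cap
        }
    ' | sort
}

# Small sample in the same shape as the listing.
external_requires <<'EOF'
Provides: java(org.springframework.dao)
Requires: java(org.springframework.core)
Provides: java(org.springframework.dao.support)
Requires: java(org.apache.commons.logging)
Requires: java(org.springframework.dao)
EOF
```

For this sample only the commons.logging and springframework.core requirements remain, since org.springframework.dao is provided by the same jar.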
The missing piece would be stripping unwanted namespaces from the final compiled jar. This can be done using either jarjar or manually by unpacking the jar, deleting some directories and repacking it.
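The manual route could look roughly like this. It is only a sketch: strip_namespaces is a hypothetical helper that works on an already unpacked jar directory, and the unpack and repack steps around it (shown as comments, with placeholder file names) assume the JDK's jar tool is on the PATH.

```shell
# Remove the directories of the given Java namespaces from an unpacked jar.
# Usage: strip_namespaces <unpacked-dir> <namespace>...
# strip_namespaces is a hypothetical helper name, for illustration only.
strip_namespaces() {
    dir=$1
    shift
    for ns in "$@"; do
        # A namespace such as org.objectweb.asm lives in org/objectweb/asm.
        path=$(printf '%s' "$ns" | tr . /)
        # ${dir:?} aborts instead of deleting from / if dir is empty.
        rm -rf -- "${dir:?}/$path"
    done
}

# Possible use, with placeholder file names:
#   mkdir unpacked && (cd unpacked && jar xf ../app.jar)
#   strip_namespaces unpacked org.objectweb.asm
#   jar cfm app-stripped.jar unpacked/META-INF/MANIFEST.MF -C unpacked .
```

Note that removing a namespace directory also removes every sub-namespace below it, which is usually what you want for a bundled third-party library.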
This would make it much easier to build Java packages from source without needing the full dependency tree prepackaged. The dependencies could then be packaged later, one by one, and in a much more granular way. It would also help when building directly from binary jars, resulting in much more granular rpms.
What do you think? What other things can be improved when packaging java software?






