Joshua Bahnsen wrote on the yum mailing list (summarizing):
Is there support for compression types other than gzip for the yum metadata? For example, bzip2 or LZMA? I am keeping track of 16 RHEL channels; using createrepo with the standard gzip, they total 1.4 GB of metadata. Compressing those same XML documents with LZMA yields a total of 140 MB. That’s a 10x saving overall, which I think is worth a look.
James Antill asked some questions, and also pointed out some issues:
These we can probably try and help with, but we’ve been asking and waiting for 12+ months for RHN and CentOS to move to generating .sqlite files server side. So I wouldn’t bet that we can help in the general case, quickly. Plus any client side support for lzma probably wouldn’t get into 5.x until at least 5.5 (more likely 5.6 or 5.7). So realistically you are targeting Fedora and 6.x for a change like this.
For Fedora the main issue is that they use the sqlite version of the metadata, generated on the server side, so they are more interested in how those files compress.
openSUSE still downloads the raw metadata, mostly because our sat-solver is really fast at converting it to the solv cache. We have thought about shipping solv files in the metadata some day, but we still haven’t looked at that in detail.
However, I wanted to see how much of a difference using lzma to compress the metadata would make for openSUSE. I took the 11.0 update repository, as it has quite a lot of content, which makes it a big download whenever it changes.
For that, only a minimal patch to /usr/bin/repo2solv.sh is needed. You can get that patch here.
Here are my metadata size results:
I also looked at how fast the metadata parses. For that I parsed each file 3 times and averaged each time component, dropping the disk caches beforehand.
As you can see, the repo is 4x smaller and parses 2x faster, which is still very nice (the 10x gain seems to be for files like other.xml, which we don’t download).
You can get the data from here.
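For anyone who wants to try a similar comparison, here is a minimal sketch in Python. It is not the script used for the numbers above: the sample document is made up, the function names are hypothetical, and the timing covers only in-memory decompression plus XML parsing, so it ignores the disk and download effects measured in this post.

```python
import gzip
import lzma
import time
import xml.etree.ElementTree as ET


def compressed_sizes(raw: bytes) -> dict:
    """Return the byte size of `raw` uncompressed, gzip'ed and LZMA'ed."""
    return {
        "raw": len(raw),
        "gzip": len(gzip.compress(raw, compresslevel=9)),
        # FORMAT_ALONE is the legacy .lzma container (as opposed to .xz).
        "lzma": len(lzma.compress(raw, format=lzma.FORMAT_ALONE)),
    }


def average_parse_time(blob: bytes, decompress, runs: int = 3) -> float:
    """Decompress and parse `blob` `runs` times; return mean wall-clock seconds."""
    total = 0.0
    for _ in range(runs):
        start = time.perf_counter()
        ET.fromstring(decompress(blob))
        total += time.perf_counter() - start
    return total / runs


def sample_metadata(packages: int = 2000) -> bytes:
    """Build a made-up, repetitive XML document standing in for primary.xml."""
    rows = "".join(
        f"<package><name>pkg{i}</name><arch>x86_64</arch>"
        f'<version epoch="0" ver="1.{i}" rel="1"/></package>'
        for i in range(packages)
    )
    return f"<metadata>{rows}</metadata>".encode("utf-8")


if __name__ == "__main__":
    raw = sample_metadata()
    for name, size in compressed_sizes(raw).items():
        print(f"{name:5s} {size:8d} bytes")
    gz_blob = gzip.compress(raw, compresslevel=9)
    lzma_blob = lzma.compress(raw, format=lzma.FORMAT_ALONE)
    print("gzip parse:", average_parse_time(gz_blob, gzip.decompress))
    print("lzma parse:", average_parse_time(lzma_blob, lzma.decompress))
```

To run it against real metadata, replace `sample_metadata()` with the bytes of an uncompressed primary.xml; the ratios will depend heavily on the repository.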
A change like this is still challenging:
- We need upstream support to keep repos compatible
- It can only be done in Factory, and people need to update the software management stack before the format is changed.
What do you think?

Cool idea, but when you do that, do it right – and make it extensible in the future, so that further changes can coexist.
And, for that matter – would it not be possible to simply add an additional metadata file right now, with say an .lzma extension, and stacks which support that format would grab that instead of the raw metadata? Why is this an incompatible change …?
True, this would require one further retrieval attempt which could turn into a miss (and thus a slowdown), but that too can be cached and only refreshed less frequently?
Those are some pretty impressive results… I definitely think users and developers would notice and appreciate the speed improvements.
This looks great! Maybe lzma support could be pushed as an update to zypp in 11.1 to make things smoother.
It is great to see people working on stuff like this.
Lars, you are right…however in practice rpm-md is not very specified.
Fedora uses different resource types for their server side caches (sqlite), therefore other clients can ignore those resource types (as we ignore changelogs).
However, using different compression for the same resource is not specified, so both files would share the resource type, and that could make the behavior undefined: does the client choose one, or download both?
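To make the ambiguity concrete, here is a rough Python sketch of one policy a client could adopt when two entries share a resource type: pick the first suffix it supports, in its own order of preference. The preference order, the function names and the sample repomd.xml fragment are all assumptions for illustration; nothing like this is defined by rpm-md.

```python
import xml.etree.ElementTree as ET

# Made-up repomd.xml fragment with two entries of type "primary".
SAMPLE_REPOMD = """<repomd xmlns="http://linux.duke.edu/metadata/repo">
  <data type="primary"><location href="repodata/primary.xml.gz"/></data>
  <data type="primary"><location href="repodata/primary.xml.lzma"/></data>
  <data type="other"><location href="repodata/other.xml.gz"/></data>
</repomd>"""


def local_name(tag: str) -> str:
    """Strip an '{namespace}' prefix from an ElementTree tag."""
    return tag.rsplit("}", 1)[-1]


def pick_locations(repomd_xml: str, supported=(".lzma", ".gz")) -> dict:
    """Map each resource type to the single href this client would fetch.

    `supported` lists the suffixes the client can read, most preferred first
    (a hypothetical client-side setting, not something the repo declares).
    """
    chosen = {}
    for element in ET.fromstring(repomd_xml).iter():
        if local_name(element.tag) != "data":
            continue
        href = next(
            (c.get("href") for c in element if local_name(c.tag) == "location"),
            None,
        )
        if href is None:
            continue
        position = next(
            (i for i, suffix in enumerate(supported) if href.endswith(suffix)),
            None,
        )
        if position is None:
            continue  # compression this client cannot read
        kind = element.get("type")
        if kind not in chosen or position < chosen[kind][0]:
            chosen[kind] = (position, href)
    return {kind: href for kind, (_, href) in chosen.items()}


if __name__ == "__main__":
    print(pick_locations(SAMPLE_REPOMD))
    print(pick_locations(SAMPLE_REPOMD, supported=(".gz",)))
```

An older client would still have to cope with seeing the same type twice, which is exactly the part that is not specified.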
If the objective is to reduce download size, shouldn’t “delta-metadata” be created? Like DeltaRPMs but for metadata.
I hope SUSE 11.2 can get this new technology.
Duncan, IMHO the client shouldn’t download both. We could, for example, perform a conditional HTTP GET request for the lzma version and, if that returns 404, retrieve the raw metadata instead.
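A minimal Python sketch of that fallback, under stated assumptions: it uses a plain GET with a 404 fallback rather than a true conditional request, the `repodata/<name><suffix>` layout and the function names are hypothetical, and the optional `missing` set is one simple way to remember a miss so that it is not retried on every refresh.

```python
import urllib.error
import urllib.request


def http_get(url: str) -> bytes:
    """Fetch `url`; urlopen raises urllib.error.HTTPError on 4xx/5xx."""
    with urllib.request.urlopen(url) as response:
        return response.read()


def fetch_metadata(base_url, name="primary.xml", suffixes=(".lzma", ".gz"),
                   get=http_get, missing=None):
    """Try each suffix in order and fall through to the next one on a 404.

    `missing` is an optional set acting as a negative cache: URLs that
    returned 404 earlier are skipped until the caller clears the set.
    Returns a (url, body) tuple for the first variant that exists.
    """
    missing = set() if missing is None else missing
    for suffix in suffixes:
        url = f"{base_url.rstrip('/')}/repodata/{name}{suffix}"
        if url in missing:
            continue
        try:
            return url, get(url)
        except urllib.error.HTTPError as error:
            if error.code != 404:
                raise  # only "not found" triggers the fallback
            missing.add(url)
    raise FileNotFoundError(f"no {name} variant found under {base_url}")
```

Passing the same `missing` set across refreshes keeps the extra request from being repeated; how often to clear it is a policy decision for the client.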