October 16, 2009

What is with rpm-metadata (createrepo) and yum on openSUSE?

A long time ago, I wrote about our interoperability efforts built around the rpm-metadata format and first-class PackageKit support.

On the rpm-metadata side, however, even though we depend heavily on these tools, the situation was far from ideal:

  • We usually need extensions to rpm-metadata to support the features that make openSUSE software management more powerful than other tools
  • We are in continuous talks with the yum team to make those extensions common, so they can be standardized instead of staying in suseinfo/susedata.xml
  • Some of those extensions were implemented in createrepo 0.9.x
  • We are stuck with createrepo 0.4.x plus a tall stack of patches
  • createrepo 0.9.x requires a recent yum
  • yum on openSUSE is unmaintained, and not included in the distribution
  • openSUSE Build Service and other infrastructure depend on a proven createrepo (which means they depend on the custom patches)

This situation is not sustainable in the long term, so the following action plan was agreed among the various stakeholders:

  • we will update yum and createrepo to the latest upstream versions, and maintain them in the system:packagemanager project
  • all patches except a couple were dropped, so don’t expect those versions to work flawlessly for now. We will re-evaluate the dropped patches one by one and either upstream or discard them
  • openSUSE infrastructure will freeze their production version from our project once the newer versions work, instead of maintaining a fork
  • as we don’t want to include yum in the default package selection, but createrepo depends on it, yum was split into yum and yum-common (libraries).
  • yum is still available in system:packagemanager, and will be kept up-to-date. We are interested in competition, as it makes the [ZYpp][10] team work harder :-)
  • I will improve enhancerepo (for custom tag extensions generation) and include it in this project in the near future
  • Most patches in the current packages were one-line fixes where the developer did not spend 5 minutes upstreaming them. This attitude created an uncomfortable patch mess. We will change this by upstreaming fixes or simply refusing to carry those patches in the package.

If you use createrepo or yum in your infrastructure, we invite you to contribute to this project! yum-3.2.25 was released a couple of days ago, and it is already available there.

[10]: http://en.opensuse.org/Libzypp

April 29, 2009

Making package management even faster?

Joshua Bahnsen wrote on the yum mailing list (summarizing):

Is there support for compression types other than gzip for the yum metadata? For example, bzip2 or LZMA? I am keeping track of 16 RHEL channels, using createrepo with the standard gzip I am totaling 1.4 GB of metadata. Compressing those same XML documents with LZMA yields a total of 140 MB. That’s 10x savings overall, I think that’s worth a look.

James Antill asked some questions, and also pointed out some issues:

These we can probably try and help with, but we’ve been asking and waiting for 12+ months for RHN and CentOS to move to generating .sqlite files server side. So I wouldn’t bet that we can help in the general case, quickly. Plus any client side support for lzma probably wouldn’t get into 5.x until at least 5.5 (more likely 5.6 or 5.7). So realistically you are targeting Fedora and 6.x for a change like this.

For Fedora the main issue is that they use the sqlite version of the metadata generated on the server side, so they are more interested in how those files compress.

openSUSE still downloads the raw metadata, mostly because our sat-solver is really fast at converting it to the solv cache. We have thought about shipping solv files in the metadata some day, but we still haven’t looked at that in detail.

However, I wanted to see how much difference using lzma to compress the metadata would make for openSUSE. I took the 11.0 update repository, as it has quite a lot of stuff in it, which makes it big to download when it changes.

For that, only a minimal patch to /usr/bin/repo2solv.sh is needed. You can get that patch here.

Here are my metadata size results:

I also looked at how fast the parsing of the metadata goes. For that I parsed each file 3 times and averaged each time component, dropping the disk caches beforehand.

As you can see, the repo is 4x smaller and parses 2x faster, which is still very nice (the 10x gain seems to apply to files like other.xml, which we don’t download).

You can get the data from here.
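
For readers who want to try a comparison of this kind on their own files, here is a minimal sketch using only the Python standard library. It is not the script used for the measurements in this post: the sample document is synthetic (it is not real primary.xml), and the helper names `make_sample_metadata` and `compressed_sizes` are made up for illustration, so the ratios it prints will differ from the ones reported here.

```python
import bz2
import gzip
import lzma


def make_sample_metadata(n_packages=2000):
    """Build a synthetic, repetitive XML document (NOT real primary.xml)."""
    entries = []
    for i in range(n_packages):
        entries.append(
            '<package type="rpm"><name>pkg%d</name>'
            '<arch>i586</arch><version epoch="0" ver="1.%d" rel="%d"/>'
            '<summary>Sample package number %d</summary></package>'
            % (i, i % 10, i % 7, i)
        )
    return ("<metadata>" + "".join(entries) + "</metadata>").encode("utf-8")


def compressed_sizes(data):
    """Return the size in bytes of `data` under each compressor."""
    return {
        "raw": len(data),
        "gzip": len(gzip.compress(data)),
        "bzip2": len(bz2.compress(data)),
        "lzma": len(lzma.compress(data)),
    }


if __name__ == "__main__":
    sizes = compressed_sizes(make_sample_metadata())
    for name, size in sizes.items():
        print("%-6s %8d bytes (%.1fx smaller than raw)"
              % (name, size, sizes["raw"] / size))
```

To test real metadata, replace `make_sample_metadata()` with the bytes of an uncompressed metadata file.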

A change like this is still challenging:

  • We need upstream support to keep repos compatible
  • It can only be done in Factory, and people need to update the software management stack before the format is changed.

What do you think?

October 13, 2008

Towards trusted third party repositories

I got pointed to Dan Kegel’s post on the Linux Foundation’s packaging mailing list called “Towards trusted third party repositories”. I am not subscribed to the list, so I am commenting here.

I’ve been intrigued by http://en.opensuse.org/Standards/OneClickInstall for some time now. (That’s a way to provide a one-click web install experience for .deb/.rpm/.psi etc. packages, implemented as a mime type handler that parses a simple .xml file pointing to the package/repository appropriate for each distro.) When this idea was brought up on the Packagekit mailing list, it generated lots of negative feedback. The summary at http://packagekit.org/pk-faq.html#1-click-install gives a bunch of non-central objections, followed by the central objection that one cannot trust third party repositories: “Allowing to easily add third party repositories and install third party software without a certification infrastructure is like opening the gates to hell”

One Click Install is only a simple description of how to add a repository and some software. Security is not a property of this description, but of the package manager. A malicious one click install file can point ZYpp to “hell”, but “hell”:

  • Is either signed with an untrusted key, in which case the user needs to trust it explicitly.
  • Or is signed with the distribution key, which is already in the trusted keyring, so it just works.

This is a real problem. Here are a couple risks:

No, it is not. It depends on the package manager you are using. Basic security here is independent of one click install.

1) users might click on malware sites and add completely malicious sites to their repository lists

Again, this is not true for ZYpp, and I guess not all package managers allow downloading metadata from a repository without trusting it first.

2) a compromised third-party repository might update system packages maliciously.

If you trusted it, there is nothing you can do.

3) several genuinely well-intentioned repositories might include conflicting versions of a commonly needed package not provided by the system repositories.

This has nothing to do with one click install. The Packman repository does it already. This is a dependency resolution and vendor management problem: ZYpp, for example, does not allow an implicit vendor jump on update (only on dist-upgrade), and a vendor jump is a conflict you need to approve manually. Only package managers that treat all packages with the same name as the same package suffer here.

After mulling this problem over for a long time, two ideas came to mind: 1) Since the distribution is trusted, it could decide to trust some third-party repositories. For instance, it might decide to trust Adobe’s hypothetical repository so that people could get flash and air updates straight from the source.

This is possible now:

  • You can import Adobe’s key into the distribution together with the distribution key, so it goes automatically to the trusted keyring
  • You can create metadata in the distribution side, linking to Adobe’s packages, and sign the metadata with the distribution key
  • You can do nothing and let the user trust explicitly the repository

This idea of using the distribution as arbiter of trust for third party repositories could be extended to games publishers, etc. This could provide a partial solution to the first threat listed above; if the “good” third-party repositories are already known to the distribution, there’s less need for users to be doing something dangerous like deciding on their own to trust a random third-party repo. This addresses the first threat identified above.

I think repository trust and the required package manager infrastructure have nothing to do with one click install. They just need to be there.

2) A simple way to keep repositories from updating packages they shouldn’t is to have package managers enforce some sort of namespacing. e.g. Adobe’s repository could be allowed to only update packages whose names start with “adobe-”. (System repositories would continue to be able to update any package at all.) This addresses the second and third threats identified above.

Why invent something new? You already have the vendor tag. A package should be considered an update candidate only for installed packages from the same vendor.

Imagine you have mediaplayer-1.0 from the distribution, without mp3 support. Then a 3rd party repo offers a compiled 2.0 version with mp3 support but without other features, and then the distribution offers 3.0. You would be gaining and losing features all the time as you update. The same package name does not mean it is the same package. If your package manager assumes that, it is a bug. If dependency solving has no solution other than jumping vendor, raise a conflict and make the user switch explicitly. Once they do, the update candidates will be the packages from the new vendor.
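
The vendor rule can be sketched in a few lines. This is only an illustration of the idea, not ZYpp code: the `Package` record, the `update_candidate` function and the naive dotted-number version comparison are all assumptions made for the example (real rpm version comparison is more involved).

```python
from collections import namedtuple

# Hypothetical minimal package record; real packages carry far more data.
Package = namedtuple("Package", "name version vendor")


def version_key(version):
    """Naive dotted-number comparison; an assumption, not rpm's algorithm."""
    return tuple(int(part) for part in version.split("."))


def update_candidate(installed, available):
    """Newest available package with the same name AND vendor, or None."""
    same_vendor = [
        p for p in available
        if p.name == installed.name
        and p.vendor == installed.vendor
        and version_key(p.version) > version_key(installed.version)
    ]
    return max(same_vendor, key=lambda p: version_key(p.version), default=None)


if __name__ == "__main__":
    installed = Package("mediaplayer", "1.0", "distro")
    available = [
        Package("mediaplayer", "2.0", "thirdparty"),
        Package("mediaplayer", "3.0", "distro"),
    ]
    # The third-party 2.0 build is never offered as an implicit update.
    print(update_candidate(installed, available))
```

With this rule the mediaplayer example goes straight from the distribution's 1.0 to the distribution's 3.0, and the third-party build only enters the picture if the user explicitly switches vendor.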

I think something like this is going to be needed before we can have a thriving — and safe – ecosystem of ISVs providing easily-downloaded-and-installed binary packages for Linux. What do people think about the package namespacing idea? – Dan

I don’t think reinventing the vendor tag and repository signatures is a good idea.

The problem is not at this level. IMHO the only thing we can do is improve the usability of the security chain. For the user it is meaningless to trust a repository with gpg key 0x32432432, and they will probably just click “yes”. The pending task is to make the cryptographic chain something that makes sense to the user.

October 6, 2008

enhancerepo 0.3.2

A new version of enhancerepo (0.3.2) is building in the build service.

The new feature is support for generating updateinfo metadata (patches) in a simple but very automatic way. It is designed especially for testing purposes, but it may be sufficient for people wanting to generate patches for their own repositories.

Using the --generate-update pkg1,pkg2.. option, enhancerepo will look in your repository (and additionally in another base directory) for those packages. If a package has multiple versions available (including those in the base directory), it will create update metadata in the repoparts directory. If you run enhancerepo with the --updates option (either in the same run or not), it will take all repoparts and index them in updateinfo.xml. This allows you to manually edit the patches before indexing them, or to mix automatically generated ones with hand-crafted ones.

As additional coolness, it will scan the changes for bug references to generate the description, the references and the update type (for example, changes containing CVE references or vulnerabilities are automatically tagged as security).

For example, I have 2 amarok packages in my repo, plus one in the 11.0 base directory. This command will look at the 3 rpms, and start looking backwards till it finds some changes in the changelog. You can specify multiple packages per “patch”.

# enhancerepo --generate-update amarok --updates --updates-base-dir /space/repo/11.0 /space/repo/duncan2
generating update...
3 versions for 'amarok'
Found change amarok-1.4.10-17-i586.i586 and amarok-1.4.9.1-53-i586.i586.
'amarok' has 1 entries (68/67)
Saving update part to '/space/repo/duncan2/repoparts/update-amarok-1.xml'.
Adding update /space/repo/duncan2/repoparts/update-amarok-1.xml
Saving /space/repo/duncan2/repodata/updateinfo.xml.gz ..
Adding /space/repo/duncan2/repodata/updateinfo.xml.gz to 
       /space/repo/duncan2/repodata/repomd.xml index
repodata/updateinfo.xml.gz already exists. Replacing.
Saving /space/repo/duncan2/repodata/repomd.xml ..

The resulting updateinfo.xml:


<?xml version="1.0" encoding="UTF-8"?>
<updates>
<update status="stable" from="dmacvicar@piscola" type="security" version="1">
  <title>Untitled updatesecurity update 1 for amarok</title>
  <id>amarok</id>
  <issued>1223293050</issued>
  <release>no release</release>
  <description>
    - update to 1.4.10: fix tmp file vulnerability in the Magnatune
    database parsing. Secunia#SA31418 / CVE-2008-3699 / bnc#417232
  </description>
  <references>
    <reference href="http://bugzilla.novell.com/417232" 
       title="bug number 417232" type="bugzilla" id="417232"/>
    <reference 
       href="http://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2008-3699" 
       title="CVE number 2008-3699" type="cve" id="2008-3699"/>
  </references>
  <pkglist>
    <collection>
      <package name="amarok" arch="i586" version="1.4.10" release="17">
        <filename>amarok-1.4.10-17.i586.rpm</filename>
      </package>
    </collection>
  </pkglist>
</update>
</updates>
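
To give an idea of how references like the ones in this output can be derived from a changelog entry, here is a small sketch. It is not enhancerepo's actual code (enhancerepo is written in Ruby); the function names are made up, the URL patterns are copied from the output above, and the fallback type `recommended` is an assumption of this sketch.

```python
import re

CVE_RE = re.compile(r"CVE-(\d{4}-\d+)")
BNC_RE = re.compile(r"bnc#(\d+)")


def references_from_changelog(text):
    """Collect bugzilla and CVE references found in a changelog entry."""
    refs = []
    for bug in BNC_RE.findall(text):
        refs.append({
            "type": "bugzilla",
            "id": bug,
            "href": "http://bugzilla.novell.com/" + bug,
        })
    for cve in CVE_RE.findall(text):
        refs.append({
            "type": "cve",
            "id": cve,
            "href": "http://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-" + cve,
        })
    return refs


def update_type(text):
    """'security' if the text mentions a CVE or a vulnerability.

    The fallback value 'recommended' is an assumption of this sketch.
    """
    if CVE_RE.search(text) or "vulnerability" in text.lower():
        return "security"
    return "recommended"


if __name__ == "__main__":
    entry = ("- update to 1.4.10: fix tmp file vulnerability in the Magnatune "
             "database parsing. Secunia#SA31418 / CVE-2008-3699 / bnc#417232")
    print(update_type(entry))
    for ref in references_from_changelog(entry):
        print(ref["type"], ref["id"], ref["href"])
```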

Some important notes:

  • If you generate an update and then run the generation again without adding new packages or changes, enhancerepo will generate the same update again, but with a new version.

September 30, 2008

enhancerepo 0.3.1 cooking in build service: deltarpm support

A new version of enhancerepo (0.3.1) is cooking itself in the build service.

The new feature is deltarpm metadata generation support and also some kind of smart deltarpm generation.

This means enhancerepo can find which packages have several versions in the repository and generate delta rpms for N steps back to the older versions. The default is one step, that is, only a delta rpm from the previous version to the newest one. It can then generate the metadata for you too (and add it to the index and such).

Example run:

# enhancerepo --disk-usage --keywords --eulas --create-deltas 2 --deltas -- /space/repo/duncan2
Adding eula: /space/repo/duncan2/zapping-0.9.6-72.eula to zapping-0.9.6-72-i586
Adding eula: /space/repo/duncan2/zaptel.eula to zaptel-1.2.10-70-i586
Adding keyword: /space/repo/duncan2/zaptel-debuginfo.keywords to zaptel-debuginfo-1.2.10-70-i586
Preparing disk usage...
Creating delta - amarok-1.4.10-3-i586 -> amarok-1.4.10-17-i586 (1/2)
Creating delta - amarok-1.4.9.1-53-i586 -> amarok-1.4.10-17-i586 (2/2)
Saving /space/repo/duncan2/repodata/susedata.xml.gz ..
Adding /space/repo/duncan2/repodata/susedata.xml.gz to /space/repo/duncan2/repodata/repomd.xml index
repodata/susedata.xml.gz already exists. Replacing.
Saving /space/repo/duncan2/repodata/deltainfo.xml.gz ..
Adding /space/repo/duncan2/repodata/deltainfo.xml.gz to /space/repo/duncan2/repodata/repomd.xml index
repodata/deltainfo.xml.gz already exists. Replacing.
Saving /space/repo/duncan2/repodata/repomd.xml ..
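
The "N steps" selection can be pictured with a short sketch. It only illustrates which (old, new) pairs get a delta and is not enhancerepo's implementation; `delta_pairs` is a made-up name, and the version list is assumed to be already sorted from oldest to newest.

```python
def delta_pairs(versions, steps=1):
    """Pair the newest version with up to `steps` older versions.

    `versions` must already be sorted from oldest to newest.
    Returns (old, new) tuples, nearest predecessor first.
    """
    if len(versions) < 2 or steps < 1:
        return []
    newest = versions[-1]
    older = versions[:-1][::-1]  # previous versions, most recent first
    return [(old, newest) for old in older[:steps]]


if __name__ == "__main__":
    amarok = ["amarok-1.4.9.1-53", "amarok-1.4.10-3", "amarok-1.4.10-17"]
    for old, new in delta_pairs(amarok, steps=2):
        print("Creating delta - %s -> %s" % (old, new))
```

With `steps=2` this yields the same two pairs as the example run above; with the default of one step only the delta from the immediately preceding version is produced.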

Some important notes:

  • It now requires ruby-rpm, which is available in the devel:languages:ruby:extensions repository.
  • Be careful when running createrepo on top of a directory with deltarpms. createrepo will incorrectly index them as packages (so remove the deltarpms, run createrepo, and then generate the deltas with enhancerepo on top).
  • I did not test this release as much as the previous one :-)

September 26, 2008

Introducing enhancerepo 0.3

Introduction

You may know that we are slowly heading towards using the rpmmd format as the default one. We have done so for the build service since the beginning, and the only remaining part is the media.

Therefore, we have been following a strategy of extending rpmmd with our own extensions while at the same time talking to the yum people about longer-term changes to the format.

Right now, in the openSUSE 11.1 Beta code, you find the following extensions:

  • updateinfo.xml (patches, same format as Fedora)
  • deltainfo.xml (delta-rpms, same format as yum-presto, different name)
  • suseinfo.xml (repository extra data)
  • susedata.xml (package extra data, like eulas and disk usage per mount point)

Apart from that, we sign our repositories.

Even for testing our own stuff, it became tedious to add extra metadata to repositories created with createrepo.

enhancerepo

Enhancerepo allows you to inject all the extra metadata into repositories in an easy way. It takes care of updating the index and compressing the files.

Features

  • Sign repositories
  • Generate eulas from text files with certain name conventions (package.eula)
  • Generate keywords from text files with certain name conventions (package.keywords)
  • Add disk usage per mount point information
  • Add repository expiration time (for outdated mirror autodetection)
  • (experimental and incomplete) generation of updateinfo.xml

Usage

Usage
-----
enhancerepo [OPTION] ... DIR

-h, --help:

   show help

--sign keyid

   Generates signature for the repository
   using key keyid

--updates

   Add updates from *.updates files
   and generate updateinfo.xml

SUSE specific repository data (suseinfo.xml):
--expire time

  Set repository expiration hint
  (Can be used to detect dead mirrors)

--repo-product prodname:

   Adds product compatibility information

--repo-keyword keyword

   Tags repository with keyword

SUSE specific package data (susedata.xml)
--eulas

   Drop packagename.eula files and
   the attributes will be added to susedata.xml

--keywords

   Drop packagename.keywords files and
   the attributes will be added to susedata.xml

--disk-usage

   Add disk usage information
   the attributes will be added to susedata.xml

Note: your .eula or .keywords file will be added to a package if it
matches its name. If you want to add the attributes to a specific
package, name the file name-version, name-version-release or
name-version-release.arch

DIR: The repo base directory ( where repodata/ directory is located )
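
The naming rule from the note above can be sketched as follows. This is an illustration only, not code from enhancerepo: the function names are hypothetical, and preferring the most specific file name when several match is an assumption of the sketch.

```python
def attribute_file_candidates(name, version, release, arch, suffix="eula"):
    """File names that would match a package, most specific first."""
    stems = [
        "%s-%s-%s.%s" % (name, version, release, arch),
        "%s-%s-%s" % (name, version, release),
        "%s-%s" % (name, version),
        name,
    ]
    return ["%s.%s" % (stem, suffix) for stem in stems]


def find_attribute_file(existing_files, name, version, release, arch,
                        suffix="eula"):
    """Most specific matching file name present in existing_files, or None."""
    for candidate in attribute_file_candidates(name, version, release, arch,
                                               suffix):
        if candidate in existing_files:
            return candidate
    return None


if __name__ == "__main__":
    files = {"zapping-0.9.6-72.eula", "zaptel.eula"}
    print(find_attribute_file(files, "zapping", "0.9.6", "72", "i586"))
    print(find_attribute_file(files, "zaptel", "1.2.10", "70", "i586"))
```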

Download

Packages available here.

git clone git://git.opensuse.org/projects/enhancerepo

May 21, 2008

The greatest unknown openSUSE 11.0 package management feature

During the development of openSUSE 11.0, we have been reporting in real time cool improvements like the fast installation, how YaST became sexy, how YaST/ZYpp/zypper became fast, how YaST/ZYpp/zypper performs better than others and even that our solver is also really smart.

However, there is something else…

Interoperability

Background

One of the features of our stack is the availability of patches and patterns. The former provide updates in the sense of a “fix for a problem” (which can mean several updated packages, or none), while patterns are intelligent groups that can recommend, require and suggest packages in order to make certain functionality available, without being too strict about the specific packages to install.

Unlike in other systems, where groups and updates are handled as special entities, ZYpp patterns and patches are just objects with dependencies like packages, and the solver treats them in the same way.

Because the rpm database of installed packages does not know about patterns and patches, in openSUSE 10.x (and SLE10) those objects are installed in a separate database, visible only to libzypp. This is hidden from users, but does not allow easy management using 3rd party tools.

In addition to that, the patch metadata format is our own extension to the metadata handled by yum, the tool used by Fedora and CentOS. That means that even if other distributions provide similar concepts, they will mostly ignore our extended metadata.

This is sad: if we share 90% of the metadata format, why not go further? Sometimes it is not worth waiting for others to take steps towards becoming more interoperable with you, so what about taking those steps ourselves?

At that time, Fedora was implementing updates metadata by using a yum plugin and an updateinfo.xml description. Metadata for deltarpm availability is handled via the yum-presto plugin.

Sharing tools and data, a step for Interoperability

Metadata format

In openSUSE 11.0, ZYpp reads patches from updateinfo.xml too! (check the 11.0 update repo!). Not only that, our delta rpm availability metadata will be in the same format as yum-presto (with some modifications agreed with its author).

How will this benefit you?

  • You will be able to use the yum stack out of the box with updates and deltarpms, and they will just work.
  • You will be able to generate custom patches for openSUSE using existing tools like Bodhi, from Fedora, or generate custom deltarpm information using the tools included with yum-presto.
  • We are working hard to get the ZYpp/YaST stack to build on Fedora (and other distros). In the near future, you will be able to enjoy ZYpp performance and features on Fedora with their own repositories!
  • We decoupled the delta-rpm information from patches, so we may start adding delta-rpm to normal factory packages and it will start to work out of the box!
  • Much more!

Handling of patches and patterns

As we mentioned, in 10.x codebase we used to install patterns and patches in a special database. This is no longer the case.

Patterns and patches are no longer installed, which means your system is rpm only! Patches are now shown as (un)satisfied (and (ir)relevant), where satisfied means that your system meets all the requirements to consider the patch present.

All the information about patches and patterns (and products) is extra information that openSUSE applications use to add more value for you. So if you, for example, remove a repository offering patches, you just lose the information about which patches you have; the real information is the rpm packages you have installed. When you re-add the update repository, you can immediately see which published patches affect you, which ones are irrelevant, and which ones are relevant but not needed because your packages are up to date.

Patterns and patches become “advice” and “value”, not extra non-compatible information.
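
The three situations described above (a patch that affects you, an irrelevant one, and a relevant one you no longer need) can be derived from nothing but the installed rpm packages. The following sketch shows the idea; it is not how ZYpp implements it (there the solver evaluates patch dependencies), and the status names, the function name and the naive version comparison are assumptions made for the example.

```python
def version_key(version):
    """Naive dotted-number comparison; an assumption, not rpm's algorithm."""
    return tuple(int(part) for part in version.split("."))


def patch_status(patch_packages, installed):
    """Classify a patch against the installed rpm packages.

    patch_packages: {name: version the patch ships}
    installed:      {name: version currently installed}
    Returns 'irrelevant', 'satisfied' or 'needed'.
    """
    affected = [name for name in patch_packages if name in installed]
    if not affected:
        return "irrelevant"  # none of the patched packages is installed
    up_to_date = all(
        version_key(installed[name]) >= version_key(patch_packages[name])
        for name in affected
    )
    return "satisfied" if up_to_date else "needed"


if __name__ == "__main__":
    patch = {"amarok": "1.4.10"}
    print(patch_status(patch, {"amarok": "1.4.9.1"}))
    print(patch_status(patch, {"amarok": "1.4.10"}))
    print(patch_status(patch, {"zaptel": "1.2.10"}))
```

Because the status is recomputed from the rpm database each time, removing and re-adding the update repository loses nothing.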

How will this benefit you?

  • Simpler system.
  • No conflicts between using 3rd party tools, “rpm by hand” or our native tools.

openSUSE, PackageKit enabled

PackageKit is the new actor in the package management world. It is a thin layer that gives applications access to the package management system as a D-Bus service. You may have heard about it because Fedora 9 is coming PackageKit enabled. How does it benefit you?

  • Role based (non-root) package management, via PolicyKit.
  • Sharing of upstream tools across distributions.
  • Gives the desktop the chance to integrate with software manipulation.

So, openSUSE 11.0 is fully PackageKit enabled. You will be able to use all PackageKit compatible applications on openSUSE and they will use the ZYpp stack underneath. Not only is it enabled, but our hackers Scott Reeves and Stefan Haas did an amazing job on the backend. I would dare to say it is one of the most robust backend implementations, and it fully benefits from the ZYpp speed and features.

Future

All these improvements are available now. Maybe you are already enjoying them in Factory. However, this opens the door to new possibilities. Just a few examples come to mind:

  • The openSUSE Build Service, the great software building platform from our openSUSE team, has been building packages for all major distributions for a long time. The build service could allow entering and generating patch information for fixed bugs, and the update/patch information would be compatible across yum/Fedora and openSUSE. Same with deltarpm information.
  • We could extend the ZYpp parser to understand Fedora groups stored in comps.xml and treat them as patterns.
  • Do you have more ideas?

Community involvement

We welcome any help on creating more interoperability possibilities, especially about building the ZYpp stack and YaST on Fedora, Mandriva and others. There are already some packages building in the build service, but we still have a long way to go.

May 17, 2008

Solving the famous “smart” case 3

In my last post, I showed how sat-solver (the solver ZYpp uses) correctly solves the “case 2” included in smart’s README, an exercise smart uses to claim its “smartness” (because apt and yum can’t handle these cases).

How does it do with case 3?

That’s another interesting case which was tested with APT-RPM and YUM.

In this case, there’s a package A version 1.0 installed in the system, and there are two versions available for upgrading: 1.5 and 2.0. Version 1.5 may be installed without problems, but version 2.0 has a dependency on B, which is not available anywhere.

In this case, the best possibility is upgrading to 1.5, since upgrading to 2.0 is not an option.

Here both yum and apt fail:

Just like APT, YUM selected version 2.0 and didn’t consider the availability of an intermediate version.

But smart does the right thing:

Smart correctly selects the intermediate version 1.5, which is the only viable possibility given the current options.

So, we set up a repository packages.xml with A versions 1.5 and 2.0 (2.0 requiring a non-existent B) and a system.xml representing the installed packages, with only A 1.0 installed. We put it all together in a test.xml file and select an “upgrade” action.

We run the already introduced deptestomatic:

# deptestomatic test.xml
>!> Solution #1:
>!> upgrade A-1.0-1.noarch => A-1.5-1.noarch[test]
>!> !unflag A-2.0-1.noarch[test]
>!> installs=0, upgrades=1, uninstalls=0

And we confirm that ZYpp (sat-solver) would also select A 1.5, as expected.
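
The behaviour expected in case 3 boils down to "pick the newest version that can actually be installed". A toy sketch of that rule follows. It is not how sat-solver works internally (sat-solver is a SAT-based solver); the data layout, the function names and the naive version comparison are assumptions made for the illustration.

```python
def version_key(version):
    """Naive dotted-number comparison; an assumption, not rpm's algorithm."""
    return tuple(int(part) for part in version.split("."))


def installable(candidate, provided):
    """True if every requirement of the candidate is provided by something."""
    return all(req in provided for req in candidate["requires"])


def best_upgrade(installed_version, candidates, provided):
    """Newest candidate that is newer than installed and installable, or None."""
    viable = [
        c for c in candidates
        if version_key(c["version"]) > version_key(installed_version)
        and installable(c, provided)
    ]
    return max(viable, key=lambda c: version_key(c["version"]), default=None)


if __name__ == "__main__":
    candidates = [
        {"name": "A", "version": "1.5", "requires": []},
        {"name": "A", "version": "2.0", "requires": ["B"]},
    ]
    # B is not available anywhere, so nothing provides it.
    print(best_upgrade("1.0", candidates, provided=set()))
```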

Solving the famous “smart” case 2

In my recent post about ZYpp, yum and smart speed and memory usage, I got this comment:

It would be also interesting to know how the new libzypp manages the “Case Studies” from http://svn.labix.org/smart/trunk/README

I don’t know, but could be that yum fixed “Case 2” and libzypp “fails”? So the extra time and memory over libzypp taken by yum to make the dep resolution would be justified.

At first glance I thought the case was very vague, but actually it was really simple and concrete, so it was a good candidate to be tested with our test suite tools. Let’s look at it:

This is another real case, and is being reproduced in a controlled environment for tests with YUM, APT-RPM, and Smart.

The issue is, a package named A requires package BCD explicitly, and RPM detects implicit dependencies between A and libB, libC, and libD. Package BCD provides libB, libC, and libD, but additionally there is a package B providing libB, a package C providing libC, and a package D providing libD.

In other words, there’s a package A which requires four different symbols, and one of these symbols is provided by a single package BCD, which happens to provide all symbols needed by A. There are also packages B, C, and D, that provide some of the symbols required by A, but can’t satisfy all dependencies without BCD.

So, what is so special about this case?

The expected behavior for an operation asking to install A is obviously selecting BCD to satisfy A’s dependencies, on the other hand, YUM and APT fail to deliver that as a guaranteed operation, as is shown below.

Ok, there are a couple of things that must be said about this example.

  • The dependencies of package A are broken. Unless package A depends on the libraries libB, libC and libD and on some runtime “thing” provided by BCD, BCD should be split into three packages. But as B, C and D provide no runtime “thing” and simply “replace” BCD, this is not the case. So either fix BCD or fix A to depend on the libraries only; the explicit BCD dependency is bogus.

  • Debian’s apt-get won’t succeed here, because AFAIK Debian only has dependencies between packages, not on libraries. Actually I still wonder how smart could succeed here using a Debian backend. I do not know whether apt-rpm supports library dependencies.

  • The result yum produces makes no sense either.

  • So yes, this is the expected result, but the example is a really dumb package.

How do you simulate dependencies using the ZYpp sat solver? The magic is called deptestomatic. There are two versions of this tool, one provided by satsolver-tools, and another one provided by libzypp-testsuite. The one in satsolver-tools should be sufficient. The one in libzypp-testsuite has extra features like graphical display of dependencies.

So, we use the helix format for testcases, which was the format of the original RedCarpet test suite. A testcase is an XML file referring to one or more XML files describing repositories, and one repository represents the installed packages (in this case we can simulate with no installed packages and just one repository).

We start by creating a packages.xml.

And now, we define the test.xml describing the testcase.

If there are conflicts, the XML language has features to select solutions, but that is advanced stuff documented here. For our case, the test is really simple: just install A.

Now run the test:

# deptestomatic test.xml

And you get the result:

>!> Installing A from channel test
>!> Solution #1:
>!> install A-2.0-1.noarch[test]
>!> install BCD-2.0-1.noarch[test]
>!> installs=2, upgrades=0, uninstalls=0

Which is the “smart” result. It only installs A and BCD, and not B, C and D (which is what yum does).
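
Why BCD alone is enough can be shown with a toy resolver: once BCD is selected for the explicit requirement, it already provides libB, libC and libD, so nothing else is needed. The sketch below uses a simple greedy heuristic (pick the package covering the most outstanding requirements) purely for illustration. It is not the algorithm sat-solver uses, and the `resolve` function and its data layout are assumptions.

```python
def resolve(requires, packages):
    """Pick packages for a list of required symbols.

    packages: {package name: set of symbols it provides, own name included}
    Greedy heuristic: repeatedly choose the package that covers the most
    outstanding requirements, then drop everything it provides.
    """
    selected = []
    outstanding = list(requires)
    while outstanding:
        best = max(
            sorted(packages),
            key=lambda name: len(packages[name].intersection(outstanding)),
        )
        if not packages[best].intersection(outstanding):
            raise ValueError("unsatisfiable: %s" % outstanding)
        selected.append(best)
        outstanding = [r for r in outstanding if r not in packages[best]]
    return selected


if __name__ == "__main__":
    repo = {
        "BCD": {"BCD", "libB", "libC", "libD"},
        "B": {"B", "libB"},
        "C": {"C", "libC"},
        "D": {"D", "libD"},
    }
    # What A requires: BCD explicitly, plus the three libraries.
    print(resolve(["BCD", "libB", "libC", "libD"], repo))
```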

Learning to develop testcases and run them is an excellent way to help the development team fix bugs without having to request extra information, and you can play with the simulation yourself before actually reporting a bug.

You can also generate testcases for day to day operations in a very simple way (no need to write xml):

  • in the YaST (Qt) package selector, go to Extras, and choose generate solver testcase (after selecting your transactions, like installing or removing packages).
  • Use the zypper option --debug-solver (like zypper install --debug-solver foo)

Then you can run them with deptestomatic by referencing the test xml file. Note that you can reuse package lists (repositories) across tests. libzypp-testsuite and the sat-solver source includes a big list of testcases representing simple operations, distributions, upgrades, and distribution upgrades.

May 16, 2008

yum and ZYpp speed / memory usage

Michael Zucchi complains about yum memory usage, and points to Python as the culprit.

Yum isn’t so yummy after-all. Re-enabled python so i could run yum. Wow 120mb of vm to install a couple of packages. Not bad considering the box only has 128mb. This is crap.

Hmm, should I try xubuntu – or will it be just as crappy and bloated and blighted by python poo?

Since our efforts to make ZYpp really fast, by incorporating and integrating Michael Schroeder’s sat solver together with Michael Matz’s great work on the solv files and data storage, I have never taken the time to make a “quick comparison” of speed or memory usage. So let’s have a quick look.

These are my repositories:

Software configuration management (openSUSE_10.3)
10.3 - Main Repository (NON-OSS) 
10.3 - Packman
openSUSE-10.3-Updates
Virtualization:VirtualBox
home:dgollub
KDE:KDE3
Mozilla based projects (openSUSE_10.3)
ZYPP SVN Builds (openSUSE_10.3) 
ZYPP SVN Builds (openSUSE_10.3)
home:prusnak
10.3 - VideoLan
openSUSE.org tools (openSUSE_10.3)
SUSE Feature Tracking Tool (openSUSE_10.3)
psmt's Home Project (openSUSE_10.3)
openSUSE:10.3
Duncan Mac-Vicar SUSE rpms (openSUSE_10.3)
Latest YaST svn snapshots (openSUSE_10.3)
building/openSUSE_10.3         

All these repos together contain about 41,000 packages.

What I did was to symlink ZYpp repositories to the yum repo path so they use the same repos.

# rm -rf /etc/yum.repos.d/
# ln -s /etc/zypp/repos.d /etc/yum.repos.d

*NOTE:* I tested with yum 3.2.4. I know 3.2.14 is available, but 3.2.4 is what I had installed when doing the test. After doing these tests I upgraded to 3.2.14, but it did not accept my .repo files because of the character “:” in repo names. However the changelog of yum since 3.2.4 shows: If using the latest yum would invalidate these numbers (not as in 1 second, but as in an order of magnitude), let me know and I will repeat them once I make it work with my repo files.

Update 14.05.2008 : I did add yum 3.2.14. However it performed even worse, except for memory usage.

Update 15.05.2008 : added smart 0.52 numbers

libzypp is the one you see in factory since some days: 4.21.1.

yum and ZYpp behave differently, as yum downloads and parses filelists.xml and other.xml, which we ignore. Therefore I skipped the metadata download part and just timed the cache building process.

# yum clean dbcache
...
19 sqlite files removed

# time yum makecache
...
Metadata Cache Created

real    9m41.036s
user    2m34.766s
sys     0m11.545s

Almost 10 minutes. As this time includes parsing the two big files we ignore, I did it again, pressing Ctrl-C after yum finished with the primary data, which is what ZYpp uses:

# time yum makecache
...
Exiting on user cancel

real    4m6.730s
user    0m34.058s
sys     0m3.080s

Now, zypper’s turn:

# time zypper ref -B
...
All repositories have been refreshed.

real    0m18.472s
user    0m16.029s
sys     0m2.024s

So yum takes about 13 times as long as ZYpp technically (the 1:1 primary comparison), but about 30 times as long from the end user’s point of view.

Now, installing a package. Times were measured till the “continue? yes/no” prompt, or till the first interactive question.

# time yum install fate
...
Is this ok [y/N]: n
Exiting on user Command
Complete!

real    0m19.143s
user    0m14.057s
sys     0m1.920s

zypper’s turn:

# time zypper in fate
...
Continue? [YES/no]: n

real    0m9.796s
user    0m8.509s
sys     0m0.624s

This time, ZYpp is only twice as fast as yum. Only ;-)

What happens when you want to upgrade your packages?

# time yum upgrade
...
real    0m45.152s
user    0m36.894s
sys     0m7.476s

(Note: yum did not even find a solution here).

# time zypper update
...
Continue? [YES/no]: n

real    0m8.988s
user    0m7.820s
sys     0m0.596s

Going by the real times above (45.2 s versus 9.0 s), yum needs about 5 times as long as ZYpp to calculate the upgrade.
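
The ratios quoted in this post can be recomputed from the `real` values of the runs above. The snippet below is just a convenience for checking the arithmetic; the labels and helper names are made up, and only the wall-clock times are taken from the transcripts.

```python
import re


def to_seconds(stamp):
    """Convert a `time` value such as '9m41.036s' to seconds."""
    match = re.fullmatch(r"(\d+)m([\d.]+)s", stamp)
    if match is None:
        raise ValueError("unexpected time format: %r" % stamp)
    return int(match.group(1)) * 60 + float(match.group(2))


# Wall-clock ("real") values copied from the runs above: (yum, zypper).
RUNS = {
    "cache, everything yum parses": ("9m41.036s", "0m18.472s"),
    "cache, primary data only": ("4m6.730s", "0m18.472s"),
    "install fate": ("0m19.143s", "0m9.796s"),
    "upgrade / update": ("0m45.152s", "0m8.988s"),
}


def ratios(runs):
    """Return {label: yum time divided by zypper time}."""
    return {
        label: to_seconds(yum) / to_seconds(zypper)
        for label, (yum, zypper) in runs.items()
    }


if __name__ == "__main__":
    for label, ratio in ratios(RUNS).items():
        print("%-30s yum/zypper = %.1f" % (label, ratio))
```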

Update 14.05.2008 : I was comparing update to upgrade, I fixed those numbers in the chart. However, I don’t have the update value for the old yum.

Summary:

Now, how much memory does each one need? For this, I just tested the install command with one package using valgrind massif, a heap profiler.

yum memory usage:

yum memory usage

ZYpp memory usage:

zypp memory usage

Update 14.05.2008 : memory usage chart for yum 3.2.14

yum 3.2.14 memory usage

Update 15.05.2008 : memory usage chart for smart 0.52

smart 0.52 memory usage

Here you can see that ZYpp goes a little bit over 20M, while yum goes over 180M, so yum uses about 9 times more memory. Update 14.05.2008 : yum 3.2.14 uses around 160M at its peak.

I would be interested in tracking cpu usage too, but that will come later. What do you think about it?