start-stop-daemon on CentOS/RHEL

(Note that I’ve only ever done this on CentOS 6.2, though it should work in plenty of other places too, especially other RHEL-based distros.)

I really wanted to use the Smokeping init script that comes with Ubuntu 10.04.x LTS on a CentOS 6.2 box. One look at it, however, and you will very quickly realise that it isn’t likely to work out of the box. There may be other reasons too, but it definitely won’t work because CentOS doesn’t have “start-stop-daemon”; not yet at least 🙂

This helpful post suggested that if you pull the dpkg source from one of the Debian mirrors, you can build it, albeit quite nastily, and end up with a working start-stop-daemon. It doesn’t have to be that nasty, though: newer versions of dpkg build cleanly, as I discovered and have detailed below. As root or using sudo, do the following:

cd /usr/local/src
wget -c "http://za.archive.ubuntu.com/ubuntu/pool/main/d/dpkg/dpkg_1.15.8.4ubuntu3.tar.bz2"
tar xjvf dpkg_1.15.8.4ubuntu3.tar.bz2
rm dpkg_1.15.8.4ubuntu3.tar.bz2
cd dpkg-1.15.8.4ubuntu*/
./configure --without-install-info --without-update-alternatives --without-dselect
make && make install
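The cd line only works if you know exactly which directory the tarball unpacks into. Rather than guessing, you can ask tar. This is a minimal sketch that assumes GNU tar, which detects bzip2 compression by itself when listing a file, and a tarball whose first entry is its single top-level directory, as with the dpkg source. `tarball_top_dir` is a hypothetical helper name and is not part of any package.

```bash
# Hypothetical helper: print the top-level directory stored in a tarball.
# Assumes the first archive member is (or sits inside) that directory.
tarball_top_dir() {
  # List members, keep the first one, strip everything after the first "/".
  tar -tf "$1" | head -n 1 | cut -d/ -f1
}
```

Call it before the rm line, for example `dir=$(tarball_top_dir dpkg_1.15.8.4ubuntu3.tar.bz2)`, and then `cd "$dir"` once the archive has been removed.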

Now if you type “which start-stop-daemon” you should find that it’s been built and installed into /usr/local/sbin, and that it works just like it’s supposed to. With that hurdle out of the way, I could finish getting that Ubuntu init script working on CentOS. Happy time 🙂
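If you are adapting a Debian/Ubuntu-style init script in the same way, its start-stop-daemon calls usually look something like the sketch below. The DAEMON and PIDFILE values are hypothetical placeholders, not the real Smokeping script’s settings. SSD points at the /usr/local/sbin copy built above because Debian-style init scripts often set their own PATH (typically /sbin:/usr/sbin:/bin:/usr/bin), which leaves /usr/local/sbin out.

```bash
# Hypothetical values: adjust for the daemon your init script manages.
DAEMON=${DAEMON:-/usr/sbin/smokeping}
PIDFILE=${PIDFILE:-/var/run/smokeping/smokeping.pid}
# Call the source-built binary by absolute path, since the init script's
# PATH may not include /usr/local/sbin.
SSD=${SSD:-/usr/local/sbin/start-stop-daemon}

do_start() {
  # --oknodo: exit 0 rather than 1 if nothing needed doing (already running).
  # Anything after "--" is passed to the daemon itself.
  "$SSD" --start --quiet --oknodo --pidfile "$PIDFILE" --exec "$DAEMON" -- "$@"
}

do_stop() {
  # --retry 5: send TERM, wait up to 5 seconds, then fall back to KILL.
  "$SSD" --stop --quiet --oknodo --pidfile "$PIDFILE" --retry 5
}
```

These two functions then slot into the usual `case "$1" in start) ... stop) ...` block at the bottom of the script.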

The easiest way to fix apt missing key issues

Sorry, this is a bit lazy of me, but at the moment I can only confirm that this works on Ubuntu 10.04.4 LTS. It might, however, work on other versions too, and perhaps on Debian as well.

If, when you run apt-get update, you are told something like this right at the end:

W: GPG error: http://196.x.y.z lucid Release: The following signatures couldn't be verified because the public key is not available: NO_PUBKEY 2940ABA983EF826A

… then install “add-apt-key” on the box and run it, adding the missing key ID to the end of the command, as shown below.

root@xyz-box-bry-01:~# apt-get install add-apt-key
Reading package lists... Done
Building dependency tree       
Reading state information... Done
The following NEW packages will be installed:
  add-apt-key
0 upgraded, 1 newly installed, 0 to remove and 5 not upgraded.
Need to get 5,314B of archives.
After this operation, 81.9kB of additional disk space will be used.
Get:1 http://196.x.y.z/ubuntu/ lucid/universe add-apt-key 1.0-0.5 [5,314B]
Fetched 5,314B in 0s (270kB/s)
Selecting previously deselected package add-apt-key.
(Reading database ... 88896 files and directories currently installed.)
Unpacking add-apt-key (from .../add-apt-key_1.0-0.5_all.deb) ...
Processing triggers for man-db ...
Setting up add-apt-key (1.0-0.5) ...
root@xyz-box-bry-01:~# add-apt-key 2940ABA983EF826A
gpg: directory `/root/.gnupg' created
gpg: new configuration file `/root/.gnupg/gpg.conf' created
gpg: WARNING: options in `/root/.gnupg/gpg.conf' are not yet active during this run
gpg: keyring `/root/.gnupg/secring.gpg' created
gpg: keyring `/root/.gnupg/pubring.gpg' created
gpg: requesting key 83EF826A from hkp server subkeys.pgp.net
gpg: /root/.gnupg/trustdb.gpg: trustdb created
gpg: key 83EF826A: public key "Opscode Packages " imported
gpg: Total number processed: 1
gpg:               imported: 1
OK
root@xyz-box-bry-01:~#
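If apt-get update complains about several keys at once, a sketch like the one below pulls every missing key ID out of its output and feeds each one to add-apt-key. The function names are hypothetical, and the parser assumes the warnings contain “NO_PUBKEY <hex id>”, as in the line above.

```bash
# Hypothetical helper: read apt-get update output on stdin and print each
# missing key ID (the hex string after NO_PUBKEY) once, sorted.
missing_apt_keys() {
  grep -oE 'NO_PUBKEY [0-9A-Fa-f]+' | awk '{ print $2 }' | sort -u
}

# Hypothetical wrapper: run as root to import every key apt says is missing.
add_missing_apt_keys() {
  # 2>&1 so the W: lines are captured whichever stream apt-get writes them to.
  apt-get update 2>&1 | missing_apt_keys | while read -r key; do
    add-apt-key "$key"
  done
}
```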

I’ve found that sometimes the key server doesn’t have the key, so it isn’t imported, but re-running the command generally fixes that because the next key server chosen usually does have it. Once you’ve successfully imported the key, run “apt-get update” again and your problem should be gone.
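Since re-running add-apt-key usually succeeds on a later attempt, the retry can be scripted. This is a minimal sketch: `retry` and RETRY_DELAY are hypothetical names, and the function simply re-runs whatever command it is given up to a fixed number of times.

```bash
# Hypothetical helper: run a command up to $1 times, pausing RETRY_DELAY
# seconds (default 5) between attempts. Returns 0 on the first success,
# 1 if every attempt fails.
retry() {
  local max=$1 attempt=1
  shift
  until "$@"; do
    if [ "$attempt" -ge "$max" ]; then
      echo "giving up after $attempt attempts: $*" >&2
      return 1
    fi
    attempt=$((attempt + 1))
    sleep "${RETRY_DELAY:-5}"
  done
}
```

For example, `retry 3 add-apt-key 2940ABA983EF826A && apt-get update` tries up to three key servers before giving up.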

There was something else I wanted to say but I’ve totally forgotten what it was.

Enjoy :)

Fixing tftpd-hpa on Lucid

Whilst working at IS, I’ve sent VMware servers to Accra, Nairobi, Lagos, Maputo, London, Durban, Cape Town and then of course to two sites in Johannesburg. The servers in Accra, Nairobi, Lagos and Maputo run various virtual machines required by the NMS team (of which I am a member), as well as a whole lot that a sister team of ours uses. The NMS machines are things like Syslog boxes and SNMP gateways, etc.

Connectivity to those regions is anywhere from awesome – if that region is on the eastern side of Africa and connects via Seacom – right down to pretty much unusable. The kind of unusable where you spend a whole day just trying to log into the console of a virtual machine because every time you type “root” it comes out as “rroot”, then “rooooot”, then “rootttt”. It’s one of the most frustrating things I’ve ever had to do in my life, I’m sure of it.

Which kinda brings me to the point of this post. I decided that the best way to deploy the various machines (given that there’s never any time to send the VMware server itself to the region with all the virtual machines already built) was to install them over the network, using preseeds for the Ubuntu boxes and kickstarts for the CentOS/RHEL boxes. This has worked famously for me, and I’m now able to have fully built, NMS-standards-compliant virtual machines in any of those regions in ten minutes or less.

That was until I upgraded the key infrastructure boxes (DHCP and TFTP servers etc.) to Lucid. Suddenly everything ground to a halt. The fix, however, was very simple; in fact I feel kinda guilty that you’ve had to read this whole long story just to get to it.

Before Lucid these boxes ran Hardy. I used tftpd-hpa running as a daemon, using the standard /var/lib/tftpboot directory as the TFTP root. My /etc/default/tftpd-hpa file looked like this:

#Defaults for tftpd-hpa
RUN_DAEMON="yes"
OPTIONS="-l -s /var/lib/tftpboot"

After upgrading to Lucid that file had changed so that it looked like this:

# /etc/default/tftpd-hpa
TFTP_USERNAME="tftp"
TFTP_DIRECTORY="/srv/tftp"
TFTP_ADDRESS="0.0.0.0:69"
TFTP_OPTIONS=""

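However, the service wouldn’t start, so the network installs failed before they even began. Changing the contents of /etc/default/tftpd-hpa to look more like this solved my problem. TFTP_DIRECTORY points back at /var/lib/tftpboot, where my boot files already lived, and “--secure” is the long form of the “-s” option from my old Hardy config. The “-4” is there because I switch IPv6 off on all my Lucid machines by adding “ipv6.disable=1” to the “GRUB_CMDLINE_LINUX_DEFAULT” line in /etc/default/grub.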

# /etc/default/tftpd-hpa
TFTP_USERNAME="tftp"
TFTP_DIRECTORY="/var/lib/tftpboot"
TFTP_ADDRESS="0.0.0.0:69"
TFTP_OPTIONS="-4 --secure"

Bounce the service and you’re sorted, and back on the road with your network installs. 🙂
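Both things that bit me here, a TFTP_DIRECTORY that doesn’t exist and a missing “-4” on a box with IPv6 disabled, can be checked before restarting. This is a sketch only: `check_tftpd_defaults` is a hypothetical helper, it relies on the defaults file being plain shell variable assignments like the ones above, and it only recognises the short “-4” spelling.

```bash
# Hypothetical helper: sanity-check a tftpd-hpa defaults file.
#   $1 - defaults file (default /etc/default/tftpd-hpa)
#   $2 - kernel command line file (default /proc/cmdline)
check_tftpd_defaults() {
  local defaults=${1:-/etc/default/tftpd-hpa}
  local cmdline=${2:-/proc/cmdline}
  (
    # Source in a subshell so TFTP_* don't leak into the caller's shell.
    . "$defaults" || exit 1
    if [ ! -d "$TFTP_DIRECTORY" ]; then
      echo "TFTP_DIRECTORY does not exist: $TFTP_DIRECTORY" >&2
      exit 1
    fi
    # With ipv6.disable=1 on the kernel command line, expect -4 in the options.
    if grep -qwF 'ipv6.disable=1' "$cmdline" 2>/dev/null; then
      case " $TFTP_OPTIONS " in
        *" -4 "*) ;;
        *) echo "IPv6 is disabled but TFTP_OPTIONS lacks -4" >&2; exit 1 ;;
      esac
    fi
  )
}
```

On Lucid, something like `check_tftpd_defaults && service tftpd-hpa restart` then does the bounce only when the file looks sane.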

Solving “I have no name!@*”

[imported from my old site, not a new post (if that’s important)]

I’m not sure how, but I broke one of my Ubuntu virtual machines at just about the worst possible time for it to break. The symptoms were that I could still log in to the machine as my user, and as root I could still “su” to that user, but whenever I was that user, the user seemed to have no idea who it was, even though the system still did. That’s a pretty hairy sentence, so let me explain by showing what I tried.

root@gw-pkl-01:~# su - charles
I have no name!@gw-pkl-01:~$

Now when you try SSH’ing anywhere or doing anything useful you are told to get lost.

I have no name!@gw-pkl-01:~$ ssh charles@anywhere.i.can.think.of -Cv
You don't exist, go away!
I have no name!@gw-pkl-01:~$

But I wonder whether the system knows who I am?

I have no name!@gw-pkl-01:~$ id
uid=1000 gid=1000(charles) groups=4(adm),20(dialout),24(cdrom),25(floppy),29(audio),30(dip),44(video),46(plugdev),107(fuse),109(lpadmin),115(admin),1000(charles),1001(wheel)
I have no name!@gw-pkl-01:~$ whoami
whoami: cannot find name for user ID 1000
I have no name!@gw-pkl-01:~$ echo $USER
charles
I have no name!@gw-pkl-01:~$

Almost all of that looked correct (only uid=1000 was missing its name), so what on earth could be going on? I tried adding new users to see whether they were alright, and they had the problem too.

The cause turned out to be that /etc/passwd was not readable by all users, so a process running as anyone other than root couldn’t map UID 1000 back to a name. I confirmed this by comparing the broken machine with another one that I had deployed from the same template.

root@dynamips-pkl-01:~# ls -la /etc/passwd
-rw-r--r-- 1 root root 2104 2008-07-17 00:12 /etc/passwd
root@dynamips-pkl-01:~#

root@gw-pkl-01:~# ls -la /etc/passwd
-rw------- 1 root root 2331 2008-08-12 13:49 /etc/passwd
root@gw-pkl-01:~#

Make it readable and everything works again.

root@gw-pkl-01:~# chmod +r /etc/passwd
root@gw-pkl-01:~# su - charles
charles@gw-pkl-01:~$ id
uid=1000(charles) gid=1000(charles) groups=4(adm),20(dialout),24(cdrom),25(floppy),29(audio),30(dip),44(video),46(plugdev),107(fuse),109(lpadmin),115(admin),1000(charles),1001(wheel)
charles@gw-pkl-01:~$ whoami
charles
charles@gw-pkl-01:~$ echo $USER
charles
charles@gw-pkl-01:~$
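The same check and fix can be scripted, for example across the other machines deployed from that template. This is a sketch that assumes GNU coreutils `stat -c '%a'`; both function names are hypothetical. It uses `chmod a+r` rather than plain `chmod +r`, because without an explicit “a” chmod leaves alone any bits that are set in the current umask.

```bash
# Hypothetical helper: succeed if "others" have read permission on $1.
world_readable() {
  local mode
  mode=$(stat -c '%a' "$1") || return 2
  # The octal mode's 004 bit is read permission for others.
  (( 8#$mode & 4 ))
}

# Hypothetical helper: make the file (default /etc/passwd) readable by all.
fix_passwd_perms() {
  local f=${1:-/etc/passwd}
  if ! world_readable "$f"; then
    echo "$f is not world-readable (mode $(stat -c '%a' "$f")); fixing" >&2
    chmod a+r "$f"
  fi
}
```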

Thanks to this post, which made all of that easy to figure out.

Debian or Ubuntu on a Sun Netra

[imported from my old site, not a new post (if that’s important)]

I will probably expand on this at some stage, so for now it’s going to be short. I recently put Debian onto a Sun Netra, which was pretty cool, and then decided to cross-grade from Debian to Ubuntu. I’ve done that before without any issues, but this time I ran into tons of them. Only one was bad enough that I almost gave up, and by bad enough I mean that anybody who is good with dpkg would have solved it quickly. No matter how hard I searched (at first), all I could find was whining that if you have this problem it’s your own fault for mixing Debian and Ubuntu, with no clue as to how to actually solve it. I had mixed the two, but I also hadn’t: my /etc/apt/sources.list had only Ubuntu sources, but some of the software on the machine was still from Debian. Anyway, to get to the point, if you have this problem:

Selecting previously deselected package sysvutils.
(Reading database ... 168107 files and directories currently installed.)
Unpacking sysvutils (from .../sysvutils_2.86.ds1-14.1ubuntu9_i386.deb) ...
dpkg: error processing /var/cache/apt/archives/sysvutils_2.86.ds1-14.1ubuntu9_i386.deb (--unpack):
trying to overwrite `/usr/share/man/man1/mesg.1.gz', which is also in package sysvinit
Errors were encountered while processing:
/var/cache/apt/archives/sysvutils_2.86.ds1-14.1ubuntu9_i386.deb
E: Sub-process /usr/bin/dpkg returned an error code (1)

Then please fix it by doing the following:

$ sudo dpkg --force-overwrite -i /var/cache/apt/archives/sysvutils_2.86.ds1-14.1ubuntu9_i386.deb
(Reading database ... 168508 files and directories currently installed.)
Unpacking sysvutils (from .../sysvutils_2.86.ds1-14.1ubuntu9_i386.deb) ...
dpkg - warning, overriding problem because --force enabled:
trying to overwrite `/usr/share/man/man1/mesg.1.gz', which is also in package sysvinit
dpkg - warning, overriding problem because --force enabled:
trying to overwrite `/usr/share/man/man1/last.1.gz', which is also in package sysvinit
dpkg - warning, overriding problem because --force enabled:
trying to overwrite `/usr/share/man/man1/lastb.1.gz', which is also in package sysvinit
dpkg - warning, overriding problem because --force enabled:
trying to overwrite `/usr/share/man/man8/pidof.8.gz', which is also in package sysvinit
dpkg - warning, overriding problem because --force enabled:
trying to overwrite `/usr/share/man/man8/killall5.8.gz', which is also in package sysvinit
dpkg - warning, overriding problem because --force enabled:
trying to overwrite `/usr/bin/last', which is also in package sysvinit
dpkg - warning, overriding problem because --force enabled:
trying to overwrite `/usr/bin/mesg', which is also in package sysvinit
dpkg - warning, overriding problem because --force enabled:
trying to overwrite `/sbin/killall5', which is also in package sysvinit
dpkg - warning, overriding problem because --force enabled:
trying to overwrite `/usr/bin/lastb', which is also in package sysvinit
dpkg - warning, overriding problem because --force enabled:
trying to overwrite `/bin/pidof', which is also in package sysvinit
Setting up sysvutils (2.86.ds1-14.1ubuntu9) ...

And when you are done, thank these guys.

Updated: it appears that somebody other than me is actually reading my website, so I want to point out that the problem I had was exactly as above, but obviously not with i386 debs. Please don’t try fixing it exactly as pasted above; apply the same fix with the .deb for your own architecture instead (a sparc64 setup, in my case). I’m kinda thinking that if you’re having this issue you already knew that, but I had to mention it 🙂
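Following on from that, one way to avoid pasting an architecture-specific filename is to let the shell find whichever sysvutils .deb apt left in its cache. This is a sketch: `cached_deb` is a hypothetical helper, and it deliberately refuses to guess when more than one version is cached.

```bash
# Hypothetical helper: print the single cached .deb for package $1,
# whatever its version or architecture.
#   $2 - cache directory (default /var/cache/apt/archives)
cached_deb() {
  local pkg=$1 cache=${2:-/var/cache/apt/archives}
  local matches=( "$cache/${pkg}"_*.deb )
  # With no match, the glob stays literal, so the -e test fails.
  if [ ! -e "${matches[0]}" ]; then
    echo "no cached .deb for $pkg in $cache" >&2
    return 1
  fi
  if [ "${#matches[@]}" -ne 1 ]; then
    echo "more than one cached .deb for $pkg; pick one by hand:" >&2
    printf '  %s\n' "${matches[@]}" >&2
    return 1
  fi
  printf '%s\n' "${matches[0]}"
}
```

Usage: `deb=$(cached_deb sysvutils) && sudo dpkg --force-overwrite -i "$deb"`.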