Tech Blog :: linux

Nov 29 '11 1:06pm

Parse Drupal watchdog logs in syslog (using node.js script)

Drupal has the option of outputting its watchdog logs to syslog, the file-based core Unix logging mechanism. The log in most cases lives at /var/log/messages, and Drupal's logs get mixed in with all the others, so you need to cat /var/log/messages | grep drupal to filter.

But then you still have a big text file that's hard to parse. This is probably a "solved problem" many times over, but recently I had to parse the file specifically for 404'd URLs, and decided to do it (partly out of convenience but mostly to learn how) using Node.js as a scripting language. JavaScript is much easier than Bash for simple text parsing.

I put the code in a Gist, node.js script to parse Drupal logs in linux syslog (and find distinct 404'd URLs). The last few lines of URL filtering can be changed to any other specific use case you might have for reading the logs out of syslog. (This could also be used for reading non-Drupal syslogs, but the mapping applies keys like "URL" which wouldn't apply then.)

Note the comment at the top: to run it you'll need node.js and 2 NPM modules as dependencies. Then take your filtered log (using the grep method above), pass it as a parameter, and read the output on screen or redirect it with > to another file.
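If all you need is the distinct 404'd URLs, the same filtering can be sketched in plain shell too. This assumes Drupal's syslog output is pipe-delimited with the request URI as the 5th field and "page not found" as the type field; the sample lines below are made up to illustrate that assumed format, so verify the field positions against a real line from your own log first:

```shell
# Hypothetical sample of Drupal's pipe-delimited syslog payload --
# check the field order against your own /var/log/messages.
LOG=$(mktemp)    # stand-in for /var/log/messages
cat > "$LOG" <<'EOF'
Nov 29 13:00:01 web1 drupal: http://example.com|1322571601|page not found|10.0.0.1|http://example.com/bad/path|http://example.com/||404
Nov 29 13:00:02 web1 drupal: http://example.com|1322571602|cron|10.0.0.2|http://example.com/cron|||cron ran
Nov 29 13:00:03 web1 drupal: http://example.com|1322571603|page not found|10.0.0.3|http://example.com/bad/path|||404
EOF

# Filter to drupal lines, keep only "page not found" entries, split on |
# and print the request-URI field (5th in this sample), then dedupe.
grep drupal "$LOG" | grep 'page not found' | awk -F'|' '{print $5}' | sort -u
```

Against a real box you'd point the pipeline at /var/log/messages instead of the temp file.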

Mar 30 '11 2:41pm

Quick tip: Extract all unique IP addresses from Apache logs

Say you wanted to count the number of unique IP addresses hitting your Apache server. It's very easy to do in a Linux (or compatible) shell.

First locate the log file for your site. The generic log is generally at /var/log/httpd/access_log or /var/log/apache2/access_log (depending on your distro). For virtualhost-specific logs, check the conf files or (if you have one active site and others in the background) run ls -alt /var/log/httpd to see which file is most recently updated.

Then spit out one line to see the format:
tail -n1 /var/log/httpd/access_log

Find the IP address in that line and count which part it is. In this example it's the 1st part (hence $1):

cat /var/log/httpd/access_log | awk '{print $1}' | sort | uniq > /var/log/httpd/unique-ips.log
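If you also want to see how many hits each address made, swap uniq for uniq -c and sort the counts numerically. This is a generic variant (not specific to the post above); the demo uses a couple of fake log lines, so substitute your real access_log path:

```shell
LOG=$(mktemp)                 # stand-in for /var/log/httpd/access_log
printf '%s\n' \
  '1.2.3.4 - - [30/Mar/2011:09:00:00 +0000] "GET / HTTP/1.1" 200 512' \
  '5.6.7.8 - - [30/Mar/2011:09:00:01 +0000] "GET /a HTTP/1.1" 200 128' \
  '1.2.3.4 - - [30/Mar/2011:09:00:02 +0000] "GET /b HTTP/1.1" 404 0' > "$LOG"

# Count hits per IP, busiest first.
awk '{print $1}' "$LOG" | sort | uniq -c | sort -rn
```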

You'll now have a list of sorted, unique IP addresses. To figure out the time range they cover, run
head -n1 /var/log/httpd/access_log
to see the start point (and use the tail syntax above for the end point).
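In the default common/combined LogFormat the timestamp is the 4th whitespace-separated field, so you can print just the dates instead of the whole lines. Demo on a fake line (adjust the field number if your LogFormat differs):

```shell
LINE='1.2.3.4 - - [30/Mar/2011:09:00:00 +0000] "GET / HTTP/1.1" 200 512'
echo "$LINE" | awk '{print $4}'    # prints [30/Mar/2011:09:00:00
# Against a real log:
#   head -n1 /var/log/httpd/access_log | awk '{print $4}'
#   tail -n1 /var/log/httpd/access_log | awk '{print $4}'
```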

To count the number of IPs:
cat /var/log/httpd/unique-ips.log | wc -l

To paginate:
more /var/log/httpd/unique-ips.log

Have fun.

Nov 19 '10 1:01pm

How to compile rsync on [crappy] shared hosting

I'm working on a client's shared hosting server that doesn't allow rsync. (It's on there, just blocked from use, and tech support on $5/month hosting doesn't know what rsync is.)

Anyway, getting rsync running in the home directory turned out to be very easy. If you're running crappy shared Linux hosting with at least SSH access, and have the same problem, try this:

First make a space in your home directory; I used ~/opt (with a src subdirectory for the source code):
mkdir -p ~/opt/src
Then go to the rsync download page and grab the latest source. As of this writing that's 3.0.7, so I did:
cd ~/opt/src
Download (using curl in my case, because wget was blocked):
curl -O [rsync-3.0.7 source tarball URL]
tar -xzf rsync-3.0.7.tar.gz
cd rsync-3.0.7/
Now build it into ~/opt/rsync:
./configure --prefix=$HOME/opt/rsync
make
make install

Those should generate a bunch of output, hopefully with no errors... if it looks like it worked, run ~/opt/rsync/bin/rsync to confirm.
If that worked, and especially if there is already an rsync on the server but it's blocked, you'll want to give your version precedence in the PATH. So:
nano ~/.bashrc
Add a line at the end:
export PATH=~/opt/rsync/bin:$PATH
Ctrl+O saves, Ctrl+X exits.
Reload bashrc:
source ~/.bashrc
(or logout and back in)
which rsync (should be the one you just created)
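The PATH trick is worth sanity-checking. Here's a self-contained sketch (using a throwaway stub script in a temp directory rather than a real rsync build) showing that a directory prepended to PATH wins over any system binary of the same name:

```shell
BIN=$(mktemp -d)/bin            # stand-in for ~/opt/rsync/bin
mkdir -p "$BIN"
printf '#!/bin/sh\necho local-rsync\n' > "$BIN/rsync"   # fake rsync stub
chmod +x "$BIN/rsync"
export PATH="$BIN:$PATH"        # same idea as the .bashrc line above
command -v rsync                # resolves to "$BIN/rsync", not the system one
```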

That's it, unless your hoster put up some other restrictions (like the compiler itself), which is very possible.

Nov 14 '10 12:22am

Cron Wrapper Script

I was having issues with a server's crontab, but the complexity of the cron commands (with logging and error-catching added to each) made it hard to debug. So I wrote this general cron wrapper script. (It's built for Ubuntu Linux, should work in any bash shell, but haven't tested it in other environments.)
To use, extract somewhere on the server. Then in your crontab, you call it like so (this example runs every minute):

* * * * * $WRAPPER -d "Doing something" -s "/usr/local/" -l $CRON_LOG 2>> $CRON_LOG

It'll collect all output (regular and error), then dump it as a block into the specified log file.

The trailing 2>> $CRON_LOG in the example will catch errors in the wrapper script itself; errors in the command are caught by the script.

Run it by itself to see the usage. (There are 3 one-letter arguments.)
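The wrapper itself isn't reproduced in this post, but the core idea (capture everything the command prints, stdout and stderr, then append it to the log as one labeled block) can be sketched in a few lines of bash. This is a hypothetical minimal version, not the actual script; the real one takes the -d/-s/-l flags and does more error handling:

```shell
# Minimal cron-wrapper sketch (hypothetical): $1 = description,
# $2 = command to run, $3 = log file to append to.
cron_wrap() {
  local desc=$1 cmd=$2 log=$3
  local out status
  # Run the command, capturing stdout AND stderr together.
  if out=$(bash -c "$cmd" 2>&1); then status=0; else status=$?; fi
  {
    echo "=== $(date '+%Y-%m-%d %H:%M:%S') $desc (exit $status) ==="
    echo "$out"
  } >> "$log"                   # dump the whole run as one block
}
```

From cron you'd then call something like cron_wrap "Doing something" /usr/local/bin/some-script.sh "$CRON_LOG".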

May 5 '10 7:13pm

Upgrade SVN to 1.6 on Ubuntu 8

If you're running Ubuntu 8 ("Hardy") - a common setup for Slicehost servers including the one running this site - then you're probably still using SVN (client) v1.4.6. That's the latest version in apt-get with the standard libraries. But this post explains how to get 1.6. The only unexpected bit was a dependency on an Apache upgrade, but the whole process took less than 2 minutes, with less than 10 seconds of Apache downtime.

May 4 '10 9:54am

How to identify a Linux distribution

If you log into a Linux server and want to know which distribution you're working with, these commands are helpful:

uname -a (Linux kernel version)

lsb_release -a (distro information)

cat /etc/issue (more basic distro information)
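As a convenience, the three commands can be rolled into one snippet that skips whichever ones a given box doesn't have (lsb_release in particular is often missing on minimal installs):

```shell
uname -a                                              # kernel version; always present
command -v lsb_release >/dev/null && lsb_release -a   # distro info, if installed
[ -r /etc/issue ] && cat /etc/issue                   # basic distro banner, if present
```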