1. Tell me about a monitoring test you’ve written.
I decided long ago that I don’t want to hire anyone who has never written a monitoring test. I don’t care how simple or complicated the test was, but I want to make sure they’ve done it. Throughout my career I’ve come across so many specialized pieces of code and infrastructure that I take it for granted that sooner or later you’re going to need to do this. I find that the people who care about uptime do it earlier in their careers. It’s good to follow up with several more questions about their specific implementation, and then ask whether they had any unexpected issues with the test.
2. How would you remove files from a directory when ‘rm *’ tells you there are too many files?
Back in the 1990s, when Solaris shipped with a version of Sendmail that was an open relay, it wasn’t unusual for me to have to wipe a mail queue directory for a customer. If someone had been really aggressive about sending mail to it, I would often be confronted with the message that the * expansion was too long to pass to rm. I can think of a few ways to do this:
- for i in `ls /dir/`; do rm "/dir/$i" ; done
- find /dir/ -type f -exec rm {} \;
- rm -rf /dir; mkdir /dir
And I’m sure there are plenty more. After I get the answer, I like to ask whether they see any issues with the method they’ve chosen.
I like this question since it shows a candidate’s understanding of how the command line works, and whether they can think around some of its limitations.
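One answer that sidesteps most of the follow-up issues can be sketched as below. This is only a sketch: purge_dir is a hypothetical helper name, and -maxdepth is an extension found in GNU and BSD find rather than part of POSIX.

```shell
# purge_dir is a hypothetical helper, not a standard command.
# It removes every regular file directly inside a directory without
# expanding a glob on the command line, so the argument-size limit that
# stops 'rm *' never comes into play.
purge_dir() {
    dir=$1
    # Refuse to run on an empty or missing argument.
    [ -n "$dir" ] && [ -d "$dir" ] || { echo "usage: purge_dir DIR" >&2; return 1; }
    # -maxdepth 1 keeps find out of subdirectories, and -type f skips the
    # directory itself. "-exec ... {} +" batches names into as few rm
    # invocations as will fit and copes with spaces in file names.
    find "$dir" -maxdepth 1 -type f -exec rm -f -- {} +
}
```

Unlike 'rm *', this also removes dotfiles, and unlike removing and recreating the directory, it leaves the directory’s ownership and permissions alone. Both points are worth raising in the follow-up discussion.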
3. How would you set up an automated installation of Linux?
A good candidate should have done this, and they should immediately be talking about setting up FAI or Kickstart. I like to make sure they cover the base pieces of infrastructure, like DHCP, TFTP, and PXE. Generally I will follow up and ask when they think it makes sense to set up this type of automation, since it does require quite a bit of initial infrastructure.
4. How would you go about finding files on an installed system to add to configuration management?
This question is straightforward and quick, and I’m looking for two things from the candidate. First, I want them to tell me about using the package management system to locate modified config files, and second, I want to hear them tell me about talking to the development team about what was copied onto the system.
This question tells me they’ve looked for changes on systems and have a basic understanding of what the package management tools provide. It also tells me they know there is a human component, and that it might be quicker to ask the dev team what they installed than to build a tool to find it.
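As a rough illustration of the package management half of the answer, here is a small filter for the output of 'rpm -Va'. The name modified_configs is hypothetical, and the sketch assumes the usual verify format of a flags column, an optional attribute marker, and a path.

```shell
# modified_configs is a hypothetical filter. It reads "rpm -Va"-style
# verification output on stdin and prints the paths of files that the
# package marks as config files ("c") and that failed verification.
modified_configs() {
    # Each input line looks like: <flags> [marker] <path>
    # Keep lines whose marker is "c", then strip everything before the
    # path so that paths containing spaces survive intact.
    awk '$2 == "c" { sub(/^[^ ]+[ ]+c[ ]+/, ""); print }'
}
```

On an RPM-based system this would be run as 'rpm -Va | modified_configs'. It only reports files the package manager owns, which is exactly why the second half of the answer, asking the dev team what they copied onto the box, still matters.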
5. If I gave you a production system with a PHP application running through Apache, what things would you monitor?
I like using this question because it gives you an idea of the thoroughness of the candidate’s thought process. The easy answer is the URL the application is running on, but I like to push candidates for more. I’m generally looking for a list like:
- The URL of the application
- The Port Apache is running on
- The Apache Processes
- PING
- SSH
- NTP
- Load Average / CPU utilization
- Memory Utilization
- Percentage of Apache Connections used
- Etc.
I’m looking for both the application-specific and the basic probes. I cannot tell you how many times in my career I’ve started a job and found out SSH wasn’t monitored. Since it wasn’t part of the application, people didn’t think it was needed.
This question tests the candidate’s attention to detail. Monitoring is an important part of any production environment, and I want candidates who state the obvious.
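To make one item from the list concrete, here is a minimal sketch of a check for the percentage of Apache connections used. It assumes mod_status is enabled and that its machine-readable report (the "?auto" page, which includes a BusyWorkers line) is piped in on stdin. The function name, the 75/90 default thresholds, and the worker limit passed as the first argument are all assumptions for illustration. Exit codes follow the Nagios plugin convention.

```shell
# check_apache_workers is a hypothetical Nagios-style check.
# usage: <mod_status ?auto output> | check_apache_workers MAX [WARN] [CRIT]
# Exit codes: 0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN.
check_apache_workers() {
    max=$1
    warn=${2:-75}   # assumed default warning threshold, in percent
    crit=${3:-90}   # assumed default critical threshold, in percent
    case $max in
        ''|*[!0-9]*|0) echo "UNKNOWN - MAX must be a positive integer"; return 3 ;;
    esac
    # Pull the busy worker count out of the status report on stdin.
    busy=$(awk -F': *' '$1 == "BusyWorkers" { print $2 }')
    case $busy in
        ''|*[!0-9]*) echo "UNKNOWN - no BusyWorkers line in input"; return 3 ;;
    esac
    pct=$(( busy * 100 / max ))
    if [ "$pct" -ge "$crit" ]; then
        echo "CRITICAL - ${pct}% of Apache workers busy"; return 2
    elif [ "$pct" -ge "$warn" ]; then
        echo "WARNING - ${pct}% of Apache workers busy"; return 1
    fi
    echo "OK - ${pct}% of Apache workers busy"; return 0
}
```

With a worker limit of 150, it might be called as 'curl -s "http://localhost/server-status?auto" | check_apache_workers 150', where the URL is an assumption about how mod_status is exposed. A check like this only covers one line of the list; the basic probes such as PING and SSH still need their own.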
6. If I asked you to back up a server’s local filesystem, how would you do it?
Backups are, unfortunately, the bread and butter of operations work. A candidate should really have some experience running a backup, and so they should know the basics. Unfortunately, this is a really open-ended question. There are endless ways it can be done, and that makes it a little tough on both the candidate and the interviewer. One example a candidate could choose would be to use the tar command, but they could also choose to use tar with an LVM snapshot, or they could use rsync to a remote server. It’s really the follow-up question that makes this worthwhile: what are the disadvantages of your method, and can you think of another way you might do this to address those issues? Again, since it’s the bread and butter of operations work, they should know the strengths and weaknesses of the scheme they select, and they should know at least one alternative.
This question checks whether a candidate has performed typical operations work, and also whether they have thought through the problems with it.
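The simplest of those answers, plain tar, might look something like the sketch below. The function name backup_dir and the archive naming scheme are hypothetical, and the destination is assumed to live outside the directory being backed up.

```shell
# backup_dir is a hypothetical helper. It writes a dated, gzip-compressed
# tar archive of SRC into DESTDIR and prints the archive path.
# usage: backup_dir SRC DESTDIR
backup_dir() {
    src=$1
    dest=$2
    [ -d "$src" ] && [ -d "$dest" ] || { echo "usage: backup_dir SRC DESTDIR" >&2; return 1; }
    archive="$dest/backup-$(date +%Y%m%d-%H%M%S).tar.gz"
    # -C changes into the source first so the archive holds relative paths.
    tar -czf "$archive" -C "$src" . || return 1
    # Listing the archive is a cheap check that it can be read back at all.
    tar -tzf "$archive" >/dev/null || return 1
    echo "$archive"
}
```

The weaknesses are the ones the follow-up question is fishing for: files can change while tar is reading them, which is what the LVM snapshot variant addresses, and every run copies everything, which is where rsync to a remote server comes in.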