Note: The first 2 episodes of the interview series can be found here and here.
Question 1:
The interviewer comes in and hands you the following ls -la listing:
# ls -la
total 108
dr-xr-x---.  7 root root  4096 Sep  5 07:16 .
dr-xr-xr-x. 22 root root  4096 Sep  1 17:43 ..
-rw-------.  1 root root 15432 Sep  5 06:36 .bash_history
-rw-r--r--.  1 root root    18 May 20  2009 .bash_logout
-rw-r--r--.  1 root root   176 May 20  2009 .bash_profile
-rw-r--r--.  1 root root   176 Sep 23  2004 .bashrc
-rw-r--r--   1 root root     0 Sep  5 07:16 -f
...
They say the -f file must go; would you please delete it?
I suspect most candidates would try one of the following approaches:
Use quotes (e.g. rm "-f"). This does not work: the quotes are removed by the shell, so -f is passed as-is to the rm command. The command returns no error, though, since -f tells rm to ignore non-existent files.
Escape the minus (e.g. rm \-f). This does not work either: the escaping is interpreted by the shell and, again, -f is passed as-is to rm.
Double-escape the minus (e.g. rm \\-f). This does not work: the string \-f is passed to rm, which returns an error, since a file named \-f does not exist while -f does.
At this point many candidates just give up, yielding a poor interview rating, which is truly a shame.
Anyway, how do you delete such a file? There are two solutions to this problem, both found by RTFM, i.e. man rm:
Note the "--" (double minus) parameter, which marks the end of options, e.g.:
# rm -- -f
Prefix the file name with the current directory, e.g.:
# rm ./-f
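Both approaches are easy to verify in a scratch directory (a quick sketch; the scratch directory and the file name are the only assumptions):

```shell
cd "$(mktemp -d)"              # scratch directory
touch -- -f                    # create a file literally named -f
rm -- -f                       # "--" marks the end of options
touch ./-f                     # recreate it
rm ./-f                        # the path prefix avoids option parsing entirely
ls -A                          # prints nothing: both deletions worked
```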
A follow-up to this interview question can be enumerating the characters that are not allowed in file names, which are very few: on Linux, only \0 (NUL, the string terminator) and / (the path separator) are forbidden, while everything else is allowed. On Windows the list is considerably longer, though.
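This is easy to check from the shell (a sketch; note that NUL cannot even be passed in an argument, since argv strings are NUL-terminated):

```shell
cd "$(mktemp -d)"
# Spaces, newlines, glob characters and a leading dash are all legal names:
touch -- 'spaces and tabs' $'line\nbreak' '*?[]' '-dash'
ls -Aq | wc -l                      # 4 entries (-q prints the newline as ?)
# A slash, however, is always interpreted as the path separator:
touch 'no/such' 2>/dev/null || echo "slash rejected"
```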
Question 2:
You are asked to get to the whiteboard and draw a cron definition to be put in a /etc/cron.d file for writing the date/time into a file every Wednesday at 5 PM. The date/time format is given to you, e.g. %Y-%m-%d %H:%M:%S.
An experienced candidate may immediately start to write something along the lines of:
0 17 * * 3 root date +'%Y-%m-%d %H:%M:%S' >> /tmp/logfile
All good? … Nope.
What is wrong, you may ask? You may remember a couple of details about how Cron operates:
The commands are run with the shell configured in /etc/crontab rather than your favorite shell, but for such simple command it should not matter.
Local PATH changes are not visible, although “/bin” usually gets into PATH through /etc/crontab.
Local environment variables are not visible – but none are used.
At this point most candidates just give up.
The answer lies in the way Cron interprets the % (percent) character: the first unescaped % marks the end of the command, and everything after it is fed to the command as standard input, with any further % turned into a newline. Quotes do not help with anything, as the shell only ever receives:
date +'
while the rest (Y-, m-, d H: and so on) arrives on standard input, line by line.
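The crontab(5) behaviour can be sketched in plain shell (a simplified simulation; real cron also honours \% escapes, which this ignores):

```shell
# What cron does with the entry above: split at the first %,
# hand the left part to /bin/sh, feed the rest as stdin with % -> newline.
line="date +'%Y-%m-%d %H:%M:%S' >> /tmp/logfile"
cmd=${line%%\%*}                                  # everything before the first %
stdin=$(printf '%s' "${line#*\%}" | tr '%' '\n')  # the remainder, % -> newline
printf 'command given to /bin/sh: %s\n' "$cmd"
printf 'standard input:\n%s\n' "$stdin"
```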
This obviously generates an error. The solution is to escape the percent signs in the Cron command definition:
0 17 * * 3 root date +'\%Y-\%m-\%d \%H:\%M:\%S' >> /tmp/logfile
This finally looks good (well, sort of – but at least it does what it should do).
Question 3:
A situation came along: a mission-critical process writes data to some file whose contents must be preserved at all costs, only some junior administrator has just run rm -f on it. The process is still up and running, but in a few hours log rotation will kick in and send it a HUP signal.
Let’s just ignore the coldness one may feel through the spine – this is actually quite a simple issue to solve:
Identify the pid of the process and then list /proc/_pid_/fd in order to find the file descriptor of the deleted file, e.g.:
# ls -la /proc/966/fd/10
lrwx------ 1 root root 64 Sep 5 08:56 /proc/966/fd/10 -> /var/lib/critical/process.out (deleted)
Grab the contents of the deleted file:
# dd if=/proc/966/fd/10 of=/tmp/recovered.out
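The whole scenario can be reproduced on any Linux box (a sketch; the shell itself stands in for the mission-critical process, holding the deleted file open on descriptor 3):

```shell
printf 'critical data\n' > /tmp/demo.out
exec 3</tmp/demo.out               # the "process" keeps the file open
rm -f /tmp/demo.out                # the junior administrator strikes
dd if=/proc/$$/fd/3 of=/tmp/recovered.out 2>/dev/null
cat /tmp/recovered.out             # prints: critical data
exec 3<&-                          # close the descriptor again
```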
Hold on: how about the contents that the process keeps on writing after we issue the dd command?
In order to avoid data loss one may think of one of the following:
Issue a SIGSTOP to freeze the process, then get the contents with dd and then terminate the process (kill) / restart the service.
Get the contents and issue a SIGHUP immediately afterwards.
Neither of these is guaranteed to preserve everything if the application buffers data (then again, a mission-critical application should flush its buffers after every write and not cache anything, which is a good point to raise in the interview). Nevertheless, there is a clean solution, involving the tail command:
# tail -f -n +0 /proc/966/fd/10 > /tmp/recovered.out
This command can be left running until the log rotation event if a faster service restart is not desirable.
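The same toy scenario works with tail (a sketch; timeout is used here only so the example terminates, while in real life the command is simply left running):

```shell
printf 'early data\n' > /tmp/live.out
exec 3</tmp/live.out               # the "process" holds the file open
rm -f /tmp/live.out
# -n +0 starts from the beginning of the file; -f keeps following new writes
timeout 1 tail -n +0 -f /proc/$$/fd/3 > /tmp/recovered.out || true
cat /tmp/recovered.out             # prints: early data
exec 3<&-
```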
Question 4:
The interviewer comes into the room and tells you that a node has hit 100% storage usage; it was analysed by some administrator who could not identify the cause, as all file sizes seem in order. Restarting the node is not possible due to the mission-critical software running on it.
Sounds bad? The diagnosis is actually simple: if no visible file is large enough, then some process must keep a reference to a very large deleted file (or to several deleted files adding up to a very large size). Finding the process and the file is the tougher part, but the lsof command helps:
# lsof | grep deleted | awk '{print $2, $4, $7, $9}'
2006 34w 24967 /tmp/vteAIZ4MY
2006 35u 28708 /tmp/vteXJZ4MY
2006 36w 28985 /tmp/vteYU9XMY
....
What did I do here? I filtered the lsof output for deleted files and then printed only four fields:
The pid of the process that keeps the deleted file open;
The file descriptor (with the access type);
The file size – the interesting field one should keep an eye on;
The file name.
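If lsof happens to be unavailable, the same information can be scraped straight from /proc (a sketch; it only sees processes your user is allowed to inspect):

```shell
# Walk every open descriptor and keep the ones pointing at deleted files
for link in /proc/[0-9]*/fd/*; do
  target=$(readlink "$link" 2>/dev/null) || continue
  case $target in
    *' (deleted)') echo "$link -> $target" ;;
  esac
done
```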
Once we have identified the file, there are two options:
Reload or restart the process, e.g. kill -HUP _pid_;
Truncate the file, e.g. truncate -s 0 /proc/_pid_/fd/_fd_.
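The truncate route can also be demonstrated with the shell standing in for the space-leaking process (a sketch, Linux-only):

```shell
head -c 1048576 /dev/zero > /tmp/big.log   # 1 MiB of "logs"
exec 3>>/tmp/big.log                       # a process holds the file open
rm -f /tmp/big.log                         # deleted, but the space is still in use
truncate -s 0 /proc/$$/fd/3                # truncating through /proc frees it
stat -L -c %s /proc/$$/fd/3                # prints: 0
exec 3>&-
```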
That’s it for today, thanks for reading!