Note: The first 3 episodes of the interview series can be found here, here and here.
Question 1:
You have the shared document open and the phone rings. The interviewer, at the other end of the line, starts with a thick accent:
– What does ls * do?
You cannot believe your ears: it sounds easy. Really easy. So you answer along the lines of “it lists all the files in the current directory”. The interviewer follows up with one or two questions on how it really works, and you explain that the star is passed as a parameter to ls and that the binary interprets it in some way, so that the entire directory gets walked and its contents listed. Simple!
The interviewer thanks you for the time and goes away, leaving you with a big smile on your face. Well, until the recruiter calls you back and tells you to try again in 1-2 years.
Now, in an alternate reality from the one above, what actually happens when the command is run in a Linux shell? The result may be close to:
$ ls *
1file  2file  3file  5file  6file  8file  9file

4dir:
a  b  c  d

7dir:
e  f  g  h
Did you see that? ls goes one level below. This would not be possible if ls only listed the contents of the current directory. So something else, maybe something magical, happens.
There’s no magic here, though: the star is first expanded by the shell to a list of files and directories and then this list is passed to ls. Then ls goes through every parameter and does what it knows best – lists files and directory contents. Yes, this is the “trick” expected by the interviewer, but there are some other details that should be mentioned:
- The star expands only to names that do not start with “.” (dot). Those files and directories are by no means invisible; dot-prefixed names are simply reserved (by convention) for local configuration files.
- If the expanded list is too long (too many names, or too many bytes in total), the command fails with an “argument list too long” type of error, because the system refuses to start ls with that many arguments. The limit is high and varies between systems, but it is nevertheless finite, so there is a chance of exceeding it at some point (e.g. having some program constantly writing status files in a single directory for a long time). Please also see this text for a more detailed explanation.
- If the star matches nothing (for example, the directory is empty), the shell by default passes the star itself as a parameter to ls, and the binary generates a “not found” type of error.
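The points above can be checked without ls at all, since the expansion happens in the shell. The following is a minimal sketch, assuming a POSIX sh with default globbing settings (no nullglob or dotglob style options); the scratch directory and file names are made up for the demonstration.

```shell
# Work in a scratch directory so the demo does not touch real files.
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1
touch 1file 2file .hidden
mkdir 4dir
touch 4dir/a 4dir/b

# The shell expands the star before any command runs. "set --" stores the
# expanded words as positional parameters, so they can be counted.
set -- *
expanded_count=$#
expanded_list="$*"
echo "the shell would pass $expanded_count arguments: $expanded_list"

# In an empty directory the pattern matches nothing and stays a literal star.
mkdir empty
cd empty || exit 1
set -- *
unmatched=$1
echo "an unmatched pattern is passed as: $unmatched"

# Clean up the scratch directory.
cd / && rm -rf "$demo_dir"
```

The dot file is not counted, and the directory 4dir is passed as a single argument, which is why ls then lists its contents.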
If you actually went down this path during the interview, more questions about dealing with “argument list too long” situations may come your way. They can usually be resolved by using find, which leads us to the next question.
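To get a feeling for how high the limit is on a given machine, one can ask the system directly. This is only a rough sketch: getconf ARG_MAX reports the configured limit for arguments plus environment, and the byte count below is an estimate of what the star expands to in the current directory, not the exact accounting done by the kernel.

```shell
# Ask the system for its limit on the combined size of arguments and environment.
arg_max=$(getconf ARG_MAX)
echo "ARG_MAX on this system: $arg_max bytes"

# Estimate what "ls *" would pass from the current directory: every expanded
# name plus one extra byte per name (printf adds a newline after each).
set -- *
names_bytes=$(printf '%s\n' "$@" | wc -c | tr -d ' ')
echo "the star currently expands to $# names, about $names_bytes bytes"
```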
Question 2:
The interviewer comes in and gives you the following scenario:
– I tried to delete all the files in the current folder with rm -f * and got an “argument list too long” type of error. I really want the files deleted.
As mentioned in the previous question, find is your friend, as it can enumerate each and every file, e.g.:
$ find . -type f -print
The above example is not very useful on its own, but it can be easily changed to become a scalable rm -f * equivalent:
$ find . -maxdepth 1 -type f -delete
Or, if one feels that rm should be explicitly used:
$ find . -maxdepth 1 -type f -exec rm -f {} \;
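A side note on the -exec form: terminated with \; it starts one rm per file, while the standard + terminator lets find append many file names to a single command line and keeps each batch under the argument limit. The sketch below, run in a made-up scratch directory, counts how many times the command is started in each case and then deletes the files with the batched form.

```shell
# Scratch directory with a handful of files (names are illustrative only).
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1
touch f1 f2 f3 f4 f5

# With \; the command runs once per file, so echo prints five lines.
per_file_runs=$(find . -maxdepth 1 -type f -exec echo run {} \; | wc -l | tr -d ' ')

# With + the names are batched, so echo runs once and prints a single line.
batched_runs=$(find . -maxdepth 1 -type f -exec echo run {} + | wc -l | tr -d ' ')

echo "runs with \\;: $per_file_runs, runs with +: $batched_runs"

# The batched equivalent of rm -f * for regular files.
find . -maxdepth 1 -type f -exec rm -f {} +
files_left=$(find . -maxdepth 1 -type f | wc -l | tr -d ' ')
echo "files left: $files_left"

cd / && rm -rf "$demo_dir"
```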
Another version, piping into xargs:
$ find . -maxdepth 1 -type f -print0 | xargs -0 rm -f
Note: -print0 writes out the file names separated by the null character, and xargs -0 reads them from the stream in this particular format. This is required for file names containing unusual characters such as spaces, tabs, line terminators and so on.
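The effect of the null separator is easy to try out. A small sketch, assuming a find and xargs that support -maxdepth, -print0 and -0 (GNU and BSD versions do); the awkward file names are invented for the test.

```shell
# Scratch directory with deliberately awkward file names.
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1
touch "plain" "with space" "two
lines"
mkdir keepdir
touch keepdir/nested

# Names are separated by NUL bytes, so spaces and newlines inside a name
# cannot be mistaken for separators.
find . -maxdepth 1 -type f -print0 | xargs -0 rm -f

remaining_files=$(find . -maxdepth 1 -type f | wc -l | tr -d ' ')
nested_kept=no
[ -f keepdir/nested ] && nested_kept=yes
echo "files left at top level: $remaining_files, nested file kept: $nested_kept"

cd / && rm -rf "$demo_dir"
```

Thanks to -maxdepth 1, the file inside keepdir survives, just as it would with rm -f *.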
Question 3:
You just solved a difficult problem and you’re given a second, easier one, just to round off the interview. It sounds easy, something like renaming all the “.htm” files in a directory structure to “.html”. Really easy, don’t you think?
Your first guess is right: find should be used to locate all the files we need to rename:
$ find . -type f -name "*.htm" -print
OK, now what? Piping the output into a script may be a solution, but the interviewer is not actually looking for more than a one-liner in this particular scenario. They may also have a follow-up question, but first let’s see how this particular problem can be solved as easily as possible:
$ find . -type f -name "*.htm" -exec mv -f {} {}l \;
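Did you get it? {} is replaced by find with the file name, so {}l becomes the file name followed by “l”. Note that GNU find substitutes {} even when it is embedded in a longer argument; POSIX only guarantees the substitution when {} stands alone.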
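For a find that does not substitute {} inside a longer argument, the same rename can be written with {} as a standalone argument handed to a tiny inline script. This is a hedged sketch rather than the answer the interviewer necessarily expects; the directory tree and file names are made up so that the command can be tried safely.

```shell
# Scratch tree with a few .htm files, one of them with a space in its name.
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1
mkdir -p docs/sub
touch index.htm "docs/about us.htm" docs/sub/page.htm docs/notes.txt

# {} is passed as a separate argument and shows up as $1 inside the script;
# the word "sh" before it only fills $0.
find . -type f -name "*.htm" -exec sh -c 'mv -f "$1" "${1}l"' sh {} \;

htm_left=$(find . -type f -name "*.htm" | wc -l | tr -d ' ')
html_now=$(find . -type f -name "*.html" | wc -l | tr -d ' ')
txt_kept=no
[ -f docs/notes.txt ] && txt_kept=yes
echo ".htm left: $htm_left, .html now: $html_now, notes.txt kept: $txt_kept"

cd / && rm -rf "$demo_dir"
```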
The follow-up is tougher, though: the reverse problem, renaming all the “.html” files to “.htm”. This actually requires doing some string magic in the shell (yes, with the % operation):
$ find . -type f -name "*.html" -exec sh -c 'f="{}"; mv -f "$f" "${f%l}"' \;
Some notes for the code above:
- The -exec action calls the shell interpreter with an inline script. The {} sequence is replaced by find with the file name;
- Single quotes (') are used so that the calling shell does not expand $f itself before find even runs. With double quotes, $f would be replaced with whatever value it has outside the inline script, effectively ignoring the assignment immediately before. This is a very tricky issue; please try the following yourself to see what gets printed:
$ a="outside"
$ sh -c 'a="inside"; echo $a'
$ sh -c "a=\"inside\"; echo $a"
- ${f%l} looks strange, but it simply removes the trailing “l” from the file name that is stored in the $f variable.
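Two things are worth trying in isolation: the suffix removal itself, and a variant of the one-liner that hands the file name to the inline script as a positional parameter instead of pasting {} into the script text. The variant is my assumption of a more defensive form, not something the interviewer asked for; it avoids surprises when a file name contains quotes or dollar signs. The scratch tree and names are made up.

```shell
# Suffix removal on its own: % strips the shortest matching suffix.
f="page.html"
stripped=${f%l}
echo "$f -> $stripped"

# Scratch tree for the rename itself.
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1
mkdir -p site
touch index.html "site/contact form.html" site/style.css

# The file name arrives as $1, so it is never parsed as part of the script.
find . -type f -name "*.html" -exec sh -c 'f=$1; mv -f "$f" "${f%l}"' sh {} \;

html_left=$(find . -type f -name "*.html" | wc -l | tr -d ' ')
htm_now=$(find . -type f -name "*.htm" | wc -l | tr -d ' ')
echo ".html left: $html_left, .htm now: $htm_now"

cd / && rm -rf "$demo_dir"
```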
I’m pretty curious about how many people could solve this in an interview setting without having faced all these issues beforehand. Not many, I presume.
That’s it for today, thank you for reading!
Nice explanation. Really appreciate your effort.
I tried the script for find and replace and it’s not working. But the one below is working for me.
find . -type f -name "*.htm" -exec sh -c 'f={};mv -f $f $(basename "$f" .htm).html' \;
question #3 can be solved with basename -s