Rake Part 5: File Operations


Here’s the Rakefile we’ve been working on for the last few episodes. It finds Markdown source files in a sources subdirectory of a project, and produces a parallel hierarchy of HTML files in an outputs subdirectory.

SOURCE_FILES = Rake::FileList.new("sources/**/*.md", "sources/**/*.markdown") do |fl|
  fl.exclude("**/~*")
  fl.exclude(/^sources\/scratch\//)
  fl.exclude do |f|
    `git ls-files #{f}`.empty?
  end
end

task :default => :html
task :html => SOURCE_FILES.pathmap("%{^sources/,outputs/}X.html")

rule ".html" => ->(f){source_for_html(f)} do |t|
  sh "pandoc -o #{t.name} #{t.source}"
end

def source_for_html(html_file)
  SOURCE_FILES.detect{|f| f.ext('') == html_file.pathmap("%{^outputs/,sources/}X")}
end

Now that we are recreating the input file hierarchy in an outputs directory, we need to ensure that the destination directory exists before generating any HTML files. An easy way to do this in Rake is to use a directory task. This is like a file task, except for a directory. But unlike a file task, we don’t have to supply any code for how to make the directory appear if it doesn’t already exist. Simply by specifying the task, we are giving Rake implicit instructions to create the directory if it is needed.
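To make that concrete, here is a minimal sketch of a directory task run from a plain Ruby script rather than a Rakefile. The DEMO_OUTPUTS constant and the temporary location are assumptions made for illustration; in the project's Rakefile the declaration would simply name "outputs".

```ruby
require "rake"
require "tmpdir"

extend Rake::DSL # harmless if Rake has already extended the top-level object

# Hypothetical scratch location, so the example never touches a real project.
DEMO_OUTPUTS = File.join(Dir.mktmpdir, "outputs")

# Declaring the task is enough; we supply no block saying how to build it.
directory DEMO_OUTPUTS

# Invoking the task creates the directory if it is missing.
Rake::Task[DEMO_OUTPUTS].invoke
puts Dir.exist?(DEMO_OUTPUTS)
```

Invoking the task by hand is only for demonstration. In a Rakefile, Rake runs it whenever another task lists the directory as a dependency.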

We declare the directory task, then add it to the list of dependencies for the .html rule.

directory "outputs"

rule ".html" => [->(f){source_for_html(f)}, "outputs"] do |t|
  sh "pandoc -o #{t.name} #{t.source}"
end

Now when we run rake, we can see that it creates the directory before beginning to generate the HTML files. Unfortunately, it runs into a problem as it tries to build the appendix.html file. Since this file is in a subdirectory of the sources directory, we want the HTML output file to be in a corresponding subdirectory of the outputs directory. But this subdirectory doesn’t yet exist.

$ rake
mkdir -p outputs
pandoc -o outputs/backmatter/appendix.html sources/backmatter/appendix.md
pandoc: outputs/backmatter/appendix.html: openFile: does not exist (No such file or directory)
rake aborted!
Command failed with status (1): [pandoc -o outputs/backmatter/appendix.html...]
/home/avdi/Dropbox/rubytapas/133-rake-file-operations/project/Rakefile:16:in `block in <top (required)>'
Tasks: TOP => default => html => outputs/backmatter/appendix.html
(See full trace by running task with --trace)

To ensure this or any other intermediate directory exists before producing an HTML file, we could execute a mkdir -p shell command, using #pathmap to pass just the directory portion of the target filename.

sh "mkdir -p #{t.name.pathmap('%d')}"
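As a quick illustration of what the %d specifier yields, here is a small sketch using two of the target names from the run above. The constant names are made up for the example.

```ruby
require "rake" # adds String#pathmap

# Two target names taken from the build output above.
DEMO_TARGETS = ["outputs/ch1.html", "outputs/backmatter/appendix.html"]

# "%d" keeps only the directory portion of each path.
DEMO_TARGET_DIRS = DEMO_TARGETS.map { |target| target.pathmap("%d") }

puts DEMO_TARGET_DIRS
```

So for the appendix the command becomes mkdir -p outputs/backmatter, which is exactly the directory pandoc was missing.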

But Rake gives us a shortcut for this. Instead of running a shell command, we can use a mkdir_p method right in the task:

SOURCE_FILES = Rake::FileList.new("sources/**/*.md", "sources/**/*.markdown") do |fl|
  fl.exclude("**/~*")
  fl.exclude(/^sources\/scratch\//)
  fl.exclude do |f|
    `git ls-files #{f}`.empty?
  end
end

task :default => :html
task :html => SOURCE_FILES.pathmap("%{^sources/,outputs/}X.html")

directory "outputs"

rule ".html" => [->(f){source_for_html(f)}, "outputs"] do |t|
  mkdir_p t.name.pathmap("%d")
  sh "pandoc -o #{t.name} #{t.source}"
end

def source_for_html(html_file)
  SOURCE_FILES.detect do |f|
    f.ext('') == html_file.pathmap("%{^outputs/,sources/}X")
  end
end
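The lookup in source_for_html can be tried on its own. The sketch below swaps the real SOURCE_FILES list for a hypothetical SAMPLE_SOURCES array, so it runs without a project directory or git; sample_source_for is likewise an illustrative name, not part of the Rakefile.

```ruby
require "rake" # adds String#pathmap and String#ext

# Hypothetical stand-in for SOURCE_FILES.
SAMPLE_SOURCES = [
  "sources/ch1.md",
  "sources/ch4.markdown",
  "sources/backmatter/appendix.md"
]

def sample_source_for(html_file)
  # Swap the outputs/ prefix for sources/ and drop the extension (%X).
  wanted = html_file.pathmap("%{^outputs/,sources/}X")
  # Compare against each source path with its own extension removed.
  SAMPLE_SOURCES.detect { |f| f.ext("") == wanted }
end

puts sample_source_for("outputs/backmatter/appendix.html")
puts sample_source_for("outputs/ch4.html")
```

Because both sides are compared without their extensions, the same lookup finds .md and .markdown sources alike.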

Now when we run rake, it ensures the target directory exists before each Markdown-to-HTML transformation.

$ rake
mkdir -p outputs/backmatter
pandoc -o outputs/backmatter/appendix.html sources/backmatter/appendix.md
mkdir -p outputs
pandoc -o outputs/ch1.html sources/ch1.md
mkdir -p outputs
pandoc -o outputs/ch2.html sources/ch2.md
mkdir -p outputs
pandoc -o outputs/ch3.html sources/ch3.md
mkdir -p outputs
pandoc -o outputs/ch4.html sources/ch4.markdown

Often when writing build scripts it’s convenient to have an easy way to quickly blow away all of the generated files. Let’s add a task to handle this. Once again, instead of running a shell command, we’ll use a Rake helper method called rm_rf. This mirrors the shell rm -rf command, which recursively deletes files and directories without any warnings or confirmation.

task :clean do
  rm_rf "outputs"
end

Rake has a long list of these file operation helper methods, all of them named after their UNIX shell equivalents. They are handy for several reasons. For one thing, since they are native Ruby methods we can pass files and file lists to them directly without any kind of string interpolation.
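Here is a small sketch of passing lists straight to the helpers. It works in a temporary directory, and the SCRATCH_DIR, STALE_FILES, and SCRATCH_HTML names are assumptions for the example rather than anything from our Rakefile.

```ruby
require "rake"
require "tmpdir"

extend Rake::DSL # harmless if Rake has already extended the top-level object

# Hypothetical scratch directory for the demonstration.
SCRATCH_DIR = Dir.mktmpdir
STALE_FILES = %w[a.html b.html].map { |name| File.join(SCRATCH_DIR, name) }

# A plain array of paths goes in as-is; no string interpolation needed.
touch STALE_FILES

# A FileList can be handed over the same way.
SCRATCH_HTML = Rake::FileList[File.join(SCRATCH_DIR, "*.html")]
SCRATCH_HTML_COUNT = SCRATCH_HTML.size
rm_f SCRATCH_HTML
```

With the shell versions we would have had to join the paths into a command string ourselves and worry about quoting.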

They are also sensitive to the Rake “quiet” flag. We can run a Rake command with the -q flag, and it will do any work needed, but this time without logging to STDOUT.

$ rake -q
$
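The same switch can be flipped from inside Ruby code. The sketch below uses Rake's verbose helper with a block to silence the echo for a single operation; the QUIET_DEMO_DIR name and the temporary location are assumptions for illustration.

```ruby
require "rake"
require "tmpdir"

extend Rake::DSL # harmless if Rake has already extended the top-level object

# Hypothetical scratch directory for the demonstration.
QUIET_DEMO_DIR = File.join(Dir.mktmpdir, "outputs")

mkdir_p QUIET_DEMO_DIR

# Inside the block the helpers still do their work, but without
# echoing the equivalent shell command.
verbose(false) do
  rm_rf QUIET_DEMO_DIR
end
```

The previous verbosity setting is restored when the block finishes, so the rest of the build logs as usual.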

Almost all of these helpers are inherited straight from the Ruby FileUtils standard library. So if you want to see a list of all that’s available, just check out the FileUtils documentation.

That’s all for today. Happy hacking!

(This post is also available in Japanese from Nilquebe Blog.)


2 comments

  1. Thanks for the topic! I sent some money to the fund. I am glad people keep promoting it. I wish I was half the decent man Jim was.

  2. The Jim Weirich https links don’t work. Seems they need to be changed to http. Parts 1-4 of your rake series are similarly affected, and I assume any other pages referencing that site are as well.
