In the last episode, we wrote this Rakefile. It automates building three Markdown files into HTML files.
```ruby
task :default => :html
task :html => %W[ch1.html ch2.html ch3.html]

rule ".html" => ".md" do |t|
  sh "pandoc -o #{t.name} #{t.source}"
end
```
We really don’t want to have to edit this file every time we add a new file to process, though. Instead, we’d like the Rakefile to find the files to be built automatically.
To give us something to experiment with, I’ve set up a sample project directory. It contains four Markdown chapter files and one appendix file in a subdirectory, all of which should be built into HTML files. It also contains some other files that we don’t want to build. There’s a ~ch1.md file, which is some kind of temporary file left behind by an editor. And there’s a scratch directory whose contents should be ignored.
```
$ tree
.
├── ~ch1.md
├── ch1.md
├── ch2.md
├── ch3.md
├── ch4.markdown
├── scratch
│   └── test.md
├── subdir
│   └── appendix.md
└── temp.md
```
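If you’d like to follow along, here’s a small Ruby sketch that recreates this layout. The file contents are placeholders, and the names build_sample_project and SAMPLE_FILES are made up for this sketch.

```ruby
require 'fileutils'

# Hypothetical list mirroring the tree listing above.
SAMPLE_FILES = %w[
  ~ch1.md
  ch1.md
  ch2.md
  ch3.md
  ch4.markdown
  scratch/test.md
  subdir/appendix.md
  temp.md
]

# Create each file (and any parent directories) under +root+,
# writing a one-line placeholder heading into it.
def build_sample_project(root = "project")
  SAMPLE_FILES.each do |path|
    full = File.join(root, path)
    FileUtils.mkdir_p(File.dirname(full))
    File.write(full, "# #{File.basename(path)}\n")
  end
  root
end
```

Calling build_sample_project from the directory above project creates the tree. To match the Git listing that follows, you would then run git init inside project and add every file except temp.md.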
This project is under Git revision control. If we tell Git to list the files it knows about, we see a subset of the files from before. Notably missing is a file called temp.md, which has not been registered with Git and probably never will be. It too should be left out of the list of files to build.
```
$ git ls-files
ch1.md
ch2.md
ch3.md
ch4.markdown
scratch/test.md
subdir/appendix.md
~ch1.md
```
In order to automatically discover just the files which should be built, we turn to Rake file lists. Let’s explore what file lists are, and what they are capable of.
To create a file list, we use the subscript operator on the Rake::FileList class, passing in a list of strings representing files.
```ruby
require 'rake'

files = Rake::FileList["ch1.md", "ch2.md", "ch3.md"]
files # => ["ch1.md", "ch2.md", "ch3.md"]
```
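One detail about the subscript form, as far as I understand Rake’s behavior: plain names with no glob characters go into the list as-is, whether or not such a file exists, while glob patterns only expand to files that are actually on disk. Here’s a minimal sketch in a throwaway directory:

```ruby
require 'rake'
require 'tmpdir'

# In an empty directory, the literal name survives but the glob finds nothing.
literal_and_glob = Dir.mktmpdir do |dir|
  Dir.chdir(dir) do
    Rake::FileList["ch1.md", "*.markdown"].to_a
  end
end

p literal_and_glob # => ["ch1.md"]
```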
So far this isn’t very exciting, but we’re just getting started. Instead of listing files individually, we can pass a FileList a shell glob pattern. Let’s give it the pattern *.md.
```ruby
require 'rake'

Dir.chdir "project"
files = Rake::FileList["*.md"]
files # => ["ch1.md", "temp.md", "ch3.md", "ch2.md", "~ch1.md"]
```
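Notice that the results above aren’t in alphabetical order. The order a glob returns files in may vary with your Ruby and Rake versions and your filesystem, so if a predictable order matters, sorting the list is a cheap fix. A sketch, using a temporary directory holding a few of the sample files:

```ruby
require 'rake'
require 'tmpdir'

sorted = Dir.mktmpdir do |dir|
  Dir.chdir(dir) do
    %w[ch2.md ch1.md temp.md ch3.md ~ch1.md].each { |f| File.write(f, "") }
    # FileList#sort returns a sorted list, whatever order the glob produced.
    Rake::FileList["*.md"].sort.to_a
  end
end

p sorted # => ["ch1.md", "ch2.md", "ch3.md", "temp.md", "~ch1.md"]
```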
Now we start to see the power of a FileList. But this isn’t quite the list of files we want. It contains some files we don’t care about, and it’s missing some files we do want.
We’ll address the missing files first. We add a *.markdown pattern to find files that use the long-form extension.
```ruby
require 'rake'

Dir.chdir "project"
files = Rake::FileList["*.md", "*.markdown"]
files # => ["ch1.md", "temp.md", "ch3.md", "ch2.md", "~ch1.md", "ch4.markdown"]
```
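Ruby’s glob syntax also supports brace alternation, so the two patterns could be folded into one. A sketch comparing the two forms in a scratch directory; they should find the same files, though not necessarily in the same order, so both lists are sorted before comparing:

```ruby
require 'rake'
require 'tmpdir'

two_patterns, one_pattern = Dir.mktmpdir do |dir|
  Dir.chdir(dir) do
    %w[ch1.md ch2.md ch4.markdown notes.txt].each { |f| File.write(f, "") }
    [
      Rake::FileList["*.md", "*.markdown"].sort.to_a,
      # {md,markdown} matches either extension within a single pattern.
      Rake::FileList["*.{md,markdown}"].sort.to_a
    ]
  end
end

p one_pattern # => ["ch1.md", "ch2.md", "ch4.markdown"]
```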
But we’re still missing the appendix file. To fix this, we prefix each glob pattern with **/ so that it matches files at any depth in the project directory tree.
```ruby
require 'rake'

Dir.chdir "project"
files = Rake::FileList["**/*.md", "**/*.markdown"]
puts files
# >> ch1.md
# >> temp.md
# >> ch3.md
# >> ch2.md
# >> scratch/test.md
# >> ~ch1.md
# >> subdir/appendix.md
# >> ch4.markdown
```
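One caveat with **, as I understand Ruby’s Dir.glob: it doesn’t descend into hidden directories (names starting with a dot) by default, and * doesn’t match dotfiles. That’s usually what we want, since it keeps tool and editor directories out of the list. A quick check with a hypothetical .drafts directory:

```ruby
require 'rake'
require 'fileutils'
require 'tmpdir'

found = Dir.mktmpdir do |dir|
  Dir.chdir(dir) do
    %w[ch1.md subdir/appendix.md .drafts/idea.md].each do |f|
      FileUtils.mkdir_p(File.dirname(f))
      File.write(f, "")
    end
    Rake::FileList["**/*.md"].sort.to_a
  end
end

p found # => ["ch1.md", "subdir/appendix.md"]
```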
Now we’ve found all four chapters and the appendix, but we’ve picked up a lot of junk along the way. Let’s start winnowing down the list of files. For this, we’ll use exclusion patterns.
We start by ignoring files that begin with a ~ character.
```ruby
require 'rake'

Dir.chdir "project"
files = Rake::FileList["**/*.md", "**/*.markdown"]
files.exclude("~*")
puts files
# >> ch1.md
# >> temp.md
# >> ch3.md
# >> ch2.md
# >> scratch/test.md
# >> subdir/appendix.md
# >> ch4.markdown
```
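One subtlety: exclusion globs are matched against each file’s whole relative path, so ~* only catches editor leftovers at the top level of the project. A sketch with a hypothetical subdir/~appendix.md shows one slipping through:

```ruby
require 'rake'
require 'fileutils'
require 'tmpdir'

kept = Dir.mktmpdir do |dir|
  Dir.chdir(dir) do
    %w[ch1.md ~ch1.md subdir/~appendix.md].each do |f|
      FileUtils.mkdir_p(File.dirname(f))
      File.write(f, "")
    end
    files = Rake::FileList["**/*.md"]
    files.exclude("~*") # only matches paths that *start* with "~"
    files.sort.to_a
  end
end

p kept # => ["ch1.md", "subdir/~appendix.md"]
```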
Next we’ll ignore files in the scratch directory. Just to demonstrate that it’s possible, we’ll use a regular expression for this exclusion instead of a shell glob.
```ruby
require 'rake'

Dir.chdir "project"
files = Rake::FileList["**/*.md", "**/*.markdown"]
files.exclude("~*")
files.exclude(/^scratch\//)
puts files
# >> ch1.md
# >> temp.md
# >> ch3.md
# >> ch2.md
# >> subdir/appendix.md
# >> ch4.markdown
```
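The anchor in that regular expression is doing real work: without it, any path that merely contains scratch/ somewhere, such as a hypothetical notes/scratch/draft.md, would be dropped as well. A regular expression can also catch editor leftovers at any depth, which the ~* glob can’t. This sketch uses made-up paths; since none of them contain glob characters, FileList takes them literally and no files need to exist. It uses \A, which anchors at the start of the string rather than the start of a line; for ordinary file names the two behave the same.

```ruby
require 'rake'

paths = %w[ch1.md scratch/test.md notes/scratch/draft.md subdir/~appendix.md]

anchored = Rake::FileList[*paths]
anchored.exclude(%r{\Ascratch/})     # only the top-level scratch directory
anchored.exclude(%r{(\A|/)~[^/]*\z}) # "~" files in any directory

unanchored = Rake::FileList[*paths]
unanchored.exclude(/scratch\//)      # also catches notes/scratch/

p anchored.to_a   # => ["ch1.md", "notes/scratch/draft.md"]
p unanchored.to_a # => ["ch1.md", "subdir/~appendix.md"]
```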
We’ve still got the file temp.md hanging around. As we saw before, this file isn’t registered with Git. We’d like an exclusion rule that ignores any file not under Git control. To do this, we pass a block to .exclude. Inside, we put an incantation that determines whether Git is aware of the file: git ls-files prints nothing for a file Git doesn’t track, so an empty result means the file should be excluded.
```ruby
require 'rake'

Dir.chdir "project"
files = Rake::FileList["**/*.md", "**/*.markdown"]
files.exclude("~*")
files.exclude(/^scratch\//)
files.exclude do |f|
  `git ls-files #{f}`.empty?
end
puts files
# >> ch1.md
# >> ch3.md
# >> ch2.md
# >> subdir/appendix.md
# >> ch4.markdown
```
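The backtick version starts one git process per candidate file and passes each name through the shell. One alternative, sketched here rather than taken from the episode, is to ask Git for every tracked path once with git ls-files -z and then check membership in a Set. The method names, and the tracked parameter (a hypothetical seam that lets the filtering be exercised without a real repository), are my own.

```ruby
require 'rake'
require 'set'

# Ask Git once for every tracked path under the current directory.
# -z separates names with NUL bytes, so unusual file names survive intact.
# Outside a Git work tree, git prints an error and this returns an empty set.
def git_tracked_files
  IO.popen(%w[git ls-files -z], &:read).split("\0").to_set
end

# Build the source list. +tracked+ defaults to Git's view, but any
# collection of relative paths that responds to include? will do.
def markdown_sources(tracked = git_tracked_files)
  Rake::FileList.new("**/*.md", "**/*.markdown") do |fl|
    fl.exclude("~*")
    fl.exclude(/^scratch\//)
    fl.exclude { |f| !tracked.include?(f) }
  end
end
```

Both git ls-files and the FileList report paths relative to the current directory, which is what lets the membership test line up.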
This filters out the temp file, and finally we are left with the list of just the files we care about.
Next we make the FileList definition a little more self-contained. We switch from the subscript shorthand to FileList.new and pass a block to the constructor. The FileList yields itself to this block, which means we can set up all of our exclusions inside it.
```ruby
require 'rake'

Dir.chdir "project"
files = Rake::FileList.new("**/*.md", "**/*.markdown") do |fl|
  fl.exclude("~*")
  fl.exclude(/^scratch\//)
  fl.exclude do |f|
    `git ls-files #{f}`.empty?
  end
end
puts files
# >> ch1.md
# >> ch3.md
# >> ch2.md
# >> subdir/appendix.md
# >> ch4.markdown
```
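Part of what makes this block style comfortable is that, per Rake’s documentation, a FileList is lazy: it records its patterns and exclusions and doesn’t search the filesystem until the list is first used. A sketch showing a file created after the FileList is defined, but before it’s read, still being picked up:

```ruby
require 'rake'
require 'tmpdir'

late_arrival = Dir.mktmpdir do |dir|
  Dir.chdir(dir) do
    File.write("ch1.md", "")
    files = Rake::FileList.new("*.md") do |fl|
      fl.exclude("~*")
    end
    File.write("ch2.md", "") # created after the list was defined...
    files.sort.to_a          # ...but the glob only runs now
  end
end

p late_arrival # => ["ch1.md", "ch2.md"]
```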
We need to make one more change to our list of files before we can return to our Rakefile. In the Rakefile, what we needed was a list of the files to be built, not the source files they are built from. To convert our list of input files into a list of output files, we use the #ext method. We give it a .html file extension, and it returns a new list of files with all of the original Markdown extensions replaced with .html.
```ruby
require 'rake'

Dir.chdir "project"
files = Rake::FileList.new("**/*.md", "**/*.markdown") do |fl|
  fl.exclude("~*")
  fl.exclude(/^scratch\//)
  fl.exclude do |f|
    `git ls-files #{f}`.empty?
  end
end
puts files.ext(".html")
# >> ch1.html
# >> ch3.html
# >> ch2.html
# >> subdir/appendix.html
# >> ch4.html
```
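Rake also adds #ext to individual strings. It swaps out whatever extension a path has, however long, keeps the directory part, and adds the leading dot if we leave it off; an empty argument just strips the extension. A small sketch with literal names, so no files are needed:

```ruby
require 'rake'

sources = Rake::FileList["ch1.md", "subdir/appendix.md", "ch4.markdown"]

p sources.ext(".html").to_a     # => ["ch1.html", "subdir/appendix.html", "ch4.html"]
p "ch4.markdown".ext("html")    # => "ch4.html"
p "subdir/appendix.md".ext("")  # => "subdir/appendix"
```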
Now we’re ready to come back to our Rakefile. We replace our hardcoded list of target files with the FileList we just built.
Since we are now supporting Markdown files with either a .md or .markdown extension, we have to make one more change to tell Rake it can build an HTML file from either one. For now, we’ll do this by simply duplicating the rule. In the future we’ll look at a way to avoid this duplication.
```ruby
source_files = Rake::FileList.new("**/*.md", "**/*.markdown") do |fl|
  fl.exclude("~*")
  fl.exclude(/^scratch\//)
  fl.exclude do |f|
    `git ls-files #{f}`.empty?
  end
end

task :default => :html
task :html => source_files.ext(".html")

rule ".html" => ".md" do |t|
  sh "pandoc -o #{t.name} #{t.source}"
end

rule ".html" => ".markdown" do |t|
  sh "pandoc -o #{t.name} #{t.source}"
end
```
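For the curious, one way to remove that duplication is to give a rule a proc in place of a source extension: Rake calls it with the target’s name and uses the returned path as the source. Here’s a sketch of that approach. The helper source_for_html is hypothetical, and a literal list stands in for the Git-filtered FileList so the snippet runs on its own.

```ruby
require 'rake'

# Stand-in for the Git-filtered FileList defined above.
source_files = Rake::FileList["ch1.md", "subdir/appendix.md", "ch4.markdown"]

# Hypothetical helper: find the source whose path minus its extension matches
# the target's path minus its extension. Falling back to a .md name means an
# unknown target makes Rake report a missing source rather than receive nil.
def source_for_html(html_file, sources)
  sources.detect { |src| src.ext("") == html_file.ext("") } || html_file.ext(".md")
end

task :default => :html
task :html => source_files.ext(".html")

# One rule for both extensions: the lambda maps "ch4.html" to "ch4.markdown".
rule ".html" => ->(html_file) { source_for_html(html_file, source_files) } do |t|
  sh "pandoc -o #{t.name} #{t.source}"
end
```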
When we run rake, we can see that it builds all the right HTML files:
```
$ rake
pandoc -o ch1.html ch1.md
pandoc -o ch2.html ch2.md
pandoc -o ch3.html ch3.md
pandoc -o subdir/appendix.html subdir/appendix.md
pandoc -o ch4.html ch4.markdown
```
I think that’s enough Rake for today. Happy hacking!
(This post is also available in Japanese at Nilquebe Blog.)