At work I contribute to a moderately-sized monorepo with 70 thousand files,
an 8-digit number of lines of code, and hundreds of PRs merged every day. One
day I opened a remote buffer in that repository and ran M-x find-file.
find-file is an interactive function that shows a narrowed list
of files in the current directory, prompts the user to filter and scroll
through candidates, and opens the selected file.
Emacs froze for 5 seconds before showing me the find-file prompt. That
isn't great, because when writing software, opening files is something
one needs to do all the time.
Luckily, Emacs is "the extensible, customizable, self-documenting real-time
display editor", and comes with profiling capabilities: M-x profiler-start
starts a profile and M-x profiler-report displays a call tree showing how much
CPU time is spent in each function call after starting the profile. Starting
a profile and running M-x find-file showed that all the time was being spent in a
function called ffap-guess-file-name-at-point, which was being called via
file-name-at-point-functions, an abnormal hook run when find-file is called.
If you're familiar with Vim you can think of Emacs hooks as Vim autocommands, only with much better ergonomics.
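The profiling workflow described above can also be driven from Lisp instead of M-x. The sketch below is a hypothetical helper (my/profile-call is not part of Emacs or of this article's configuration) that wraps a function call between profiler-start and profiler-stop and opens the report. It assumes the CPU profiler is supported on the platform and that no profile is already running.

```elisp
(require 'profiler)

(defun my/profile-call (function &rest args)
  "Profile a call to FUNCTION with ARGS and return its value.
Hypothetical helper: start the CPU profiler, call FUNCTION, then
display the report and stop the profiler, even if FUNCTION fails."
  (profiler-start 'cpu)
  (unwind-protect
      (apply function args)
    ;; Render the report while the profiler is still running, which is
    ;; the same order as running M-x profiler-report by hand.
    (profiler-report)
    (profiler-stop)))

;; Example (interactive, so left as a comment):
;; (my/profile-call #'call-interactively #'find-file)
```

Wrapping the call in unwind-protect means the profiler is stopped even when the profiled command is aborted with C-g.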
I checked the documentation for ffap-guess-file-name-at-point with
M-x describe-function ffap-guess-file-name-at-point and it didn't seem to be
something essential, so I removed the hook by running M-x eval-expression,
writing the form below, and pressing RET.
(remove-hook 'file-name-at-point-functions 'ffap-guess-file-name-at-point)
This solved the immediate problem of Emacs blocking for 5 seconds every time I
ran find-file, with no noticeable drawbacks.
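To make the effect of remove-hook concrete, here is a small sketch on a made-up hook. my/guess-functions and the two guesser functions are hypothetical stand-ins for file-name-at-point-functions and its members. The sketch assumes this kind of hook is run with run-hook-with-args-until-success, which calls each function in order and stops at the first non-nil result.

```elisp
;; A hypothetical abnormal hook, standing in for `file-name-at-point-functions'.
(defvar my/guess-functions nil
  "Functions called with no arguments until one returns non-nil.")

(defun my/slow-guess ()
  "Pretend to be an expensive guesser."
  "slow-guess")

(defun my/cheap-guess ()
  "Pretend to be a cheap guesser."
  "cheap-guess")

(add-hook 'my/guess-functions #'my/cheap-guess)
(add-hook 'my/guess-functions #'my/slow-guess) ; added last, so it runs first

(run-hook-with-args-until-success 'my/guess-functions)
;; => "slow-guess"

;; After removal the expensive function is never called again.
(remove-hook 'my/guess-functions #'my/slow-guess)

(run-hook-with-args-until-success 'my/guess-functions)
;; => "cheap-guess"
```

Evaluating remove-hook by hand only changes the running session; the form has to go into the configuration for the change to survive a restart.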
As I write this I attempt to reproduce the issue by re-adding
ffap-guess-file-name-at-point to file-name-at-point-functions.
I can't reproduce it anymore. The initial issue might have been
- caused by having manually mutated the Emacs environment via ad-hoc code evaluation (drifting from the state defined in configuration)
- caused by settings or packages that aren't in my configuration anymore
- fixed by settings or packages that were recently added to my configuration
- fixed by some recent package upgrade
Or some combination of the above. I have no idea exactly what. Which is to say: maintaining Emacs configurations is complicated.
I could now navigate around and open files. The next thing I tried in this
remote git repository was searching through project files. The great
projectile package provides the projectile-find-file function for that, but
I had previously given up making projectile perform well with remote buffers;
given how things are currently implemented it seems to be
impractical. So I installed
the find-file-in-project package for use on remote projects exclusively:
M-x package-install find-file-in-project.
Both projectile-find-file and find-file-in-project (aliased as ffip):
- show a narrowed list of all project files in the minibuffer
- prompt the user to filter and scroll through candidates
- open a file when RET is pressed on a candidate.
To disable projectile on remote buffers I had the following form in my configuration.
(defadvice projectile-project-root (around ignore-remote first activate)
  (unless (file-remote-p default-directory 'no-identification) ad-do-it))
Which causes the projectile-project-root function to not run its usual
implementation on remote buffers, but instead return nil unconditionally.
projectile-project-root is used as a way to either get the project root for a
given buffer (remote or not), or as a boolean predicate to test if the buffer is
in a project (e.g., a git repository directory). Having it return nil on
remote buffers effectively disables projectile on remote buffers.
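defadvice comes from the older advice library; newer Emacs code tends to use advice-add instead. The sketch below is a roughly equivalent formulation, not the form from my configuration: my/projectile-ignore-remote is a hypothetical name, and the advice is only installed once projectile is loaded.

```elisp
(defun my/projectile-ignore-remote (original-function &rest args)
  "Call ORIGINAL-FUNCTION with ARGS unless `default-directory' is remote.
Return nil for remote directories, without calling ORIGINAL-FUNCTION."
  (unless (file-remote-p default-directory)
    (apply original-function args)))

;; Install the advice once projectile is available (assumes it is installed).
(with-eval-after-load 'projectile
  (advice-add 'projectile-project-root :around #'my/projectile-ignore-remote))
```

Because the advice is a named function, it can be removed again with (advice-remove 'projectile-project-root #'my/projectile-ignore-remote) when experimenting.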
I then wrote a function that falls back to ffip when projectile is disabled
and bound it to the keybinding I had for projectile-find-file, so that I could
press the same keybinding whenever I wanted to search for project files, and
not have to think about whether I'm on a remote buffer or not:
(defun maybe-projectile-find-file ()
  "Run `projectile-find-file' if in a project buffer, `ffip' otherwise."
  (interactive)
  (if (projectile-project-p)
      (projectile-find-file)
    (ffip)))
projectile-project-p uses projectile-project-root internally.
And called it:
M-x maybe-projectile-find-file
Emacs froze for 30 seconds. After that, it showed the prompt with the narrowed list of files in the project. 30 seconds! What was it doing during the whole time? Let's try out the profiler again.
- Start a new profile: M-x profiler-start
- Call the function to be profiled: M-x maybe-projectile-find-file (it freezes Emacs again for 30 seconds)
- And display the report: M-x profiler-report
Which showed:
Function                                CPU samples    %
+ ...                                         21027  98%
+ command-execute                               361   1%
This tells us that 98% of the CPU time was spent in whatever ... is. Pressing
TAB on a line will expand it by showing its child function calls.
Function                                CPU samples    %
- ...                                         21027  98%
 + ivy--insert-minibuffer                     13689  64%
 + #<compiled 0x131f715d2b6fa0a8>              3819  17%
   Automatic GC                                2017   9%
 + shell-command                               1424   6%
 + ffip-get-project-root-directory               77   0%
 + run-mode-hooks                                 1   0%
+ command-execute                               361   1%
Expanding ... shows that Emacs spent 64% of CPU time in
ivy--insert-minibuffer and 9% of the time (roughly 3 whole seconds!) garbage
collecting. I had garbage-collection-messages set to t so I could already
tell that Emacs was GCing a lot; enabling this setting makes a message be
displayed in the echo area whenever Emacs garbage collects. I could also see the
Emacs process consuming 100% of one CPU core while it was frozen and
unresponsive to input.
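Garbage collection can also be measured directly instead of watching the echo area. The sketch below is a hypothetical helper (my/gc-stats is my own name) built on the built-in counters gcs-done and gc-elapsed; it reports how many collections ran during a call and how long they took.

```elisp
(defun my/gc-stats (function)
  "Call FUNCTION and return (ELAPSED-SECONDS GC-COUNT GC-SECONDS).
GC-COUNT and GC-SECONDS cover only collections that ran during the call."
  (let ((gcs-before gcs-done)
        (gc-elapsed-before gc-elapsed)
        (start (float-time)))
    (funcall function)
    (list (- (float-time) start)
          (- gcs-done gcs-before)
          (- gc-elapsed gc-elapsed-before))))

;; Allocate a lot of short-lived strings to provoke some collections.
(my/gc-stats (lambda () (dotimes (_ 100000) (make-string 100 ?x))))
```

The exact numbers depend on gc-cons-threshold and on the machine, so no expected output is shown.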
The profiler package implements a sampling profiler. The elp package can be used for getting actual wall clock times.
Drilling down on #<compiled 0x131f715d2b6fa0a8> shows that the samples there (17%
of CPU time) were spent on Emacs waiting for user input, so we can ignore it for
now.
As I get deep in drilling down on ivy--insert-minibuffer, names in the
"Function" column start getting truncated because the column is too narrow. A
quick Google search (via M-x google-this emacs profiler report width) shows me
how to make it wider:
(setf (caar profiler-report-cpu-line-format) 80
      (caar profiler-report-memory-line-format) 80)
Describing those variables with M-x describe-variable shows that the default
values are 50.
From the profiler report buffer I run M-x eval-expression, paste the form
above with C-y and press RET. I also persist this form to my configuration.
Pressing c in the profiler report buffer (bound to
profiler-report-render-calltree) redraws it, now with a wider column, allowing
me to see the function names.
Here is the abbreviated expanded relevant portion of the call stack.
Function                                                  CPU samples    %
- ffip                                                          13586  63%
 - ffip-find-files                                              13586  63%
  - let*                                                        13586  63%
   - setq                                                       13585  63%
    - ffip-project-search                                       13585  63%
     - let*                                                     13585  63%
      - mapcar                                                  13531  63%
       - #<lambda 0xb210342292>                                 13528  63%
        - cons                                                  13521  63%
         - expand-file-name                                     12936  60%
          - tramp-file-name-handler                             12918  60%
           - apply                                               9217  43%
            - tramp-sh-file-name-handler                         9158  42%
             - apply                                             9124  42%
              - tramp-sh-handle-expand-file-name                 8952  41%
               - file-name-as-directory                          5812  27%
                - tramp-file-name-handler                        5793  27%
                 + tramp-find-foreign-file-name-handler          3166  14%
                 + apply                                         1237   5%
                 + tramp-dissect-file-name                        527   2%
                 + #<compiled -0x1589d0aab96d9542>                337   1%
                   tramp-file-name-equal-p                        312   1%
                   tramp-tramp-file-p                              33   0%
                 + tramp-replace-environment-variables              6   0%
                   #<compiled 0x1e202496df87>                       1   0%
               + tramp-connectable-p                             1006   4%
               + tramp-dissect-file-name                          628   2%
               + eval                                             517   2%
               + tramp-run-real-handler                           339   1%
               + tramp-drop-volume-letter                          60   0%
                 tramp-make-tramp-file-name                        30   0%
            + tramp-file-name-for-operation                        40   0%
           + tramp-find-foreign-file-name-handler                2981  13%
           + tramp-dissect-file-name                              518   2%
             tramp-tramp-file-p                                    34   0%
             #<compiled 0x1e202496df87>                             1   0%
           + tramp-replace-environment-variables                    1   0%
         + replace-regexp-in-string                               153   0%
      + split-string                                               15   0%
      + ffip-create-shell-command                                   4   0%
     cond                                                           1   0%
A couple of things to unpack here. From lines 8-11 of the report (counting the
header as line 1) it can be deduced that ffip
maps a lambda that calls expand-file-name over all completion candidates,
which in this case are around 70 thousand file names. Running
M-x find-function ffip-project-search and narrowing to the relevant region in
the function shows exactly that:
find-function shows the definition of a given function, in its source file.
find-file-in-project.el
(mapcar (lambda (file)
          (cons (replace-regexp-in-string "^\./" "" file)
                (expand-file-name file)))
        collection)
On line 11 of the profiler report we can see that 60% of 30 seconds (18 seconds)
was spent on expand-file-name calls. By dividing 18 seconds by 70000 we get
that expand-file-name calls took 250µs on average. 250µs is how long a modern
computer takes to read 1MB sequentially from RAM! Why
would my computer need to do that amount of work 70000 times just to display a
narrowed list of files?
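The per-call estimate above is simple enough to check in Lisp. The helper name below is hypothetical; the inputs are the figures from the profiler report (a 30 second freeze, 60% of it in expand-file-name, about 70000 candidates).

```elisp
(defun my/estimated-call-microseconds (total-seconds share calls)
  "Return the average time per call in microseconds.
TOTAL-SECONDS is the whole runtime, SHARE the fraction of it spent in
the function of interest, and CALLS how many times it was called."
  (* 1e6 (/ (* total-seconds share) calls)))

(my/estimated-call-microseconds 30.0 0.60 70000)
;; => roughly 257, which the text rounds to 250µs
```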
Let's see if the function documentation for expand-file-name provides any
clarity.
M-x describe-function expand-file-name
expand-file-name is a function defined in C source code.
Signature
(expand-file-name NAME &optional DEFAULT-DIRECTORY)
Documentation
Convert filename NAME to absolute, and canonicalize it.
Second arg DEFAULT-DIRECTORY is directory to start with if NAME is relative
(does not start with slash or tilde); both the directory name and
a directory's file name are accepted. If DEFAULT-DIRECTORY is nil or
missing, the current buffer's value of default-directory is used.
NAME should be a string that is a valid file name for the underlying
filesystem.
Ok, so it sounds like expand-file-name essentially transforms a file path into
an absolute path, based on either the current buffer's directory or optionally,
a directory passed in as an additional argument. Let's try evaluating some forms
with M-x eval-expression both on a local and a remote buffer to get a sense of
what it does.
In a local dired buffer at my local home directory:
*dired /Users/mpereira @ macbook*
(expand-file-name "foo.txt")
;; => "/Users/mpereira/foo.txt"
In a remote dired buffer at my remote home directory:
*dired /home/mpereira @ remote-host*
(expand-file-name "foo.txt")
;; => "/ssh:mpereira@remote-host:/home/mpereira/foo.txt"
The expand-file-name call in ffip-project-search doesn't specify a
DEFAULT-DIRECTORY (the optional second parameter to expand-file-name) so
like in the examples above it defaults to the current buffer's directory, which
in the profiled case is a remote path like in the second example above.
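For comparison, this is what passing DEFAULT-DIRECTORY explicitly looks like. The paths are made up and the results assume a POSIX system; expand-file-name does not need the directories to exist.

```elisp
;; Explicit second argument: `default-directory' is not consulted.
(expand-file-name "foo.txt" "/tmp/project/")
;; => "/tmp/project/foo.txt"

;; No second argument: the dynamically bound `default-directory' is used,
;; and "." or ".." components are canonicalized away.
(let ((default-directory "/tmp/project/"))
  (expand-file-name "src/../foo.txt"))
;; => "/tmp/project/foo.txt"
```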
With a better understanding of what expand-file-name does, let's now try to
understand how it performs. We can benchmark it with benchmark-run in local
and remote buffers, and compare their runtimes.
M-x describe-function benchmark-run
benchmark-run is an autoloaded macro defined in benchmark.el.gz.
Signature
(benchmark-run &optional REPETITIONS &rest FORMS)
Documentation
Time execution of FORMS.
If REPETITIONS is supplied as a number, run forms that many times,
accounting for the overhead of the resulting loop. Otherwise run
FORMS once.
Return a list of the total elapsed time for execution, the number of
garbage collections that ran, and the time taken by garbage collection.
Benchmarking it in a local dired buffer at my local home directory
*dired /Users/mpereira @ macbook*
(benchmark-run 70000 (expand-file-name "foo.txt"))
;; => (0.308712 0 0.0)
and in a remote dired buffer at my remote home directory
*dired /home/mpereira @ remote-host*
(benchmark-run 70000 (expand-file-name "foo.txt"))
;; => (31.547211 0 0.0)
showed that it took 0.3 seconds to run expand-file-name 70 thousand times on a
local buffer, and 30 seconds to do so on a remote buffer: two orders of
magnitude slower. 30 seconds is more than what we observed in the profiler
report (18 seconds), and I'll attribute this discrepancy to unknowns; maybe the
ffip execution took advantage of byte-compiled code evaluation, or there's
some overhead associated with benchmark-run, or something else entirely.
Nevertheless, this experiment clearly corroborates the profiler report results.
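To put the two benchmark results on a per-call footing, a small hypothetical helper (my/per-call-microseconds, not part of benchmark.el) can divide the elapsed time that benchmark-run returns by the number of repetitions.

```elisp
(require 'benchmark)

(defun my/per-call-microseconds (benchmark-result repetitions)
  "Return the average time per repetition, in microseconds.
BENCHMARK-RESULT is a list as returned by `benchmark-run', whose
first element is the total elapsed time in seconds."
  (* 1e6 (/ (car benchmark-result) (float repetitions))))

;; The two results quoted above:
(my/per-call-microseconds '(0.308712 0 0.0) 70000)   ; local:  about 4.4
(my/per-call-microseconds '(31.547211 0 0.0) 70000)  ; remote: about 450.7

;; Measuring a fresh local run works the same way:
(my/per-call-microseconds
 (benchmark-run 1000 (expand-file-name "foo.txt" "/tmp/"))
 1000)
```

The ratio between the remote and local figures is about 100, which is the "two orders of magnitude" mentioned above.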
So! Back to ffip. Looking again at the previous screenshot, it seems that the
list of displayed files doesn't even show absolute file paths. Why is
expand-file-name being called at all? Maybe calling it isn't too important...
Let's remove the expand-file-name call by
- visiting the ffip-project-search function in the library file with M-x find-function ffip-project-search
- "raising" file in the lambda
- re-evaluating ffip-project-search with M-x eval-defun
and see what happens.
find-file-in-project.el
 (mapcar (lambda (file)
           (cons (replace-regexp-in-string "^\./" "" file)
-                (expand-file-name file)))
+                file))
         collection)
I run my function again:
M-x maybe-projectile-find-file
It's faster. This change alone reduces the time for ffip to show the candidate
list from 30 seconds to 8 seconds with no noticeable drawbacks. Which is
better, but still not even close to acceptable.
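If absolute paths were still wanted for the candidates, one possible alternative is to expand the project root once and build each absolute path by plain string concatenation, which avoids going through the TRAMP file name handler for every candidate. This is only a sketch of that idea, not what find-file-in-project does and not what I ended up using; the function name and the assumptions in its docstring are mine.

```elisp
(defun my/candidates-with-absolute-paths (files root)
  "Return an alist of (RELATIVE . ABSOLUTE) entries for FILES under ROOT.
ROOT is assumed to be an already expanded directory name ending in a
slash, and FILES are assumed to be relative paths without \"..\"
components, optionally prefixed with \"./\"."
  (mapcar (lambda (file)
            (let ((relative (replace-regexp-in-string "\\`\\./" "" file)))
              ;; Plain concatenation: no file name handler is involved.
              (cons relative (concat root relative))))
          files))

(my/candidates-with-absolute-paths
 '("./kernel/time/jiffies.c" "Makefile")
 "/ssh:mpereira@remote-host:/home/mpereira/linux/")
;; => (("kernel/time/jiffies.c"
;;      . "/ssh:mpereira@remote-host:/home/mpereira/linux/kernel/time/jiffies.c")
;;     ("Makefile"
;;      . "/ssh:mpereira@remote-host:/home/mpereira/linux/Makefile"))
```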
Profiling the changed function shows that now most of the time is spent in
sorting candidates with ivy-prescient-sort-function, and garbage collection.
Automatic sorting of candidates based on selection recency comes from the
excellent ivy and ivy-prescient packages, which I had installed and
configured. Disabling ivy-prescient with M-x ivy-prescient-mode and
re-running my function reduces the time further from 8 seconds to 4 seconds.
Another thing I notice is that ffip allows fd to be used as a backend instead of GNU
find. fd claims to have better performance, so I install it on the remote host
and configure ffip to use it. I evaluate the form below like before, but I
could also have used the very handy M-x counsel-set-variable, which shows a
narrowed list of candidates of all variables in Emacs (in my setup there's
around 20 thousand) along with a snippet of their docstrings, and on selection
allows the variable value to be set. Convenient!
(setq ffip-use-rust-fd t)
Which brings my function's runtime to a little over 2 seconds—a 15x performance improvement overall—achieved via:
- Manually evaluating a modified function from an installed library file
- Disabling useful functionality (prescient sorting)
- Installing a program on the remote host and configuring ffip to use it
The last point is not really an issue, but the whole situation is not ideal. Even putting aside all of the above points, I don't want to wait for over 2 seconds every time I search for files in this project.
Let's see if we can do better than that.
So far we've been mostly configuring and introspecting Emacs. Let's now extend it with new functionality that satisfies our needs.
We want a function that:
- Based on a remote buffer's directory, figures out its remote project root directory
- Runs fd on the remote project root directory
- Presents the output from fd as a narrowed list of candidate files, with it being possible to filter, scroll, and select a candidate from the list
- Has good performance and is responsive even on large, remote projects
Let's see if there's anything in find-file-in-project that we could reuse. I
know that ffip is figuring out project roots and running shell commands
somehow. By checking out its library file with
M-x find-library find-file-in-project (which opens a buffer with the
installed find-file-in-project.el package file) I can see that the
shell-command-to-string function (included with Emacs) is being used for
running shell commands, and that there's a function named ffip-project-root
that sounds a lot like what we need.
I have a keybinding that shows the documentation for the thing under the cursor. I use it to inspect the two functions:
ffip-project-root
ffip-project-root is an autoloaded function defined in
find-file-in-project.el.
Signature
(ffip-project-root)
Documentation
Return project root or default-directory.
shell-command-to-string
shell-command-to-string is a compiled function defined in
simple.el.gz.
Signature
(shell-command-to-string COMMAND)
Documentation
Execute shell command COMMAND and return its output as a string.
Perfect. We should be able to reuse them.
I also know that the ivy-read function provided by ivy should take care of
displaying the narrowed list of files. Looks like we won't need to write a lot
of code.
To verify that our code will work on remote buffers we'll need to evaluate forms
in the context of one. The with-current-buffer macro can be used for that.
M-x describe-function with-current-buffer
with-current-buffer is a macro defined in subr.el.gz.
Signature
(with-current-buffer BUFFER-OR-NAME &rest BODY)
Documentation
Execute the forms in BODY with BUFFER-OR-NAME temporarily current.
BUFFER-OR-NAME must be a buffer or the name of an existing buffer.
The value returned is the value of the last form in BODY. See
also with-temp-buffer.
For writing our function, instead of evaluating forms ad-hoc with
M-x eval-expression, we'll open a scratch buffer and write and evaluate
forms directly from there, which should be more convenient.
I have a clone of the Linux git repository
on my remote host. Let's assign a remote buffer for the officially funniest file
in the Linux kernel, jiffies.c
(/ssh:mpereira@remote-host:/home/mpereira/linux/kernel/time/jiffies.c),
to a variable named remote-file-buffer by evaluating the following form with
eval-defun.
*scratch*
(setq remote-file-buffer
      (find-file-noselect
       (concat "/ssh:mpereira@remote-host:"
               "/home/mpereira/linux/kernel/time/jiffies.c")))
;; => #<buffer jiffies.c>
Notice that the buffer is just a value, and can be passed around to functions.
We'll use it further ahead to emulate evaluating forms as if we had that
buffer opened, with the with-current-buffer macro.
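The same idea can be tried without a remote host. In the sketch below, my/demo-buffer is a hypothetical buffer created only for the example; it shows that default-directory is buffer-local and that with-current-buffer makes a buffer current only for the duration of its body.

```elisp
(defvar my/demo-buffer (generate-new-buffer "*my-demo*")
  "A hypothetical buffer used only for this example.")

;; `default-directory' is buffer-local, so this only affects the demo buffer.
(with-current-buffer my/demo-buffer
  (setq default-directory "/tmp/"))

(with-current-buffer my/demo-buffer
  default-directory)
;; => "/tmp/"

;; Outside of the macro, the previously current buffer is current again.
(eq (current-buffer) my/demo-buffer)
;; => nil
```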
Let's start exploring by writing to the *scratch* buffer and continuing to
evaluate forms one by one with eval-defun.
*scratch*
(shell-command-to-string "hostname")
;; => "macbook"
default-directory
;; => "/Users/mpereira/.emacs.d/"
(ffip-project-root)
;; => "/Users/mpereira/.emacs.d/"
And now let's evaluate some forms in the context of a remote buffer. Notice that
running hostname in a shell returns something different.
*scratch*
(with-current-buffer remote-file-buffer
  (shell-command-to-string "hostname"))
;; => "remote-host"
(with-current-buffer remote-file-buffer
  default-directory)
;; => "/ssh:mpereira@remote-host:/home/mpereira/linux/kernel/time/"
(with-current-buffer remote-file-buffer
  (ffip-project-root))
;; => "/ssh:mpereira@remote-host:/home/mpereira/linux/"
(with-current-buffer remote-file-buffer
  (shell-command-to-string "fd --version"))
;; => "fd 8.1.1"
(with-current-buffer remote-file-buffer
  (executable-find "fd" t))
;; => "/usr/bin/fd"
executable-find requires the second argument to be non-nil to search on remote hosts. Check M-x describe-function executable-find for more details.
Emacs is not only running shell commands, but also evaluating forms as if it were running on the remote host. That's pretty sweet!
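One way to avoid hard-coding that second argument is to derive it from the current directory. The helper below is a hypothetical sketch (my/require-executable is my own name); it assumes an Emacs version whose executable-find accepts the REMOTE argument, and it signals an error instead of returning nil when the program is missing.

```elisp
(defun my/require-executable (name)
  "Return the path of executable NAME, or signal a `user-error'.
Search on the remote host when `default-directory' is remote, and
on the local machine otherwise."
  (or (executable-find name (file-remote-p default-directory))
      (user-error "Executable not found: %s" name)))

;; In a local directory this searches the local `exec-path'.
(let ((default-directory "/tmp/"))
  (my/require-executable "sh"))
```

Failing early with a clear message is friendlier than letting a later shell command fail with "command not found" buried in its output.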
Now that we made sure that the executable for fd is available on the remote
host, let's try running some fd commands.
*scratch*
(with-current-buffer remote-file-buffer
  (shell-command-to-string "pwd"))
;; => "/home/mpereira/linux/kernel/time"
(with-current-buffer remote-file-buffer
  (shell-command-to-string "fd --extension c | wc -l"))
;; => 28
(with-current-buffer remote-file-buffer
  (shell-command-to-string "fd . | head"))
;; => Kconfig
;;    Makefile
;;    alarmtimer.c
;;    clockevents.c
;;    clocksource.c
;;    hrtimer.c
;;    itimer.c
;;    jiffies.c
;;    namespace.c
;;    ntp.c
fd tells us that there are 28 C files in /home/mpereira/linux/kernel/time.
Let's see if we can get the project root, which would be /home/mpereira/linux.
*scratch*
(with-current-buffer remote-file-buffer
  (ffip-project-root))
;; => "/ssh:mpereira@remote-host:/home/mpereira/linux/"
That seems to work.
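I haven't looked at how ffip-project-root is implemented, but a similar lookup can be sketched with the built-in locate-dominating-file. The function below is a hypothetical, simplified alternative that assumes a project root is any directory containing a .git entry.

```elisp
(defun my/project-root (&optional directory)
  "Return the closest ancestor of DIRECTORY containing a \".git\" entry.
DIRECTORY defaults to `default-directory'.  Return nil if no such
ancestor exists."
  (let ((root (locate-dominating-file (or directory default-directory)
                                      ".git")))
    ;; Expand once so that abbreviations like \"~/\" are resolved.
    (and root (expand-file-name root))))

;; Example: (my/project-root "/home/mpereira/linux/kernel/time/")
;; would return "/home/mpereira/linux/" if that directory contains ".git".
```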
Let's now play with default-directory. This is a buffer-local variable that
holds a buffer's working directory. By evaluating forms with a redefined
default-directory it's possible to emulate being in another directory, which
could even be on a remote host. The code block below is an example of that: the
second form redefines default-directory to be the project root.
*scratch*
(with-current-buffer remote-file-buffer
  (shell-command-to-string "pwd"))
;; => "/home/mpereira/linux/kernel/time"
(with-current-buffer remote-file-buffer
  (let ((default-directory (ffip-project-root)))
    (shell-command-to-string "pwd")))
;; => "/home/mpereira/linux"
Nice!
I wonder how much Assembly and C are currently in the project.
*scratch*
(with-current-buffer remote-file-buffer
  (let ((default-directory (ffip-project-root)))
    (shell-command-to-string "fd --extension asm --extension s --exec-batch cat '{}' | wc -l")))
;; => 373663
(with-current-buffer remote-file-buffer
  (let ((default-directory (ffip-project-root)))
    (shell-command-to-string "fd --extension c --extension h | xargs cat | wc -l")))
;; => 27088162
Twenty seven million, eighty eight thousand, one hundred and sixty two lines of C, and almost half a million lines of Assembly. It's fine.
Alright, at this point it feels like we have all the pieces: let's put them together.
*scratch*
(defun my-project-find-file (&optional pattern)
  "Prompt the user to filter, scroll and select a file from a list of all
project files matching PATTERN."
  (interactive)
  (let* ((default-directory (ffip-project-root))
         (fd (executable-find "fd" t))
         (fd-options "--color never")
         (command (concat fd " " fd-options " " pattern))
         (candidates (split-string (shell-command-to-string command) "\n" t)))
    (ivy-read "File: "
              candidates
              :action (lambda (candidate)
                        (find-file candidate)))))
This is a bit longer than what we've been playing with, but even folks new to Emacs Lisp should be able to follow it:
- Redefine default-directory to be the project root directory (line 5)
- Build, execute, and parse the output of the fd command into a list of file names (lines 6-9)
- Display a file prompt showing a narrowed list of all files in the project (lines 10-13)
Let's see if it works.
*scratch*
(with-current-buffer remote-file-buffer
  (my-project-find-file "jif"))
It does!
Since it was declared (interactive) we can also call it via
M-x my-project-find-file.
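The command-building and output-parsing steps of my-project-find-file can be pulled out into small pure functions, which makes them easy to try without a remote host or fd installed. The two helpers below are a hypothetical refactoring, not part of the function above; the one behavioral difference is that the pattern is passed through shell-quote-argument, assuming a POSIX-style shell on the other end.

```elisp
(defun my/fd-command (fd &optional pattern)
  "Build a shell command string that runs FD, optionally matching PATTERN.
PATTERN is shell-quoted so that spaces and shell metacharacters are
passed to fd literally."
  (mapconcat #'identity
             (delq nil (list (shell-quote-argument fd)
                             "--color" "never"
                             (and pattern (shell-quote-argument pattern))))
             " "))

(defun my/parse-lines (output)
  "Split command OUTPUT into a list of non-empty lines."
  (split-string output "\n" t))

(my/fd-command "/usr/bin/fd" "jif")
;; => "/usr/bin/fd --color never jif"

(my/parse-lines "kernel/time/jiffies.c\ninclude/linux/jiffies.h\n")
;; => ("kernel/time/jiffies.c" "include/linux/jiffies.h")
```

Passing t as the third argument to split-string drops the empty string that would otherwise follow the trailing newline.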
Going back to the large remote project and running my-project-find-file a few
times shows that it now runs in a little over a second, a 30x improvement
compared with what we started with.
This is still not good enough, so I went ahead and evolved the function we were working on to most of the time show something on screen immediately and redraw it asynchronously. You can check out the code at fast-project-find-file.el.
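To give a rough idea of what "asynchronously" can mean here, below is a minimal sketch of running a shell command without blocking Emacs. It is not the code from fast-project-find-file.el; the names are hypothetical, and it only covers collecting the output, not redrawing a candidate list. It relies on start-file-process-shell-command, which runs the command on the remote host when default-directory is remote.

```elisp
(defun my/async-lines--sentinel (process _event)
  "When PROCESS has exited, pass its output lines to the stored callback."
  (unless (process-live-p process)
    (let* ((buffer (process-buffer process))
           (lines (with-current-buffer buffer
                    (split-string (buffer-string) "\n" t))))
      (kill-buffer buffer)
      (funcall (process-get process 'my/callback) lines))))

(defun my/async-lines (command callback)
  "Run shell COMMAND in `default-directory' without blocking Emacs.
CALLBACK is called with the list of output lines once COMMAND exits.
Return the process object."
  (let* ((buffer (generate-new-buffer " *my/async-lines*"))
         (process (start-file-process-shell-command
                   "my/async-lines" buffer command)))
    ;; Store the callback on the process so the sentinel can find it.
    (process-put process 'my/callback callback)
    (set-process-sentinel process #'my/async-lines--sentinel)
    process))

;; Example: Emacs stays responsive while the command runs.
(my/async-lines "ls" (lambda (lines)
                       (message "%d entries" (length lines))))
```

A real implementation would also want to handle non-zero exit codes and to cancel a still-running process when a new search starts.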
As an aside: having the whole text editor block for over a second while I wait for it to show something so simple is unacceptable. Through desensitization and acquiescence, we, users of software, have come to expect that it will either not work at all, not work consistently, or exhibit poor or unpredictable performance.
Jonathan Blow addresses this situation somewhat entertainingly in "Preventing the Collapse of Civilization".
* * *
Did you notice how the function implementation came almost naturally from exploration? The immediate feedback from evaluating forms and modifying a live system—even though old news to Lisp programmers—is incredibly powerful. Combine it with an "extensible, customizable, self-documenting" environment and you have a very satisfying and productive means of creation.
This article is part of How to open a file in Emacs: A short story about Lisp, technology, and human progress, published on January 03, 2021.