| Commit message (Collapse) | Author | Age | Files | Lines |
| |
|
| |
|
|
|
|
| |
The performance is within 1% with much simpler code.
|
|
|
|
|
|
| |
GCC doesn't (yet) produce very fast code for strcmp() against constant
strings (see https://gcc.gnu.org/PR78809), so hand-coding the comparison
manually helps significantly.
|
|
|
|
|
|
|
|
|
|
| |
There were two problems:
- bubble_down() would always bubble the entry all the way down the heap,
instead of stopping at the appropriate place
- remove() may need to bubble the replacement entry *up* the heap, if it
came from a different subtree
|
|
|
|
|
| |
This lets bfs transparently support the new statx() system call on
Linux, giving it access to file birth times.
|
| |
|
| |
|
|
|
|
|
|
|
|
|
| |
Rather than open-code the fallback logic for broken symlinks everywhere
it's needed, introduce a new xfstatat() utility function that performs
the fallback automatically.
Using xfstatat() consistently fixes a few bugs, including cases where
broken symlinks are given as arguments to predicates like -samefile.
|
| |
|
| |
|
| |
|
|
|
|
|
|
| |
For a/b/c, ENOTDIR is returned instead of ENOENT if a or b are not
directories. Handle this uniformly when detecting broken symlinks,
readdir races, etc.
|
| |
|
|
|
|
|
| |
The root has depth == 0, but we still need to include it in the
components array.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
It is always possible to force a breadth-first traversal to encounter
ENAMETOOLONG, regardless of the dircache eviction policy. As a concrete
example, consider this directory structure:
./1/{NAME_MAX}/{NAME_MAX}/{NAME_MAX}/... (longer than {PATH_MAX})
./2/{NAME_MAX}/{NAME_MAX}/{NAME_MAX}/...
./3/{NAME_MAX}/{NAME_MAX}/{NAME_MAX}/...
...
(more than RLIMIT_NOFILE directories under .)
Eventually, the next file to be processed will not have any parents in
the cache, as the cache can only hold RLIMIT_NOFILE entries. Then the
whole path must be traversed from ., which will exceed {PATH_MAX} bytes.
Work around this by performing a component-by-component traversal
manually when we see ENAMETOOLONG. This is required by POSIX:
> The find utility shall be able to descend to arbitrary depths in a file
> hierarchy and shall not fail due to path length limitations (unless a
> path operand specified by the application exceeds {PATH_MAX}
> requirements).
|
|
|
|
|
|
|
| |
This reverts commit 20860becb5a0e89ee2aaaddbb0ba1eb248552640.
The terminating NUL will be useful for the upcoming per-component
traversal to handle ENAMETOOLONG.
|
| |
|
| |
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Quadratic complexity is still possible for directory structures like
root -- a -- a -- a -- a ...
|
+- b -- b -- b -- b ...
But for most realistic directory structures, bfs should now spend less
time building paths.
(Of course if you print every path, overall complexity is quadratic
anyway.)
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
dircache_entry refcounts used to count every single descendant,
resulting in n refcount updates to create/delete an entry at depth n,
and thus O(n^2) complexity overall for deep directory structures.
Fix it by only counting direct children instead. The cache eviction
policy is changed to prefer shallower entries in all cases, attempting
to save at least some of the benefit of the previous accounting scheme.
Unfortunately, the average number of traversed components for openat()
calls still went up by ~20%, but the performance in practice is roughly
unchanged in my tests.
|
|
|
|
| |
Fixes #18.
|
|
|
|
| |
This simplifies a few things such as -name handling for ///.
|
|
|
|
| |
Can't forget to close it that way.
|
|
|
|
| |
This functionality is already part of GNU findutils git.
|
| |
|
|
|
|
|
|
|
|
|
|
| |
If bftw_add() succeeds but dirqueue_push() fails, we need to clean up
the just-added dircache_entry. Otherwise it will leak, and we'll also
fail the cache->size == 0 assertion.
Fix it by extracting the dircache-related parts of bftw_pop() into a new
helper function bftw_gc(), and call it from bftw_pop() as well as the
bftw_push() failure path.
|
| |
|
|
|
|
|
| |
Based on a patch by Fangrui Song <i@maskray.me>.
Closes #16.
|
| |
|
| |
|
| |
|
| |
|
|
|
|
|
| |
The callback may modify the ftwbuf, but we check it after the callback
(for typeflag and statbuf). Have them mutate a copy instead.
|
|
|
|
|
| |
If stat() fails, they won't get filled in otherwise. Then cycle
detection would have read uninitialized values.
|
| |
|
| |
|
| |
|
|
|
|
|
|
|
| |
With BFTW_RECOVER set, we're not supposed to fail just because a single
measly directory couldn't be handled. But using state.error as scratch
space made us fail in this case. The end result is that #7 resurfaced,
so fix it again.
|
| |
|
| |
|
| |
|
| |
|
|
|
|
| |
It was totally broken on filesystems that spit out DT_UNKNOWN.
|
| |
|
| |
|
| |
|
| |
|
| |
|