| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
| |
The root has depth == 0, but we still need to include it in the
components array.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
It is always possible to force a breadth-first traversal to encounter
ENAMETOOLONG, regardless of the dircache eviction policy. As a concrete
example, consider this directory structure:
./1/{NAME_MAX}/{NAME_MAX}/{NAME_MAX}/... (longer than {PATH_MAX})
./2/{NAME_MAX}/{NAME_MAX}/{NAME_MAX}/...
./3/{NAME_MAX}/{NAME_MAX}/{NAME_MAX}/...
...
(more than RLIMIT_NOFILE directories under .)
Eventually, the next file to be processed will not have any parents in
the cache, as the cache can only hold RLIMIT_NOFILE entries. Then the
whole path must be traversed from ., which will exceed {PATH_MAX} bytes.
Work around this by performing a component-by-component traversal
manually when we see ENAMETOOLONG. This is required by POSIX:
> The find utility shall be able to descend to arbitrary depths in a file
> hierarchy and shall not fail due to path length limitations (unless a
> path operand specified by the application exceeds {PATH_MAX}
> requirements).
|
|
|
|
|
|
|
| |
This reverts commit 20860becb5a0e89ee2aaaddbb0ba1eb248552640.
The terminating NUL will be useful for the upcoming per-component
traversal to handle ENAMETOOLONG.
|
| |
|
| |
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Quadratic complexity is still possible for directory structures like
root -- a -- a -- a -- a ...
|
+- b -- b -- b -- b ...
But for most realistic directory structures, bfs should now spend less
time building paths.
(Of course if you print every path, overall complexity is quadratic
anyway.)
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
dircache_entry refcounts used to count every single descendant,
resulting in n refcount updates to create/delete an entry at depth n,
and thus O(n^2) complexity overall for deep directory structures.
Fix it by only counting direct children instead. The cache eviction
policy is changed to prefer shallower entries in all cases, attempting
to save at least some of the benefit of the previous accounting scheme.
Unfortunately, the average number of traversed components for openat()
calls still went up by ~20%, but the performance in practice is roughly
unchanged in my tests.
|
|
|
|
| |
Fixes #18.
|
|
|
|
| |
This simplifies a few things such as -name handling for ///.
|
|
|
|
| |
Can't forget to close it that way.
|
|
|
|
| |
This functionality is already part of GNU findutils git.
|
| |
|
|
|
|
|
|
|
|
|
|
| |
If bftw_add() succeeds but dirqueue_push() fails, we need to clean up
the just-added dircache_entry. Otherwise it will leak, and we'll also
fail the cache->size == 0 assertion.
Fix it by extracting the dircache-related parts of bftw_pop() into a new
helper function bftw_gc(), and call it from bftw_pop() as well as the
bftw_push() failure path.
|
| |
|
|
|
|
|
| |
Based on a patch by Fangrui Song <i@maskray.me>.
Closes #16.
|
| |
|
| |
|
| |
|
| |
|
|
|
|
|
| |
The callback may modify the ftwbuf, but we check it after the callback
(for typeflag and statbuf). Have them mutate a copy instead.
|
|
|
|
|
| |
If stat() fails, they won't get filled in otherwise. Then cycle
detection would have read uninitialized values.
|
| |
|
| |
|
| |
|
|
|
|
|
|
|
| |
With BFTW_RECOVER set, we're not supposed to fail just because a single
measly directory couldn't be handled. But using state.error as scratch
space made us fail in this case. The end result is that #7 resurfaced,
so fix it again.
|
| |
|
| |
|
| |
|
| |
|
|
|
|
| |
It was totally broken on filesystems that spit out DT_UNKNOWN.
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
|
|
|
|
|
| |
Instead of simple LRU, we now evict the open entry with the lowest
refcount. This reduces the average number of components passed to
openat() by a significant margin, and speeds bfs up by about ~5%.
|
|
|
|
| |
Otherwise we might close the found parent.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
DIR*'s were being kept around so dirfd(dir) could be passed to future
openat() calls. But DIR*'s are big, holding a cache of filenames etc.
read by readdir().
Instead, store the raw fd and dup() it to open a DIR* with fdopendir().
This way we can call dirclose() as soon as possible, while still keeping
an open fd.
Ideally there would be a way to closedir() without invoking close() on
the underlying fd, but this is a good approximation.
Reduces memory footprint by around 64% in a large directory tree.
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|