path: root/bftw.c
Commit message | Author | Age | Files | Lines
* Unify broken symlink handling | Tavian Barnes | 2017-08-12 | 1 | -16/+6
    Rather than open-code the fallback logic for broken symlinks everywhere
    it's needed, introduce a new xfstatat() utility function that performs
    the fallback automatically. Using xfstatat() consistently fixes a few
    bugs, including cases where broken symlinks are given as arguments to
    predicates like -samefile.
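The fallback described above might look roughly like the following. This is a minimal sketch, not the actual xfstatat(): the name stat_with_fallback, its signature, and the choice to report the retry through *flags are all assumptions made for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>

/*
 * Hypothetical stand-in for xfstatat(): stat path relative to dir_fd,
 * following symlinks first. If that fails because the target is missing,
 * retry on the link itself and record the retry in *flags.
 */
static int stat_with_fallback(int dir_fd, const char *path, struct stat *sb, int *flags) {
	int ret = fstatat(dir_fd, path, sb, *flags);
	if (ret != 0
	    && !(*flags & AT_SYMLINK_NOFOLLOW)
	    && (errno == ENOENT || errno == ENOTDIR)) {
		/* Possibly a broken symlink: stat the link, not its target. */
		*flags |= AT_SYMLINK_NOFOLLOW;
		ret = fstatat(dir_fd, path, sb, *flags);
	}
	return ret;
}
```

A caller can inspect *flags afterwards to learn whether the result describes the link itself or its target.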
* bftw: Assert that the queue is empty when freeing it | Tavian Barnes | 2017-08-10 | 1 | -0/+1
* util: Define O_DIRECTORY to 0 if it's not already defined | Tavian Barnes | 2017-07-29 | 1 | -5/+1
* Re-license under the BSD Zero Clause License | Tavian Barnes | 2017-07-27 | 1 | -10/+15
* Handle ENOTDIR the same as ENOENT | Tavian Barnes | 2017-07-09 | 1 | -1/+1
    For a/b/c, ENOTDIR is returned instead of ENOENT if a or b is not a
    directory. Handle this uniformly when detecting broken symlinks,
    readdir races, etc.
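One way to keep that handling uniform is a single predicate that every call site uses. The sketch below is an assumption about how such a check could be written; is_nonexistence_error is a hypothetical name, not a function from this file.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical helper: does this errno value mean "no such file"? */
static bool is_nonexistence_error(int error) {
	/* ENOTDIR means a non-final component (a or b in a/b/c) is not a directory. */
	return error == ENOENT || error == ENOTDIR;
}
```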
* bftw: Rename and refactor the internals | Tavian Barnes | 2017-07-09 | 1 | -235/+257
* bftw: Fix ENAMETOOLONG handling when the root is closed | Tavian Barnes | 2017-07-08 | 1 | -2/+7
    The root has depth == 0, but we still need to include it in the
    components array.
* bftw: Recover from ENAMETOOLONG | Tavian Barnes | 2017-07-08 | 1 | -23/+99
    It is always possible to force a breadth-first traversal to encounter
    ENAMETOOLONG, regardless of the dircache eviction policy. As a concrete
    example, consider this directory structure:

        ./1/{NAME_MAX}/{NAME_MAX}/{NAME_MAX}/... (longer than {PATH_MAX})
        ./2/{NAME_MAX}/{NAME_MAX}/{NAME_MAX}/...
        ./3/{NAME_MAX}/{NAME_MAX}/{NAME_MAX}/...
        ...
        (more than RLIMIT_NOFILE directories under .)

    Eventually, the next file to be processed will not have any parents in
    the cache, as the cache can only hold RLIMIT_NOFILE entries. Then the
    whole path must be traversed from ., which will exceed {PATH_MAX} bytes.

    Work around this by performing a component-by-component traversal
    manually when we see ENAMETOOLONG. This is required by POSIX:

    > The find utility shall be able to descend to arbitrary depths in a file
    > hierarchy and shall not fail due to path length limitations (unless a
    > path operand specified by the application exceeds {PATH_MAX}
    > requirements).
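A component-by-component traversal of this kind can be sketched with openat(). The helper below is an illustration only: open_by_components is a hypothetical name, it accepts relative paths only, and it does not reflect how bftw.c actually stores or walks its components.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper: open a relative path under base_fd one component at
 * a time, so no single openat() call is handed more than one name.
 * Returns a new fd, or -1 with errno set.
 */
static int open_by_components(int base_fd, const char *path, int flags) {
	if (path[0] == '\0' || path[0] == '/') {
		/* This sketch handles non-empty relative paths only. */
		errno = (path[0] == '/') ? EINVAL : ENOENT;
		return -1;
	}

	char *copy = strdup(path);
	if (!copy) {
		return -1;
	}

	int fd = base_fd;
	char *p = copy;
	while (*p) {
		char *name = p;
		p += strcspn(p, "/");            /* end of this component */
		char *rest = p + strspn(p, "/"); /* start of the next one, if any */
		bool last = (*rest == '\0');
		*p = '\0';
		p = rest;

		/* Intermediate components must be directories; the last gets the caller's flags. */
		int next = openat(fd, name, last ? flags : (O_RDONLY | O_DIRECTORY | O_CLOEXEC));
		int saved = errno;
		if (fd != base_fd) {
			close(fd); /* drop the intermediate directory */
		}
		errno = saved;
		fd = next;
		if (fd < 0) {
			break;
		}
	}

	int saved = errno;
	free(copy);
	errno = saved;
	return fd;
}
```

Starting from the nearest ancestor that is still open, rather than from ".", would keep the number of openat() calls down; the sketch leaves that choice to the caller through base_fd.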
* Revert "bftw: Don't store the terminating '\0' in dircache_entry names." | Tavian Barnes | 2017-07-08 | 1 | -1/+2
    This reverts commit 20860becb5a0e89ee2aaaddbb0ba1eb248552640.

    The terminating NUL will be useful for the upcoming per-component
    traversal to handle ENAMETOOLONG.
* bftw: Remove unused parameter to dircache_entry_base() | Tavian Barnes | 2017-05-17 | 1 | -5/+3
* Release 1.0 (tag: 1.0) | Tavian Barnes | 2017-04-24 | 1 | -1/+1
* Move bftw_typeflag converters to util.c | Tavian Barnes | 2017-04-08 | 1 | -108/+2
* bftw: Only rebuild the part of the path that changes | Tavian Barnes | 2017-03-22 | 1 | -37/+50
    Quadratic complexity is still possible for directory structures like

        root -- a -- a -- a -- a ...
             |
             +- b -- b -- b -- b ...

    But for most realistic directory structures, bfs should now spend less
    time building paths. (Of course if you print every path, overall
    complexity is quadratic anyway.)
* bftw: Fix quadratic reference counting complexity | Tavian Barnes | 2017-03-20 | 1 | -8/+15
    dircache_entry refcounts used to count every single descendant,
    resulting in n refcount updates to create/delete an entry at depth n,
    and thus O(n^2) complexity overall for deep directory structures.

    Fix it by only counting direct children instead. The cache eviction
    policy is changed to prefer shallower entries in all cases, attempting
    to save at least some of the benefit of the previous accounting scheme.
    Unfortunately, the average number of traversed components for openat()
    calls still went up by ~20%, but the performance in practice is roughly
    unchanged in my tests.
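Counting only direct children can be sketched as follows. The struct and function names are hypothetical and much simpler than a real dircache_entry; the point is that creating an entry touches one parent, and releasing one walks upward only while counts reach zero.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical cache entry: refcount = 1 for its owner + 1 per direct child. */
struct node {
	struct node *parent;
	size_t refcount;
};

/* Create an entry under parent (or a root if parent is NULL). */
static struct node *node_new(struct node *parent) {
	struct node *n = malloc(sizeof(*n));
	if (!n) {
		return NULL;
	}
	n->parent = parent;
	n->refcount = 1;
	if (parent) {
		++parent->refcount; /* O(1): only the direct parent is updated */
	}
	return n;
}

/* Drop one reference; free each entry whose count reaches zero. Returns how many were freed. */
static size_t node_unref(struct node *n) {
	size_t freed = 0;
	while (n && --n->refcount == 0) {
		struct node *parent = n->parent;
		free(n);
		++freed;
		n = parent; /* the freed child no longer references its parent */
	}
	return freed;
}
```

Each entry is freed exactly once, so the total work over a traversal stays linear in the number of entries, even for a single very deep chain.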
* Color link targets for -ls | Tavian Barnes | 2017-03-16 | 1 | -19/+1
    Fixes #18.
* bftw: Make the nameoff of "///" point to "/" | Tavian Barnes | 2017-02-09 | 1 | -0/+3
    This simplifies a few things such as -name handling for ///.
* bftw: Add the DIR* to bftw_state | Tavian Barnes | 2017-02-09 | 1 | -15/+39
    Can't forget to close it that way.
* Add support for -x?type with multiple types | Tavian Barnes | 2017-02-08 | 1 | -30/+26
    This functionality is already part of GNU findutils git.
* bftw: Add missing closedir() to error path | Tavian Barnes | 2017-02-07 | 1 | -0/+1
* bftw: Plug a leak if dirqueue_push() fails | Tavian Barnes | 2017-02-06 | 1 | -16/+28
    If bftw_add() succeeds but dirqueue_push() fails, we need to clean up
    the just-added dircache_entry. Otherwise it will leak, and we'll also
    fail the cache->size == 0 assertion.

    Fix it by extracting the dircache-related parts of bftw_pop() into a
    new helper function bftw_gc(), and call it from bftw_pop() as well as
    the bftw_push() failure path.
* bftw: Compute nameoff correctly for the root in BFTW_DEPTH mode | Tavian Barnes | 2017-02-05 | 1 | -1/+5
* Implement -printf/-fprintf | Tavian Barnes | 2017-02-05 | 1 | -0/+1
    Based on a patch by Fangrui Song <i@maskray.me>.

    Closes #16.
* Implement -regex, -iregex, and -regextype/-E | Tavian Barnes | 2016-12-18 | 1 | -1/+4
* bftw: Clean up the dirqueue implementation a bit | Tavian Barnes | 2016-12-17 | 1 | -38/+34
* Move portability code into util.h | Tavian Barnes | 2016-12-04 | 1 | -2/+2
* bftw: Infer the flags in ftwbuf_stat() | Tavian Barnes | 2016-11-23 | 1 | -5/+5
* bftw: Make a defensive copy of the ftwbuf | Tavian Barnes | 2016-11-21 | 1 | -1/+4
    The callback may modify the ftwbuf, but we check it after the callback
    (for typeflag and statbuf). Give the callback a copy to mutate instead.
* bftw: Always initialize dircache_entry::{dev,ino} | Tavian Barnes | 2016-11-21 | 1 | -6/+7
    If stat() fails, they won't get filled in otherwise. Then cycle
    detection would have read uninitialized values.
* bftw: Make bftw_flags more similar to fts() options. | Tavian Barnes | 2016-11-21 | 1 | -5/+9
* Check for readdir() errors everywhere. | Tavian Barnes | 2016-11-14 | 1 | -14/+2
* bftw: Keep trailing slashes on the root in BFTW_DEPTH mode. | Tavian Barnes | 2016-11-13 | 1 | -6/+16
* bftw: Don't fail just because we couldn't open/read a directory. | Tavian Barnes | 2016-11-03 | 1 | -3/+3
    With BFTW_RECOVER set, we're not supposed to fail just because a single
    measly directory couldn't be handled. But using state.error as scratch
    space made us fail in this case.

    The end result is that #7 resurfaced, so fix it again.
* Implement -ignore_readdir_race. | Tavian Barnes | 2016-10-24 | 1 | -1/+4
* bftw: Add support for some exotic file types, where available. | Tavian Barnes | 2016-10-02 | 1 | -1/+59
* bftw: Handle errors from readdir(). | Tavian Barnes | 2016-10-02 | 1 | -25/+66
* bftw: Fix do/to typo in a comment. | Tavian Barnes | 2016-09-10 | 1 | -1/+1
* bftw: Initialize typeflag to BFTW_UNKNOWN. | Tavian Barnes | 2016-08-24 | 1 | -2/+1
    It was totally broken on filesystems that spit out DT_UNKNOWN.
* dstring: Clean up the API a bit. | Tavian Barnes | 2016-05-22 | 1 | -1/+4
* bftw: Use realloc() to grow the dirqueue. | Tavian Barnes | 2016-05-17 | 1 | -13/+11
* bftw: Remove some debugging counters that were left in accidentally. | Tavian Barnes | 2016-05-17 | 1 | -10/+0
* dstring: Split out the dynamic string logic. | Tavian Barnes | 2016-04-13 | 1 | -68/+25
* bftw: Update at_flags when not following a broken symbolic link. | Tavian Barnes | 2016-02-23 | 1 | -1/+2
* bftw: Plug a leak when the root is not a directory. | Tavian Barnes | 2016-02-23 | 1 | -1/+6
* bftw: Use the currently open directory as at_fd in BFTW_CHILD mode. | Tavian Barnes | 2016-02-22 | 1 | -2/+5
* bftw: Use O_CLOEXEC. | Tavian Barnes | 2016-02-21 | 1 | -3/+3
* bftw: Don't store the terminating '\0' in dircache_entry names. | Tavian Barnes | 2016-02-21 | 1 | -2/+1
* bftw: Use a better cache eviction policy. | Tavian Barnes | 2016-02-21 | 1 | -113/+186
    Instead of simple LRU, we now evict the open entry with the lowest
    refcount. This reduces the average number of components passed to
    openat() by a significant margin, and speeds bfs up by about 5%.
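The selection rule itself is easy to state in code. The sketch below uses a linear scan over a plain array purely for clarity; the struct layout, the pick_victim name, and the scan are assumptions, and the real cache may well use a different data structure to find the minimum.

```c
#include <stddef.h>

/* Hypothetical cache entry for illustration. */
struct cache_entry {
	int fd;          /* < 0 when the entry is not open */
	size_t refcount; /* how many other entries still depend on this one */
};

/* Return the open entry with the lowest refcount, or NULL if none is open. */
static struct cache_entry *pick_victim(struct cache_entry *entries, size_t count) {
	struct cache_entry *victim = NULL;
	for (size_t i = 0; i < count; ++i) {
		struct cache_entry *e = &entries[i];
		if (e->fd < 0) {
			continue; /* already closed, nothing to evict */
		}
		if (!victim || e->refcount < victim->refcount) {
			victim = e;
		}
	}
	return victim;
}
```

Closing the least-referenced directory first keeps the descriptors that the most pending entries can still use as an openat() starting point.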
* bftw: Shrink the LRU before finding the parent. | Tavian Barnes | 2016-02-20 | 1 | -10/+6
    Otherwise we might close the found parent.
* bftw: Clean up dirqueue implementation a bit. | Tavian Barnes | 2016-02-19 | 1 | -20/+28
* bftw: Don't keep DIR*'s around. | Tavian Barnes | 2016-02-19 | 1 | -40/+75
    DIR*'s were being kept around so dirfd(dir) could be passed to future
    openat() calls. But DIR*'s are big, holding a cache of filenames etc.
    read by readdir(). Instead, store the raw fd and dup() it to open a
    DIR* with fdopendir(). This way we can call closedir() as soon as
    possible, while still keeping an open fd. Ideally there would be a way
    to closedir() without invoking close() on the underlying fd, but this
    is a good approximation.

    Reduces memory footprint by around 64% in a large directory tree.
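The dup-then-fdopendir() pattern can be sketched as below. The helper name opendir_keep_fd is hypothetical, and the sketch duplicates the descriptor with fcntl(F_DUPFD_CLOEXEC) rather than plain dup(), a small substitution that also sets close-on-exec on the copy.

```c
#define _POSIX_C_SOURCE 200809L
#include <dirent.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper: get a DIR* for reading fd's directory without giving
 * up fd itself. closedir() on the result closes only the duplicate.
 * Note that the duplicate shares its read offset with fd.
 */
static DIR *opendir_keep_fd(int fd) {
	int dup_fd = fcntl(fd, F_DUPFD_CLOEXEC, 0); /* like dup(), plus close-on-exec */
	if (dup_fd < 0) {
		return NULL;
	}

	DIR *dir = fdopendir(dup_fd); /* on success the DIR* owns dup_fd */
	if (!dir) {
		int saved = errno;
		close(dup_fd);
		errno = saved;
	}
	return dir;
}
```

After the caller finishes with readdir() and calls closedir(), the readdir() buffers are released while the original fd remains usable for later openat() calls.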