| Notes on propagate_umount() |
| |
| Umount propagation starts with a set of mounts we are already going to |
| take out. Ideally, we would like to add all downstream cognates to |
| that set - anything with the same mountpoint as one of the removed |
| mounts and with parent that would receive events from the parent of that |
| mount. However, there are some constraints the resulting set must |
| satisfy. |
| |
| It is convenient to define several properties of sets of mounts: |
| |
| 1) A set S of mounts is non-shifting if for any mount X belonging |
| to S all subtrees mounted strictly inside of X (i.e. not overmounting |
| the root of X) contain only elements of S. |
| |
| 2) A set S is non-revealing if all locked mounts that belong to S have |
| parents that also belong to S. |
| |
| 3) A set S is closed if it contains all children of its elements. |
| |
| The set of mounts taken out by umount(2) must be non-shifting and |
| non-revealing; the first constraint is what allows to reparent |
| any remaining mounts and the second is what prevents the exposure |
| of any concealed mountpoints. |
| |
| propagate_umount() takes the original set as an argument and tries to |
| extend that set. The original set is a full subtree and its root is |
| unlocked; what matters is that it's closed and non-revealing. |
| Resulting set may not be closed; there might still be mounts outside |
| of that set, but only on top of stacks of root-overmounting elements |
| of set. They can be reparented to the place where the bottom of |
| stack is attached to a mount that will survive. NOTE: doing that |
| will violate a constraint on having no more than one mount with |
| the same parent/mountpoint pair; however, the caller (umount_tree()) |
| will immediately remedy that - it may keep unmounted element attached |
| to parent, but only if the parent itself is unmounted. Since all |
| conflicts created by reparenting have common parent *not* in the |
| set and one side of the conflict (bottom of the stack of overmounts) |
| is in the set, it will be resolved. However, we rely upon umount_tree() |
| doing that pretty much immediately after the call of propagate_umount(). |
| |
| Algorithm is based on two statements: |
| 1) for any set S, there is a maximal non-shifting subset of S |
| and it can be calculated in O(#S) time. |
| 2) for any non-shifting set S, there is a maximal non-revealing |
| subset of S. That subset is also non-shifting and it can be calculated |
| in O(#S) time. |
| |
| Finding candidates. |
| |
| We are given a closed set U and we want to find all mounts that have |
| the same mountpoint as some mount m in U *and* whose parent receives |
| propagation from the parent of the same mount m. Naive implementation |
| would be |
| S = {} |
| for each m in U |
| add m to S |
| p = parent(m) |
| for each q in Propagation(p) - {p} |
| child = look_up(q, mountpoint(m)) |
| if child |
| add child to S |
| but that can lead to excessive work - there might be propagation among the |
| subtrees of U, in which case we'd end up examining the same candidates |
| many times. Since propagation is transitive, the same will happen to |
| everything downstream of that candidate and it's not hard to construct |
| cases where the approach above leads to the time quadratic by the actual |
| number of candidates. |
| |
| Note that if we run into a candidate we'd already seen, it must've been |
| added on an earlier iteration of the outer loop - all additions made |
| during one iteration of the outer loop have different parents. So |
| if we find a child already added to the set, we know that everything |
| in Propagation(parent(child)) with the same mountpoint has been already |
| added. |
| S = {} |
| for each m in U |
| if m in S |
| continue |
| add m to S |
| p = parent(m) |
| q = propagation_next(p, p) |
| while q |
| child = look_up(q, mountpoint(m)) |
| if child |
| if child in S |
| q = skip_them(q, p) |
| continue; |
| add child to S |
| q = propagation_next(q, p) |
| where |
| skip_them(q, p) |
| keep walking Propagation(p) from q until we find something |
| not in Propagation(q) |
| |
| would get rid of that problem, but we need a sane implementation of |
| skip_them(). That's not hard to do - split propagation_next() into |
| "down into mnt_slave_list" and "forward-and-up" parts, with the |
| skip_them() being "repeat the forward-and-up part until we get NULL |
| or something that isn't a peer of the one we are skipping". |
| |
| Note that there can be no absolute roots among the extra candidates - |
| they all come from mount lookups. Absolute root among the original |
| set is _currently_ impossible, but it might be worth protecting |
| against. |
| |
| Maximal non-shifting subsets. |
| |
| Let's call a mount m in a set S forbidden in that set if there is a |
| subtree mounted strictly inside m and containing mounts that do not |
| belong to S. |
| |
| The set is non-shifting when none of its elements are forbidden in it. |
| |
| If mount m is forbidden in a set S, it is forbidden in any subset S' it |
| belongs to. In other words, it can't belong to any of the non-shifting |
| subsets of S. If we had a way to find a forbidden mount or show that |
| there's none, we could use it to find the maximal non-shifting subset |
| simply by finding and removing them until none remain. |
| |
| Suppose mount m is forbidden in S; then any mounts forbidden in S - {m} |
| must have been forbidden in S itself. Indeed, since m has descendents |
| that do not belong to S, any subtree that fits into S will fit into |
| S - {m} as well. |
| |
| So in principle we could go through elements of S, checking if they |
| are forbidden in S and removing the ones that are. Removals will |
| not invalidate the checks done for earlier mounts - if they were not |
| forbidden at the time we checked, they won't become forbidden later. |
| It's too costly to be practical, but there is a similar approach that |
| is linear by size of S. |
| |
| Let's say that mount x in a set S is forbidden by mount y, if |
| * both x and y belong to S. |
| * there is a chain of mounts starting at x and leaving S |
| immediately after passing through y, with the first |
| mountpoint strictly inside x. |
| Note 1: x may be equal to y - that's the case when something not |
| belonging to S is mounted strictly inside x. |
| Note 2: if y does not belong to S, it can't forbid anything in S. |
| Note 3: if y has no children outside of S, it can't forbid anything in S. |
| |
| It's easy to show that mount x is forbidden in S if and only if x is |
| forbidden in S by some mount y. And it's easy to find all mounts in S |
| forbidden by a given mount. |
| |
| Consider the following operation: |
| Trim(S, m) = S - {x : x is forbidden by m in S} |
| |
| Note that if m does not belong to S or has no children outside of S we |
| are guaranteed that Trim(S, m) is equal to S. |
| |
| The following is true: if x is forbidden by y in Trim(S, m), it was |
| already forbidden by y in S. |
| |
| Proof: Suppose x is forbidden by y in Trim(S, m). Then there is a |
| chain of mounts (x_0 = x, ..., x_k = y, x_{k+1} = r), such that x_{k+1} |
| is the first element that doesn't belong to Trim(S, m) and the |
| mountpoint of x_1 is strictly inside x. If mount r belongs to S, it must |
| have been removed by Trim(S, m), i.e. it was forbidden in S by m. |
| Then there was a mount chain from r to some child of m that stayed in |
| S all the way until m, but that's impossible since x belongs to Trim(S, m) |
| and prepending (x_0, ..., x_k) to that chain demonstrates that x is also |
| forbidden in S by m, and thus can't belong to Trim(S, m). |
| Therefore r can not belong to S and our chain demonstrates that |
| x is forbidden by y in S. QED. |
| |
| Corollary: no mount is forbidden by m in Trim(S, m). Indeed, any |
| such mount would have been forbidden by m in S and thus would have been |
| in the part of S removed in Trim(S, m). |
| |
| Corollary: no mount is forbidden by m in Trim(Trim(S, m), n). Indeed, |
| any such would have to have been forbidden by m in Trim(S, m), which |
| is impossible. |
| |
| Corollary: after |
| S = Trim(S, x_1) |
| S = Trim(S, x_2) |
| ... |
| S = Trim(S, x_k) |
| no mount remaining in S will be forbidden by either of x_1,...,x_k. |
| |
| The following will reduce S to its maximal non-shifting subset: |
| visited = {} |
| while S contains elements not belonging to visited |
| let m be an arbitrary such element of S |
| S = Trim(S, m) |
| add m to visited |
| |
| S never grows, so the number of elements of S not belonging to visited |
| decreases at least by one on each iteration. When the loop terminates, |
| all mounts remaining in S belong to visited. It's easy to see that at |
| the beginning of each iteration no mount remaining in S will be forbidden |
| by any element of visited. In other words, no mount remaining in S will |
| be forbidden, i.e. final value of S will be non-shifting. It will be |
| the maximal non-shifting subset, since we were removing only forbidden |
| elements. |
| |
| There are two difficulties in implementing the above in linear |
| time, both due to the fact that Trim() might need to remove more than one |
| element. Naive implementation of Trim() is vulnerable to running into a |
| long chain of mounts, each mounted on top of parent's root. Nothing in |
| that chain is forbidden, so nothing gets removed from it. We need to |
| recognize such chains and avoid walking them again on subsequent calls of |
| Trim(), otherwise we will end up with worst-case time being quadratic by |
| the number of elements in S. Another difficulty is in implementing the |
| outer loop - we need to iterate through all elements of a shrinking set. |
| That would be trivial if we never removed more than one element at a time |
| (linked list, with list_for_each_entry_safe for iterator), but we may |
| need to remove more than one entry, possibly including the ones we have |
| already visited. |
| |
| Let's start with naive algorithm for Trim(): |
| |
| Trim_one(m) |
| found = false |
| for each n in children(m) |
| if n not in S |
| found = true |
| if (mountpoint(n) != root(m)) |
| remove m from S |
| break |
| if found |
| Trim_ancestors(m) |
| |
| Trim_ancestors(m) |
| for (; parent(m) in S; m = parent(m)) { |
| if (mountpoint(m) != root(parent(m))) |
| remove parent(m) from S |
| } |
| |
| If m belongs to S, Trim_one(m) will replace S with Trim(S, m). |
| Proof: |
| Consider the chains excluding elements from Trim(S, m). The last |
| two elements in such chain are m and some child of m that does not belong |
| to S. If m has no such children, Trim(S, m) is equal to S. |
| m itself is removed if and only if the chain has exactly two |
| elements, i.e. when the last element does not overmount the root of m. |
| In other words, that happens when m has a child not in S that does not |
| overmount the root of m. |
| All other elements to remove will be ancestors of m, such that |
| the entire descent chain from them to m is contained in S. Let |
| (x_0, x_1, ..., x_k = m) be the longest such chain. x_i needs to be |
| removed if and only if x_{i+1} does not overmount its root. It's easy |
| to see that Trim_ancestors(m) will iterate through that chain from |
| x_k to x_1 and that it will remove exactly the elements that need to be |
| removed. |
| |
| Note that if the loop in Trim_ancestors() walks into an already |
| visited element, we are guaranteed that remaining iterations will see |
| only elements that had already been visited and remove none of them. |
| That's the weakness that makes it vulnerable to long chains of full |
| overmounts. |
| |
| It's easy to deal with, if we can afford setting marks on |
| elements of S; we would mark all elements already visited by |
| Trim_ancestors() and have it bail out as soon as it sees an already |
| marked element. |
| |
| The problems with iterating through the set can be dealt with in |
| several ways, depending upon the representation we choose for our set. |
| One useful observation is that we are given a closed subset in S - the |
| original set passed to propagate_umount(). Its elements can neither |
| forbid anything nor be forbidden by anything - all their descendents |
| belong to S, so they can not occur anywhere in any excluding chain. |
| In other words, the elements of that subset will remain in S until |
| the end and Trim_one(S, m) is a no-op for all m from that subset. |
| |
| That suggests keeping S as a disjoint union of a closed set U |
| ('will be unmounted, no matter what') and the set of all elements of |
| S that do not belong to U. That set ('candidates') is all we need |
| to iterate through. Let's represent it as a subset in a cyclic list, |
| consisting of all list elements that are marked as candidates (initially - |
| all of them). Then we could have Trim_ancestors() only remove the mark, |
| leaving the elements on the list. Then Trim_one() would never remove |
| anything other than its argument from the containing list, allowing to |
| use list_for_each_entry_safe() as iterator. |
| |
| Assuming that representation we get the following: |
| |
| list_for_each_entry_safe(m, ..., Candidates, ...) |
| Trim_one(m) |
| where |
| Trim_one(m) |
| if (m is not marked as a candidate) |
| strip the "seen by Trim_ancestors" mark from m |
| remove m from the Candidates list |
| return |
| |
| remove_this = false |
| found = false |
| for each n in children(m) |
| if n not in S |
| found = true |
| if (mountpoint(n) != root(m)) |
| remove_this = true |
| break |
| if found |
| Trim_ancestors(m) |
| if remove_this |
| strip the "seen by Trim_ancestors" mark from m |
| strip the "candidate" mark from m |
| remove m from the Candidate list |
| |
| Trim_ancestors(m) |
| for (p = parent(m); p is marked as candidate ; m = p, p = parent(p)) { |
| if m is marked as seen by Trim_ancestors |
| return |
| mark m as seen by Trim_ancestors |
| if (mountpoint(m) != root(p)) |
| strip the "candidate" mark from p |
| } |
| |
| Terminating condition in the loop in Trim_ancestors() is correct, |
| since that that loop will never run into p belonging to U - p is always |
| an ancestor of argument of Trim_one() and since U is closed, the argument |
| of Trim_one() would also have to belong to U. But Trim_one() is never |
| called for elements of U. In other words, p belongs to S if and only |
| if it belongs to candidates. |
| |
| Time complexity: |
| * we get no more than O(#S) calls of Trim_one() |
| * the loop over children in Trim_one() never looks at the same child |
| twice through all the calls. |
| * iterations of that loop for children in S are no more than O(#S) |
| in the worst case |
| * at most two children that are not elements of S are considered per |
| call of Trim_one(). |
| * the loop in Trim_ancestors() sets its mark once per iteration and |
| no element of S has is set more than once. |
| |
| In the end we may have some elements excluded from S by |
| Trim_ancestors() still stuck on the list. We could do a separate |
| loop removing them from the list (also no worse than O(#S) time), |
| but it's easier to leave that until the next phase - there we will |
| iterate through the candidates anyway. |
| |
| The caller has already removed all elements of U from their parents' |
| lists of children, which means that checking if child belongs to S is |
| equivalent to checking if it's marked as a candidate; we'll never see |
| the elements of U in the loop over children in Trim_one(). |
| |
| What's more, if we see that children(m) is empty and m is not |
| locked, we can immediately move m into the committed subset (remove |
| from the parent's list of children, etc.). That's one fewer mount we'll |
| have to look into when we check the list of children of its parent *and* |
| when we get to building the non-revealing subset. |
| |
| Maximal non-revealing subsets |
| |
| If S is not a non-revealing subset, there is a locked element x in S |
| such that parent of x is not in S. |
| |
| Obviously, no non-revealing subset of S may contain x. Removing such |
| elements one by one will obviously end with the maximal non-revealing |
| subset (possibly empty one). Note that removal of an element will |
| require removal of all its locked children, etc. |
| |
| If the set had been non-shifting, it will remain non-shifting after |
| such removals. |
| Proof: suppose S was non-shifting, x is a locked element of S, parent of x |
| is not in S and S - {x} is not non-shifting. Then there is an element m |
| in S - {x} and a subtree mounted strictly inside m, such that m contains |
| an element not in in S - {x}. Since S is non-shifting, everything in |
| that subtree must belong to S. But that means that this subtree must |
| contain x somewhere *and* that parent of x either belongs that subtree |
| or is equal to m. Either way it must belong to S. Contradiction. |
| |
| // same representation as for finding maximal non-shifting subsets: |
| // S is a disjoint union of a non-revealing set U (the ones we are committed |
| // to unmount) and a set of candidates, represented as a subset of list |
| // elements that have "is a candidate" mark on them. |
| // Elements of U are removed from their parents' lists of children. |
| // In the end candidates becomes empty and maximal non-revealing non-shifting |
| // subset of S is now in U |
| while (Candidates list is non-empty) |
| handle_locked(first(Candidates)) |
| |
| handle_locked(m) |
| if m is not marked as a candidate |
| strip the "seen by Trim_ancestors" mark from m |
| remove m from the list |
| return |
| cutoff = m |
| for (p = m; p in candidates; p = parent(p)) { |
| strip the "seen by Trim_ancestors" mark from p |
| strip the "candidate" mark from p |
| remove p from the Candidates list |
| if (!locked(p)) |
| cutoff = parent(p) |
| } |
| if p in U |
| cutoff = p |
| while m != cutoff |
| remove m from children(parent(m)) |
| add m to U |
| m = parent(m) |
| |
| Let (x_0, ..., x_n = m) be the maximal chain of descent of m within S. |
| * If it contains some elements of U, let x_k be the last one of those. |
| Then union of U with {x_{k+1}, ..., x_n} is obviously non-revealing. |
| * otherwise if all its elements are locked, then none of {x_0, ..., x_n} |
| may be elements of a non-revealing subset of S. |
| * otherwise let x_k be the first unlocked element of the chain. Then none |
| of {x_0, ..., x_{k-1}} may be an element of a non-revealing subset of |
| S and union of U and {x_k, ..., x_n} is non-revealing. |
| |
| handle_locked(m) finds which of these cases applies and adjusts Candidates |
| and U accordingly. U remains non-revealing, union of Candidates and |
| U still contains any non-revealing subset of S and after the call of |
| handle_locked(m) m is guaranteed to be not in Candidates list. So having |
| it called for each element of S would suffice to empty Candidates, |
| leaving U the maximal non-revealing subset of S. |
| |
| However, handle_locked(m) is a no-op when m belongs to U, so it's enough |
| to have it called for elements of Candidates list until none remain. |
| |
| Time complexity: number of calls of handle_locked() is limited by |
| #Candidates, each iteration of the first loop in handle_locked() removes |
| an element from the list, so their total number of executions is also |
| limited by #Candidates; number of iterations in the second loop is no |
| greater than the number of iterations of the first loop. |
| |
| |
| Reparenting |
| |
| After we'd calculated the final set, we still need to deal with |
| reparenting - if an element of the final set has a child not in it, |
| we need to reparent such child. |
| |
| Such children can only be root-overmounting (otherwise the set wouldn't |
| be non-shifting) and their parents can not belong to the original set, |
| since the original is guaranteed to be closed. |
| |
| |
| Putting all of that together |
| |
| The plan is to |
| * find all candidates |
| * trim down to maximal non-shifting subset |
| * trim down to maximal non-revealing subset |
| * reparent anything that needs to be reparented |
| * return the resulting set to the caller |
| |
| For the 2nd and 3rd steps we want to separate the set into growing |
| non-revealing subset, initially containing the original set ("U" in |
| terms of the pseudocode above) and everything we are still not sure about |
| ("candidates"). It means that for the output of the 1st step we'd like |
| the extra candidates separated from the stuff already in the original set. |
| For the 4th step we would like the additions to U separate from the |
| original set. |
| |
| So let's go for |
| * original set ("set"). Linkage via mnt_list |
| * undecided candidates ("candidates"). Subset of a list, |
| consisting of all its elements marked with a new flag (T_UMOUNT_CANDIDATE). |
| Initially all elements of the list will be marked that way; in the |
| end the list will become empty and no mounts will remain marked with |
| that flag. |
| * Reuse T_MARKED for "has been already seen by trim_ancestors()". |
| * anything in U that hadn't been in the original set - elements of |
| candidates will gradually be either discarded or moved there. In other |
| words, it's the candidates we have already decided to unmount. Its role |
| is reasonably close to the old "to_umount", so let's use that name. |
| Linkage via mnt_list. |
| |
| For gather_candidates() we'll need to maintain both candidates (S - |
| set) and intersection of S with set. Use T_UMOUNT_CANDIDATE for |
| all elements we encounter, putting the ones not already in the original |
| set into the list of candidates. When we are done, strip that flag from |
| all elements of the original set. That gives a cheap way to check |
| if element belongs to S (in gather_candidates) and to candidates |
| itself (at later stages). Call that predicate is_candidate(); it would |
| be m->mnt_t_flags & T_UMOUNT_CANDIDATE. |
| |
| All elements of the original set are marked with MNT_UMOUNT and we'll |
| need the same for elements added when joining the contents of to_umount |
| to set in the end. Let's set MNT_UMOUNT at the time we add an element |
| to to_umount; that's close to what the old 'umount_one' is doing, so |
| let's keep that name. It also gives us another predicate we need - |
| "belongs to union of set and to_umount"; will_be_unmounted() for now. |
| |
| Removals from the candidates list should strip both T_MARKED and |
| T_UMOUNT_CANDIDATE; call it remove_from_candidates_list(). |