Merge tag 'v5.15-rc4' into docs-next
This is needed to get a docs fix that entered via the DRM tree; testers
have requested it so that PDF builds in docs-next work again.
diff --git a/Documentation/ABI/testing/sysfs-driver-xen-blkback b/Documentation/ABI/testing/sysfs-driver-xen-blkback
index ac2947b..a74dfe5 100644
--- a/Documentation/ABI/testing/sysfs-driver-xen-blkback
+++ b/Documentation/ABI/testing/sysfs-driver-xen-blkback
@@ -29,7 +29,7 @@
What: /sys/module/xen_blkback/parameters/buffer_squeeze_duration_ms
Date: December 2019
KernelVersion: 5.6
-Contact: SeongJae Park <sjpark@amazon.de>
+Contact: SeongJae Park <sj@kernel.org>
Description:
When memory pressure is reported to blkback this option
controls the duration in milliseconds that blkback will not
@@ -39,7 +39,7 @@
What: /sys/module/xen_blkback/parameters/feature_persistent
Date: September 2020
KernelVersion: 5.10
-Contact: SeongJae Park <sjpark@amazon.de>
+Contact: SeongJae Park <sj@kernel.org>
Description:
Whether to enable the persistent grants feature or not. Note
that this option only takes effect on newly created backends.
diff --git a/Documentation/ABI/testing/sysfs-driver-xen-blkfront b/Documentation/ABI/testing/sysfs-driver-xen-blkfront
index 2800890..61fd173f 100644
--- a/Documentation/ABI/testing/sysfs-driver-xen-blkfront
+++ b/Documentation/ABI/testing/sysfs-driver-xen-blkfront
@@ -12,7 +12,7 @@
What: /sys/module/xen_blkfront/parameters/feature_persistent
Date: September 2020
KernelVersion: 5.10
-Contact: SeongJae Park <sjpark@amazon.de>
+Contact: SeongJae Park <sj@kernel.org>
Description:
Whether to enable the persistent grants feature or not. Note
that this option only takes effect on newly created frontends.
diff --git a/Documentation/admin-guide/mm/pagemap.rst b/Documentation/admin-guide/mm/pagemap.rst
index fb578fb..4581527 100644
--- a/Documentation/admin-guide/mm/pagemap.rst
+++ b/Documentation/admin-guide/mm/pagemap.rst
@@ -196,6 +196,28 @@
in kpagecount, and tally up the number of pages that are only referenced
once.
+Exceptions for Shared Memory
+============================
+
+Page table entries for shared pages are cleared when the pages are zapped or
+swapped out. This makes swapped out pages indistinguishable from never-allocated
+ones.
+
+In kernel space, the swap location can still be retrieved from the page cache.
+However, values stored only on the normal PTE get lost irretrievably when the
+page is swapped out (i.e. SOFT_DIRTY).
+
+In user space, whether the page is present, swapped or none can be deduced with
+the help of lseek and/or mincore system calls.
+
+lseek() can differentiate between accessed pages (present or swapped out) and
+holes (none/non-allocated) by specifying the SEEK_DATA flag on the file where
+the pages are backed. For anonymous shared pages, the file can be found in
+``/proc/pid/map_files/``.
+
+mincore() can differentiate between pages in memory (present, including swap
+cache) and out of memory (swapped out or none/non-allocated).
+
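The user space checks described above can be sketched in C roughly as follows. This is a minimal illustration and not part of the kernel sources: ``classify_shared_page()`` and ``enum page_state`` are hypothetical names, and the sketch assumes that the mapping starts at file offset 0 and that ``fd`` refers to the file which backs the mapping (for example an entry opened from ``/proc/pid/map_files/``).

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Linux value of SEEK_DATA, in case the libc headers do not expose it */
#ifndef SEEK_DATA
#define SEEK_DATA 3
#endif

/* Hypothetical names, used for illustration only */
enum page_state {
	PAGE_STATE_ERROR = -1,
	/* Hole in the backing file: never allocated */
	PAGE_STATE_NONE,
	/* Backed by data, but not in memory */
	PAGE_STATE_SWAPPED,
	/* In memory, including swap cache */
	PAGE_STATE_PRESENT,
};

/*
 * Classify page @index of a shared mapping which starts at @addr and maps
 * the file referred to by @fd from file offset 0 (assumption).
 */
static enum page_state classify_shared_page(int fd, void *addr, size_t index)
{
	long page_size = sysconf(_SC_PAGESIZE);
	off_t offset = (off_t)index * page_size;
	unsigned char vec;
	off_t data;

	/* mincore() requires a page aligned start address */
	assert((uintptr_t)addr % (uintptr_t)page_size == 0);

	/* Bit 0 of the vector entry is set if the page is in memory */
	if (mincore((char *)addr + offset, (size_t)page_size, &vec))
		return PAGE_STATE_ERROR;
	if (vec & 1)
		return PAGE_STATE_PRESENT;

	/* SEEK_DATA yields the first offset >= @offset which holds data */
	data = lseek(fd, offset, SEEK_DATA);
	if (data == (off_t)-1 && errno != ENXIO)
		return PAGE_STATE_ERROR;

	return data == offset ? PAGE_STATE_SWAPPED : PAGE_STATE_NONE;
}
```

mincore() is consulted first because it is the only one of the two calls which can tell a present page from a swapped out one. lseek() is then only needed to separate swapped out pages from holes.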
Other notes
===========
diff --git a/Documentation/arm/marvell.rst b/Documentation/arm/marvell.rst
index 56bb592..8323c79 100644
--- a/Documentation/arm/marvell.rst
+++ b/Documentation/arm/marvell.rst
@@ -21,6 +21,7 @@
- Datasheet: https://web.archive.org/web/20210124231420/http://csclub.uwaterloo.ca/~board/ts7800/MV88F5182-datasheet.pdf
- Programmer's User Guide: https://web.archive.org/web/20210124231536/http://csclub.uwaterloo.ca/~board/ts7800/MV88F5182-opensource-manual.pdf
- User Manual: https://web.archive.org/web/20210124231631/http://csclub.uwaterloo.ca/~board/ts7800/MV88F5182-usermanual.pdf
+ - Functional Errata: https://web.archive.org/web/20210704165540/https://www.digriz.org.uk/ts78xx/88F5182_Functional_Errata.pdf
- 88F5281
- Datasheet: https://web.archive.org/web/20131028144728/http://www.ocmodshop.com/images/reviews/networking/qnap_ts409u/marvel_88f5281_data_sheet.pdf
@@ -212,6 +213,7 @@
arch/arm64/boot/dts/marvell/armada-37*
Armada 7K Flavors:
+ - 88F6040 (AP806 Quad 600 MHz + one CP110)
- 88F7020 (AP806 Dual + one CP110)
- 88F7040 (AP806 Quad + one CP110)
@@ -243,6 +245,23 @@
Device tree files:
arch/arm64/boot/dts/marvell/armada-80*
+ Octeon TX2 CN913x Flavors:
+ - CN9130 (AP807 Quad + one internal CP115)
+ - CN9131 (AP807 Quad + one internal CP115 + one external CP115 / 88F8215)
+ - CN9132 (AP807 Quad + one internal CP115 + two external CP115 / 88F8215)
+
+ Core:
+ ARM Cortex A72
+
+ Homepage:
+ https://web.archive.org/web/20200803150818/https://www.marvell.com/products/infrastructure-processors/multi-core-processors/octeon-tx2/octeon-tx2-cn9130.html
+
+ Product Brief:
+ https://web.archive.org/web/20200803150818/https://www.marvell.com/content/dam/marvell/en/public-collateral/embedded-processors/marvell-infrastructure-processors-octeon-tx2-cn913x-product-brief-2020-02.pdf
+
+ Device tree files:
+ arch/arm64/boot/dts/marvell/cn913*
+
Avanta family
-------------
diff --git a/Documentation/block/queue-sysfs.rst b/Documentation/block/queue-sysfs.rst
index 4dc7f0d..5fb4299 100644
--- a/Documentation/block/queue-sysfs.rst
+++ b/Documentation/block/queue-sysfs.rst
@@ -40,10 +40,11 @@
-------------------------
Devices that support discard functionality may have internal limits on
the number of bytes that can be trimmed or unmapped in a single operation.
-The discard_max_bytes parameter is set by the device driver to the maximum
-number of bytes that can be discarded in a single operation. Discard
-requests issued to the device must not exceed this limit. A discard_max_bytes
-value of 0 means that the device does not support discard functionality.
+The `discard_max_hw_bytes` parameter is set by the device driver to the
+maximum number of bytes that can be discarded in a single operation.
+Discard requests issued to the device must not exceed this limit.
+A `discard_max_hw_bytes` value of 0 means that the device does not support
+discard functionality.
discard_max_bytes (RW)
----------------------
diff --git a/Documentation/dev-tools/checkpatch.rst b/Documentation/dev-tools/checkpatch.rst
index f0956e9..5cbc846 100644
--- a/Documentation/dev-tools/checkpatch.rst
+++ b/Documentation/dev-tools/checkpatch.rst
@@ -710,6 +710,39 @@
See: https://www.kernel.org/doc/html/latest/process/coding-style.html#breaking-long-lines-and-strings
+ **SPLIT_STRING**
+ Quoted strings that appear as messages in userspace, and that can
+ therefore be grepped for, should not be split across multiple lines.
+
+ See: https://lore.kernel.org/lkml/20120203052727.GA15035@leaf/
+
+ **MULTILINE_DEREFERENCE**
+ A single dereferencing identifier spanning multiple lines like::
+
+ struct_identifier->member[index].
+ member = <foo>;
+
+ is generally hard to follow. It can easily lead to typos and so makes
+ the code vulnerable to bugs.
+
+ If fixing the multiple line dereferencing leads to an 80 column
+ violation, then either rewrite the code in a simpler way or, if the
+ starting part of the dereferencing identifier is the same and is used
+ in multiple places, store it in a temporary variable and use only that
+ temporary variable in all those places. For example, if there are
+ two dereferencing identifiers::
+
+ member1->member2->member3.foo1;
+ member1->member2->member3.foo2;
+
+ then store the member1->member2->member3 part in a temporary variable.
+ It not only helps to avoid the 80 column violation but also reduces
+ the program size by removing the unnecessary dereferences.
+
+ But if none of the above methods work then ignore the 80 column
+ violation because it is much easier to read a dereferencing identifier
+ on a single line.
+
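The temporary variable approach suggested for MULTILINE_DEREFERENCE could look like the following sketch. The struct layout and the function name ``reset_stats()`` are made up for illustration; only the ``member1->member2->member3`` chain is taken from the text above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical nested structures which mirror the chain used above */
struct stats {
	int	foo1;
	int	foo2;
};

struct inner {
	struct stats	member3;
};

struct outer {
	struct inner	*member2;
};

static void reset_stats(struct outer *member1)
{
	struct stats *st;

	assert(member1 != NULL && member1->member2 != NULL);

	/* Resolve the common prefix once and reuse it on short lines */
	st = &member1->member2->member3;

	st->foo1 = 0;
	st->foo2 = 0;
}
```

Each assignment now fits comfortably on a single line, so neither the dereference nor the 80 column limit has to be compromised.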
**TRAILING_STATEMENTS**
Trailing statements (for example after any conditional) should be
on the next line.
@@ -845,6 +878,38 @@
Use the `fallthrough;` pseudo keyword instead of
`/* fallthrough */` like comments.
+ **TRAILING_SEMICOLON**
+ Macro definitions should not end with a semicolon. The macro
+ invocation style should be consistent with function calls.
+ This can prevent any unexpected code paths::
+
+ #define MAC do_something;
+
+ If this macro is used within an if-else statement, like::
+
+ if (some_condition)
+ MAC;
+
+ else
+ do_something;
+
+ Then there would be a compilation error, because when the macro is
+ expanded there are two trailing semicolons, so the else branch gets
+ orphaned.
+
+ See: https://lore.kernel.org/lkml/1399671106.2912.21.camel@joe-AO725/
+
+ **SINGLE_STATEMENT_DO_WHILE_MACRO**
+ For the multi-statement macros, it is necessary to use the do-while
+ loop to avoid unpredictable code paths. The do-while loop helps to
+ group the multiple statements into a single one so that a
+ function-like macro can be used as a function only.
+
+ But for single statement macros, the do-while loop is unnecessary.
+ Although the code is syntactically correct, the do-while loop is
+ redundant. So remove the do-while loop from single statement
+ macros.
+
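The two macro rules above (TRAILING_SEMICOLON and SINGLE_STATEMENT_DO_WHILE_MACRO) can be illustrated together. The macro and function names below are hypothetical; the sketch only shows that a multi-statement macro wrapped in do-while (0), and a single statement macro without any wrapper, both behave like one function call inside an if-else statement.

```c
#include <assert.h>
#include <stddef.h>

/* Multi-statement macro: do-while (0) turns the body into one statement */
#define SWAP_INT(a, b)			\
	do {				\
		int tmp_ = (a);		\
		(a) = (b);		\
		(b) = tmp_;		\
	} while (0)

/* Single statement macro: no do-while wrapper and no trailing semicolon */
#define CLEAR_FLAG(flag)	((flag) = 0)

/* Order two integers and report whether they had to be swapped */
static int sort_pair(int *lo, int *hi)
{
	int swapped = 1;

	assert(lo != NULL && hi != NULL);

	/* Each macro expands to exactly one statement, so else still binds */
	if (*lo > *hi)
		SWAP_INT(*lo, *hi);
	else
		CLEAR_FLAG(swapped);

	return swapped;
}
```

If either macro definition ended with a semicolon, the expansion in the if branch would produce an extra empty statement and the else branch would no longer compile.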
**WEAK_DECLARATION**
Using weak declarations like __attribute__((weak)) or __weak
can have unintended link defects. Avoid using them.
@@ -920,6 +985,11 @@
Your compiler (or rather your loader) automatically does
it for you.
+ **MULTIPLE_ASSIGNMENTS**
+ Multiple assignments on a single line make the code unnecessarily
+ complicated. So assign a value to only one variable per line; this
+ makes the code more readable and helps avoid typos.
+
**RETURN_PARENTHESES**
return is not a function and as such doesn't need parentheses::
@@ -929,6 +999,13 @@
return bar;
+ **UNNECESSARY_INT**
+ int used after short, long and long long is unnecessary. So remove it.
+
+ **UNSPECIFIED_INT**
+ Kernel style prefers "unsigned int <foo>" over "unsigned <foo>" and
+ "signed int <foo>" over "signed <foo>".
+
Permissions
-----------
@@ -957,6 +1034,17 @@
Permission bits should use 4 digit octal permissions (like 0700 or 0444).
Avoid using any other base like decimal.
+ **SYMBOLIC_PERMS**
+ Permission bits in the octal form are more readable and easier to
+ understand than their symbolic counterparts because many command-line
+ tools use this notation. Experienced kernel developers have been using
+ these traditional Unix permission bits for decades and so they find it
+ easier to understand the octal notation than the symbolic macros.
+ For example, S_IWUSR|S_IRUGO is harder to read than 0644, and so
+ obscures the developer's intent rather than clarifying it.
+
+ See: https://lore.kernel.org/lkml/CA+55aFw5v23T-zvDZp-MmD_EYxF8WbafwwB59934FV7g21uMGQ@mail.gmail.com/
+
Spacing and Brackets
--------------------
@@ -1166,3 +1254,43 @@
**TYPO_SPELLING**
Some words may have been misspelled. Consider reviewing them.
+
+ **UNNECESSARY_ELSE**
+ Using an else statement just after a return or a break statement is
+ unnecessary. For example::
+
+ for (i = 0; i < 100; i++) {
+ int foo = bar();
+ if (foo < 1)
+ break;
+ else
+ usleep(1);
+ }
+
+ is generally better written as::
+
+ for (i = 0; i < 100; i++) {
+ int foo = bar();
+ if (foo < 1)
+ break;
+ usleep(1);
+ }
+
+ So remove the else statement. But consider an if-else statement in
+ which each branch is a single return statement, like::
+
+ if (foo)
+ return bar;
+ else
+ return baz;
+
+ then by removing the else statement::
+
+ if (foo)
+ return bar;
+ return baz;
+
+ there is no significant increase in readability, and one can argue
+ that the first form is more readable because of indentation. So for
+ such cases do not convert existing code from the first form to the
+ second form or vice-versa.
diff --git a/Documentation/process/index.rst b/Documentation/process/index.rst
index dd231ff..9f1b884 100644
--- a/Documentation/process/index.rst
+++ b/Documentation/process/index.rst
@@ -27,6 +27,7 @@
submitting-patches
programming-language
coding-style
+ maintainer-handbooks
maintainer-pgp-guide
email-clients
kernel-enforcement-statement
diff --git a/Documentation/process/maintainer-handbooks.rst b/Documentation/process/maintainer-handbooks.rst
new file mode 100644
index 0000000..6af1abb
--- /dev/null
+++ b/Documentation/process/maintainer-handbooks.rst
@@ -0,0 +1,18 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+.. _maintainer_handbooks_main:
+
+Subsystem and maintainer tree specific development process notes
+================================================================
+
+The purpose of this document is to provide subsystem specific information
+which is supplementary to the general development process handbook
+:ref:`Documentation/process <development_process_main>`.
+
+Contents:
+
+.. toctree::
+ :numbered:
+ :maxdepth: 2
+
+ maintainer-tip
diff --git a/Documentation/process/maintainer-tip.rst b/Documentation/process/maintainer-tip.rst
new file mode 100644
index 0000000..2b495c8
--- /dev/null
+++ b/Documentation/process/maintainer-tip.rst
@@ -0,0 +1,785 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+The tip tree handbook
+=====================
+
+What is the tip tree?
+---------------------
+
+The tip tree is a collection of several subsystems and areas of
+development. The tip tree is both a direct development tree and an
+aggregation tree for several sub-maintainer trees. The tip tree gitweb URL
+is: https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git
+
+The tip tree contains the following subsystems:
+
+ - **x86 architecture**
+
+ The x86 architecture development takes place in the tip tree except
+ for the x86 KVM and XEN specific parts which are maintained in the
+ corresponding subsystems and routed directly to mainline from
+ there. It's still good practice to Cc the x86 maintainers on
+ x86-specific KVM and XEN patches.
+
+ Some x86 subsystems have their own maintainers in addition to the
+ overall x86 maintainers. Please Cc the overall x86 maintainers on
+ patches touching files in arch/x86 even when they are not called out
+ by the MAINTAINERS file.
+
+ Note, that ``x86@kernel.org`` is not a mailing list. It is merely a
+ mail alias which distributes mails to the x86 top-level maintainer
+ team. Please always Cc the Linux Kernel mailing list (LKML)
+ ``linux-kernel@vger.kernel.org``, otherwise your mail ends up only in
+ the private inboxes of the maintainers.
+
+ - **Scheduler**
+
+ Scheduler development takes place in the -tip tree, in the
+ sched/core branch - with occasional sub-topic trees for
+ work-in-progress patch-sets.
+
+ - **Locking and atomics**
+
+ Locking development (including atomics and other synchronization
+ primitives that are connected to locking) takes place in the -tip
+ tree, in the locking/core branch - with occasional sub-topic trees
+ for work-in-progress patch-sets.
+
+ - **Generic interrupt subsystem and interrupt chip drivers**:
+
+ - interrupt core development happens in the irq/core branch
+
+ - interrupt chip driver development also happens in the irq/core
+ branch, but the patches are usually applied in a separate maintainer
+ tree and then aggregated into irq/core
+
+ - **Time, timers, timekeeping, NOHZ and related chip drivers**:
+
+ - timekeeping, clocksource core, NTP and alarmtimer development
+ happens in the timers/core branch, but patches are usually applied in
+ a separate maintainer tree and then aggregated into timers/core
+
+ - clocksource/event driver development happens in the timers/core
+ branch, but patches are mostly applied in a separate maintainer tree
+ and then aggregated into timers/core
+
+ - **Performance counters core, architecture support and tooling**:
+
+ - perf core and architecture support development happens in the
+ perf/core branch
+
+ - perf tooling development happens in the perf tools maintainer
+ tree and is aggregated into the tip tree.
+
+ - **CPU hotplug core**
+
+ - **RAS core**
+
+ Mostly x86-specific RAS patches are collected in the tip ras/core
+ branch.
+
+ - **EFI core**
+
+ EFI development takes place in the efi git tree. The collected patches are
+ aggregated in the tip efi/core branch.
+
+ - **RCU**
+
+ RCU development happens in the linux-rcu tree. The resulting changes
+ are aggregated into the tip core/rcu branch.
+
+ - **Various core code components**:
+
+ - debugobjects
+
+ - objtool
+
+ - random bits and pieces
+
+
+Patch submission notes
+----------------------
+
+Selecting the tree/branch
+^^^^^^^^^^^^^^^^^^^^^^^^^
+
+In general, development against the head of the tip tree master branch is
+fine, but for the subsystems which are maintained separately, have their
+own git tree and are only aggregated into the tip tree, development should
+take place against the relevant subsystem tree or branch.
+
+Bug fixes which target mainline should always be applicable against the
+mainline kernel tree. Potential conflicts against changes which are already
+queued in the tip tree are handled by the maintainers.
+
+Patch subject
+^^^^^^^^^^^^^
+
+The tip tree preferred format for patch subject prefixes is
+'subsys/component:', e.g. 'x86/apic:', 'x86/mm/fault:', 'sched/fair:',
+'genirq/core:'. Please do not use file names or complete file paths as
+prefix. 'git log path/to/file' should give you a reasonable hint in most
+cases.
+
+The condensed patch description in the subject line should start with an
+uppercase letter and should be written in imperative mood.
+
+
+Changelog
+^^^^^^^^^
+
+The general rules about changelogs in the process documentation, see
+:ref:`Documentation/process/ <submittingpatches>`, apply.
+
+The tip tree maintainers place value on following these rules, especially on
+the request to write changelogs in imperative mood and not impersonating
+code or the execution of it. This is not just a whim of the
+maintainers. Changelogs written in abstract words are more precise and
+tend to be less confusing than those written in the form of novels.
+
+It's also useful to structure the changelog into several paragraphs and not
+lump everything together into a single one. A good structure is to explain
+the context, the problem and the solution in separate paragraphs and this
+order.
+
+Examples for illustration:
+
+ Example 1::
+
+ x86/intel_rdt/mbm: Fix MBM overflow handler during hot cpu
+
+ When a CPU is dying, we cancel the worker and schedule a new worker on a
+ different CPU on the same domain. But if the timer is already about to
+ expire (say 0.99s) then we essentially double the interval.
+
+ We modify the hot cpu handling to cancel the delayed work on the dying
+ cpu and run the worker immediately on a different cpu in same domain. We
+ donot flush the worker because the MBM overflow worker reschedules the
+ worker on same CPU and scans the domain->cpu_mask to get the domain
+ pointer.
+
+ Improved version::
+
+ x86/intel_rdt/mbm: Fix MBM overflow handler during CPU hotplug
+
+ When a CPU is dying, the overflow worker is canceled and rescheduled on a
+ different CPU in the same domain. But if the timer is already about to
+ expire this essentially doubles the interval which might result in a non
+ detected overflow.
+
+ Cancel the overflow worker and reschedule it immediately on a different CPU
+ in the same domain. The work could be flushed as well, but that would
+ reschedule it on the same CPU.
+
+ Example 2::
+
+ time: POSIX CPU timers: Ensure that variable is initialized
+
+ If cpu_timer_sample_group returns -EINVAL, it will not have written into
+ *sample. Checking for cpu_timer_sample_group's return value precludes the
+ potential use of an uninitialized value of now in the following block.
+ Given an invalid clock_idx, the previous code could otherwise overwrite
+ *oldval in an undefined manner. This is now prevented. We also exploit
+ short-circuiting of && to sample the timer only if the result will
+ actually be used to update *oldval.
+
+ Improved version::
+
+ posix-cpu-timers: Make set_process_cpu_timer() more robust
+
+ Because the return value of cpu_timer_sample_group() is not checked,
+ compilers and static checkers can legitimately warn about a potential use
+ of the uninitialized variable 'now'. This is not a runtime issue as all
+ call sites hand in valid clock ids.
+
+ Also cpu_timer_sample_group() is invoked unconditionally even when the
+ result is not used because *oldval is NULL.
+
+ Make the invocation conditional and check the return value.
+
+ Example 3::
+
+ The entity can also be used for other purposes.
+
+ Let's rename it to be more generic.
+
+ Improved version::
+
+ The entity can also be used for other purposes.
+
+ Rename it to be more generic.
+
+
+For complex scenarios, especially race conditions and memory ordering
+issues, it is valuable to depict the scenario with a table which shows
+the parallelism and the temporal order of events. Here is an example::
+
+ CPU0 CPU1
+ free_irq(X) interrupt X
+ spin_lock(desc->lock)
+ wake irq thread()
+ spin_unlock(desc->lock)
+ spin_lock(desc->lock)
+ remove action()
+ shutdown_irq()
+ release_resources() thread_handler()
+ spin_unlock(desc->lock) access released resources.
+ ^^^^^^^^^^^^^^^^^^^^^^^^^
+ synchronize_irq()
+
+Lockdep provides similar useful output to depict a possible deadlock
+scenario::
+
+ CPU0 CPU1
+ rtmutex_lock(&rcu->rt_mutex)
+ spin_lock(&rcu->rt_mutex.wait_lock)
+ local_irq_disable()
+ spin_lock(&timer->it_lock)
+ spin_lock(&rcu->mutex.wait_lock)
+ --> Interrupt
+ spin_lock(&timer->it_lock)
+
+
+Function references in changelogs
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+When a function is mentioned in the changelog, either the text body or the
+subject line, please use the format 'function_name()'. Omitting the
+brackets after the function name can be ambiguous::
+
+ Subject: subsys/component: Make reservation_count static
+
+ reservation_count is only used in reservation_stats. Make it static.
+
+The variant with brackets is more precise::
+
+ Subject: subsys/component: Make reservation_count() static
+
+ reservation_count() is only called from reservation_stats(). Make it
+ static.
+
+
+Backtraces in changelogs
+^^^^^^^^^^^^^^^^^^^^^^^^
+
+See :ref:`backtraces`.
+
+Ordering of commit tags
+^^^^^^^^^^^^^^^^^^^^^^^
+
+To have a uniform view of the commit tags, the tip maintainers use the
+following tag ordering scheme:
+
+ - Fixes: 12char-SHA1 ("sub/sys: Original subject line")
+
+ A Fixes tag should be added even for changes which do not need to be
+ backported to stable kernels, i.e. when addressing a recently introduced
+ issue which only affects tip or the current head of mainline. These tags
+ are helpful to identify the original commit and are much more valuable
+ than prominently mentioning the commit which introduced a problem in the
+ text of the changelog itself because they can be automatically
+ extracted.
+
+ The following example illustrates the difference::
+
+ Commit
+
+ abcdef012345678 ("x86/xxx: Replace foo with bar")
+
+ left an unused instance of variable foo around. Remove it.
+
+ Signed-off-by: J.Dev <j.dev@mail>
+
+ Please say instead::
+
+ The recent replacement of foo with bar left an unused instance of
+ variable foo around. Remove it.
+
+ Fixes: abcdef012345678 ("x86/xxx: Replace foo with bar")
+ Signed-off-by: J.Dev <j.dev@mail>
+
+ The latter puts the information about the patch into the focus and
+ amends it with the reference to the commit which introduced the issue
+ rather than putting the focus on the original commit in the first place.
+
+ - Reported-by: ``Reporter <reporter@mail>``
+
+ - Originally-by: ``Original author <original-author@mail>``
+
+ - Suggested-by: ``Suggester <suggester@mail>``
+
+ - Co-developed-by: ``Co-author <co-author@mail>``
+
+ Signed-off-by: ``Co-author <co-author@mail>``
+
+ Note, that Co-developed-by and Signed-off-by of the co-author(s) must
+ come in pairs.
+
+ - Signed-off-by: ``Author <author@mail>``
+
+ The first Signed-off-by (SOB) after the last Co-developed-by/SOB pair is the
+ author SOB, i.e. the person flagged as author by git.
+
+ - Signed-off-by: ``Patch handler <handler@mail>``
+
+ SOBs after the author SOB are from people handling and transporting
+ the patch, but were not involved in development. SOB chains should
+ reflect the **real** route a patch took as it was propagated to us,
+ with the first SOB entry signalling primary authorship of a single
+ author. Acks should be given as Acked-by lines and review approvals
+ as Reviewed-by lines.
+
+ If the handler made modifications to the patch or the changelog, then
+ this should be mentioned **after** the changelog text and **above**
+ all commit tags in the following format::
+
+ ... changelog text ends.
+
+ [ handler: Replaced foo by bar and updated changelog ]
+
+ First-tag: .....
+
+ Note the two empty new lines which separate the changelog text and the
+ commit tags from that notice.
+
+ If a patch is sent to the mailing list by a handler then the author has
+ to be noted in the first line of the changelog with::
+
+ From: Author <author@mail>
+
+ Changelog text starts here....
+
+ so the authorship is preserved. The 'From:' line has to be followed
+ by an empty newline. If that 'From:' line is missing, then the patch
+ would be attributed to the person who sent (transported, handled) it.
+ The 'From:' line is automatically removed when the patch is applied
+ and does not show up in the final git changelog. It merely affects
+ the authorship information of the resulting Git commit.
+
+ - Tested-by: ``Tester <tester@mail>``
+
+ - Reviewed-by: ``Reviewer <reviewer@mail>``
+
+ - Acked-by: ``Acker <acker@mail>``
+
+ - Cc: ``cc-ed-person <person@mail>``
+
+ If the patch should be backported to stable, then please add a '``Cc:
+ stable@vger.kernel.org``' tag, but do not Cc stable when sending your
+ mail.
+
+ - Link: ``https://link/to/information``
+
+ For referring to an email on LKML or other kernel mailing lists,
+ please use the lkml.kernel.org redirector URL::
+
+ https://lkml.kernel.org/r/email-message@id
+
+ The kernel.org redirector is considered a stable URL, unlike other email
+ archives.
+
+ Maintainers will add a Link tag referencing the email of the patch
+ submission when they apply a patch to the tip tree. This tag is useful
+ for later reference and is also used for commit notifications.
+
+Please do not use combined tags, e.g. ``Reported-and-tested-by``, as
+they just complicate automated extraction of tags.
+
+
+Links to documentation
+^^^^^^^^^^^^^^^^^^^^^^
+
+Providing links to documentation in the changelog is a great help to later
+debugging and analysis. Unfortunately, URLs often break very quickly
+because companies restructure their websites frequently. Non-'volatile'
+exceptions include the Intel SDM and the AMD APM.
+
+Therefore, for 'volatile' documents, please create an entry in the kernel
+bugzilla https://bugzilla.kernel.org and attach a copy of these documents
+to the bugzilla entry. Finally, provide the URL of the bugzilla entry in
+the changelog.
+
+Patch resend or reminders
+^^^^^^^^^^^^^^^^^^^^^^^^^
+
+See :ref:`resend_reminders`.
+
+Merge window
+^^^^^^^^^^^^
+
+Please do not expect large patch series to be handled during the merge
+window or even during the week before. Such patches should be submitted in
+mergeable state *at* *least* a week before the merge window opens.
+Exceptions are made for bug fixes and *sometimes* for small standalone
+drivers for new hardware or minimally invasive patches for hardware
+enablement.
+
+During the merge window, the maintainers instead focus on following the
+upstream changes, fixing merge window fallout, collecting bug fixes, and
+allowing themselves a breath. Please respect that.
+
+The release candidate -rc1 is the starting point for new patches to be
+applied which are targeted for the next merge window.
+
+
+Git
+^^^
+
+The tip maintainers accept git pull requests from maintainers who provide
+subsystem changes for aggregation in the tip tree.
+
+Pull requests for new patch submissions are usually not accepted and do not
+replace proper patch submission to the mailing list. The main reason for
+this is that the review workflow is email based.
+
+If you submit a larger patch series it is helpful to provide a git branch
+in a private repository which allows interested people to easily pull the
+series for testing. The usual way to offer this is a git URL in the cover
+letter of the patch series.
+
+
+Coding style notes
+------------------
+
+Comment style
+^^^^^^^^^^^^^
+
+Sentences in comments start with an uppercase letter.
+
+Single line comments::
+
+ /* This is a single line comment */
+
+Multi-line comments::
+
+ /*
+ * This is a properly formatted
+ * multi-line comment.
+ *
+ * Larger multi-line comments should be split into paragraphs.
+ */
+
+No tail comments:
+
+ Please refrain from using tail comments. Tail comments disturb the
+ reading flow in almost all contexts, but especially in code::
+
+ if (somecondition_is_true) /* Don't put a comment here */
+ dostuff(); /* Neither here */
+
+ seed = MAGIC_CONSTANT; /* Nor here */
+
+ Use freestanding comments instead::
+
+ /* This condition is not obvious without a comment */
+ if (somecondition_is_true) {
+ /* This really needs to be documented */
+ dostuff();
+ }
+
+ /* This magic initialization needs a comment. Maybe not? */
+ seed = MAGIC_CONSTANT;
+
+Comment the important things:
+
+ Comments should be added where the operation is not obvious. Documenting
+ the obvious is just a distraction::
+
+ /* Decrement refcount and check for zero */
+ if (refcount_dec_and_test(&p->refcnt)) {
+ do;
+ lots;
+ of;
+ magic;
+ things;
+ }
+
+ Instead, comments should explain the non-obvious details and document
+ constraints::
+
+ if (refcount_dec_and_test(&p->refcnt)) {
+ /*
+ * Really good explanation why the magic things below
+ * need to be done, ordering and locking constraints,
+ * etc..
+ */
+ do;
+ lots;
+ of;
+ magic;
+ /* Needs to be the last operation because ... */
+ things;
+ }
+
+Function documentation comments:
+
+ To document functions and their arguments please use kernel-doc format
+ and not free form comments::
+
+ /**
+ * magic_function - Do lots of magic stuff
+ * @magic: Pointer to the magic data to operate on
+ * @offset: Offset in the data array of @magic
+ *
+ * Deep explanation of mysterious things done with @magic along
+ * with documentation of the return values.
+ *
+ * Note, that the argument descriptors above are arranged
+ * in a tabular fashion.
+ */
+
+ This applies especially to globally visible functions and inline
+ functions in public header files. It might be overkill to use kernel-doc
+ format for every (static) function which needs a tiny explanation. The
+ usage of descriptive function names often replaces these tiny comments.
+ Apply common sense as always.
+
+
+Documenting locking requirements
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+ Documenting locking requirements is a good thing, but comments are not
+ necessarily the best choice. Instead of writing::
+
+ /* Caller must hold foo->lock */
+ void func(struct foo *foo)
+ {
+ ...
+ }
+
+ Please use::
+
+ void func(struct foo *foo)
+ {
+ lockdep_assert_held(&foo->lock);
+ ...
+ }
+
+ In PROVE_LOCKING kernels, lockdep_assert_held() emits a warning
+ if the caller doesn't hold the lock. Comments can't do that.
+
+Bracket rules
+^^^^^^^^^^^^^
+
+Brackets should be omitted only if the statement which follows 'if', 'for',
+'while' etc. is truly a single line::
+
+ if (foo)
+ do_something();
+
+The following is not considered to be a single line statement even
+though C does not require brackets::
+
+ for (i = 0; i < end; i++)
+ if (foo[i])
+ do_something(foo[i]);
+
+Adding brackets around the outer loop enhances the reading flow::
+
+ for (i = 0; i < end; i++) {
+ if (foo[i])
+ do_something(foo[i]);
+ }
+
+
+Variable declarations
+^^^^^^^^^^^^^^^^^^^^^
+
+The preferred ordering of variable declarations at the beginning of a
+function is reverse fir tree order::
+
+ struct long_struct_name *descriptive_name;
+ unsigned long foo, bar;
+ unsigned int tmp;
+ int ret;
+
+The above is faster to parse than the reverse ordering::
+
+ int ret;
+ unsigned int tmp;
+ unsigned long foo, bar;
+ struct long_struct_name *descriptive_name;
+
+And even more so than random ordering::
+
+ unsigned long foo, bar;
+ int ret;
+ struct long_struct_name *descriptive_name;
+ unsigned int tmp;
+
+Also please try to aggregate variables of the same type into a single
+line. There is no point in wasting screen space::
+
+ unsigned long a;
+ unsigned long b;
+ unsigned long c;
+ unsigned long d;
+
+It's really sufficient to do::
+
+ unsigned long a, b, c, d;
+
+Please also refrain from introducing line splits in variable declarations::
+
+ struct long_struct_name *descriptive_name = container_of(bar,
+ struct long_struct_name,
+ member);
+ struct foobar foo;
+
+It's way better to move the initialization to a separate line after the
+declarations::
+
+ struct long_struct_name *descriptive_name;
+ struct foobar foo;
+
+ descriptive_name = container_of(bar, struct long_struct_name, member);
+
+
+Variable types
+^^^^^^^^^^^^^^
+
+Please use the proper u8, u16, u32, u64 types for variables which are meant
+to describe hardware or are used as arguments for functions which access
+hardware. These types clearly define the bit width and avoid
+truncation, expansion and 32/64-bit confusion.
+
+u64 is also recommended in code which would become ambiguous for 32-bit
+kernels when 'unsigned long' would be used instead. While in such
+situations 'unsigned long long' could be used as well, u64 is shorter
+and also clearly shows that the operation is required to be 64 bits wide
+independent of the target CPU.
+
+Please use 'unsigned int' instead of 'unsigned'.
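A rough userspace sketch of the width argument, under stated assumptions: the u32/u64 typedefs below merely stand in for the kernel's types, and reg_field_mask() and reg_upper_half() are hypothetical helpers made up for this illustration, not kernel functions:

```c
#include <stdint.h>

/* Userspace stand-ins for the kernel's fixed-width types (assumption). */
typedef uint32_t u32;
typedef uint64_t u64;

/*
 * Hypothetical helper: return a mask with only @bit set (bit < 64).
 * The cast keeps the shift 64 bits wide on 32-bit and 64-bit targets
 * alike; '1UL << bit' would be undefined for bit >= 32 wherever
 * 'unsigned long' is only 32 bits wide.
 */
static u64 reg_field_mask(unsigned int bit)
{
	return (u64)1 << bit;
}

/* Hypothetical helper: extract the upper 32 bits of a 64-bit value. */
static u32 reg_upper_half(u64 val)
{
	return (u32)(val >> 32);
}
```

The point is only that the result no longer depends on the width of 'unsigned long' on the target CPU.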
+
+
+Constants
+^^^^^^^^^
+
+Please do not use literal (hexa)decimal numbers in code or initializers.
+Either use proper defines which have descriptive names or consider using
+an enum.
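A minimal sketch of what this can look like. All names below (FOO_CTRL_*, enum foo_state, foo_ctrl_start(), foo_is_usable()) are invented for the illustration and do not refer to an existing driver:

```c
/* Hypothetical control register bits, for illustration only. */
#define FOO_CTRL_ENABLE		0x01
#define FOO_CTRL_IRQ_MASK	0x10

/* Hypothetical device states; an enum instead of magic 0/1/2. */
enum foo_state {
	FOO_STATE_IDLE,
	FOO_STATE_BUSY,
	FOO_STATE_ERROR,
};

/* Reads better than 'return ctrl | 0x01 | 0x10;'. */
static unsigned int foo_ctrl_start(unsigned int ctrl)
{
	return ctrl | FOO_CTRL_ENABLE | FOO_CTRL_IRQ_MASK;
}

/* Reads better than 'return state != 2;'. */
static int foo_is_usable(enum foo_state state)
{
	return state != FOO_STATE_ERROR;
}
```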
+
+
+Struct declarations and initializers
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Struct declarations should align the struct member names in a tabular
+fashion::
+
+ struct bar_order {
+ unsigned int guest_id;
+ int ordered_item;
+ struct menu *menu;
+ };
+
+Please avoid documenting struct members within the declaration, because
+this often results in strangely formatted comments and the struct members
+become obfuscated::
+
+ struct bar_order {
+ unsigned int guest_id; /* Unique guest id */
+ int ordered_item;
+ /* Pointer to a menu instance which contains all the drinks */
+ struct menu *menu;
+ };
+
+Instead, please consider using the kernel-doc format in a comment preceding
+the struct declaration, which is easier to read and has the added advantage
+of including the information in the kernel documentation, for example, as
+follows::
+
+
+ /**
+ * struct bar_order - Description of a bar order
+ * @guest_id: Unique guest id
+ * @ordered_item: The item number from the menu
+ * @menu: Pointer to the menu from which the item
+ * was ordered
+ *
+ * Supplementary information for using the struct.
+ *
+ * Note, that the struct member descriptors above are arranged
+ * in a tabular fashion.
+ */
+ struct bar_order {
+ unsigned int guest_id;
+ int ordered_item;
+ struct menu *menu;
+ };
+
+Static struct initializers must use C99 initializers and should also be
+aligned in a tabular fashion::
+
+ static struct foo statfoo = {
+ .a = 0,
+ .plain_integer = CONSTANT_DEFINE_OR_ENUM,
+ .bar = &statbar,
+ };
+
+Note that while C99 syntax allows the omission of the final comma,
+we recommend the use of a comma on the last line because it makes
+reordering and addition of new lines easier, and makes such future
+patches slightly easier to read as well.
+
+Line breaks
+^^^^^^^^^^^
+
+Restricting line length to 80 characters makes deeply indented code hard to
+read. Consider breaking out code into helper functions to avoid excessive
+line breaking.
+
+The 80 character rule is not a strict rule, so please use common sense when
+breaking lines. Especially format strings should never be broken up.
+
+When splitting function declarations or function calls, please align
+the first argument in the second line with the first argument in the first
+line::
+
+ static int long_function_name(struct foobar *barfoo, unsigned int id,
+ unsigned int offset)
+ {
+
+ if (!id) {
+ ret = longer_function_name(barfoo, DEFAULT_BARFOO_ID,
+ offset);
+ ...
+
+Namespaces
+^^^^^^^^^^
+
+Function/variable namespaces improve readability and allow easy
+grepping. These namespaces are string prefixes for globally visible
+function and variable names, including inlines. These prefixes should
+combine the subsystem and the component name such as 'x86_comp\_',
+'sched\_', 'irq\_', and 'mutex\_'.
+
+This also includes static file scope functions that are immediately put
+into globally visible driver templates - it's useful for those symbols
+to carry a good prefix as well, for backtrace readability.
+
+Namespace prefixes may be omitted for local static functions and
+variables. Truly local functions, only called by other local functions,
+can have shorter descriptive names - our primary concern is greppability
+and backtrace readability.
+
+Please note that 'xxx_vendor\_' and 'vendor_xxx\_' prefixes are not
+helpful for static functions in vendor-specific files. After all, it
+is already clear that the code is vendor-specific. In addition, vendor
+names should only be for truly vendor-specific functionality.
+
+As always apply common sense and aim for consistency and readability.
+
+
+Commit notifications
+--------------------
+
+The tip tree is monitored by a bot for new commits. The bot sends an email
+for each new commit to a dedicated mailing list
+(``linux-tip-commits@vger.kernel.org``) and Cc's all people who are
+mentioned in one of the commit tags. It uses the email message ID from the
+Link tag at the end of the tag list to set the In-Reply-To email header so
+the message is properly threaded with the patch submission email.
+
+The tip maintainers and submaintainers try to reply to the submitter
+when merging a patch, but they sometimes forget or it does not fit the
+workflow of the moment. While the bot message is purely mechanical, it
+also implies a 'Thank you! Applied.'.
diff --git a/Documentation/process/submitting-patches.rst b/Documentation/process/submitting-patches.rst
index 8ad6b93..21125d2 100644
--- a/Documentation/process/submitting-patches.rst
+++ b/Documentation/process/submitting-patches.rst
@@ -21,6 +21,10 @@
use it, it will make your life as a kernel developer and in general much
easier.
+Some subsystems and maintainer trees have additional information about
+their workflow and expectations, see :ref:`Documentation/process/maintainer
+handbooks <maintainer_handbooks_main>`.
+
Obtain a current source tree
----------------------------
@@ -326,6 +330,7 @@
See Documentation/process/email-clients.rst for recommendations on email
clients and mailing list etiquette.
+.. _resend_reminders:
Don't get discouraged - or impatient
------------------------------------
@@ -711,6 +716,8 @@
See more details on the proper patch format in the following
references.
+.. _backtraces:
+
Backtraces in commit messages
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
diff --git a/Documentation/timers/no_hz.rst b/Documentation/timers/no_hz.rst
index 6cadad7..20ad23a 100644
--- a/Documentation/timers/no_hz.rst
+++ b/Documentation/timers/no_hz.rst
@@ -70,6 +70,10 @@
is to force a busy CPU to shift its attention among multiple duties,
and an idle CPU has no duties to shift its attention among.
+An idle CPU that is not receiving scheduling-clock interrupts is said to
+be "dyntick-idle", "in dyntick-idle mode", "in nohz mode", or "running
+tickless". The remainder of this document will use "dyntick-idle mode".
+
The CONFIG_NO_HZ_IDLE=y Kconfig option causes the kernel to avoid sending
scheduling-clock interrupts to idle CPUs, which is critically important
both to battery-powered devices and to highly virtualized mainframes.
@@ -91,10 +95,6 @@
run CONFIG_HZ_PERIODIC=y kernels (or CONFIG_NO_HZ=n for older kernels)
in order to avoid degrading from-idle transition latencies.
-An idle CPU that is not receiving scheduling-clock interrupts is said to
-be "dyntick-idle", "in dyntick-idle mode", "in nohz mode", or "running
-tickless". The remainder of this document will use "dyntick-idle mode".
-
There is also a boot parameter "nohz=" that can be used to disable
dyntick-idle mode in CONFIG_NO_HZ_IDLE=y kernels by specifying "nohz=off".
By default, CONFIG_NO_HZ_IDLE=y kernels boot with "nohz=on", enabling
diff --git a/Documentation/translations/ko_KR/memory-barriers.txt b/Documentation/translations/ko_KR/memory-barriers.txt
index 64d932f..75aa553 100644
--- a/Documentation/translations/ko_KR/memory-barriers.txt
+++ b/Documentation/translations/ko_KR/memory-barriers.txt
@@ -1,6 +1,6 @@
NOTE:
This is a version of Documentation/memory-barriers.txt translated into Korean.
-This document is maintained by SeongJae Park <sj38.park@gmail.com>.
+This document is maintained by SeongJae Park <sj@kernel.org>.
If you find any difference between this document and the original file or
a problem with the translation, please contact the maintainer of this file.
@@ -10,13 +10,13 @@
update the original English file first. The English version is
definitive, and readers should look there if they have any doubt.
-===================================
+=================================
이 문서는
Documentation/memory-barriers.txt
의 한글 번역입니다.
-역자: 박성재 <sj38.park@gmail.com>
-===================================
+역자: 박성재 <sj@kernel.org>
+=================================
=========================
diff --git a/Documentation/translations/zh_CN/admin-guide/index.rst b/Documentation/translations/zh_CN/admin-guide/index.rst
index 460034c..83db842 100644
--- a/Documentation/translations/zh_CN/admin-guide/index.rst
+++ b/Documentation/translations/zh_CN/admin-guide/index.rst
@@ -67,6 +67,7 @@
cpu-load
lockup-watchdogs
unicode
+ sysrq
Todolist:
@@ -118,7 +119,6 @@
rtc
serial-console
svga
- sysrq
thunderbolt
ufs
vga-softcursor
diff --git a/Documentation/translations/zh_CN/admin-guide/sysrq.rst b/Documentation/translations/zh_CN/admin-guide/sysrq.rst
new file mode 100644
index 0000000..8276d70
--- /dev/null
+++ b/Documentation/translations/zh_CN/admin-guide/sysrq.rst
@@ -0,0 +1,280 @@
+.. include:: ../disclaimer-zh_CN.rst
+
+:Original: Documentation/admin-guide/sysrq.rst
+
+:翻译:
+
+ 黄军华 Junhua Huang <huang.junhua@zte.com.cn>
+
+:校译:
+
+ 司延腾 Yanteng Si <siyanteng@loongson.cn>
+
+.. _cn_admin-guide_sysrq:
+
+Linux 魔法系统请求键骇客
+========================
+
+针对 sysrq.c 的文档说明
+
+什么是魔法 SysRq 键?
+~~~~~~~~~~~~~~~~~~~~~
+
+它是一个你可以输入的具有魔法般的组合键。
+无论内核在做什么,内核都会响应 SysRq 键的输入,除非内核完全卡死。
+
+如何使能魔法 SysRq 键?
+~~~~~~~~~~~~~~~~~~~~~~~
+
+在配置内核时,我们需要设置 'Magic SysRq key (CONFIG_MAGIC_SYSRQ)' 为 'Y'。
+当运行一个编译进 sysrq 功能的内核时,/proc/sys/kernel/sysrq 控制着被
+SysRq 键调用的功能许可。这个文件的默认值由 CONFIG_MAGIC_SYSRQ_DEFAULT_ENABLE
+配置符号设定,该配置符号本身默认为 1。以下是 /proc/sys/kernel/sysrq 中可能的
+值列表:
+
+ - 0 - 完全不使能 SysRq 键
+ - 1 - 使能 SysRq 键的全部功能
+ - >1 - 对于允许的 SysRq 键功能的比特掩码(参见下面更详细的功能描述)::
+
+ 2 = 0x2 - 使能对控制台日志记录级别的控制
+ 4 = 0x4 - 使能对键盘的控制 (SAK, unraw)
+ 8 = 0x8 - 使能对进程的调试导出等
+ 16 = 0x10 - 使能同步命令
+ 32 = 0x20 - 使能重新挂载只读
+ 64 = 0x40 - 使能对进程的信号操作 (term, kill, oom-kill)
+ 128 = 0x80 - 允许重启、断电
+ 256 = 0x100 - 允许让所有实时任务变普通任务
+
+你可以通过如下命令把值设置到这个文件中::
+
+ echo "number" >/proc/sys/kernel/sysrq
+
+这里被写入的 number 可以是 10 进制数,或者是带着 0x 前缀的 16 进制数。
+CONFIG_MAGIC_SYSRQ_DEFAULT_ENABLE 必须是以 16 进制数写入。
+
+注意,``/proc/sys/kernel/sysrq`` 的值只影响通过键盘触发 SysRq 的调用,对于
+通过 ``/proc/sysrq-trigger`` 的任何操作调用都是允许的
+(通过具有系统权限的用户)。
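下面是一个用户空间的简化示意,用来说明上面列出的取值如何解读:0 表示全部禁用,1 表示全部允许,大于 1 时按比特掩码判断。其中的宏名和函数 ex_sysrq_allows() 都是为说明而假设的,并非内核的实际接口;内核中的真实判断逻辑请以 drivers/tty/sysrq.c 为准:

```c
/* 比特值取自上面的列表;宏名仅为示意(假设),并非内核头文件中的名字。 */
#define EX_SYSRQ_LOG		0x002	/* 控制台日志级别 */
#define EX_SYSRQ_KEYBOARD	0x004	/* 键盘控制 (SAK, unraw) */
#define EX_SYSRQ_DUMP		0x008	/* 进程的调试导出 */
#define EX_SYSRQ_SYNC		0x010	/* 同步命令 */
#define EX_SYSRQ_REMOUNT	0x020	/* 重新挂载只读 */
#define EX_SYSRQ_SIGNAL		0x040	/* 进程信号操作 */
#define EX_SYSRQ_BOOT		0x080	/* 重启、断电 */
#define EX_SYSRQ_RTNICE		0x100	/* 实时任务变普通任务 */

/* 假设的辅助函数:给定 sysrq 的取值,判断 op_bit 对应的功能是否被允许。 */
static int ex_sysrq_allows(unsigned int sysrq, unsigned int op_bit)
{
	if (sysrq == 0)		/* 0:完全不使能 */
		return 0;
	if (sysrq == 1)		/* 1:使能全部功能 */
		return 1;
	return (sysrq & op_bit) != 0;	/* >1:按比特掩码判断 */
}
```

例如,写入 176(= 128 + 32 + 16)时,按这个示意只允许同步、重新挂载只读以及重启/断电。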
+
+如何使用魔法 SysRq 键?
+~~~~~~~~~~~~~~~~~~~~~~~
+
+在 x86 架构上
+ 你可以按下键盘组合键 :kbd:`ALT-SysRq-<command key>`。
+
+ .. note::
+ 一些键盘可能没有标识 'SysRq' 键。'SysRq' 键也被当做 'Print Screen'键。
+ 同时有些键盘无法处理同时按下这么多键,因此你可以先按下键盘 :kbd:`Alt` 键,
+ 然后按下键盘 :kbd:`SysRq` 键,再释放键盘 :kbd:`SysRq` 键,之后按下键盘上命令键
+ :kbd:`<command key>`,最后释放所有键。
+
+在 SPARC 架构上
+ 你可以按下键盘组合键 :kbd:`ALT-STOP-<command key>` 。
+
+在串行控制台(只针对 PC 类型的标准串口)
+ 你可以发一个 ``BREAK`` ,然后在 5 秒内发送一个命令键,
+ 发送 ``BREAK`` 两次将被翻译为一个正常的 BREAK 操作。
+
+在 PowerPC 架构上
+ 按下键盘组合键 :kbd:`ALT - Print Screen` (或者 :kbd:`F13`) - :kbd:`<命令键>` 。
+ :kbd:`Print Screen` (或者 :kbd:`F13`) - :kbd:`<命令键>` 或许也能实现。
+
+在其他架构上
+ 如果你知道其他架构的组合键,请告诉我,我可以把它们添加到这部分。
+
+在所有架构上
+ 写一个字符到 /proc/sysrq-trigger 文件,例如::
+
+ echo t > /proc/sysrq-trigger
+
+这个命令键 :kbd:`<command key>` 是区分大小写的。
+
+什么是命令键?
+~~~~~~~~~~~~~~
+
+=========== ================================================================
+命令键 功能
+=========== ================================================================
+``b`` 将立即重启系统,不会同步或者卸载磁盘。
+
+``c`` 将执行系统 crash,如果配置了系统 crashdump,将执行 crashdump。
+
+``d`` 显示所有持有的锁。
+
+``e`` 发送 SIGTERM 信号给所有进程,除了 init 进程。
+
+``f`` 将调用 oom killer 杀掉一个过度占用内存的进程,如果什么任务都没杀,
+ 也不会 panic。
+
+``g`` kgdb 使用(内核调试器)。
+
+``h`` 将会显示帮助。(实际上除了这里列举的键,其他的都将显示帮助,
+ 但是 ``h`` 容易记住):-)
+
+``i`` 发送 SIGKILL 给所有进程,除了 init 进程。
+
+``j`` 强制性的 “解冻它” - 用于被 FIFREEZE ioctl 操作冻住的文件系统。
+
+``k`` 安全访问秘钥(SAK)杀掉在当前虚拟控制台的所有程序,注意:参考
+ 下面 SAK 节重要论述。
+
+``l`` 显示所有活动 cpu 的栈回溯。
+
+``m`` 将导出当前内存信息到你的控制台。
+
+``n`` 用于使所有实时任务变成普通任务。
+
+``o`` 将关闭系统(如果配置和支持的话)。
+
+``p`` 将导出当前寄存器和标志位到控制台。
+
+``q`` 将导出每个 cpu 上所有已装备的高精度定时器(但不包括普通的
+ timer_list 定时器)和所有时钟事件设备的详细信息。
+
+``r`` 关闭键盘的原始模式,设置为转换模式。
+
+``s`` 将尝试同步所有的已挂载文件系统。
+
+``t`` 将导出当前所有任务列表和它们的信息到控制台。
+
+``u`` 将尝试重新挂载已挂载文件系统为只读。
+
+``v`` 强制恢复帧缓存控制台。
+``v`` 触发 ETM 缓存导出 [ARM 架构特有]
+
+``w`` 导出处于不可中断状态(阻塞)的任务。
+
+``x`` 在 ppc/powerpc 架构上用于 xmon 接口。
+ 在 sparc64 架构上用于显示全局的 PMU(性能监控单元)寄存器。
+ 在 MIPS 架构上导出所有的 tlb 条目。
+
+``y`` 显示全局 cpu 寄存器 [SPARC-64 架构特有]
+
+``z`` 导出 ftrace 缓存信息
+
+``0``-``9`` 设置控制台日志级别,该级别控制什么样的内核信息将被打印到你的
+ 控制台。(比如 ``0`` ,将使得只有紧急信息,像 PANICs or OOPSes
+ 才能到你的控制台。)
+=========== ================================================================
+
+好了,我能用他们做什么呢?
+~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+嗯,当你的 X 服务端或者 svgalib 程序崩溃,unraw(r) 非原始模式命令键是非常
+方便的。
+
+sak(k)(安全访问秘钥)在你尝试登陆的同时,又想确保当前控制台没有可以获取你的
+密码的特洛伊木马程序运行时是有用的。它会杀掉给定控制台的所有程序,这样你
+就可以确认当前的登陆提示程序是实际来自 init 进程的程序,而不是某些特洛伊
+木马程序。
+
+.. important::
+
+ 就其实际形式而言,它并不是像兼容 C2 安全标准的系统中那样的真正 SAK,
+ 也不应被误认为是。
+
+似乎其他人发现它可以当作系统请求注意键(System Attention Key)来用:当你想退出
+一个不让你切换控制台的程序时,它很有用。(比如,X 服务端或者 svgalib 程序)
+
+``reboot(b)`` 是个好方法,当你不能关闭机器时,它等同于按下"复位"按钮。
+
+``crash(c)`` 可以用于手动触发一个 crashdump,当系统卡住时。
+注意当 crashdump 机制不可用时,这个只是触发一个内核 crash。
+
+``sync(s)`` 在拔掉可移动介质之前,或者在使用不提供优雅关机的
+救援 shell 之后很方便 -- 它将确保你的数据被安全地写入磁盘。注意,在你看到
+屏幕上出现 "OK" 和 "Done" 之前,同步还没有发生。
+
+``umount(u)`` 可以用来标记文件系统正常卸载,从正在运行的系统角度来看,它们将
+被重新挂载为只读。这个重新挂载动作直到你看到 "OK" 和 "Done" 信息出现在屏幕上
+才算完成。
+
+日志级别 ``0`` - ``9`` 用于当你的控制台被大量的内核信息冲击,你不想看见的时候。
+选择 ``0`` 将禁止除了最紧急的内核信息外的所有的内核信息输出到控制台。(但是如果
+syslogd/klogd 进程是运行的,它们仍将被记录。)
+
+``term(e)`` 和 ``kill(i)`` 用于当你有些有点失控的进程,你无法通过其他方式杀掉
+它们的时候,特别是它正在创建其他进程。
+
+``just thaw it(j)`` 用于当你的系统由于一个 FIFREEZE ioctl 调用而产生的文件
+系统冻结,而导致的不响应时。
+
+有的时候 SysRq 键在使用它之后,看起来像是“卡住”了,我能做些什么?
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+这也会发生在我这,我发现轻敲键盘两侧的 shift、alt 和 control 键,然后再次敲击
+一个无效的 SysRq 键序列可以解决问题。(比如,像键盘组合键 :kbd:`alt-sysrq-z` )
+切换到另一个虚拟控制台(键盘操作 :kbd:`ALT+Fn` ),然后再切回来应该也有帮助。
+
+我敲击了 SysRq 键,但像是什么都没发生,发生了什么错误?
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+有一些键盘对于 SysRq 键设置了不同的键值,而不是提前定义的 99
+(查看在 ``include/uapi/linux/input-event-codes.h`` 文件中 ``KEY_SYSRQ`` 的定义)
+或者就根本没有 SysRq 键。在这些场景下,执行 ``showkey -s`` 命令来找到一个合适
+的扫描码序列,然后使用 ``setkeycodes <sequence> 99`` 命令映射这个序列值到通用
+的 SysRq 键编码上(比如 ``setkeycodes e05b 99`` )。最好将这个命令放在启动脚本
+中。
+哦,顺便说一句,你十秒钟不输入任何东西就将退出 “showkey”。
+
+我想添加一个 SysRq 键事件到一个模块中,如何去做呢?
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+为了注册一个基础函数到这个表中,首先你必须包含 ``include/linux/sysrq.h`` 头
+文件,这个头文件定义了你所需要的所有东西。然后你必须创建一个 ``sysrq_key_op``
+结构体,然后初始化它,使用如下内容,A) 你将使用的这个键的处理函数, B) 一个
+help_msg 字符串,在 SysRq 键打印帮助信息时将打印出来,C) 一个 action_msg 字
+符串,就在你的处理函数调用前打印出来。你的处理函数必须符合在 'sysrq.h' 文件中
+的函数原型。
+
+在 ``sysrq_key_op`` 结构体被创建后,你可以调用内核函数
+``register_sysrq_key(int key, const struct sysrq_key_op *op_p);``,
+该函数在表中的 'key' 对应位置内容是空的情况下,将通过 ``op_p`` 指针注册这个操作
+函数到表中 'key' 对应位置上。在模块卸载的时候,你必须调用
+``unregister_sysrq_key(int key, const struct sysrq_key_op *op_p)`` 函数,该函数
+只有在当前该键对应的处理函数被注册到了 'key' 对应位置时,才会移除 'op_p' 指针
+对应的键值操作函数。这是为了防止在你注册之后,该位置被改写的情况。
+
+魔法 SysRq 键系统的工作原理是将键对应操作函数注册到键的操作查找表,
+该表定义在 'drivers/tty/sysrq.c' 文件中。
+该键表有许多在编译时候就注册进去的操作函数,但是是可变的。
+并且有两个函数作为操作该表的接口被导出::
+
+ register_sysrq_key 和 unregister_sysrq_key.
+
+当然,永远不要在表中留下无效指针,即,当调用过 register_sysrq_key() 函数的模块
+退出时,它一定要调用 unregister_sysrq_key() 来清除它使用过的 SysRq 键表条目。
+表中的空指针是安全的。:)
+
+如果对于某种原因,在 handle_sysrq 调用的处理函数中,你认为有必要调用
+handle_sysrq 函数时,你必须意识到当前你处于一个锁中(你同时也处于一个中断处理
+函数中,这意味着不能睡眠)。所以这时你必须使用 ``__handle_sysrq_nolock`` 替代。
+
+当我敲击一个 SysRq 组合键时,只有标题打印出现在控制台?
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+SysRq 键的输出和所有其他控制台输出一样,受制于控制台日志级别控制。
+这意味着,如果内核以发行版内核中常见的 "quiet" 方式启动,则输出可能不会出现在实际
+的控制台上,即使它会出现在 dmesg 缓存中,也可以通过 dmesg 命令和 ``/proc/kmsg``
+文件的消费访问到。作为一个特例,来自 sysrq 命令的标题行将被传递给所有控制台
+使用者,就好像当前日志级别是最大的一样。如果只发出标题头,则几乎可以肯定内核日志
+级别太低。如果你需要控制台上的输出,那么你将需要临时提高控制台日志级别,通过使用
+键盘组合键 :kbd:`alt-sysrq-8` 或者::
+
+ echo 8 > /proc/sysrq-trigger
+
+在触发了你感兴趣的 SysRq 键命令后,记得恢复日志级别到正常情况。
+
+我有很多问题时,可以请教谁?
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+请教在内核邮件列表上的人,邮箱:
+ linux-kernel@vger.kernel.org
+
+致谢
+~~~~
+
+- Mydraal <vulpyne@vulpyne.net> 撰写了该文件
+- Adam Sulmicki <adam@cfar.umd.edu> 进行了更新
+- Jeremy M. Dolan <jmd@turbogeek.org> 在 2001/01/28 10:15:59 进行了更新
+- Crutcher Dunnavant <crutcher+kernel@datastacks.com> 添加键注册部分
diff --git a/Documentation/translations/zh_CN/core-api/boot-time-mm.rst b/Documentation/translations/zh_CN/core-api/boot-time-mm.rst
new file mode 100644
index 0000000..9e81dbe
--- /dev/null
+++ b/Documentation/translations/zh_CN/core-api/boot-time-mm.rst
@@ -0,0 +1,49 @@
+.. include:: ../disclaimer-zh_CN.rst
+
+:Original: Documentation/core-api/boot-time-mm.rst
+
+:翻译:
+
+ 司延腾 Yanteng Si <siyanteng@loongson.cn>
+
+:校译:
+
+ 时奎亮 <alexs@kernel.org>
+
+.. _cn_core-api_boot-time-mm:
+
+================
+启动时的内存管理
+================
+
+系统初始化早期“正常”的内存管理由于没有设置完毕无法使用。但是内核仍然需要
+为各种数据结构分配内存,例如物理页分配器。
+
+一个叫做 ``memblock`` 的专用分配器执行启动时的内存管理。特定架构的初始化
+必须在setup_arch()中设置它,并在mem_init()函数中移除它。
+
+一旦早期的内存管理可用,它就为内存分配提供了各种函数和宏。分配请求可以指向
+第一个(也可能是唯一的)节点或NUMA系统中的某个特定节点。有一些API变体在分
+配失败时panic,也有一些不会panic的。
+
+Memblock还提供了各种控制其自身行为的API。
+
+Memblock概述
+============
+
+该API在以下内核代码中:
+
+mm/memblock.c
+
+
+函数和结构体
+============
+
+下面是关于memblock数据结构、函数和宏的描述。其中一些实际上是内部的,但由于
+它们被记录下来,漏掉它们是很愚蠢的。此外,阅读内部函数的注释可以帮助理解引
+擎盖下真正发生的事情。
+
+该API在以下内核代码中:
+
+include/linux/memblock.h
+mm/memblock.c
diff --git a/Documentation/translations/zh_CN/core-api/genalloc.rst b/Documentation/translations/zh_CN/core-api/genalloc.rst
new file mode 100644
index 0000000..3c78452
--- /dev/null
+++ b/Documentation/translations/zh_CN/core-api/genalloc.rst
@@ -0,0 +1,109 @@
+.. include:: ../disclaimer-zh_CN.rst
+
+:Original: Documentation/core-api/genalloc.rst
+
+:翻译:
+
+ 司延腾 Yanteng Si <siyanteng@loongson.cn>
+
+:校译:
+
+ 时奎亮 <alexs@kernel.org>
+
+.. _cn_core-api_genalloc:
+
+genalloc/genpool子系统
+======================
+
+内核中有许多内存分配子系统,每一个都是针对特定的需求。然而,有时候,内核开发者需
+要为特定范围的特殊用途的内存实现一个新的分配器;通常这个内存位于某个设备上。该设
+备的驱动程序的作者当然可以写一个小的分配器来完成工作,但这是让内核充满几十个测试
+差劲的分配器的方法。早在2005年,Jes Sorensen从sym53c8xx_2驱动中提取了其中的一
+个分配器,并将其作为一个通用模块发布,用于创建特设的内存分配器。这段代码在2.6.13
+版本中被合并;此后它被大大地修改了。
+
+.. _posted: https://lwn.net/Articles/125842/
+
+使用这个分配器的代码应该包括<linux/genalloc.h>。这个动作从创建一个池开始,使用
+以下函数之一:
+
+该API在以下内核代码中:
+
+lib/genalloc.c
+
+对gen_pool_create()的调用将创建一个内存池。分配的粒度由min_alloc_order设置;它
+是一个log-base-2(以2为底的对数)的数字,就像页面分配器使用的数字一样,但它指的是
+字节而不是页面。因此,如果min_alloc_order被传递为3,那么所有的分配将是8字节的倍数。
+增加min_alloc_order可以减少跟踪池中内存所需的内存。nid参数指定哪一个NUMA节点应该被
+用于分配管家结构体;如果调用者不关心,它可以是-1。
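下面用一小段用户空间代码示意上述粒度的算术含义。ex_pool_round_up() 和 ex_pool_units() 是为说明而假设的辅助函数,并非 lib/genalloc.c 的实际实现,也不处理 size 接近 SIZE_MAX 时的溢出:

```c
#include <stddef.h>

/* 假设的辅助函数:把 size 向上取整到 2^order 字节的倍数。 */
static size_t ex_pool_round_up(size_t size, unsigned int order)
{
	size_t gran = (size_t)1 << order;

	return (size + gran - 1) & ~(gran - 1);
}

/* 假设的辅助函数:以 2^order 字节为粒度时,跟踪 size 字节需要多少个单元。 */
static size_t ex_pool_units(size_t size, unsigned int order)
{
	return ex_pool_round_up(size, order) >> order;
}
```

按这个示意,order 为 3 时 5 字节的请求占用 8 字节;跟踪同样的 4096 字节,order 为 5 时需要的单元数只有 order 为 3 时的四分之一,这也就是增大 min_alloc_order 能节省跟踪开销的原因。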
+
+“管理的”接口devm_gen_pool_create()将内存池与一个特定的设备联系起来。除其他功能外,
+当给定的设备被销毁时,它将自动清理内存池。
+
+一个内存池被关闭的方法是:
+
+该API在以下内核代码中:
+
+lib/genalloc.c
+
+值得注意的是,如果在给定的内存池中仍有未完成的分配,这个函数将采取相当极端的步骤,调用
+BUG(),使整个系统崩溃。你已经被警告了。
+
+一个新创建的内存池没有内存可以分配。在这种状态下,它是相当无用的,所以首要任务之一通常
+是向内存池里添加内存。这可以通过以下方式完成:
+
+该API在以下内核代码中:
+
+include/linux/genalloc.h
+
+lib/genalloc.c
+
+对gen_pool_add()的调用将把从addr(位于内核的虚拟地址空间)开始的size字节内存放入
+给定的池中,再次使用nid作为节点ID进行辅助内存分配。gen_pool_add_virt()变体将显式
+物理地址与内存联系起来;只有在内存池被用于DMA分配时,这才是必要的。
+
+从内存池中分配内存(并将其放回)的函数是:
+
+该API在以下内核代码中:
+
+include/linux/genalloc.h
+
+lib/genalloc.c
+
+正如人们所期望的,gen_pool_alloc()将从给定的池中分配size字节。gen_pool_dma_alloc()
+变体分配内存用于DMA操作,返回dma所指向的空间中的相关物理地址。这只有在内存是用
+gen_pool_add_virt()添加的情况下才会起作用。请注意,这个函数偏离了genpool通常使用
+无符号长值来表示内核地址的模式;它返回一个void * 来代替。
+
+这一切看起来都比较简单;事实上,一些开发者显然认为这太简单了。毕竟,上面的接口没有提
+供对分配函数如何选择返回哪块特定内存的控制。如果需要这样的控制,下面的函数将是有意义
+的:
+
+该API在以下内核代码中:
+
+lib/genalloc.c
+
+使用gen_pool_alloc_algo()进行的分配指定了一种用于选择要分配的内存的算法;默认算法可
+以用gen_pool_set_algo()来设置。数据值被传递给算法;大多数算法会忽略它,但偶尔也会需
+要它。当然,人们可以写一个特殊用途的算法,但是已经有相当多的现成算法可用了:
+
+- gen_pool_first_fit是一个简单的初配分配器;如果没有指定其他算法,这是默认算法。
+
+- gen_pool_first_fit_align强迫分配有一个特定的对齐方式(通过genpool_data_align结
+ 构中的数据传递)。
+
+- gen_pool_first_fit_order_align 将分配按其大小对应的order对齐。例如,一个60字节的分配将
+ 以64字节对齐。
+
+- gen_pool_best_fit,正如人们所期望的,是一个简单的最佳匹配分配器。
+
+- gen_pool_fixed_alloc在池中的一个特定偏移量(通过数据参数在genpool_data_fixed结
+ 构中传递)进行分配。如果指定的内存不可用,则分配失败。
+
+还有一些其他的函数,主要是为了查询内存池中的可用空间或迭代内存块等目的。然而,大多数
+用户应该不需要超出以上描述的功能。如果幸运的话,对这个模块的广泛认识将有助于防止在未来编
+写特殊用途的内存分配器。
+
+该API在以下内核代码中:
+
+lib/genalloc.c
diff --git a/Documentation/translations/zh_CN/core-api/gfp_mask-from-fs-io.rst b/Documentation/translations/zh_CN/core-api/gfp_mask-from-fs-io.rst
new file mode 100644
index 0000000..75d2997
--- /dev/null
+++ b/Documentation/translations/zh_CN/core-api/gfp_mask-from-fs-io.rst
@@ -0,0 +1,66 @@
+.. include:: ../disclaimer-zh_CN.rst
+
+:Original: Documentation/core-api/gfp_mask-from-fs-io.rst
+
+:翻译:
+
+ 司延腾 Yanteng Si <siyanteng@loongson.cn>
+
+:校译:
+
+ 时奎亮 <alexs@kernel.org>
+
+.. _cn_core-api_gfp_mask-from-fs-io:
+
+============================
+从FS/IO上下文中使用的GFP掩码
+============================
+
+:日期: 2018年5月
+:作者: Michal Hocko <mhocko@kernel.org>
+
+简介
+====
+
+文件系统和IO栈中的代码路径在分配内存时必须小心,以防止因直接调用FS或IO路径的内
+存回收和阻塞已经持有的资源(例如锁--最常见的是用于事务上下文的锁)而造成递归死
+锁。
+
+避免这种死锁问题的传统方法是在调用分配器时,在gfp掩码中清除__GFP_FS和__GFP_IO
+(注意后者意味着也要清除第一个)。GFP_NOFS和GFP_NOIO可以作为快捷方式使用。但事
+实证明,上述方法导致了滥用,当限制性的gfp掩码被用于“万一”时,没有更深入的考虑,
+这导致了问题,因为过度使用GFP_NOFS/GFP_NOIO会导致内存过度回收或其他内存回收的问
+题。
+
+新API
+=====
+
+从4.12开始,我们为NOFS和NOIO上下文提供了一个通用的作用域API,分别是
+``memalloc_nofs_save`` , ``memalloc_nofs_restore`` 和 ``memalloc_noio_save`` ,
+``memalloc_noio_restore`` ,允许从文件系统或I/O的角度将一个作用域标记为一个
+关键部分。从该作用域的任何分配都将从给定的掩码中删除__GFP_FS和__GFP_IO,所以
+没有内存分配可以追溯到FS/IO中。
+
+
+该API在以下内核代码中:
+
+include/linux/sched/mm.h
+
+然后,FS/IO代码在任何与回收有关的关键部分开始之前简单地调用适当的保存函数
+——例如,与回收上下文共享的锁或当事务上下文嵌套可能通过回收进行时。恢复函数
+应该在关键部分结束时被调用。所有这一切最好都伴随着解释什么是回收上下文,以
+方便维护。
+
+请注意,保存/恢复函数的正确配对允许嵌套,所以从现有的NOIO或NOFS范围分别调
+用 ``memalloc_noio_save`` 或 ``memalloc_noio_restore`` 是安全的。
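保存/恢复配对之所以允许嵌套,可以用下面的用户空间模型来示意。这里的标志位 EX_PF_MEMALLOC_NOFS、变量 ex_task_flags 以及三个 ex_ 函数都是为说明而假设的,只是模拟“保存旧状态、置位、再恢复旧状态”的思路,并非 include/linux/sched/mm.h 中的实际代码:

```c
/* 假设的标志位和“当前任务标志”,仅用于在用户空间模拟。 */
#define EX_PF_MEMALLOC_NOFS	0x1u

static unsigned int ex_task_flags;

/* 进入 NOFS 作用域:返回进入前该标志位的状态,然后置位。 */
static unsigned int ex_nofs_save(void)
{
	unsigned int old = ex_task_flags & EX_PF_MEMALLOC_NOFS;

	ex_task_flags |= EX_PF_MEMALLOC_NOFS;
	return old;
}

/* 离开作用域:把标志位恢复为对应的 save 调用记录下来的状态。 */
static void ex_nofs_restore(unsigned int old)
{
	ex_task_flags = (ex_task_flags & ~EX_PF_MEMALLOC_NOFS) | old;
}

/* 当前上下文中的分配是否仍可回到 FS 路径(1 表示可以)。 */
static int ex_fs_allowed(void)
{
	return !(ex_task_flags & EX_PF_MEMALLOC_NOFS);
}
```

在这个模型里,内层作用域结束时恢复的是“已置位”的旧状态,所以外层作用域的限制不会被提前解除;只有最外层的恢复才会清除标志位。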
+
+那么__vmalloc(GFP_NOFS)呢?
+===========================
+
+vmalloc不支持GFP_NOFS语义,因为在分配器的深处有硬编码的GFP_KERNEL分配,要修
+复这些分配是相当不容易的。这意味着用GFP_NOFS/GFP_NOIO调用 ``vmalloc`` 几乎
+总是一个错误。好消息是,NOFS/NOIO语义可以通过范围API实现。
+
+在理想的世界中,上层应该已经标记了危险的上下文,因此不需要特别的照顾, ``vmalloc``
+的调用应该没有任何问题。有时,如果上下文不是很清楚,或者有叠加的违规行为,那么
+推荐的方法是用范围API包装vmalloc,并加上注释来解释问题。
diff --git a/Documentation/translations/zh_CN/core-api/index.rst b/Documentation/translations/zh_CN/core-api/index.rst
index 72f0a36..9e03b8d 100644
--- a/Documentation/translations/zh_CN/core-api/index.rst
+++ b/Documentation/translations/zh_CN/core-api/index.rst
@@ -39,10 +39,11 @@
:maxdepth: 1
kobject
+ kref
Todolist:
- kref
+
assoc_array
xarray
idr
@@ -101,19 +102,23 @@
如何在内核中分配和使用内存。请注意,在
:doc:`/vm/index` 中有更多的内存管理文档。
-Todolist:
+.. toctree::
+ :maxdepth: 1
memory-allocation
unaligned-memory-access
+ mm-api
+ genalloc
+ boot-time-mm
+ gfp_mask-from-fs-io
+
+Todolist:
+
dma-api
dma-api-howto
dma-attributes
dma-isa-lpc
- mm-api
- genalloc
pin_user_pages
- boot-time-mm
- gfp_mask-from-fs-io
内核调试的接口
==============
diff --git a/Documentation/translations/zh_CN/core-api/kref.rst b/Documentation/translations/zh_CN/core-api/kref.rst
new file mode 100644
index 0000000..b9902af
--- /dev/null
+++ b/Documentation/translations/zh_CN/core-api/kref.rst
@@ -0,0 +1,311 @@
+.. include:: ../disclaimer-zh_CN.rst
+
+:Original: Documentation/core-api/kref.rst
+
+翻译:
+
+司延腾 Yanteng Si <siyanteng@loongson.cn>
+
+校译:
+
+ <此处请校译员签名(自愿),我将在下一个版本添加>
+
+.. _cn_core_api_kref.rst:
+
+=================================
+为内核对象添加引用计数器(krefs)
+=================================
+
+:作者: Corey Minyard <minyard@acm.org>
+:作者: Thomas Hellstrom <thellstrom@vmware.com>
+
+其中很多内容都是从Greg Kroah-Hartman2004年关于krefs的OLS论文和演讲中摘
+录的,可以在以下网址找到:
+
+ - http://www.kroah.com/linux/talks/ols_2004_kref_paper/Reprint-Kroah-Hartman-OLS2004.pdf
+ - http://www.kroah.com/linux/talks/ols_2004_kref_talk/
+
+简介
+====
+
+krefs允许你为你的对象添加引用计数器。如果你有在多个地方使用和传递的对象,
+而你没有refcounts,你的代码几乎肯定是坏的。如果你想要引用计数,krefs是个
+好办法。
+
+要使用kref,请在你的数据结构中添加一个,如::
+
+ struct my_data
+ {
+ .
+ .
+ struct kref refcount;
+ .
+ .
+ };
+
+kref可以出现在数据结构体中的任何地方。
+
+初始化
+======
+
+你必须在分配kref之后初始化它。 要做到这一点,可以这样调用kref_init::
+
+ struct my_data *data;
+
+ data = kmalloc(sizeof(*data), GFP_KERNEL);
+ if (!data)
+ return -ENOMEM;
+ kref_init(&data->refcount);
+
+这将kref中的refcount设置为1。
+
+Kref规则
+========
+
+一旦你有一个初始化的kref,你必须遵循以下规则:
+
+1) 如果你对一个指针做了一个非临时性的拷贝,特别是如果它可以被传递给另一个执
+ 行线程,你必须在传递之前用kref_get()增加refcount::
+
+ kref_get(&data->refcount);
+
+ 如果你已经有了一个指向kref-ed结构体的有效指针(refcount不能为零),你
+ 可以在没有锁的情况下这样做。
+
+2) 当你完成对一个指针的处理时,你必须调用kref_put()::
+
+ kref_put(&data->refcount, data_release);
+
+ 如果这是对该指针的最后一次引用,释放程序将被调用。如果代码从来没有尝试过
+ 在没有已经持有有效指针的情况下获得一个kref-ed结构体的有效指针,那么在没
+ 有锁的情况下这样做是安全的。
+
+3) 如果代码试图获得对一个kref-ed结构体的引用,而不持有一个有效的指针,它必
+ 须按顺序访问,在kref_put()期间不能发生kref_get(),并且该结构体在kref_get()
+ 期间必须保持有效。
+
+例如,如果你分配了一些数据,然后将其传递给另一个线程来处理::
+
+ void data_release(struct kref *ref)
+ {
+ struct my_data *data = container_of(ref, struct my_data, refcount);
+ kfree(data);
+ }
+
+ void more_data_handling(void *cb_data)
+ {
+ struct my_data *data = cb_data;
+ .
+ . do stuff with data here
+ .
+ kref_put(&data->refcount, data_release);
+ }
+
+ int my_data_handler(void)
+ {
+ int rv = 0;
+ struct my_data *data;
+ struct task_struct *task;
+ data = kmalloc(sizeof(*data), GFP_KERNEL);
+ if (!data)
+ return -ENOMEM;
+ kref_init(&data->refcount);
+
+ kref_get(&data->refcount);
+ task = kthread_run(more_data_handling, data, "more_data_handling");
+ if (task == ERR_PTR(-ENOMEM)) {
+ rv = -ENOMEM;
+ kref_put(&data->refcount, data_release);
+ goto out;
+ }
+
+ .
+ . do stuff with data here
+ .
+ out:
+ kref_put(&data->refcount, data_release);
+ return rv;
+ }
+
+这样,两个线程处理数据的顺序并不重要,kref_put()会负责判断数据何时不再被引用并释
+放它。kref_get()不需要锁,因为我们已经有了一个有效的指针,我们拥有一个
+refcount。put不需要锁,因为没有任何东西试图在没有持有指针的情况下获取数据。
+
+在上面的例子中,kref_put()在成功和错误路径中都会被调用2次。这是必要的,因
+为引用计数被kref_init()和kref_get()递增了2次。
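上述“初始化为1、每次get加1、每次put减1、减到0时调用释放函数”的计数规则,可以用下面的用户空间代码粗略示意。这里用C11原子操作代替内核实现,struct ex_ref、struct ex_data 以及各个 ex_ 函数都是为说明而假设的,并非kref的实际实现:

```c
#include <stdatomic.h>
#include <stddef.h>
#include <stdlib.h>

/* 假设的用户空间引用计数器,模仿kref的用法。 */
struct ex_ref {
	atomic_int count;
};

struct ex_data {
	int payload;
	struct ex_ref ref;
};

static int ex_released;	/* 仅用于观察释放函数被调用的次数 */

static void ex_ref_init(struct ex_ref *r)
{
	atomic_init(&r->count, 1);
}

static void ex_ref_get(struct ex_ref *r)
{
	atomic_fetch_add(&r->count, 1);
}

/* 计数降到0时调用release并返回1,否则返回0。 */
static int ex_ref_put(struct ex_ref *r, void (*release)(struct ex_ref *))
{
	if (atomic_fetch_sub(&r->count, 1) == 1) {
		release(r);
		return 1;
	}
	return 0;
}

static void ex_data_release(struct ex_ref *r)
{
	/* 手写的container_of:由成员地址求出外层结构体的地址 */
	struct ex_data *d = (struct ex_data *)((char *)r - offsetof(struct ex_data, ref));

	ex_released++;
	free(d);
}
```

与上面的例子一样,init之后再get一次,就需要两次put才会真正释放对象;第一次put返回0,第二次put返回1并调用释放函数。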
+
+请注意,规则1中的 "before "是非常重要的。你不应该做类似于::
+
+ task = kthread_run(more_data_handling, data, "more_data_handling");
+ if (task == ERR_PTR(-ENOMEM)) {
+ rv = -ENOMEM;
+ goto out;
+ } else
+ /* BAD BAD BAD - 在交接后得到 */
+ kref_get(&data->refcount);
+
+不要以为你知道自己在做什么而使用上述构造。首先,你可能不知道自己在做什么。
+其次,你可能知道自己在做什么(有些情况下涉及到锁,上述做法可能是合法的),
+但其他不知道自己在做什么的人可能会改变代码或复制代码。这是很危险的作风。请
+不要这样做。
+
+在有些情况下,你可以优化get和put。例如,如果你已经完成了一个对象,并且给其
+他对象排队,或者把它传递给其他对象,那么就没有理由先做一个get,然后再做一个
+put::
+
+ /* 糟糕的额外获取(get)和输出(put) */
+ kref_get(&obj->ref);
+ enqueue(obj);
+ kref_put(&obj->ref, obj_cleanup);
+
+只要做enqueue就可以了。最好为此加上一条注释::
+
+ enqueue(obj);
+ /* 我们已经完成了对obj的处理,所以我们把我们的refcount传给了队列。
+ 在这之后不要再碰obj了! */
+
+最后一条规则(规则3)是最难处理的一条。例如,你有一个每个项目都被krefed的列表,
+而你希望得到第一个项目。你不能只是从列表中抽出第一个项目,然后kref_get()它。
+这违反了规则3,因为你还没有持有一个有效的指针。你必须添加一个mutex(或其他锁)。
+比如说::
+
+ static DEFINE_MUTEX(mutex);
+ static LIST_HEAD(q);
+ struct my_data
+ {
+ struct kref refcount;
+ struct list_head link;
+ };
+
+ static struct my_data *get_entry()
+ {
+ struct my_data *entry = NULL;
+ mutex_lock(&mutex);
+ if (!list_empty(&q)) {
+ entry = container_of(q.next, struct my_data, link);
+ kref_get(&entry->refcount);
+ }
+ mutex_unlock(&mutex);
+ return entry;
+ }
+
+ static void release_entry(struct kref *ref)
+ {
+ struct my_data *entry = container_of(ref, struct my_data, refcount);
+
+ list_del(&entry->link);
+ kfree(entry);
+ }
+
+ static void put_entry(struct my_data *entry)
+ {
+ mutex_lock(&mutex);
+ kref_put(&entry->refcount, release_entry);
+ mutex_unlock(&mutex);
+ }
+
+如果你不想在整个释放操作过程中持有锁,kref_put()的返回值是有用的。假设你不想在
+上面的例子中在持有锁的情况下调用kfree()(因为这样做有点无意义)。你可以使用kref_put(),
+如下所示::
+
+ static void release_entry(struct kref *ref)
+ {
+ /* 所有的工作都是在从kref_put()返回后完成的。*/
+ }
+
+ static void put_entry(struct my_data *entry)
+ {
+ mutex_lock(&mutex);
+ if (kref_put(&entry->refcount, release_entry)) {
+ list_del(&entry->link);
+ mutex_unlock(&mutex);
+ kfree(entry);
+ } else
+ mutex_unlock(&mutex);
+ }
+
+如果你必须调用其他程序作为释放操作的一部分,而这些程序可能需要很长的时间,或者可
+能要求相同的锁,那么这真的更有用。请注意,在释放例程中做所有的事情还是比较好的,
+因为它比较整洁。
+
+上面的例子也可以用kref_get_unless_zero()来优化,方法如下::
+
+ static struct my_data *get_entry()
+ {
+ struct my_data *entry = NULL;
+ mutex_lock(&mutex);
+ if (!list_empty(&q)) {
+ entry = container_of(q.next, struct my_data, link);
+ if (!kref_get_unless_zero(&entry->refcount))
+ entry = NULL;
+ }
+ mutex_unlock(&mutex);
+ return entry;
+ }
+
+ static void release_entry(struct kref *ref)
+ {
+ struct my_data *entry = container_of(ref, struct my_data, refcount);
+
+ mutex_lock(&mutex);
+ list_del(&entry->link);
+ mutex_unlock(&mutex);
+ kfree(entry);
+ }
+
+ static void put_entry(struct my_data *entry)
+ {
+ kref_put(&entry->refcount, release_entry);
+ }
+
+这对于在put_entry()中移除kref_put()周围的mutex锁是很有用的,但是重要的是
+kref_get_unless_zero被封装在查找表中的同一关键部分,否则kref_get_unless_zero
+可能引用已经释放的内存。注意,在不检查其返回值的情况下使用kref_get_unless_zero
+是非法的。如果你确信(已经有了一个有效的指针)kref_get_unless_zero()会返回true,
+那么就用kref_get()代替。
+
+Krefs和RCU
+==========
+
+函数kref_get_unless_zero也使得在上述例子中使用rcu锁进行查找成为可能::
+
+ struct my_data
+ {
+ struct rcu_head rhead;
+ .
+ struct kref refcount;
+ .
+ .
+ };
+
+ static struct my_data *get_entry_rcu()
+ {
+ struct my_data *entry = NULL;
+ rcu_read_lock();
+ if (!list_empty(&q)) {
+ entry = container_of(q.next, struct my_data, link);
+ if (!kref_get_unless_zero(&entry->refcount))
+ entry = NULL;
+ }
+ rcu_read_unlock();
+ return entry;
+ }
+
+ static void release_entry_rcu(struct kref *ref)
+ {
+ struct my_data *entry = container_of(ref, struct my_data, refcount);
+
+ mutex_lock(&mutex);
+ list_del_rcu(&entry->link);
+ mutex_unlock(&mutex);
+ kfree_rcu(entry, rhead);
+ }
+
+ static void put_entry(struct my_data *entry)
+ {
+ kref_put(&entry->refcount, release_entry_rcu);
+ }
+
+但要注意的是,在调用release_entry_rcu后,结构kref成员需要在有效内存中保留一个rcu
+宽限期。这可以通过使用上面的kfree_rcu(entry, rhead)来实现,或者在使用kfree之前
+调用synchronize_rcu(),但注意synchronize_rcu()可能会睡眠相当长的时间。
diff --git a/Documentation/translations/zh_CN/core-api/memory-allocation.rst b/Documentation/translations/zh_CN/core-api/memory-allocation.rst
new file mode 100644
index 0000000..e17b87d
--- /dev/null
+++ b/Documentation/translations/zh_CN/core-api/memory-allocation.rst
@@ -0,0 +1,138 @@
+.. include:: ../disclaimer-zh_CN.rst
+
+:Original: Documentation/core-api/memory-allocation.rst
+
+:翻译:
+
+ 司延腾 Yanteng Si <siyanteng@loongson.cn>
+
+:校译:
+
+ 时奎亮 <alexs@kernel.org>
+
+.. _cn_core-api_memory-allocation:
+
+============
+内存分配指南
+============
+
+Linux为内存分配提供了多种API。你可以使用 `kmalloc` 或 `kmem_cache_alloc`
+系列分配小块内存,使用 `vmalloc` 及其派生函数分配大的虚拟连续的区域,或者
+你可以用 alloc_pages 直接向页面分配器请求页面。也可以使用更专业的分配器,
+例如 `cma_alloc` 或 `zs_malloc` 。
+
+大多数的内存分配API使用GFP标志来表达该内存应该如何分配。GFP的缩写代表
+“(get free pages)获取空闲页”,是底层的内存分配功能。
+
+(内存)分配API的多样性与众多的GFP标志相结合,使得“我应该如何分配内存?”这个问
+题不那么容易回答,尽管很可能你应该使用
+
+::
+
+ kzalloc(<size>, GFP_KERNEL);
+
+当然,有些情况下必须使用其他分配API和不同的GFP标志。
+
+获取空闲页标志
+==============
+GFP标志控制分配器的行为。它们告诉我们哪些内存区域可以被使用,分配器应该多努力寻
+找空闲的内存,这些内存是否可以被用户空间访问等等。内存管理API为GFP标志和它们的
+组合提供了参考文件,这里我们简要介绍一下它们的推荐用法:
+
+ * 大多数时候, ``GFP_KERNEL`` 是你需要的。内核数据结构的内存,DMA可用内存,inode
+ 缓存,所有这些和其他许多分配类型都可以使用 ``GFP_KERNEL`` 。注意,使用 ``GFP_KERNEL``
+ 意味着 ``__GFP_RECLAIM`` ,这意味着在有内存压力的情况下可能会触发直接回收;调用上
+ 下文必须允许睡眠。
+
+ * 如果分配是从一个原子上下文中进行的,例如中断处理程序,使用 ``GFP_NOWAIT`` 。这个
+ 标志可以防止直接回收和IO或文件系统操作。因此,在内存压力下, ``GFP_NOWAIT`` 分配
+ 可能会失败。有合理退路的分配应该使用 ``__GFP_NOWARN`` 。
+
+ * 如果你认为访问保留内存区是合理的,并且除非分配成功,否则内核会有压力,你可以使用 ``GFP_ATOMIC`` 。
+
+ * 从用户空间触发的不可信任的分配应该是kmem核算的对象,必须设置 ``__GFP_ACCOUNT`` 位。
+ 对于应该被核算的 ``GFP_KERNEL`` 分配,有一个方便的 ``GFP_KERNEL_ACCOUNT`` 快捷方式。
+
+ * 用户空间的分配应该使用 ``GFP_USER`` 、 ``GFP_HIGHUSER`` 或 ``GFP_HIGHUSER_MOVABLE``
+ 中的一个标志。标志名称越长,限制性越小。
+
+ ``GFP_HIGHUSER_MOVABLE`` 不要求分配的内存将被内核直接访问,并意味着数据是可迁移的。
+
+ ``GFP_HIGHUSER`` 意味着所分配的内存是不可迁移的,但也不要求它能被内核直接访问。举个
+ 例子就是一个硬件分配内存,这些数据直接映射到用户空间,但没有寻址限制。
+
+ ``GFP_USER`` 意味着分配的内存是不可迁移的,它必须被内核直接访问。
+
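+You may notice that quite a few allocations in existing code specify ``GFP_NOIO``
+or ``GFP_NOFS``. Historically, they were used to prevent recursion deadlocks
+caused by direct memory reclaim calling back into the FS or IO paths and blocking
+on already held resources. Since 4.12 the preferred way to address this issue is
+to use the new scope APIs described in
+:ref:`Documentation/core-api/gfp_mask-from-fs-io.rst <gfp_mask_from_fs_io>`.
+
+Other legacy GFP flags are ``GFP_DMA`` and ``GFP_DMA32``. They are used to ensure
+that the allocated memory is accessible by hardware with limited addressing
+capabilities. So unless you are writing a driver for a device with such
+restrictions, avoid using these flags. And even for hardware with restrictions,
+it is preferable to use the dma_alloc* APIs.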
+
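+GFP flags and reclaim behavior
+------------------------------
+Memory allocations may trigger direct or background reclaim, and it is useful to
+understand how hard the page allocator will try to satisfy that or another
+request.
+
+  * ``GFP_KERNEL & ~__GFP_RECLAIM`` - optimistic allocation without any attempt
+    to free memory at all. The most lightweight mode, which does not even kick
+    the background reclaim. It should be used carefully because it might deplete
+    the memory and the next user might hit the more aggressive reclaim.
+
+  * ``GFP_KERNEL & ~__GFP_DIRECT_RECLAIM`` (or ``GFP_NOWAIT``) - optimistic
+    allocation without any attempt to free memory from the current context, but
+    it can wake kswapd to reclaim memory if the zone is below the low watermark.
+    It can be used from atomic contexts, or when the request is a performance
+    optimization and there is another fallback for a slow path.
+
+  * ``(GFP_KERNEL|__GFP_HIGH) & ~__GFP_DIRECT_RECLAIM`` (aka ``GFP_ATOMIC``) -
+    non-sleeping allocation with an expensive fallback, so it can access some
+    portion of the memory reserves. Usually used from interrupt/bottom-half
+    context with an expensive slow path fallback.
+
+  * ``GFP_KERNEL`` - both background and direct reclaim are allowed and the
+    default page allocator behavior is used. That means that inexpensive
+    allocation requests basically do not fail, but there is no guarantee of that
+    behavior, so failures have to be checked properly by callers (e.g. an OOM
+    killer victim is currently allowed to fail).
+
+  * ``GFP_KERNEL | __GFP_NORETRY`` - overrides the default allocator behavior and
+    all allocation requests fail early rather than cause disruptive reclaim (one
+    round of reclaim in this implementation). The OOM killer is not invoked.
+
+  * ``GFP_KERNEL | __GFP_RETRY_MAYFAIL`` - overrides the **default** allocator
+    behavior and all allocation requests try really hard. The request will fail
+    if the reclaim cannot make any progress. The OOM killer won't be triggered.
+
+  * ``GFP_KERNEL | __GFP_NOFAIL`` - overrides the default allocator behavior and
+    all allocation requests will loop endlessly until they succeed. This might be
+    really dangerous, especially for larger orders.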
+
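+Selecting memory allocator
+==========================
+
+The most straightforward way to allocate memory is to use a function from the
+kmalloc() family. To be on the safe side, it is best to use routines that set
+memory to zero, like kzalloc(). If you need to allocate memory for an array,
+there are the kmalloc_array() and kcalloc() helpers. The helpers struct_size(),
+array_size() and array3_size() can be used to safely calculate object sizes
+without overflowing.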
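+
+As a rough userspace illustration of why overflow-safe size helpers matter, the
+sketch below multiplies an element count by an element size and saturates to
+SIZE_MAX on overflow, so that the following allocation fails instead of
+returning an undersized buffer. The names checked_array_size() and
+alloc_zeroed_array() are hypothetical, and __builtin_mul_overflow() is a
+GCC/Clang builtin; this is an analogy, not the kernel implementation.
+

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: n * size, saturating to SIZE_MAX on overflow. */
static size_t checked_array_size(size_t n, size_t size)
{
	size_t bytes;

	/* GCC/Clang builtin: returns true if the product overflowed. */
	if (__builtin_mul_overflow(n, size, &bytes))
		return SIZE_MAX;
	return bytes;
}

/* Hypothetical helper: zeroed array allocation that refuses overflowed sizes. */
static void *alloc_zeroed_array(size_t n, size_t size)
{
	size_t bytes = checked_array_size(n, size);

	if (bytes == SIZE_MAX)
		return NULL;	/* overflowed: fail instead of truncating */
	return calloc(1, bytes);
}
```

+
+A caller would write ``p = alloc_zeroed_array(count, sizeof(*p))`` and check
+for NULL, much as kernel code checks the result of kcalloc().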
+
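+The maximal size of a chunk that can be allocated with `kmalloc` is limited. The
+actual limit depends on the hardware and the kernel configuration, but it is good
+practice to use `kmalloc` for objects smaller than the page size.
+
+The address of a chunk allocated with `kmalloc` is aligned to at least
+ARCH_KMALLOC_MINALIGN bytes. For sizes which are a power of two, the alignment is
+also guaranteed to be at least the respective size.
+
+Chunks allocated with kmalloc() can be resized with krealloc(). Similarly to
+kmalloc_array(), a helper for resizing arrays is provided in the form of
+krealloc_array().
+
+For large allocations you can use vmalloc() and vzalloc(), or directly request
+pages from the page allocator. The memory allocated by `vmalloc` and related
+functions is not physically contiguous.
+
+If you are not sure whether the allocation size is too large for `kmalloc`, it is
+possible to use kvmalloc() and its derivatives. It will try to allocate memory
+with `kmalloc` and, if the allocation fails, it will be retried with `vmalloc`.
+There are restrictions on which GFP flags can be used with `kvmalloc`; please see
+the kvmalloc_node() reference documentation. Note that `kvmalloc` may return
+memory that is not physically contiguous.
+
+If you need to allocate many identical objects you can use the slab cache
+allocator. The cache should be set up with kmem_cache_create() or
+kmem_cache_create_usercopy() before it can be used. The second function should be
+used if a part of the cache might be copied to userspace. After the cache is
+created, kmem_cache_alloc() and its convenience wrappers can allocate memory from
+that cache.
+
+When the allocated memory is no longer needed it must be freed. You can use
+kvfree() for memory allocated with `kmalloc`, `vmalloc` and `kvmalloc`. Objects
+from slab caches should be freed with kmem_cache_free(). And don't forget to
+destroy the cache with kmem_cache_destroy().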
diff --git a/Documentation/translations/zh_CN/core-api/mm-api.rst b/Documentation/translations/zh_CN/core-api/mm-api.rst
new file mode 100644
index 0000000..0ea43dc
--- /dev/null
+++ b/Documentation/translations/zh_CN/core-api/mm-api.rst
@@ -0,0 +1,110 @@
+.. include:: ../disclaimer-zh_CN.rst
+
+:Original: Documentation/core-api/mm-api.rst
+
+:Translator:
+
+ 司延腾 Yanteng Si <siyanteng@loongson.cn>
+
+:Proofreader:
+
+ 时奎亮 <alexs@kernel.org>
+
+.. _cn_core-api_mm-api:
+
+======================
+Memory Management APIs
+======================
+
+API stands for Application Programming Interface.
+
+User Space Memory Access
+========================
+
+The API is in the following kernel code:
+
+arch/x86/include/asm/uaccess.h
+
+arch/x86/lib/usercopy_32.c
+
+mm/gup.c
+
+.. _cn_mm-api-gfp-flags:
+
+Memory Allocation Controls
+==========================
+
+The API is in the following kernel code:
+
+include/linux/gfp.h
+
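+The Slab Cache
+==============
+
+Note that this cache is not the CPU on-chip cache; readers should consult other references for background.
+
+The API is in the following kernel code: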
+
+include/linux/slab.h
+
+mm/slab.c
+
+mm/slab_common.c
+
+mm/util.c
+
+Virtually Contiguous Mappings
+=============================
+
+The API is in the following kernel code:
+
+mm/vmalloc.c
+
+
+File Mapping and Page Cache
+===========================
+
+The API is in the following kernel code:
+
+mm/readahead.c
+
+mm/filemap.c
+
+mm/page-writeback.c
+
+mm/truncate.c
+
+include/linux/pagemap.h
+
+Memory pools
+============
+
+The API is in the following kernel code:
+
+mm/mempool.c
+
+DMA pools
+=========
+
+DMA stands for Direct Memory Access.
+
+The API is in the following kernel code:
+
+mm/dmapool.c
+
+More Memory Management Functions
+================================
+
+The API is in the following kernel code:
+
+mm/memory.c
+
+mm/page_alloc.c
+
+mm/mempolicy.c
+
+include/linux/mm_types.h
+
+include/linux/mm.h
+
+include/linux/mmzone.h
diff --git a/Documentation/translations/zh_CN/core-api/unaligned-memory-access.rst b/Documentation/translations/zh_CN/core-api/unaligned-memory-access.rst
new file mode 100644
index 0000000..29c33e7
--- /dev/null
+++ b/Documentation/translations/zh_CN/core-api/unaligned-memory-access.rst
@@ -0,0 +1,229 @@
+.. include:: ../disclaimer-zh_CN.rst
+
+:Original: Documentation/core-api/unaligned-memory-access.rst
+
+:Translator:
+
+ 司延腾 Yanteng Si <siyanteng@loongson.cn>
+
+:Proofreader:
+
+ 时奎亮 <alexs@kernel.org>
+
+.. _cn_core-api_unaligned-memory-access:
+
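+=========================
+Unaligned Memory Accesses
+=========================
+
+:Author: Daniel Drake <dsd@gentoo.org>,
+:Author: Johannes Berg <johannes@sipsolutions.net>
+
+:With help from: Alan Cox, Avuton Olrich, Heikki Orsila, Jan Engelhardt,
+ Kyle McMartin, Kyle Moffett, Randy Dunlap, Robert Hancock, Uli Kunitz,
+ Vadim Lobanov
+
+
+Linux runs on a wide variety of architectures which behave differently when it
+comes to memory access. This document presents some details about unaligned
+accesses, why you need to write code that doesn't cause them, and how to write such code.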
+
+
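+The definition of an unaligned access
+=====================================
+
+An unaligned memory access occurs when you try to read N bytes of data starting
+from an address that is not evenly divisible by N (i.e. addr % N != 0). For
+example, reading 4 bytes of data from address 0x10004 is fine, but reading 4 bytes of data from address 0x10005 would be an unaligned memory access.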
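+
+The rule can be written down directly. The following is a minimal userspace
+sketch; is_naturally_aligned() is a hypothetical helper name, not a kernel
+function, and it simply restates the addr % N test from the paragraph above.
+

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Returns true if an n-byte access starting at addr is naturally aligned. */
static bool is_naturally_aligned(uintptr_t addr, size_t n)
{
	return n != 0 && addr % n == 0;
}
```

+
+With this helper, a 4-byte access at 0x10004 is reported as aligned and one at
+0x10005 as unaligned, matching the example addresses used above.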
+
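+The above may seem a little vague, as memory access can happen in different
+ways. The context here is at the machine code level: certain instructions read or
+write a number of bytes to or from memory (e.g. movb, movw, movl in x86
+assembly). As will become clear, it is relatively easy to spot C statements which will compile to multiple-byte memory access instructions, namely when dealing with types such as u16, u32 and u64.
+
+
+Natural alignment
+=================
+
+The rule mentioned above forms what we refer to as natural alignment: when
+accessing N bytes of memory, the base memory address must be evenly divisible by N, i.e. addr % N == 0.
+
+When writing code, assume the target architecture has natural alignment requirements.
+
+In reality, only a few architectures require natural alignment on all sizes of
+memory access. However, we must consider all supported architectures; writing code that satisfies natural alignment requirements is the easiest way to achieve full portability.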
+
+
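+Why unaligned access is bad
+===========================
+
+The effects of performing an unaligned memory access vary from architecture to
+architecture. It would be easy to write a whole document on the differences; a summary of the common scenarios is presented below:
+
+ - Some architectures are able to perform unaligned memory accesses transparently, but there is usually a significant performance cost.
+ - Some architectures raise processor exceptions when unaligned accesses happen.
+   The exception handler is able to correct the unaligned access, at significant cost to performance.
+ - Some architectures raise processor exceptions when unaligned accesses happen,
+   but the exceptions do not contain enough information for the access to be corrected.
+ - Some architectures are not capable of unaligned memory access, but will silently
+   perform a different memory access to the one requested, resulting in a subtle code bug that is hard to detect!
+
+It should be obvious from the above that if your code causes unaligned memory
+accesses, it will not work correctly on certain platforms and will cause performance problems on others.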
+
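+Code that does not cause unaligned access
+=========================================
+
+At first, the concepts above may seem a little hard to relate to actual coding
+practice. After all, you don't have a great deal of control over the memory addresses of certain variables, etc.
+
+Fortunately things are not too complex, as in most cases the compiler ensures
+that things will work for you. For example, take the following structure::
+
+  struct foo {
+          u16 field1;
+          u32 field2;
+          u8 field3;
+  };
+
+Let us assume that an instance of the above structure resides in memory starting
+at address 0x10000. With a basic level of understanding, it would not be unreasonable to expect that accessing field2 would cause an unaligned access. You'd expect field2 to be located at offset 2 bytes into the structure,
+i.e. address 0x10002, but that address is not evenly divisible by 4 (remember, we're reading a 4 byte value here).
+
+Fortunately, the compiler understands the alignment constraints, so in the above
+case it would insert 2 bytes of padding between field1 and field2. Therefore, for standard structure types you can always rely on the compiler to pad structures so that accesses to fields are
+suitably aligned (assuming you do not cast the field to a type of different length).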
+
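+Similarly, you can also rely on the compiler to align variables and function parameters to a naturally aligned scheme, based on the size of the type of the variable.
+
+At this point, it should be clear that accessing a single byte (u8 or char) will
+never cause an unaligned access, because all memory addresses are evenly divisible by one.
+
+On a related topic, with the above considerations in mind you may observe that
+you could reorder the fields in the structure to place fields where padding would otherwise be inserted, and hence reduce the overall resident memory size of structure instances. The
+optimal layout of the above example is::
+
+  struct foo {
+          u32 field2;
+          u16 field1;
+          u8 field3;
+  };
+
+For a natural alignment scheme, the compiler would only have to add a single byte
+of padding at the end of the structure. This padding is added in order to satisfy alignment constraints for arrays of these structures.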
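+
+The two layouts can be inspected from userspace with offsetof() and sizeof().
+The sketch below assumes a common ABI in which uint32_t is 4-byte aligned
+(e.g. x86-64 or arm64); uint16_t, uint32_t and uint8_t stand in for the kernel's
+u16, u32 and u8, and the struct and function names are hypothetical.
+

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Layout from the first example: padding expected after field1 and field3. */
struct foo_original {
	uint16_t field1;
	uint32_t field2;
	uint8_t  field3;
};

/* Reordered layout: largest member first, padding only at the end. */
struct foo_reordered {
	uint32_t field2;
	uint16_t field1;
	uint8_t  field3;
};

/* Bytes of padding the compiler placed between field1 and field2. */
static size_t padding_before_field2(void)
{
	return offsetof(struct foo_original, field2) -
	       (offsetof(struct foo_original, field1) + sizeof(uint16_t));
}
```

+
+Under the stated assumption, the original layout carries 2 bytes of padding
+before field2 and occupies 12 bytes, while the reordered one occupies 8.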
+
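+Another point worth mentioning is the use of __attribute__((packed)) on a
+structure type. This GCC-specific attribute tells the compiler never to insert any padding within structures, which is useful when you want to use a C struct to represent some data that comes in a fixed
+arrangement "off the wire".
+
+You might be inclined to believe that usage of this attribute can easily lead to
+unaligned accesses when accessing fields that do not satisfy architectural alignment requirements. However, the compiler is aware of the alignment constraints and will generate extra instructions to perform
+the memory access in a way that does not cause unaligned access. Of course, the extra instructions cause a loss in performance compared to the non-packed case, so the packed
+attribute should only be used when avoiding structure padding is important.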
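+
+A small userspace sketch of the packed case follows. It relies on the GCC/Clang
+__attribute__((packed)) extension; the struct foo_packed and read_field2() names
+are hypothetical, and the fixed-width types stand in for u16, u32 and u8.
+

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Same members as the example, with the GCC/Clang packed attribute. */
struct foo_packed {
	uint16_t field1;
	uint32_t field2;
	uint8_t  field3;
} __attribute__((packed));

/* Access by member name is safe: the compiler knows field2 may be misaligned. */
static uint32_t read_field2(const struct foo_packed *p)
{
	return p->field2;
}
```

+
+Here field2 sits at offset 2 and the structure is 7 bytes long. Taking the
+address of field2 and dereferencing it through a plain u32 pointer would lose
+the compiler's knowledge of the misalignment, so accesses should go through the
+structure as above.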
+
+
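+Code that causes unaligned access
+=================================
+
+With the above in mind, let's move on to a real-life example of a function that
+can cause an unaligned memory access. The following function, taken from include/linux/etherdevice.h, is an optimized routine to compare two ethernet MAC addresses for
+equality::
+
+  bool ether_addr_equal(const u8 *addr1, const u8 *addr2)
+  {
+  #ifdef CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS
+          u32 fold = ((*(const u32 *)addr1) ^ (*(const u32 *)addr2)) |
+                     ((*(const u16 *)(addr1 + 4)) ^ (*(const u16 *)(addr2 + 4)));
+
+          return fold == 0;
+  #else
+          const u16 *a = (const u16 *)addr1;
+          const u16 *b = (const u16 *)addr2;
+          return ((a[0] ^ b[0]) | (a[1] ^ b[1]) | (a[2] ^ b[2])) == 0;
+  #endif
+  }
+
+In the above function, when the hardware has efficient unaligned access
+capability, there is no issue with this code. But when the hardware isn't able to access memory on arbitrary boundaries, the reference to a[0] causes 2 bytes (16 bits) to be read from memory starting at address addr1.
+
+Think about what would happen if addr1 was an odd address such as 0x10003.
+(Hint: it would be an unaligned access.)
+
+Despite the potential unaligned access problems with the above function, it is
+included in the kernel anyway, but is understood to work correctly only on 16-bit-aligned addresses. It is up to the caller to ensure this alignment or not use this function at all.
+This alignment-unsafe function is still useful as it is a decent optimization for the cases when you can ensure alignment, which is true almost all of the time in an ethernet
+networking context.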
+
+
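+Here is another example of some code that could cause unaligned accesses::
+
+  void myfunc(u8 *data, u32 value)
+  {
+          [...]
+          *((u32 *) data) = cpu_to_le32(value);
+          [...]
+  }
+
+This code will cause unaligned accesses every time the data parameter points to an address that is not evenly divisible by 4.
+
+In summary, the 2 main scenarios where you may run into unaligned access problems involve:
+
+ 1. Casting variables to types of different lengths
+ 2. Pointer arithmetic followed by access to at least 2 bytes of data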
+
+
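+Avoiding unaligned accesses
+===========================
+
+The easiest way to avoid unaligned access is to use the get_unaligned() and
+put_unaligned() macros provided by the <asm/unaligned.h> header file.
+
+Going back to an earlier example of code that potentially causes unaligned access::
+
+  void myfunc(u8 *data, u32 value)
+  {
+          [...]
+          *((u32 *) data) = cpu_to_le32(value);
+          [...]
+  }
+
+To avoid the unaligned memory access, you would rewrite it as follows::
+
+  void myfunc(u8 *data, u32 value)
+  {
+          [...]
+          value = cpu_to_le32(value);
+          put_unaligned(value, (u32 *) data);
+          [...]
+  }
+
+The get_unaligned() macro works similarly. Assuming 'data' is a pointer to memory
+and you wish to avoid unaligned access, its usage is as follows::
+
+  u32 value = get_unaligned((u32 *) data);
+
+These macros work for memory accesses of any length (not just 32 bits as in the
+examples above). Be aware that, compared to standard access of aligned memory, using these macros to access unaligned memory can be costly in terms of performance.
+
+If use of such macros is not convenient, another option is to use memcpy(), where
+the source or destination (or both) are of type u8* or unsigned char*. Due to the byte-wise nature of this operation, unaligned accesses are avoided.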
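+
+The memcpy() approach can be sketched in plain userspace C as follows. The
+helper names load_u32_unaligned() and store_u32_unaligned() are hypothetical
+stand-ins, not the kernel macros, and the value is kept in host byte order (no
+cpu_to_le32()-style conversion is attempted).
+

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: read a u32 from an arbitrarily aligned address. */
static uint32_t load_u32_unaligned(const uint8_t *src)
{
	uint32_t value;

	memcpy(&value, src, sizeof(value));	/* byte-wise copy, no alignment assumed */
	return value;
}

/* Hypothetical helper: write a u32 to an arbitrarily aligned address. */
static void store_u32_unaligned(uint8_t *dst, uint32_t value)
{
	memcpy(dst, &value, sizeof(value));
}
```

+
+Compilers typically turn such fixed-size memcpy() calls into a single load or
+store on architectures where that is safe, so the portable form need not be
+slower there.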
+
+
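+Alignment vs. Networking
+========================
+
+On architectures that require aligned loads, networking requires that the IP
+header is aligned on a four-byte boundary to optimise the IP stack. For regular ethernet hardware, the constant NET_IP_ALIGN is used. On most architectures this constant has the value 2 because the normal ethernet header is
+14 bytes long, so in order to get proper alignment one needs to DMA to an address which can be expressed as 4*n+2. One notable
+exception is powerpc, which defines NET_IP_ALIGN to 0 because DMA to unaligned addresses can be very expensive and dwarf
+the cost of unaligned loads.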
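+
+The arithmetic behind the 4*n+2 rule is (4*n + 2) + 14 = 4*(n + 4), so the IP
+header that follows the 14-byte ethernet header lands on a four-byte boundary.
+A minimal userspace check is sketched below; ETH_HEADER_LEN and
+ip_header_aligned() are hypothetical names used only for illustration.
+

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define ETH_HEADER_LEN 14	/* length of a normal ethernet header, per the text */

/* Hypothetical helper: is the IP header 4-byte aligned for this frame start? */
static bool ip_header_aligned(uintptr_t frame_start)
{
	return (frame_start + ETH_HEADER_LEN) % 4 == 0;
}
```

+
+A frame placed at a four-byte-aligned address plus 2 passes this check, while a
+frame placed exactly on a four-byte boundary does not.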
+
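+For some ethernet hardware that cannot DMA to unaligned addresses like 4*n+2, or
+for non-ethernet hardware, this can be a problem, and it is then required to copy the incoming frame into an aligned buffer. Because this is unnecessary on architectures that can do unaligned accesses,
+the code can be made dependent on CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS like so::
+
+  #ifdef CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS
+          skb = original skb
+  #else
+          skb = copy skb
+  #endif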
diff --git a/Documentation/translations/zh_CN/process/5.Posting.rst b/Documentation/translations/zh_CN/process/5.Posting.rst
index b0c6561..4ee7de1 100644
--- a/Documentation/translations/zh_CN/process/5.Posting.rst
+++ b/Documentation/translations/zh_CN/process/5.Posting.rst
@@ -23,7 +23,7 @@
:ref:`Documentation/translations/zh_CN/process/submitting-drivers.rst <cn_submittingdrivers>`
和 :ref:`Documentation/translations/zh_CN/process/submit-checklist.rst <cn_submitchecklist>`。
-何时邮寄
+何时寄送
--------
在补丁完全“准备好”之前,避免发布补丁是一种持续的诱惑。对于简单的补丁,这
@@ -142,7 +142,7 @@
一般来说,你越把自己放在每个阅读你变更日志的人的位置上,变更日志(和内核
作为一个整体)就越好。
-不消说,变更日志是将变更提交到版本控制系统时使用的文本。接下来将是:
+不需要说,变更日志是将变更提交到版本控制系统时使用的文本。接下来将是:
- 补丁本身,采用统一的(“-u”)补丁格式。使用“-p”选项来diff将使函数名与
更改相关联,从而使结果补丁更容易被其他人读取。
@@ -186,10 +186,10 @@
在补丁中添加标签时要小心:只有Cc:才适合在没有指定人员明确许可的情况下添加。
-发送补丁
+寄送补丁
--------
-在寄出补丁之前,您还需要注意以下几点:
+在寄送补丁之前,您还需要注意以下几点:
- 您确定您的邮件发送程序不会损坏补丁吗?被邮件客户端更改空白或修饰了行的补丁
无法被另一端接受,并且通常不会进行任何详细检查。如果有任何疑问,先把补丁寄
diff --git a/Documentation/translations/zh_CN/process/howto.rst b/Documentation/translations/zh_CN/process/howto.rst
index ee3dee4..2903d71 100644
--- a/Documentation/translations/zh_CN/process/howto.rst
+++ b/Documentation/translations/zh_CN/process/howto.rst
@@ -381,7 +381,7 @@
内核社区的工作模式同大多数传统公司开发队伍的工作模式并不相同。下面这些例
子,可以帮助你避免某些可能发生问题:
-用这些话介绍你的修改提案会有好处:
+用这些话介绍你的修改提案会有好处:(在任何时候你都不应该用中文写提案)
- 它同时解决了多个问题
- 它删除了2000行代码
@@ -448,8 +448,8 @@
保证修改分成很多小块,这样在整个项目都准备好被包含进内核之前,其中的一部
分可能会先被接收。
-必须了解这样做是不可接受的:试图将未完成的工作提交进内核,然后再找时间修
-复。
+你必须明白这么做是无法令人接受的:试图将不完整的代码提交进内核,然后再找
+时间修复。
证明修改的必要性
@@ -475,8 +475,8 @@
https://www.ozlabs.org/~akpm/stuff/tpp.txt
-这些事情有时候做起来很难。要在任何方面都做到完美可能需要好几年时间。这是
-一个持续提高的过程,它需要大量的耐心和决心。只要不放弃,你一定可以做到。
+这些事情有时候做起来很难。想要在任何方面都做到完美可能需要好几年时间。这
+是一个持续提高的过程,它需要大量的耐心和决心。只要不放弃,你一定可以做到。
很多人已经做到了,而他们都曾经和现在的你站在同样的起点上。
diff --git a/Documentation/translations/zh_CN/process/submitting-patches.rst b/Documentation/translations/zh_CN/process/submitting-patches.rst
index 4fc6d16..3296b8f 100644
--- a/Documentation/translations/zh_CN/process/submitting-patches.rst
+++ b/Documentation/translations/zh_CN/process/submitting-patches.rst
@@ -127,8 +127,8 @@
URL来查找补丁描述并将其放入补丁中。也就是说,补丁(系列)及其描述应该是独立的。
这对维护人员和审查人员都有好处。一些评审者可能甚至没有收到补丁的早期版本。
-描述你在命令语气中的变化,例如“make xyzzy do frotz”而不是“[这个补丁]make
-xyzzy do frotz”或“[我]changed xyzzy to do frotz”,就好像你在命令代码库改变
+描述你在命令语气中的变化,例如“make xyzzy do frotz”而不是“[This patch]make
+xyzzy do frotz”或“[I]changed xyzzy to do frotz”,就好像你在命令代码库改变
它的行为一样。
如果修补程序修复了一个记录的bug条目,请按编号和URL引用该bug条目。如果补丁来
diff --git a/Documentation/translations/zh_TW/index.rst b/Documentation/translations/zh_TW/index.rst
index 2a28103..f56f78b 100644
--- a/Documentation/translations/zh_TW/index.rst
+++ b/Documentation/translations/zh_TW/index.rst
@@ -140,11 +140,6 @@
體系結構無關文檔
----------------
-.. toctree::
- :maxdepth: 2
-
- arm64/index
-
TODOList:
* asm-annotations
@@ -152,6 +147,11 @@
特定體系結構文檔
----------------
+.. toctree::
+ :maxdepth: 2
+
+ arm64/index
+
TODOList:
* arch
diff --git a/Documentation/vm/page_migration.rst b/Documentation/vm/page_migration.rst
index db9d7e5..08810f5 100644
--- a/Documentation/vm/page_migration.rst
+++ b/Documentation/vm/page_migration.rst
@@ -205,7 +205,7 @@
In this function, the driver should put the isolated page back into its own data
structure.
-4. non-LRU movable page flags
+Non-LRU movable page flags
There are two page flags for supporting non-LRU movable page.