| Documentation for /proc/sys/net/* | 
 | 	(c) 1999		Terrehon Bowden <terrehon@pacbell.net> | 
 | 				Bodo Bauer <bb@ricochet.net> | 
 | 	(c) 2000		Jorge Nerin <comandante@zaralinux.com> | 
 | 	(c) 2009		Shen Feng <shen@cn.fujitsu.com> | 
 |  | 
 | For general info and legal blurb, please look in README. | 
 |  | 
 | ============================================================== | 
 |  | 
 | This file contains the documentation for the sysctl files in | 
 | /proc/sys/net | 
 |  | 
 | The interface to the networking parts of the kernel is located in | 
 | /proc/sys/net. The following table shows all possible subdirectories. You may | 
 | see only some of them, depending on your kernel's configuration. | 
 |  | 
 |  | 
 | Table : Subdirectories in /proc/sys/net | 
 | .............................................................................. | 
 |  Directory Content             Directory  Content | 
 |  core      General parameter   appletalk  Appletalk protocol | 
 |  unix      Unix domain sockets netrom     NET/ROM | 
 |  802       E802 protocol       ax25       AX25 | 
 |  ethernet  Ethernet protocol   rose       X.25 PLP layer | 
 |  ipv4      IP version 4        x25        X.25 protocol | 
 |  ipx       IPX                 token-ring IBM token ring | 
 |  bridge    Bridging            decnet     DEC net | 
 |  ipv6      IP version 6        tipc       TIPC | 
 | .............................................................................. | 
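
The files in these subdirectories can be read like ordinary text files. The
following is a minimal sketch, assuming the usual convention that a dotted
sysctl name such as net.core.rmem_max maps to a path below /proc/sys. The
helper names sysctl_path and read_sysctl are illustrative only, and the simple
split on dots does not handle entries whose names themselves contain dots.

```python
from pathlib import Path


def sysctl_path(name, root="/proc/sys"):
    """Map a dotted sysctl name (e.g. 'net.core.rmem_max') to a file path."""
    return Path(root).joinpath(*name.split("."))


def read_sysctl(name, root="/proc/sys"):
    """Return the value as a stripped string, or None if it cannot be read."""
    try:
        return sysctl_path(name, root).read_text().strip()
    except OSError:
        # The entry may be absent, depending on the kernel's configuration.
        return None


if __name__ == "__main__":
    print(read_sysctl("net.core.rmem_max"))
```

Returning None for a missing entry reflects that only some of the
subdirectories exist on a given kernel.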
 |  | 
 | 1. /proc/sys/net/core - Network core options | 
 | ------------------------------------------------------- | 
 |  | 
 | bpf_jit_enable | 
 | -------------- | 
 |  | 
 | This enables the BPF Just in Time (JIT) compiler. BPF is a flexible | 
 | and efficient infrastructure that allows bytecode to be executed at | 
 | various hook points. It is used in a number of Linux kernel subsystems such | 
 | as networking (e.g. XDP, tc), tracing (e.g. kprobes, uprobes, tracepoints) | 
 | and security (e.g. seccomp). LLVM has a BPF back end that can compile | 
 | restricted C into a sequence of BPF instructions. After a program is loaded | 
 | through bpf(2) and has passed the verifier in the kernel, a JIT will then | 
 | translate these BPF proglets into native CPU instructions. There are | 
 | two flavors of JITs. The newer eBPF JIT is currently supported on: | 
 |   - x86_64 | 
 |   - x86_32 | 
 |   - arm64 | 
 |   - arm32 | 
 |   - ppc64 | 
 |   - sparc64 | 
 |   - mips64 | 
 |   - s390x | 
 |   - riscv | 
 |  | 
 | The older cBPF JIT is supported on the following archs: | 
 |   - mips | 
 |   - ppc | 
 |   - sparc | 
 |  | 
 | eBPF JITs are a superset of cBPF JITs, meaning the kernel will | 
 | migrate cBPF instructions into eBPF instructions and then JIT | 
 | compile them transparently. Older cBPF JITs can only translate | 
 | tcpdump filters, seccomp rules, etc., but not the eBPF programs | 
 | mentioned above that are loaded through bpf(2). | 
 |  | 
 | Values : | 
 | 	0 - disable the JIT (default value) | 
 | 	1 - enable the JIT | 
 | 	2 - enable the JIT and ask the compiler to emit traces to the kernel log. | 
 |  | 
 | bpf_jit_harden | 
 | -------------- | 
 |  | 
 | This enables hardening for the BPF JIT compiler. Only eBPF JIT | 
 | backends are supported. Enabling hardening costs some performance, | 
 | but can mitigate JIT spraying. | 
 | Values : | 
 | 	0 - disable JIT hardening (default value) | 
 | 	1 - enable JIT hardening for unprivileged users only | 
 | 	2 - enable JIT hardening for all users | 
 |  | 
 | bpf_jit_kallsyms | 
 | ---------------- | 
 |  | 
 | When the BPF JIT compiler is enabled, the addresses of compiled images | 
 | are unknown to the kernel, meaning they show up neither in traces nor | 
 | in /proc/kallsyms. This sysctl enables export of these addresses, which | 
 | can be used for debugging/tracing. If bpf_jit_harden is enabled, this | 
 | feature is disabled. | 
 | Values : | 
 | 	0 - disable JIT kallsyms export (default value) | 
 | 	1 - enable JIT kallsyms export for privileged users only | 
 |  | 
 | bpf_jit_limit | 
 | ------------- | 
 |  | 
 | This enforces a global limit for memory allocations to the BPF JIT | 
 | compiler in order to reject unprivileged JIT requests once it has | 
 | been surpassed. bpf_jit_limit contains the value of the global limit | 
 | in bytes. | 
 |  | 
 | dev_weight | 
 | ---------- | 
 |  | 
 | The maximum number of packets that the kernel can handle on a NAPI interrupt. | 
 | This is a per-CPU variable. For drivers that support LRO or GRO_HW, a hardware | 
 | aggregated packet is counted as one packet in this context. | 
 |  | 
 | Default: 64 | 
 |  | 
 | dev_weight_rx_bias | 
 | ------------------ | 
 |  | 
 | RPS (e.g. RFS, aRFS) processing competes with the registered NAPI poll function | 
 | of the driver for the per-softirq-cycle netdev_budget. This parameter influences | 
 | the proportion of the configured netdev_budget that is spent on RPS-based packet | 
 | processing during RX softirq cycles. It is also meant to make the current | 
 | dev_weight adaptable to asymmetric CPU needs on the RX/TX sides of the network | 
 | stack (see dev_weight_tx_bias). It is effective on a per-CPU basis. The value is | 
 | derived from dev_weight and calculated multiplicatively (dev_weight * dev_weight_rx_bias). | 
 | Default: 1 | 
 |  | 
 | dev_weight_tx_bias | 
 | ------------------ | 
 |  | 
 | Scales the maximum number of packets that can be processed during a TX softirq cycle. | 
 | Effective on a per-CPU basis. Allows scaling of the current dev_weight for asymmetric | 
 | net stack processing needs. Be careful to avoid making TX softirq processing a CPU hog. | 
 | The calculation is based on dev_weight (dev_weight * dev_weight_tx_bias). | 
 | Default: 1 | 
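
The multiplicative relation described for dev_weight_rx_bias and
dev_weight_tx_bias can be written out as a small calculation. This is only an
arithmetic sketch of the two formulas quoted above; the function name
effective_weights is made up for illustration and the defaults are the ones
documented in this file.

```python
def effective_weights(dev_weight=64, rx_bias=1, tx_bias=1):
    """Apply the documented formulas dev_weight * bias for the RX and TX sides."""
    return {
        "rx": dev_weight * rx_bias,  # dev_weight * dev_weight_rx_bias
        "tx": dev_weight * tx_bias,  # dev_weight * dev_weight_tx_bias
    }


if __name__ == "__main__":
    # With the documented defaults both sides stay at dev_weight.
    print(effective_weights())
    # Doubling only the RX bias doubles only the RX side.
    print(effective_weights(rx_bias=2))
```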
 |  | 
 | default_qdisc | 
 | ------------- | 
 |  | 
 | The default queuing discipline to use for network devices. This allows | 
 | overriding the default of pfifo_fast with an alternative. Since the default | 
 | queuing discipline is created without additional parameters, it is best suited | 
 | to queuing disciplines that work well without configuration, like stochastic | 
 | fair queue (sfq), CoDel (codel) or fair queue CoDel (fq_codel). Don't use | 
 | queuing disciplines like Hierarchical Token Bucket or Deficit Round Robin, | 
 | which require setting up classes and bandwidths. Note that physical multiqueue | 
 | interfaces still use mq as root qdisc, which in turn uses this default for its | 
 | leaves. Virtual devices (e.g. lo or veth) ignore this setting and instead | 
 | default to noqueue. | 
 | Default: pfifo_fast | 
 |  | 
 | busy_read | 
 | --------- | 
 | Low latency busy poll timeout for socket reads (needs CONFIG_NET_RX_BUSY_POLL). | 
 | Approximate time in microseconds to busy loop waiting for packets on the device | 
 | queue. This sets the default value of the SO_BUSY_POLL socket option. | 
 | It can be set or overridden per socket by setting the socket option SO_BUSY_POLL, | 
 | which is the preferred method of enabling. If you need to enable the feature | 
 | globally via sysctl, a value of 50 is recommended. | 
 | Enabling this will increase power usage. | 
 | Default: 0 (off) | 
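
Setting SO_BUSY_POLL on a single socket, the method preferred above, might
look like the following sketch. It rests on assumptions: the fallback value 46
is taken from the generic Linux socket headers and differs on some
architectures, and the helper name set_busy_poll is illustrative only. The
kernel may refuse the option, so the helper reports failure instead of raising.

```python
import socket

# Assumption: 46 is SO_BUSY_POLL in the generic Linux headers. It is used only
# when the Python socket module does not export the constant itself.
SO_BUSY_POLL = getattr(socket, "SO_BUSY_POLL", 46)


def set_busy_poll(sock, usecs):
    """Request a per-socket busy poll timeout; return True on success."""
    try:
        sock.setsockopt(socket.SOL_SOCKET, SO_BUSY_POLL, usecs)
        return True
    except OSError:
        # The kernel rejected the request, for example because busy polling
        # support (CONFIG_NET_RX_BUSY_POLL) is not available.
        return False


if __name__ == "__main__":
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        print("busy poll enabled:", set_busy_poll(s, 50))
```

The value 50 in the usage line is the one recommended above for the global
sysctl.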
 |  | 
 | busy_poll | 
 | --------- | 
 | Low latency busy poll timeout for poll and select (needs CONFIG_NET_RX_BUSY_POLL). | 
 | Approximate time in microseconds to busy loop waiting for events. | 
 | The recommended value depends on the number of sockets you poll on: | 
 | for several sockets use 50, for several hundred use 100. | 
 | For more than that you probably want to use epoll. | 
 | Note that only sockets with SO_BUSY_POLL set will be busy polled, | 
 | so you want to either selectively set SO_BUSY_POLL on those sockets or set | 
 | the net.core.busy_read sysctl globally. | 
 | Enabling this will increase power usage. | 
 | Default: 0 (off) | 
 |  | 
 | rmem_default | 
 | ------------ | 
 |  | 
 | The default setting of the socket receive buffer in bytes. | 
 |  | 
 | rmem_max | 
 | -------- | 
 |  | 
 | The maximum receive socket buffer size in bytes. | 
 |  | 
 | tstamp_allow_data | 
 | ----------------- | 
 | Allow processes to receive tx timestamps looped together with the original | 
 | packet contents. If disabled, transmit timestamp requests from unprivileged | 
 | processes are dropped unless socket option SOF_TIMESTAMPING_OPT_TSONLY is set. | 
 | Default: 1 (on) | 
 |  | 
 |  | 
 | wmem_default | 
 | ------------ | 
 |  | 
 | The default setting (in bytes) of the socket send buffer. | 
 |  | 
 | wmem_max | 
 | -------- | 
 |  | 
 | The maximum send socket buffer size in bytes. | 
 |  | 
 | message_burst and message_cost | 
 | ------------------------------ | 
 |  | 
 | These parameters are used to limit the warning messages written to the kernel | 
 | log from the networking code. They enforce a rate limit to make a | 
 | denial-of-service attack impossible. A higher message_cost factor results in | 
 | fewer messages being written. message_burst controls when messages will | 
 | be dropped. The default settings limit warning messages to one every five | 
 | seconds. | 
 |  | 
 | warnings | 
 | -------- | 
 |  | 
 | This sysctl is now unused. | 
 |  | 
 | This was used to control console messages from the networking stack that | 
 | occur because of problems on the network like duplicate address or bad | 
 | checksums. | 
 |  | 
 | These messages are now emitted at KERN_DEBUG and can generally be enabled | 
 | and controlled by the dynamic_debug facility. | 
 |  | 
 | netdev_budget | 
 | ------------- | 
 |  | 
 | Maximum number of packets taken from all interfaces in one polling cycle (NAPI | 
 | poll). In one polling cycle interfaces which are registered to polling are | 
 | probed in a round-robin manner. Also, a polling cycle may not exceed | 
 | netdev_budget_usecs microseconds, even if netdev_budget has not been | 
 | exhausted. | 
 |  | 
 | netdev_budget_usecs | 
 | ------------------- | 
 |  | 
 | Maximum number of microseconds in one NAPI polling cycle. Polling | 
 | will exit when either netdev_budget_usecs have elapsed during the | 
 | poll cycle or the number of packets processed reaches netdev_budget. | 
 |  | 
 | netdev_max_backlog | 
 | ------------------ | 
 |  | 
 | Maximum number of packets queued on the INPUT side when the interface | 
 | receives packets faster than the kernel can process them. | 
 |  | 
 | netdev_rss_key | 
 | -------------- | 
 |  | 
 | RSS (Receive Side Scaling) enabled drivers use a randomly generated | 
 | 40-byte host key. | 
 | Some user space programs might need to read its content even if drivers | 
 | do not provide ethtool -x support yet. | 
 |  | 
 | myhost:~# cat /proc/sys/net/core/netdev_rss_key | 
 | 84:50:f4:00:a8:15:d1:a7:e9:7f:1d:60:35:c7:47:25:42:97:74:ca:56:bb:b6:a1:d8: ... (52 bytes total) | 
 |  | 
 | The file contains NUL bytes if no driver has ever called the | 
 | netdev_rss_key_fill() function. | 
 | Note: | 
 | /proc/sys/net/core/netdev_rss_key contains 52 bytes of key, | 
 | but most drivers only use 40 bytes of it. | 
 |  | 
 | myhost:~# ethtool -x eth0 | 
 | RX flow hash indirection table for eth0 with 8 RX ring(s): | 
 |     0:    0     1     2     3     4     5     6     7 | 
 | RSS hash key: | 
 | 84:50:f4:00:a8:15:d1:a7:e9:7f:1d:60:35:c7:47:25:42:97:74:ca:56:bb:b6:a1:d8:43:e3:c9:0c:fd:17:55:c2:3a:4d:69:ed:f1:42:89 | 
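
A user space program that gathers the key has to turn the colon-separated hex
text into bytes. The following is a minimal sketch of that step, using the
40-byte key from the ethtool output above as sample input. The function names
parse_rss_key and is_unset are illustrative only.

```python
def parse_rss_key(text):
    """Convert 'aa:bb:cc...' into bytes; raises ValueError on malformed input."""
    return bytes(int(part, 16) for part in text.strip().split(":"))


def is_unset(key):
    """True if the key consists only of zero bytes (never filled by a driver)."""
    return not any(key)


# Sample taken from the ethtool -x output shown above.
SAMPLE_KEY = (
    "84:50:f4:00:a8:15:d1:a7:e9:7f:1d:60:35:c7:47:25:42:97:74:ca:"
    "56:bb:b6:a1:d8:43:e3:c9:0c:fd:17:55:c2:3a:4d:69:ed:f1:42:89"
)

if __name__ == "__main__":
    key = parse_rss_key(SAMPLE_KEY)
    print(len(key), "bytes, unset:", is_unset(key))
```

When reading /proc/sys/net/core/netdev_rss_key directly, the parsed result
would be 52 bytes long, of which most drivers use the first 40.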
 |  | 
 | netdev_tstamp_prequeue | 
 | ---------------------- | 
 |  | 
 | If set to 0, RX packet timestamps can be sampled after RPS processing, when | 
 | the target CPU processes packets. This might delay timestamps somewhat, but | 
 | permits distributing the load across several CPUs. | 
 |  | 
 | If set to 1 (default), timestamps are sampled as soon as possible, before | 
 | queueing. | 
 |  | 
 | optmem_max | 
 | ---------- | 
 |  | 
 | Maximum ancillary buffer size allowed per socket. Ancillary data is a sequence | 
 | of struct cmsghdr structures with appended data. | 
 |  | 
 | fb_tunnels_only_for_init_net | 
 | ---------------------------- | 
 |  | 
 | Controls whether fallback tunnels (like tunl0, gre0, gretap0, erspan0, | 
 | sit0, ip6tnl0, ip6gre0) are automatically created when a new | 
 | network namespace is created, if the corresponding tunnel is present | 
 | in the initial network namespace. | 
 | If set to 1, these devices are not automatically created, and | 
 | user space is responsible for creating them if needed. | 
 |  | 
 | Default : 0  (for compatibility reasons) | 
 |  | 
 | devconf_inherit_init_net | 
 | ------------------------ | 
 |  | 
 | Controls if a new network namespace should inherit all current | 
 | settings under /proc/sys/net/{ipv4,ipv6}/conf/{all,default}/. By | 
 | default, we keep the current behavior: for IPv4 we inherit all current | 
 | settings from init_net and for IPv6 we reset all settings to default. | 
 |  | 
 | If set to 1, both IPv4 and IPv6 settings are forced to inherit from | 
 | current ones in init_net. If set to 2, both IPv4 and IPv6 settings are | 
 | forced to reset to their default values. | 
 |  | 
 | Default : 0  (for compatibility reasons) | 
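
The three documented values of devconf_inherit_init_net can be summarized as a
lookup table. This sketch only restates the behavior described above; the
names DEVCONF_INHERIT and new_netns_policy are made up for illustration.

```python
INHERIT = "inherit current settings from init_net"
RESET = "reset to default values"

# What a new network namespace does per address family, per sysctl value.
DEVCONF_INHERIT = {
    0: {"ipv4": INHERIT, "ipv6": RESET},    # default, kept for compatibility
    1: {"ipv4": INHERIT, "ipv6": INHERIT},
    2: {"ipv4": RESET, "ipv6": RESET},
}


def new_netns_policy(value):
    """Return the documented IPv4/IPv6 behavior for a sysctl value."""
    if value not in DEVCONF_INHERIT:
        raise ValueError("undocumented value: %r" % (value,))
    return DEVCONF_INHERIT[value]


if __name__ == "__main__":
    for v in sorted(DEVCONF_INHERIT):
        print(v, new_netns_policy(v))
```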
 |  | 
 | 2. /proc/sys/net/unix - Parameters for Unix domain sockets | 
 | ------------------------------------------------------- | 
 |  | 
 | There is only one file in this directory. | 
 | unix_dgram_qlen limits the maximum number of datagrams queued in a Unix domain | 
 | socket's buffer. It will not take effect unless the PF_UNIX flag is specified. | 
 |  | 
 |  | 
 | 3. /proc/sys/net/ipv4 - IPv4 settings | 
 | ------------------------------------------------------- | 
 | Please see: Documentation/networking/ip-sysctl.txt and ipvs-sysctl.txt for | 
 | descriptions of these entries. | 
 |  | 
 |  | 
 | 4. Appletalk | 
 | ------------------------------------------------------- | 
 |  | 
 | The /proc/sys/net/appletalk directory holds the Appletalk configuration data | 
 | when Appletalk is loaded. The configurable parameters are: | 
 |  | 
 | aarp-expiry-time | 
 | ---------------- | 
 |  | 
 | The amount of time we keep an AARP entry before expiring it. Used to age out | 
 | old hosts. | 
 |  | 
 | aarp-resolve-time | 
 | ----------------- | 
 |  | 
 | The amount of time we will spend trying to resolve an Appletalk address. | 
 |  | 
 | aarp-retransmit-limit | 
 | --------------------- | 
 |  | 
 | The number of times we will retransmit a query before giving up. | 
 |  | 
 | aarp-tick-time | 
 | -------------- | 
 |  | 
 | Controls the rate at which entries are checked for expiry. | 
 |  | 
 | The directory /proc/net/appletalk holds the list of active Appletalk sockets | 
 | on a machine. | 
 |  | 
 | The fields indicate the DDP type, the local address (in network:node format), | 
 | the remote address, the size of the transmit pending queue, the size of the | 
 | received queue (bytes waiting for applications to read), the state and the uid | 
 | owning the socket. | 
 |  | 
 | /proc/net/atalk_iface lists all the interfaces configured for Appletalk. It | 
 | shows the name of the interface, its Appletalk address, the network range on | 
 | that address (or network number for phase 1 networks), and the status of the | 
 | interface. | 
 |  | 
 | /proc/net/atalk_route lists each known network route. It lists the target | 
 | (network) that the route leads to, the router (may be directly connected), the | 
 | route flags, and the device the route is using. | 
 |  | 
 |  | 
 | 5. IPX | 
 | ------------------------------------------------------- | 
 |  | 
 | The IPX protocol has no tunable values in /proc/sys/net. | 
 |  | 
 | The IPX protocol does, however, provide /proc/net/ipx. This lists each IPX | 
 | socket, giving the local and remote addresses in Novell format (that is | 
 | network:node:port). In accordance with the strange Novell tradition, | 
 | everything but the port is in hex. Not_Connected is displayed for sockets that | 
 | are not tied to a specific remote address. The Tx and Rx queue sizes indicate | 
 | the number of bytes pending for transmission and reception. The state | 
 | indicates the state the socket is in and the uid is the owning uid of the | 
 | socket. | 
 |  | 
 | The /proc/net/ipx_interface file lists all IPX interfaces. For each interface | 
 | it gives the network number, the node number, and indicates if the network is | 
 | the primary network. It also indicates which device it is bound to (or | 
 | Internal for internal networks) and the Frame Type if appropriate. Linux | 
 | supports 802.3, 802.2, 802.2 SNAP and DIX (Blue Book) ethernet framing for | 
 | IPX. | 
 |  | 
 | The /proc/net/ipx_route table holds a list of IPX routes. For each route it | 
 | gives the destination network, the router node (or Directly) and the network | 
 | address of the router (or Connected) for internal networks. | 
 |  | 
 | 6. TIPC | 
 | ------------------------------------------------------- | 
 |  | 
 | tipc_rmem | 
 | --------- | 
 |  | 
 | The TIPC protocol now has a tunable for the receive memory, similar to | 
 | tcp_rmem, i.e. a vector of 3 integers: (min, default, max) | 
 |  | 
 |     # cat /proc/sys/net/tipc/tipc_rmem | 
 |     4252725 34021800        68043600 | 
 |     # | 
 |  | 
 | The max value is set to CONN_OVERLOAD_LIMIT, and the default and min values | 
 | are scaled (shifted) versions of that same value. Note that the min value | 
 | is not currently used in any meaningful way, but the triplet is | 
 | preserved in order to be consistent with things like tcp_rmem. | 
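
The shifted relation between the three values can be checked against the
sample output above. In that sample the default equals the max shifted right
by 1 and the min equals the max shifted right by 4. These shift amounts are
read off this one sample only and are not stated by the text, so treat them as
an observation rather than a guarantee. The name parse_tipc_rmem is
illustrative.

```python
def parse_tipc_rmem(text):
    """Split the tipc_rmem triplet into (min, default, max) integers."""
    fields = text.split()
    if len(fields) != 3:
        raise ValueError("expected three integers, got %d" % len(fields))
    rmem_min, rmem_default, rmem_max = (int(f) for f in fields)
    return rmem_min, rmem_default, rmem_max


# Sample line copied from the cat output shown above.
SAMPLE_RMEM = "4252725 34021800        68043600"

if __name__ == "__main__":
    rmem_min, rmem_default, rmem_max = parse_tipc_rmem(SAMPLE_RMEM)
    # Shift amounts observed in the sample, not taken from the kernel source.
    print(rmem_max >> 1 == rmem_default, rmem_max >> 4 == rmem_min)
```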
 |  | 
 | named_timeout | 
 | ------------- | 
 |  | 
 | TIPC name table updates are distributed asynchronously in a cluster, without | 
 | any form of transaction handling. This means that different race scenarios are | 
 | possible. One such is that a name withdrawal sent out by one node and received | 
 | by another node may arrive after a second, overlapping name publication already | 
 | has been accepted from a third node, although the conflicting updates | 
 | originally may have been issued in the correct sequential order. | 
 | If named_timeout is nonzero, failed topology updates will be placed on a defer | 
 | queue until another event arrives that clears the error, or until the timeout | 
 | expires. Value is in milliseconds. |