  1. Mar 08, 2019
  2. Mar 07, 2019
  3. Mar 02, 2019
    • bpf: HBM test script · 4ffd44cf
      brakmo authored
      
      Script for testing HBM (Host Bandwidth Manager) framework.
      It creates a cgroup to use for testing and loads a BPF program to limit
      egress bandwidth. It then uses iperf3 or netperf to create
      loads. The output is the goodput in Mbps (unless -D is used).
      
      It can work on a single host using loopback or between two hosts (with netperf).
      When using loopback, it is recommended to also introduce a delay of at least
      1ms (-d=1), otherwise the assigned bandwidth is likely to be underutilized.
      
      USAGE: $name [out] [-b=<prog>|--bpf=<prog>] [-c=<cc>|--cc=<cc>] [-D]
                   [-d=<delay>|--delay=<delay>] [--debug] [-E]
                   [-f=<#flows>|--flows=<#flows>] [-h] [-i=<id>|--id=<id>] [-l]
                   [-N] [-p=<port>|--port=<port>] [-P] [-q=<qdisc>]
                   [-R] [-s=<server>|--server=<server>] [--stats]
                   [-t=<time>|--time=<time>] [-w] [cubic|dctcp]
        Where:
          out               Egress (default egress)
          -b or --bpf       BPF program filename to load and attach.
                            Default is hbm_out_kern.o for egress.
          -c or --cc        TCP congestion control (cubic or dctcp)
          -d or --delay     Add a delay in ms using netem
          -D                In addition to the goodput in Mbps, it also outputs
                            other detailed information. This information is
                            test dependent (i.e. iperf3 or netperf).
          --debug           Print BPF trace buffer
          -E                Enable ECN (not required for dctcp)
          -f or --flows     Number of concurrent flows (default=1)
          -i or --id        cgroup id (an integer, default is 1)
          -l                Do not limit flows using loopback
          -N                Use netperf instead of iperf3
          -h                Help
          -p or --port      iperf3 port (default is 5201)
          -P                Use an iperf3 instance for each flow
          -q                Use the specified qdisc.
          -r or --rate      Rate in Mbps (default is 1Gbps)
          -R                Use TCP_RR for netperf. 1st flow has req
                            size of 10KB, rest of 1MB. Reply in all
                            cases is 1 byte.
                            More detailed output for each flow can be found
                            in the files netperf.<cg>.<flow>, where <cg> is the
                            cgroup id as specified with the -i flag, and <flow>
                            is the flow id starting at 1 and increasing by 1 for
                            each flow (as specified by -f).
          -s or --server    Hostname of netperf server. Used to create netperf
                            test traffic between two hosts (default is within host).
                            netserver must be running on the host.
          --stats           Get HBM stats (marked, dropped, etc.)
          -t or --time      duration of iperf3 in seconds (default=5)
          -w                Work conserving flag. cgroup can increase its
                            bandwidth beyond the rate limit specified
                            while there is available bandwidth. Current
                            implementation assumes there is only one NIC
                            (eth0), but can be extended to support multiple
                            NICs. This is just a proof of concept.
          cubic or dctcp    specify TCP CC to use
      
      Examples:
       ./do_hbm_test.sh -l -d=1 -D --stats
           Runs a 5 second test, using a single iperf3 flow and with the default
           rate limit of 1Gbps and a delay of 1ms (using netem) using the default
           TCP congestion control on the loopback device (hence we use "-l" to
           enforce bandwidth limit on loopback device). Since no direction is
           specified, it defaults to egress. Since no TCP CC algorithm is
           specified it uses the system default (Cubic for this test).
           With no -D flag, only the value of the AGGREGATE OUTPUT would show.
           id refers to the cgroup id and is useful when running multi cgroup
           tests (supported by a future patch).
           This patchset does not support calling TCP's congestion window
           reduction, even when packets are dropped by the BPF program, resulting
           in a large number of packets dropped. It is recommended that the current
           HBM implementation only be used with ECN enabled flows. A future patch
           will add support for reducing TCP's cwnd and will increase the
           performance of non-ECN enabled flows.
         Output:
           Details for HBM in cgroup 1
           id:1
           rate_mbps:493
           duration:4.8 secs
           packets:11355
           bytes_MB:590
           pkts_dropped:4497
           bytes_dropped_MB:292
           pkts_marked_percent: 39.60
           bytes_marked_percent: 49.49
           pkts_dropped_percent: 39.60
           bytes_dropped_percent: 49.49
           PING AVG DELAY:2.075
           AGGREGATE_GOODPUT:505
      
      ./do_hbm_test.sh -l -d=1 -D --stats dctcp
           Same as above but using dctcp. Note that fewer bytes are dropped
           (0.01% vs. 49%).
         Output:
           Details for HBM in cgroup 1
           id:1
           rate_mbps:945
           duration:4.9 secs
           packets:16859
           bytes_MB:578
           pkts_dropped:1
           bytes_dropped_MB:0
           pkts_marked_percent: 28.74
           bytes_marked_percent: 45.15
           pkts_dropped_percent:  0.01
           bytes_dropped_percent:  0.01
           PING AVG DELAY:2.083
           AGGREGATE_GOODPUT:965
      
      ./do_hbm_test.sh -d=1 -D --stats
           As first example, but without limiting loopback device (i.e. no
           "-l" flag). Since there is no bandwidth limiting, no details for
           HBM are printed out.
         Output:
           Details for HBM in cgroup 1
           PING AVG DELAY:2.019
           AGGREGATE_GOODPUT:42655
      
      ./do_hbm_test.sh -l -d=1 -D --stats -f=2
           Uses iperf3 with 2 flows.
      ./do_hbm_test.sh -l -d=1 -D --stats -f=4 -P
           Uses iperf3 with 4 flows, each flow as a separate process.
      ./do_hbm_test.sh -l -d=1 -D --stats -f=4 -N
           Uses netperf with 4 flows.
      ./do_hbm_test.sh -f=1 -r=2000 -t=5 -N -D --stats dctcp -s=<server-name>
           Uses netperf between two hosts. The remote host name is specified
           with -s= and you need to start the program netserver manually on
           the remote host. It will use 1 flow, a rate limit of 2Gbps and dctcp.
      ./do_hbm_test.sh -f=1 -r=2000 -t=5 -N -D --stats -w dctcp \
           -s=<server-name>
           As previous, but allows use of extra bandwidth. For this test the
           rate is 8Gbps vs. 1Gbps of the previous test.
      
      Signed-off-by: Lawrence Brakmo <brakmo@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • bpf: User program for testing HBM · a1270fe9
      brakmo authored
      The program hbm creates a cgroup and attaches a BPF program to the
      cgroup for testing HBM (Host Bandwidth Manager) for egress traffic.
      One still needs to create network traffic. This can be done through
      netesto, netperf or iperf3.
      A follow-up patch contains a script to create traffic.
      
      USAGE: hbm [-d] [-l] [-n <id>] [-r <rate>] [-s] [-t <secs>]
                 [-w] [-h] [prog]
        Where:
         -d        Print BPF trace debug buffer
         -l        Also limit flows doing loopback
         -n <#>    To create cgroup "/hbm#" and attach prog. Default is /hbm1.
                   This is convenient when testing HBM in more than 1 cgroup.
         -r <rate> Rate limit in Mbps
         -s        Get HBM stats (marked, dropped, etc.)
         -t <time> Exit after specified seconds (default is 0)
         -w        Work conserving flag. cgroup can increase its bandwidth
                   beyond the rate limit specified while there is available
                   bandwidth. Current implementation assumes there is only
                   one NIC (eth0), but can be extended to support multiple
                   NICs. Currently only supported for egress. Note, this is
                   just a proof of concept.
         -h        Print this info
         prog      BPF program file name. Name defaults to hbm_out_kern.o
      
      More information about HBM can be found in the paper "BPF Host Resource
      Management" presented at the 2018 Linux Plumbers Conference, Networking Track
      (http://vger.kernel.org/lpc_net2018_talks/LPC%20BPF%20Network%20Resource%20Paper.pdf).
      
      Signed-off-by: Lawrence Brakmo <brakmo@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • bpf: Sample HBM BPF program to limit egress bw · 187d0738
      brakmo authored
      
      A cgroup skb BPF program to limit cgroup output bandwidth.
      It uses a modified virtual token bucket queue to limit average
      egress bandwidth. The implementation uses credits instead of tokens.
      Negative credits imply that queueing would have happened. (This is
      a virtual queue, so no queueing is done by it. However, queueing may
      occur at the actual qdisc, which is not used for rate limiting.)
      
      This implementation uses 3 thresholds, one to start marking packets and
      the other two to drop packets:
                                       CREDIT
             - <--------------------------|------------------------> +
                   |    |          |      0
                   |  Large pkt    |
                   |  drop thresh  |
        Small pkt drop             Mark threshold
            thresh
      
      The effect of marking depends on the type of packet:
      a) If the packet is ECN enabled, then the packet is ECN CE marked.
         The current mark threshold is tuned for DCTCP.
      b) Else, it is dropped if it is a large packet.
      
      If the credit is below the drop threshold, the packet is dropped.
      Note that dropping a packet through the BPF program does not trigger CWR
      (Congestion Window Reduction) in TCP. A future patch will add
      support for triggering CWR.
      
      This BPF program actually uses 2 drop thresholds, one threshold
      for larger packets (>= 120 bytes) and another for smaller packets. This
      protects smaller packets such as SYNs, ACKs, etc.
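
The credit and threshold logic described above can be sketched in plain C as follows. This is an illustration under stated assumptions, not the BPF program from the patch: the names (struct vqueue, vq_packet), all threshold values and the credit cap are hypothetical, and refunding the credit of a dropped packet is a choice made for this sketch. Only the three-threshold layout and the 120-byte boundary for large packets come from the text.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative values only; the patch's real thresholds are not given here. */
#define MAX_CREDIT        100000     /* cap on accumulated credit, in bytes */
#define MARK_THRESH       (-20000)   /* start marking below this credit */
#define LARGE_DROP_THRESH (-40000)   /* drop large packets below this */
#define SMALL_DROP_THRESH (-80000)   /* drop every packet below this */
#define LARGE_PKT_BYTES   120        /* from the text: >= 120 bytes is large */

enum verdict { VQ_PASS, VQ_MARK_CE, VQ_DROP };

struct vqueue {
	int64_t  credit;     /* bytes; negative means a real queue would build */
	uint64_t last_ns;    /* timestamp of the previous packet */
	uint64_t rate_mbps;  /* configured rate limit */
};

enum verdict vq_packet(struct vqueue *q, uint64_t now_ns,
		       uint32_t len, bool ecn_capable)
{
	enum verdict v = VQ_PASS;
	bool large = len >= LARGE_PKT_BYTES;

	/* Earn credit for elapsed time: 1 Mbps for 1 ns is 1/8000 byte. */
	q->credit += (int64_t)((now_ns - q->last_ns) * q->rate_mbps / 8000);
	q->last_ns = now_ns;
	if (q->credit > MAX_CREDIT)
		q->credit = MAX_CREDIT;

	/* Charge the packet. Nothing is queued: the queue is virtual. */
	q->credit -= len;

	if (q->credit < SMALL_DROP_THRESH ||
	    (large && q->credit < LARGE_DROP_THRESH))
		v = VQ_DROP;
	else if (q->credit < MARK_THRESH)
		v = ecn_capable ? VQ_MARK_CE : (large ? VQ_DROP : VQ_PASS);

	/* Sketch choice: a dropped packet gives its bytes back. */
	if (v == VQ_DROP)
		q->credit += len;
	return v;
}
```

With these assumed numbers, a 1500-byte ECN-capable packet arriving at a credit of -25000 is CE marked, while the same packet without ECN is dropped; a 60-byte ACK at the same credit passes untouched.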
      
      The default bandwidth limit is set at 1Gbps but this can be changed by
      a user program through a shared BPF map. In addition, by default this BPF
      program does not limit connections using loopback. This behavior can be
      overridden by the user program. There is also an option to calculate
      some statistics, such as percent of packets marked or dropped, which
      the user program can access.
      
      A later patch provides such a program (hbm.c).
      
      Signed-off-by: Lawrence Brakmo <brakmo@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • samples/bpf: silence compiler warning for xdpsock_user.c · b74e21ab
      Yonghong Song authored
      
      Compiling xdpsock_user.c with GCC 4.8.5, I hit the following
      compilation warning:
          HOSTCC  samples/bpf/xdpsock_user.o
        /data/users/yhs/work/net-next/samples/bpf/xdpsock_user.c: In function ‘main’:
        /data/users/yhs/work/net-next/samples/bpf/xdpsock_user.c:449:6: warning: ‘idx_cq’ may be used uninitialized in this function [-Wmaybe-uninitialized]
          u32 idx_cq, idx_fq;
              ^
        /data/users/yhs/work/net-next/samples/bpf/xdpsock_user.c:606:7: warning: ‘idx_rx’ may be used uninitialized in this function [-Wmaybe-uninitialized]
           u32 idx_rx, idx_tx = 0;
               ^
        /data/users/yhs/work/net-next/samples/bpf/xdpsock_user.c:506:6: warning: ‘idx_rx’ may be used uninitialized in this function [-Wmaybe-uninitialized]
          u32 idx_rx, idx_fq = 0;
      
      As an example, the code pattern looks like:
          u32 idx_fq;
          ...
          ret = xsk_ring_prod__reserve(&xsk->umem->fq, rcvd, &idx_fq);
          if (ret) {
            ...
          }
          ... idx_fq ...
      The compiler warns since it does not know whether idx_fq is assigned
      or not inside the library function xsk_ring_prod__reserve().
      
      Let us assign an initial value 0 to such auto variables to silence
      the compiler warning.
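
A self-contained sketch of the same pattern follows. The ring_reserve() helper is a hypothetical stand-in for a library call such as xsk_ring_prod__reserve(), not the libbpf function itself; it exists only to show an out-parameter that is written on success and left untouched on failure, which is why the caller gives the variable an initial value.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint32_t u32;

/* Hypothetical stand-in for a reserve-style library call: it writes *idx
 * only when all nb entries can be reserved, otherwise leaves it alone. */
size_t ring_reserve(u32 free_entries, size_t nb, u32 *idx)
{
	if (nb > free_entries)
		return 0;   /* failure: *idx is not assigned */
	*idx = 7;       /* arbitrary slot index for this sketch */
	return nb;
}

u32 first_slot(u32 free_entries, size_t nb)
{
	u32 idx_fq = 0;   /* explicit initial value, as the patch does */

	if (ring_reserve(free_entries, nb, &idx_fq) != nb) {
		/* Reservation failed; idx_fq still holds the well-defined 0. */
	}
	return idx_fq;
}
```

Without the "= 0", a compiler that cannot see into ring_reserve() has to assume idx_fq may be read before it is written, which is what the -Wmaybe-uninitialized warning reports.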
      
      Fixes: 248c7f9c ("samples/bpf: convert xdpsock to use libbpf for AF_XDP access")
      Signed-off-by: Yonghong Song <yhs@fb.com>
      Acked-by: Jonathan Lemon <jonathan.lemon@gmail.com>
      Acked-by: Song Liu <songliubraving@fb.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
  4. Feb 28, 2019
  5. Feb 27, 2019
  6. Feb 25, 2019
  7. Feb 22, 2019
  8. Feb 12, 2019
  9. Feb 05, 2019
  10. Feb 01, 2019
  11. Jan 30, 2019
  12. Jan 26, 2019
  13. Jan 25, 2019
  14. Jan 15, 2019
    • samples/bpf: workaround clang asm goto compilation errors · 6bf3bbe1
      Yonghong Song authored
      
      x86 compilation has required asm goto support since 4.17.
      Since clang did not support asm goto at 4.17,
      commit b1ae32db ("x86/cpufeature: Guard asm_volatile_goto usage
      for BPF compilation") worked around the issue by permitting an
      alternative implementation without asm goto for clang.
      
      At 5.0, more asm goto usages appeared.
        [yhs@148 x86]$ egrep -r asm_volatile_goto
        include/asm/cpufeature.h:     asm_volatile_goto("1: jmp 6f\n"
        include/asm/jump_label.h:     asm_volatile_goto("1:"
        include/asm/jump_label.h:     asm_volatile_goto("1:"
        include/asm/rmwcc.h:  asm_volatile_goto (fullop "; j" #cc " %l[cc_label]"     \
        include/asm/uaccess.h:        asm_volatile_goto("\n"                          \
        include/asm/uaccess.h:        asm_volatile_goto("\n"                          \
        [yhs@148 x86]$
      
      When compiling the samples/bpf directory, most bpf programs failed
      compilation with error messages like:
        In file included from /home/yhs/work/bpf-next/samples/bpf/xdp_sample_pkts_kern.c:2:
        In file included from /home/yhs/work/bpf-next/include/linux/ptrace.h:6:
        In file included from /home/yhs/work/bpf-next/include/linux/sched.h:15:
        In file included from /home/yhs/work/bpf-next/include/linux/sem.h:5:
        In file included from /home/yhs/work/bpf-next/include/uapi/linux/sem.h:5:
        In file included from /home/yhs/work/bpf-next/include/linux/ipc.h:9:
        In file included from /home/yhs/work/bpf-next/include/linux/refcount.h:72:
        /home/yhs/work/bpf-next/arch/x86/include/asm/refcount.h:70:9: error: 'asm goto' constructs are not supported yet
              return GEN_BINARY_SUFFIXED_RMWcc(LOCK_PREFIX "subl",
                     ^
        /home/yhs/work/bpf-next/arch/x86/include/asm/rmwcc.h:67:2: note: expanded from macro 'GEN_BINARY_SUFFIXED_RMWcc'
              __GEN_RMWcc(op " %[val], %[var]\n\t" suffix, var, cc,           \
              ^
        /home/yhs/work/bpf-next/arch/x86/include/asm/rmwcc.h:21:2: note: expanded from macro '__GEN_RMWcc'
              asm_volatile_goto (fullop "; j" #cc " %l[cc_label]"             \
              ^
        /home/yhs/work/bpf-next/include/linux/compiler_types.h:188:37: note: expanded from macro 'asm_volatile_goto'
        #define asm_volatile_goto(x...) asm goto(x)
      
      Most of these usages do not even provide an alternative
      implementation, and it is not practical to make changes
      at each call site.
      
      This patch works around the asm goto issue by redefining the macro as below:
        #define asm_volatile_goto(x...) asm volatile("invalid use of asm_volatile_goto")
      
      If asm_volatile_goto is not used by bpf programs, which is typically the case, nothing bad
      will happen. If asm_volatile_goto is used by bpf programs, which is incorrect, the compiler
      will issue an error since "invalid use of asm_volatile_goto" is not valid assembly code.
      
      With this patch, all bpf programs under samples/bpf can pass compilation.
      
      Note that bpf programs under tools/testing/selftests/bpf/ compiled fine as
      they do not access kernel internal headers.
      
      Fixes: e769742d ("Revert "x86/jump-labels: Macrofy inline assembly code to work around GCC inlining bugs"")
      Fixes: 18fe5822 ("x86, asm: change the GEN_*_RMWcc() macros to not quote the condition")
      Acked-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: Yonghong Song <yhs@fb.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    • samples: add binderfs sample program · 9762dc14
      Christian Brauner authored
      
      This adds a simple sample program mounting binderfs and adding, then
      removing a binder device.  Hopefully, it will be helpful to users who want
      to know how binderfs is supposed to be used.
      
      Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
      Signed-off-by: Jonathan Corbet <corbet@lwn.net>
  15. Jan 11, 2019
    • livepatch: Simplify API by removing registration step · 958ef1e3
      Petr Mladek authored
      
      The possibility to re-enable a registered patch was useful for immediate
      patches where the livepatch module had to stay until the system reboot.
      The improved consistency model allows achieving the same result by
      unloading and loading the livepatch module again.
      
      Also we are going to add a feature called atomic replace. It will allow
      creating a patch that would replace all already registered patches.
      The aim is to handle dependent patches more securely. It will obsolete
      the stack of patches that helped to handle the dependencies so far.
      Then it might be unclear when a cumulative patch re-enabling is safe.
      
      It would be complicated to support the many modes. Instead we could
      actually make the API and code easier to understand.
      
      Therefore, remove the two-step public API. All the checks and init calls
      are moved from klp_register_patch() to klp_enable_patch(). Also the patch
      is automatically freed, including the sysfs interface, when the transition
      to the disabled state is completed.
      
      As a result, there is never a disabled patch on the top of the stack.
      Therefore we do not need to check the stack in __klp_enable_patch().
      And we could simplify the check in __klp_disable_patch().
      
      Also the API and logic are much simpler. It is enough to call
      klp_enable_patch() in the module_init() call. The patch can be disabled
      by writing '0' into /sys/kernel/livepatch/<patch>/enabled. Then the module
      can be removed once the transition finishes and the sysfs interface is freed.
      
      The only problem is how to free the structures and kobjects safely.
      The operation is triggered from the sysfs interface. We could not put
      the related kobject from there because it would cause lock inversion
      between klp_mutex and kernfs locks, see kn->count lockdep map.
      
      Therefore, offload the free task to a workqueue. It is perfectly fine:
      
        + The patch can no longer be used in the livepatch operations.
      
        + The module could not be removed until the free operation finishes
          and module_put() is called.
      
        + The operation is asynchronous already when the first
          klp_try_complete_transition() fails and another call
          is queued with a delay.
      
      Suggested-by: Josh Poimboeuf <jpoimboe@redhat.com>
      Signed-off-by: Petr Mladek <pmladek@suse.com>
      Acked-by: Miroslav Benes <mbenes@suse.cz>
      Acked-by: Josh Poimboeuf <jpoimboe@redhat.com>
      Signed-off-by: Jiri Kosina <jkosina@suse.cz>
  16. Jan 10, 2019
  17. Jan 08, 2019
  18. Jan 07, 2019
  19. Dec 23, 2018
  20. Dec 18, 2018
  21. Dec 17, 2018
  22. Dec 12, 2018
    • samples: add an example of seccomp user trap · fec7b669
      Tycho Andersen authored
      
      The idea here is just to give a demonstration of how one could safely use
      the SECCOMP_RET_USER_NOTIF feature to do mount policies. This particular
      policy is (as noted in the comment) not very interesting, but it serves to
      illustrate how one might apply a policy dodging the various TOCTOU issues.
      
      Signed-off-by: Tycho Andersen <tycho@tycho.ws>
      CC: Kees Cook <keescook@chromium.org>
      CC: Andy Lutomirski <luto@amacapital.net>
      CC: Oleg Nesterov <oleg@redhat.com>
      CC: Eric W. Biederman <ebiederm@xmission.com>
      CC: "Serge E. Hallyn" <serge@hallyn.com>
      CC: Christian Brauner <christian@brauner.io>
      CC: Tyler Hicks <tyhicks@canonical.com>
      CC: Akihiro Suda <suda.akihiro@lab.ntt.co.jp>
      Signed-off-by: Kees Cook <keescook@chromium.org>
  23. Dec 03, 2018
  24. Dec 01, 2018
    • kbuild: announce removal of SUBDIRS if used · 0126be38
      Masahiro Yamada authored
      
      SUBDIRS has been kept for backward compatibility since
      commit ("[PATCH] kbuild: external module support") in 2002.
      
      We do not need multiple ways to do the same thing, so I will remove
      SUBDIRS after the Linux 5.3 release. I cleaned up in-tree code, and
      updated the document so that nobody would try to use it.
      
      Meanwhile, display the following warning if SUBDIRS is used.
      
      Makefile:189: ================= WARNING ================
      Makefile:190: 'SUBDIRS' will be removed after Linux 5.3
      Makefile:191: Please use 'M=' or 'KBUILD_EXTMOD' instead
      Makefile:192: ==========================================
      
      Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
      Acked-by: Boris Brezillon <boris.brezillon@bootlin.com> # for scx200_docflash.c
      Acked-by: Guenter Roeck <linux@roeck-us.net> # for scx200_wdt.c
    • samples: bpf: get ifindex from ifname · dc378a1a
      Matteo Croce authored
      
      Find the ifindex with if_nametoindex() instead of requiring the
      numeric ifindex.
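
A minimal sketch of the lookup in C is shown below. The resolve_ifindex() helper is hypothetical and not taken from the patch; in particular, falling back to a numeric argument when the name does not resolve is an assumption of this sketch, added so that an existing numeric invocation would keep working.

```c
#include <net/if.h>
#include <stdlib.h>

/* Accept either an interface name ("eth0") or a numeric ifindex ("3").
 * Returns 0 if the argument resolves to neither. */
unsigned int resolve_ifindex(const char *arg)
{
	unsigned int idx = if_nametoindex(arg);   /* 0 when no such interface */
	unsigned long n;
	char *end;

	if (idx)
		return idx;

	/* Assumed fallback: treat a fully numeric argument as an ifindex. */
	n = strtoul(arg, &end, 10);
	if (*arg != '\0' && *end == '\0')
		return (unsigned int)n;
	return 0;
}
```

A sample would call resolve_ifindex(argv[1]) and exit with an error message when the result is 0.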
      
      Signed-off-by: Matteo Croce <mcroce@redhat.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>