  1. Sep 26, 2016
  2. Sep 21, 2016
  3. Dec 01, 2015
  4. Aug 31, 2015
    • md/raid6: delta syndrome for ARM NEON · 0e833e69
      Ard Biesheuvel authored
      
      This implements XOR syndrome calculation using NEON intrinsics.
      As before, the module can be built for ARM and arm64 from the
      same source.
      
      Relative performance on a Cortex-A57 based system:
      
        raid6: int64x1  gen()   905 MB/s
        raid6: int64x1  xor()   881 MB/s
        raid6: int64x2  gen()  1343 MB/s
        raid6: int64x2  xor()  1286 MB/s
        raid6: int64x4  gen()  1896 MB/s
        raid6: int64x4  xor()  1321 MB/s
        raid6: int64x8  gen()  1773 MB/s
        raid6: int64x8  xor()  1165 MB/s
        raid6: neonx1   gen()  1834 MB/s
        raid6: neonx1   xor()  1278 MB/s
        raid6: neonx2   gen()  2528 MB/s
        raid6: neonx2   xor()  1942 MB/s
        raid6: neonx4   gen()  2888 MB/s
        raid6: neonx4   xor()  2334 MB/s
        raid6: neonx8   gen()  2957 MB/s
        raid6: neonx8   xor()  2232 MB/s
        raid6: using algorithm neonx8 gen() 2957 MB/s
        raid6: .... xor() 2232 MB/s, rmw enabled
      
      Cc: Markus Stockhausen <stockhausen@collogia.de>
      Cc: Neil Brown <neilb@suse.de>
      Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Signed-off-by: NeilBrown <neilb@suse.com>
  5. Jun 11, 2015
  6. May 19, 2015
    • x86/fpu: Rename i387.h to fpu/api.h · df6b35f4
      Ingo Molnar authored
      
      We already have fpu/types.h, move i387.h to fpu/api.h.
      
      The file name has become a misnomer anyway: it offers generic FPU APIs,
      but is not limited to i387 functionality.
      
      Reviewed-by: Borislav Petkov <bp@alien8.de>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  7. Apr 21, 2015
    • md/raid6 algorithms: xor_syndrome() for SSE2 · a582564b
      Markus Stockhausen authored
      
      The second (and last) optimized XOR syndrome calculation. This version
      supports right and left side optimization. All CPUs with architecture
      older than Haswell will benefit from it.
      
      It should be noted that SSE2 movntdq kills performance for memory areas
      that are read and written simultaneously in chunks smaller than cache
      line size. So use movdqa instead for P/Q writes in sse21 and sse22 XOR
      functions.
      
      Signed-off-by: Markus Stockhausen <stockhausen@collogia.de>
      Signed-off-by: NeilBrown <neilb@suse.de>
    • md/raid6 algorithms: xor_syndrome() for generic int · 9a5ce91d
      Markus Stockhausen authored
      
      Start the algorithms with the very basic one. It is left and right
      optimized. That means we can avoid all calculations for unneeded pages
      above the right stop offset. For pages below the left start offset we
      still need the syndrome multiplication but without reading data pages.
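
A minimal byte-wise sketch of the left and right optimization described above, not the kernel's unrolled implementation. The names `xor_syndrome_naive`, `xor_syndrome_opt`, `count_mismatches`, and the `NDATA`/`BLK` sizes are hypothetical; the GF(2^8) reduction polynomial 0x11d is the one conventionally used for RAID6. The optimized variant starts at `stop` (nothing above it is touched) and, below `start`, only keeps multiplying Q by the generator without reading data.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define NDATA 6   /* hypothetical number of data blocks */
#define BLK   8   /* hypothetical block size in bytes */

/* Multiply by the generator {02} in GF(2^8), reduction polynomial 0x11d. */
static uint8_t gf_mul2(uint8_t v)
{
	return (uint8_t)((v << 1) ^ ((v & 0x80) ? 0x1d : 0));
}

/* Unoptimized: walk every block, treating blocks outside [start, stop] as zero. */
static void xor_syndrome_naive(uint8_t d[NDATA][BLK], int start, int stop,
			       uint8_t *p, uint8_t *q)
{
	for (int i = 0; i < BLK; i++) {
		uint8_t wp = 0, wq = 0;

		for (int z = NDATA - 1; z >= 0; z--) {
			uint8_t v = (z >= start && z <= stop) ? d[z][i] : 0;

			wq = gf_mul2(wq) ^ v;
			wp ^= v;
		}
		p[i] ^= wp;
		q[i] ^= wq;
	}
}

/* Left and right optimized variant of the same calculation. */
static void xor_syndrome_opt(uint8_t d[NDATA][BLK], int start, int stop,
			     uint8_t *p, uint8_t *q)
{
	assert(0 <= start && start <= stop && stop < NDATA);

	for (int i = 0; i < BLK; i++) {
		/* Right side: begin at stop, blocks above it are never visited. */
		uint8_t wp = d[stop][i], wq = wp;

		for (int z = stop - 1; z >= start; z--) {
			wq = gf_mul2(wq) ^ d[z][i];
			wp ^= d[z][i];
		}
		/* Left side: syndrome multiplication only, no data reads. */
		for (int z = start - 1; z >= 0; z--)
			wq = gf_mul2(wq);

		p[i] ^= wp;
		q[i] ^= wq;
	}
}

/* Compare both variants for every start <= stop pair; returns mismatches. */
static int count_mismatches(void)
{
	uint8_t d[NDATA][BLK];
	int bad = 0;

	for (int z = 0; z < NDATA; z++)
		for (int i = 0; i < BLK; i++)
			d[z][i] = (uint8_t)(z * 37 + i * 11 + 1);

	for (int start = 0; start < NDATA; start++)
		for (int stop = start; stop < NDATA; stop++) {
			uint8_t p1[BLK] = {0}, q1[BLK] = {0};
			uint8_t p2[BLK] = {0}, q2[BLK] = {0};

			xor_syndrome_naive(d, start, stop, p1, q1);
			xor_syndrome_opt(d, start, stop, p2, q2);
			if (memcmp(p1, p2, BLK) || memcmp(q1, q2, BLK))
				bad++;
		}
	return bad;
}
```

Calling `count_mismatches()` compares the two variants over all start/stop combinations; in this sketch the results should be identical because multiplying a zero accumulator by the generator leaves it zero.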
      
      Signed-off-by: Markus Stockhausen <stockhausen@collogia.de>
      Signed-off-by: NeilBrown <neilb@suse.de>
    • md/raid6 algorithms: improve test program · 7e92e1d7
      Markus Stockhausen authored
      
      It is always helpful to have a test tool in place if we implement
      new data critical algorithms. So add some test routines to the raid6
      checker that can prove if the new xor_syndrome() works as expected.
      
      Run through all permutations of start/stop pages per algorithm and
      simulate an xor_syndrome()-assisted rmw run. After each rmw, check whether
      the recovery algorithm still confirms that the stripe is fine.
      
      Signed-off-by: Markus Stockhausen <stockhausen@collogia.de>
      Signed-off-by: NeilBrown <neilb@suse.de>
    • md/raid6 algorithms: delta syndrome functions · fe5cbc6e
      Markus Stockhausen authored
      
      v3: s-o-b comment, explanation of performance and decision for
      the start/stop implementation
      
      Implementing rmw functionality for RAID6 requires optimized syndrome
      calculation. Up to now we can only generate a complete syndrome. The
      target P/Q pages are always overwritten. With this patch we provide
      a framework for in-place P/Q modification. As a first step, simply
      fill those function pointers with NULL values.
      
      xor_syndrome() has two additional parameters: start & stop. These
      will indicate the first and last page that are changing during a
      rmw run. That makes it possible to avoid several unnecessary loops
      and speed up calculation. The caller needs to implement the following
      logic to make the functions work.
      
      1) xor_syndrome(disks, start, stop, ...): "Remove" all data of source
      blocks inside P/Q between (and including) start and stop.
      
      2) modify any block with start <= block <= stop
      
      3) xor_syndrome(disks, start, stop, ...): "Reinsert" all data of
      source blocks into P/Q between (and including) start and stop.
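
The three steps above can be sketched in plain C as follows. This is an illustrative byte-wise model, not the kernel API: the helper names (`gen_syndrome_ref`, `xor_syndrome_ref`, `rmw_demo`), the simplified signatures, and the `NDATA`/`BLK` sizes are hypothetical, and the GF(2^8) reduction polynomial 0x11d is assumed as the conventional RAID6 choice. The sketch checks that remove, modify, reinsert yields the same P/Q as regenerating the full syndrome.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define NDATA 4   /* hypothetical number of data blocks */
#define BLK   8   /* hypothetical block size in bytes */

/* Multiply by the generator {02} in GF(2^8), reduction polynomial 0x11d. */
static uint8_t gf_mul2(uint8_t v)
{
	return (uint8_t)((v << 1) ^ ((v & 0x80) ? 0x1d : 0));
}

/* Multiply v by g^z using repeated doubling (deliberately naive). */
static uint8_t gf_mul_pow2(uint8_t v, int z)
{
	while (z-- > 0)
		v = gf_mul2(v);
	return v;
}

/* Full syndrome: P = XOR of D_z, Q = XOR of g^z * D_z. Overwrites p and q. */
static void gen_syndrome_ref(uint8_t d[NDATA][BLK], uint8_t *p, uint8_t *q)
{
	memset(p, 0, BLK);
	memset(q, 0, BLK);
	for (int z = 0; z < NDATA; z++)
		for (int i = 0; i < BLK; i++) {
			p[i] ^= d[z][i];
			q[i] ^= gf_mul_pow2(d[z][i], z);
		}
}

/* Delta syndrome: XOR the contribution of blocks start..stop into p and q. */
static void xor_syndrome_ref(uint8_t d[NDATA][BLK], int start, int stop,
			     uint8_t *p, uint8_t *q)
{
	assert(0 <= start && start <= stop && stop < NDATA);

	for (int z = start; z <= stop; z++)
		for (int i = 0; i < BLK; i++) {
			p[i] ^= d[z][i];
			q[i] ^= gf_mul_pow2(d[z][i], z);
		}
}

/* Returns 0 when the rmw result matches a full regeneration of P/Q. */
static int rmw_demo(void)
{
	uint8_t d[NDATA][BLK], p[BLK], q[BLK], p_full[BLK], q_full[BLK];

	for (int z = 0; z < NDATA; z++)
		for (int i = 0; i < BLK; i++)
			d[z][i] = (uint8_t)(17 * z + 31 * i + 3);
	gen_syndrome_ref(d, p, q);

	xor_syndrome_ref(d, 1, 2, p, q);   /* 1) remove old data of blocks 1..2 */
	memset(d[1], 0xa5, BLK);           /* 2) modify blocks within 1..2 */
	d[2][0] ^= 0xff;
	xor_syndrome_ref(d, 1, 2, p, q);   /* 3) reinsert the new data */

	gen_syndrome_ref(d, p_full, q_full);
	return memcmp(p, p_full, BLK) || memcmp(q, q_full, BLK);
}
```

The approach works because both P and Q are linear over GF(2^8): XORing a block's contribution a second time cancels it, so the old data can be removed and the new data added without touching the unchanged blocks.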
      
      Pages between start and stop that won't be changed should be filled
      with a pointer to the kernel zero page. The reasons for not taking NULL
      pages are:
      
      1) Algorithms cross the whole source data line by line. Thus avoid
      additional branches.
      
      2) Having a NULL page avoids calculating the XOR P parity but still
      needs calculation steps for the Q parity. Depending on the algorithm
      unrolling that might be only a difference of 2 instructions per loop.
      
      The benchmark numbers of the gen_syndrome() functions are displayed in
      the kernel log. Do the same for the xor_syndrome() functions. This
      will help to analyze performance problems and give a rough estimate of
      how well the algorithm works. The choice of the fastest algorithm will
      still depend on the gen_syndrome() performance.
      
      With the start/stop page implementation the speed can vary a lot in real
      life. E.g. a change of page 0 & page 15 on a stripe will be harder to
      compute than the case where page 0 & page 1 are XOR candidates. To avoid
      being overly optimistic about the expected speeds, we run a worst-case
      test that simulates a change on the upper half of the stripe. So we do:
      
      1) calculation of P/Q for the upper pages
      
      2) continuation of Q for the lower (empty) pages
      
      Signed-off-by: Markus Stockhausen <stockhausen@collogia.de>
      Signed-off-by: NeilBrown <neilb@suse.de>
  8. Feb 03, 2015
  9. Oct 14, 2014
  10. Aug 27, 2013
  11. Jul 08, 2013
  12. Dec 13, 2012
  13. May 28, 2012
  14. May 22, 2012
  15. Mar 28, 2012
  16. Oct 31, 2011
  17. Oct 20, 2011
  18. Aug 30, 2010
  19. Aug 11, 2010
  20. Aug 10, 2010
  21. Oct 29, 2009