Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

inline function swap_store in blosc2.h #500

Open
DimitriPapadopoulos opened this issue May 18, 2023 · 12 comments
Open

inline function swap_store in blosc2.h #500

DimitriPapadopoulos opened this issue May 18, 2023 · 12 comments

Comments

@DimitriPapadopoulos
Copy link
Contributor

DimitriPapadopoulos commented May 18, 2023

Describe the bug

Why is the inline function swap_store part of the API?

It is mostly used for int32_t and int64_t. In that case, calls to malloc and memcpy are overkill. Just use:

uint32_t swap32(uint32_t x) {
  return ((x & 0x000000FF) << 24) | ((x & 0x0000FF00) <<  8) |
         ((x & 0x00FF0000) >>  8) | ((x & 0xFF000000) >> 24);
}

uint64_t swap64(uint64_t x) {
  return ((x & 0x00000000000000FF) << 56) | ((x & 0x000000000000FF00) << 40) | ((x & 0x0000000000FF0000) << 24) | ((x & 0x00000000FF000000) <<  8) | 
         ((x & 0x000000FF00000000) >>  8) | ((x & 0x0000FF0000000000) >> 24) | ((x & 0x00FF000000000000) >> 40) | ((x & 0xFF00000000000000) >> 56);
}

Actually, you should use compiler intrinsics if they exist:

2 bytes 4 bytes 8 bytes
MSVC _byteswap_ushort _byteswap_ulong _byteswap_uint64
GCC __builtin_bswap16 __builtin_bswap32 __builtin_bswap64
Clang __builtin_bswap16 __builtin_bswap32 __builtin_bswap64

To Reproduce

c-blosc2/include/blosc2.h

Lines 2121 to 2127 in 65a1a30

static inline void swap_store(void *dest, const void *pa, int size) {
uint8_t *pa_ = (uint8_t *) pa;
uint8_t *pa2_ = (uint8_t*)malloc((size_t) size);
int i = 1; /* for big/little endian detection */
char *p = (char *) &i;
if (p[0] == 1) {

Expected behavior

Remove from the API.

Rewrite using compiler intrinsics and fall back to the above piece of code.

Additional context

See xxHash implementation:

/*!
 * @internal
 * @fn xxh_u32 XXH_swap32(xxh_u32 x)
 * @brief A 32-bit byteswap.
 *
 * @param x The 32-bit integer to byteswap.
 * @return @p x, byteswapped.
 */
#if defined(_MSC_VER)     /* Visual Studio */
#  define XXH_swap32 _byteswap_ulong
#elif XXH_GCC_VERSION >= 403
#  define XXH_swap32 __builtin_bswap32
#else
static xxh_u32 XXH_swap32 (xxh_u32 x)
{
    return  ((x << 24) & 0xff000000 ) |
            ((x <<  8) & 0x00ff0000 ) |
            ((x >>  8) & 0x0000ff00 ) |
            ((x >> 24) & 0x000000ff );
}
#endif
@DimitriPapadopoulos
Copy link
Contributor Author

Actually, you already have such functions:

/* Reverse swap bits in a 32-bit integer */
static inline int32_t bswap32_(int32_t a) {
#if defined (__GNUC__)
return __builtin_bswap32(a);
#elif defined (_MSC_VER) /* Visual Studio */
return _byteswap_ulong(a);
#else
a = ((a & 0x000000FF) << 24) |
((a & 0x0000FF00) << 8) |
((a & 0x00FF0000) >> 8) |
((a & 0xFF000000) >> 24);
return a;
#endif
}

@DimitriPapadopoulos
Copy link
Contributor Author

You even have a similar endian_handler function:

static inline void endian_handler(bool little, void *dest, const void *pa, int size)
{
bool little_endian = is_little_endian();
if (little_endian == little) {
memcpy(dest, pa, size);
}
else {
uint8_t* pa_ = (uint8_t*)pa;
uint8_t pa2_[8];
switch (size) {
case 8:
pa2_[0] = pa_[7];
pa2_[1] = pa_[6];
pa2_[2] = pa_[5];
pa2_[3] = pa_[4];
pa2_[4] = pa_[3];
pa2_[5] = pa_[2];
pa2_[6] = pa_[1];
pa2_[7] = pa_[0];
break;
case 4:
pa2_[0] = pa_[3];
pa2_[1] = pa_[2];
pa2_[2] = pa_[1];
pa2_[3] = pa_[0];
break;
case 2:
pa2_[0] = pa_[1];
pa2_[1] = pa_[0];
break;
case 1:
pa2_[0] = pa_[0];
break;
default:
BLOSC_TRACE_ERROR("Unhandled size: %d.", size);
}
memcpy(dest, pa2_, size);
}
}

@FrancescAlted
Copy link
Member

IIRC we made this in order to avoid replicating the code in codecs, which do not have access to the blosc2 internals. See e.g. 8359e6d

@DimitriPapadopoulos
Copy link
Contributor Author

I see. How about moving that code to plugins/plugin_utils.h?

@FrancescAlted
Copy link
Member

I see. How about moving that code to plugins/plugin_utils.h?

+1

@DimitriPapadopoulos
Copy link
Contributor Author

Also, swap_store is used only on 32- and 64-bits integers:

$ grep -R swap_store
blosc/b2nd.c:    swap_store(pmeta, shape + i, sizeof(int64_t));
blosc/b2nd.c:    swap_store(pmeta, chunkshape + i, sizeof(int32_t));
blosc/b2nd.c:    swap_store(pmeta, blockshape + i, sizeof(int32_t));
blosc/b2nd.c:  swap_store(pmeta, &dtype_len, sizeof(int32_t));
plugins/codecs/zfp/blosc2-zfp.c:          swap_store(blockmeta + i, pmeta, sizeof(int32_t));
plugins/plugin_utils.c:    swap_store(shape + i, pmeta, sizeof(int64_t));
plugins/plugin_utils.c:    swap_store(chunkshape + i, pmeta, sizeof(int32_t));
plugins/plugin_utils.c:    swap_store(blockshape + i, pmeta, sizeof(int32_t));
include/b2nd.h:    swap_store(shape + i, pmeta, sizeof(int64_t));
include/b2nd.h:    swap_store(chunkshape + i, pmeta, sizeof(int32_t));
include/b2nd.h:    swap_store(blockshape + i, pmeta, sizeof(int32_t));
include/b2nd.h:    swap_store(&dtype_len, pmeta, sizeof(int32_t));
include/blosc2.h:static inline void swap_store(void *dest, const void *pa, int size) {
$ 

I plan on adding specialised functions for those types only (and maybe 16-bits integers just in case).

@DimitriPapadopoulos
Copy link
Contributor Author

Actually, (some) codecs do access blosc2 internals – at the very least they include headers from blosc.

For example, while compiling plugins/codecs/zfp/blosc2-zfp.c, the compiler complains about redefinitions of the new swap functions in plugins/plugin_utils.h, with the previous definitions in blosc/blosc-private.h. That's because plugins/codecs/zfp/blosc2-zfp.c includes private blosc header files:

#include "blosc-private.h"
#include "frame.h"
#include "blosc2/codecs-registry.h"
#include "zfp.h"
#include "blosc2-zfp.h"
#include <math.h>
#include "context.h"

What do you mean exactly by “codecs [...] do not have access to the blosc2 internals”?

@FrancescAlted
Copy link
Member

I don't remember exactly how things went, but probably we've made swap_store public without thinking too much. Having said that, recently we have introduced the concept of dynamic plugins, and I'd say that swap_store in blosc2.h can be good for them.

Now that I look into the code, a possibility is to move these functions to include/blosc2/blosc2-common.h which is distributed in binary (devel) packages, and that would be good for plugin developers?

@DimitriPapadopoulos
Copy link
Contributor Author

So, it would be part of the API after all.

I will move/merge everything from include/blosc2.h and blosc/blosc-private.h to include/blosc2/blosc2-common.h.

@DimitriPapadopoulos
Copy link
Contributor Author

DimitriPapadopoulos commented May 21, 2023

These two functions are (almost) identical:

  • swap_store() from include/blosc2.h
  • endian_handler() from blosc/blosc-private.h

These functions are really strange, it feels like they were written as an example not to follow:

  • they were written as if they could swap any number of bytes,
  • yet they swap only 1, 2, 4 or 8 bytes,
  • the fail to optimise these few supported cases with compiler intrinsics,
  • they allocate memory without reason as far as I can see,
  • when they fail, they leave rubbish behind and just output the message Unhandled nitems.

I'll see if we can get rid of them, and stick to functions specialised for 2, 4 or 8 bytes.

@FrancescAlted
Copy link
Member

Any progress on this one?

@DimitriPapadopoulos
Copy link
Contributor Author

I can't recall why this issue is stalled. I guess I'm not entirely certain how to fix this. Ideally, these functions should be removed from the API.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants