Update UDF tutorial
FrancescAlted committed Dec 3, 2024
1 parent 57bd2a6 commit a500191
Showing 1 changed file with 32 additions and 43 deletions.
75 changes: 32 additions & 43 deletions doc/getting_started/tutorials/03.lazyarray-udf.ipynb
@@ -46,8 +46,9 @@
},
"outputs": [],
"source": [
"shape = (5_000, 10_000)\n",
"a = blosc2.linspace(0, 1, np.prod(shape), dtype=np.float32, shape=shape)"
"shape = (5_000, 2_000)\n",
"a = blosc2.linspace(0, 1, np.prod(shape), dtype=np.float32, shape=shape)\n",
"b = np.arange(np.prod(shape), dtype=np.int32).reshape(shape)"
]
},
{
@@ -68,9 +69,9 @@
},
"outputs": [],
"source": [
"def add_one(inputs_tuple, output, offset):\n",
" x = inputs_tuple[0]\n",
" output[:] = x**3 + np.sin(x) + 1"
"def myudf(inputs_tuple, output, offset):\n",
" x, y = inputs_tuple\n",
" output[:] = x**3 + np.sin(y) + 1"
]
},
{
@@ -101,7 +102,7 @@
}
],
"source": [
"larray = blosc2.lazyudf(add_one, (a,), a.dtype)\n",
"larray = blosc2.lazyudf(myudf, (a, b), a.dtype)\n",
"print(f\"Class: {type(larray)}\")"
]
},
@@ -128,8 +129,8 @@
"output_type": "stream",
"text": [
"Type: <class 'numpy.ndarray'>\n",
"CPU times: user 351 ms, sys: 92.3 ms, total: 443 ms\n",
"Wall time: 396 ms\n"
"CPU times: user 137 ms, sys: 32.9 ms, total: 170 ms\n",
"Wall time: 152 ms\n"
]
}
],
@@ -163,20 +164,20 @@
"text": [
"Type: <class 'blosc2.ndarray.NDArray'>\n",
"type : NDArray\n",
"shape : (5000, 10000)\n",
"chunks : (100, 10000)\n",
"blocks : (1, 10000)\n",
"shape : (5000, 2000)\n",
"chunks : (500, 2000)\n",
"blocks : (8, 2000)\n",
"dtype : float32\n",
"cratio : 21.12\n",
"cratio : 1.90\n",
"cparams : CParams(codec=<Codec.ZSTD: 5>, codec_meta=0, clevel=1, use_dict=False, typesize=4,\n",
" : nthreads=7, blocksize=40000, splitmode=<SplitMode.AUTO_SPLIT: 3>,\n",
" : nthreads=7, blocksize=64000, splitmode=<SplitMode.AUTO_SPLIT: 3>,\n",
" : filters=[<Filter.NOFILTER: 0>, <Filter.NOFILTER: 0>, <Filter.NOFILTER: 0>,\n",
" : <Filter.NOFILTER: 0>, <Filter.NOFILTER: 0>, <Filter.SHUFFLE: 1>], filters_meta=[0, 0,\n",
" : 0, 0, 0, 0], tuner=<Tuner.STUNE: 0>)\n",
"dparams : DParams(nthreads=7)\n",
"\n",
"CPU times: user 519 ms, sys: 90.7 ms, total: 610 ms\n",
"Wall time: 422 ms\n"
"CPU times: user 179 ms, sys: 35.5 ms, total: 214 ms\n",
"Wall time: 164 ms\n"
]
}
],
@@ -187,27 +188,6 @@
"print(c.info)"
]
},
-{
-"cell_type": "code",
-"execution_count": 10,
-"metadata": {},
-"outputs": [
-{
-"ename": "TypeError",
-"evalue": "LazyUDF.save() takes 1 positional argument but 2 were given",
-"output_type": "error",
-"traceback": [
-"\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
-"\u001b[0;31mTypeError\u001b[0m Traceback (most recent call last)",
-"Cell \u001b[0;32mIn[10], line 1\u001b[0m\n\u001b[0;32m----> 1\u001b[0m \u001b[43mlarray\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43msave\u001b[49m\u001b[43m(\u001b[49m\u001b[38;5;124;43m\"\u001b[39;49m\u001b[38;5;124;43mtest.b2nd\u001b[39;49m\u001b[38;5;124;43m\"\u001b[39;49m\u001b[43m)\u001b[49m\n",
-"\u001b[0;31mTypeError\u001b[0m: LazyUDF.save() takes 1 positional argument but 2 were given"
-]
-}
-],
-"source": [
-"larray.save(\"test.b2nd\")"
-]
-},
{
"cell_type": "markdown",
"metadata": {},
@@ -228,9 +208,9 @@
"outputs": [],
"source": [
"@nb.jit(nopython=True, parallel=True)\n",
"def add_one_numba(inputs_tuple, output, offset):\n",
" x = inputs_tuple[0]\n",
" output[:] = x**3 + np.sin(x) + 1"
"def myudf_numba(inputs_tuple, output, offset):\n",
" x, y = inputs_tuple\n",
" output[:] = x**3 + np.sin(y) + 1"
]
},
{
@@ -244,7 +224,7 @@
},
"outputs": [],
"source": [
"larray2 = blosc2.lazyudf(add_one_numba, (a,), a.dtype)"
"larray2 = blosc2.lazyudf(myudf_numba, (a, b), a.dtype)"
]
},
{
@@ -263,8 +243,8 @@
"name": "stdout",
"output_type": "stream",
"text": [
"CPU times: user 569 ms, sys: 110 ms, total: 679 ms\n",
"Wall time: 519 ms\n"
"CPU times: user 486 ms, sys: 63 ms, total: 549 ms\n",
"Wall time: 463 ms\n"
]
}
],
@@ -281,13 +261,22 @@
"large initialization overheads and the function is quite simple. For more complex functions, or larger arrays, the difference will be less noticeable or favorable to it."
]
},
+{
+"cell_type": "markdown",
+"metadata": {},
+"source": [
+"## Exercise\n",
+"\n",
+"Check at which array size the Numba UDF starts to be competitive. If you are proficient enough with Numba, you may also want to unroll the loops in the UDF and see whether you can make it faster."
+]
+},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Summary\n",
"\n",
"In this section, we have seen how to execute user-defined function and get the result as a NumPy or NDArray. We have also seen how to make a Numba UDF."
"We have seen how to build new LazyArray objects coming from other NDArray or NumPy objects and use User Defined Functions (UDFs) to create the desired result. We have also demonstrated that integrating Numba in UDF is pretty easy."
]
}
],
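The updated `myudf` receives its operands as a tuple and writes into a preallocated `output` block. As a rough illustration of that calling convention, the NumPy-only sketch below feeds the same function one row-chunk at a time. The `apply_udf_chunked` driver, the small shape, and the meaning given to `offset` (coordinates of the chunk's first element) are assumptions made for this example; it is not how `blosc2.lazyudf` is implemented.

```python
import numpy as np


def myudf(inputs_tuple, output, offset):
    # Same body as the updated tutorial UDF: two operands, one output block.
    x, y = inputs_tuple
    output[:] = x**3 + np.sin(y) + 1


def apply_udf_chunked(udf, inputs, dtype, chunk_rows):
    """Hypothetical stand-in for a lazy evaluator (NOT blosc2's implementation).

    Splits the operands into row-chunks and calls the UDF once per chunk.
    """
    shape = inputs[0].shape
    result = np.empty(shape, dtype=dtype)
    for start in range(0, shape[0], chunk_rows):
        stop = min(start + chunk_rows, shape[0])
        chunk_inputs = tuple(arr[start:stop] for arr in inputs)
        # Assumption: offset holds the coordinates of the chunk's first element.
        udf(chunk_inputs, result[start:stop], (start, 0))
    return result


# Much smaller than the tutorial's (5_000, 2_000), so the sketch runs instantly.
shape = (50, 20)
a = np.linspace(0, 1, np.prod(shape), dtype=np.float32).reshape(shape)
b = np.arange(np.prod(shape), dtype=np.int32).reshape(shape)

chunked = apply_udf_chunked(myudf, (a, b), a.dtype, chunk_rows=8)

# Evaluate the same expression in one shot for comparison.
expected = (a**3 + np.sin(b) + 1).astype(a.dtype)
```

Note that `np.sin(y)` on the `int32` operand yields `float64`, so the assignment into `output` casts the result back to the requested `float32` dtype.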
