Skip to content
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
28 changes: 14 additions & 14 deletions docs/en/07-develop/09-udf.md
Original file line number Diff line number Diff line change
Expand Up @@ -241,33 +241,33 @@ To better operate the above data structures, some convenience functions are prov

### C UDF Example Code

#### Scalar Function Example [bit_and](https://github.com/taosdata/TDengine/blob/3.0/test/cases/12-UDFs/sh/bit_and.c)
#### Scalar Function Example

`bit_and` implements the bitwise AND function for multiple columns. If there is only one column, it returns that column. `bit_and` ignores null values.

<details>
<summary>bit_and.c</summary>

```c
{{#include test/cases/12-UDFs/sh/bit_and.c}}
{{#include docs/examples/udf/bit_and.c}}
```

</details>

#### Aggregate Function Example 1 Returning Numeric Type [l2norm](https://github.com/taosdata/TDengine/blob/3.0/test/cases/12-UDFs/sh/l2norm.c)
#### Aggregate Function Example 1 Returning Numeric Type

`l2norm` implements the second-order norm of all data in the input columns, i.e., squaring each data point, then summing them up, and finally taking the square root.

<details>
<summary>l2norm.c</summary>

```c
{{#include test/cases/12-UDFs/sh/l2norm.c}}
{{#include docs/examples/udf/l2norm.c}}
```

</details>

#### Aggregate Function Example 2 Returning String Type [max_vol](https://github.com/taosdata/TDengine/blob/3.0/test/cases/12-UDFs/sh/max_vol.c)
#### Aggregate Function Example 2 Returning String Type

`max_vol` implements finding the maximum voltage from multiple input voltage columns, returning a composite string value consisting of the device ID + the position (row, column) of the maximum voltage + the maximum voltage value.

Expand All @@ -293,12 +293,12 @@ select max_vol(vol1, vol2, vol3, deviceid) from battery;
<summary>max_vol.c</summary>

```c
{{#include test/cases/12-UDFs/sh/max_vol.c}}
{{#include docs/examples/udf/max_vol.c}}
```

</details>

#### Aggregate Function Example 3 Split string and calculate average value [extract_avg](https://github.com/taosdata/TDengine/blob/3.0/test/cases/12-UDFs/sh/extract_avg.c)
#### Aggregate Function Example 3 Split string and calculate average value

The `extract_avg` function converts a comma-separated string sequence into a set of numerical values, counts the results of all rows, and calculates the final average. Note when implementing:

Expand Down Expand Up @@ -334,7 +334,7 @@ gcc -g -O0 -fPIC -shared extract_vag.c -o libextract_avg.so
<summary>extract_avg.c</summary>

```c
{{#include test/cases/12-UDFs/sh/extract_avg.c}}
{{#include docs/examples/udf/extract_avg.c}}
```

</details>
Expand Down Expand Up @@ -866,40 +866,40 @@ Through this example, we learned how to define aggregate functions and print cus

### More Python UDF Example Code

#### Scalar Function Example [pybitand](https://github.com/taosdata/TDengine/blob/3.0/test/cases/12-UDFs/sh/pybitand.py)
#### Scalar Function Example

`pybitand` implements the bitwise AND function for multiple columns. If there is only one column, it returns that column. `pybitand` ignores null values.

<details>
<summary>pybitand.py</summary>

```python
{{#include test/cases/12-UDFs/sh/pybitand.py}}
{{#include docs/examples/udf/pybitand.py}}
```

</details>

#### Aggregate Function Example [pyl2norm](https://github.com/taosdata/TDengine/blob/3.0/test/cases/12-UDFs/sh/pyl2norm.py)
#### Aggregate Function Example

`pyl2norm` calculates the second-order norm of all data in the input column, i.e., squares each data point, then sums them up, and finally takes the square root.

<details>
<summary>pyl2norm.py</summary>

```python
{{#include test/cases/12-UDFs/sh/pyl2norm.py}}
{{#include docs/examples/udf/pyl2norm.py}}
```

</details>

#### Aggregate Function Example [pycumsum](https://github.com/taosdata/TDengine/blob/3.0/test/cases/12-UDFs/sh/pycumsum.py)
#### Aggregate Function Example

`pycumsum` uses numpy to calculate the cumulative sum of all data in the input column.
<details>
<summary>pycumsum.py</summary>

```python
{{#include test/cases/12-UDFs/sh/pycumsum.py}}
{{#include docs/examples/udf/pycumsum.py}}
```

</details>
Expand Down
61 changes: 61 additions & 0 deletions docs/examples/udf/bit_and.c
Original file line number Diff line number Diff line change
@@ -0,0 +1,61 @@
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include "taosudf.h"

DLL_EXPORT int32_t bit_and_init() { return 0; }

DLL_EXPORT int32_t bit_and_destroy() { return 0; }

DLL_EXPORT int32_t bit_and(SUdfDataBlock* block, SUdfColumn* resultCol) {
udfTrace("block:%p, processing begins, rows:%d cols:%d", block, block->numOfRows, block->numOfCols);

if (block->numOfCols < 2) {
udfError("block:%p, cols:%d needs to be greater than 2", block, block->numOfCols);
return TSDB_CODE_UDF_INVALID_INPUT;
}
Comment on lines +13 to +16
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

The column count check is incorrect. The documentation states that the function should work for a single column, but the current check block->numOfCols < 2 causes it to fail with an error for one column. Additionally, the error message needs to be greater than 2 is confusing and inconsistent with the check. The check should be for at least one column.

  if (block->numOfCols < 1) {
    udfError("block:%p, cols:%d needs to be greater than or equal to 1", block, block->numOfCols);
    return TSDB_CODE_UDF_INVALID_INPUT;
  }


for (int32_t i = 0; i < block->numOfCols; ++i) {
SUdfColumn* col = block->udfCols[i];
if (col->colMeta.type != TSDB_DATA_TYPE_INT) {
udfError("block:%p, col:%d type:%d should be int(%d)", block, i, col->colMeta.type, TSDB_DATA_TYPE_INT);
return TSDB_CODE_UDF_INVALID_INPUT;
}
}

SUdfColumnData* resultData = &resultCol->colData;

for (int32_t i = 0; i < block->numOfRows; ++i) {
if (udfColDataIsNull(block->udfCols[0], i)) {
udfColDataSetNull(resultCol, i);
udfTrace("block:%p, row:%d result is null since col:0 is null", block, i);
continue;
}

int32_t result = *(int32_t*)udfColDataGetData(block->udfCols[0], i);
udfTrace("block:%p, row:%d col:0 data:%d", block, i, result);

int32_t j = 1;
for (; j < block->numOfCols; ++j) {
if (udfColDataIsNull(block->udfCols[j], i)) {
udfColDataSetNull(resultCol, i);
udfTrace("block:%p, row:%d result is null since col:%d is null", block, i, j);
break;
}

char* colData = udfColDataGetData(block->udfCols[j], i);
result &= *(int32_t*)colData;
udfTrace("block:%p, row:%d col:%d data:%d", block, i, j, *(int32_t*)colData);
}

if (j == block->numOfCols) {
udfColDataSet(resultCol, i, (char*)&result, false);
udfTrace("block:%p, row:%d result is %d", block, i, result);
}
}

resultData->numOfRows = block->numOfRows;
udfTrace("block:%p, processing completed", block);

return TSDB_CODE_SUCCESS;
}
11 changes: 11 additions & 0 deletions docs/examples/udf/compile_udf.sh
Original file line number Diff line number Diff line change
@@ -0,0 +1,11 @@
set +e

rm -rf /tmp/udf/libbitand.so /tmp/udf/libsqrsum.so /tmp/udf/libgpd.so
mkdir -p /tmp/udf
echo "compile udf bit_and and sqr_sum"
Comment on lines +3 to +5
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The script has inconsistencies. It attempts to remove libsqrsum.so and the echo message mentions sqr_sum, but this UDF is not compiled in this script. The compiled UDF is l2norm. Please update the script to be consistent.

Suggested change
rm -rf /tmp/udf/libbitand.so /tmp/udf/libsqrsum.so /tmp/udf/libgpd.so
mkdir -p /tmp/udf
echo "compile udf bit_and and sqr_sum"
rm -rf /tmp/udf/libbitand.so /tmp/udf/libl2norm.so /tmp/udf/libgpd.so
mkdir -p /tmp/udf
echo "compile udf bit_and, l2norm and gpd"

gcc -fPIC -shared cases/12-UDFs/sh/bit_and.c -I../../include/libs/function/ -I../../include/client -I../../include/util -o /tmp/udf/libbitand.so
gcc -fPIC -shared cases/12-UDFs/sh/l2norm.c -I../../include/libs/function/ -I../../include/client -I../../include/util -o /tmp/udf/libl2norm.so
gcc -fPIC -shared cases/12-UDFs/sh/gpd.c -I../../include/libs/function/ -I../../include/client -I../../include/util -o /tmp/udf/libgpd.so
Comment on lines +6 to +8
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

The source file paths in the gcc commands are incorrect. They point to the old location of the files (e.g., cases/12-UDFs/sh/bit_and.c). Since this script is now in docs/examples/udf, it should reference the source files in the same directory. Also, the relative include paths (-I../../...) are likely incorrect depending on where this script is expected to be run from. Assuming it's run from its own directory, the include paths should be adjusted to point to the project root.

Suggested change
gcc -fPIC -shared cases/12-UDFs/sh/bit_and.c -I../../include/libs/function/ -I../../include/client -I../../include/util -o /tmp/udf/libbitand.so
gcc -fPIC -shared cases/12-UDFs/sh/l2norm.c -I../../include/libs/function/ -I../../include/client -I../../include/util -o /tmp/udf/libl2norm.so
gcc -fPIC -shared cases/12-UDFs/sh/gpd.c -I../../include/libs/function/ -I../../include/client -I../../include/util -o /tmp/udf/libgpd.so
gcc -fPIC -shared bit_and.c -I../../../include/libs/function/ -I../../../include/client -I../../../include/util -o /tmp/udf/libbitand.so
gcc -fPIC -shared l2norm.c -I../../../include/libs/function/ -I../../../include/client -I../../../include/util -o /tmp/udf/libl2norm.so
gcc -fPIC -shared gpd.c -I../../../include/libs/function/ -I../../../include/client -I../../../include/util -o /tmp/udf/libgpd.so

echo "debug show /tmp/udf/*.so"
ls /tmp/udf/*.so

128 changes: 128 additions & 0 deletions docs/examples/udf/extract_avg.c
Original file line number Diff line number Diff line change
@@ -0,0 +1,128 @@
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include "taos.h"
#include "taoserror.h"
#include "taosudf.h"

// Define a structure to store sum and count
typedef struct {
double sum;
int count;
} SumCount;

// initialization function
DLL_EXPORT int32_t extract_avg_init() {
udfTrace("extract_avg_init: Initializing UDF");
return TSDB_CODE_SUCCESS;
}

DLL_EXPORT int32_t extract_avg_start(SUdfInterBuf *interBuf) {
int32_t bufLen = sizeof(SumCount);
if (interBuf->bufLen < bufLen) {
udfError("extract_avg_start: Failed to execute UDF since input buflen:%d < %d", interBuf->bufLen, bufLen);
return TSDB_CODE_UDF_INVALID_BUFSIZE;
}

// Initialize sum and count
SumCount *sumCount = (SumCount *)interBuf->buf;
sumCount->sum = 0.0;
sumCount->count = 0;

interBuf->numOfResult = 0;

udfTrace("extract_avg_start: Initialized sum=0.0, count=0");
return TSDB_CODE_SUCCESS;
}

DLL_EXPORT int32_t extract_avg(SUdfDataBlock *inputBlock, SUdfInterBuf *interBuf, SUdfInterBuf *newInterBuf) {
udfTrace("extract_avg: Processing data block with %d rows", inputBlock->numOfRows);

// Check the number of columns in the input data block
if (inputBlock->numOfCols != 1) {
udfError("extract_avg: Invalid number of columns. Expected 1, got %d", inputBlock->numOfCols);
return TSDB_CODE_UDF_INVALID_INPUT;
}

// Get the input column
SUdfColumn *inputCol = inputBlock->udfCols[0];

if (inputCol->colMeta.type != TSDB_DATA_TYPE_VARCHAR) {
udfError("extract_avg: Invalid data type. Expected VARCHAR, got %d", inputCol->colMeta.type);
return TSDB_CODE_UDF_INVALID_INPUT;
}

// Read the current sum and count from interBuf
SumCount *sumCount = (SumCount *)interBuf->buf;
udfTrace("extract_avg: Starting with sum=%f, count=%d", sumCount->sum, sumCount->count);

for (int i = 0; i < inputBlock->numOfRows; i++) {
if (udfColDataIsNull(inputCol, i)) {
udfTrace("extract_avg: Skipping NULL value at row %d", i);
continue;
}

char *buf = (char *)udfColDataGetData(inputCol, i);

char data[64];
memset(data, 0, 64);
memcpy(data, varDataVal(buf), varDataLen(buf));
Comment on lines +67 to +69
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

critical

There is a critical buffer overflow vulnerability here. The data buffer is allocated on the stack with a fixed size of 64 bytes, but memcpy copies varDataLen(buf) bytes into it. According to the documentation, the input varchar can be up to 128 bytes, which will overflow the buffer. You should use a safe copy mechanism, for example by using a correctly-sized buffer and snprintf.

        char data[129];
        snprintf(data, sizeof(data), "%.*s", (int)varDataLen(buf), varDataVal(buf));


udfTrace("extract_avg: Processing row %d, data='%s'", i, data);

char *rest = data;
char *token;
while ((token = strtok_r(rest, ",", &rest))) {
while (*token == ' ') token++;
int tokenLen = strlen(token);
while (tokenLen > 0 && token[tokenLen - 1] == ' ') token[--tokenLen] = '\0';

if (tokenLen == 0) {
udfTrace("extract_avg: Empty string encountered at row %d", i);
continue;
}

char *endPtr;
double value = strtod(token, &endPtr);

if (endPtr == token || *endPtr != '\0') {
udfError("extract_avg: Failed to convert string '%s' to double at row %d", token, i);
continue;
}

sumCount->sum += value;
sumCount->count++;
udfTrace("extract_avg: Updated sum=%f, count=%d", sumCount->sum, sumCount->count);
}
}

newInterBuf->bufLen = sizeof(SumCount);
newInterBuf->buf = (char *)malloc(newInterBuf->bufLen);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

This function allocates memory for newInterBuf->buf using malloc. However, other UDF examples suggest that the framework pre-allocates this buffer. If the framework does not free this memory, it will lead to a memory leak. Please verify the memory management contract for newInterBuf. If the buffer is pre-allocated, you should write directly into it (after checking its size) instead of allocating new memory.

if (newInterBuf->buf == NULL) {
udfError("extract_avg: Failed to allocate memory for newInterBuf");
return TSDB_CODE_UDF_INTERNAL_ERROR;
}
memcpy(newInterBuf->buf, sumCount, newInterBuf->bufLen);
newInterBuf->numOfResult = 0;

udfTrace("extract_avg: Final sum=%f, count=%d", sumCount->sum, sumCount->count);
return TSDB_CODE_SUCCESS;
}

DLL_EXPORT int32_t extract_avg_finish(SUdfInterBuf *interBuf, SUdfInterBuf *result) {
SumCount *sumCount = (SumCount *)interBuf->buf;

double avg = (sumCount->count > 0) ? (sumCount->sum / sumCount->count) : 0.0;

*(double *)result->buf = avg;
result->bufLen = sizeof(double);
result->numOfResult = sumCount->count > 0 ? 1 : 0;

udfTrace("extract_avg_finish: Final result=%f (sum=%f, count=%d)", avg, sumCount->sum, sumCount->count);
return TSDB_CODE_SUCCESS;
}

DLL_EXPORT int32_t extract_avg_destroy() {
udfTrace("extract_avg_destroy: Cleaning up UDF");
return TSDB_CODE_SUCCESS;
}
46 changes: 46 additions & 0 deletions docs/examples/udf/gpd.c
Original file line number Diff line number Diff line change
@@ -0,0 +1,46 @@
#include <string.h>
#include <stdlib.h>
#include <stdio.h>
#ifdef LINUX
#include <unistd.h>
#endif
#ifdef WINDOWS
#include <windows.h>
#endif
#include "taosudf.h"

TAOS* taos = NULL;

DLL_EXPORT int32_t gpd_init() {
return 0;
}

DLL_EXPORT int32_t gpd_destroy() {
return 0;
}

DLL_EXPORT int32_t gpd(SUdfDataBlock* block, SUdfColumn *resultCol) {
SUdfColumnMeta *meta = &resultCol->colMeta;
meta->bytes = 4;
meta->type = TSDB_DATA_TYPE_INT;
meta->scale = 0;
meta->precision = 0;

SUdfColumnData *resultData = &resultCol->colData;
resultData->numOfRows = block->numOfRows;
for (int32_t i = 0; i < resultData->numOfRows; ++i) {
int64_t* calc_ts = (int64_t*)udfColDataGetData(block->udfCols[0], i);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The variable calc_ts is assigned a value but is never used. This is dead code and should be removed to improve clarity.

char* varTbname = udfColDataGetData(block->udfCols[1], i);
char* varDbname = udfColDataGetData(block->udfCols[2], i);

char dbName[256] = {0};
char tblName[256] = {0};
memcpy(dbName, varDataVal(varDbname), varDataLen(varDbname));
memcpy(tblName, varDataVal(varTbname), varDataLen(varTbname));
Comment on lines +38 to +39
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

critical

There is a critical buffer overflow vulnerability here. dbName and tblName are fixed-size buffers of 256 bytes, but memcpy copies varDataLen(...) bytes without checking if the length exceeds the buffer size. This can lead to a buffer overflow if the database or table names are too long. Use a safe copy function like snprintf to prevent this.

    snprintf(dbName, sizeof(dbName), "%.*s", (int)varDataLen(varDbname), varDataVal(varDbname));
    snprintf(tblName, sizeof(tblName), "%.*s", (int)varDataLen(varTbname), varDataVal(varTbname));

printf("%s, %s\n", dbName, tblName);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The use of printf is generally discouraged in UDFs as it writes to the standard output of the UDF process, which may not be monitored. It's better to use the provided logging functions like udfTrace for debugging information.

    udfTrace("%s, %s", dbName, tblName);

int32_t result = 0;
udfColDataSet(resultCol, i, (char*)&result, false);
}

return 0;
}
Loading
Loading