Routed_sender tests fail because of jest/node bug after esm conversion #861

sotojn opened this issue Jun 29, 2024 · 0 comments
sotojn commented Jun 29, 2024

After converting standard-assets to ESM, I came across a bug when running tests related to the routed_sender operation. The two problematic test files are:

The tests fail when the test harness gets initialized and starts importing all the operations the job needs. Depending on which Node version you're running, you'll see one of two variations of the error.

node 18.19.1:

    _err:  Error: request for '@terascope/utils' is not yet fulfilled
        at SourceTextModule.link (node:internal/vm/module:200:17)
        at Runtime.linkAndEvaluateModule (/Users/jsoto/Workspace/TerasliceAssets/standard-assets/node_modules/jest-runtime/build/index.js:708:5)
        at importModuleDynamicallyWrapper (node:internal/vm/module:429:15)
        at OperationLoader.require (/Users/jsoto/Workspace/TerasliceAssets/standard-assets/node_modules/@terascope/job-components/src/operation-loader/loader.ts:347:18)
        at OperationLoader.loadProcessor (/Users/jsoto/Workspace/TerasliceAssets/standard-assets/node_modules/@terascope/job-components/src/operation-loader/loader.ts:153:23)
        at /Users/jsoto/Workspace/TerasliceAssets/standard-assets/node_modules/@terascope/job-components/src/job-validator.ts:66:17
        at /Users/jsoto/Workspace/TerasliceAssets/standard-assets/node_modules/p-map/index.js:57:22 {
      code: 'ERR_VM_MODULE_LINK_FAILURE'
    }

node 22.2.0:

_err: Error: request for '@terascope/utils' is not in cache
    at SourceTextModule.link (node:internal/vm/module:204:17)
    at Runtime.linkAndEvaluateModule (/Users/jsoto/Workspace/TerasliceAssets/standard-assets/node_modules/jest-runtime/build/index.js:708:5)
    at importModuleDynamicallyWrapper (node:internal/vm/module:436:15)
    at OperationLoader.require (/Users/jsoto/Workspace/TerasliceAssets/standard-assets/node_modules/@terascope/job-components/src/operation-loader/loader.ts:348:23)
    at OperationLoader.loadProcessor (/Users/jsoto/Workspace/TerasliceAssets/standard-assets/node_modules/@terascope/job-components/src/operation-loader/loader.ts:153:23)
    at /Users/jsoto/Workspace/TerasliceAssets/standard-assets/node_modules/@terascope/job-components/src/job-validator.ts:66:17
    at /Users/jsoto/Workspace/TerasliceAssets/standard-assets/node_modules/p-map/index.js:57:22 {
   code: 'ERR_VM_MODULE_LINK_FAILURE'
  }
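
Both traces come out of the test harness initialization; roughly, the failing path looks like this (a minimal sketch, assuming the teraslice-test-harness WorkerTestHarness API and a stripped-down routed_sender config; the real spec files are more involved):

import { WorkerTestHarness } from 'teraslice-test-harness';

// initialize() is where the failure happens: the operation loader dynamically
// imports routed_sender and its dependencies, and the vm module linker cannot
// resolve the CommonJS '@terascope/utils' request.
const harness = WorkerTestHarness.testProcessor({
    _op: 'routed_sender',
    api_name: 'routed_sender_api',
    routing: { '**': 'default' }
});

await harness.initialize();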

I believe this is caused by a Node vm issue with how Jest bridges the gap between CommonJS files and ESM syntax.
Here are the references that led me to this conclusion:

A potential fix would be to convert @terascope/utils to ESM, but that may or may not resolve the problem.

I have a hotfix that seems to work around this by importing the routed_sender processor.ts file at the beginning of the two test files. It isn't a perfect solution, but it keeps test coverage for this operation in place.
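
A minimal sketch of that workaround at the top of each affected spec file (the relative import path below is an assumption for illustration):

// Hotfix: eagerly import the processor at the top of the spec file so the
// module is already loaded before the test harness tries to dynamically
// import it during initialization.
import '../../asset/src/routed_sender/processor.js';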

To confirm the operation itself still works, I did some manual testing to validate its functionality. I started a k8s cluster with the following teraslice configuration:

terafoundation:
  environment: "development"
  log_level: debug
  connectors:
    elasticsearch-next:
      es-1:
        node:
          - "http://elasticsearch.services-dev1:9200"
      es-2:
        node:
          - "http://elasticsearch2.services-dev1:9200"
    kafka:
      default:
        brokers:
          - "cpkafka.services-dev1:9092"
    s3:
      default:
        endpoint: "http://minio.services-dev1:9000"
        accessKeyId: "minioadmin"
        secretAccessKey: "minioadmin"
        forcePathStyle: true
        sslEnabled: false
        region: "us-east-1"
teraslice:
  state:
    connection: es-1
  asset_storage_connection_type: s3
  worker_disconnect_timeout: 60000
  node_disconnect_timeout: 60000
  slicer_timeout: 60000
  shutdown_timeout: 30000
  assets_directory: "/app/assets/"
  autoload_directory: "/app/autoload"
  cluster_manager_type: "kubernetes"
  master: true
  master_hostname: "127.0.0.1"
  kubernetes_image: "teraslice-workspace:e2e-nodev18.19.1"
  kubernetes_image_pull_secrets:
    - "docker-tera1-secret"
  kubernetes_namespace: "ts-dev1"
  kubernetes_overrides_enabled: true
  kubernetes_priority_class_name: "high-priority"
  name: "ts-dev1"
  cpu: 1
  memory: 536870912

I first created 10,000 records in an s3 bucket by running this job:

{
    "name": "data-to-s3",
    "lifecycle": "once",
    "workers": 2,
    "assets": [
        "standard",
        "file"
    ],
    "operations": [
        {
            "_op": "data_generator",
            "size": 10000
        },
        {
            "_op": "s3_exporter",
            "path": "test_bucket",
            "format": "ldjson"
        }
    ]
}
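
For anyone reproducing this, one way to submit the job above is through the teraslice master's HTTP API; a sketch, assuming the job JSON is saved as data-to-s3.json and the master is reachable on localhost:5678 (e.g. via a port-forward):

import { readFile } from 'node:fs/promises';

// Read the job definition above and POST it to the teraslice master.
const job = await readFile('./data-to-s3.json', 'utf8');

const response = await fetch('http://localhost:5678/v1/jobs', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: job
});

// On success the master responds with the id of the registered job.
console.log(await response.json());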

Then I ran a job that uses the hash_router to create 4 partitions, which the routed_sender operation then routes to different connections:

{
    "name": "s3-route-to-es",
    "workers": 2,
    "lifecycle": "once",
    "assets": [
        "standard",
        "elasticsearch",
        "file"
    ],
    "apis": [
        {
            "_name": "elasticsearch_sender_api",
            "index": "result-index",
            "size": 10000
        }
    ],
    "operations": [
        {
            "_op": "s3_reader",
            "path": "test_bucket",
            "size": 10000,
            "format": "ldjson"
        },
        {
            "_op": "hash_router",
            "fields": [
                "ip",
                "uuid"
            ],
            "partitions": 4
        },
        {
            "_op": "routed_sender",
            "api_name": "elasticsearch_sender_api",
            "routing": {
                "0": "es-1",
                "1": "es-1",
                "2": "es-2",
                "**": "es-2"
            }
        }
    ]
}
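
To make the routing block above concrete: partitions "0" and "1" go to the es-1 connection, "2" goes to es-2, and "**" is the catch-all that picks up partition "3". A simplified sketch of that lookup (not the actual routed_sender implementation):

type Routing = Record<string, string>;

// Exact partition match first, then fall back to the "**" catch-all.
function resolveConnection(routeKey: string, routing: Routing): string {
    if (routing[routeKey] !== undefined) return routing[routeKey];
    if (routing['**'] !== undefined) return routing['**'];
    throw new Error(`No route found for key "${routeKey}"`);
}

const routing: Routing = {
    '0': 'es-1',
    '1': 'es-1',
    '2': 'es-2',
    '**': 'es-2'
};

// Partition "3" from the hash_router falls through to the catch-all.
resolveConnection('3', routing); // => 'es-2'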

This resulted in four separate indices across two separate elasticsearch databases. The combined record counts added up to the expected total of 10,000.

curling es-1 indices:

health status index                      uuid                   pri rep docs.count docs.deleted store.size pri.store.size
yellow open   ts-dev1__ex                vSHIh5BsTCmLcq5pW80XwA   5   1          5          106    264.4kb        264.4kb
yellow open   ts-dev1__jobs              CaWHx099SFWl2-tACZycnQ   5   1          2            3     27.6kb         27.6kb
yellow open   ts-dev1__assets            sIPcJ2WcTUaA0mTjmc6pBg   5   1          4            0     24.9kb         24.9kb
yellow open   result-index-1             7r0DolBhSlGNcyS66lVAYQ   1   1       2575            0      2.1mb          2.1mb
yellow open   ts-dev1__state-2024.06     Qn1zeMEgTS2KdgKliIE9OQ   5   1       1458         2163    918.8kb        918.8kb
yellow open   result-index-0             x4Iil-EbSuiekJlClNU4yA   1   1       2506            0      2.1mb          2.1mb
yellow open   ts-dev1__analytics-2024.06 gkN1mdBAQqm8kyGzw58YHQ   5   1       1344            0    324.2kb        324.2kb

curling es-2 indices:

health status index          uuid                   pri rep docs.count docs.deleted store.size pri.store.size
yellow open   result-index-3 fAVQut8pRpaIKL4AL023hw   1   1       2476            0      2.1mb          2.1mb
yellow open   result-index-2 BuuBA25PRZipifwhdRVCsQ   1   1       2443            0        2mb            2mb
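
The per-index counts above sum to the expected total, which is easy to sanity-check:

// Doc counts for the four result indices, taken from the _cat/indices output above.
const counts: Record<string, number> = {
    'result-index-0': 2506,
    'result-index-1': 2575,
    'result-index-2': 2443,
    'result-index-3': 2476
};

const total = Object.values(counts).reduce((sum, n) => sum + n, 0);
// total === 10000, matching the data_generator size from the first job.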
sotojn self-assigned this Jun 29, 2024
sotojn added the bug Something isn't working label Jun 29, 2024
sotojn mentioned this issue Jun 29, 2024