rest_api: v1.8 GraphQL payload fails REST API config validation #2398

Closed
davidcotton opened this issue Mar 12, 2025 · 10 comments · Fixed by #2437
Assignees
Labels
bug Something isn't working

Comments

@davidcotton

dlt version

1.8.0, 1.8.1

Describe the problem

dlt version 1.8.0 introduced a new (and very useful) feature to extract query string and JSON parameters to define resource relationships.

A side-effect of that change is that it limits the JSON data we can send in the request body, for example a POST request to a GraphQL API, as this change now validates the content of the JSON body.

Prior to dlt 1.8 I could define a RESTAPIConfig like:

from dlt.sources.rest_api import RESTAPIConfig, rest_api_resources

config: RESTAPIConfig = {
    "client": {
        "base_url": "https://www.example.com",
    },
    "resources": [
        {
            "name": "artists",
            "endpoint": {
                "path": "",
                "method": "POST",
                "json": {
                    "query": """
                        query Artist {
                            artist(id: "123") {
                                id
                                name
                            }
                        }
                    """,
                },
                "data_selector": "data.artist",
            },
        },
    ],
}
yield from rest_api_resources(config)

However, this now fails validation with a ValueError: Expression ' artist(id' defined in json is not valid. Valid expressions must start with one of: resources

I think this is caused by this change now validating the JSON param.
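The expression in the error message is what Python's format-string parser reports when it reads the GraphQL braces as a placeholder. A minimal sketch, assuming (not confirmed in this thread) that the validation scans the body the way `string.Formatter().parse` does; the helper name `placeholder_names` is hypothetical:

```python
from string import Formatter


def placeholder_names(template: str) -> list[str]:
    """Return the field names that Python's format-string parser finds."""
    return [name for _, name, _, _ in Formatter().parse(template) if name is not None]


# The first "{" of the query opens a "field" whose name runs up to the first ":".
raw_query = 'query Artist { artist(id: "123") { id name } }'
print(placeholder_names(raw_query))
```

The only name found is ` artist(id`, which does not start with `resources`, so a check that expects every placeholder to reference a resource would reject it.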

Is there a way we could either disable validation on a given resource, or provide a JSON body that isn't validated? Possibly this is just stretching the REST API source too far and I should just build a GraphQL source?

Expected behavior

No response

Steps to reproduce

from dlt.sources.rest_api import RESTAPIConfig, rest_api_resources

config: RESTAPIConfig = {
    "client": {
        "base_url": "https://www.example.com",
    },
    "resources": [
        {
            "name": "artists",
            "endpoint": {
                "path": "",
                "method": "POST",
                "json": {
                    "query": """
                        query Artist {
                            artist(id: "123") {
                                id
                                name
                            }
                        }
                    """,
                },
                "data_selector": "data.artist",
            },
        },
    ],
}
yield from rest_api_resources(config)

Operating system

macOS

Runtime environment

Local

Python version

3.12

dlt data source

REST

dlt destination

DuckDB

Other deployment details

No response

Additional information

No response

@burnash
Collaborator

burnash commented Mar 13, 2025

Thanks for reporting this @davidcotton, and thanks for a clear example.
I was not fully aware of how rest_api could be used to query GraphQL. Yes, this issue is caused by the string interpolation. I need to ponder a bit on how to allow GraphQL expressions / disable interpolation for some cases.

@burnash
Collaborator

burnash commented Mar 17, 2025

Hey @davidcotton, how do you parametrize your GraphQL query? In your example I can see that there's a static id value set to 123, obviously for demo reasons, but how do you use it in real-life/prod code?

                "json": {
                    "query": """
                        query Artist {
                            artist(id: "123") {
                                id
                                name
                            }
                        }
                    """,
                },

e.g. is id a variable or is it a reference to another resource?
I need this input to figure out the optimal way to fix this interpolation issue.

@burnash burnash added the question Further information is requested label Mar 17, 2025
@burnash burnash moved this from Todo to In Progress in dlt core library Mar 17, 2025
@burnash
Collaborator

burnash commented Mar 17, 2025

@davidcotton after thinking about it for a bit, I think the best way to fix this is to escape the curly braces in the strings:

{
    "json": {
        "query": """
            query Artist {{
                artist(id: "123") {{
                    id
                    name
                }}
            }}
        """,
    },
}

This would also allow linking a parent resource from the query:

{
    "json": {
        "query": """
            query Artist {{
                artist(id: "{resources.artist_list.id}") {{
                    id
                    name
                }}
            }}
        """,
    },
}

Check out the example tests in #2416. Sorry about this issue. Now I see that it was actually a breaking change and we should've noted that in the dlt version upgrade notes.
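For illustration, the escaping follows the same rules as Python's `str.format`: doubled braces become literal braces and the single-brace expression is substituted. This is a sketch only; the `resources` object below is a hypothetical stand-in for a parent row, not how dlt resolves the reference internally:

```python
from types import SimpleNamespace

template = 'query Artist {{ artist(id: "{resources.artist_list.id}") {{ id name }} }}'

# Hypothetical stand-in for one row of a parent resource named "artist_list".
resources = SimpleNamespace(artist_list=SimpleNamespace(id="123"))

# "{{" and "}}" collapse to single braces; the placeholder is replaced by the id.
rendered = template.format(resources=resources)
print(rendered)
```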

@davidcotton
Author

Hi @burnash, thanks for your quick responses and all your help looking into this issue!

I might be doing something wrong here, but I'm not sure your suggested solution will work.

Before I get into the details, unless this issue is troubling others, it might not be worth the effort to solve. I've been able to work around this by extending the GitHub GraphQL client from one of the dlt examples to create my own source.

The problem I've found with the suggested solution (dlt==1.8.1) is that if we escape the curly braces in the JSON body, e.g.

{
    "json": {
        "query": """
            query Artist {{
                artist(id: "123") {{
                    id
                    name
                }}
            }}
        """,
    },
}

this passes the validation check in build_resource_dependency_graph(), but the json value is not "un-escaped" before sending.
I've put a debugger breakpoint where the dlt Requests wrapper sends the request and can see that the json variable sent in the request is {'query': '\nquery Artist {{\n artist(id: "123") {{\n id\n name\n }}\n}}\n'}.

@davidcotton
Author

Apologies, your suggested solution does work if you link the parent resource, e.g.

{
    "json": {
        "query": """
            query Artist {{
                artist(id: "{resources.artist_list.id}") {{
                    id
                    name
                }}
            }}
        """,
    },
}

@burnash burnash added the bug Something isn't working label Mar 18, 2025
@burnash
Collaborator

burnash commented Mar 18, 2025

@davidcotton thanks again for reporting this. You're right, the interpolation (and unescaping) happens only if a parent resource is linked. I've added a bug label to this issue.
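A rough sketch of the behaviour described here, for readers following along. Both function names are hypothetical and this is not dlt's actual code; it only reproduces the symptom with plain `str.format`:

```python
from string import Formatter


def has_placeholders(template: str) -> bool:
    """True if the template contains at least one single-brace placeholder."""
    return any(name is not None for _, name, _, _ in Formatter().parse(template))


def render_only_when_linked(template: str, values: dict) -> str:
    """Mimics the reported symptom: no placeholder means no formatting at all."""
    if not has_placeholders(template):
        return template  # "{{" and "}}" go out unchanged
    return template.format(**values)


def render_always(template: str, values: dict) -> str:
    """Formats unconditionally, so escaped braces are always un-escaped."""
    return template.format(**values)


static_query = 'query Artist {{ artist(id: "123") {{ id name }} }}'
print(render_only_when_linked(static_query, {}))
print(render_always(static_query, {}))
```

With a static query, the first function returns the doubled braces untouched, while the second produces a valid GraphQL body.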

@burnash burnash changed the title v1.8 GraphQL payload fails REST API config validation rest_api: v1.8 GraphQL payload fails REST API config validation Mar 18, 2025
@siraj101

siraj101 commented Mar 18, 2025

Hello, I am having a similar issue with GraphQL. In my case the REST API resource sending the GraphQL query has a child relationship but no parent, so the above solution does not work for me.

@burnash any suggestions on how to make it work in my case?

@burnash
Collaborator

burnash commented Mar 19, 2025

Hey @siraj101, we're going to release a fix for the top (parent) resources.

@francescomucio
Contributor

I have created a small PR for this:

  • it extracts only the placeholders surrounded by exactly { and }
  • but ignores if these placeholders are not resolved, because they could be correctly part of the GraphQL string
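The two points above could look roughly like this. It is a sketch under assumptions, not the code from the PR: the regular expression, the `resolve` helper and the flat `values` mapping are all hypothetical.

```python
import re

# A "{" not preceded by "{", then a brace-free body, then a "}" not followed by "}".
PLACEHOLDER = re.compile(r"(?<!\{)\{([^{}]+)\}(?!\})")


def resolve(template: str, values: dict[str, str]) -> str:
    """Substitute known placeholders and leave everything else untouched."""

    def substitute(match: re.Match) -> str:
        key = match.group(1)
        # Unresolved candidates may be legitimate GraphQL, so keep them as-is.
        return str(values[key]) if key in values else match.group(0)

    return PLACEHOLDER.sub(substitute, template)


query = 'query Artist { artist(id: "{resources.artist_list.id}") { id name } }'
print(resolve(query, {"resources.artist_list.id": "123"}))
```

Here `{ id name }` also matches the pattern, but since it has no entry in `values` it is passed through unchanged, so the GraphQL selection set survives without any escaping.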

My suggestion would be to handle GraphQL requests differently from standard POST requests, also because we need to handle errors differently.
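On the error-handling point: a GraphQL server commonly reports failures in an `errors` list in the response body rather than through the HTTP status code, so a status check alone can miss them. A minimal sketch of what a dedicated handler might do; `GraphQLError` and `unwrap_graphql_response` are hypothetical names, not part of dlt:

```python
from typing import Any


class GraphQLError(Exception):
    """Raised when the response body carries a non-empty "errors" list."""


def unwrap_graphql_response(payload: dict[str, Any]) -> Any:
    """Return the "data" member, or raise if the server reported errors."""
    errors = payload.get("errors")
    if errors:
        messages = "; ".join(
            str(err.get("message", err)) if isinstance(err, dict) else str(err)
            for err in errors
        )
        raise GraphQLError(messages)
    return payload.get("data")


ok_payload = {"data": {"artist": {"id": "123", "name": "Example"}}}
print(unwrap_graphql_response(ok_payload))
```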

@burnash
Collaborator

burnash commented Mar 31, 2025

Hey @davidcotton @siraj101 @francescomucio
I've put together a proposal for a dlt GraphQL source that I'd love to get your thoughts on: dlt-hub/verified-sources#605
Would really appreciate your feedback.

Projects
Status: Done
4 participants