feat: move proxy logic into enclave #174

Merged · 26 commits into main from enclave-dcap · Sep 18, 2024

Conversation

@dangush (Contributor) commented Aug 21, 2024

Summary

Move the listen.sh/listen.rs script logic into an enclave service.

Implementation

Per @hu55a1n1, this is done by creating a new service which subscribes to a WS stream when requested to via grpcurl.

The resulting UX still requires two processes/shells, just as the listen script did. One runs the enclave, which serves the tonic server; the other makes a request to the Listen endpoint, which runs the websocket subscriber and makes Run requests. IMO this is undesirable UX even if having two processes is unavoidable, so we should add whatever tooling and syntactic sugar we can around it.
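
For illustration, a one-off request to such a Listen endpoint could look roughly like this with grpcurl (the address and the service/method names are placeholders, not the actual proto definitions):

grpcurl -plaintext -d '{}' 127.0.0.1:11090 transfers.Settlement/Listen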

Since fully "running" the enclave logic now requires a way of making a grpcurl request, I'm thinking that we should provide a cleaner tool for this in the CLI. This command would basically do what running the listen script did before.

quartz enclave run --service "" --endpoint "" --request ""

Thoughts:

  1. The 'service' argument is bloaty and IMO breaks abstractions. But if we want the user to design their quartz app enclaves with multiple tonic services chained together (like how the listen service is right now), then something like this would be necessary

  2. Otherwise, it could be simplified to take no arguments and only provide access to a standardized, required listen endpoint defined in the additional service (which can be required from the quartz dev like Quartz Core). However, this makes the quartz framework more opinionated / less flexible (devs may not want listen functionality at all)

  3. I don't think I can use the tonic client to dynamically call endpoints, so this would have to be implemented with something like grpcurl under the hood. Idk, I guess it makes no real difference; just something I noticed

I'd appreciate feedback on the matter of the CLI tool. Will implement after dev is merged.

@dusterbloom (Contributor)

I personally like the approach with --flags, as it could be made to work with defaults (maybe set in a config file) and still allow developers to specify values when running the enclave.
I am not sure about the naming, since run now sounds like it means both running the enclave and starting a listening service. Maybe quartz enclave activate/connect --service "" --endpoint "" --request "" ?

On grpcurl under the hood vs. doing dynamic calls with tonic, my intern points out that it is indeed doable and would probably be better long term because we get more control than with grpcurl. He also suggests choosing based on time constraints, as grpcurl is probably faster.

@hu55a1n1 (Member)

Good points @dangush! After thinking about this for some time and looking at the current impl, I feel like the better/easier short-term impl would be to continue to support gRPC as the de facto quartz enclave protocol and write a separate WebSocket client/subscriber (non-tonic) service that calls into the relevant gRPC services based on the events relevant to them.

The reason to support gRPC is so we can still support an external 'host' who can trigger enclave code, so that we don't depend entirely on WebSockets (this is useful for debugging and gives us more time to think about protocol and software design before we undertake a more substantial refactoring effort). Also, the core service still depends on the 'host' to initiate the handshake and doesn't listen to websockets, so that's another reason to keep gRPC support.

So, this impl basically requires the following API ->

  • 2 async protocol services/listeners operating concurrently. This is fairly easy to achieve with tokio-tungstenite (for websockets) and tonic for gRPC.
  • The ws service must have handles to the gRPC services to be able to call methods over them. This is achieved by storing an Arc<Mutex<Service>> wrapper per service in the ws service. Ideally, we design the API in a way that abstracts away all these details (see the sketch after this list).
  • Every gRPC service must be able to specify relevant ws logic, i.e. which event do they want to listen to, what data to fetch per event, etc. Ideally this code is designed such that both the core and user defined services can reuse it. This could look like a trait or a closure.
  • We could also use the technique described in Reduce boilerplate in quartz app enclave #135 to hide all of the websocket related code.
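
As a minimal sketch of the handle pattern from the second bullet (all type names here are hypothetical stand-ins, not the actual quartz types): the ws listener stores an Arc<Mutex<_>> per gRPC service and calls into it when a matching event arrives.

use std::sync::Arc;

use tokio::sync::Mutex;

// Hypothetical stand-ins for a gRPC service and a chain event.
#[derive(Default)]
pub struct TransfersService;

pub struct TmEvent;

impl TransfersService {
    // Called by the ws side when an event relevant to this service arrives.
    pub async fn on_event(&mut self, _event: TmEvent) { /* run gRPC-side logic */ }
}

// The ws listener keeps one shared handle per gRPC service.
pub struct WsListener {
    transfers: Arc<Mutex<TransfersService>>,
}

impl WsListener {
    pub async fn dispatch(&self, event: TmEvent) {
        self.transfers.lock().await.on_event(event).await;
    }
}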

@dangush (Contributor, Author) commented Aug 21, 2024

my intern

is this intern's salary $20/mo? 🤣 @dusterbloom

@dangush (Contributor, Author) commented Aug 21, 2024

@hu55a1n1 based on your last bullet point, am I right in understanding that you want the enclave to only be interfaced with via websocket, even for one-off requests to Core services?

Were you thinking that this websocket listener would be run in the enclave's main.rs function, after serving the tonic server? In other words, we're using multithreading inside the enclave to present the user with only one enclave process to handle?

  • in this case, wouldn't there still be an auxiliary process needed to route websocket subscription data from the node to the enclave's websocket listener?

Every gRPC service must be able to specify relevant ws logic, i.e. which event do they want to listen to, what data to fetch per event, etc. Ideally this code is designed such that both the core and user defined services can reuse it. This could look like a trait or a closure.

Why would this be specified by the gRPC service and not defined in the websocket logic? To clarify, when you say "every" here, that's because you're including core services too, right? So our websocket listener would have to differentiate between Core service requests and websocket data from the chain, and route various gRPC endpoints between core and application appropriately?

It seems difficult to effectively define all the parameters under which a quartz dev may want to trigger their gRPC service. Wouldn't it make more sense to decouple the gRPC logic from whatever triggers it? The MTCS and transfers quartz apps both use the pattern of watching for an event, but that doesn't necessarily have to hold for all quartz apps. This convo seems like one frustrating debate over flexibility vs simplicity haha.

Regardless, if we do go this way, what are the parameters that we want to define for a gRPC service? Thinking it through in writing:

  1. An obvious one is a tendermint-rpc::Query object to listen to, as listen.rs currently does:
    let mut subs = client
        .subscribe(Query::from(EventType::Tx).and_contains("wasm.action", "init_clearing"))
        .await?;
  2. This subscription returns an iterator over Event types, so presumably we have the gRPC logic implement a trait like validate_event(event: Event) -> bool which returns whether or not the event should trigger the RPC.

  3. But then what if the data provided to the gRPC logic needs to be different based on what the event is? One approach is just giving the gRPC logic the event itself, and letting it generate whatever data inputs it needs. This may reduce unnecessary network calls to the gRPC endpoints, I guess, but it still crosses abstraction barriers, as the gRPC logic now needs to transform/"validate" inputs as opposed to only operating on them.
    So I guess the validate_event trait should return a Result<Request, Error> with the request object ready to submit to the endpoint.
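
A rough sketch of that final shape (the trait and associated type names are invented for illustration; Event is tendermint-rpc's subscription event type):

use tendermint_rpc::event::Event;

// Hypothetical sketch: the service-side logic turns a subscription
// event into either a ready-to-send request or an error, so the gRPC
// handler itself only ever operates on already-validated inputs.
pub trait EventValidator {
    type Request;
    type Error;

    fn validate_event(&self, event: Event) -> Result<Self::Request, Self::Error>;
}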

@hu55a1n1 (Member)

based on your last bullet point, am I right in understanding that you want the enclave to only be interfaced with via websocket, even for one-off requests to Core services?

No, actually I think there should be two interfaces but the API should be such that specifying one should be sufficient (i.e. the websocket calls can be inferred from the gRPC ones - will try to add some example code to demonstrate this but it's more of a DevX improvement)

in this case, wouldn't there still be an auxiliary process needed to route websocket subscription data from the node to the enclave's websocket listener ?

I think we can use tokio's tasks for this, e.g. this sketch by @dusterbloom's intern 😅 ->

use tonic::transport::Server;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Initialize your gRPC service
    let my_service = MyGrpcService::default();

    // Clone the service if you need to share it across tasks
    let cloned_service = my_service.clone();

    // Start the WebSocket listener in a separate task
    tokio::spawn(async move {
        websocket_listener(cloned_service, "ws://example.com/socket").await;
    });

    // Start the gRPC server
    Server::builder()
        .add_service(MyGrpcServiceServer::new(my_service))
        .serve("127.0.0.1:50051".parse()?)
        .await?;

    Ok(())
}
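
The websocket_listener above is left undefined; a minimal sketch of what it could look like with tokio-tungstenite, assuming a hypothetical handle_message hook on the service:

use futures_util::StreamExt;
use tokio_tungstenite::connect_async;

async fn websocket_listener(service: MyGrpcService, url: &str) {
    // Connect to the node's websocket endpoint and forward each
    // incoming message to the service.
    let (mut stream, _response) = connect_async(url).await.expect("websocket connect failed");
    while let Some(Ok(msg)) = stream.next().await {
        // Hypothetical hook: the service decides how to react.
        service.handle_message(msg).await;
    }
}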

Why would this be specified by the gRPC service and not defined in the websocket logic?

Good point! I should have worded my comment differently. What I meant was that ideally users would only have to write one (gRPC) service, and then implement a trait or something to specify what must be done when a relevant websocket event is received. All this with good DevX and little boilerplate.

This convo seems like one frustrating debate over flexibility vs simplicity haha.

Interesting! I think I was working under the assumption that every app will want to react based on an event. Maybe that's not true.

I propose we work on an ADR to discuss this further. Or maybe we should do more sync brainstorming meetings.

@hu55a1n1 (Member)

This is sort of what I had in mind ->

#[tokio::main(flavor = "current_thread")]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    /* ... */
    
    QuartzServer::new(config.clone(), sk.clone(), attestor.clone())
        .add_service(TransfersServer::new(TransfersService::new(
            config, sk, attestor,
        )))
        .serve(args.rpc_addr)
        .await?;

    Ok(())
}

pub struct QuartzServer {
    pub router: Router,
}

impl QuartzServer {
    pub fn new(
        config: Config,
        sk: Arc<Mutex<Option<SigningKey>>>,
        attestor: DefaultAttestor,
    ) -> Self {
        Self {
            router: Server::builder().add_service(CoreServer::new(CoreService::new(
                config,
                sk.clone(),
                attestor.clone(),
            ))),
        }
    }

    pub fn add_service<S>(mut self, service: S) -> Self
    where
        S: Service<Request<BoxBody>, Response = Response<BoxBody>, Error = Infallible>
            + NamedService
            + Clone
            + Send
            + 'static,
        S::Future: Send + 'static,
    {
        self.router = self.router.add_service(service);
        // add route to ws listener
        self
    }

    pub async fn serve(self, addr: SocketAddr) -> Result<(), Error> {
        // spawn ws listener as a tokio task
        self.router.serve(addr).await
    }
}

pub enum TmAbciEvent { /* ... */ }

pub struct WebsocketListener {
    handlers: Vec<Box<dyn WebSocketHandler<Event=TmAbciEvent>>>
}

And require all services to implement WebSocketHandler ->

pub trait WebSocketHandler {
    type Event;

    fn handle_event(&self, event: Self::Event) -> Result<(), Error>;
}

And that's how these services define how they want their relevant websocket events to be handled and how they'd call methods on themselves.
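
For concreteness, a hypothetical impl for an app service could look like this (the TransfersService type, the Transfer variant, and process_transfer are invented for illustration):

impl WebSocketHandler for TransfersService {
    type Event = TmAbciEvent;

    fn handle_event(&self, event: Self::Event) -> Result<(), Error> {
        // React only to the events this service cares about, calling
        // methods on self to do the actual work; ignore the rest.
        if let TmAbciEvent::Transfer(data) = event {
            self.process_transfer(data)?;
        }
        Ok(())
    }
}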

@dangush (Contributor, Author) commented Aug 29, 2024

@hu55a1n1

here's how I've implemented it:

#[async_trait::async_trait]
pub trait WebSocketListener: Send + 'static {
    async fn listen(&self) -> Result<(), Error>;
}

pub struct QuartzServer {
    pub router: Router,
    ws_listeners: Vec<Box<dyn WebSocketListener>>,
}

impl QuartzServer {
    pub fn new(
        config: Config,
        sk: Arc<Mutex<Option<SigningKey>>>,
        attestor: DefaultAttestor,
    ) -> Self {
        let core_service = CoreServer::new(CoreService::new(config, sk.clone(), attestor.clone()));

        Self {
            router: Server::builder().add_service(core_service),
            ws_listeners: Vec::new(),
        }
    }

    pub fn add_service<S>(mut self, service: S) -> Self
    where
        S: Service<
                http::request::Request<BoxBody>,
                Response = http::response::Response<BoxBody>,
                Error = Infallible,
            >
            + WebSocketListener
            + Send
            + Clone
            + 'static
            + NamedService,
        S::Future: Send + 'static,
    {
        self.router = self.router.add_service(service.clone());
        self.ws_listeners.push(Box::new(service));

        self
    }
}

I'd like to avoid cloning the service struct if possible. To do this, I want to box just the future returned by the listener function, as opposed to the whole service.
However, doing this still requires me to clone the service, because I would have to move the service object into an async block in order to then call the listen() function. Just to visualize:

pub struct QuartzServer {
    pub router: Router,
    ws_listeners: Vec<Pin<Box<dyn Future<Output = Result<(), Error>> + Send>>>,
}

pub fn add_service<S>(mut self, service: S) -> Self
where
    S: Service<
            http::request::Request<BoxBody>,
            Response = http::response::Response<BoxBody>,
            Error = Infallible,
        >
        + WebSocketListener
        + Send
        + 'static
        + NamedService,
    S::Future: Send + 'static,
{
    self.router = self.router.add_service(service.clone());
    self.ws_listeners.push(Box::pin(async move {
        WebSocketListener::listen(&service).await
    }));

    self
}

This clone defeats the purpose of boxing only the future and not the entire service.

So I think I arrived at the point where you said

The ws service must have handles to the gRPC services to be able to call methods over them. This is achieved by storing an Arc<Mutex<Service>> wrapper per service in the ws service. Ideally, we design the API in a way that abstracts away all these details.

I tried doing some stuff to make the QuartzServer add_service function take an Arc<Mutex<S>> parameter instead, but ultimately I'm not seeing how it would be possible to A) make that work, or B) make it work without cloning the service somewhere anyway.

Is this just a useless pursuit?
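
One pattern that might sidestep this (a common tonic idiom, not something settled in this thread) is to make the service itself a cheap handle: keep all state behind an Arc so that the Clone required by add_service only bumps a reference count. A sketch with hypothetical state types:

use std::sync::Arc;

use tokio::sync::Mutex;

// Hypothetical inner state for the service.
pub struct TransfersState { /* keys, config, channels, ... */ }

// Cloning the service only clones the Arc, so handing one copy to the
// router and another to the ws listener costs a pointer copy each.
#[derive(Clone)]
pub struct TransfersService {
    state: Arc<Mutex<TransfersState>>,
}

With that shape, the service.clone() in add_service is no longer an expensive copy, and the boxed-future vs. boxed-service question mostly goes away.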

@dusterbloom (Contributor)

@dangush I was testing this to see if I could already upgrade to wasmd v0.52.

The following happened:

  1. run the CLI with
$ MOCK_SGX=1 && rm -rf ../apps/transfers/.cache && cargo run -- --mock-sgx --app-dir "../apps/transfers/" dev --unsafe-trust-latest  --contract-manifest "../apps/transfers/contracts/Cargo.toml"   --init-msg '{"denom":"ucosm"}'
  2. all goes fine (with a strange recurring pubkey across runs) up to npm run dev on the front-end

  3. FE seems to work on transfer but not on query_balance

  4. After a slack exchange with Shoaib I tested with the wasmd commands from the terminal:

execute transfer done from admin - I see the tx from explorer and websocat output too
deposit done - tx and websocat
query balance done - tx and websocat

  5. I ran a query to see the balance of the admin after the transfer and got "" in return:
wasmd query wasm contract-state smart $CONTRACT  '{"get_balance":{"address":"wasm1mkrm9m8g0dzv5z73xg8yzlj6srqc72qru5xfv3"}}' 
data: ""

Upon more digging, I found this error in the CLI log, right after Processing transfer event and 2 counts to 3 from block_waitoor:
Error in event handler: Failed to communicate to relayer. status: InvalidArgument, message: "Invalid message", details: [], metadata: MetadataMap { headers: {} }

@hu55a1n1 (Member) left a comment

Thanks @dangush! Great stuff!👌 Just did a code review, going to test now.


// Send setoffs to mtcs contract on chain
let output =
    wasmd_client.tx_execute(contract, chain_id, 2000000, &sender, json!(setoffs_msg))?;
@hu55a1n1 (Member):

Any specific reason why we're using the sender of the init_clearing msg here and not the CLI arg tx_sender? 🤔

@hu55a1n1 (Member):

Also, the gas is hardcoded, which probably shouldn't be the case? (There are also concerns about non-determinism (retries, etc.) that this notion of sending txs from an enclave introduces - will open a separate issue to discuss them)

@hu55a1n1 (Member)

I'm seeing this error when I run cargo build from the repo root ->

$ cargo build
warning: profiles for the non root package will be ignored, specify profiles at the workspace root:
package:   /home/vboxuser/Downloads/cycles-quartz/apps/mtcs/enclave/Cargo.toml
workspace: /home/vboxuser/Downloads/cycles-quartz/Cargo.toml
warning: profiles for the non root package will be ignored, specify profiles at the workspace root:
package:   /home/vboxuser/Downloads/cycles-quartz/cosmwasm/packages/tcbinfo/Cargo.toml
workspace: /home/vboxuser/Downloads/cycles-quartz/Cargo.toml
error: checksum for `anyhow v1.0.88` changed between lock files

this could be indicative of a few possible errors:

    * the lock file is corrupt
    * a replacement source in use (e.g., a mirror) returned a different checksum
    * the source itself may be corrupt in one way or another

unable to verify that `anyhow v1.0.88` is the same as when the lockfile was generated

@hu55a1n1 (Member) left a comment

🎉

@dangush merged commit 69c1f63 into main on Sep 18, 2024
7 checks passed
@dangush deleted the enclave-dcap branch on September 18, 2024 at 20:04