feat: move proxy logic into enclave #174

Merged · 26 commits into main from enclave-dcap · Sep 18, 2024

Conversation

@dangush (Contributor) commented Aug 21, 2024

Summary

Move the listen.sh/listen.rs script logic into an enclave service.

Implementation

Per @hu55a1n1, this is done by creating a new service which subscribes to a WS stream when requested to via grpcurl.

The resulting UX still requires two processes/shells, just as the listen script did. One runs the enclave, which serves the tonic server; the other makes a request to the Listen endpoint, which runs the websocket subscriber and makes Run requests. IMO this is undesirable UX even if having two processes is unavoidable, so we should add whatever tooling and syntactic sugar we can around it.
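
For illustration, a one-off request to such a Listen endpoint could look roughly like this with grpcurl (the address and the service/method names are placeholders, not the actual proto definitions):

grpcurl -plaintext -d '{}' 127.0.0.1:11090 transfers.Settlement/Listen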

Since fully "running" the enclave logic now requires a way of making a grpcurl request, I'm thinking that we should provide a cleaner tool for this in the CLI. This command would basically do what running the listen script did before.

quartz enclave run --service "" --endpoint "" --request ""

Thoughts:

  1. The 'service' argument is bloaty and IMO breaks abstractions. But if we want the user to design their quartz app enclaves with multiple tonic services chained together (like how the listen service is right now), then something like this would be necessary

  2. Otherwise, it could be simplified to take no arguments and only provide access to a standardized, required listen endpoint defined in the additional service (which can be required from the quartz dev like Quartz Core). However, this makes the quartz framework more opinionated / less flexible (devs may not want listen functionality at all)

  3. I don't think I can use the tonic client to dynamically call endpoints, so this would have to be implemented with something like grpcurl under the hood. Idk, I guess it makes no real difference; just something I noticed

I'd appreciate feedback on the matter of the CLI tool. Will implement after dev is merged.

@dusterbloom (Contributor)

I personally like the approach with --flags, as it could be made to work with defaults (maybe set in a config file) and still allow developers to specify values when running the enclave.
I am not sure about the naming, since run now sounds like it means both running the enclave and starting a listening service. Maybe quartz enclave activate/connect --service "" --endpoint "" --request "" ?

On grpcurl under the hood vs. doing dynamic calls with tonic, my intern points out that it is indeed doable and would probably be better long term because we get more control than with grpcurl. He also suggests choosing based on time constraints, as grpcurl is probably faster.

@hu55a1n1 (Member)

Good points @dangush! After thinking about this for some time and looking at the current impl, I feel like the better/easier short-term impl would be to continue to support gRPC as the de facto quartz enclave protocol and write a separate WebSocket client/subscriber (non-tonic) service that calls into the relevant gRPC services based on the events relevant to them.

The reason to support gRPC is so we can still support an external 'host' who can trigger enclave code, so that we don't depend entirely on WebSockets (this is useful for debugging and gives us more time to think about protocol and software design before we undertake a more substantial refactoring effort). Also, the core service still depends on the 'host' to initiate the handshake and doesn't listen to websockets, so that's another reason to keep gRPC support.

So, this impl basically requires the following API ->

  • 2 async protocol services/listeners operating concurrently. This is fairly easy to achieve with tokio-tungstenite (for websockets) and tonic for gRPC.
  • The ws service must have handles to the gRPC services to be able to call methods over them. This is achieved by storing an Arc<Mutex<Service>> wrapper per service in the ws service. Ideally, we design the API in a way that abstracts away all these details (see the sketch after this list).
  • Every gRPC service must be able to specify relevant ws logic, i.e. which event do they want to listen to, what data to fetch per event, etc. Ideally this code is designed such that both the core and user defined services can reuse it. This could look like a trait or a closure.
  • We could also use the technique described in Reduce boilerplate in quartz app enclave #135 to hide all of the websocket related code.
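
As a minimal sketch of the handle pattern from the second bullet (all type names here are hypothetical stand-ins, not the actual quartz types): the ws listener stores an Arc<Mutex<_>> per gRPC service and calls into it when a matching event arrives.

use std::sync::Arc;

use tokio::sync::Mutex;

// Hypothetical stand-ins for a gRPC service and a chain event.
#[derive(Default)]
pub struct TransfersService;

pub struct TmEvent;

impl TransfersService {
    // Called by the ws side when an event relevant to this service arrives.
    pub async fn on_event(&mut self, _event: TmEvent) { /* run gRPC-side logic */ }
}

// The ws listener keeps one shared handle per gRPC service.
pub struct WsListener {
    transfers: Arc<Mutex<TransfersService>>,
}

impl WsListener {
    pub async fn dispatch(&self, event: TmEvent) {
        self.transfers.lock().await.on_event(event).await;
    }
}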

@dangush (Contributor, Author) commented Aug 21, 2024

my intern

is this intern's salary $20/mo? 🤣 @dusterbloom

@dangush (Contributor, Author) commented Aug 21, 2024

@hu55a1n1 based on your last bullet point, am I right in understanding that you want the enclave to only be interfaced with via websocket, even for one-off requests to Core services?

Were you thinking that this websocket listener would be run in the enclave's main.rs function, after serving the tonic server? In other words, we're using multithreading inside the enclave to present the user with only one enclave process to handle?

  • in this case, wouldn't there still be an auxiliary process needed to route websocket subscription data from the node to the enclave's websocket listener?

Every gRPC service must be able to specify relevant ws logic, i.e. which event do they want to listen to, what data to fetch per event, etc. Ideally this code is designed such that both the core and user defined services can reuse it. This could look like a trait or a closure.

Why would this be specified by the gRPC service and not defined in the websocket logic? To clarify, when you say "every" here, that's because you're including core services too, right? So our websocket listener would have to differentiate between Core service requests and websocket data from the chain, and route various gRPC endpoints between core and application appropriately?

It seems difficult to effectively define all the parameters under which a quartz dev may want to trigger their gRPC service. Wouldn't it make more sense to decouple the gRPC logic from whatever triggers it? The MTCS and transfers quartz apps both use the pattern of watching for an event, but that doesn't necessarily have to hold for all quartz apps. This convo seems like one frustrating debate over flexibility vs simplicity haha.

Regardless, if we do go this way, what are the parameters that we want to define for a gRPC service? Thinking it through in writing:

  1. An obvious one is a tendermint-rpc::Query object to listen to, as listen.rs currently does:
    let mut subs = client
        .subscribe(Query::from(EventType::Tx).and_contains("wasm.action", "init_clearing"))
        .await?;
  2. This subscription returns an iterator over Event types, so presumably we have the gRPC logic implement a trait like validate_event(event: Event) -> bool which returns whether or not the event should trigger the RPC.

  3. But then what if the data provided to the gRPC logic needs to be different based on what the event is? One approach is just giving the gRPC logic the event itself, and letting it generate whatever data inputs it needs. This may reduce unnecessary network calls to the gRPC endpoints, I guess, but it still crosses abstraction barriers, as the gRPC logic now needs to transform/"validate" inputs as opposed to only operating on them.
    So I guess the validate_event trait should return a Result<Request, Error> with the request object ready to submit to the endpoint.
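
A rough sketch of that final shape (the trait and associated type names are invented for illustration; Event is tendermint-rpc's subscription event type):

use tendermint_rpc::event::Event;

// Hypothetical sketch: the service-side logic turns a subscription
// event into either a ready-to-send request or an error, so the gRPC
// handler itself only ever operates on already-validated inputs.
pub trait EventValidator {
    type Request;
    type Error;

    fn validate_event(&self, event: Event) -> Result<Self::Request, Self::Error>;
}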

@hu55a1n1 (Member)

based on your last bullet point, am I right in understanding that you want the enclave to only be interfaced with via websocket, even for one-off requests to Core services?

No, actually I think there should be two interfaces but the API should be such that specifying one should be sufficient (i.e. the websocket calls can be inferred from the gRPC ones - will try to add some example code to demonstrate this but it's more of a DevX improvement)

in this case, wouldn't there still be an auxiliary process needed to route websocket subscription data from the node to the enclave's websocket listener ?

I think we can use tokio's tasks for this, e.g. this sketch by @dusterbloom's intern 😅 ->

use tonic::transport::Server;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Initialize your gRPC service
    let my_service = MyGrpcService::default();

    // Clone the service if you need to share it across tasks
    let cloned_service = my_service.clone();

    // Start the WebSocket listener in a separate task
    tokio::spawn(async move {
        websocket_listener(cloned_service, "ws://example.com/socket").await;
    });

    // Start the gRPC server
    Server::builder()
        .add_service(MyGrpcServiceServer::new(my_service))
        .serve("127.0.0.1:50051".parse()?)
        .await?;

    Ok(())
}
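
The websocket_listener above is left undefined; a minimal sketch of what it could look like with tokio-tungstenite, assuming a hypothetical handle_message hook on the service:

use futures_util::StreamExt;
use tokio_tungstenite::connect_async;

async fn websocket_listener(service: MyGrpcService, url: &str) {
    // Connect to the node's websocket endpoint and forward each
    // incoming message to the service.
    let (mut stream, _response) = connect_async(url).await.expect("websocket connect failed");
    while let Some(Ok(msg)) = stream.next().await {
        // Hypothetical hook: the service decides how to react.
        service.handle_message(msg).await;
    }
}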

Why would this be specified by the gRPC service and not defined in the websocket logic?

Good point! I should have worded my comment differently. What I meant was that ideally users would only have to write one (gRPC) service, and then implement a trait or something to specify what must be done when a relevant websocket event is received. All this with good DevX and little boilerplate.

This convo seems like one frustrating debate over flexibility vs simplicity haha.

Interesting! I think I was working under the assumption that every app will want to react based on an event. Maybe that's not true.

I propose we work on an ADR to discuss this further. Or maybe we should do more sync brainstorming meetings.

@hu55a1n1 (Member)

This is sort of what I had in mind ->

#[tokio::main(flavor = "current_thread")]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    /* ... */
    
    QuartzServer::new(config.clone(), sk.clone(), attestor.clone())
        .add_service(TransfersServer::new(TransfersService::new(
            config, sk, attestor,
        )))
        .serve(args.rpc_addr)
        .await?;

    Ok(())
}

pub struct QuartzServer {
    pub router: Router,
}

impl QuartzServer {
    pub fn new(
        config: Config,
        sk: Arc<Mutex<Option<SigningKey>>>,
        attestor: DefaultAttestor,
    ) -> Self {
        Self {
            router: Server::builder().add_service(CoreServer::new(CoreService::new(
                config,
                sk.clone(),
                attestor.clone(),
            ))),
        }
    }

    pub fn add_service<S>(mut self, service: S) -> Self
    where
        S: Service<Request<BoxBody>, Response = Response<BoxBody>, Error = Infallible>
            + NamedService
            + Clone
            + Send
            + 'static,
        S::Future: Send + 'static,
    {
        self.router = self.router.add_service(service);
        // add route to ws listener
        self
    }

    pub async fn serve(self, addr: SocketAddr) -> Result<(), Error> {
        // spawn ws listener as a tokio task
        self.router.serve(addr).await
    }
}

pub enum TmAbciEvent { /* ... */ }

pub struct WebsocketListener {
    handlers: Vec<Box<dyn WebSocketHandler<Event=TmAbciEvent>>>
}

And require all services to implement WebSocketHandler ->

pub trait WebSocketHandler {
    type Event;

    fn handle_event(&self, event: Self::Event) -> Result<(), Error>;
}

And that's how these services define how they want their relevant websocket events to be handled and how they'd call methods on themselves.
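
For concreteness, a hypothetical impl for an app service could look like this (the TransfersService type, the Transfer variant, and process_transfer are invented for illustration):

impl WebSocketHandler for TransfersService {
    type Event = TmAbciEvent;

    fn handle_event(&self, event: Self::Event) -> Result<(), Error> {
        // React only to the events this service cares about, calling
        // methods on self to do the actual work; ignore the rest.
        if let TmAbciEvent::Transfer(data) = event {
            self.process_transfer(data)?;
        }
        Ok(())
    }
}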

@dangush (Contributor, Author) commented Aug 29, 2024

@hu55a1n1

here's how I've implemented it:

#[async_trait::async_trait]
pub trait WebSocketListener: Send + 'static {
    async fn listen(&self) -> Result<(), Error>;
}

pub struct QuartzServer {
    pub router: Router,
    ws_listeners: Vec<Box<dyn WebSocketListener>>,
}

impl QuartzServer {
    pub fn new(
        config: Config,
        sk: Arc<Mutex<Option<SigningKey>>>,
        attestor: DefaultAttestor,
    ) -> Self {
        let core_service = CoreServer::new(CoreService::new(config, sk.clone(), attestor.clone()));

        Self {
            router: Server::builder().add_service(core_service),
            ws_listeners: Vec::new(),
        }
    }

    pub fn add_service<S>(mut self, service: S) -> Self
    where
        S: Service<
                http::request::Request<BoxBody>,
                Response = http::response::Response<BoxBody>,
                Error = Infallible,
            >
            + WebSocketListener
            + Send
            + Clone
            + 'static
            + NamedService,
        S::Future: Send + 'static,
    {
        self.router = self.router.add_service(service.clone());
        self.ws_listeners.push(Box::new(service));

        self
    }
}

I'd like to avoid cloning the service struct if possible. To do this, I want to box just the future returned by the listener function, as opposed to the whole service.
However, doing this still requires me to clone the service, because I would have to move the service object into an async block in order to then call the listen() function. Just to visualize:

pub struct QuartzServer {
    pub router: Router,
    ws_listeners: Vec<Pin<Box<dyn Future<Output = Result<(), Error>> + Send>>>,
}

pub fn add_service<S>(mut self, service: S) -> Self
where
    S: Service<
            http::request::Request<BoxBody>,
            Response = http::response::Response<BoxBody>,
            Error = Infallible,
        >
        + WebSocketListener
        + Send
        + 'static
        + NamedService,
    S::Future: Send + 'static,
{
    self.router = self.router.add_service(service.clone());
    self.ws_listeners.push(Box::pin(async move {
        WebSocketListener::listen(&service).await
    }));

    self
}

This clone defeats the purpose of boxing only the future and not the entire service.

So I think I arrived at the point where you said

The ws service must have handles to the gRPC services to be able to call methods over them. This is achieved by storing an Arc<Mutex<Service>> wrapper per service in the ws service. Ideally, we design the API in a way that abstracts away all these details.

I tried doing some stuff to make the QuartzServer add_service function take an Arc<Mutex<S>> parameter instead, but ultimately I'm not seeing how it would be possible to A) make that work, or B) make it work without cloning the service somewhere anyway.

Is this just a useless pursuit?
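
One pattern that might sidestep this (a common tonic idiom, not something settled in this thread) is to make the service itself a cheap handle: keep all state behind an Arc so that the Clone required by add_service only bumps a reference count. A sketch with hypothetical state types:

use std::sync::Arc;

use tokio::sync::Mutex;

// Hypothetical inner state for the service.
pub struct TransfersState { /* keys, config, channels, ... */ }

// Cloning the service only clones the Arc, so handing one copy to the
// router and another to the ws listener costs a pointer copy each.
#[derive(Clone)]
pub struct TransfersService {
    state: Arc<Mutex<TransfersState>>,
}

With that shape, the service.clone() in add_service is no longer an expensive copy, and the boxed-future vs. boxed-service question mostly goes away.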

@dusterbloom (Contributor)

@dangush I was testing this to see if I could already upgrade to wasmd v0.52.

The following happened:

  1. run the CLI with
$ MOCK_SGX=1 && rm -rf ../apps/transfers/.cache && cargo run -- --mock-sgx --app-dir "../apps/transfers/" dev --unsafe-trust-latest  --contract-manifest "../apps/transfers/contracts/Cargo.toml"   --init-msg '{"denom":"ucosm"}'
  2. all goes fine (with a strange recurring pubkey across runs) up to npm run dev on the front-end

  3. FE seems to work on transfer but not on query_balance

  4. After a slack exchange with Shoaib I tested with the wasmd commands from the terminal:

execute transfer done from admin - I see the tx from explorer and websocat output too
deposit done - tx and websocat
query balance done - tx and websocat

  5. I ran a query to see the balance of the admin after the transfer and got "" in return:
wasmd query wasm contract-state smart $CONTRACT  '{"get_balance":{"address":"wasm1mkrm9m8g0dzv5z73xg8yzlj6srqc72qru5xfv3"}}' 
data: ""

Upon more digging, I found this error in the CLI log, right after Processing transfer event and 2 counts to 3 from block_waitoor:
Error in event handler: Failed to communicate to relayer. status: InvalidArgument, message: "Invalid message", details: [], metadata: MetadataMap { headers: {} }

@hu55a1n1 (Member) left a comment

Thanks @dangush! Great stuff!👌 Just did a code review, going to test now.


// Send setoffs to mtcs contract on chain
let output =
    wasmd_client.tx_execute(contract, chain_id, 2000000, &sender, json!(setoffs_msg))?;
@hu55a1n1 (Member):

Any specific reason why we're using the sender of the init_clearing msg here and not the CLI arg tx_sender? 🤔

@hu55a1n1 (Member):

Also, the gas is hardcoded, which probably shouldn't be the case? (There are also concerns about non-determinism (retries, etc.) that this notion of sending txs from an enclave introduces - will open a separate issue to discuss them)

@hu55a1n1 (Member)

I'm seeing this error when I run cargo build from the repo root ->

$ cargo build
warning: profiles for the non root package will be ignored, specify profiles at the workspace root:
package:   /home/vboxuser/Downloads/cycles-quartz/apps/mtcs/enclave/Cargo.toml
workspace: /home/vboxuser/Downloads/cycles-quartz/Cargo.toml
warning: profiles for the non root package will be ignored, specify profiles at the workspace root:
package:   /home/vboxuser/Downloads/cycles-quartz/cosmwasm/packages/tcbinfo/Cargo.toml
workspace: /home/vboxuser/Downloads/cycles-quartz/Cargo.toml
error: checksum for `anyhow v1.0.88` changed between lock files

this could be indicative of a few possible errors:

    * the lock file is corrupt
    * a replacement source in use (e.g., a mirror) returned a different checksum
    * the source itself may be corrupt in one way or another

unable to verify that `anyhow v1.0.88` is the same as when the lockfile was generated

@hu55a1n1 (Member) left a comment

🎉

@dangush merged commit 69c1f63 into main on Sep 18, 2024
7 checks passed
@dangush deleted the enclave-dcap branch on September 18, 2024 at 20:04