Support BigQuery exporting results to Parquet directly #155

Closed
Tracked by #150
kokokuo opened this issue Apr 6, 2023 · 0 comments

kokokuo commented Apr 6, 2023

What’s the problem you're trying to solve

In #150, we want to improve query performance when users send an API request to a data endpoint to fetch results from the data source. To do that, we need to provide a caching (pre-loading) layer backed by DuckDB.

Describe the solution you’d like

According to #153, first check whether DataSource already defines the export method contract. If it does not, add it so that BQDataSource can implement it.

export interface ExportOptions {
  // The SQL query whose result will be exported
  sql: string;
  // The directory to write the exported result file to
  directory: string;
  // The name of the profile to use for the export
  profileName: string;
  // The file format type of the export
  type: CacheLayerStoreFormatType | string;
}

..... 

@VulcanExtension(TYPES.Extension_DataSource, { enforcedId: true })
export abstract class DataSource<
  C = any,
  PROFILE = Record<string, any>
> extends ExtensionBase<C> {
  private profiles: Map<string, Profile<PROFILE>>;

  constructor(
    @inject(TYPES.ExtensionConfig) config: C,
    @inject(TYPES.ExtensionName) moduleName: string,
    @multiInject(TYPES.Profile) @optional() profiles: Profile[] = []
  ) {
    super(config, moduleName);
    this.profiles = profiles.reduce(
      (prev, curr) => prev.set(curr.name, curr),
      new Map()
    );
  }
  ....

  /**
   * Export query result data to a cache file for the cache layer loader to use
   */
  public export(options: ExportOptions): Promise<void> {
    throw new Error(`Export method not implemented`);
  }

  .....
}
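
As a rough illustration of the contract above, the sketch below shows a data source that has not overridden export() next to one that has. Everything here is a stand-in: DataSourceSketch replaces the real DataSource (which also extends ExtensionBase and receives injected profiles), BQDataSourceSketch is hypothetical, and the 'parquet' type value is an assumption, since the members of CacheLayerStoreFormatType are not shown in this issue. The override only records the request instead of calling BigQuery, so the example runs offline.

```typescript
// Local copy of the ExportOptions shape from the issue (type narrowed to string).
interface ExportOptions {
  sql: string;
  directory: string;
  profileName: string;
  type: string;
}

// Stand-in for the abstract DataSource; the real class has more members.
abstract class DataSourceSketch {
  public export(_options: ExportOptions): Promise<void> {
    throw new Error(`Export method not implemented`);
  }
}

// A data source that has not overridden export() yet.
class PlainDataSource extends DataSourceSketch {}

// Hypothetical BigQuery data source. It records each request instead of
// talking to BigQuery, which keeps the sketch self-contained.
class BQDataSourceSketch extends DataSourceSketch {
  public readonly requests: ExportOptions[] = [];

  public async export(options: ExportOptions): Promise<void> {
    // 'parquet' is an assumed format name, not taken from the project.
    if (options.type !== 'parquet') {
      throw new Error(`Unsupported export type: ${options.type}`);
    }
    this.requests.push(options);
  }
}

const exportOptions: ExportOptions = {
  sql: 'SELECT 1',
  directory: '/tmp/cache',
  profileName: 'bq',
  type: 'parquet',
};

// The default implementation is not async, so it throws synchronously
// rather than returning a rejected promise.
let defaultThrew = false;
try {
  new PlainDataSource().export(exportOptions);
} catch {
  defaultThrew = true;
}

// The override accepts the same options and handles the request.
const bqSource = new BQDataSourceSketch();
void bqSource.export(exportOptions);
```

One detail worth noting for callers: because the default export() throws before any promise exists, wrapping the call in try/catch inside an async function (with await) covers both the default and an async override.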

Additional Context

BigQuery supports exporting query results to Parquet through SQL syntax (see "export data to Parquet format by SQL"). Use @google-cloud/bigquery to send the query.
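
A minimal sketch of how the export statement could be assembled, assuming the result goes to a Cloud Storage URI. The helper name buildParquetExportSql and the bucket path are hypothetical and not part of the project. BigQuery's EXPORT DATA statement writes to a destination URI containing a single wildcard rather than to a local directory, so an implementation would presumably still need to fetch the files into the directory from ExportOptions; that step is not shown.

```typescript
// Hypothetical helper: wraps a SELECT statement in BigQuery's EXPORT DATA
// statement so that the result is written as Parquet files.
function buildParquetExportSql(sql: string, gcsUri: string): string {
  // This sketch only handles Cloud Storage destinations.
  if (!gcsUri.startsWith('gs://')) {
    throw new Error(`Expected a gs:// URI, got: ${gcsUri}`);
  }
  // EXPORT DATA expects a single-wildcard URI, e.g. gs://bucket/dir/*.parquet
  if (gcsUri.split('*').length !== 2) {
    throw new Error('The destination URI must contain exactly one "*"');
  }
  // Escape backslashes and single quotes for the single-quoted SQL literal.
  const escapedUri = gcsUri.replace(/\\/g, '\\\\').replace(/'/g, "\\'");
  // Drop a trailing semicolon so the wrapped query stays a single statement.
  const query = sql.trim().replace(/;\s*$/, '');
  return [
    'EXPORT DATA OPTIONS (',
    `  uri = '${escapedUri}',`,
    "  format = 'PARQUET',",
    '  overwrite = true',
    ') AS',
    query,
  ].join('\n');
}

// Example with a made-up bucket path.
const exportStatement = buildParquetExportSql(
  'SELECT 1;',
  'gs://my-bucket/cache/*.parquet'
);

// Sending the statement with @google-cloud/bigquery might look like this
// (left as a comment so the sketch runs without credentials):
//   import { BigQuery } from '@google-cloud/bigquery';
//   await new BigQuery().query({ query: exportStatement });
```

Keeping the statement builder separate from the client call makes the SQL easy to unit test without touching BigQuery.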
