Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Dispatch hook to allow modifying API4 SQL #32466

Closed
wants to merge 1 commit into from

Conversation

braders
Copy link
Contributor

@braders braders commented Mar 23, 2025

Overview

Dispatch hook to allow modifying API4 SQL.

Before

Any API4 request gets boiled down to a SQL query, based on the fields being selected, joined, updated etc. Extension authors have little opportunity to modify this final SQL, which is needed for a small number of advanced use-cases.

After

A hook civi.api4.generatedSql has been introduced (Question: is there a better name for this?). The SQL query is passed in by reference, allowing it to be edited.

Comments

You might be wondering why I'd want this...

I've been experimenting with making WordPress tables available through API4/ SearchKit/ FormBuilder (in theory the same would be possible with Drupal or Joomla too). This leads to some really interesting use-cases. For example, here I've added a tab to the contact screen, showing WordPress usermetadata associated with the contact's user (via joining uf_match):
Screenshot 2025-03-23 at 11 19 43

There are any number of use-cases this unblocks. For example, showing WooCommerce orders, WordPress comments or Caldera Forms submissions against a CiviCRM contact summary, creating reports combining WooCommerce income with CiviCRM contribution income etc.

The catch is that the WordPress and CiviCRM database are often different. From the installation guide:

CiviCRM may be configured to use your existing WordPress database, or a separate (new) database. Using a separate database is generally preferred - as it makes backups and upgrades easier.

So whilst CiviCRM would usually create a query like this:

SELECT id, wp_users.id
FROM civicrm_contact a
LEFT JOIN (`civicrm_uf_match` `uf_match`) ON `uf_match`.`contact_id` = `a`.`id`
LEFT JOIN (`wp_users` `wp_users`) ON `uf_match`.`uf_id` = `wp_users`.`id`
WHERE (`a`.`is_deleted` = \"0\")
LIMIT 25
OFFSET 0

I need to be able to add the database name for the WordPress table(s):

SELECT id, wp_users.id
FROM civicrm_contact a
LEFT JOIN (`civicrm_uf_match` `uf_match`) ON `uf_match`.`contact_id` = `a`.`id`
LEFT JOIN (wordpress_db.`wp_users` `wp_users`) ON `uf_match`.`uf_id` = `wp_users`.`id`
WHERE (`a`.`is_deleted` = \"0\")
LIMIT 25
OFFSET 0

The addition of civi.api4.generatedSql allows for that. It does require doing some string manipulation, which isn't particuarly nice, but as this is a very niche requirement it's fine (I could have opened a PR allowing the API to natively add the database name, but that would have been a much bigger PR, and felt overkill for such a niche use-case).

There are likely to also be other use-cases for civi.api4.generatedSql that I've not thought of.

Copy link

civibot bot commented Mar 23, 2025

🤖 Thank you for contributing to CiviCRM! ❤️ We will need to test and review this PR. 👷

Introduction for new contributors...
  • If this is your first PR, an admin will greenlight automated testing with the command ok to test or add to whitelist.
  • A series of tests will automatically run. You can see the results at the bottom of this page (if there are any problems, it will include a link to see what went wrong).
  • A demo site will be built where anyone can try out a version of CiviCRM that includes your changes.
  • If this process needs to be repeated, an admin will issue the command test this please to rerun tests and build a new demo site.
  • Before this PR can be merged, it needs to be reviewed. Please keep in mind that reviewers are volunteers, and their response time can vary from a few hours to a few weeks depending on their availability and their knowledge of this particular part of CiviCRM.
  • A great way to speed up this process is to "trade reviews" with someone - find an open PR that you feel able to review, and leave a comment like "I'm reviewing this now, could you please review mine?" (include a link to yours). You don't have to wait for a response to get started (and you don't have to stop at one!) the more you review, the faster this process goes for everyone 😄
  • To ensure that you are credited properly in the final release notes, please add yourself to contributor-key.yml
  • For more information about contributing, see CONTRIBUTING.md.
Quick links for reviewers...

➡️ Online demo of this PR 🔗

@civibot civibot bot added the master label Mar 23, 2025
@colemanw
Copy link
Member

Interesting idea @braders. Some thoughts in no particular order:

It does require doing some string manipulation, which isn't particuarly nice

  1. Instead of converting $this->query object to a string before calling the hook, why not pass the object into the hook before calling $this->query->toSQL(). The object is much nicer to work with than a string.

  2. However, there are a number of other places where this function gets called so your customizations would be broken in places like this:

$sql = $this->query->toSQL();

$subquery = $this->query->toSQL();

  1. If the requirement here is simply to prefix fields with the name of a database, this hook shouldn't be needed. When adding your field, simply add a callback function with setSqlRenderer() and in that function add the prefix. You can reuse the same callback function for every field in your wp database, as the function receives $field['sql_name'] as an argument.

  2. The callback would look like this:

public static function renderWpFieldSql(array $field): string {
  if (... wordpress database is different) {
    return "`$wpDatabaseName`.{$field['sql_name']}";
  }
  return $field['sql_name'];
}

@braders
Copy link
Contributor Author

braders commented Mar 23, 2025

Thanks @colemanw. Good spot on the Api4SelectQuery->getCount function also calling toSQL. I'd overlooked that, but I have now confirmed it's an issue for what I'm trying to do. That said, the number of places calling toSQL seems low, so potentially the hook could just be repeated.

Adding the database name to sql_name is something I'd actually already tried, but it doesn't work nicely for joins because of how Civi forms the query:

$subjoinClause .= " INNER JOIN `{$subjoin['table']}` `{$subjoin['alias']}` ON (" . implode(' AND ', $subjoin['conditions']) . ")";

Because the backticks wrap {$subjoin['table']} SQL will treat the database name as part of the table name. A possibility is to allow passing in the database name to this function (or potentially checking if $subjoin['table'] contains a . within the function and infering that the first part is a DB name):

$subjoinClause .= " INNER JOIN `{$subjoin['database']}`.`{$subjoin['table']}` `{$subjoin['alias']}` ON (" . implode(' AND ', $subjoin['conditions']) . ")";

I don't think using setSqlRenderer will help when it comes to this join.

For context, this is the code I'm current using to test the new hook I'm suggesting:

public function add_sql_database_names($event) {
  global $wpdb;
  $dbname = $wpdb->dbname;
  $sql = $event->sql;

  foreach ($this->entities as $entity) {
    $sql = str_replace(
      sprintf('FROM %s ', $entity['table']),
      sprintf('FROM %s.%s ', $dbname, $entity['table']),
      $sql
    );
    $sql = str_replace(
      sprintf('(`%s`', $entity['table']),
      sprintf('(%s.`%s`', $dbname, $entity['table']),
      $sql
    );
  }

  $event->sql = $sql;
}

@seamuslee001
Copy link
Contributor

I agree with Coleman that this is an interesting idea but I wonder if it should be more that rather than operating on the generated SQL that the hook dispatches the Api4Query class instead

Then the listener could do like $query->getWhere() and loop through and call like $query->setWhere or something like that sort of pattern.

In this it would be a bit closer to how Drupal Views is modified by civicrm entity to deal with having civicrm tables in a separate schema

https://github.com/eileenmcnaughton/civicrm_entity/blob/4.0.x/src/Hook/QueryHooks.php#L27

@colemanw
Copy link
Member

@braders ok I see what you're talking about now with the joins.

I also get where you're coming from wrt adding a "quick fix" sort of hook that gets the job done without much fuss up front. However, we've been burned in the past by adding these sorts of overly-simplistic hooks that expose deeply internal functions. For example, our dedupe code is a right mess due to the sql queries all being exposed to hooks, which makes improving the queries nearly impossible as hooks out there in the wild could be doing literally anything and there's no way to know what changes would break them

IMO what's needed is for Api4 to understand which database each table belongs to, so it can properly prefix each one.

@colemanw
Copy link
Member

@braders what about #32474

@colemanw colemanw closed this Mar 24, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants