[BUG] Failure to communicate with tenant in West US #2347
Hi @eisber Sorry to bother you here, but Semantic Link is introducing an odd dependency on SynapseML, and I was wondering if you have any knowledge about this dependency stack. I'm guessing that the problem I'm facing is a bug in Semantic Link for Spark (SparkLink); the relevant code is not found on the internet. ... If my problem were in SynapseML itself, then I'm guessing I would be able to find the call stacks here in this community. The problem may be related to the fact that our tenant lives in West US while the capacity lives in North Central; we are getting some basic timeout/connectivity issues, and this is not an intermittent issue. I have a ticket open, but I'm worried that the Mindtree folks will take weeks to set up a repro and contact you. I don't think the scenario involves components that are still in "preview".
Can you find a RAID (root activity ID) in the logs?
Hi @eisber I can ask the engineer for his RAID. I was able to send a repro over to the Mindtree side of things; the full case is reported with the following title and Mindtree case number. Here is an example of a spark native connector query that errors (seemingly because of synapse.ml.powerbi): The main problems appear when introducing "WHERE" clauses. That part of the query appears to be parsed for syntax, but it doesn't seem to have any impact on the SQL profiler queries in the PBI dataset. Moreover, in some cases I can omit the "WHERE" clause as a way to avoid the error messages. FYI, here is a comparable DAX query that works great when it is crafted by hand. Notice it should take ~5 ms and return under 200 rows. I'm having a hard time understanding the behavior of this "spark native connector", and I can't distinguish the functionality that will work reliably from the functionality that is broken. My biggest concern is that "WHERE" clauses seem to be ignored. The secondary concern is that a restrictive "TOPN()" is applied to the DAX query; that restriction rarely retrieves enough data from the dataset model, especially when the WHERE clause is omitted. It sounds like you are encouraging us to use the sempy ("evaluate_whatever") methods on the driver as a fallback whenever the spark native connector misbehaves. Is that so? Are both of them supported as GA features of Fabric?
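For reference, the failing pattern looks roughly like the sketch below. This is a minimal sketch only: the model, table, and column names are hypothetical, and it assumes the `pbi` catalog / `_Metrics` virtual-table convention that the semantic link documentation describes for the Spark native connector.

```python
# Minimal sketch of the Spark native connector path, assuming a Fabric
# notebook where `spark` is predefined. "Sales Model", Customer[Region],
# and Sales Amount are hypothetical names, not taken from the repro.
df = spark.sql("""
    SELECT `Customer[Region]`, `Sales Amount`
    FROM pbi.`Sales Model`.`_Metrics`
    WHERE `Customer[Region]` = 'West'
""")
df.show()
```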
The spark native connector path you're using maps to this function in sempy: https://learn.microsoft.com/en-us/python/api/semantic-link-sempy/sempy.fabric?view=semantic-link-python#sempy-fabric-evaluate-measure If you don't need joined queries over the semantic model and Spark, I'd strongly recommend using the Python API on the driver node.
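For anyone following along, a driver-side call of that kind might look like the sketch below. The model, measure, and column names are placeholders, not from the actual repro.

```python
import sempy.fabric as fabric

# Evaluates a measure entirely on the driver and returns a
# pandas-compatible FabricDataFrame. All names here are hypothetical.
df = fabric.evaluate_measure(
    "Sales Model",
    measure="Sales Amount",
    groupby_columns=["Customer[Region]"],
    filters={"Customer[Region]": ["West", "North Central"]},
)
print(df.head())
```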
BTW, should we move this discussion to a different GitHub repo? It seems like this is only loosely related to the open-source synapse.ml. I think I understand that the (A) spark native connector is using (B) evaluate_measure ... but perhaps it is doing so on a remote executor rather than the driver. So I think you are saying that a problem in one of these (A or B) will always affect the other, and I can simplify the repro by swapping one for the other? I have a tendency to gravitate towards spreading requests out to the executors, given my past experiences with Apache Spark. If a Spark developer does everything on the driver, people will tell us we are doing it wrong (i.e., why use a cluster at all?). The hope is that some day there will be optimizations or query hints that allow the work to be distributed across executors, and thereby improve the overall execution time. Of course the bottleneck will ultimately just move: slow operations on the Spark cluster will be made faster, but the bottleneck will end up at the PBI dataset model ... so it is really doubtful that it makes any difference whether queries are submitted from executors or the driver. In the end, the only real benefit I expected to get out of the spark native connector is to avoid as much DAX as possible. ;) I love MDX and SQL but have some love and some hate for DAX. Is the spark native connector at least supported? ... I have started getting doubts about that, given the obscure error: ... resulting from a slight change in query syntax. As per your strong recommendation, I have no doubt that I would be able to get the Python API working on the driver, by hook or by crook. My only question at this point is whether to avoid the spark native connector for future workloads in pyspark.
I don't know your dataset size, but from past experiments, for most standard semantic model sizes you won't see any improvement by using Spark or even by trying to optimize by moving compute to the executors. In general, the recommendation to perform computation on the executors for Spark jobs is reasonable, but that's for datasets of multiple GB/TB. If your dataset fits into memory on the driver node, you're probably even faster, as you don't have any distributed-system overhead.
Right. Most of my PBI datasets are small. At my company I'm guessing that 99% of our PBI datasets are under 5 GB (and could fit easily in duckdb or sqlite). Still, when running a solution on a Spark cluster, people expect to follow Spark patterns; I'm assuming that is why Microsoft created the spark native connector in the first place. Using SQL statements against PBI datasets is also appealing. Based on your recommendation, I'll start using sempy on the driver ... in the pandas ecosystem ... and subsequently build the spark frame after the fact when I need one, e.g. via something like the sketch below. In the future we may need to combine this data (PBI models) with some other pre-existing spark solution, or delta table, or whatever. Whenever that happens, it feels a bit "dirty" if one piece of data forces the whole business to be collect()'ed up to the driver. To avoid that dirty operation, a spark developer would typically push even the small datasets down to the workers. (And the native spark connector would theoretically save us from writing that code ourselves.)
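Assuming hand-written DAX and hypothetical names, the driver-first pattern described above could look like this (a sketch, not the actual workload):

```python
import sempy.fabric as fabric

# Run the DAX on the driver; evaluate_dax returns a pandas-compatible
# FabricDataFrame. The "Sales Model" name and the DAX are made up.
pdf = fabric.evaluate_dax(
    "Sales Model",
    """
    EVALUATE
    SUMMARIZECOLUMNS('Customer'[Region], "Sales", [Sales Amount])
    """,
)

# Promote to a Spark DataFrame only when a downstream join or
# delta-table write actually needs it (assumes a Fabric notebook
# where `spark` is predefined).
sdf = spark.createDataFrame(pdf)
```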
SynapseML version
Fabric 1.3 (com.microsoft.azure.synapse.ml.powerbi.measure.PBIMeasurePartitionReader)
SynapseML '1.0.8-spark3.5'
System information
Describe the problem
This library (SynapseML) is causing problems inside of Fabric. It appears to be invoked inside Fabric while executing Spark SQL statements against a semantic model:
com.microsoft.azure.synapse.ml.powerbi.measure.PBIMeasurePartitionReader
We already turned off the automatic ML logging for experiments and models. (That had been causing problems for us in the past; hopefully it is not a problem to turn that stuff off.)
The errors in my spark job are meaningless and seem to be unrelated to the actual work that I'm doing. The errors appear to be related to some perfunctory interaction with our Fabric tenant hosted in West US.
Here are the details:
Here is a screenshot of the query and the error:
Notice that I'm simply using "semantic-link" to run a query against a PBI dataset. I'm guessing that 95% of the work is performed on the driver.
I'm hoping to get some support here. The error seems related to this community, and not so much to Fabric. Otherwise I will wait a couple of weeks for Mindtree to respond (pro support). In the end, they would probably need help from this community to understand the behavior of SynapseML in Fabric.
Any tips would be very much appreciated.
Code to reproduce issue
Other info / logs
None
What component(s) does this bug affect?
- area/cognitive: Cognitive project
- area/core: Core project
- area/deep-learning: DeepLearning project
- area/lightgbm: Lightgbm project
- area/opencv: Opencv project
- area/vw: VW project
- area/website: Website
- area/build: Project build system
- area/notebooks: Samples under notebooks folder
- area/docker: Docker usage
- area/models: models related issue

What language(s) does this bug affect?
- language/scala: Scala source code
- language/python: Pyspark APIs
- language/r: R APIs
- language/csharp: .NET APIs
- language/new: Proposals for new client languages

What integration(s) does this bug affect?
- integrations/synapse: Azure Synapse integrations
- integrations/azureml: Azure ML integrations
- integrations/databricks: Databricks integrations