
It seems SparkCompare object has no attribute `sample_mismatch`? #283

Closed
pangjac opened this issue Mar 22, 2024 · 2 comments
Labels: question (Further information is requested), spark

Comments


pangjac commented Mar 22, 2024

Hi,

I am currently using 0.8.4. For a certain column, I am trying to print a sample mismatch to check which values differ for that column between two PySpark dataframes. It seems the SparkCompare object has no attribute `sample_mismatch`?
[screenshot showing the AttributeError]

I'm wondering whether this is a version issue. However, the latest documentation does not list `sample_mismatch` in the datacompy.spark module either.

If confirmed, could you give a quick pointer on why this method is not inherited? If there are no specific blockers, I'd be happy to contribute this method under the spark module.

Thanks for this wonderful package!

Member

fdosani commented Mar 22, 2024

Hey @pangjac, first off, thank you for supporting the package!

`sample_mismatch` doesn't exist on the SparkCompare class in that version of datacompy. We have a branch awaiting review that shifts to the pandas-on-Spark API, if you are OK using that instead. v0.8.4 is fairly old, so I'd highly recommend upgrading if you are able to. The old SparkCompare doesn't inherit from the base class because it was built separately from it. That has been bugging me for a while, hence the new branch awaiting review and the deprecation of the old Spark class.

If you look at the new implementation (which aligns better with the pandas, Polars, and Fugue logic), that function will be available natively for Spark.
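For illustration, here is a toy, pure-Python sketch of what a `sample_mismatch`-style check does conceptually: join two row sets on a key and return up to N rows where a given column's values disagree. The function name, signature, and `_df1`/`_df2` column-suffix convention below are assumptions for illustration, not datacompy's actual API.

```python
import random

def sample_mismatch(rows_df1, rows_df2, join_col, column, sample_count=10, seed=0):
    """Toy sketch of the idea behind a sample_mismatch report: join two row
    sets on join_col and return up to sample_count rows where `column`
    differs. Purely illustrative; not datacompy's implementation."""
    lookup = {r[join_col]: r for r in rows_df2}
    mismatches = []
    for r1 in rows_df1:
        r2 = lookup.get(r1[join_col])
        if r2 is not None and r1[column] != r2[column]:
            # Keep both sides of the disagreement for inspection.
            mismatches.append({
                join_col: r1[join_col],
                f"{column}_df1": r1[column],
                f"{column}_df2": r2[column],
            })
    if len(mismatches) > sample_count:
        random.seed(seed)
        mismatches = random.sample(mismatches, sample_count)
    return mismatches

df1 = [{"id": 1, "amount": 10}, {"id": 2, "amount": 20}]
df2 = [{"id": 1, "amount": 10}, {"id": 2, "amount": 25}]
print(sample_mismatch(df1, df2, "id", "amount"))
# → [{'id': 2, 'amount_df1': 20, 'amount_df2': 25}]
```

In the real library the heavy lifting happens on dataframes rather than Python lists, but the shape of the output (the join key plus the two disagreeing values) is the useful part when debugging a specific column.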

Alternatively, I wonder if the internal dataframe `_all_rows_mismatched` would give you what you need. You can filter on the column you are interested in, since it's just a Spark DataFrame.
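The filtering idea above can be sketched as follows. Assuming (this is an assumption, since `_all_rows_mismatched` is internal and its schema isn't documented here) that the mismatch table holds one row per joined key with `<col>_df1`/`<col>_df2` value pairs, the Spark version would be roughly `df.filter(col(f"{column}_df1") != col(f"{column}_df2"))`; here is the same logic in stdlib Python over plain dicts:

```python
# Hypothetical shape of the internal mismatch table: one row per joined key,
# with <col>_df1 / <col>_df2 value pairs. Column names are assumptions.
all_rows_mismatched = [
    {"id": 2, "amount_df1": 20, "amount_df2": 25, "name_df1": "a", "name_df2": "a"},
    {"id": 3, "amount_df1": 7, "amount_df2": 7, "name_df1": "x", "name_df2": "y"},
]

def filter_column_mismatches(rows, column):
    """Keep only rows where the given column actually differs between the
    two sides -- the plain-Python analogue of a Spark
    df.filter(col(f"{column}_df1") != col(f"{column}_df2"))."""
    return [r for r in rows if r[f"{column}_df1"] != r[f"{column}_df2"]]

print(filter_column_mismatches(all_rows_mismatched, "amount"))
# → [{'id': 2, 'amount_df1': 20, 'amount_df2': 25, 'name_df1': 'a', 'name_df2': 'a'}]
```

Note that a row can appear in an all-columns mismatch table because *some* column differs, so filtering per column as above is what narrows it down to the column you care about.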

fdosani added the question (Further information is requested) and spark labels on Mar 22, 2024
Member

fdosani commented Apr 8, 2024

@pangjac Just wanted to follow up and see if this was solved for you. Thanks!

fdosani closed this as completed on Jun 18, 2024