Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

mongodb backend uses wrong pipeline order, sums aggregate field name instead of measure (physical) field name #2

Open
Stiivi opened this issue Jun 3, 2016 · 3 comments

Comments

@Stiivi
Copy link
Member

Stiivi commented Jun 3, 2016

Moved from DataBrewery/cubes#336 by @miraculixx


Based on the tutorial I converted the tutorial dataset to MongoDB, using the cube with the mongo backend. After fixing the mongo backend it almost works, except that the MongoBrowser creates a pipeline that returns wrong results:

from cubes import Workspace
workspace = Workspace(config="/home/patrick/projects/cubestest/slicer-mongo.ini")
result = browser.aggregate(drilldown=["year"])
for record in result:
    print record
=>
{u'record_count': 31, u'amount_sum': 0, u'year': 2009}
{u'record_count': 31, u'amount_sum': 0, u'year': 2010}

As you can see the amount_sum fields are 0, which is wrong. Looking at the pipeline generated reveals why:

[{"$match": {"year": {"$exists": 1}}}, 
 {"$project":        <<< wrong order, project pipeline goes after group
        {"record_count": "$record_count", 
        "amount_sum": "$amount_sum", "year": 1}
 }, 
 {"$group": {"record_count": {"$sum": 1}, 
         "_id": {"year": {"$year": "$year"}}, 
         "amount_sum": {"$sum": "$amount_sum"}}},    <<< should be "$sum:" : "$amount"
  {"$sort": {"_id.year": 1}}
]

when the two problems are fixed manually and the aggregation directly run against mongodb the results are correct:

db.ibrd.aggregate([{ "$match": { "year": { "$exists": 1 } } }, 
{ "$group": { "record_count": { "$sum": 1 }, "_id": { "year": { "$year": "$year" } }, "amount": { "$sum": "$amount"} } }, 
{ "$project": { "record_count": "$record_count", "amount": "$amount", "year": 1 } }, 
{ "$sort": { "_id.year": 1 } }])
=>
/* 1 */
{
    "_id" : {
        "year" : 2009
    },
    "record_count" : 31,
    "amount" : 550840
},
/* 2 */
{
    "_id" : {
        "year" : 2010
    },
    "record_count" : 31,
    "amount" : 566020
}

The problem probably occurs within the mapping code. Any hints?

Data ready to import into mongodb: https://gist.github.com/miraculixx/5ffc89241478e8b09883
MongoDB: 3.2
cubes: release-1.0.1 with fix for mongodb applied as per above

#slices.ini
[store]
type: mongo
url: mongodb://localhost:27017/
database: cubes
collection: ibrd
[models]
main: tutorial_model.json
@robin900
Copy link

robin900 commented Jun 3, 2016

I believe the fix should happen at https://github.com/DataBrewery/cubes-mongo/blob/master/cubes-mongo/browser.py#L406

Instead of

escape_level(agg.ref())

you want

escape_level(agg.measure.ref())

I can't make these changes myself and test; I have too much other work to do at the moment.

@Stiivi
Copy link
Member Author

Stiivi commented Jun 4, 2016

Thanks @robin900. I added the change. Requires testing. CC: @miraculixx

@voldesh
Copy link

voldesh commented Jun 15, 2016

"unicode type has no attribute ref" I got this error when updated the statement to agg.measure.ref()

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants