-
Notifications
You must be signed in to change notification settings - Fork 16
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Rebuild cohort processing #72
Conversation
|
||
private Object[] tuple; | ||
|
||
private HashMap<String, Integer> schema2Index; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, I just want a unified structure which can regarded as a input for all processor unit. Any suggestion ?
* | ||
* @return the layout of this ProjectedTuple | ||
*/ | ||
public String[] getSchemaList(){ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reserved Interface
cool-core/src/main/java/com/nus/cool/core/cohort/refactor/utils/DateUtils.java
Outdated
Show resolved
Hide resolved
cool-core/src/main/java/com/nus/cool/core/cohort/refactor/utils/TimeUtils.java
Show resolved
Hide resolved
cool-core/src/main/java/com/nus/cool/core/cohort/refactor/valueSelect/ValueSelection.java
Outdated
Show resolved
Hide resolved
cool-core/src/main/java/com/nus/cool/core/io/readstore/HashMetaFieldRS.java
Show resolved
Hide resolved
cool-core/src/main/java/com/nus/cool/core/io/readstore/HashMetaFieldRS.java
Show resolved
Hide resolved
* @return | ||
* @throws IOException | ||
*/ | ||
public synchronized boolean isCubeExist(String cube) throws IOException{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add on to previous feedback. it will still take some time for me to read the birthSelect and cohortSelect folders..
After reading the filters and aggregators, I think we need to discuss about the plan on how PorjectedTuple is being used. it seems that the end consumers, aggregators only cares about one field. Then there are two fields, user id
and action time
in specific, added to help selection. we could have dedicated variables in ProjectedTuple
for these two fields, then keep a list of value fields that will be aggregated. let CohortProcessor
to use aggregator to update the corresponding intermediate result of retunit during processing.
cool-core/src/main/java/com/nus/cool/core/cohort/refactor/birthSelect/EventSelection.java
Show resolved
Hide resolved
case "RANGE": return Range; | ||
case "SET": return Set; | ||
default: | ||
throw new IllegalArgumentException(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
unhandled exception? Is @JsonCreator going to handle it? Actually in upstream function calls we need to do error handling gracefully instead of stopping the whole system. Just leaving a notes here. It can be a target in our next phase of development.
cool-core/src/main/java/com/nus/cool/core/cohort/refactor/filter/RangeFilter.java
Outdated
Show resolved
Hide resolved
cool-core/src/main/java/com/nus/cool/core/cohort/refactor/ageSelect/AgeSelection.java
Show resolved
Hide resolved
cool-core/src/main/java/com/nus/cool/core/cohort/refactor/aggregate/AverageAggregate.java
Show resolved
Hide resolved
cool-core/src/main/java/com/nus/cool/core/cohort/refactor/aggregate/DistinctCountAggregate.java
Show resolved
Hide resolved
* Provide get and set interface | ||
*/ | ||
@Data | ||
public class RetUnit{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this where we are to support max, min, etc for the cohort result? if so, we should leave some notes here.
Leaving a note here: If we have separate variables tracking min, max, etc. we needs not have getters to expose the internal attributes. Instead we have a set of operations like max()
, addToDistinct()
, etc.
* Provide get and set interface | ||
*/ | ||
@Data | ||
public class RetUnit{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And Aggregator::calculate simply calls the respective methods.
cool-core/src/main/java/com/nus/cool/core/cohort/refactor/aggregate/AggregateFunc.java
Show resolved
Hide resolved
cool-core/src/main/java/com/nus/cool/core/cohort/refactor/valueSelect/ValueSelection.java
Outdated
Show resolved
Hide resolved
cool-core/src/main/java/com/nus/cool/core/cohort/refactor/CohortProcessor.java
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
CohortSelect contains wrappers of filters. If we avoid the ProjectedTuple encapsulating the schema and indirection from schema to value, we could remove those wrappers.
I have finished my review. This is the last part. We need to increase test coverage. Especially, unit tests for selection context. I also noticed that filtering using metachunk has not been implemented in CohortProcessor.
Regarding birth sequence support. the old implementation benefited from the assumption that all records of a user are contiguous and guaranteed to be in a data chunk. The birthEvents are checked one by one in a loop, meaning all records of a user are available during processing. That is in conflict with update handling. Currently we support one entry in birthSequence, to support more, the selection context needs considerable modifications to support the tracking of multiple partial matching of birthSequence. (old implementation does not face this issue because it scans the all records of a user for every entry in birthSequence iteratively).
cool-core/src/main/java/com/nus/cool/core/cohort/refactor/birthSelect/BirthContextWindow.java
Outdated
Show resolved
Hide resolved
cool-core/src/main/java/com/nus/cool/core/cohort/refactor/birthSelect/BirthContextWindow.java
Outdated
Show resolved
Hide resolved
cool-core/src/main/java/com/nus/cool/core/cohort/refactor/birthSelect/EventSelection.java
Show resolved
Hide resolved
cool-core/src/main/java/com/nus/cool/core/cohort/refactor/CohortProcessor.java
Show resolved
Hide resolved
cool-core/src/main/java/com/nus/cool/core/cohort/refactor/CohortProcessor.java
Show resolved
Hide resolved
cool-core/src/test/java/com/nus/cool/core/refactor/JsonMapper.java
Outdated
Show resolved
Hide resolved
BTW, please remove commented-out codes (including |
This PR has made the following change:
Overall, the struct and logic are clear. Currently, for each query, the system stores all related fileds' values into memory and are not released after the query is finished. Although this can facilitate the further query, we cannot predict the visiting frequency of each field since there is no workload. If the system runs as a service, all fields will eventually be loaded into memory. This may be inefficient. Is it better to delete all cached data after finishing a query? |
The main modification is to divide the query parser and query logic processing. And rename some method, add some doc
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the quick action. The decoupling made it much cleaner. I have no further comments. Please do a rebase against dev. That would exclude those olap related commits from this branch. Easier for others to view and for future references.
cool-core/src/main/java/com/nus/cool/core/cohort/refactor/cohortSelect/CohortSetSelector.java
Outdated
Show resolved
Hide resolved
|
||
if (!(userField.getValueVector() instanceof RLEInputVector)) { | ||
totalCorruptedUsers++; | ||
LOG.info("The user record corrupted: " + totalDataChunks); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe just leave "user record corrupted". totalDataChunk does not give much info.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My suggestion is to directly merge and solve these conflict (which is not part of the main logic of this PR) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@KimballCai @NLGithubWP please compile at your side to make sure it is working as well. Let's merge this PR and move on to develop in separate PRs to address its problems.
I have checked the codes and can run the codes successfully. |
71349cf
to
c184a57
Compare
move these aftercare work in issue #83 |
Please update the details here when reviewing the codes. (issues to be addressed)
Create an issue and submit a corresponding PR for each one.
loadattr
that mutates internal data.