Conceptual: Am I forced to perform a reduce on-the-fly for object with scores grouped by "uncountable" elements?

streemo · May 29, 2015, 2:38am

Hi,

I have a potentially interesting question for the computer scientists in the room, and the answer will help me implement an efficient pub/sub for my data.

Brief:

A User can post Posts. Users can vote on Posts either up or down; the up-vote count U and the down-vote count D are the inputs to the function s(U,D), which is the score of the Post. Suppose s(U,D) is linear in U and D.

So far, easy.

The caveat is this: suppose we have introduced a real number associated with each score, score.realNumber. Suppose we want to calculate a user’s score based only on posts whose score.realNumber lies within a specific range, like [a,b]. Suddenly it is not as easy as keeping track of a user’s score on a top-level document field, because the score depends on the query itself. Further, it is infeasible to precompute a score for every possible range, because the set of ranges is uncountable. If each score were instead associated with either Apples or Oranges, it would be easy to do.

IMAGE DESCRIPTION: http://postimg.org/image/4bdqqxyez/

Possible Solutions:

1. Non-reactive aggregation using arunoda’s package. Does the job quickly, but the results are non-reactive…
2. Using the low-level publish API (this.added/this.changed/etc.) on the server to group posts and calculate users’ scores on the fly from the subscription arguments. (The mapping from input documents to output documents is not a bijection, so I am forced to group documents in the publish function.)
3. Denormalizing post score information into the user documents, combined with $elemMatch, and then using the low-level publish API to reduce scores on the fly from the subscription arguments. (Here the mapping from input documents to output documents is a bijection, so I don’t need to worry about grouping elements…)
4. BAD SOLUTION: sending all post documents to the client (using distributed computing principles) to do the grouping and score reduction in the browser, into a null collection. I have implemented this for N<200 documents; beyond that, this solution is incredibly silly at scale.
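
As a rough illustration of the non-reactive aggregation option, the grouping could be expressed as a MongoDB aggregation pipeline along these lines. This is only a sketch: the field names (authorId, score.constant, score.U, score.D) and the choice s(U,D) = U - D are assumptions made for illustration, and buildScorePipeline is a hypothetical helper, not something from the thread.

```javascript
// Builds a MongoDB aggregation pipeline for "top users in range [a, b]".
// Field names (authorId, score.constant, score.U, score.D) are assumptions.
function buildScorePipeline(a, b, limit) {
  return [
    // Keep only posts whose constant lies in the closed range [a, b].
    { $match: { 'score.constant': { $gte: a, $lte: b } } },
    // Sum the per-post score s(U, D) = U - D (assumed) for each author.
    {
      $group: {
        _id: '$authorId',
        score: { $sum: { $subtract: ['$score.U', '$score.D'] } },
      },
    },
    // Highest-scoring users first, truncated to the requested page size.
    { $sort: { score: -1 } },
    { $limit: limit },
  ];
}
```

The returned array would then be handed to whatever server-side aggregate call is available on the posts collection.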

Any help on an efficient implementation would be much appreciated. See picture for better idea of what I am trying to accomplish.

nathan_muir · May 29, 2015, 3:22am

Hi @streemo - Will the range [a,b] vary as a parameter, or will it be reasonably static, e.g. tied to the user or post semi-permanently?

streemo · May 29, 2015, 3:31am

@nathan_muir [a,b] is entirely decoupled from posts. It is coupled to the client (see the last paragraph). Further, in a single session, the parameter [a,b] on the client will vary regularly.

A client can request the top users with [4,6.7] and that list of top users may differ from the list of top users requested with [2.4535,3.4565432] because different scores are being summed over.

Meteor.subscribe('scores', [a,b]) will get a cursor of users with a user.score field, which is a score that depends on [a,b]: namely, a linear sum of the scores for posts authored by user_i, where a post is included if and only if post.score.constant lies in [a,b].

thanks for your reply :>

streemo · May 29, 2015, 3:37am

the constant in this case is post.score.constant, which is used as a parameter to determine whether or not the score in question will be used to account for the user’s score.

post1 <== user ==> post2
post1 ==> score ==> c: 5.666, U: 51votes, D:3votes
post2 ==> score ==> c:.3333 U: 23votes, D:45votes

if range == [0,6], the user’s score will be s(51,3) + s(23,45).
if range == [0,.5], the user’s score will be s(23,45).
if range == [.11111,.23234], the user’s score will be undefined, or default to zero.
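
To make the three cases above concrete, here is a small sketch that reproduces them. It assumes s(U,D) = U - D as a stand-in linear score (the thread only says s is linear), and the post shape { c, U, D } is made up for the example.

```javascript
// Hypothetical linear score; the thread only states that s is linear in U and D.
function s(U, D) {
  return U - D;
}

// Sum s(U, D) over the posts whose constant c lies in [a, b]; 0 if none match.
function userScore(posts, a, b) {
  return posts
    .filter((post) => post.c >= a && post.c <= b)
    .reduce((total, post) => total + s(post.U, post.D), 0);
}

const posts = [
  { c: 5.666, U: 51, D: 3 },   // post1
  { c: 0.3333, U: 23, D: 45 }, // post2
];

console.log(userScore(posts, 0, 6));             // 26, i.e. s(51,3) + s(23,45)
console.log(userScore(posts, 0, 0.5));           // -22, i.e. s(23,45)
console.log(userScore(posts, 0.11111, 0.23234)); // 0, no posts in range
```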

A user can be the top user AND the worst user, depending on how the data is reduced.

So, the user’s score very much depends on the range defined on the client.

nathan_muir · May 29, 2015, 5:28am

Hmm… Since the calculation seems very ad-hoc, I’d probably implement it first as a Meteor method.

If you later decide to add reactivity, you can always re-implement as a publication.

streemo · May 29, 2015, 6:02am

@nathan_muir would you be able to expand a bit on that?

TLDR: Thanks for the input, Nathan. I think you’ve given me enough feedback to try out a pseudo-reactive, CPU-forgiving solution. I think your solution would be similar to using meteorhacks:aggregate with Meteor.publish (a combination which is also not reactive, but comes with a few bonus features like this.ready()). I do think I’m super close to figuring out a suitable, inexpensive solution for the reactive version: keep a server-side, in-memory global cache per session, which the current publication writes to from an observe cursor, and once new writes are done, send the sorted/limited data over to the client.
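
A minimal sketch of what such an in-memory cache might look like, independent of Meteor itself. The method names mirror the added/changed/removed callbacks of an observe cursor, but the ScoreCache class, the post shape { authorId, U, D }, and s(U,D) = U - D are all assumptions for illustration. The incremental updates rely on s being linear, so a post’s contribution can simply be added or subtracted.

```javascript
// In-memory per-user totals, updated incrementally as posts enter or leave
// the range. Post shape ({ authorId, U, D }) and s(U, D) = U - D are assumed.
class ScoreCache {
  constructor() {
    this.totals = new Map(); // authorId -> running score
  }

  static s(post) {
    return post.U - post.D;
  }

  // A post entered the result set: add its contribution.
  added(post) {
    const current = this.totals.get(post.authorId) || 0;
    this.totals.set(post.authorId, current + ScoreCache.s(post));
  }

  // A post left the result set: subtract its contribution.
  removed(post) {
    const current = this.totals.get(post.authorId) || 0;
    this.totals.set(post.authorId, current - ScoreCache.s(post));
  }

  // A post's votes changed: swap the old contribution for the new one.
  changed(newPost, oldPost) {
    this.removed(oldPost);
    this.added(newPost);
  }

  // Sorted, limited snapshot that could be sent to the client.
  top(limit) {
    return [...this.totals.entries()]
      .map(([authorId, score]) => ({ authorId, score }))
      .sort((x, y) => y.score - x.score)
      .slice(0, limit);
  }
}
```

One design consequence: each write costs a constant amount of work per post, while the sort in top() is paid only when a snapshot is actually sent.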

Do you mean a method which performs a query, groups and reduces the results according to my needs, and returns an array of documents? Or are you referring to something more elaborate? I kinda wanna avoid having to recompute the entire aggregation - so you’re suggesting to run the calculation once per session? I could cache the data in a server side global object, StaticHandles = {}, so I would be able to quickly paginate without recomputing.

For getting the first twenty: make the method call, check whether StaticHandles.var exists; if not, do the computation and set the results to StaticHandles.var; then return _.first(StaticHandles.var, 20). For the next 20, make the same method call.
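
The caching idea could be sketched like this. The cache key, the page/pageSize parameters, and the computeRanking callback are hypothetical names standing in for the real aggregation; only StaticHandles comes from the post above.

```javascript
// Server-side cache of computed rankings, keyed by session and range.
// computeRanking is a hypothetical stand-in for the expensive aggregation.
const StaticHandles = {};

function getPage(sessionId, range, page, pageSize, computeRanking) {
  const key = sessionId + ':' + range[0] + ':' + range[1];
  if (!StaticHandles[key]) {
    // Computed once per (session, range); later pages reuse the result.
    StaticHandles[key] = computeRanking(range);
  }
  const start = page * pageSize;
  return StaticHandles[key].slice(start, start + pageSize);
}
```

Note that nothing here ever evicts entries, so a real version would need to clear a session’s keys when the session ends or the range changes.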

Thoughts?
I think I could control “ready” by making use of the method’s callback and the Session.
So, then the template would look a bit like: {{#if ready}}, do this: {{#each arrayReturnedByMethod}}?
ready = function () { return Session.get('methodHasReturnedBoolean'); }

Better Idea?:

If I’m gonna give up reactivity, why not just use meteorhacks:aggregate, which minimally exposes some deeper Mongo methods, and run a fast aggregate query on the server? That would result in a non-reactive publish instead of the method call. Bonus: ready() functionality comes out of the box, and I could use the same pattern as above in the publish function, with the global var limiting multiple aggregate queries per session.

In both cases: I could check if range_current - range_previous < acceptable_delta_range_to_justify_new_query. Essentially, it would be like polling the server until the reactive range on the client changes enough to justify a new computation.
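
One way to make that check concrete is sketched below. How to measure the “difference” between two ranges is not specified above; this assumes the larger of the two endpoint shifts, and rangeChangedEnough and minDelta are hypothetical names.

```javascript
// Returns true when a new query is justified: either there is no previous
// range, or at least one endpoint moved by minDelta or more (an assumption
// about how "range_current - range_previous" should be measured).
function rangeChangedEnough(previous, current, minDelta) {
  if (!previous) return true;
  const shift = Math.max(
    Math.abs(current[0] - previous[0]),
    Math.abs(current[1] - previous[1])
  );
  return shift >= minDelta;
}
```

The client (or the publish function) would keep the last queried range and only re-run the aggregation when this returns true.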

Well, I think that fixes the pagination issue that sprang up too. Thanks a ton for the bouncing; it really helps. Please let me know if that sounds good to you. I think I’m gonna get to work on it first thing in the morning.

jay