var MongoClient = require('mongodb').MongoClient;
var assert = require('assert');
var url = 'mongodb://ec2-XX-XX-XXX-XX.us-west-2.compute.amazonaws.com:27017/storedata';

MongoClient.connect(url, function (err, db) {
  assert.equal(null, err);
  console.log("Connected successfully to energy server!");
  var col = db.collection('energydata');

  // Get the results using a find stream
  var cursor = col.find({});
  cursor.on('data', function (doc) {
    console.log('data is : ', doc);
  });
  cursor.once('end', function () {
    db.close();
  });
});
But this is a one-time call that fetches all available data. My question is: what is the best practice for retrieving only the new data as it comes in, without re-reading everything that has already been loaded, as in a typical time-series visualization application (say, using Plotly)?
I’ve done something similar to this (also for energy data). I used the timestamp (the time field in your example) as the MongoDB _id and ignored errors on attempts to re-insert duplicates. That gave me a “free” index on the timestamp and made chart creation and live updating really easy, using a sort/limit in my find.
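A minimal sketch of that approach, assuming each reading has a time and a value field and using the same callback-style driver API as in your snippet:

// insert with the timestamp as _id; a duplicate insert fails with
// error code 11000, which we deliberately ignore
col.insertOne({ _id: doc.time, value: doc.value }, function (err) {
  if (err && err.code !== 11000) throw err;
});

// the automatic _id index then makes "newest N points" queries cheap
col.find({}).sort({ _id: -1 }).limit(100).toArray(function (err, docs) {
  if (err) throw err;
  // docs now holds the 100 most recent readings, ready for the chart
});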
Check the code in the repo I linked to. Apart from the charting package, the only real difference between that Blaze solution and a React one is that you render the chart in componentDidMount instead of Template.xxx.onCreated.
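In React terms that looks roughly like this (plain JS, no JSX; createChart and setData stand in for whatever the charting package actually exposes):

class EnergyChart extends React.Component {
  componentDidMount() {
    // the DOM node exists now, so create the chart here
    // (this is where the Template.xxx.onCreated code would go)
    this.chart = createChart(this.node, this.props.data); // createChart is hypothetical
  }
  componentDidUpdate() {
    this.chart.setData(this.props.data); // push new data into the existing chart
  }
  render() {
    var self = this;
    return React.createElement('div', { ref: function (el) { self.node = el; } });
  }
}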
So do I. The URL is used only to insert data into Mongo. Thereafter, whenever new data is inserted into the DB, I use pub/sub to get it onto the client, and set ReactiveVars which the chart’s setData method used to update dynamically.
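Sketched out, that wiring might look something like this (a sketch, not the original code; the publication name, collection, and chart object are assumptions):

// shared: the collection backed by the external data
var EnergyData = new Mongo.Collection('energydata');

// server: publish only the most recent documents
Meteor.publish('recentEnergyData', function () {
  return EnergyData.find({}, { sort: { time: -1 }, limit: 100 });
});

// client: subscribe, then mirror the minimongo contents into a ReactiveVar
var chartData = new ReactiveVar([]);
Meteor.subscribe('recentEnergyData');
Tracker.autorun(function () {
  chartData.set(EnergyData.find({}, { sort: { time: 1 } }).fetch());
});

// any autorun reading chartData re-runs when new docs arrive
Tracker.autorun(function () {
  chart.setData(chartData.get()); // chart/setData are from the charting package
});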
$ meteor npm install
npm WARN enoent ENOENT: no such file or directory, open '/Users/seb/WebstormProjects/ukgrid/package.json'
npm WARN ukgrid No description
npm WARN ukgrid No repository field.
npm WARN ukgrid No README data
npm WARN ukgrid No license field.
Yeah - it’s pretty old and needs updating for npm and import/export. I was suggesting reading the relevant code (there’s not that much) rather than installing it.
So when working with an external MongoDB (or whatever DB), is the backend always fetching and rebuilding the entire database? For example, say I have 1 GB of initial data and am using the MongoDB driver:
server.js
var MongoClient = require('mongodb').MongoClient;
var assert = require('assert');
var url = 'mongodb://ec2-XX-XX-XXX-XX.us-west-2.compute.amazonaws.com:27017/storedata';

MongoClient.connect(url, function (err, db) {
  assert.equal(null, err);
  console.log("Connected successfully to energy server!");
  var col = db.collection('energydata');

  // Get the results using a find stream
  var cursor = col.find({});
  cursor.on('data', function (doc) {
    console.log('data is : ', doc);
  });
  cursor.once('end', function () {
    db.close();
  });
});
and then 30 mins later, inside a cron job, I have new data, so the dataset is now 1.000001 GB. Do I really need to run the above code again, even though only 0.000001 GB of it is new? What I’m trying to ask is: is there no way to retrieve only the data that isn’t already loaded into the app, instead of loading the entire database on every cron run?
Just did an update since you mentioned it was updated, but I’m still getting:
A MongoDB stream is just an efficient mechanism for iterating over a cursor without having to dump the entire result set into memory. It’s just a Node stream under the hood.
Since your dataset only updates every 30 minutes, it’d be much more practical to just query every N minutes and use a well-covered selector to restrict the results to documents inserted since time X. If you need to synchronize changes to existing docs as well, you’d either need to track lastModified times too, or tail the oplog. Meteor’s Mongo driver implements oplog tailing for you, or you could roll your own solution.
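A sketch of that polling pattern (the lastSeen bookkeeping and the time field name are assumptions based on this thread; an index on { time: 1 } keeps the selector cheap):

var lastSeen = new Date(0); // high-water mark; persist it if the process restarts

function pollNewDocs(col, callback) {
  col.find({ time: { $gt: lastSeen } })
     .sort({ time: 1 })
     .toArray(function (err, docs) {
       if (err) return callback(err);
       if (docs.length) {
         lastSeen = docs[docs.length - 1].time; // advance the high-water mark
       }
       callback(null, docs); // only the documents inserted since the last poll
     });
}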
I recently implemented a Node server which replicates and syncs large collections to Elasticsearch in real time; streams were useful for scalably traversing each collection, and for tailing the oplog to catch the diffs.
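For completeness, oplog tailing looks roughly like this (a sketch: it assumes a replica set, since a standalone mongod has no oplog, and the namespace string matches the database/collection from the question):

var mongodb = require('mongodb');
var Timestamp = mongodb.Timestamp;

mongodb.MongoClient.connect('mongodb://localhost:27017/local', function (err, db) {
  if (err) throw err;
  var oplog = db.collection('oplog.rs');
  // tailable + awaitData keeps the cursor open, pushing new entries as they appear
  var cursor = oplog.find(
    { ns: 'storedata.energydata', ts: { $gt: new Timestamp(0, Math.floor(Date.now() / 1000)) } },
    { tailable: true, awaitData: true }
  );
  cursor.on('data', function (entry) {
    // entry.op: 'i' = insert, 'u' = update, 'd' = delete; entry.o is the document
    console.log('oplog entry:', entry.op, entry.o);
  });
});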
I think the part you want to focus on in your code is this:
// Get the results using a find stream
var cursor = col.find({});
Your find selector tells it to get everything. So, that’s what it does each time it is run.
You want to change your selector so it only gets the data added since the last time you received data.
I noticed you said “and then 30 mins later inside cron job”. So, if you’re running this script once every 30 minutes, you only need to change the find selector to fetch data added within the last 30 minutes.
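For example, assuming the timestamp field is called time and holds BSON dates:

// only fetch documents inserted within the last 30 minutes
var thirtyMinAgo = new Date(Date.now() - 30 * 60 * 1000);
var cursor = col.find({ time: { $gt: thirtyMinAgo } });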