"No primary found in set" error when deploying my app on EC2


#1

Hi,

I already deployed my app on a private server with Mongo, so everything is on the same machine.
meteorhacks:cluster is enabled and set to auto.

My MONGO_OPLOG_URL is this:
export MONGO_OPLOG_URL='mongodb://user:pwd@localhost:27017,localhost:27018,localhost:27019/local?authSource=admin'

And this works.

Now I tried to deploy my app on Amazon EC2, connecting to the same Mongo, which is still on my private server.

My new MONGO_OPLOG_URL is this:
export MONGO_OPLOG_URL='mongodb://user:pwd@privateHost:27017,privateHost:27018,privateHost:27019/local?authSource=admin'

But this fails. I first get the error "No primary found in set" and then "No valid replicaset instance servers found".
I reduced CLUSTER_WORKERS_COUNT to 1 and it worked, but that's useless if I can't get more…

Does anyone have an idea what I forgot to do?


#2

The MONGO_OPLOG_URL requires an oplog, which is provided by a replica set. The message about not finding a primary means that Meteor can't find a replica set and oplog to attach to. You need to set up a replica set.

https://github.com/awatson1978/meteor-cookbook/blob/master/cookbook/replica-sets.md
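For reference, initiating a minimal three-member set from the mongo shell looks roughly like this (set name and hostnames here are placeholders matching a typical local setup, not necessarily yours):

```javascript
// run once, in the mongo shell connected to one member
rs.initiate({
  _id: "testingRplSet",
  members: [
    { _id: 0, host: "localhost:27017" },
    { _id: 1, host: "localhost:27018" },
    { _id: 2, host: "localhost:27019" }
  ]
})

rs.status()  // should eventually show one PRIMARY and two SECONDARY members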


#3

I understand that I need a replica set to use the oplog, but I already have one, and Kadira reports that I'm using it.

I'm able to connect to privateHost:27017, privateHost:27018 and privateHost:27019, and each of my mongod instances runs with the replSet and keyFile options.


#4

Did you actually initialize the replica set? Which one is your primary? Are they maybe all in secondary state? They'll vote among themselves and swap roles if one of them isn't available, so keep an eye on that. Log into the mongo shell on each instance and check its state. You may need to use the rs.reconfig() command.
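A quick way to check member states from the mongo shell (a sketch; the `name` and `stateStr` fields come from rs.status()):

```javascript
// in the mongo shell, on any reachable member
rs.status().members.forEach(function (m) {
  print(m.name + " -> " + m.stateStr);  // expect exactly one PRIMARY
});
```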

http://docs.mongodb.org/manual/tutorial/force-member-to-be-primary/
http://docs.mongodb.org/manual/tutorial/reconfigure-replica-set-with-unavailable-members/


#5

On localhost:27017

  • My oplog user has read access to local.
  • db.version()

2.4.9

  • db.isMaster()

{
  "setName" : "testingRplSet",
  "ismaster" : true,
  "secondary" : false,
  "hosts" : [
    "localhost:27017",
    "localhost:27019",
    "localhost:27018"
  ],
  "primary" : "localhost:27017",
  "me" : "localhost:27017",
  "maxBsonObjectSize" : 16777216,
  "maxMessageSizeBytes" : 48000000,
  "localTime" : ISODate("2015-04-21T06:18:32.252Z"),
  "ok" : 1
}

-> I have a master and it's localhost:27017

  • rs.conf()

{
  "_id" : "testingRplSet",
  "version" : 3,
  "members" : [
    {
      "_id" : 0,
      "host" : "localhost:27017"
    },
    {
      "_id" : 1,
      "host" : "localhost:27018"
    },
    {
      "_id" : 2,
      "host" : "localhost:27019"
    }
  ]
}

-> conf seems OK; however, I didn't set any priority values

  • db.runCommand( { replSetGetStatus : 1 } )

{
  "set" : "testingRplSet",
  "date" : ISODate("2015-04-21T06:15:55Z"),
  "myState" : 1,
  "members" : [
    {
      "_id" : 0,
      "name" : "localhost:27017",
      "health" : 1,
      "state" : 1,
      "stateStr" : "PRIMARY",
      "uptime" : 3949189,
      "optime" : Timestamp(1429596954, 1),
      "optimeDate" : ISODate("2015-04-21T06:15:54Z"),
      "self" : true
    },
    {
      "_id" : 1,
      "name" : "localhost:27018",
      "health" : 1,
      "state" : 2,
      "stateStr" : "SECONDARY",
      "uptime" : 3948879,
      "optime" : Timestamp(1429596954, 1),
      "optimeDate" : ISODate("2015-04-21T06:15:54Z"),
      "lastHeartbeat" : ISODate("2015-04-21T06:15:55Z"),
      "lastHeartbeatRecv" : ISODate("2015-04-21T06:15:55Z"),
      "pingMs" : 0,
      "syncingTo" : "localhost:27017"
    },
    {
      "_id" : 2,
      "name" : "localhost:27019",
      "health" : 1,
      "state" : 2,
      "stateStr" : "SECONDARY",
      "uptime" : 3948939,
      "optime" : Timestamp(1429596952, 1),
      "optimeDate" : ISODate("2015-04-21T06:15:52Z"),
      "lastHeartbeat" : ISODate("2015-04-21T06:15:54Z"),
      "lastHeartbeatRecv" : ISODate("2015-04-21T06:15:55Z"),
      "pingMs" : 0,
      "syncingTo" : "localhost:27017"
    }
  ],
  "ok" : 1
}

-> Sync seems OK too

Are there other things to check?


#6

Hmmm… reading through the original problem again… have you checked that all your firewalls allow bidirectional access to port 27017?


#7

Oh god! I found something…

It's nothing to do with firewalls or open ports… A few days ago I installed MongoDB MMS for monitoring/backup, but somehow it doesn't work with meteorhacks:cluster. I deactivated it and now it works.

I'll see tomorrow whether it was really this.


Edit: It still doesn't work, so I did some tests.

On EC2
I created a small app that does no more than this:

if (Meteor.isServer) {
  Meteor.startup(function () {
    console.log("startup");
    var t = new Mongo.Collection("myItems");
    console.log("starting to count()", new Date());
    console.log("tasks", t.find().count());
    console.log("ended to count()", new Date());
  });
}

Results:

  • no CLUSTER_WORKERS_COUNT: everything is fine
  • CLUSTER_WORKERS_COUNT <= 20: everything is fine
  • CLUSTER_WORKERS_COUNT > 20: "no primary found in set" and "No valid replicaset instance servers found" errors

Then, with my actual app, results:

  • no CLUSTER_WORKERS_COUNT: everything is fine
  • CLUSTER_WORKERS_COUNT <= 2: everything is fine
  • CLUSTER_WORKERS_COUNT > 2: "no primary found in set" and "No valid replicaset instance servers found" errors

I checked the network traffic and it looks very bad…

Each peak is when I started my app…

On privateHost (my EC2 instance is off)
With my actual app

  • no CLUSTER_WORKERS_COUNT: everything is fine
  • CLUSTER_WORKERS_COUNT > 0: "no primary found in set" and "No valid replicaset instance servers found" errors

I restarted my replica set:

  • CLUSTER_WORKERS_COUNT=auto (=4): everything is fine.

Now, I’m trying to find where this huge amount of data comes from…


#8

@awatson1978 I'm so close to resolving it:

  1. The huge amount of data came from CollectionFS. I don't know why, but after a while it stopped loading stuff. To be sure there was nothing more, I removed it from my project.

  2. About "No primary found in set".
    I stopped everything: my app and my replica set.
    I started my replica set, then my app with 20 workers on EC2 -> that was surprisingly OK.
    Then I stopped my app and started it with 25 workers on EC2 -> that didn't work.
    And I stopped it again and started it with 1 worker on EC2 -> that didn't work either.

But this time I could see "Error: connection closed".
So I stopped everything again.
I started my replica set and ran serverStatus:

sudo mongod --fork --syslog --port 27017 --dbpath /data/db/27017 --replSet testingRplSet --keyFile keyFile
sudo mongod --fork --syslog --port 27018 --dbpath /data/db/27018 --replSet testingRplSet --keyFile keyFile
sudo mongod --fork --syslog --port 27019 --dbpath /data/db/27019 --replSet testingRplSet --keyFile keyFile

testingRplSet:PRIMARY> db.serverStatus().connections
{ "current" : 409, "available" : 410, "totalCreated" : NumberLong(486) }

Where do the 409 connections come from? It should be about 3: my mongo shell plus the replica set members' internal connections.
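One way to see who is holding connections is to group the active operations by client address in the mongo shell (a sketch; `db.currentOp(true)` also includes idle connections, and not every entry has a `client` field):

```javascript
// in the mongo shell on the primary
var counts = {};
db.currentOp(true).inprog.forEach(function (op) {
  if (op.client) {
    var ip = op.client.split(":")[0];       // strip the ephemeral port
    counts[ip] = (counts[ip] || 0) + 1;
  }
});
printjson(counts);  // open connections per client IP
```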

I started my app on EC2 with 25 workers:

testingRplSet:PRIMARY> db.serverStatus().connections
{ "current" : 773, "available" : 46, "totalCreated" : NumberLong(1444) }

It should be 26 connections more than before, but there are 364 new connections…

If I now stop my app and run it again, there are 0 available connections left and I get the "Error: connection closed" message. Why are the connections still there? I have no clue yet.
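A rough back-of-envelope for why workers eat connections so fast: each worker process keeps its own driver connection pool plus an oplog-tailing connection. The pool size of 10 below is an assumption for illustration, not a measured value:

```javascript
// Hypothetical estimate: total connections = workers * (pool size + oplog connections).
// All parameter values are assumptions for illustration.
function expectedConnections(workers, poolSize, oplogPerWorker) {
  return workers * (poolSize + oplogPerWorker);
}

// 25 workers, assumed pool of 10, plus 1 oplog connection each
console.log(expectedConnections(25, 10, 1)); // 275
```

Under these assumptions, 25 workers already account for most of the 364 new connections observed above; connections that are not released on shutdown then exhaust the remainder.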

I stopped everything again.
Launched my replica set with bind_ip.

sudo mongod --fork --syslog --port 27017 --dbpath /data/db/27017 --replSet testingRplSet --keyFile keyFile --bind_ip 127.0.0.1
sudo mongod --fork --syslog --port 27018 --dbpath /data/db/27018 --replSet testingRplSet --keyFile keyFile --bind_ip 127.0.0.1
sudo mongod --fork --syslog --port 27019 --dbpath /data/db/27019 --replSet testingRplSet --keyFile keyFile --bind_ip 127.0.0.1

serverStatus gives this:

testingRplSet:PRIMARY> db.serverStatus().connections
{ "current" : 253, "available" : 566, "totalCreated" : NumberLong(400) }

I started my app on localhost with 4 workers:

testingRplSet:PRIMARY> db.serverStatus().connections
{ "current" : 288, "available" : 531, "totalCreated" : NumberLong(2141) }

I conclude that something somewhere is taking my connections away and not giving them back :confused:


Solution:

I never found out why I had so many connections at startup, but you can change the connection limit.
In the end I transferred my data to compose.io and everything has been fine since.
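For anyone who wants to raise the limit instead of migrating: in the MongoDB 2.4 series the per-process limit can be set at startup with the --maxConns flag (the value below is arbitrary; paths and options mirror the commands used earlier in this thread):

```shell
sudo mongod --fork --syslog --port 27017 --dbpath /data/db/27017 \
  --replSet testingRplSet --keyFile keyFile --maxConns 2000
```

This only treats the symptom, of course; the connections still leak.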