Pro tip: MongoDB cursors return duplicate records

If you are iterating through a data set, you should switch from:

Object.each { |o| ... }

to

Object.extras(hint: { _id: 1 }).each { |o| ... }

That example uses Mongoid, but this is a MongoDB problem, not a Mongoid one, so it likely affects other MongoDB libraries too.

Students get too many awards

Each week, our systems distribute awards to students if they achieve certain goals. We started doing this naively: iterating through each record and calling a method on the object. This has a few benefits:

  1. the code is really easy to follow
  2. for now, it runs quickly enough for our needs
  3. it works

At least, we thought it worked. We’d started to notice more and more reports of people getting two awards where they should only be getting one. The awards were effectively identical, with only slight differences in created_at times to tell them apart.

We ruled out some of the obvious potential causes: our crons were only firing once, and the code wasn’t accidentally creating two awards.

The only other possible explanation we could see was that our iteration was somehow returning the same object twice. Nah, that can’t be it… that’d be insane… that’d be impossible…

Once you eliminate the impossible, whatever remains, no matter how improbable, must be the truth.

Arthur Conan Doyle

The Internet to the rescue!

We stumbled across a post on Stack Overflow which suggested that this was exactly what was happening. By default, a MongoDB cursor can return the same document more than once if a write operation moves that document while the iteration is still in progress.

A few test iterations through “all” records in our db confirmed that iterating without a hint resulted in duplicates. Adding the hint meant we only got each record once. As far as we can tell, this is a MongoDB problem, not a Mongoid problem.
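A check along these lines can be sketched in a few lines of Ruby. This is a minimal sketch, not the exact code we ran: duplicate_ids is a hypothetical helper name, and it assumes every yielded record responds to id (as Mongoid documents do).

```ruby
require 'set'

# Hypothetical helper (not part of Mongoid or MongoDB): walks any
# enumerable of records and returns the ids that were yielded more
# than once. Assumes each record responds to `id`.
def duplicate_ids(records)
  seen = Set.new
  duplicates = []
  records.each do |record|
    # Set#add? returns nil when the id is already in the set
    duplicates << record.id unless seen.add?(record.id)
  end
  duplicates.uniq
end

# Assumed usage against a Mongoid model, written like the examples above:
#   duplicate_ids(Object.all)                         # without the hint
#   duplicate_ids(Object.extras(hint: { _id: 1 }))    # with the hint
```

An empty array means every record was seen exactly once. Note that the helper keeps every id in memory, so it suits a one-off check rather than production code.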

Rant

It’s not impossible that somebody out there needs to get the same record twice in one iteration. Maybe. Regardless, there’s no way that is the most common use case. MongoDB really doesn’t have that pit of success stuff down. This just seems like another time when MongoDB defaults are either wrong, misleading, or useless.

:sigh:

(Thanks to jimoleary for the helpful Stack Overflow comment)

(Originally posted on the Blake Dev Team Blog)
