Download Files from MongoDB GridFS
Well, you’ve gone and done it now. You were so clever and used GridFS to store large files in MongoDB. It worked great, and you’ve had zero problems. But now you have a problem: say you want a hard copy of those files, for reasons ranging from archiving to migrating to a different database system. Well, I have bad news for you: if your files collection holds more than a few dozen documents, there is no convenient way to retrieve the actual files from GridFS short of rolling your own solution (or following the instructions here).
A quick review of the file system I am working with: Let’s say you have a People collection and a Files collection. People might look something like:
> db.people.findOne()
{
    "_id" : "XXXXXXXXXXXXXXXXX",
    "status" : "Active",
    "name" : "Joe Smith",
    "files" : [
        { "file" : "aaaaaaaaaaaaaaaaa" },
        { "file" : "bbbbbbbbbbbbbbbbb" },
        { "file" : "ccccccccccccccccc" }
    ]
}
and Files is a little more complex, but the aspect we care about might look something like:
> db.cfs.files.filerecord.findOne()
{
    "_id" : "YYYYYYYYYYYYYYYYY",
    "original" : {
        "name" : "puppies.png",
        "updatedAt" : ISODate("2017-08-09T21:45:31Z"),
        "size" : 15645,
        "type" : "image/png"
    },
    "copies" : {
        "filesStore" : {
            "name" : "puppies.png",
            "type" : "image/png",
            "size" : 15645,
            "key" : "zzzzzzzzzzzzzzzzzzzzzzzz",
            "updatedAt" : ISODate("2017-08-09T21:45:31Z"),
            "createdAt" : ISODate("2017-08-09T21:45:31Z")
        }
    }
}
The important things to note are the _id of the file, as well as copies.filesStore.key; those are what we’ll be using to access and associate our files.
One of the great things about GridFS and the way it stores files is that there is no restriction on file names. If you want to have 30,000 files all named puppies.png, you can do that.
MongoDB also has an excellent utility for file access & manipulation, mongofiles. It gives you complete CRUD control over the files in GridFS from the command line. However, if you do have 30,000 files all named puppies.png, mongofiles won’t be a ton of help, as its primary features focus on using the name of the file for access. You could, for example, run

mongofiles get puppies.png

and the resulting puppies.png file would be exported to your local filesystem, but only one of them. In order to get a specific puppies.png file, you need to pass the ObjectId of the file you want, like so:

mongofiles -d db get_id 'ObjectId("56feac751f417d0357e7140f")'
But this only works once you have the ObjectId of the file in question. There is a mongofiles command for listing all the files, but it only returns their names. Furthermore, the ObjectId you need to pass is not the _id from the example above, but the copies.filesStore.key value mentioned earlier (e.g., zzzzzzzzzzzzzzzzzzzzzzzz).
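Since the listing command only gives you names, the mapping you actually need (_id, key, and original name) is easiest to pull straight from the mongo shell. A minimal sketch, assuming your filerecord collection is named cfs.files.filerecord as in the example above:

db.cfs.files.filerecord.find(
    {},
    { "original.name": 1, "copies.filesStore.key": 1 }
).forEach(function (doc) {
    // one tab-separated line per file: <_id> <key> <original name>
    print(doc._id + "\t" + doc.copies.filesStore.key + "\t" + doc.original.name);
});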
Of course, all of this is compounded by the fact that we don’t just want to download all 30,000 puppies.png files, every one of them holding a special and irreplaceable puppies image. We also want to maintain the association with the Person object they are attached to, which references the _id of the File object.
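For example, using the hypothetical IDs from the examples above (and assuming each files.file entry in People stores a File _id), finding the Person a particular file belongs to is a one-liner in the mongo shell:

db.people.findOne({ "files.file": "aaaaaaaaaaaaaaaaa" })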
The solution I’ve come up with was originally based on this gist, which I’ve modified to include the _id as a parent directory, with the file (under its original name) contained within. When run inside a backup_files dir, it will create a directory structure somewhat like this:
backup_files
├── YYYYYYYYYYYYYYYYY
│   └── puppies.png
├── IIIIIIIIIIIIIIIII
│   └── puppies.png
├── NNNNNNNNNNNNNNNNN
│   └── puppies.png
└── MMMMMMMMMMMMMMMMM
    └── puppies.png
The file’s copies.filesStore.key is used for retrieval via mongofiles get_id, the file’s _id is used as the directory name, and the file’s original name is preserved inside that dir. The dir’s name can then be referenced from the People collection to maintain a reference to a Person’s files (of puppies).
Note that you will need to replace the name of your database (meteor in my case) on lines 10 and 12 of that script.
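For reference, here is a rough bash sketch of the same approach, not the original gist. It assumes a database named meteor, the cfs.files.filerecord collection from the examples above, and a GridFS prefix of cfs_gridfs.filesStore (check show collections for whatever prefix your store actually uses), and it runs mongofiles get_id inside a per-_id directory so each file lands under its stored name:

#!/usr/bin/env bash
# Sketch only, not the original gist. Adjust these assumptions for your setup:
#   database name:         meteor
#   filerecord collection: cfs.files.filerecord
#   GridFS prefix:         cfs_gridfs.filesStore   (see `show collections`)
set -euo pipefail

DB="meteor"
PREFIX="cfs_gridfs.filesStore"

mkdir -p backup_files
cd backup_files

# Emit one "<_id>\t<key>" pair per file record, then fetch each file by key.
mongo --quiet "$DB" --eval '
    db.cfs.files.filerecord.find({}, { "copies.filesStore.key": 1 }).forEach(function (doc) {
        print(doc._id + "\t" + doc.copies.filesStore.key);
    });
' | while IFS=$'\t' read -r id key; do
    mkdir -p "$id"
    (
        cd "$id"
        # mongofiles writes into the current directory under the file's stored
        # name, so each download stays inside its own _id directory.
        mongofiles -d "$DB" --prefix "$PREFIX" get_id "ObjectId(\"$key\")"
    )
done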
PS: if you know of an easier way of doing this, please tell me.