Trouble Running Bulk Loader through a Docker Container

I’m having trouble understanding how to run the bulk loader. I have a Python script that creates a new dgraph-zero and a schema, and from there I create a volume. That is my current understanding of how I should set up for the bulk loader.

Finally, I’m using a docker run command for the bulk loader. I was wondering what the proper format is for the bulk loader, and whether I’m missing anything?
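For anyone following along, here is a minimal sketch of what such an invocation might look like when driven from a Python script. All of the container names, network name, and file paths here are placeholders (assumptions, not taken from the thread); adapt them to your setup.

```python
import subprocess  # used when you actually execute the command

# Hedged sketch of a dgraph bulk invocation via docker run, assuming:
#   - a zero node already running as "dgraph-zero" on network "dgraph-net"
#   - RDF data and schema mounted read-only under /data
#   - a named Docker volume "dgraph_out" to receive the posting lists
bulk_cmd = [
    "docker", "run", "--rm",
    "--name", "dgraph-bulk",
    "--network", "dgraph-net",
    "-v", r"C:\path\to\data:/data:ro",  # placeholder host data directory
    "-v", "dgraph_out:/out",            # named volume for bulk output
    "dgraph/dgraph:latest",
    "dgraph", "bulk",
    "--zero", "dgraph-zero:5080",
    "--format", "rdf",
    "-f", "/data/all_data.nq",
    "-s", "/data/schema.txt",
    "--out", "/out",
]

# To actually run it (requires Docker):
# subprocess.run(bulk_cmd, check=True)
print(" ".join(bulk_cmd))
```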

Hey @kbchoi,

Welcome to the community. That command looks correct. If there are no errors, then the posting files created in the ./out folder are ready to be put in place when you start your cluster. Have a read of this document; it should clear up the next steps: Initial Import (Bulk Loader) - Hypermode

Hi @matthewmcneely,

Thank you for confirming. I keep getting an error when I run this command through a PowerShell command line. I tried looking up the issue, and most resources suggest the error originates here:

2025/07/28 01:03:44 unlinkat //out: device or resource busy

I was wondering where my most likely points of error would be! Would you have some insight?

(linked the full error for context)

The bulk loader creates and deletes a lot of tmp files. Docker on Windows seems to have issues when using “Host bind mounts”. Try using a docker volume instead:

docker volume create dgraph_out
docker run -v dgraph_out:/out <rest of commands>

Then you’ll have to copy the bulk loader output files (posting lists, etc.) from the Docker volume to your host system.
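A common way to copy files out of a named volume is to mount it into a throwaway container alongside a host directory and `cp` across. This is a sketch under the assumptions of this thread (volume named dgraph_out; the host destination path is a placeholder):

```python
import subprocess  # used when you actually execute the command

# Sketch: mount the "dgraph_out" volume read-only plus a host directory
# into a throwaway alpine container, then copy the volume contents over.
host_dir = r"C:\path\to\local_out"  # placeholder host destination

copy_cmd = [
    "docker", "run", "--rm",
    "-v", "dgraph_out:/out:ro",
    "-v", f"{host_dir}:/backup",
    "alpine", "cp", "-r", "/out/.", "/backup",
]

# To actually run it (requires Docker):
# subprocess.run(copy_cmd, check=True)
print(" ".join(copy_cmd))
```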

Side note: If you’re ultimately going to run the graph on a Linux box, you might consider just running the bulk loader on that system (where you can install the Dgraph CLI) and skip all the Windows quirks.

I see. Is there a loading limit for the bulk loader? I know the live loader tries not to go over 15 MB per batch; I’m assuming that isn’t the case with the bulk loader?

You’re right. The bulk loader is really only limited by the amount of disk space you have. Here’s a post about it when it was first introduced: Loading close to 1M edges/sec into Dgraph - Dgraph Blog

It seems that every time I run my script I’m getting the same error where I get stuck on my bulk command…

subprocess.CalledProcessError: Command '['docker', 'run', '--rm', '--name', 'dgraph-bulk', '--network', 'dgraph-net', '-v', 'C:\Users\VAIV\Desktop\GraphDB\data\output_nq:/data:ro', '-v', 'dgraph_out:/dgraph_out', 'dgraph/dgraph:latest', 'dgraph', 'bulk', '--zero', 'dgraph-zero:5080', '--format', 'rdf', '-f', '/data/all_data.nq', '-s', '/data/schema.txt', '--out', '/dgraph_out', '--cleanup_tmp=false', '--reduce_shards', '1', '--map_shards', '1']' returned non-zero exit status 1.

Do you know if this is a specific error that I can fix?

Again, I think using proper Docker volumes, as opposed to trying to mount a directory on your filesystem, is the way to prevent this error. See above.
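One more debugging tip: CalledProcessError only tells you the exit status; the actual Dgraph error is on the container’s stderr. A sketch of capturing it instead of letting the call raise (demonstrated below with a stand-in failing command rather than docker, so it runs anywhere):

```python
import subprocess
import sys

def run_and_report(cmd):
    """Run a command, printing its stderr on failure instead of raising."""
    result = subprocess.run(cmd, capture_output=True, text=True)
    if result.returncode != 0:
        print(f"command failed ({result.returncode}):\n{result.stderr}")
    return result

# Stand-in for the docker bulk command: a process that exits non-zero
# with a message on stderr, so the pattern is visible without Docker.
demo = run_and_report([sys.executable, "-c", "import sys; sys.exit('boom')"])
```

With your real command list, the same pattern would surface the underlying Dgraph message (schema parse errors, connection failures to zero, etc.) rather than just "returned non-zero exit status 1".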