Uploading Files in Chunks
When uploading files larger than 100MB, you should divide them into 100MB pieces and upload them in chunks. There are several benefits to this, from being able to resume an upload by re-uploading only the missing chunks, to bypassing old but still present 2GB limits in things like web proxies.
Another benefit, from a development point of view, is that some frameworks load the file they're working with into memory, so if you're sending a 2GB file the application can use 2GB+ of RAM because it has just loaded the entire file. By splitting the file into 100MB chunks, RAM consumption will never exceed that (plus the RAM requirement for the script itself). Just make sure you don't load the entire file into memory to split it.
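One way to split a file without loading it into memory is the split utility, which streams its input. The following is a minimal sketch, assuming GNU coreutils split (for the -d numeric suffix option); the function name split_into_chunks is hypothetical and not part of LiquidFiles.

```shell
# Hypothetical helper: split a file into pieces named <file>.00, <file>.01, ...
# split(1) streams its input, so the whole file is never held in memory.
# Assumes GNU coreutils split (-d selects numeric suffixes).
split_into_chunks() (
  file=$1
  chunk_size=${2:-100M}   # any size accepted by split -b, e.g. 100M

  split -b "$chunk_size" -d "$file" "$file." || exit 1

  # Print how many pieces now exist, for use as the "chunks" parameter.
  # Assumes no stale "$file".NN pieces are left over from an earlier run.
  ls "$file".[0-9]* | wc -l | tr -d ' '
)
```

Calling split_into_chunks bigfile.zip would create bigfile.zip.00, bigfile.zip.01 and so on, and print the number of pieces. The default two-character suffix only suits a modest number of pieces; for more, split's -a option sets a longer suffix.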
Example Request using curl
In this example, we're taking bigfile.zip and splitting it into two files, bigfile.zip.00 and bigfile.zip.01, and sending them individually like this:
curl -X POST -H "Accept: application/json" \
-H "Content-Type: application/zip" \
--user "Y9fdTmZdv0THButt5ZONIY:x" \
--data-binary "@bigfile.zip.00" \
"https://liquidfiles.example.com/message/attachments/uploads?filename=bigfile.zip&chunk=0&chunks=2"
curl -X POST -H "Accept: application/json" \
-H "Content-Type: application/zip" \
--user "Y9fdTmZdv0THButt5ZONIY:x" \
--data-binary "@bigfile.zip.01" \
"https://liquidfiles.example.com/message/attachments/uploads?filename=bigfile.zip&chunk=1&chunks=2"
In this example there are a few things to highlight:
- The individual chunk sizes don't matter. If you're sending three chunks, it can be two big ones and one small, three of equal size, or one big, one medium and one small.
- You can send chunks in any order you want, as long as you number the chunks correctly. You can, for instance, begin by sending the second chunk with chunk=1, followed by the first one with chunk=0. It's the chunk number that orders the pieces correctly on the server.
- You need to make sure that you get an HTTP response code 200 (success) after each chunk, and resend any chunks that fail. It's only when all pieces are uploaded that the server will rebuild the attachment and give you the attachment id.
- The "filename" parameter needs to be unique for that user until the entire file has been uploaded.
For authenticated uploads (Secure Message attachments), we assume that the filename is unique for that user. If you try to send multiple files with the same name for the same user at the same time (the user with the API key Y9fdTmZdv0THButt5ZONIY in this example), there will be a right mess on the server, as it has no way of distinguishing between the two different files if the name is the same.
When you're sending unauthenticated files, for instance uploading to a Filedrop, chunks are unique per session.
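The points above about numbering chunks, checking for a 200 response and resending failed chunks can be sketched as a small loop. This is only an illustration under stated assumptions: upload_chunks, its arguments and the CURL override variable are hypothetical names, the pieces are assumed to be named bigfile.zip.00, bigfile.zip.01 and so on, and the filename is assumed to contain no characters that need URL encoding.

```shell
# CURL can be overridden (for example with a stub for testing); defaults to curl.
CURL=${CURL:-curl}

# Hypothetical helper: upload pieces "$file.00", "$file.01", ... in order,
# resending each piece until the server answers 200 or max_tries is reached.
upload_chunks() (
  url=$1                       # upload endpoint, as in the curl example above
  api_key=$2
  file=$3                      # original filename, sent as "filename"
  chunks=$4                    # total number of pieces
  max_tries=${5:-3}
  response_out=${6:-/dev/null} # where curl stores the latest response body

  i=0
  while [ "$i" -lt "$chunks" ]; do
    piece=$(printf '%s.%02d' "$file" "$i")
    tries=0
    while :; do
      tries=$((tries + 1))
      status=$("$CURL" -s -o "$response_out" -w '%{http_code}' -X POST \
        -H "Accept: application/json" \
        -H "Content-Type: application/zip" \
        --user "$api_key:x" \
        --data-binary "@$piece" \
        "$url?filename=$file&chunk=$i&chunks=$chunks")
      [ "$status" = "200" ] && break
      if [ "$tries" -ge "$max_tries" ]; then
        echo "chunk $i failed after $tries tries (last status: $status)" >&2
        exit 1
      fi
    done
    i=$((i + 1))
  done
)
```

With the values from the example above, this could be called as upload_chunks "https://liquidfiles.example.com/message/attachments/uploads" "Y9fdTmZdv0THButt5ZONIY" bigfile.zip 2. Passing a file path as the sixth argument keeps the body of the last response, which is where the attachment id would be expected once all pieces are uploaded.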
Checking what chunks have been uploaded
When chunks are uploaded, they are stored for a week before being removed (if the file was not uploaded completely). Note also that if a chunk (or a file, for that matter) is interrupted in transit, the half-uploaded chunk (or file) is discarded. If you want to resume an upload, you can query the server to see which chunks are available and upload any that are missing.
A final thing to note is that chunks are unique per user and per filename. We don't have an Attachment ID (which would otherwise be unique) until the file has been completely uploaded. This means that if the same user starts uploading a new file in chunks with the same filename as an old, incomplete one, and you use this API call to check what's already uploaded, you may well end up completing the old file with the remains of the new file.
Checking Individual Chunks before each upload
This is the method the LiquidFiles web interface uses. Before a chunk is uploaded, we query the server. If we get a "partial content" response (HTTP response code 206), the chunk doesn't exist and we can go ahead and upload it. If the server responds with an "ok" message (HTTP response code 200), the chunk already exists and we can move on to the next one.
Info | Value |
---|---|
Request URL | /message/attachment/uploads |
Request VERB | GET |
Parameter | Type | Description |
---|---|---|
filename | String | The filename of the uploaded file. |
chunk | Integer | The individual chunk you want to test. |
chunks | Integer | The total number of chunks. |
Example Request using curl
The following example checks whether the first chunk (chunk 0) already exists on the system. If the response is 206, the chunk doesn't exist; if the response is 200, the chunk already exists.
curl -w "%{http_code}\n" \
--user "Y9fdTmZdv0THButt5ZONIY:x" \
-H "Accept: application/json" \
-H "Content-Type: application/zip" \
"https://liquidfiles.company.com/attachments/available_chunks?filename=bigfile.zip&chunks=10&chunk=0"
206
Or a more complete (bash) example of how this would actually be used:
# -s -o /dev/null discards the response body so only the status code is captured
status=$(curl -s -o /dev/null -w "%{http_code}" \
  --user "ojubiigauS2TxTy4ovk6Ph:x" \
  -H "Accept: application/json" \
  -H "Content-Type: application/zip" \
  "https://liquidfiles.company.com/attachments/available_chunks?filename=bigfile.zip&chunks=10&chunk=0")
if [[ "$status" -eq 206 ]]; then
  # 206 means the chunk is missing, so upload it
  curl -X POST --user "ojubiigauS2TxTy4ovk6Ph:x" \
    -H "Accept: application/json" \
    -H "Content-Type: application/zip" \
    --data-binary "@bigfile.zip.00" \
    "https://liquidfiles.company.com/attachments/binary_upload?filename=bigfile.zip&chunks=10&chunk=0"
fi
So we first do a GET request with the filename, chunks and chunk parameters to see if the chunk exists, and if it doesn't, we upload it.
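Extending the same check to every chunk gives a simple resume loop. The sketch below is an illustration only: resume_upload, its arguments and the CURL override variable are hypothetical names, the pieces are assumed to be named bigfile.zip.00, bigfile.zip.01 and so on, and the filename is assumed to need no URL encoding. The check and upload URLs are passed in rather than hard-coded.

```shell
# CURL can be overridden (for example with a stub for testing); defaults to curl.
CURL=${CURL:-curl}

# Hypothetical helper: for each chunk, ask the server whether it already
# exists (200) or is missing (206), and upload only the missing ones.
resume_upload() (
  check_url=$1    # URL used for the GET check
  upload_url=$2   # URL used for the POST upload
  api_key=$3
  file=$4         # original filename; pieces are "$file.00", "$file.01", ...
  chunks=$5       # total number of pieces

  i=0
  while [ "$i" -lt "$chunks" ]; do
    query="filename=$file&chunks=$chunks&chunk=$i"
    status=$("$CURL" -s -o /dev/null -w '%{http_code}' \
      --user "$api_key:x" -H "Accept: application/json" \
      "$check_url?$query")
    case "$status" in
      200) ;;  # chunk is already on the server, nothing to do
      206)     # chunk is missing, upload it
        piece=$(printf '%s.%02d' "$file" "$i")
        status=$("$CURL" -s -o /dev/null -w '%{http_code}' -X POST \
          --user "$api_key:x" -H "Accept: application/json" \
          -H "Content-Type: application/zip" \
          --data-binary "@$piece" \
          "$upload_url?$query")
        if [ "$status" != "200" ]; then
          echo "upload of chunk $i failed with status $status" >&2
          exit 1
        fi
        ;;
      *)
        echo "unexpected status $status when checking chunk $i" >&2
        exit 1
        ;;
    esac
    i=$((i + 1))
  done
)
```

With the values from the example above, this could be called as resume_upload "https://liquidfiles.company.com/attachments/available_chunks" "https://liquidfiles.company.com/attachments/binary_upload" "ojubiigauS2TxTy4ovk6Ph" bigfile.zip 10. The sketch discards response bodies; a real client would keep the response to the final upload in order to read the attachment id.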