Improve speed of import when uploads are available locally

Open TimidRobot opened this issue 2 years ago • 2 comments

Feature Request

Describe your use case and the problem you are facing

I am importing a site with almost 4 GB of media and almost 10,000 posts, and the import takes longer than I'd like.

Describe the solution you'd like

Potential solutions:

  • Check whether the file already exists in wp-content/uploads/. If it does, use it instead of fetching it.
    • To be fast, this would need to be naive (matching only the file name) and should probably be enabled only via an option.
  • As above, but look in a directory given by a --source-dir= (or similar) option.
  • Allow attachments to be imported from file:// URLs.
    • Currently, file:// URLs result in errors like:
      Failed to import Media “media_name”: Request failed due to an error:
      A valid URL was not provided. (http_request_failed)
      • The error remains when the http_request_host_is_external filter is set to __return_true.
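The naive file-name lookup from the first two proposals and the file:// idea could be combined roughly as follows. This is a minimal Python sketch, not part of wp-cli or the WordPress Importer: the function names are hypothetical, and uploads_dir stands in for wp-content/uploads/ or whatever directory a --source-dir= style option would name. A return value of None means "fall back to fetching the URL as the importer does today".

```python
from pathlib import Path
from urllib.parse import unquote, urlparse

# Hypothetical helpers illustrating the proposal; not an existing
# wp-cli or WordPress Importer API.


def build_upload_index(uploads_dir):
    """Map each file name under uploads_dir to its path (first sorted match wins)."""
    index = {}
    for path in sorted(Path(uploads_dir).rglob("*")):
        if path.is_file():
            # setdefault keeps the first path seen for a given file name.
            index.setdefault(path.name, path)
    return index


def resolve_local_copy(attachment_url, index):
    """Return a local Path for attachment_url, or None to fall back to fetching."""
    parsed = urlparse(attachment_url)
    if parsed.scheme == "file":
        # file:// URL: use the path directly, if that file exists.
        path = Path(unquote(parsed.path))
        return path if path.is_file() else None
    # Naive match: compare only the last path component (the file name).
    name = unquote(parsed.path.rsplit("/", 1)[-1])
    return index.get(name)
```

Building the index once keeps each lookup to a dictionary access. Because matching is by file name only, two uploads with the same name in different year/month folders (e.g. 2019/01/logo.png and 2023/08/logo.png) collide, and the index keeps whichever path sorts first. That trade-off is one reason to make the behavior opt-in.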

TimidRobot, Aug 15 '23 19:08

Thanks for the suggestion, @TimidRobot!

This sounds like a great idea. We'd want to make it an opt-in behavior via a flag or something similar. Here's another approach I've used in the past: https://danielbachhuber.com/two-wordpress-migration-performance-tips/

Feel free to submit a pull request, if you'd like. Here is some guidance on our pull request best practices.

danielbachhuber, Aug 21 '23 18:08

@danielbachhuber, thank you for the link!

The method we're currently using is:

  1. Copy wp-content/uploads/ from source using rsync
  2. Modify web hosting to serve the copy
  3. Modify the WXR attachment URLs so they reference localhost (where the local web host is serving the copy)
  4. Temporarily set the http_request_host_is_external filter to __return_true so that localhost URLs are allowed during the import
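Step 3 is the easiest part to script. A minimal Python sketch, assuming the WXR file stores each attachment's source URL in a <wp:attachment_url> element (optionally wrapped in CDATA) and that every URL to rewrite shares a single prefix. The function names and the example.org and localhost:8080 prefixes are hypothetical placeholders. Only <wp:attachment_url> is rewritten; <guid> and links inside post content are left untouched.

```python
import re
from pathlib import Path

# Hypothetical helpers for step 3; not part of wp-cli or the WordPress Importer.


def rewrite_attachment_urls(wxr_text, old_prefix, new_prefix):
    """Point every <wp:attachment_url> starting with old_prefix at new_prefix."""
    pattern = re.compile(
        # Group 1: the opening tag plus an optional CDATA opener.
        r"(<wp:attachment_url>\s*(?:<!\[CDATA\[)?\s*)" + re.escape(old_prefix)
    )
    # A function replacement keeps backslashes in new_prefix from being
    # interpreted as regex group references.
    return pattern.sub(lambda m: m.group(1) + new_prefix, wxr_text)


def rewrite_wxr_file(src, dst, old_prefix, new_prefix):
    """Read the WXR file at src, rewrite attachment URLs, and write it to dst."""
    text = Path(src).read_text(encoding="utf-8")
    rewritten = rewrite_attachment_urls(text, old_prefix, new_prefix)
    Path(dst).write_text(rewritten, encoding="utf-8")
```

For example, rewrite_wxr_file("export.xml", "export-local.xml", "https://example.org/wp-content/uploads/", "http://localhost:8080/wp-content/uploads/") writes a copy whose attachment URLs point at the local web server. The whole file is read into memory to keep the sketch short; a very large export may call for a streaming approach.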

This seems to have largely eliminated attachment fetching as a performance bottleneck in our imports. However, the solution is very specific, complex, and unattainable in many hosting environments.

TimidRobot, Aug 21 '23 22:08