Exporting WordPress Posts

At dodify we are currently working on migrating Olbia.it from WordPress to our home grown CMS. And of course the first challenge is migrating all the content. There are many plugins and options available that will export the WordPress posts in various formats but it immediately felt that converting from the XML (or similar) output of sed plugins to the format we require was more work than just being able to get access to the raw data! So SQL to the rescue. After a little fiddling and some help from Google we get the following query:

SELECT post.id                              AS `ID`, 
       -- Author 
       user.user_login                      AS `Author Username`, 
       user.user_email                      AS `Author Email`, 
       user.display_name                    AS `Author Name`, 
       -- Post 
       post.post_date                       AS `Date`, 
       post.post_title                      AS `Title`, 
       post.post_excerpt                    AS `Short Title`, 
       post.post_name                       AS `URL`, 
       post.post_content                    AS `Content`, 
       -- Post meta 
       (SELECT thumb.guid 
        FROM   wp_posts thumb 
               LEFT JOIN wp_postmeta meta 
                      ON meta.meta_value = thumb.id 
        WHERE  meta.meta_key = '_thumbnail_id' 
               AND meta.post_id = post.id 
        LIMIT  1)                           AS `Thumbnail`, 
       (SELECT GROUP_CONCAT(wp_terms.name SEPARATOR ', ') 
        FROM   wp_terms 
               INNER JOIN wp_term_taxonomy 
                       ON wp_terms.term_id = wp_term_taxonomy.term_id 
               INNER JOIN wp_term_relationships wpr 
                       ON wpr.term_taxonomy_id = 
                          wp_term_taxonomy.term_taxonomy_id 
        WHERE  taxonomy = 'category' 
               AND post.id = wpr.object_id) AS `Categories`, 
       (SELECT GROUP_CONCAT(wp_terms.name SEPARATOR ', ') 
        FROM   wp_terms 
               INNER JOIN wp_term_taxonomy 
                       ON wp_terms.term_id = wp_term_taxonomy.term_id 
               INNER JOIN wp_term_relationships wpr 
                       ON wpr.term_taxonomy_id = 
                          wp_term_taxonomy.term_taxonomy_id 
        WHERE  taxonomy = 'post_tag' 
               AND post.id = wpr.object_id) AS `Tags` 
FROM   wp_posts post 
       LEFT JOIN wp_users user 
              ON post.post_author = user.id 
WHERE  post.post_type = 'post' 
       AND post.post_status = 'publish';

The result will likely have most of the information you would care about from each WordPress post, namely:

  • the post ID;
  • the author username;
  • the author email;
  • the author name;
  • the publishing date;
  • the title;
  • the excerpt if available;
  • the URL. Note this will normally be directly on the domain/path WordPress is hosted under and can be used as the source of any redirects if the new system has different links;
  • the HTML content;
  • the thumbnail link;
  • the categories as a comma separated list
  • the tags as a comma separated list.

To get all the information the query joins on/uses quite a few tables: wp_posts, wp_terms, wp_postmeta, wp_term_taxonomy, wp_term_relationships and wp_users.

With the results from the query parsing the HTML content and downloading all linked images locally as well as the thumbnail images quickly becomes the next challenge. Hopefully it can be helpful to someone else on the web!