WordPress Static Site Generator – The PHP Script for Pages and Posts

I call this script my WordPress Static Site Generator, WPSSG for short. The script file name is “wpssg.php”.

I’ll explain the script section by section. First, we set up the default variables and make them work for both Windows and Linux. The script will probably work on any UNIX variant.

I’ve never claimed to be a PHP guru of any sort. I use the code I know will work and I don’t worry about how clean it is. If you know what DRY means, you should probably know I don’t worry about that either.

Static File Locations

You want to keep your root directory on the server uncluttered. Use subdirectories for assets like CSS and JavaScript files that can’t be replaced by CDN versions. Everything else goes in the root directory.

You won’t have a problem with the number of files in the root directory until you have millions of files in it. So… don’t worry about it.

The Helper Function and the Default Variables

The function augments the “file_put_contents” function. Windows will ignore the extra code for permissions but Linux needs it.

function my_file_put_contents($fname, $fcontent, $user, $group) {
  file_put_contents($fname, $fcontent);
  chmod($fname, 0644);
  chown($fname, $user);
  chgrp($fname, $group);
}
$path          = '/home/rtcx/wordpress';   // WordPress document root
$master        = $path . '/static_master'; // the master directory of pages
$new           = $path . '/static_new';    // the new and changed pages
$real_site_url = 'https://www.rtcx.net/';  // the real site URL
$timezone      = 'America/Los_Angeles';    // this is for the sitemap
$user          = 'rtcunningham';           // for Linux
$group         = 'rtcunningham';           // for Linux
$ext           = '.html';                  // permalink file name extension
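$pagesdir = null;
if (isset($argv[1]) && $argv[1] == '-new')
  $pagesdir = $new;
if (isset($argv[1]) && $argv[1] == '-master')
  $pagesdir = $master;
if ($pagesdir === null)
  exit("Usage: php wpssg.php -new | -master\n"); // no valid option given

The variables should be obvious, except for the if statements at the end. They’re there so you can run “php wpssg.php -new” for generating new files or “php wpssg.php -master” for regenerating all of them. Anything else stops the script with a usage message, since $pagesdir would otherwise be undefined. If you’re afraid to run commands from a terminal command line, then you probably shouldn’t be doing this at all.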
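Going back to the helper function for a moment: as written, it doesn’t tell you when a write fails. Here is a minimal sketch of a variant that does. The name my_file_put_contents_checked and the optional $user and $group arguments are my own assumptions for illustration, not part of the script. It also relies on the PHP_OS_FAMILY constant, which needs PHP 7.2 or newer.

```php
<?php
// Sketch only: a variant of the helper that reports failure.
// The function name is hypothetical and not part of the original script.
function my_file_put_contents_checked($fname, $fcontent, $user = null, $group = null) {
  // file_put_contents() returns false when the write fails
  if (file_put_contents($fname, $fcontent) === false) {
    return false;
  }
  // Permissions and ownership only matter outside Windows
  if (PHP_OS_FAMILY !== 'Windows') {
    if (!chmod($fname, 0644)) {
      return false;
    }
    // Ownership changes are optional here; pass null to skip them
    if ($user !== null && !chown($fname, $user)) {
      return false;
    }
    if ($group !== null && !chgrp($fname, $group)) {
      return false;
    }
  }
  return true;
}
```

You would call it the same way as the original helper and stop or log a message when it returns false.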

If you choose to forgo a file extension, remove “.html” from the variable, so it’s just '' or "".
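To see what the extension variable affects, here is a small sketch of how a URL and a file name get assembled from a slug, the same way the page loop later in the script does it. The function names page_url and page_file, the localhost URL and the /tmp path are assumptions I made up for the example.

```php
<?php
// Sketch only: page_url() and page_file() are hypothetical helper names.
function page_url($site_url, $slug, $ext) {
  // The home page is fetched from the bare site URL
  if ($slug == 'index') {
    return $site_url;
  }
  return $site_url . $slug . $ext;
}

function page_file($pagesdir, $slug, $ext) {
  return $pagesdir . '/' . $slug . $ext;
}

echo page_url('http://localhost/', 'about', '.html'), "\n"; // http://localhost/about.html
echo page_url('http://localhost/', 'about', ''), "\n";      // http://localhost/about
echo page_file('/tmp/static_new', 'about', ''), "\n";       // /tmp/static_new/about
```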

Load WordPress and Queue the Pages and Posts

There’s more than one way to load WordPress, but this is what I recommend. While we’re at it, we’ll set the site URL as it is on the local machine. We’ll be replacing it with the real site URL later.

include $path . '/wp-load.php';
$site_url = home_url('/');
$results = get_posts(array('numberposts' => -1, 'post_type' => array('page', 'post')));
$queue = array();
foreach ($results as $post) {
  $slug = $post->post_name;
  //if ($post->ID == $home_page_id)
  //  $slug = 'index';
  $queue[] = $slug;
}
$queue[] = 'index';

The two commented-out lines bear an explanation. If you use a static home page instead of a posts page, you have to uncomment them, set $home_page_id to the ID of that page, and remove the last line in the section.
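The $home_page_id variable isn’t set anywhere in the part of the script shown here, so here is one possible way to handle both cases in one place. Treat it as a sketch: build_queue is a hypothetical function name, and reading the ID from WordPress’s “show_on_front” and “page_on_front” options is my suggestion, not something the script does.

```php
<?php
// Sketch only: build_queue() is a hypothetical helper, not part of the original script.
// $posts is a list of objects with ID and post_name properties, like get_posts() returns.
// $home_page_id is the ID of the static home page, or 0 if the site uses a posts page.
function build_queue(array $posts, $home_page_id) {
  $queue = array();
  foreach ($posts as $post) {
    $slug = $post->post_name;
    if ($home_page_id && $post->ID == $home_page_id) {
      $slug = 'index'; // the static home page is saved as index
    }
    $queue[] = $slug;
  }
  if (!$home_page_id) {
    $queue[] = 'index'; // a posts page has no post object, so queue it by hand
  }
  return $queue;
}

// Inside WordPress, the ID could be read from the reading settings:
// $home_page_id = get_option('show_on_front') == 'page' ? (int) get_option('page_on_front') : 0;
// $queue = build_queue($results, $home_page_id);
```

The advantage is that you don’t have to remember to comment and uncomment lines when you change the home page setting.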

The Page and Post Loop

I run this from a terminal command line and I don’t like seeing a command just sitting there after I hit enter. So… I added a progress indicator to every section.

The foreach loop is where I make the changes. If you have any changes to make, that’s where you do it.

Since trailing slashes are forbidden, categories and tags use a dash instead. The plugin that adds an extension only does so with pages. I have to do it manually, here, and that’s why you see “$ext” in the loop. See “Add File Name Extensions to WordPress Pages, Category and Tag Pages”.

echo "\n"; // progress indicator starts
foreach($queue as $slug) {
  $url = $site_url . $slug . $ext;
  if ($slug == 'index')
    $url = $site_url;
  $page = file_get_contents($url);
  $temp = explode("\n", $page);
  $page = '';
  foreach ($temp as $line) {
    $line = str_replace(trim($site_url, '/'), trim($real_site_url, '/'), $line);
    $page .= $line . "\n";
  }
  $page = trim($page) . "\n";
  $file = $pagesdir . '/' . $slug . $ext;
  if ($pagesdir == $master) {
    my_file_put_contents($file, $page, $user, $group);
  } else {
    if (file_exists($master . '/' . $slug . $ext)) {
      $master_page = file_get_contents($master . '/' . $slug . $ext);
    } else {
      $master_page = '';
    }
    if ($page != $master_page) {  // compare new file (in memory) to original file
      my_file_put_contents($new . '/' . $slug . $ext, $page, $user, $group);
    }
  }
  if ($slug == 'about') 
    $saved = $page; // (404 page to be based on it)
}

The last part saves the About page so that a 404 page that looks like the rest of the pages can be built from it. Of course, you have to tell the web server to use this 404 page. It isn’t necessary, as a generic 404 page will still be served if you don’t use it. It’s just a nice-to-have.

Much of this is specific to my site. You’ll need to go through it and adjust it for your uses.
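If you’d rather not repeat the URL replacement and the comparison against the master copy in every section, the two steps of the loop could be pulled out into small functions. This is only a sketch. The names rewrite_site_url and page_has_changed are hypothetical. Replacing across the whole page at once should give the same result as going line by line, because the URL never spans a line break.

```php
<?php
// Sketch only: both function names are hypothetical.

// Swap the local site URL for the real one in a whole page at once.
function rewrite_site_url($page, $site_url, $real_site_url) {
  $page = str_replace(trim($site_url, '/'), trim($real_site_url, '/'), $page);
  return trim($page) . "\n"; // exactly one newline at the end
}

// Report whether a freshly generated page differs from the master copy.
// A missing master file counts as an empty page, as in the loop.
function page_has_changed($page, $master_file) {
  $master_page = file_exists($master_file) ? file_get_contents($master_file) : '';
  return $page != $master_page;
}
```

In the loop, the first function would take the output of file_get_contents($url), and the second would decide whether to call my_file_put_contents for the “new” directory.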

The 404 Page

If you want to use a custom 404 page, you need this section. Otherwise, you don’t.

echo "."; // progress indicator continues
$new_content = '<div class="notebox">
The page you are trying to reach does not exist, or has been moved. Please use the menus or the search box to find what you are looking for.

</div>';
$page = $saved; // (based on about page)
$page = str_replace('<title>About RTCXpression</title>', '<title>Page Not Found - RTCXpression</title>', $page);
$page = str_replace($real_site_url . 'about.html', $real_site_url . '404.html', $page);
$page = str_replace('<h2>The X Stands for Expression</h2>
', '<h2>Error 404. <span>Page not found!</span></h2>
', $page);
$page = my_str_replace('<article class', '</article>', $new_content, $page);
$temp = explode("\n", $page);
$page = '';
foreach ($temp as $line) {
  if (strstr($line, '<meta name="desc'))
    continue;
  if (strstr($line, '<meta property'))
    continue;
  if (strstr($line, '>About</a>'))
    $line = '<li id="menu-item-3160" class="menu-item menu-item-type-post_type menu-item-object-page menu-item-3160"><a href="https://www.rtcx.net/about.html">About</a></li>';
  if (stristr($line, '<link rel="canonical"'))
    $line = '<meta name="robots" content="noindex,follow">';
  $page .= $line . "\n";
}
$file = $pagesdir . '/404' . $ext;
if ($pagesdir == $master) {
  my_file_put_contents($file, $page, $user, $group);
} else {
  if (file_exists($master . '/404' . $ext)) {
    $master_page = file_get_contents($master . '/404' . $ext);
  } else {
    $master_page = '';
  }
  if ($page != $master_page) {  // compare new file (in memory) to original file
    my_file_put_contents($new . '/404' . $ext, $page, $user, $group);
  }
}

In the loop, I remove the original meta description and the original Open Graph tags, and replace the “canonical” line with a meta robots line. A 404 page shouldn’t be indexed, so it doesn’t need a canonical link.
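The 404 section calls my_str_replace, which isn’t defined anywhere in this article. Judging only by how it’s called, it seems to swap out the block between a start marker and an end marker. Here is a sketch of a function that would do that. It is a guess, not the real helper: the name my_str_replace_sketch is made up, and I’m assuming the markers themselves are replaced along with everything between them.

```php
<?php
// Sketch only: a guess at a helper with the same argument order as my_str_replace().
// Assumption: it replaces everything from the first $start through the next $end, inclusive.
function my_str_replace_sketch($start, $end, $replacement, $subject) {
  $from = strpos($subject, $start);
  if ($from === false) {
    return $subject; // start marker not found, leave the page alone
  }
  $to = strpos($subject, $end, $from);
  if ($to === false) {
    return $subject; // end marker not found, leave the page alone
  }
  $to += strlen($end); // include the end marker in the replaced span
  return substr($subject, 0, $from) . $replacement . substr($subject, $to);
}
```

If the real helper keeps the markers in place, only the two offsets would need to change.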

More to Come

Since this article is already longer than I like, I’ll continue with the script in the next article.

Articles in this Series

This is a list of all the articles in this series. You should read the articles in the order they’re presented. You could miss something important if you skip around.
