
Jekyll minification optimization



As you may have noticed, I build a lot of websites with the Jekyll static site generator. The problem is that Jekyll is usually deployed on GitHub for internally hosted sites, so I couldn't find a tool that did the things I wanted: increase page load speed, clear offsite cache servers, and reduce bandwidth use before actually making a page live on the internet. I run oxasploits.com on two servers, one under a staging domain, which makes it easier to manage posts that are semi-live for testing but have not yet been propagated to the main server where they will appear on oxasploits.com.

I wanted to increase my page rank, and one of the things you can do is host a faster website. As much as I like Jekyll, it does not reduce the number of files loaded by inlining JavaScript and CSS into the HTML, minify the HTML, or compress images, nor does it have any utilities for pushing the page live anywhere.

People just don’t like slow websites.

The entire script is hosted here, and on GitHub.

Next I'm going to go over the functions and give a quick rundown on the necessity and logic behind each one.

Checking then building with Jekyll

Here we are just verifying that the necessary dependencies are installed before we start moving things around and rebuilding code without knowing whether it will even come to fruition.

function check_ok()
{
  echo "Checking if script requirements are met..."
  if [[ $(npm list inline-critical | grep inline-critical -c) -eq 0 ]]; then
    echo "Jekyll project pusher requires inline-critical out of npm!"
    exit 1
  fi
  if [[ $(npm list html-minifier | grep html-minifier -c) -eq 0 ]]; then
    echo "Jekyll project pusher requires html-minifier out of npm!"
    exit 1
  fi
  if [[ ! -f /usr/bin/mogrify ]]; then
    echo "Jekyll project pusher requires ImageMagick to compress images!"
    exit 1
  fi
  if [[ ! -f /usr/bin/jo ]]; then
    echo "Jekyll project pusher requires jo to build json!"
    exit 1
  fi
  if [[ $(pwd) != ${buildroot} ]]; then
    echo "Sorry, you need to be in ${buildroot}"
    exit 1
  fi
  # Flag whether a previously built _site directory already exists;
  # this decides between a full and a selective cache purge later on.
  new_site=0
  if [[ $(ls _site 2>/dev/null | wc -l) -ne 0 ]]; then
    new_site=1
  fi
  echo "Environment:"
  echo "Ezoic API Key: ${apikey}"
  echo "Threads: ${procs}"
  echo "Compression Level: ${clvl}"
  echo "Domain: ${domain}"
  echo "Webroot: ${webroot}"
  echo "Buildroot: ${buildroot}"
  echo "Upload user: ${user}"
}

Then we have our build function, which just builds the site locally and checks that it was built properly before continuing.
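The function itself lives in the full script on GitHub; a rough sketch of the idea (my reconstruction, not the original code, assuming a standard bundler-driven Jekyll build) looks like this:

function build()
{
  echo "Building site with Jekyll..."
  # Build in production mode; bail out immediately if Jekyll itself fails
  JEKYLL_ENV=production bundle exec jekyll build || exit 1
  # Make sure the build actually produced output before touching anything else
  if [[ $(ls _site 2>/dev/null | wc -l) -eq 0 ]]; then
    echo "Jekyll build failed, _site/ is empty!"
    exit 1
  fi
}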

Image Compression

One of the primary, and easier, things you can do to reduce total page size is simply to compress your images. Most JPG images are saved at a default quality of around 70, which is great for, say, your home image gallery; it balances looking good against saving a little storage space. That isn't great for images being loaded on the fly from a website, though… it's just too much data, especially if your images are higher resolution.

function compress_img()
{
  echo "Compressing jpg to $clvl ..."
  # Start from the untouched originals on every run
  rm ${new_images}/*.jpg ${new_images}/*.webp 2>/dev/null
  cp -r ${original_images}/* ${new_images}
  mogrify -format jpg ${new_images}/*.webp
  # Resize to just above the viewport width and recompress at the chosen quality
  mogrify -resize "630x485^" -quality ${clvl} -path "$new_images" ${new_images}/*.webp
}

So as you can see, this is a pretty simple compression function. It just takes a user-defined compression level, or the lower default of 40. The smaller the number, the smaller the file, but the worse it looks, so be careful not to set this variable too low or your images will look very pixelated; users don't really appreciate that either. I also resize the image to be slightly larger than the viewport in the DOM where it is displayed, since my Jekyll theme gives images a maximum width of 630 pixels. We preserve the original images instead of overwriting them, in case something goes wrong, like the quality turning out sub-par.
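If you want to sanity-check the result, ImageMagick's identify can report the dimensions and estimated JPEG quality of a compressed file; the path here is just a placeholder:

# %f = filename, %wx%h = dimensions, %Q = estimated JPEG quality
identify -format '%f %wx%h q=%Q\n' ${new_images}/example.jpg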

Inlining of CSS

The theory behind external CSS files is that they are just easier to manage: you do not need to write the same or similar CSS over and over in each page. However, it is actually more efficient for the browser to take in as much as it can in as few files as possible. CSS can also block page loading; for example, the DOM is ready and waiting and the HTML has been successfully loaded by the browser, but it is still stuck waiting on a large CSS file before it knows how the page should look in terms of colors, shapes, font faces, and so on. The last reason we inline CSS is that the inlining parser can decide which pieces of the .css file will actually be used by the browser in each specific spot, so we are no longer loading dormant code that applied to a different page; that's no good, it just slows things down, makes the page bulkier, and makes the process more memory intensive.

function inline_css()
{
  echo "Fixing CSS..."
  # Compile the theme's SCSS into the single stylesheet we will inline
  npx sass _sass/jekyll-theme-chirpy.scss -I _sass 2>/dev/null >css/style.css
  echo "Done merging css from scss..."
  (
    for f in $(find _site/ -type f -name '*.html'); do
      # Throttle to ${procs} background jobs at a time
      ((p = p % procs))
      ((p++ == 0)) && wait
      echo "Inlining .css to .html: $f"
      cp "$f" "$f.bak"
      npx inline-critical "$f" -c css/style.css -b _site/ >"$f.2" 2>/dev/null || mv "$f.bak" "$f" &
    done
  )
  # Swap each inlined .html.2 back into place over the original .html
  for f in $(find _site/ -type f -name '*.html.2'); do
    mv "$f" "${f%.2}"
  done
}

Our first objective in this code block is to generate the correct CSS file that would have been applied, by running the .scss through sass to merge the pieces, since they are designed to be dynamic. Then, for each HTML file that jekyll build has generated, we open up to eight threads, as the process is a little CPU intensive. Each HTML file then has the CSS merged into it, leaving a single HTML file containing only the CSS that page actually needs. We then loop over each HTML file again and move what we originally saved it as back to what the filename should be. This lets us recover automatically if, say, there was a merge error in one of the CSS or HTML files and the output was never generated; in that case, the original file still remains.
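The little ((p = p % procs)) / ((p++ == 0)) && wait dance is what caps the number of background jobs. Pulled out on its own, the pattern looks like this, with sleep standing in for the real per-file work:

procs=8
p=0
for i in $(seq 1 20); do
  ((p = p % procs))      # wrap the counter back to 0 every $procs iterations
  ((p++ == 0)) && wait   # at the start of each batch, wait for the previous batch to finish
  sleep 1 &              # the real work runs in the background here
done
wait                     # catch the final, possibly partial batch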

Minifying HTML

This function is similar to the inlining function above and works nearly the same way, except that its primary job is to strip things from the HTML that are unnecessary to its functionality, such as whitespace, comments, fragments, and empty elements.

function minify_html()
{
  echo "Minifying HTML"
  (
    for f in $(find _site/ -type f -name '*.html'); do
      # Throttle to ${procs} background jobs at a time
      ((p = p % procs))
      ((p++ == 0)) && wait
      echo "Minifying .html: $f"
      npx html-minifier --collapse-whitespace --process-conditional-comments --minify-css --minify-js --remove-tag-whitespace --trim-custom-fragments --remove-comments --remove-empty-attributes --remove-empty-elements --minify-urls --continue-on-parse-error "$f" --output "$f.3" || mv "$f.bak" "$f" &
    done
  )
  # Swap each minified .html.3 back into place over the original .html
  for f in $(find _site/ -type f -name '*.html.3'); do
    mv "$f" "${f%.3}"
  done
  find _site -type f -name "*.html.bak" -delete
  echo
  echo "Done minifying..."
}

This makes the page considerably smaller, and thus less data to transfer and load, without changing the way the page behaves in practice. Many sites do this, even very high-traffic ones including google.com, because the optimization, while not strictly necessary, does improve load time. It is also a CPU-intensive process, so it is done in parallel.
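A quick way to see what the inlining and minification passes actually bought you is to measure the built site before and after; _site.orig here is just a hypothetical copy of the unoptimized build:

# compare total size of the unoptimized and optimized builds
du -sh _site.orig _site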

Pushing it to the internet or staging server

The one function truly necessary to the project, although somewhat anticlimactic, is pushing the contents of _site/ to our webserver. Here I do this with rsync, though in an earlier version it was scp. The rsync way seems to work better because we can run a checksum on each file and then, when the time comes, clear only the parts of the cache that changed instead of the entire site, hence increasing our cache hit rate.

function push_site()
{
  # Current branch and commit count, used for the version stamp and status line
  branch=$(git rev-parse --abbrev-ref HEAD)
  pushes=$(git rev-list HEAD --count)
  git rev-list HEAD --count | sed -e 's/^/v/' >version.txt
  rm _site/push.sh _site/push.err 2>/dev/null
  echo "Pushing your" ${pushes} "commit of" ${branch} "to" ${domain} "!!!"
  rsync -vrX _site/ ${user}@${server}:${webroot} | tee rsync.txt
  # Turn the list of transferred files into full URLs for the cache purge
  grep -E '\.(xml|html|webp|js|css|txt)' rsync.txt | sed -e 's|^|https://oxasploits.com/|' >modified.txt
  echo "https://oxasploits.com/" | tee -a modified.txt
  echo "Website is live on staging server!"
}
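If you ever want rsync to decide what changed by content rather than by size and timestamp, a checksum dry run will show exactly what would be re-transferred before you commit to the push. This isn't part of the script, just a handy sanity check:

# -c compares checksums, -n is a dry run, -i itemizes each file that would change
rsync -vrXcni _site/ ${user}@${server}:${webroot}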

Clear the Cache and Housekeeping

Since I have a caching server in front of my webserver and would otherwise need to manually clear the cache for each page I changed, to keep my cache hit rate high I parse the rsync output from the previous section, save it to a file called modified.txt, and run it through jo, an on-the-fly JSON generator. Once we have our JSON output we can make an API call to my caching servers at Ezoic. An API key for this is necessary and can be specified through a command line argument. Clearing the cache is optional, though, since the servers will refresh it eventually by design, but you wouldn't be guaranteed an up-to-date page otherwise.
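To make the payload concrete, here is roughly what jo builds out of a couple of modified.txt lines (the URLs are just placeholders):

# modified.txt contains one URL per line, for example:
#   https://oxasploits.com/assets/css/style.css
#   https://oxasploits.com/
# jo -a turns those lines into a JSON array, and the outer jo wraps it in an object:
jo -p urls="$(jo -a <modified.txt)"
# -> { "urls": [ "https://oxasploits.com/assets/css/style.css", "https://oxasploits.com/" ] }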

function clear_cache()
{
  # Wrap the list of changed URLs in the JSON body the Ezoic API expects
  jo -p urls="$(jo -a <modified.txt)" >cache_clear.txt
  echo "Trying to clear the cache using API key ${apikey}..."
  if [[ ${new_site} -eq 1 ]]; then
    curl -X POST "https://api-gateway.ezoic.com/gateway/cdnservices/purgecache?developerKey=${apikey}" -H "Content-Type: application/json" --data "{\"domain\":\"$domain\"}" && echo -e "\nAll cache cleared!"
  else
    curl -X POST "https://api-gateway.ezoic.com/gateway/cdnservices/bulkclearcache?developerKey=${apikey}" -H "Content-Type: application/json" --data "@cache_clear.txt" && echo -e "\nSelective cache cleared!"
  fi
  cleanup
  exit 0
}

and then clean up our temporary files…

function cleanup()
{
  rm cache_clear.txt rsync.txt modified.txt
  find _site -type f -name "*.html.bak" -delete
  find _site -type f -name "*.html.2" -delete
  find _site -type f -name "*.html.3" -delete
}

We then run a cleanup function that removes all the temporary files we used: the .html.bak, .html.2, and .html.3 files, modified.txt, rsync.txt, as well as cache_clear.txt.
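Tying it all together, the driver at the bottom of the script just calls these functions in order; a minimal sketch of that flow (argument parsing for the API key, thread count, compression level, and so on is left out, and the build function name is my own from the earlier sketch) looks like this:

check_ok
build
compress_img
inline_css
minify_html
push_site
clear_cache   # clears the CDN cache, runs cleanup, and exits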

Conclusion

So this little utility has already considerably increased my page load speed, as well as pushed me higher in the page ranks with a more responsive website. It also saves me a ton of time, and it is designed to be easily modifiable for use with other projects, for example Hugo, different types of cache server, and so on.

Hope you have had as much fun reading this as I had writing it. Thank you!!


If you enjoy my work, sponsor or hire me! I work hard keeping oxasploits running!
Bitcoin Address:
bc1qclqhff9dlvmmuqgu4907gh6gxy8wy8yqk596yp

Thank you so much and happy hacking!