Jekyll unexpectedly returns an empty page and HTTP status code 200 instead of the custom 404 page and a 404 code

1. The purpose of this post

After setting up a website with Jekyll and nginx, you may sometimes get a soft 404 issue from Google Search Console like this:

New issue found:
Submitted URL seems to be a Soft 404

How do you debug and fix this? Follow the steps below.

2. Environments

  • Jekyll
  • Nginx

3. Add the custom 404 page to your jekyll site

First of all, add a custom 404 page to your Jekyll website as follows:

3.1. Create a custom 404 page

You should create a file named 404.html or 404.md in the root of your website’s directory.

The 404 file in my directory looks like this: [image: 404 file]

The content of 404.html or 404.md is as follows:

---
layout: default
permalink: /404.html
---

page not found! :(

Rebuild and restart Jekyll, then use curl to request a page that does not exist:

curl http://yoursite.com/page_not_exist.html

We only get an empty page and an HTTP 200 code. It should be the 404 page with a 404 code. What happened?

4. Soft 404 solution

4.1. Debug the soft 404 issue

First, let's check what a soft 404 is. According to Google:

A soft 404 is a URL that returns a page telling the user that the page does not exist and also a 200-level (success) code. In some cases, it might be a page with little or no content–for example, a sparsely populated or empty page.

If users visit a page that does not exist on your website, the returned HTTP code should be 404, not 200.

You can verify the issue by executing this command:

curl -v http://yoursite.com/page_not_exist.html

You would get this:

 HTTP/1.1 200

The response body is also empty. This is a soft 404.
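The check above can be reduced to a single status code. The following is a minimal sketch, not part of curl, Jekyll or nginx: the function name classify_missing_page is made up for illustration, and it assumes you pass it the status code returned for a URL that you know does not exist.

```shell
#!/bin/sh
# Hypothetical helper (the name is an assumption, not an existing tool).
# Classifies the HTTP status returned for a URL that is known NOT to exist.
# Example call against a live site (yoursite.com is a placeholder):
#   classify_missing_page "$(curl -s -o /dev/null -w '%{http_code}' http://yoursite.com/page_not_exist.html)"
classify_missing_page() {
    case "$1" in
        404) echo "real 404" ;;        # correct behaviour for a missing page
        2??) echo "soft 404" ;;        # success code returned for a missing page
        *)   echo "unexpected: $1" ;;  # redirects, 5xx, or 000 when curl got no response
    esac
}
```

The curl options in the comment discard the body and print only the status code, so the result is easy to compare before and after a configuration change.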

Because the site is served through both nginx and Jekyll, you need to find out which of them causes the issue.

First, check whether it is Jekyll's problem.

Jekyll's default port is 4000, so we query it directly:

curl -v http://127.0.0.1:4000/page_not_exist.html

You would get this:

...
 HTTP/1.1 404 not found
...

That's correct, so the problem must be in nginx. However, if you get a 200 status code here, check the script that starts Jekyll and avoid using --detach. You can start your site with this command:

nohup bundle exec jekyll serve > nohup.out &

Then rerun the curl command and make sure you get a 404 status code.
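The reasoning in this section is a small decision procedure: compare the status Jekyll returns directly with the status the public site returns. As a rough sketch, assuming both codes were collected with curl for the same missing page, it could look like this. The function name locate_fault and its output strings are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: decide which layer to inspect, given the status codes
# returned for the same missing page by Jekyll directly and by the public site.
#   $1 = status from http://127.0.0.1:4000/page_not_exist.html (Jekyll)
#   $2 = status from http://yoursite.com/page_not_exist.html   (through nginx)
locate_fault() {
    if [ "$1" != "404" ]; then
        echo "jekyll"   # Jekyll itself is wrong: check how it is started
    elif [ "$2" != "404" ]; then
        echo "nginx"    # Jekyll is fine, so look at the nginx configuration
    else
        echo "none"     # both layers return a real 404
    fi
}
```

For the situation described in this post, locate_fault 404 200 points at nginx, which is the case the next step deals with.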

4.2. Check Nginx configurations

Our default nginx configuration for 404 errors is as follows:

error_page 404 /404.html;
location = /40x.html {
}

The location block matches /40x.html, but our site has no 40x.html, so we change the configuration to this:

error_page 404 /404.html;
location = /404.html {
	internal;
}

Here the internal directive indicates that the URL /404.html can only be used for internal requests, not by users directly. If a user enters that URL in the browser, nginx answers with a 404 error instead of serving the page as a normal 200 response.

Then restart your nginx service. A request for the missing page should now return a 404 code.
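Before restarting, it can help to let nginx validate the edited configuration. The following is only a sketch: the wrapper name reload_nginx is made up, and it assumes the nginx binary is on PATH and that you run it with enough privileges (usually root). It uses a reload rather than a full restart, which is enough for nginx to pick up a configuration change.

```shell
#!/bin/sh
# Hypothetical wrapper: validate the configuration, then reload nginx.
# Assumes the nginx binary is on PATH and the caller has enough privileges.
reload_nginx() {
    # nginx -t parses the configuration files and reports syntax errors;
    # the reload signal is only sent when that check succeeds.
    nginx -t && nginx -s reload
}
```

After calling reload_nginx, rerun the curl command against the missing page to confirm the status code.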

5. Conclusion

When you use Jekyll and nginx together, the 404 configuration needs a few tricks. If you have any questions, refer to Jekyll's official documentation first.