Fresh Content For Crawl Budget

Does fresh content help SEO? Is it gonna increase your crawl budget?

What I think is important for SEOs to check is how many other websites, and how many other web pages, are on the same IP address and the same web server as their own site, especially if they’ve got cheap hosting (you can find hosting plans for $2/month).

So you’re super happy because you found THE deal, the cheapest hosting on the market, and then you find out that there are 200 other websites on the same server.

You’ve got ten pages on your website, and then there’s this other one with a full copy of Wikipedia on it, something like millions and millions of pages. So what happens then? Google sits there crawling this other website so much that at some point it won’t have enough time to hit your pages as well.

Majestic has a neighbourhood checker tool that lets you see what other websites are on the same IP address, so it’s worth going to have a look.
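If you want a quick first check before reaching for such a tool, you can resolve your domain to its IP address yourself. Here is a minimal sketch in Python; the domain is just a placeholder to swap for your own:

```python
import socket

def site_ip(domain: str) -> str:
    """Resolve a domain name to the IP address it is hosted on."""
    return socket.gethostbyname(domain)

# Replace with your own domain.
print(site_ip("example.com"))
```

Once you have the IP, you can feed it into a reverse-IP or neighbourhood checker to see how many other websites share that server with you.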

Cloaking: Black Hat Technique?

Supposedly, cloaking is a Black Hat SEO technique.

But when you think about it, it’s also very white hat to try to get rid of all those bots, Majestic included. They eat up bandwidth on your server, and some of those bots are aggressive.

Majestic is pretty aggressive, but it’s not the worst.

I like to send them to a 200: they land on a 200 header with nothing to see, and move on to check somewhere else. I don’t want my stuff showing up in all those tools.
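Here is a minimal, illustrative sketch of that idea using Python’s standard library. The user-agent substrings and the handling logic are assumptions for illustration, not a recommended production setup:

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical list of crawler user-agent substrings to intercept.
BLOCKED_BOTS = ("MJ12bot", "AhrefsBot", "SemrushBot")

class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        ua = self.headers.get("User-Agent", "")
        if any(bot in ua for bot in BLOCKED_BOTS):
            # The bot gets a 200 with an empty body: nothing to see here.
            self.send_response(200)
            self.send_header("Content-Length", "0")
            self.end_headers()
            return
        # Regular visitors get the normal page.
        body = b"<html><body>Hello, human.</body></html>"
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

if __name__ == "__main__":
    HTTPServer(("0.0.0.0", 8000), Handler).serve_forever()
```

In practice you would do this at the reverse proxy or web server level rather than in application code, but the principle is the same: identify the bot and give it nothing worth storing.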

Of course, if you’re doing this while wearing the Black Hat, that’s a red flag, but if you are clean, you shouldn’t have any problem.

How Many Bots in Your Traffic?

Some websites publish a study every year on how much web traffic is actually human.

About 50% of your website traffic is non-human

You might want to block Majestic or Ahrefs because you want to stop your competitors from seeing your links, for example. Be careful if you do that because it can be a red flag.
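If you do decide to block them, the polite way is through robots.txt. A minimal example, assuming the crawlers identify themselves with their usual user-agent tokens (MJ12bot for Majestic, AhrefsBot for Ahrefs):

```
User-agent: MJ12bot
Disallow: /

User-agent: AhrefsBot
Disallow: /
```

Keep in mind that robots.txt only stops bots that choose to respect it; aggressive scrapers ignore it completely, which is where server-side user-agent filtering like the sketch above comes in.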

Also, if you start blocking all the bots, don’t be surprised if you get a lot less traffic from human beings, because your information is no longer getting syndicated to the places where human beings need to see it.

Well, I tell it like it is, but if you try anything shady, it’s at your own risk.

Googlebot Does Not Obey Crawl Delay

Googlebot doesn’t obey the crawl-delay directive in robots.txt.
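For reference, crawl-delay is a non-standard robots.txt directive that some other crawlers honour (the value is the number of seconds to wait between requests). Google simply ignores it:

```
User-agent: *
Crawl-delay: 10
```

Google decides its own crawl rate for your site; you can’t dial it up with a directive.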

That’s why fresh content is probably not going to increase the crawl budget.

Googlebot will probably work out that the site is getting updated faster and go see the fresh content more regularly, but that could be at the expense of re-crawling the old content.

The power of the page, its PageRank, is more of a signal than the freshness of the content.

Maybe the freshness of the content shapes Googlebot’s crawling habits more than its budget. If it ever figures out that every day at 7 a.m. you publish new content, it’s going to register that information, and it’s going to come back every day at 7 a.m.

So, you have two types of “Fresh Content”:

  1. Fresh content that Google has never seen before: a brand new article that Googlebot has to search for and find,
  2. Fresh content in the sense that you modify content that already exists.

And those two types of “fresh content” blend together into a virtually infinite number of new pages coming up on the web.

The question is how much of each you want in the blend: do you do 50% new crawl and 50% re-crawl?

It’ll work for a while, but at some point, you will have so much stuff to re-crawl that 50/50 won’t work anymore.

That balance is always a challenge for a crawler. From the website owner’s point of view, you have to understand what happens when you translate the time Googlebot is going to spend on your site into a number of pages. For example, if you have 1,000 pages and the crawl budget allocated to your website is 1,000 page fetches, the bot is probably going to crawl the home page 800 times and only 200 other pages.

So this is undoubtedly another reason why Google wants your website to be fast.

Log Analysis

Checking the logs to see what Googlebot does on your website is probably more important from an SEO perspective than the analytics.

Your principal visitor is Googlebot.

Back in the day, we had terrible tools to check the log files, and if you had a big website, you needed a big, powerful computer.

Now we have a lot of powerful tools to check the logs, both for one-off audits and for ongoing monitoring.

You can also crawl your website with a crawler like Screaming Frog or Zennoposter and see how a crawler sees your site.

It’s all about logs. If you are an SEO, you have to check out and monitor the logs of your website.
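As a minimal sketch of what that can look like, here is a small Python script that counts which URLs Googlebot hits most often in a combined-format access log. The log path is a placeholder, the regex is deliberately simplified, and dedicated log analysis tools do far more than this:

```python
import re
from collections import Counter

# Placeholder path to a combined-format access log; adjust for your server.
LOG_FILE = "access.log"

# Simplified pattern for a combined log line:
# IP - - [date] "METHOD /path HTTP/x.x" status size "referer" "user-agent"
LINE_RE = re.compile(r'"[A-Z]+ (?P<path>\S+) HTTP/[^"]+" \d{3} \S+ "[^"]*" "(?P<ua>[^"]*)"')

hits = Counter()
with open(LOG_FILE, encoding="utf-8", errors="replace") as f:
    for line in f:
        match = LINE_RE.search(line)
        if match and "Googlebot" in match.group("ua"):
            hits[match.group("path")] += 1

# Show the 20 URLs Googlebot requests most often.
for path, count in hits.most_common(20):
    print(f"{count:6d}  {path}")
```

Comparing that list with the pages you actually care about is often the quickest way to spot crawl budget being wasted.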

Googlebot & Small Websites

If you have a small website, you shouldn’t have any crawl problems. Google should see your site and pages pretty quickly.

Google should easily be able to crawl small sites within a short period, every time new content goes up.

But it won’t go faster just because you’re putting up new pages for Google.

So, in conclusion:

Fresh content might help the crawl habit; not the crawl budget

Next week, July 13th, we’re going to talk about the Google guidelines: are they the law of SEO?

White Hat vs Black Hat edition, we had fun on that one!

See you next week, thanks for watching and don’t forget to subscribe to my newsletter where I’m going to explain, step by step, how I build a brand new website, the perfect site, with the Topical Mesh => I’m Joining The Adventure.

