
Googlebot: SEO Mythbusting

With me today is Suz Hinton from Microsoft. Suz, what do you do at work, and what is your experience with front-end development and SEO? So right now I'm doing less front-end; these days I focus more on IoT. So in the time you were a front-end developer... Yeah.

I was a front-end developer for, I think, 12 or 13 years, and so I got to work in lots of different contexts in front-end development, different websites, things like that. Cool. Today I wanted to address a bunch of stuff about Google, and specifically nerd out about Googlebot, because that was the side of things that I was the most confused about at the time. So, Googlebot is basically a program that we run.

It does three things. The first thing is it crawls, then it indexes, and then, last but not least, there's a third thing that is not really Googlebot anymore: the ranking bit. So we have to grab the content from the internet, then we have to figure out what this content is about, what the stuff is that we can serve to users looking for these things. And then, last but not least: which of the many things that we picked for the index is the best thing for this particular query at this particular time? Right, yeah.

So the ranking, that last bit where we move things around, is informed by Googlebot but not part of Googlebot? Is that because there's this bit in the middle, the indexing, that Googlebot is responsible for? Yes, and making sure that that content is useful for the ranking engine. Absolutely. You can imagine it like a library: someone has to figure out what the books are about and get the bits into an index and a catalog.

The catalog being our index, really. And then someone else uses that index to make informed decisions and goes, "Here, this book is what you're looking for." I'm really glad you used that analogy, because I worked in a library for four years, and I was that person. People would be like, "I want Italian cookbooks," and I'd be like, "That's at 641.5495." You just know. If I came to you as a librarian and asked a very specific question, like, "What is the best book on making apple pies?", would you be able to figure that out really quickly from the index? You probably have lots of cookbooks?

We did, yeah, we had a lot. But given that I also put lots of books back on the shelf, I knew which ones were popular. I have no idea if we can link this back to Googlebot, but... It does, it's pretty much the same thing. So you have the index, which probably doesn't really change that much unless you add new books or a new edition. Right, exactly. So you have this index, which Googlebot provides you with, but then we have the second part, the librarian.

The second part, which, based on how the interactions with the index work, figures out which books to recommend to someone asking for them. So that's pretty much the exact same thing: someone figures out what goes into the catalog, and then someone else uses it. I love this, this makes total sense to me. But I guess that's still not necessarily all the answers you need, right? Yeah.

I just want to know: what does it actually do? How often does it crawl sites? What does it do when it gets there? How does it generally behave? Does it behave like a web browser? That's a good question. Generally speaking, it behaves a little bit like a browser, at least part of it does. So the very first step, the crawling bit, is pretty much a browser coming to your page, either because we found a link somewhere, or you submitted a sitemap, or there's something else that fed that URL into our systems.
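
For reference, a sitemap is just an XML file listing the URLs you want discovered; a minimal sketch (the URL and date are placeholders):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <!-- One <url> entry per page you want Googlebot to find -->
  <url>
    <loc>https://example.com/products/widget</loc>
    <lastmod>2019-05-01</lastmod>
  </url>
</urlset>
```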

You can also use Search Console to give us a hint and ask for reindexing, and that triggers a crawl before we would otherwise have done it, and that is perfectly fine. But the problem then, obviously, is how often do you crawl things, how much do you have to crawl, and how much can the server bear? If you're on the back-end side, you know that you have a bunch of load, and that might not always be the same.

If it's Black Friday, then the load is probably higher than on any other day. So what Googlebot does is try to figure out, from what we have in the index already: is this something that looks like we need to check it more often, is it probably a newspaper or something like that? Got it. Or is it something like a retail site whose offerings change every couple of weeks, or even don't change at all, because it's actually the site of a museum that changes very rarely, maybe for the exhibitions, but a few bits and pieces don't change that much? So we try to segregate our index data into something that we call daily or fresh, which gets crawled relatively frequently, and then it becomes less and less frequent. And if it's something that is super spammy or super broken, we might not crawl it as often, or if you specifically tell us:

"No, do not index this. Do not put this in the index. This is something that I don't want to show up in the search results," then we don't come back every day and check. So you might want to use the reindex feature if that changes: you might have a page where you go, "No, this shouldn't be here," and then once it has to be there again, you want to make sure that we come back and index it again.
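
That "do not put this in the index" instruction is typically a robots meta tag in the page's head (or an equivalent X-Robots-Tag HTTP header); a minimal example:

```html
<!-- Ask search engines not to index this page -->
<meta name="robots" content="noindex">
```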

So that's the browser bit, that's the crawler part. But then a whole slew of stuff happens in between us fetching the content from your server and the index having the data that is then served and ranked. The first thing is: we have to make sure we discover whether you have any other resources on your page. Right. The crawling cycle is very important, so the moment we have some HTML from you,

we check if there are any links in there, or images for that matter, or anything else that we want to crawl as well, and that feeds right back into the crawling mechanism. Now, if you have a gigantic retail site, let's say, just hypothetically speaking: we can't just crawl all the pages at once, both because of our own resource constraints, but also because we don't want to overwhelm your servers. So we basically try to figure out how much strain we can put on your servers and how many resources we've got available as well, and that's often called the crawl budget.

But it's pretty tricky to determine, so one thing that we do is crawl a little bit, then basically ramp it up, and when we start seeing errors, we ramp it down again. So whenever your server serves us 500 errors, we back off, and there are certain tools in Search Console that allow you to say, "Hey, can you maybe chill out a little bit?" But generally we don't try to get all of it at once and then ramp down. We're trying to carefully ramp up, ramp down again, ramp up again, ramp down, so it fluctuates a little bit.
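
As a rough mental model of that ramp-up-and-back-off behavior, here is a small sketch; the thresholds, rates, and class name are all invented for illustration, not Google's actual values:

```typescript
// Sketch of an adaptive crawl-rate controller: ramp up while the
// server looks healthy, back off sharply when 5xx errors appear.
class CrawlRateController {
  private ratePerSecond = 1; // start conservatively
  private readonly maxRate = 100; // illustrative ceiling

  // Call after each batch of fetches with the observed share of 5xx responses.
  adjust(serverErrorRatio: number): void {
    if (serverErrorRatio > 0.05) {
      this.ratePerSecond = Math.max(1, this.ratePerSecond / 2); // chill out
    } else {
      this.ratePerSecond = Math.min(this.maxRate, this.ratePerSecond * 1.2);
    }
  }

  get currentRate(): number {
    return this.ratePerSecond;
  }
}
```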

There's a lot more detail in there than I was expecting. I guess I never considered that a Googlebot crawling event could put strain on somebody's website; that sounds like it's a lot more common than I thought. It does happen, especially if we discover, say, a page that has lots of links to subpages. Then all of these go into the crawling queue. Got it. And then these might have links too.

Let's say you have 30 different categories of stuff, and each of these has a few thousand products and then a few thousand pages of products. So we might go, "Oh, cool, crawl," and then crawl a few hundred thousand pages, which is a problem if we don't spread that out a little bit. So it's a weird balance, right? On one hand, if you add a new product, you want that to be surfaced in Search as quickly as possible.

On the other hand, you don't want us to take all the bandwidth that you have. I mean, cloud computing makes that a little less scary, I guess, but I remember the days, I'm not sure if you remember them, when you had to call someone, and they asked you to send or fax a form, and then, like two weeks later, you'd get the confirmation letter that your server had been set up. Yes, I remember the days when we would have to call and basically pay $200 to have a human go down

the aisles and push the physical reset button on the server. So yeah, those times. And then imagine you're basically renting five servers somewhere in a data center, and that takes a week, and then we come in and scoop up all your bandwidth: "Hey, we're offline today because Google has its crawl day." That's not what we want. Yeah, these days it's more of a Hacker News kind of moment waiting to happen. Yeah, exactly. So I feel like you're much more considerate now. Yeah.

We try not to overwhelm anyone, and we respect robots.txt, so that works within the crawl step as well. And once we have the content, we can't put strain on your infrastructure anymore, so that's fantastic. But with modern web apps being mostly JavaScript, we then put the page in a queue, and once we have the resources, we render it. We actually use another headless-browser kind of thing for that; we call it the web rendering service. Then there are other crawlers as well that might not have the capacity or the need to run JavaScript.
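
The robots.txt mentioned above is a plain text file served at the site root; a typical one looks something like this (the paths are made up):

```
# Keep crawlers out of a private section; point them at the sitemap.
User-agent: *
Disallow: /admin/

Sitemap: https://example.com/sitemap.xml
```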

Those non-JavaScript crawlers are things like social media bots, for instance. They come and look for metadata, and if that meta tag only arrives via JavaScript, you usually have a bad time, and they're just like, "Sorry." Yeah, so that's always been a big myth. I remember when single-page applications, or SPAs, really came into vogue, a lot of people were really concerned. There was a lot of FUD around: if crawlers in general don't execute JavaScript, then they're going to see a blank page, and how do you get around that? So, contextually, within Googlebot: it sounds like Googlebot executes JavaScript, even if it does it at a later point.
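
For those metadata-reading bots, the usual fix is to ship the tags in the initial server HTML rather than injecting them with JavaScript; a minimal sketch with placeholder values:

```html
<!-- Present in the initial HTML response, so non-JS crawlers can read it -->
<head>
  <title>Widget | Example Store</title>
  <meta property="og:title" content="Widget">
  <meta property="og:description" content="A fine widget.">
  <meta property="og:image" content="https://example.com/img/widget.png">
</head>
```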

Yes. So that's good, that's good. But is there anything that people need to be aware of, beyond just, "Oh well, it'll run it and then it'll see exactly the same thing as a human with a phone or a desktop"? Let's see, there's a bunch of things you need to be aware of. The most important thing is, again, as you said, it's deferred; it happens at a later point. So if you want us to crawl your stuff as quickly as possible, that also means we have to wait to find the links

that JavaScript injects. Wait, so basically, we crawl, we have to wait until the JavaScript is executed, then we get the rendered HTML, and then we find the links. So the nice little short loop that normally finds links relatively quickly, right after crawling, will not work, right. So we will only see the links after we render the page, and this rendering can take a while, because the web is surprisingly big. Yeah, just a little bit: something like 30 trillion docs in 2016, and I'd say now there are way more than that. Yes, way more than that.

So, robots.txt is very effective at telling Googlebot how to do certain things. But in this scenario, how do you tell that it's Googlebot visiting your site? Good question. So, as we are basically using a browser in two steps, one for the crawling and one for the actual rendering, at both of these moments we do give you the user-agent header, and there's a string in there; it's literally the string "Googlebot".

That's so straightforward. Yes, and you can actually use that to help with your SPA performance as well. Okay. When you detect on the server side, "Oh, this is a Googlebot user agent requesting the page," you might consider sending us a pre-rendered static HTML version, and you can do the same thing for the others: all the other search engines and social media bots have a specific string saying that they are a robot.
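
A minimal sketch of that server-side branching, here using Express; the bot list is illustrative, and renderToStaticHtml is a placeholder for whatever pre-renderer you use (for example Puppeteer or Rendertron):

```typescript
import express from "express";

const app = express();

// Documented crawlers identify themselves with tokens like these.
const BOT_PATTERN = /Googlebot|bingbot|Twitterbot|facebookexternalhit/i;

// Placeholder pre-renderer; a real setup would render the SPA to HTML here.
async function renderToStaticHtml(path: string): Promise<string> {
  return `<html><body><h1>Pre-rendered view of ${path}</h1></body></html>`;
}

app.get("*", async (req, res) => {
  const userAgent = req.headers["user-agent"] ?? "";
  if (BOT_PATTERN.test(userAgent)) {
    // Dynamic rendering: bots get static HTML instead of the JS bundle.
    res.send(await renderToStaticHtml(req.path));
  } else {
    // Human visitors get the regular single-page-app shell.
    res.sendFile("index.html", { root: "dist" });
  }
});

app.listen(3000);
```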

Okay, so you can then basically go, "In this case, I'm not giving you the real deal, the single-page app; I'm giving you this HTML that we pre-rendered for you." That's called dynamic rendering; we have docs on that as well. The one thing that still doesn't quite make sense to me is: does Googlebot have different contexts? I think of it as this little mythical creature that's pretending to do certain things. Does it pretend to be on mobile and then on desktop? Are those different user agents, even though it still says Googlebot, and can you differentiate between them? You're asking great questions, because yes, we have different user agents.

I'm not sure if you've heard about mobile-first indexing being rolled out. I've heard that it's going to affect how you're ranked. Potentially; those are two different things that get conflated very often. Mobile-first indexing is about us discovering your content using a mobile user agent and a mobile viewport. So we are using mobile user agents, and the user-agent string says so: if it says something about Android in the name, then you're like, "Aha,

this is the mobile Googlebot." You have documentation on that; there's literally a Help Center article that lists all these strings. So we try to index mobile content to make sure that we have something nice to serve to people who are on mobile, but we're not pretending to be random user agents or anything; we stick to the user-agent strings that we have documented. That's mobile-first indexing, where we try to get your mobile content into the index rather than the desktop content. Huh. And then there's mobile readiness, or mobile friendliness.
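
Since both variants contain the literal string "Googlebot" and the documented smartphone variant also mentions Android, telling them apart can be as simple as this sketch:

```typescript
// Sketch: classifying a request by the documented Googlebot UA tokens.
type GooglebotKind = "smartphone" | "desktop" | "not-googlebot";

function classifyGooglebot(userAgent: string): GooglebotKind {
  if (!userAgent.includes("Googlebot")) return "not-googlebot";
  // The smartphone crawler's documented UA string mentions Android.
  return userAgent.includes("Android") ? "smartphone" : "desktop";
}
```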

If your page is mobile-friendly, that means everything is within the viewport, you have large enough tap targets, and all these kinds of lovely things, and that is just a quality indicator. We call these signals; we have over 200 of them. That's a lot. So Googlebot collects all these signals and stuffs them as metadata into the index, and then when ranking we're like, "Okay, so this user's on mobile,

so maybe this thing that has a really good mobile-friendliness signal attached to it might be a better result than the thing where they have to pinch-zoom all the way out to be able to read anything and then can't actually tap the different links because they're too close to each other." So that's one of the many signals; it's not the signal, it's one of the over 200 signals we deal with.
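
The pinch-zoom-to-read problem usually traces back to a missing viewport declaration; the standard fix is one line in the page's head:

```html
<meta name="viewport" content="width=device-width, initial-scale=1">
```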

I had no idea there were 200. I know that you're not allowed to share what they all are, because there has to be a certain mystique around it, since I guess a lot of SEOs abused that in the past. Yeah, unfortunately, that is a game that is still being played, and people are doing weird stuff to try to game us. And the interesting thing is, with 200-plus signals, it's really hard to say which one gets which weight, and they keep moving and changing. So I love it when people are like, "No, let's do this," and then, "Look, my rank changed!" Yeah, for this one query, but you lost on all the other queries because you did really weird and funky stuff for that one.

So just build good content for the users and you'll be fine. I feel like that's less effort, too, compared to constantly trying to game it. Yeah, but it's not an easy answer, right? You pay me to make you more successful on search engines, and I come to you and say, "So who are your users, what do they need, and how could you express that so they know it's what they need?" That's a hard one, because that means I basically pass the ball back to you, and you have to think about things and figure them out strategically. Whereas if I'm like, "Okay, I'm just going to get you links, or do some funky tricks here, and then you'll be ranking number one," that's an easier answer.

It's the wrong answer, but it's the easier answer. So people are like, "Links are the most important metric ever," and I'm like, "No, we have over 200 signals. Links are important, but not that important; chill out, everybody." But this still happens. Yeah, I'm so glad it's better now; I feel like we're actually more at peace with SEO in general. Suz, thank you so much for being here with me; it has been a great pleasure.

Thanks for answering all of my weird and wonderful questions about Googlebot. Did we bust some myths? I feel like we did. Fantastic. I think that's worth a high five. Thanks. Join us again for the next episode of SEO Mythbusting, where Jamie Alberico and I will discuss whether JavaScript and SEO can be friends and how to get there.
