Canonical links with Blocs

With the new clean URLs we have nice, tidy-looking web addresses. However, you still see the same page at example.com/about/ and example.com/about/index.html. From what I’ve been reading this is not great for SEO and we should tell Google to use one or the other. They are actually making some changes in April; see the Official Google Webmaster Central Blog post, Consolidating your website traffic on canonical URLs.

This can be done in the header info by adding a canonical tag, or via a 301 redirect. A blog article I read on Yoast suggested we should always use a redirect where possible, and I wondered if others have tried this and whether you have any recommended htaccess for it?
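For anyone wanting to try the redirect route, here is a minimal sketch of what it could look like. It assumes an Apache-style server with mod_rewrite enabled, a site exported as folders that each contain an index.html, and that the rules go in the .htaccess at the site root. It has not been tested against a Blocs export, so treat it as a starting point rather than a recommended configuration.

```apache
RewriteEngine On

# Match only what the visitor actually asked for (THE_REQUEST), so the
# server's own internal lookup of index.html does not cause a redirect loop.
RewriteCond %{THE_REQUEST} \s/+(?:[^?\s]*/)?index\.html[?\s] [NC]
# Send /about/index.html to /about/ and /index.html to /
RewriteRule ^(.*/)?index\.html$ /$1 [R=301,L,NC]
```

With this in place a request for example.com/about/index.html should answer with a single 301 to example.com/about/, while example.com/about/ itself is served as before. Any query string is carried over by default.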

My question to you is this: your generated sitemap says example.com/folder/ without index.html at the end. Normally, once the sitemap has been uploaded to Search Console, Google will keep picking up any changes in it. What we have been doing, going back to Blocs 2, is always removing the .html extension and using an htaccess script like this:

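# Remove .html filenames
RewriteEngine On

# Only rewrite when the request is not an existing directory or file
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{REQUEST_FILENAME} !-f
# Internally serve /page from page.html (no redirect is sent to the browser)
RewriteRule ^([^.]+)$ $1.html [NC,L]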

Everything that Google indexes comes from the sitemap. So do you still upload a sitemap to Search Console for better indexing? Do you strip the .html extension from your sitemap if you are not using the clean URL feature in Blocs?

We never had a problem with indexing this way. The only catch with this approach is that you need to remove the .html extension from all page links manually. The new feature does that for you.

I am still rewriting the old page with Blocs 3, but after this we will have an SEO audit from an independent company, and they use a lot of different tools. After the last audit we had 46% better SEO.

That meant clearing broken links (blog, internal and external), adjusting image sizes, better H1 positioning, better keyword usage, etc. It costs a lot of money to fix everything, but Chrome has nice tools in inspect mode.

And sure, if you request example.com/folder/index.html directly, it will always be there.

In my sitemap I have it as https://example.com/about/ and I already use htaccess so that everything 301 redirects to https:// with no option for http or www. I also use an app called Scrutiny that creates more detailed sitemaps than Blocs.
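For readers who do not have this set up yet, a common way to do the https and non-www redirect in one hop is sketched below. It assumes mod_rewrite is available and uses example.com as a placeholder for the real domain; it is not the exact htaccess used on the sites described here.

```apache
RewriteEngine On

# Trigger when the request is plain http OR the host starts with www.
RewriteCond %{HTTPS} off [OR]
RewriteCond %{HTTP_HOST} ^www\. [NC]
# One 301 straight to the https, non-www address (example.com is a placeholder)
RewriteRule ^ https://example.com%{REQUEST_URI} [R=301,L]
```

Hardcoding the target host keeps it to a single redirect instead of a chain. Behind a proxy or CDN the %{HTTPS} variable may describe the proxy's connection to the server rather than the visitor's, so that condition may need adapting.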

All my sites are verified with Google using a site meta tag, and looking inside Search Console it is following what has been indicated in the sitemap, so that’s good. But if it can be further improved by adding a meta tag or editing the htaccess, that is worth doing.

When I checked a Blocs site at https://sitechecker.pro/ recently it came up with various warnings, but the big one in red referred to the canonical links. It said we need to tell Google explicitly which is the canonical link for each page.

All of the internal links currently have 3XX status code yellow warnings. Sites I produced without the clean URLs in Blocs are not showing this and it shouldn’t be a problem. I don’t see this with my Rapidweaver sites.

I just tried your htaccess code but it didn’t work on my site. I may need to clear the cache with LiteSpeed and Cloudflare.

The clean url structure is a mess. I would not use the clean url on any projects that you want to rank. You can always add a rewrite in the htaccess file but with most people on mobile you do not even see the .html anyway. I will be exporting all my sites again without the clean url checked.

@cableguy30 If I look at the structure on the server with an FTP client it’s essentially the same with Blocs and Rapidweaver. As confirmation the URL looks the same in the browser window. The only difference is that I have chosen not to consolidate the CSS and JS with my RW projects, because it’s counterproductive with http/2, so the relevant files appear inside each page folder. I also have QUIC enabled on the server, which is due to be renamed as http/3

All I can think of is that Blocs is not correctly communicating how those pages are linked and identified. In other words there is some difference between the internal linking and the exported version that is throwing up these warning errors. In RW I have file links set as “Relative to DOCROOT” and I suspect that is where Blocs is going wrong. If you know what is happening here it would be good to know and perhaps @Norm can then fix it.

The problem with the old way of doing page URLs is that it’s no good for campaigns and marketing.

Why is the old way bad for marketing?

When was the last time you saw or heard a link for a professionally run campaign on TV, radio, print or web that wasn’t a clean URL?

If you have a competition, sales promotion or anything similar the URL is always something like example.com/signup and never example.com/signup.php or example.com/signup/index.php. Not having clean URLs was a major impediment previously that made me reluctant to use Blocs for anything but smaller websites where I know they would never do any real marketing.

Clean URLs are the way to go and if there is a small bug in the current setup, I am sure Norm will find a solution.

I did have this issue, but solved it. Can’t remember how, but I spent a lot of time working on my htaccess. My htaccess combines forcing https and www in one redirect, which is standard, but I’m sure there was something else. I have a canonical for my home page but don’t have it for any other pages.

However, I doubt anyone has the index.html indexed or in a sitemap, so Google shouldn’t know about it due to not being instructed to crawl it. Google hates duplication.

Gotcha but can’t you use a rewrite in the htaccess for this to hide the extension?

@cableguy30 In theory yes, but I’ve yet to find one that works with my other htaccess without jamming and rendering the site inoperable. Done properly it should also include a / after the page name in the browser and that seems to be missing with these htaccess adjustments.

I was also reading something a while back that advised against doing this for SEO purposes, while I gather it can confuse some browsers or cause problems with a CMS. Basically, it’s all a whole lot easier if Blocs just exports clean urls properly with no bugs and I’m pretty sure it’s just something minor that needs an adjustment.

@Brocky120 At the moment I’m in the painful process of rebuilding my web design site using Blocs. It’s mainly painful because there will likely be more than 100 pages, so it’s just a lot of work that I’m having to squeeze in alongside client jobs. Yesterday I updated the canonical metadata on each page, one by one, because I want to be certain the site performs really well in the rankings if I’m going to do this much work.

"Yesterday I updated the canonical metadata on each page, one by one." How do you do this?

Go to the page settings and click on add code. In the header area you add a line like this:

<link rel="canonical" href="https://example.com/about/"/>

The URL should be the full URL as you want Google to see the site. Google should then ignore other variants in its index, even if they come from an external link pointing to your site. Just make sure it matches what you have in the sitemap.

So this will get rid of the 301 redirect error on all my pages?

That’s strange if you are seeing a 301 error, because that would take you to an unknown page in theory. Are you sure it’s not a softer redirect error?

I don’t know if this metadata will get rid of the error, but it should tell Google to ignore any unwanted page versions.

I don’t get an error when viewing the site; it’s just that the sitechecker gave me about 40 warnings and I would like them gone.

Yeah that’s understandable. I just checked again and I have four red warnings.

  1. Multiple H1 tags due to creating different hero blocs for each breakpoint. That’s something I can fix hopefully.

  2. Title check length. I can fix that.

  3. It shows the index page is incorrectly configured, so they don’t know if my index page is example.com or example.com/index.html. That is something that either a 301 redirect or an adjusted canonical meta tag should resolve.

  4. Bot check: This is strange, but it says the page name is not the same on either Google or Yandex. It basically says I am cloaking, which is a “violation of Google’s Webmaster Guidelines because it provides users with different results than they expected.”

@Norm When I look at internal links I can see where Blocs is going wrong compared to Rapidweaver. In Blocs all my pages appear as example.com/about without the trailing slash, and this leads to a 301 code before it provides a 200 OK code. In Rapidweaver the internal pages all include that trailing slash, so you would have example.com/about/ and there is no 301.

When I check a Blocs site without the clean URL this 301 error goes away, so I officially declare this a BUG. It also mentions that I should sort out the canonical link though, which can be done via that metadata adjustment.

Hi @Flashman
Yeah, normally the trailing slash is needed as it tells the browser to look in the folder for either an index.html file or an index.php file. Most servers will default to look for index.html. Just be sure you don’t have both.
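On Apache, which file is served for a folder URL is controlled by the DirectoryIndex directive. A minimal sketch, assuming the host allows this directive in .htaccess:

```apache
# Files to try, in order, when a URL ends in a folder such as /about/
DirectoryIndex index.html index.php
```

With this order, index.html wins if both files exist in the same folder. The 301 from example.com/about to example.com/about/ described above is usually the same Apache module (mod_dir) adding the missing slash, which is why links that already include the trailing slash avoid the extra hop.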

And that canonical stuff is going to be read as a redirect, so Google probably won’t like it :wink:

Bill
BricsDesign

@Bill at the moment it looks like this is a bug with Blocs when selecting clean urls. I am pretty sure Norm could fix this.

If it goes to the right place in Blocs, those canonical metadata changes will be correct and shouldn’t represent any problem. I get the same canonical metadata warning on the Rapidweaver sites, but they don’t have the 301 errors.

So @norm needs to add the trailing slash, or is there a way we can do it ourselves so we can avoid the warnings?

If you are talking about auto-generated links in the navigation or in the sitemap, then yes, Norm should patch those :wink: