Why does Google stubbornly keep indexing my http URLs instead of the https ones?
-
I moved everything to https in November, but plenty of pages are still indexed by Google as http instead of https, and I am wondering why.
Example: http://www.gomme-auto.it/pneumatici/barum correctly redirects permanently to https://www.gomme-auto.it/pneumatici/barum
Nevertheless, if you search for pneumatici barum: https://www.google.it/search?q=pneumatici+barum&oq=pneumatici+barum
The third organic result listed is still http.
Since we moved to https, the Google crawler has visited that page dozens of times, most recently two days ago, but it doesn't seem to care about updating the protocol in the Google index.
Does anyone know why?
My concern is that when I use APIs like SEMrush and Ahrefs I have to query twice, once for http and once for https. With around 65k URLs in total, that wastes a lot of my quota.
-
Thanks again Dirk! In the end I used Xenu's Link Sleuth and I am happy with the result.
-
Hi Massimiliano,
In Screaming Frog there is the option Bulk Export > All Inlinks, which generates the full list of all your internal links with both source and destination. In Excel you just have to put a filter on the "Destination" column to show only the URLs starting with "http://", and you get all the info you need. This will probably not solve the issue with the images; for those, the next solution below could be used.
The list can be quite long, depending on the total number of URLs on your site. An alternative would be to add a custom filter under Configuration > Custom that only includes pages containing "http://www.gomme-auto.it" or "http://blog.gomme-auto.it" in the source. In your case this wouldn't be very helpful yet, as all the pages on your site contain this URL in the JavaScript part. If you change the URLs in the JavaScript to https, this filter could be used to find references to non-https images.
Doing it manually is also an option: in the crawler's 'Internal' view, put "http://" in the search field to list all the http:// URLs. You have to select the http URLs one by one. For each URL you can select "Inlinks" at the bottom of the screen to see all the URLs linking to the http version. This works for both the HTML and the images.
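For a long export, the Excel filter on the "Destination" column can also be scripted. The following is only a minimal sketch in Python: it assumes the export is a CSV whose header row contains columns named "Source" and "Destination" (the real export may carry extra columns or a title row, so adjust the names as needed), and the function name `http_inlinks` is made up for illustration.

```python
import csv
import io
from collections import defaultdict


def http_inlinks(csv_text, source_col="Source", dest_col="Destination"):
    """Map each http:// destination to the sorted list of pages linking to it.

    The column names are assumptions about the export layout, not a
    documented format.
    """
    pages_by_target = defaultdict(set)
    for row in csv.DictReader(io.StringIO(csv_text)):
        dest = (row.get(dest_col) or "").strip()
        if dest.lower().startswith("http://"):
            pages_by_target[dest].add((row.get(source_col) or "").strip())
    return {dest: sorted(sources) for dest, sources in pages_by_target.items()}


# Tiny hand-written sample standing in for the exported file.
sample = """Source,Destination
https://www.gomme-auto.it/pneumatici/continental,http://www.gomme-auto.it/pneumatici/barum
https://www.gomme-auto.it/pneumatici/continental,https://www.gomme-auto.it/pneumatici/pneumatici-cinesi
"""

for target, sources in http_inlinks(sample).items():
    print(target)
    for page in sources:
        print("  linked from", page)
```

For a real file, read it with `open(path, newline="", encoding="utf-8").read()` and pass the text in. Grouping by destination gives, for each http URL, the pages that need editing.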
Hope this helps,
rgds
Dirk
-
Forgot to mention: yes, I checked the scheme of the SERP results for those pages. It is not just Google not displaying it; it really still has the http version indexed.
-
Hi DC,
In Screaming Frog I can see the old http links. They are usually links and images that were inserted manually in WordPress posts. I am more than eager to edit them; my problem is how to find all the pages containing them. In Screaming Frog I can see the links, but I don't see the referrer, i.e. the page that contains them. Is there a way to see that in Screaming Frog, or in some other crawling software?
-
Hi,
First of all, are you sure that Google didn't take the migration into account? I just did a quick check on other https sites. Example: when I search for "Google Analytics" in Google, the first 3 results all point to the Google Analytics site, but https is shown only for the 3rd result, even though all three are in https. So it's possible this is just a display issue rather than a real issue.
Second, I did a quick crawl of your site and I noticed that some pages still have links to the http version of your site (they are redirected, but it's better to keep your internal links clean, without redirections).
When I checked one of these pages (https://www.gomme-auto.it/pneumatici/pneumatici-cinesi) I noticed that it has some issues, as it seems to load elements which are not in https. Possibly there are others as well.
Example: /pneumatici/pneumatici-cinesi:1395 Mixed Content: The page at 'https://www.gomme-auto.it/pneumatici/pneumatici-cinesi' was loaded over HTTPS, but requested an insecure image 'http://www.gomme-auto.it/i/pneumatici-cinesi.jpg'. This content should also be served over HTTPS.
The page you mention as an example: the http version still receives two internal links, from https://www.gomme-auto.it/blog/pneumatici-barum-gli-economici-che-assicurano-ottime-prestazioni and https://www.gomme-auto.it/pneumatici/continental, with anchor texts 'pneumatici Barmum' & 'Barum'.
My guess is that Google reasons: if the owner of the site is not updating his internal links, I'm not going to update my index.
On all your pages there is a part of the source which contains calls to the http version. It's inside a script, so I'm not sure it's really important, but you could try to change it to https as well.
My advice would be to crawl your site with Screaming Frog, check where links to the http versions exist, and update these links to https (or use relative links, which is advised by Google: https://support.google.com/webmasters/answer/6073543?hl=en, see the part 'common pitfalls').
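The kind of reference that triggers this warning can also be spotted offline in a page's HTML. Below is a rough sketch in Python using only the standard library. The names `HttpReferenceFinder` and `find_http_references` are invented for this example, and the tag list is an assumption covering common cases, not everything a browser checks (for instance, URLs inside CSS or scripts are not inspected).

```python
from html.parser import HTMLParser

# Tags whose URL attribute makes the browser fetch a sub-resource.
# Note: "link" covers stylesheets but will also catch canonical tags.
RESOURCE_ATTRS = {"img": "src", "script": "src", "iframe": "src", "link": "href"}


class HttpReferenceFinder(HTMLParser):
    """Collect http:// sub-resources and http:// anchor links from HTML."""

    def __init__(self):
        super().__init__()
        self.insecure_resources = []  # (tag, url) pairs
        self.http_links = []          # href values of <a> tags

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        if tag == "a":
            href = attr_map.get("href") or ""
            if href.lower().startswith("http://"):
                self.http_links.append(href)
        elif tag in RESOURCE_ATTRS:
            url = attr_map.get(RESOURCE_ATTRS[tag]) or ""
            if url.lower().startswith("http://"):
                self.insecure_resources.append((tag, url))


def find_http_references(html):
    finder = HttpReferenceFinder()
    finder.feed(html)
    return finder.insecure_resources, finder.http_links


# Hand-written snippet standing in for a downloaded page.
sample_html = """
<img src="http://www.gomme-auto.it/i/pneumatici-cinesi.jpg">
<a href="http://www.gomme-auto.it/pneumatici/barum">Barum</a>
<a href="https://www.gomme-auto.it/pneumatici/continental">Continental</a>
"""

resources, links = find_http_references(sample_html)
print("insecure resources:", resources)
print("http links:", links)
```

Sub-resources and plain links are kept apart because only the former cause mixed-content warnings, while the latter are the internal links that still point at redirected http URLs.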
rgds
Dirk
-
Mhhh, you are right, theoretically it could be the crawl budget. But if that were the case I should see it in the logs: crawler visits to that page would be missing. Instead the crawler is happily visiting it.
By the way, how would you "force" the crawler to parse these pages?
I am going to check the sitemap now to remove that port number and try to split it. Thanks.
-
Darn it, you are right, we added a new site, not a change of address, sorry about that. Apparently my coffee is no longer effective!
-
As far as I know, a change of address doesn't work for http to https: the protocol is not accepted when you do a change of address. And somewhere I read Google itself saying that when moving to https you should not do a change of address.
Instead they suggest adding a new site for the https version in GWT, which I did, and in fact the traffic slowly transitioned from the http site to the https site in GWT in the weeks following the move.
-
Are you sure? On https://support.google.com/webmasters/answer/6033080?hl=en&ref_topic=6033084 it says: "No need to submit a change of address if you are only moving your site from HTTP to HTTPS."
I don't think you are given the option to select the same domain for a change of address in GWT.
-
Looks like you are doing everything right (set up 301 redirects, updated all links on the site, updated canonical URLs). You just need to force the crawlers to parse those pages more. Perhaps the crawler is hitting its budget before it gets to recrawl all of your new URLs?
You should also update your sitemap as it contains a bunch of links that look like: https://www.gomme-auto.it:443/pneumatici/estivi/pirelli/cinturato-p1-verde/145/65/15/h/72
I recommend creating several sitemaps for different sections of the site and seeing how they are indexed via GWT.
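Both sitemap suggestions can be prototyped in a few lines. This is only a sketch in Python under assumptions: the helper names `strip_default_port` and `group_by_section` are made up here, and "section" is taken to mean the leading path segment(s), which may not be the most useful split for every site.

```python
from collections import defaultdict
from urllib.parse import urlsplit, urlunsplit

DEFAULT_PORTS = {"http": 80, "https": 443}


def strip_default_port(url):
    """Drop an explicit port when it is the default one for the scheme."""
    parts = urlsplit(url)
    if parts.port is not None and parts.port == DEFAULT_PORTS.get(parts.scheme):
        parts = parts._replace(netloc=parts.hostname)
    return urlunsplit(parts)


def group_by_section(urls, depth=1):
    """Group cleaned URLs by their first `depth` path segments."""
    groups = defaultdict(list)
    for url in urls:
        clean = strip_default_port(url)
        segments = [s for s in urlsplit(clean).path.split("/") if s]
        key = "/".join(segments[:depth]) or "(root)"
        groups[key].append(clean)
    return dict(groups)


urls = [
    "https://www.gomme-auto.it:443/pneumatici/estivi/pirelli/cinturato-p1-verde/145/65/15/h/72",
    "https://www.gomme-auto.it/blog/pneumatici-barum-gli-economici-che-assicurano-ottime-prestazioni",
]

for section, members in group_by_section(urls).items():
    print(section, len(members))
```

Each group can then be written out as its own sitemap file and submitted separately, so indexing can be compared per section. A larger `depth` gives a finer split.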
-
Did you do a change of address in Google Webmaster Tools? http and https are considered different URLs, and you will have to do a change of address if you switched to a full https site.