Moz Q&A is closed.
After more than 13 years, and tens of thousands of questions, Moz Q&A closed on 12th December 2024. Whilst we’re not completely removing the content - many posts will still be possible to view - we have locked both new posts and new replies. More details here.
ScreamingFrog won't crawl my site.
-
Hey guys,
My site is Netspiren.dk and when I use a tool like Screaming Frog or Integrity, it only crawls my homepage and menu's - not product-pages.
Examples
A menu: http://www.netspiren.dk/pl/Helse-Kosttilskud-Blandingsolie_57699.aspx
A product: http://www.netspiren.dk/pi/All-Omega-3-6-9-180-kapsler_1412956_57699.aspxIs it because the products are being loaded in Javascript?
What's your recommendation?All best,
Fred. -
Hi,
Thank you for this question and the responses because we encountered the same issue; Screaming Frog was only crawling a handful of products out of hundreds, because of JS. We made significant changes to the redirect rules on our dev site, and we want to make sure that the changes will not cause any crawling errors before we deploy to the live site. Is there any way to disable JS just for the purpose of a Screaming Frog crawl?
Our dev site is: https://msc-nop.com
Our regular site is: https://medicalscrubscollection.com
Thanks in advance!
-
I'm not sure if this has been fixed already, and thank you for Dan for chiming in, but I was able to crawl around 700 URLs.
-
Cheers @Andy & @Patrick
Hi Fred,
I haven't performed an extensive check, but the SEO Spider crawls around 35 URLs with /pi/ in the string, which is presumably not all the products on the site
Patrick actually mentions the issue in one of his points above. Essentially it looks like the site uses JavaScript on category pages for products, example - http://www.netspiren.dk/pl/Helse-Homøopati-Allergica-Ron-serien_58721.aspx
If you disable JS in your browser, you'll see a blank page where the products were. Our tool doesn't execute JS, although Google is much smarter and often can.
However, I'll leave you to verify that -
Hope that helps!
Cheers
Dan
-
I have sent Dan from Screaming Frog a tweet for you Fred. I'm sure he will be along presently
-Andy
-
Hi there
It's crawling for me. Here are a list of reasons why ScreamingFrog won't crawl your site:
- The site is blocked by robots.txt. A count of pages blocked by robots.txt is shown in the crawl overview pane on top right hand site of the user interface. You can configure the SEO Spider to ignore robots.txt by going to the “Basic” tab under Configuration->Spider.
- The site behaves differently depending on User Agent. Try changing the User Agent under Configuration->User Agent.
- The site requires JavaScript. Try looking at the site in your browser with JavaScript disabled.
- The site requires Cookies. Can you view the site with cookies disabled in your browser? Licenced users can enable cookies by going to Configuration->Spider and ticking “Allow Cookies” in the “Advanced” tab.
- The ‘nofollow’ attribute is present on links not being crawled. There is an option in Configuration->Spider under the “Basic” tab to follow ‘nofollow’ links.
- The page has a page level ‘nofollow’ attribute. The could be set by either a meta robots tag or an X-Robots-Tag in the HTTP header. These can be seen in the “Directives” tab in the “Nofollow” filter.
- The website is using framesets. The SEO Spider does not crawl the frame src attribute.
- The Content-Type header did not indicate the page is html. This is shown in the Content column and should be either text/html or application/xhtml+xml.
Run through your settings and check and see if you may have turned something on inadvertently that you didn't mean to. One thing you can try, is goto Configuration > Spider and then goto the last option Ignore robots.txt. Click the checkbox and try running it again.
It could just be a slow connection on your end. Give it a few minutes and see if any of the above suggestions work.
Hope this helps! Good luck!
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Does Google ignore content styled with 'display:none'?
Do you know if an H1 within a div that has a 'display: none' style applied will still be crawled and evaluated by Google? We have that situation on this page on line 136: view-source:https://www.junk-king.com/services/items-we-take/foreclosure-cleanouts Of course we also have an H1 up at the top of the page and are concerned that the second one will cause interference with our SEO efforts. I've seen conflicting and inconclusive information on line - not sure. Thanks for any help.
Intermediate & Advanced SEO | | rastellop0 -
Google doesn't index image slideshow
Hi, My articles are indexed and images (full size) via a meta in the body also. But, the images in the slideshow are not indexed, have you any idea? A problem with the JS Example : http://www.parismatch.com/People/Television/Sport-a-la-tele-les-femmes-a-l-abordage-962989 Thank you in advance Julien
Intermediate & Advanced SEO | | Julien.Ferras0 -
Why Google isn't indexing my images?
Hello, on my fairly new website Worthminer.com I am noticing that Google is not indexing images from my sitemap. Already 560 images submitted and Google indexed only 3 of them. Altough there is more images indexed they are not indexing any new images, and I have no idea why. Posts, categories and other urls are indexing just fine, but images not. I am using Wordpress and for sitemaps Wordpress SEO by yoast. Am I missing something here? Why Google won't index my images? Thanks, I appreciate any help, David xv1GtwK.jpg
Intermediate & Advanced SEO | | Worthminer1 -
Help! The website ranks fine but one of my web pages simply won't rank on Google!!!
One of our web pages will not rank on Google. The website as a whole ranks fine except just one section...We have tested and it looks fine...Google can crawl the page no problem. There are no spurious redirects in place. The content is fine. There is no duplicate page content issue. The page has a dozen product images (photos) but the load time of the page is absolutely fine. We have the submitted the page via webmaster and its fine. It gets listed but then a few hours later disappears!!! The site has not been penalised as we get good rankings with other pages. Can anyone help? Know about this problem?
Intermediate & Advanced SEO | | CayenneRed890 -
Moving to a new site while keeping old site live
For reasons I won't get into here, I need to move most of my site to a new domain (DOMAIN B) while keeping every single current detail on the old domain (DOMAIN A) as it is. Meaning, there will be 2 live websites that have mostly the same content, but I want the content to appear to search engines as though it now belongs to DOMAIN B. Weird situation. I know. I've run around in circles trying to figure out the best course of action. What do you think is the best way of going about this? Do I simply point DOMAIN A's canonical tags to the copied content on DOMAIN B and call it good? Should I ask sites that link to DOMAIN A to change their links to DOMAIN B, or start fresh and cut my losses? Should I still file a change of address with GWT, even though I'm not going to 301 redirect anything?
Intermediate & Advanced SEO | | kdaniels0 -
Does Google Read URL's if they include a # tag? Re: SEO Value of Clean Url's
An ECWID rep stated in regards to an inquiry about how the ECWID url's are not customizable, that "an important thing is that it doesn't matter what these URLs look like, because search engines don't read anything after that # in URLs. " Example http://www.runningboards4less.com/general-motors#!/Classic-Pro-Series-Extruded-2/p/28043025/category=6593891 Basically all of this: #!/Classic-Pro-Series-Extruded-2/p/28043025/category=6593891 That is a snippet out of a conversation where ECWID said that dirty urls don't matter beyond a hashtag... Is that true? I haven't found any rule that Google or other search engines (Google is really the most important) don't index, read, or place value on the part of the url after a # tag.
Intermediate & Advanced SEO | | Atlanta-SMO0 -
Should I 'nofollow' links between my own sites?
We have five sites which are largely unrelated but for cross-promotional purpose our company wishes to cross link between all our sites, possibly in the footer. I have warned about potential consequences of cross-linking in this way and certainly don't want our sites to be viewed as some sort of 'link ring' if they all link to one another. Just wondering if linking between sites you own really is that much of an issue and whether we should 'nofollow' the links in order to prevent being slapped with any sort of penalty for cross-linking.
Intermediate & Advanced SEO | | simon_realbuzz0 -
My website hasn't been cached for over a month. Can anyone tell me why?
I have been working on an eCommerce site www.fuchia.co.uk. I have asked an earlier question about how to get it working and ranking and I took on board what people said (such as optimising product pages etc...) and I think i'm getting there. The problem I have now is that Google hasn't indexed my site in over a month and the homepage cache is 404'ing when I check it on Google. At the moment there is a problem with the site being live for both WWW and non-WWW versions, i have told google in Webmaster what preferred domain to use and will also be getting developers to do 301 to the preferred domain. Would this be the problem stopping Google properly indexing me? also I'm only having around 30 pages of 137 indexed from the last crawl. Can anyone tell me or suggest why my site hasn't been indexed in such a long time? Thanks
Intermediate & Advanced SEO | | SEOAndy0