Moz Q&A is closed.
After more than 13 years, and tens of thousands of questions, Moz Q&A closed on 12th December 2024. Whilst we’re not completely removing the content - many posts will still be possible to view - we have locked both new posts and new replies. More details here.
Can you use Screaming Frog to find all instances of relative or absolute linking?
-
My client wants to pull every instance of an absolute URL on their site so that they can update them for an upcoming migration to HTTPS (the majority of the site uses relative linking). Is there a way to use the extraction tool in Screaming Frog to crawl one page at a time and extract every occurrence of _href="http://" _?
I have gone back and forth between using an x-path extractor as well as a regex and have had no luck with either.
Ex. X-path: //*[starts-with(@href, “http://”)][1]
Ex. Regex: href=\”//
-
This only works if you have downloaded all the HTML files to your local computer. That said, it works quite well! I am betting this is a database driven site and so would not work in the same way.
-
Regex: href=("|'|)http:(?:/{1,3}|[a-z0-9%])|[a-z0-9.-]+.
This allows for your link to have the " or ' or nothing between the = and the http If you have any other TLDs you can just keep expanding on the |
I modified this from a posting in github https://gist.github.com/gruber/8891611
You can play with tools like http://regexpal.com/ to test your regexp against example text
I assumed you would want the full URL and that was the issue you were running into.
As another solution why not just fix the https in the main navigation etc, then once you get the staging/testing site setup, run ScreamingFrog on that site and find all the 301 redirects or 404s and then use that report to find all the URLs to fix.
I would also ping ScreamingFrog - this is not the first time they have been asked this question. They may have a better regexp and/or solution vs what I have suggested.
-
Depending on how you've coded everything you could try to setup a Custom Search under Configuration. This will scan the HTML of the page so if the coding was consistent you could put something like href="http://www.yourdomain.com" as the string it's looking for and in the Custom tab on the resulting pages it'll show you all the ones that match the string.
That's the only way I can think of to get Screaming Frog to pull it but looking forward to anyone else's thoughts.
-
If you have access to all the website's files, you could try finding all instances in the directory using something like Notepad++. Could even use find and replace.
This is how I tend to locate those one-liners among hundreds of files.
Good luck!
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
How can I stop a tracking link from being indexed while still passing link equity?
I have a marketing campaign landing page and it uses a tracking URL to track clicks. The tracking links look something like this: http://this-is-the-origin-url.com/clkn/http/destination-url.com/ The problem is that Google is indexing these links as pages in the SERPs. Of course when they get indexed and then clicked, they show a 400 error because the /clkn/ link doesn't represent an actual page with content on it. The tracking link is set up to instantly 301 redirect to http://destination-url.com. Right now my dev team has blocked these links from crawlers by adding Disallow: /clkn/ in the robots.txt file, however, this blocks the flow of link equity to the destination page. How can I stop these links from being indexed without blocking the flow of link equity to the destination URL?
Technical SEO | | UnbounceVan0 -
Can you use multiple videos without sacrificing load times?
We're using a lot of videos on our new website (www.4com.co.uk), but our immediate discovery has been that this has a negative impact on load times. We use a third party (Vidyard) to host our videos but we also tried YouTube and didn't see any difference. I was wondering if there's a way of using multiple videos without seeing this load speed issue or whether we just need to go with a different approach. Thanks all, appreciate any guidance! Matt
Technical SEO | | MattWatts1 -
What punctuation can you use in meta tags? Are there any Google does not like?
So I know you can use dashes and | in meta tags, but can anyone tell me what other punctuation you can use? Also, it'd be great to know what punctuation you can't use. Thanks!
Technical SEO | | Trevorneo1 -
301 redirect relative or absolute path?
Hello everyone, Recently we've changed the URL structure on our website, and of course we had to 301 redirect the old urls to the coresponding new ones. The way the technical guys did this is: "http://www.domain.com/old-url.html" 301 redirect to "/new-url.html"
Technical SEO | | Silviu
meaning as a relative redirect path, not an absolute one like this:
"http://www.domain.com/old-url.html" 301 redirect to "http://www.domain.com/new-url.html" This happened for few thousands urls, and the fact is the organic traffic dropped for those pages after this change. (no other changes were made on these pages and the new urls are as seo friendly as possible, A grade on On-Page Grader). The question is: does the relative redirect negatively affects seo, or it counts the same as an absolute path redirect? Thanks,
S.0 -
Links from the same server has value or not
Hi Guys, Sometime ago one of the SEO experts said to me if I get links from the same IP address, Google doesn't count them as with much value. For an example, I am a web devleoper and I host all my clients websites on one server and link them back to me. Im wondering whether those links have any value when it comes to seo or should I consider getting different hosting providers? Regards Uds
Technical SEO | | Uds0 -
Does anyone use buzzfeed to creat links traffic and increase brand
Hi i would like to know if anyone uses http://www.buzzfeed.com to create links, gain traffic and increase brand awareness. I have signed up for an account but cannot get it to work and would like some help. I can get my content on there but cannot manage to get the links to work I signed up for this account a while back and a friend shown me how to use it but i have forgotten. here is my page http://www.buzzfeed.com/lifestylemagazine some links work and some links do not. what i am trying to do is to publish stories from my site as well as other sites and have the link included where you press the title and it goes to the site any help would be great
Technical SEO | | ClaireH-1848860 -
How Can I Block Archive Pages in Blogger when I am not using classic/default template
Hi, I am trying to block all the archive pages of my blog as Google is indexing them. This could lead to duplicate content issue. I am not using default blogger theme or classic theme and therefore, I cannot use this code therein: Please suggest me how I can instruct Google not to index archive pages of my blog? Looking for quick response.
Technical SEO | | SoftzSolutions0 -
How to find a specific link on my website (currently causing redirects)
Hi everyone, I've used crawlers like Xenu to find broken links before, and I love these tools. What I can't figure out is how to find specific pieces of code within my site. For example, Webmaster Tools tells me there are still links to old pages somewhere on my website but I just can't find them. Do you know of a crawler that can search for a specific link within the html? Thanks in advance, Josh
Technical SEO | | dreadmichael0