I’ve had a wget running for a couple of days. Judging from the file count it has downloaded so far, I suspect it may be crawling domains outside what I intended, despite the limits I specified on my command line (below). Is there something wrong with my command line that would cause wget to look outside the --domains I specified?
Here’s my wget command:
wget --mirror -l12 --page-requisites --convert-link --no-clobber --adjust-extension -e robots=off --exclude-domains www.a.org,www.b.org,www.c.org,www.d.org --domains=mysitename.org http://mysitename.org/subdir/index.php/Main_Page
Links to a.org, b.org, c.org and d.org exist on mysitename.org, and hopefully --exclude-domains will stop wget from crawling them.
It’s the -l12 I’m worried about, in case it’s forcing an external crawl. Also, I have not added --no-parent because I want it to crawl subdir2 etc.
I don’t want to stop my wget process yet, but am concerned that it might be downloading more than I’d want.
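In the meantime, one way to check without stopping the process: wget --mirror saves each host’s files under a top-level directory named after that host, so listing those directories shows which domains have actually been fetched. A minimal sketch (the /tmp/wget-check tree below is a mock stand-in for the real download directory, which would be wherever wget was started):

```shell
# Mock download tree standing in for a real wget --mirror output directory.
# In a real check, just cd into the directory where wget was launched.
mkdir -p /tmp/wget-check/mysitename.org/subdir
mkdir -p /tmp/wget-check/www.a.org
cd /tmp/wget-check

# wget creates one top-level directory per host it has touched,
# so this lists every domain crawled so far:
ls -d */
```

If anything other than mysitename.org shows up here, wget has strayed outside the intended domain.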
I have seen examples of --domains both with and without an = sign, so I trust my syntax is OK.
That’s my question: is there anything in my wget command line that may be triggering a crawl of external domains (e.g. the -l12)?