Automatically Downloading Images from a Web Page with Regular Expressions

Original URL: http://blog.csdn.net/yizhiduxiu11/archive/2010/09/13/5881442.aspx

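First fetch the page's HTML, then use a regular expression to extract the download address of each image, and finally download the images one by one.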

Code

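using System.Collections.ObjectModel;
using System.IO;
using System.Net;
using System.Text.RegularExpressions;

/// <summary>
/// Download images from a website.
/// Better to run this on a separate thread.
/// </summary>
private void DownloadImage()
{
    // Uri, Prefix, PicUriPrefix and folder are members defined elsewhere in the class.
    using (WebClient c = new WebClient())
    {
        // Get the HTML code
        string content = c.DownloadString(Uri);
        Collection<string> address = new Collection<string>();
        Collection<string> name = new Collection<string>();

        // Analyse the HTML code to get the image address (Uri) and name (Name) lists.
        // The pattern is cut off after "[^" in the source; the gap is marked with "…".
        ParseHtml(content, Prefix, PicUriPrefix + @"(?<Uri>[^""]*?)"">(?<Name>[^…", address, name);

        if (address.Count > 0 && name.Count > 0 && address.Count == name.Count)
        {
            if (!Directory.Exists(folder)) Directory.CreateDirectory(folder); // Create folder

            // Download images one by one; indexing keeps each address paired
            // with its own name even when an address appears more than once.
            for (int i = 0; i < address.Count; i++)
            {
                c.DownloadFile(address[i], Path.Combine(folder, name[i] + ".jpg"));
            }
        }
    }
}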

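/// <summary>
/// Parse HTML using regular expressions.
/// </summary>
/// <param name="content">HTML content</param>
/// <param name="prefix">Uri prefix</param>
/// <param name="expression">Regular expression</param>
/// <param name="address">Image addresses collection</param>
/// <param name="name">Image names collection</param>
private void ParseHtml(string content, string prefix, string expression, Collection<string> address, Collection<string> name)
{
    if (String.IsNullOrEmpty(content) || String.IsNullOrEmpty(expression) || address == null || name == null) return;

    // RightToLeft scans from the end of the content, so matches arrive in reverse page order.
    Regex re = new Regex(expression, RegexOptions.IgnoreCase | RegexOptions.Singleline | RegexOptions.RightToLeft);
    MatchCollection mc = re.Matches(content);
    if (mc == null || mc.Count == 0) return;

    foreach (Match m in mc)
    {
        // Each match contributes one address and one name, at the same index.
        address.Add(prefix + m.Groups["Uri"].Value);
        name.Add(m.Groups["Name"].Value);
    }
}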
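ParseHtml passes RegexOptions.RightToLeft, which makes the engine search from the end of the input, so the address and name lists are filled in reverse page order. The following is only a minimal sketch of that ordering effect on a toy input; the class and method names (MatchOrderDemo, Digits) are hypothetical and not part of the original code.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class MatchOrderDemo
{
    // Returns every digit in the input, in the order the regex engine finds them.
    public static List<string> Digits(string input, RegexOptions options)
    {
        var found = new List<string>();
        foreach (Match m in Regex.Matches(input, @"\d", options))
        {
            found.Add(m.Value);
        }
        return found;
    }
}
```

Calling MatchOrderDemo.Digits("a1b2c3", RegexOptions.None) yields 1, 2, 3, while passing RegexOptions.RightToLeft yields 3, 2, 1. If the images should be saved in page order, either drop the option or reverse the lists after parsing.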

The relevant part of the HTML looks like this:

HTML code

alt="" src="/docfile/dyn/12345678LANGCCCCDDDDEEEEEEEERT09">BMW3-series

alt="" src="/docfile/dyn/12345678LANGCCCCDDDDEEEEEEEEEE01">Toyota

alt="" src="/docfile/dyn/12345678LANGCCCCDDDDEEEEEEEEJY25">Polocross

alt="" src="/docfile/dyn/12345678LANGCCCCDDDDEEEEEEEEMO02">Golf4.5

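The goal is to use a regular expression to extract, from this HTML, each image's download address (for example, /docfile/dyn/12345678LANGCCCCDDDDEEEEEEEERT09) and its name (for example, BMW3-series).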

The regular expression is as follows:

Regular expression

@"/docfile/dyn/(?<Uri>[^""]*?)"">(?<Name>[^…"

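Note the use of named groups, written (?<GroupName>...), and the way quotes are matched: inside a C# verbatim string (@"..."), a literal double quote must be written as two double quotes. The Uri group starts right after dyn/ and runs up to the closing double quote; that double quote and a > follow, and then comes the Name group, which ends with … (the sentence is cut off in the source).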
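As a rough illustration of the named groups and the doubled-quote escaping, here is a minimal sketch run against HTML shaped like the sample above. Because the end of the original pattern is lost, the terminator for the Name group is an assumption: the name is taken to run up to the next < or line break. The class name NamedGroupDemo and the method Extract are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class NamedGroupDemo
{
    // In a verbatim string, "" stands for one literal double quote.
    // Assumption: the name ends at the next '<' or line break.
    public const string Pattern =
        @"/docfile/dyn/(?<Uri>[^""]*?)"">(?<Name>[^<\r\n]*)";

    // Returns (Uri, Name) pairs in page order.
    public static List<KeyValuePair<string, string>> Extract(string html)
    {
        var result = new List<KeyValuePair<string, string>>();
        foreach (Match m in Regex.Matches(html, Pattern, RegexOptions.IgnoreCase))
        {
            result.Add(new KeyValuePair<string, string>(
                m.Groups["Uri"].Value,
                m.Groups["Name"].Value.Trim()));
        }
        return result;
    }
}
```

For an input line such as alt="" src="/docfile/dyn/12345678LANGCCCCDDDDEEEEEEEEEE01">Toyota, Extract returns one pair whose key is 12345678LANGCCCCDDDDEEEEEEEEEE01 and whose value is Toyota. The key can then be appended to the site prefix to form the download address, as ParseHtml does.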

This article comes from a CSDN blog; please credit the source when reposting: http://blog.csdn.net/yizhiduxiu11/archive/2010/09/13/5881442.aspx
