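Automatically downloading the images in a web page with regular expressions
Original article: http://blog.csdn.net/yizhiduxiu11/archive/2010/09/13/5881442.aspx
First fetch the page's HTML, then use a regular expression to extract the download address of each image, and finally download the images one by one.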
Code
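using System;
using System.Collections.ObjectModel;
using System.IO;
using System.Net;
using System.Text.RegularExpressions;

// Uri, Prefix, PicUriPrefix and folder are members of the enclosing class;
// their declarations are not shown in this post.

/// <summary>
/// Download images from a website.
/// Better to run this on a worker thread.
/// </summary>
private void DownloadImage()
{
    using (WebClient c = new WebClient())
    {
        // Get the HTML code
        string content = c.DownloadString(Uri);
        Collection<string> address = new Collection<string>();
        Collection<string> name = new Collection<string>();
        // Analyse the HTML code to get the image address (Uri) and name (Name) lists.
        // NOTE: the Name group is assumed to end at the next '<'.
        ParseHtml(content, Prefix, PicUriPrefix + @"(?<Uri>[^""]*?)"">(?<Name>[^<]*?)<", address, name);
        if (address.Count > 0 && address.Count == name.Count)
        {
            if (!Directory.Exists(folder)) Directory.CreateDirectory(folder); // Create folder
            for (int i = 0; i < address.Count; i++)
            {
                // Download the images one by one. Indexing both lists directly
                // keeps each name paired with its address even if an address repeats.
                c.DownloadFile(address[i], Path.Combine(folder, name[i] + ".jpg"));
            }
        }
    }
}

/// <summary>
/// Parse HTML using regular expressions.
/// </summary>
/// <param name="content">HTML content</param>
/// <param name="prefix">Uri prefix</param>
/// <param name="expression">Regular expression</param>
/// <param name="address">Image addresses collection</param>
/// <param name="name">Image names collection</param>
private void ParseHtml(string content, string prefix, string expression, Collection<string> address, Collection<string> name)
{
    if (String.IsNullOrEmpty(expression) || address == null || name == null) return;
    // RightToLeft scans from the end of the input, so matches come back last-to-first.
    Regex re = new Regex(expression, RegexOptions.IgnoreCase | RegexOptions.Singleline | RegexOptions.RightToLeft);
    MatchCollection mc = re.Matches(content);
    if (mc.Count == 0) return;
    foreach (Match m in mc)
    {
        address.Add(prefix + m.Groups["Uri"].Value);
        name.Add(m.Groups["Name"].Value);
    }
}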
The relevant part of the HTML looks like this:
HTML code
<img alt="" src="/docfile/dyn/12345678LANGCCCCDDDDEEEEEEEERT09">BMW3-series
<img alt="" src="/docfile/dyn/12345678LANGCCCCDDDDEEEEEEEEEE01">Toyota
<img alt="" src="/docfile/dyn/12345678LANGCCCCDDDDEEEEEEEEJY25">Polocross
<img alt="" src="/docfile/dyn/12345678LANGCCCCDDDDEEEEEEEEMO02">Golf4.5
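The goal is to use a regular expression to extract from this the image download address (for example: /docfile/dyn/12345678LANGCCCCDDDDEEEEEEEERT09) and the image name (for example: BMW3-series).
The regular expression is as follows:
Regular expression
@"/docfile/dyn/(?<Uri>[^""]*?)"">(?<Name>[^<]*?)<"
Note the named-group syntax (?<name>...*?) and how quotes are matched: inside a C# verbatim string (@"..."), a literal double quote must be written as two double quotes. The Uri group starts right after dyn/ and runs up to the closing double quote; it is followed by the double quote and >, and then comes the Name group, which presumably ends at the next <. Because Uri captures only the part after dyn/, ParseHtml prepends the prefix to rebuild the full address.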
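The pattern described above can be tried on its own, without any network access. The following is only a minimal sketch: the class name ImageLinkParser, the Parse method, and the <br> tag after each name in the sample markup are assumptions made for illustration, and the Name group is assumed to stop at the next '<'. RegexOptions.RightToLeft is left out here so that matches come back in document order.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class ImageLinkParser
{
    // In a verbatim string, "" stands for one literal double quote.
    // Assumption: the Name group ends at the next '<'.
    private static readonly Regex ImagePattern = new Regex(
        @"/docfile/dyn/(?<Uri>[^""]*?)"">(?<Name>[^<]*?)<",
        RegexOptions.IgnoreCase);

    // Returns (address, name) pairs in document order.
    public static List<KeyValuePair<string, string>> Parse(string html, string prefix)
    {
        var result = new List<KeyValuePair<string, string>>();
        foreach (Match m in ImagePattern.Matches(html))
        {
            result.Add(new KeyValuePair<string, string>(
                prefix + m.Groups["Uri"].Value,   // rebuild the full address
                m.Groups["Name"].Value.Trim()));  // image name shown after the tag
        }
        return result;
    }
}

public static class Program
{
    public static void Main()
    {
        // Hypothetical markup: the <br> after each name is an assumption.
        string html =
            "<img alt=\"\" src=\"/docfile/dyn/12345678LANGCCCCDDDDEEEEEEEERT09\">BMW3-series<br>" +
            "<img alt=\"\" src=\"/docfile/dyn/12345678LANGCCCCDDDDEEEEEEEEEE01\">Toyota<br>";

        foreach (var pair in ImageLinkParser.Parse(html, "/docfile/dyn/"))
        {
            Console.WriteLine(pair.Key + " -> " + pair.Value);
        }
    }
}
```

Returning the address and the name together as one pair, rather than filling two parallel collections, means the two values cannot drift out of step when the download loop later turns each name into a file name.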