<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>等待喝彩_OnEcho &#187; Scraping</title>
	<atom:link href="http://www.onecho.com/tag/%e9%87%87%e9%9b%86/feed" rel="self" type="application/rss+xml" />
	<link>http://www.onecho.com</link>
	<description>http://www.onecho.com  Revelations of the Echo</description>
	<lastBuildDate>Fri, 16 Jul 2010 14:18:16 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.8.5</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>Some handy functions from my recent PHP scraping work</title>
		<link>http://www.onecho.com/2008-12-04/441.html</link>
		<comments>http://www.onecho.com/2008-12-04/441.html#comments</comments>
		<pubDate>Wed, 03 Dec 2008 16:09:57 +0000</pubDate>
		<dc:creator>Kenami</dc:creator>
				<category><![CDATA[PHP]]></category>
		<category><![CDATA[Scraping]]></category>

		<guid isPermaLink="false">http://www.onecho.com/2008-12-04/441.html</guid>
		<description><![CDATA[Some handy functions from my recent PHP scraping work]]></description>
			<content:encoded><![CDATA[
<div class="wp_syntax"><div class="code"><pre class="php" style="font-family:monospace;"><span style="color: #666666; font-style: italic;">// Get the URL of the current script</span>
<span style="color: #000000; font-weight: bold;">function</span> get_php_url<span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span><span style="color: #009900;">&#123;</span>
<span style="color: #b1b100;">if</span><span style="color: #009900;">&#40;</span><span style="color: #339933;">!</span><span style="color: #990000;">empty</span><span style="color: #009900;">&#40;</span><span style="color: #000088;">$_SERVER</span><span style="color: #009900;">&#91;</span><span style="color: #0000ff;">&quot;REQUEST_URI&quot;</span><span style="color: #009900;">&#93;</span><span style="color: #009900;">&#41;</span><span style="color: #009900;">&#41;</span><span style="color: #009900;">&#123;</span>
<span style="color: #000088;">$scriptName</span> <span style="color: #339933;">=</span> <span style="color: #000088;">$_SERVER</span><span style="color: #009900;">&#91;</span><span style="color: #0000ff;">&quot;REQUEST_URI&quot;</span><span style="color: #009900;">&#93;</span><span style="color: #339933;">;</span>
<span style="color: #000088;">$nowurl</span> <span style="color: #339933;">=</span> <span style="color: #000088;">$scriptName</span><span style="color: #339933;">;</span>
<span style="color: #009900;">&#125;</span><span style="color: #b1b100;">else</span><span style="color: #009900;">&#123;</span>
<span style="color: #000088;">$scriptName</span> <span style="color: #339933;">=</span> <span style="color: #000088;">$_SERVER</span><span style="color: #009900;">&#91;</span><span style="color: #0000ff;">&quot;PHP_SELF&quot;</span><span style="color: #009900;">&#93;</span><span style="color: #339933;">;</span>
<span style="color: #b1b100;">if</span><span style="color: #009900;">&#40;</span><span style="color: #990000;">empty</span><span style="color: #009900;">&#40;</span><span style="color: #000088;">$_SERVER</span><span style="color: #009900;">&#91;</span><span style="color: #0000ff;">&quot;QUERY_STRING&quot;</span><span style="color: #009900;">&#93;</span><span style="color: #009900;">&#41;</span><span style="color: #009900;">&#41;</span> <span style="color: #000088;">$nowurl</span> <span style="color: #339933;">=</span> <span style="color: #000088;">$scriptName</span><span style="color: #339933;">;</span>
<span style="color: #b1b100;">else</span> <span style="color: #000088;">$nowurl</span> <span style="color: #339933;">=</span> <span style="color: #000088;">$scriptName</span><span style="color: #339933;">.</span><span style="color: #0000ff;">&quot;?&quot;</span><span style="color: #339933;">.</span><span style="color: #000088;">$_SERVER</span><span style="color: #009900;">&#91;</span><span style="color: #0000ff;">&quot;QUERY_STRING&quot;</span><span style="color: #009900;">&#93;</span><span style="color: #339933;">;</span>
<span style="color: #009900;">&#125;</span>
<span style="color: #b1b100;">return</span> <span style="color: #000088;">$nowurl</span><span style="color: #339933;">;</span>
<span style="color: #009900;">&#125;</span>
<span style="color: #666666; font-style: italic;">// Convert full-width digits to half-width digits</span>
<span style="color: #000000; font-weight: bold;">function</span> GetAlabNum<span style="color: #009900;">&#40;</span><span style="color: #000088;">$fnum</span><span style="color: #009900;">&#41;</span><span style="color: #009900;">&#123;</span>
<span style="color: #000088;">$nums</span> <span style="color: #339933;">=</span> <span style="color: #990000;">array</span><span style="color: #009900;">&#40;</span><span style="color: #0000ff;">&quot;０&quot;</span><span style="color: #339933;">,</span><span style="color: #0000ff;">&quot;１&quot;</span><span style="color: #339933;">,</span><span style="color: #0000ff;">&quot;２&quot;</span><span style="color: #339933;">,</span><span style="color: #0000ff;">&quot;３&quot;</span><span style="color: #339933;">,</span><span style="color: #0000ff;">&quot;４&quot;</span><span style="color: #339933;">,</span><span style="color: #0000ff;">&quot;５&quot;</span><span style="color: #339933;">,</span><span style="color: #0000ff;">&quot;６&quot;</span><span style="color: #339933;">,</span><span style="color: #0000ff;">&quot;７&quot;</span><span style="color: #339933;">,</span><span style="color: #0000ff;">&quot;８&quot;</span><span style="color: #339933;">,</span><span style="color: #0000ff;">&quot;９&quot;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
<span style="color: #000088;">$fnums</span> <span style="color: #339933;">=</span> <span style="color: #0000ff;">&quot;0123456789&quot;</span><span style="color: #339933;">;</span>
<span style="color: #b1b100;">for</span><span style="color: #009900;">&#40;</span><span style="color: #000088;">$i</span><span style="color: #339933;">=</span><span style="color: #cc66cc;">0</span><span style="color: #339933;">;</span><span style="color: #000088;">$i</span><span style="color: #339933;">&lt;=</span><span style="color: #cc66cc;">9</span><span style="color: #339933;">;</span><span style="color: #000088;">$i</span><span style="color: #339933;">++</span><span style="color: #009900;">&#41;</span> <span style="color: #000088;">$fnum</span> <span style="color: #339933;">=</span> <span style="color: #990000;">str_replace</span><span style="color: #009900;">&#40;</span><span style="color: #000088;">$nums</span><span style="color: #009900;">&#91;</span><span style="color: #000088;">$i</span><span style="color: #009900;">&#93;</span><span style="color: #339933;">,</span><span style="color: #000088;">$fnums</span><span style="color: #009900;">&#91;</span><span style="color: #000088;">$i</span><span style="color: #009900;">&#93;</span><span style="color: #339933;">,</span><span style="color: #000088;">$fnum</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
<span style="color: #000088;">$fnum</span> <span style="color: #339933;">=</span> <span style="color: #990000;">preg_replace</span><span style="color: #009900;">&#40;</span><span style="color: #0000ff;">&quot;/[^0-9\.]|^0{1,}/&quot;</span><span style="color: #339933;">,</span><span style="color: #0000ff;">&quot;&quot;</span><span style="color: #339933;">,</span><span style="color: #000088;">$fnum</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
<span style="color: #b1b100;">if</span><span style="color: #009900;">&#40;</span><span style="color: #000088;">$fnum</span><span style="color: #339933;">==</span><span style="color: #0000ff;">&quot;&quot;</span><span style="color: #009900;">&#41;</span> <span style="color: #000088;">$fnum</span><span style="color: #339933;">=</span><span style="color: #cc66cc;">0</span><span style="color: #339933;">;</span>
<span style="color: #b1b100;">return</span> <span style="color: #000088;">$fnum</span><span style="color: #339933;">;</span>
<span style="color: #009900;">&#125;</span>
<span style="color: #666666; font-style: italic;">// Convert plain text to HTML: escape angle brackets and turn line breaks into &lt;br/&gt;</span>
<span style="color: #000000; font-weight: bold;">function</span> Text2Html<span style="color: #009900;">&#40;</span><span style="color: #000088;">$txt</span><span style="color: #009900;">&#41;</span><span style="color: #009900;">&#123;</span>
<span style="color: #000088;">$txt</span> <span style="color: #339933;">=</span> <span style="color: #990000;">str_replace</span><span style="color: #009900;">&#40;</span><span style="color: #0000ff;">&quot;  &quot;</span><span style="color: #339933;">,</span><span style="color: #0000ff;">&quot;　&quot;</span><span style="color: #339933;">,</span><span style="color: #000088;">$txt</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
<span style="color: #000088;">$txt</span> <span style="color: #339933;">=</span> <span style="color: #990000;">str_replace</span><span style="color: #009900;">&#40;</span><span style="color: #0000ff;">&quot;&lt;&quot;</span><span style="color: #339933;">,</span><span style="color: #0000ff;">&quot;&amp;lt;&quot;</span><span style="color: #339933;">,</span><span style="color: #000088;">$txt</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
<span style="color: #000088;">$txt</span> <span style="color: #339933;">=</span> <span style="color: #990000;">str_replace</span><span style="color: #009900;">&#40;</span><span style="color: #0000ff;">&quot;&gt;&quot;</span><span style="color: #339933;">,</span><span style="color: #0000ff;">&quot;&amp;gt;&quot;</span><span style="color: #339933;">,</span><span style="color: #000088;">$txt</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
<span style="color: #000088;">$txt</span> <span style="color: #339933;">=</span> <span style="color: #990000;">preg_replace</span><span style="color: #009900;">&#40;</span><span style="color: #0000ff;">&quot;/[<span style="color: #000099; font-weight: bold;">\r</span><span style="color: #000099; font-weight: bold;">\n</span>]{1,}/isU&quot;</span><span style="color: #339933;">,</span><span style="color: #0000ff;">&quot;&lt;br/&gt;<span style="color: #000099; font-weight: bold;">\r</span><span style="color: #000099; font-weight: bold;">\n</span>&quot;</span><span style="color: #339933;">,</span><span style="color: #000088;">$txt</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
<span style="color: #b1b100;">return</span> <span style="color: #000088;">$txt</span><span style="color: #339933;">;</span>
<span style="color: #009900;">&#125;</span>
&nbsp;
<span style="color: #666666; font-style: italic;">// Neutralize HTML tags by escaping angle brackets</span>
<span style="color: #000000; font-weight: bold;">function</span> ClearHtml<span style="color: #009900;">&#40;</span><span style="color: #000088;">$str</span><span style="color: #009900;">&#41;</span><span style="color: #009900;">&#123;</span>
<span style="color: #000088;">$str</span> <span style="color: #339933;">=</span> <span style="color: #990000;">str_replace</span><span style="color: #009900;">&#40;</span><span style="color: #0000ff;">'&lt;'</span><span style="color: #339933;">,</span><span style="color: #0000ff;">'&amp;lt;'</span><span style="color: #339933;">,</span><span style="color: #000088;">$str</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
<span style="color: #000088;">$str</span> <span style="color: #339933;">=</span> <span style="color: #990000;">str_replace</span><span style="color: #009900;">&#40;</span><span style="color: #0000ff;">'&gt;'</span><span style="color: #339933;">,</span><span style="color: #0000ff;">'&amp;gt;'</span><span style="color: #339933;">,</span><span style="color: #000088;">$str</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
<span style="color: #b1b100;">return</span> <span style="color: #000088;">$str</span><span style="color: #339933;">;</span>
<span style="color: #009900;">&#125;</span>
<span style="color: #666666; font-style: italic;">// Convert relative paths to absolute paths</span>
<span style="color: #000000; font-weight: bold;">function</span> relative_to_absolute<span style="color: #009900;">&#40;</span><span style="color: #000088;">$content</span><span style="color: #339933;">,</span> <span style="color: #000088;">$feed_url</span><span style="color: #009900;">&#41;</span> <span style="color: #009900;">&#123;</span>
<span style="color: #990000;">preg_match</span><span style="color: #009900;">&#40;</span><span style="color: #0000ff;">'/(http|https|ftp):\/\//'</span><span style="color: #339933;">,</span> <span style="color: #000088;">$feed_url</span><span style="color: #339933;">,</span> <span style="color: #000088;">$protocol</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
<span style="color: #000088;">$server_url</span> <span style="color: #339933;">=</span> <span style="color: #990000;">preg_replace</span><span style="color: #009900;">&#40;</span><span style="color: #0000ff;">&quot;/(http|https|ftp|news):\/\//&quot;</span><span style="color: #339933;">,</span> <span style="color: #0000ff;">&quot;&quot;</span><span style="color: #339933;">,</span> <span style="color: #000088;">$feed_url</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
<span style="color: #000088;">$server_url</span> <span style="color: #339933;">=</span> <span style="color: #990000;">preg_replace</span><span style="color: #009900;">&#40;</span><span style="color: #0000ff;">&quot;/\/.*/&quot;</span><span style="color: #339933;">,</span> <span style="color: #0000ff;">&quot;&quot;</span><span style="color: #339933;">,</span> <span style="color: #000088;">$server_url</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
&nbsp;
<span style="color: #b1b100;">if</span> <span style="color: #009900;">&#40;</span><span style="color: #000088;">$server_url</span> <span style="color: #339933;">==</span> <span style="color: #0000ff;">''</span><span style="color: #009900;">&#41;</span> <span style="color: #009900;">&#123;</span>
<span style="color: #b1b100;">return</span> <span style="color: #000088;">$content</span><span style="color: #339933;">;</span>
<span style="color: #009900;">&#125;</span>
&nbsp;
<span style="color: #b1b100;">if</span> <span style="color: #009900;">&#40;</span><span style="color: #990000;">isset</span><span style="color: #009900;">&#40;</span><span style="color: #000088;">$protocol</span><span style="color: #009900;">&#91;</span><span style="color: #cc66cc;">0</span><span style="color: #009900;">&#93;</span><span style="color: #009900;">&#41;</span><span style="color: #009900;">&#41;</span> <span style="color: #009900;">&#123;</span>
<span style="color: #000088;">$new_content</span> <span style="color: #339933;">=</span> <span style="color: #990000;">preg_replace</span><span style="color: #009900;">&#40;</span><span style="color: #0000ff;">'/href=&quot;\//'</span><span style="color: #339933;">,</span> <span style="color: #0000ff;">'href=&quot;'</span><span style="color: #339933;">.</span><span style="color: #000088;">$protocol</span><span style="color: #009900;">&#91;</span><span style="color: #cc66cc;">0</span><span style="color: #009900;">&#93;</span><span style="color: #339933;">.</span><span style="color: #000088;">$server_url</span><span style="color: #339933;">.</span><span style="color: #0000ff;">'/'</span><span style="color: #339933;">,</span> <span style="color: #000088;">$content</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
<span style="color: #000088;">$new_content</span> <span style="color: #339933;">=</span> <span style="color: #990000;">preg_replace</span><span style="color: #009900;">&#40;</span><span style="color: #0000ff;">'/src=&quot;\//'</span><span style="color: #339933;">,</span> <span style="color: #0000ff;">'src=&quot;'</span><span style="color: #339933;">.</span><span style="color: #000088;">$protocol</span><span style="color: #009900;">&#91;</span><span style="color: #cc66cc;">0</span><span style="color: #009900;">&#93;</span><span style="color: #339933;">.</span><span style="color: #000088;">$server_url</span><span style="color: #339933;">.</span><span style="color: #0000ff;">'/'</span><span style="color: #339933;">,</span> <span style="color: #000088;">$new_content</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
<span style="color: #009900;">&#125;</span> <span style="color: #b1b100;">else</span> <span style="color: #009900;">&#123;</span>
<span style="color: #000088;">$new_content</span> <span style="color: #339933;">=</span> <span style="color: #000088;">$content</span><span style="color: #339933;">;</span>
<span style="color: #009900;">&#125;</span>
<span style="color: #b1b100;">return</span> <span style="color: #000088;">$new_content</span><span style="color: #339933;">;</span>
<span style="color: #009900;">&#125;</span>
<span style="color: #666666; font-style: italic;">// Extract all links</span>
<span style="color: #000000; font-weight: bold;">function</span> get_all_url<span style="color: #009900;">&#40;</span><span style="color: #000088;">$code</span><span style="color: #009900;">&#41;</span><span style="color: #009900;">&#123;</span>
<span style="color: #990000;">preg_match_all</span><span style="color: #009900;">&#40;</span><span style="color: #0000ff;">'/&lt;a\s+href=[&quot;|\']?([^&gt;&quot;\' ]+)[&quot;|\']?\s*[^&gt;]*&gt;([^&gt;]+)&lt;\/a&gt;/i'</span><span style="color: #339933;">,</span><span style="color: #000088;">$code</span><span style="color: #339933;">,</span><span style="color: #000088;">$arr</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
<span style="color: #b1b100;">return</span> <span style="color: #990000;">array</span><span style="color: #009900;">&#40;</span><span style="color: #0000ff;">'name'</span><span style="color: #339933;">=&gt;</span><span style="color: #000088;">$arr</span><span style="color: #009900;">&#91;</span><span style="color: #cc66cc;">2</span><span style="color: #009900;">&#93;</span><span style="color: #339933;">,</span><span style="color: #0000ff;">'url'</span><span style="color: #339933;">=&gt;</span><span style="color: #000088;">$arr</span><span style="color: #009900;">&#91;</span><span style="color: #cc66cc;">1</span><span style="color: #009900;">&#93;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
<span style="color: #009900;">&#125;</span>
&nbsp;
<span style="color: #666666; font-style: italic;">// Get the content between the given start and end markers</span>
<span style="color: #000000; font-weight: bold;">function</span> get_tag_data<span style="color: #009900;">&#40;</span><span style="color: #000088;">$str</span><span style="color: #339933;">,</span> <span style="color: #000088;">$start</span><span style="color: #339933;">,</span> <span style="color: #000088;">$end</span><span style="color: #009900;">&#41;</span><span style="color: #009900;">&#123;</span>
<span style="color: #b1b100;">if</span> <span style="color: #009900;">&#40;</span> <span style="color: #000088;">$start</span> <span style="color: #339933;">==</span> <span style="color: #0000ff;">''</span> <span style="color: #339933;">||</span> <span style="color: #000088;">$end</span> <span style="color: #339933;">==</span> <span style="color: #0000ff;">''</span> <span style="color: #009900;">&#41;</span><span style="color: #009900;">&#123;</span>
<span style="color: #b1b100;">return</span><span style="color: #339933;">;</span>
<span style="color: #009900;">&#125;</span>
<span style="color: #000088;">$str</span> <span style="color: #339933;">=</span> <span style="color: #990000;">explode</span><span style="color: #009900;">&#40;</span><span style="color: #000088;">$start</span><span style="color: #339933;">,</span> <span style="color: #000088;">$str</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
<span style="color: #000088;">$str</span> <span style="color: #339933;">=</span> <span style="color: #990000;">explode</span><span style="color: #009900;">&#40;</span><span style="color: #000088;">$end</span><span style="color: #339933;">,</span> <span style="color: #000088;">$str</span><span style="color: #009900;">&#91;</span><span style="color: #cc66cc;">1</span><span style="color: #009900;">&#93;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
<span style="color: #b1b100;">return</span> <span style="color: #000088;">$str</span><span style="color: #009900;">&#91;</span><span style="color: #cc66cc;">0</span><span style="color: #009900;">&#93;</span><span style="color: #339933;">;</span>
<span style="color: #009900;">&#125;</span>
<span style="color: #666666; font-style: italic;">// Convert each row of an HTML table into a CSV-formatted string and return the rows as an array</span>
&nbsp;
<span style="color: #000000; font-weight: bold;">function</span> get_tr_array<span style="color: #009900;">&#40;</span><span style="color: #000088;">$table</span><span style="color: #009900;">&#41;</span> <span style="color: #009900;">&#123;</span>
<span style="color: #000088;">$table</span> <span style="color: #339933;">=</span> <span style="color: #990000;">preg_replace</span><span style="color: #009900;">&#40;</span><span style="color: #0000ff;">&quot;'&lt;td[^&gt;]*?&gt;'si&quot;</span><span style="color: #339933;">,</span><span style="color: #0000ff;">'&quot;'</span><span style="color: #339933;">,</span><span style="color: #000088;">$table</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
<span style="color: #000088;">$table</span> <span style="color: #339933;">=</span> <span style="color: #990000;">str_replace</span><span style="color: #009900;">&#40;</span><span style="color: #0000ff;">&quot;&lt;/td&gt;&quot;</span><span style="color: #339933;">,</span><span style="color: #0000ff;">'&quot;,'</span><span style="color: #339933;">,</span><span style="color: #000088;">$table</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
<span style="color: #000088;">$table</span> <span style="color: #339933;">=</span> <span style="color: #990000;">str_replace</span><span style="color: #009900;">&#40;</span><span style="color: #0000ff;">&quot;&lt;/tr&gt;&quot;</span><span style="color: #339933;">,</span><span style="color: #0000ff;">&quot;{tr}&quot;</span><span style="color: #339933;">,</span><span style="color: #000088;">$table</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
<span style="color: #666666; font-style: italic;">// Strip the remaining HTML tags</span>
<span style="color: #000088;">$table</span> <span style="color: #339933;">=</span> <span style="color: #990000;">preg_replace</span><span style="color: #009900;">&#40;</span><span style="color: #0000ff;">&quot;'&lt;[\/\!]*?[^&lt;&gt;]*?&gt;'si&quot;</span><span style="color: #339933;">,</span><span style="color: #0000ff;">&quot;&quot;</span><span style="color: #339933;">,</span><span style="color: #000088;">$table</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
<span style="color: #666666; font-style: italic;">// Strip whitespace</span>
<span style="color: #000088;">$table</span> <span style="color: #339933;">=</span> <span style="color: #990000;">preg_replace</span><span style="color: #009900;">&#40;</span><span style="color: #0000ff;">&quot;'([<span style="color: #000099; font-weight: bold;">\r</span><span style="color: #000099; font-weight: bold;">\n</span>])[\s]+'&quot;</span><span style="color: #339933;">,</span><span style="color: #0000ff;">&quot;&quot;</span><span style="color: #339933;">,</span><span style="color: #000088;">$table</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
<span style="color: #000088;">$table</span> <span style="color: #339933;">=</span> <span style="color: #990000;">str_replace</span><span style="color: #009900;">&#40;</span><span style="color: #0000ff;">&quot; &quot;</span><span style="color: #339933;">,</span><span style="color: #0000ff;">&quot;&quot;</span><span style="color: #339933;">,</span><span style="color: #000088;">$table</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
<span style="color: #000088;">$table</span> <span style="color: #339933;">=</span> <span style="color: #990000;">str_replace</span><span style="color: #009900;">&#40;</span><span style="color: #0000ff;">&quot; &quot;</span><span style="color: #339933;">,</span><span style="color: #0000ff;">&quot;&quot;</span><span style="color: #339933;">,</span><span style="color: #000088;">$table</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
&nbsp;
<span style="color: #000088;">$table</span> <span style="color: #339933;">=</span> <span style="color: #990000;">explode</span><span style="color: #009900;">&#40;</span><span style="color: #0000ff;">&quot;,{tr}&quot;</span><span style="color: #339933;">,</span><span style="color: #000088;">$table</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
<span style="color: #990000;">array_pop</span><span style="color: #009900;">&#40;</span><span style="color: #000088;">$table</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
<span style="color: #b1b100;">return</span> <span style="color: #000088;">$table</span><span style="color: #339933;">;</span>
<span style="color: #009900;">&#125;</span>
&nbsp;
<span style="color: #666666; font-style: italic;">// Convert every row and column of an HTML table into a nested array, for scraping table data</span>
<span style="color: #000000; font-weight: bold;">function</span> get_td_array<span style="color: #009900;">&#40;</span><span style="color: #000088;">$table</span><span style="color: #009900;">&#41;</span> <span style="color: #009900;">&#123;</span>
<span style="color: #000088;">$table</span> <span style="color: #339933;">=</span> <span style="color: #990000;">preg_replace</span><span style="color: #009900;">&#40;</span><span style="color: #0000ff;">&quot;'&lt;table[^&gt;]*?&gt;'si&quot;</span><span style="color: #339933;">,</span><span style="color: #0000ff;">&quot;&quot;</span><span style="color: #339933;">,</span><span style="color: #000088;">$table</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
<span style="color: #000088;">$table</span> <span style="color: #339933;">=</span> <span style="color: #990000;">preg_replace</span><span style="color: #009900;">&#40;</span><span style="color: #0000ff;">&quot;'&lt;tr[^&gt;]*?&gt;'si&quot;</span><span style="color: #339933;">,</span><span style="color: #0000ff;">&quot;&quot;</span><span style="color: #339933;">,</span><span style="color: #000088;">$table</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
<span style="color: #000088;">$table</span> <span style="color: #339933;">=</span> <span style="color: #990000;">preg_replace</span><span style="color: #009900;">&#40;</span><span style="color: #0000ff;">&quot;'&lt;td[^&gt;]*?&gt;'si&quot;</span><span style="color: #339933;">,</span><span style="color: #0000ff;">&quot;&quot;</span><span style="color: #339933;">,</span><span style="color: #000088;">$table</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
<span style="color: #000088;">$table</span> <span style="color: #339933;">=</span> <span style="color: #990000;">str_replace</span><span style="color: #009900;">&#40;</span><span style="color: #0000ff;">&quot;&lt;/tr&gt;&quot;</span><span style="color: #339933;">,</span><span style="color: #0000ff;">&quot;{tr}&quot;</span><span style="color: #339933;">,</span><span style="color: #000088;">$table</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
<span style="color: #000088;">$table</span> <span style="color: #339933;">=</span> <span style="color: #990000;">str_replace</span><span style="color: #009900;">&#40;</span><span style="color: #0000ff;">&quot;&lt;/td&gt;&quot;</span><span style="color: #339933;">,</span><span style="color: #0000ff;">&quot;{td}&quot;</span><span style="color: #339933;">,</span><span style="color: #000088;">$table</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
<span style="color: #666666; font-style: italic;">// Strip the remaining HTML tags</span>
<span style="color: #000088;">$table</span> <span style="color: #339933;">=</span> <span style="color: #990000;">preg_replace</span><span style="color: #009900;">&#40;</span><span style="color: #0000ff;">&quot;'&lt;[\/\!]*?[^&lt;&gt;]*?&gt;'si&quot;</span><span style="color: #339933;">,</span><span style="color: #0000ff;">&quot;&quot;</span><span style="color: #339933;">,</span><span style="color: #000088;">$table</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
<span style="color: #666666; font-style: italic;">// Strip whitespace</span>
<span style="color: #000088;">$table</span> <span style="color: #339933;">=</span> <span style="color: #990000;">preg_replace</span><span style="color: #009900;">&#40;</span><span style="color: #0000ff;">&quot;'([<span style="color: #000099; font-weight: bold;">\r</span><span style="color: #000099; font-weight: bold;">\n</span>])[\s]+'&quot;</span><span style="color: #339933;">,</span><span style="color: #0000ff;">&quot;&quot;</span><span style="color: #339933;">,</span><span style="color: #000088;">$table</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
<span style="color: #000088;">$table</span> <span style="color: #339933;">=</span> <span style="color: #990000;">str_replace</span><span style="color: #009900;">&#40;</span><span style="color: #0000ff;">&quot; &quot;</span><span style="color: #339933;">,</span><span style="color: #0000ff;">&quot;&quot;</span><span style="color: #339933;">,</span><span style="color: #000088;">$table</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
<span style="color: #000088;">$table</span> <span style="color: #339933;">=</span> <span style="color: #990000;">str_replace</span><span style="color: #009900;">&#40;</span><span style="color: #0000ff;">&quot; &quot;</span><span style="color: #339933;">,</span><span style="color: #0000ff;">&quot;&quot;</span><span style="color: #339933;">,</span><span style="color: #000088;">$table</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
&nbsp;
<span style="color: #000088;">$table</span> <span style="color: #339933;">=</span> <span style="color: #990000;">explode</span><span style="color: #009900;">&#40;</span><span style="color: #0000ff;">'{tr}'</span><span style="color: #339933;">,</span> <span style="color: #000088;">$table</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
<span style="color: #990000;">array_pop</span><span style="color: #009900;">&#40;</span><span style="color: #000088;">$table</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
<span style="color: #b1b100;">foreach</span> <span style="color: #009900;">&#40;</span><span style="color: #000088;">$table</span> <span style="color: #b1b100;">as</span> <span style="color: #000088;">$key</span><span style="color: #339933;">=&gt;</span><span style="color: #000088;">$tr</span><span style="color: #009900;">&#41;</span> <span style="color: #009900;">&#123;</span>
<span style="color: #000088;">$td</span> <span style="color: #339933;">=</span> <span style="color: #990000;">explode</span><span style="color: #009900;">&#40;</span><span style="color: #0000ff;">'{td}'</span><span style="color: #339933;">,</span> <span style="color: #000088;">$tr</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
<span style="color: #990000;">array_pop</span><span style="color: #009900;">&#40;</span><span style="color: #000088;">$td</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
<span style="color: #000088;">$td_array</span><span style="color: #009900;">&#91;</span><span style="color: #009900;">&#93;</span> <span style="color: #339933;">=</span> <span style="color: #000088;">$td</span><span style="color: #339933;">;</span>
<span style="color: #009900;">&#125;</span>
<span style="color: #b1b100;">return</span> <span style="color: #000088;">$td_array</span><span style="color: #339933;">;</span>
<span style="color: #009900;">&#125;</span>
&nbsp;
<span style="color: #666666; font-style: italic;">//Return all English words found in the string; $distinct=true removes duplicates</span>
<span style="color: #000000; font-weight: bold;">function</span> split_en_str<span style="color: #009900;">&#40;</span><span style="color: #000088;">$str</span><span style="color: #339933;">,</span><span style="color: #000088;">$distinct</span><span style="color: #339933;">=</span><span style="color: #009900; font-weight: bold;">true</span><span style="color: #009900;">&#41;</span> <span style="color: #009900;">&#123;</span>
<span style="color: #990000;">preg_match_all</span><span style="color: #009900;">&#40;</span><span style="color: #0000ff;">'/([a-zA-Z]+)/'</span><span style="color: #339933;">,</span><span style="color: #000088;">$str</span><span style="color: #339933;">,</span><span style="color: #000088;">$match</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
<span style="color: #b1b100;">if</span> <span style="color: #009900;">&#40;</span><span style="color: #000088;">$distinct</span> <span style="color: #339933;">==</span> <span style="color: #009900; font-weight: bold;">true</span><span style="color: #009900;">&#41;</span> <span style="color: #009900;">&#123;</span>
<span style="color: #000088;">$match</span><span style="color: #009900;">&#91;</span><span style="color: #cc66cc;">1</span><span style="color: #009900;">&#93;</span> <span style="color: #339933;">=</span> <span style="color: #990000;">array_unique</span><span style="color: #009900;">&#40;</span><span style="color: #000088;">$match</span><span style="color: #009900;">&#91;</span><span style="color: #cc66cc;">1</span><span style="color: #009900;">&#93;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
<span style="color: #009900;">&#125;</span>
<span style="color: #990000;">sort</span><span style="color: #009900;">&#40;</span><span style="color: #000088;">$match</span><span style="color: #009900;">&#91;</span><span style="color: #cc66cc;">1</span><span style="color: #009900;">&#93;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
<span style="color: #b1b100;">return</span> <span style="color: #000088;">$match</span><span style="color: #009900;">&#91;</span><span style="color: #cc66cc;">1</span><span style="color: #009900;">&#93;</span><span style="color: #339933;">;</span>
<span style="color: #009900;">&#125;</span></pre></div></div>
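<p>A minimal usage sketch for split_en_str(). The function body is repeated so the snippet runs on its own, and the sample sentence is made up for illustration. Two behaviors worth knowing: array_unique() is case-sensitive, and sort() compares strings byte by byte, so capitalized words come before lowercase ones.</p>

```php
<?php
// Copy of split_en_str() from the listing, repeated so this snippet is self-contained.
function split_en_str($str, $distinct = true) {
    // Capture every run of ASCII letters; $match[1] holds the words.
    preg_match_all('/([a-zA-Z]+)/', $str, $match);
    if ($distinct == true) {
        $match[1] = array_unique($match[1]);
    }
    // sort() also reindexes the array from 0 after array_unique().
    sort($match[1]);
    return $match[1];
}

// Hypothetical sample input, not taken from the original post.
$sample = 'PHP scraper: fetch the page, parse the page, store the data.';

print_r(split_en_str($sample));        // duplicates removed
print_r(split_en_str($sample, false)); // duplicates kept
```

<p>With the sample above, the first call returns 8 words and the second returns 11, and "PHP" sorts first in both because uppercase letters precede lowercase ones in byte order.</p>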

]]></content:encoded>
			<wfw:commentRss>http://www.onecho.com/2008-12-04/441.html/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Destined for a Sleepless Night: Three Scares in a Row!</title>
		<link>http://www.onecho.com/2008-09-15/350.html</link>
		<comments>http://www.onecho.com/2008-09-15/350.html#comments</comments>
		<pubDate>Sun, 14 Sep 2008 19:03:09 +0000</pubDate>
		<dc:creator>Kenami</dc:creator>
				<category><![CDATA[Musings]]></category>
		<category><![CDATA[Data Backup]]></category>
		<category><![CDATA[Scraping]]></category>

		<guid isPermaLink="false">http://www.onecho.com/2008-09-15/350.html</guid>
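		<description><![CDATA[I thought that with the holiday here I would finally have time to work on my blog, but instead I ended up in a cold sweat!

I spent the whole day writing a scraper and could finally collect content from other sites cleanly. But through carelessness I got a loop wrong and nearly “stole” every article from “可能吧”; the article count had already passed 300. After the initial excitement I thought about the consequences. I was afraid the webmaster of “可能吧” would report me and hurt my little site's ranking weight (even though it has not been indexed yet), so after much thought I decided to delete them. But misfortune never comes alone: I then got the SQL wrong and deleted every one of my own articles as well. At least I was glad I had backed up the database that very evening.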
]]></description>
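			<content:encoded><![CDATA[<p>I thought that with the holiday here I would finally have time to work on my blog, but instead I ended up in a cold sweat!</p>
<p>I spent the whole day writing a scraper and could finally collect content from other sites cleanly. But through carelessness I got a loop wrong and nearly “stole” every article from “<a title="可能吧" href="http://www.kenengba.com">可能吧</a>”; the article count had already passed 300. After the initial excitement I thought about the consequences. I was afraid the webmaster of “<a title="可能吧" href="http://www.kenengba.com">可能吧</a>” would report me and hurt my little site's ranking weight (even though it has not been indexed yet), so after much thought I decided to delete them. But misfortune never comes alone: I then got the SQL wrong and deleted every one of my own articles as well. At least I was glad I had backed up the database that very evening.</p>
<p>But&#8230;</p>
<p><span id="more-350"></span>It turned out to be worse than misfortune merely coming in pairs. I ran what should have been a simple mysqldump command several times with no result; the data was still not restored. I started hunting for the cause. A permissions problem? A bad backup? A problem with the MySQL user? I switched to root, and it still did not work! Was it just bad luck?</p>
<p>I opened the backup SQL file in EditPlus, and the data was clearly there. I was getting anxious, so I changed robots.txt: I was afraid a spider would crawl the site and find all my old pages returning 404, so I set it to Disallow: / for the time being. Sigh, I had no choice.</p>
<p>I had to try something else, and in the end I restored my data with phpMyAdmin. Thank goodness! Thirty minutes later I hurried to change robots.txt back, since I do still want the robots to crawl the site. I will check the log in a while.</p>
<p>Running a website really is hard work, and I have finally had a small taste of it. If I keep at it there will certainly be more difficulties, though the first rule is not to create trouble for myself.</p>
<p>Finally, a reminder to everyone: back up your data in time, so you do not end up with nothing!</p>
<p>To finish, here is the command for backing up the site database:</p>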
<p>mysqldump -uroot -p123456 database_name &gt; onecho080915.sql ;</p>
<p>Restore command:</p>
<p>mysql -uroot -p123456 database_name &lt; onecho080915.sql ;</p>
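<p>A minimal sketch of the backup and restore pair as PHP helpers that only build the command strings; nothing is executed. The helper names build_backup_command() and build_restore_command() are hypothetical, and the user, database, and file name are the placeholder values used in this post. The point it illustrates: mysqldump only writes a dump, while loading a dump back is done by feeding the file to the mysql client. Passing -p without a value makes the client prompt for the password instead of putting it on the command line.</p>

```php
<?php
// Hypothetical helper: build the shell command that dumps a database to a file.
function build_backup_command($user, $database, $file) {
    return sprintf(
        'mysqldump -u %s -p %s > %s',
        escapeshellarg($user),
        escapeshellarg($database),
        escapeshellarg($file)
    );
}

// Hypothetical helper: build the shell command that loads a dump file back.
// Restoring goes through the mysql client, not mysqldump.
function build_restore_command($user, $database, $file) {
    return sprintf(
        'mysql -u %s -p %s < %s',
        escapeshellarg($user),
        escapeshellarg($database),
        escapeshellarg($file)
    );
}

// Print both commands for the placeholder values used in this post.
echo build_backup_command('root', 'database_name', 'onecho080915.sql'), PHP_EOL;
echo build_restore_command('root', 'database_name', 'onecho080915.sql'), PHP_EOL;
```

<p>escapeshellarg() quotes each value so that a database or file name containing spaces or shell metacharacters cannot change the command.</p>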
<p>All right, good night everyone! (I suppose you all went to sleep long ago; it is already 3 a.m. Honey, did you dream of me?)</p>
]]></content:encoded>
			<wfw:commentRss>http://www.onecho.com/2008-09-15/350.html/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>[Repost] What Should We Do After Baidu Demotes Our Site?</title>
		<link>http://www.onecho.com/2008-09-13/318.html</link>
		<comments>http://www.onecho.com/2008-09-13/318.html#comments</comments>
		<pubDate>Fri, 12 Sep 2008 19:05:31 +0000</pubDate>
		<dc:creator>Kenami</dc:creator>
				<category><![CDATA[Web Digest]]></category>
		<category><![CDATA[Baidu Spider]]></category>
		<category><![CDATA[Baidu Demotion]]></category>
		<category><![CDATA[Scraping]]></category>

		<guid isPermaLink="false">http://www.onecho.com/?p=318</guid>
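		<description><![CDATA[A colleague's site was recently demoted by Baidu, which is practically the same as being dropped from the index. It is rather sad. Below I repost an article that I hope will be of some help to webmasters worried about being demoted or dropped:

What should we do after Baidu demotes our site?

I am just a small-time individual webmaster, and what individual webmasters fear most is having pages delisted or the whole site demoted by Baidu, the biggest search engine in China. Most of our traffic probably comes from Baidu, the domestic search engine that Chinese users favor most. I suppose that counts as patriotism. Heh!

I am a beginner. I know nothing about search engine algorithms and have no inclination to study them. What follows is a bit of analysis based on my own two sites (拇指屋文学网 and 5星作文网). If you are an expert, feel free to skip it.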
]]></description>
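			<content:encoded><![CDATA[<p>A colleague's site was recently demoted by Baidu, which is practically the same as being dropped from the index. It is rather sad. Below I repost an article that I hope will be of some help to webmasters worried about being demoted or dropped:</p>
<p>What should we do after Baidu demotes our site?</p>
<p>I am just a small-time individual webmaster, and what individual webmasters fear most is having pages delisted or the whole site demoted by Baidu, the biggest search engine in China. Most of our traffic probably comes from Baidu, the domestic search engine that Chinese users favor most. I suppose that counts as patriotism. Heh!</p>
<p>I am a beginner. I know nothing about search engine algorithms and have no inclination to study them. What follows is a bit of analysis based on my own two sites (拇指屋文学网 and 5星作文网). If you are an expert, feel free to skip it.</p>
<p>First, let us look at why Baidu excludes a personal site's content from its search results. I think it comes down to the following points.</p>
<p>1. The site has too little content of its own and no character.</p>
<p><span id="more-318"></span>Almost everything on the site comes from sites of the same or a related type. This is what we call scraping. If the scraping is automatic, the content is identical to the source, right? It is copied and pasted with no changes at all. (If you were Baidu, would you index such content? Baidu may be a website, but do not forget that people run it, and they can step in manually, right?) Baidu's spider gets tired of content like this. Add the fact that our own site does not carry much weight, and who else would it exclude if not us? So we need to write some original content to fill out the site instead of doing nothing but scraping.</p>
<p>2. The site is over-optimized.</p>
<p>I am no good at site optimization, but from reading what the experts post on 站长网 I have picked up a little. I know that keyword optimization can bring in a lot of IPs (unique visitor addresses). So some webmasters try every trick to optimize their sites and end up over-optimizing. Traffic may be high for a short while, but Baidu will find out sooner or later. Do you think it will let it slide? Wrong! Let me tell you, it will end badly. Baidu will delist pages without mercy, and in severe cases it will remove all of your site's data from its index. In the end, what suffers is our own hard work. So we must not over-optimize. The experts hold that optimization (presumably keyword density) is best kept between 5% and 8%; beyond that it gets dangerous. (A reminder to fellow individual webmasters: optimization is fine, but know when to stop.)</p>
<p>3. The site's server is unstable.</p>
<p>I will not dwell on this one; every webmaster knows how important it is. There is only one remedy: find a hosting provider with good service, such as 站长网's service. A good server is the foundation of running a site. Pages must not load too slowly, and the site must not keep failing to open; both are very unfriendly to visitors and to search spiders.</p>
<p>Next, let me talk about how to deal with these situations once they happen (mainly after being demoted by Baidu).</p>
<p>I will use my 拇指屋文学网 (www.muzhi5.cn) as an example of what I did. Because of a server problem, my site was infected with a virus and suffered some losses. Afterwards, my partners and I frantically copied large amounts of content by hand from other sites. Baidu's index of the site grew steadily. We had only one goal: to make up for the losses. But we were wrong. Although more pages were indexed, they brought in few IPs, and even the content that used to bring in IPs could no longer be found in Baidu's results. From February 2008, the number of indexed pages dropped again, bringing only one or two IPs a day. For a site with more than 1,000 items of content, that was about as dismal as it gets. By March the home page had vanished from the index. The site had evidently been demoted. What to do? I reworked the site once more: I regenerated all content as static pages and added relevant keywords to every section. (This was very risky; handled badly, it could have finished the site for good. I treated it as a last-ditch attempt.) I submitted the URL to Baidu again so that it would index the site as a new one. Then we organized people to write original content for the site, wrote some promotional articles on 站长网 (www.admin5.com), and promoted the site on the forums of large websites. Since yesterday the site has been getting some traffic from Baidu again, no longer just one or two IPs. Baidu has indexed only two or three hundred items so far, but I believe it will index more (my personal view). After all, this is a good start.</p>
<p>So running a website takes more than technical skill and more than content. We need something of our own: original work. Baidu likes original content, so let us write more of it. If you cannot write original content, you can rework other people's content into your own. Is that not original too? There are always ways; think it through and you can become an expert, right? Would Baidu still have the heart to delist a site like that? Following this line of thought, I started a brand-new education site, 5星作文网 (www.5xzww.com). I am a teacher, so I have a large stock of essays written by students (all original), to which I add some writing guidance on the site, offering writing techniques and guidance plans to every primary and secondary school student. For one thing, the site can win favor with search engines (Baidu has not indexed it yet, but the spider comes crawling every day, heh). For another, it does something meaningful: it gives children a good platform for exchanging ideas about writing. Why not? My colleagues also approve of the site and find it rather practical. Heh.</p>
<p>My own site is just getting started too, and the spiders come crawling every day. I am very happy and hope I can keep it up!</p>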
]]></content:encoded>
			<wfw:commentRss>http://www.onecho.com/2008-09-13/318.html/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
