<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments on: Developing a Web application</title>
	<atom:link href="http://itknowledgeexchange.techtarget.com/itanswers/developing-a-web-application/feed/" rel="self" type="application/rss+xml" />
	<link>http://itknowledgeexchange.techtarget.com/itanswers/developing-a-web-application/</link>
	<description></description>
	<lastBuildDate>Tue, 21 May 2013 12:21:34 +0000</lastBuildDate>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	
	<item>
		<title>By: nigelmcfarlane</title>
		<link>http://itknowledgeexchange.techtarget.com/itanswers/developing-a-web-application/#comment-41192</link>
		<dc:creator>nigelmcfarlane</dc:creator>
		<pubDate>Fri, 10 Dec 2004 05:39:48 +0000</pubDate>
		<guid isPermaLink="false">#comment-41192</guid>
		<description><![CDATA[If your web page is to be a good global citizen, then you shouldn&#039;t scan a site for pages if the site&#039;s robots.txt tells you not to.

An excellent client-side web spidering library can be found here: www.bclary.com.

However, you&#039;re pretty stuck client-side because of security. You can&#039;t script into a web page that belongs to a foreign site (the &quot;Same Origin Policy&quot;). That means you can&#039;t follow all the links in the other site&#039;s loaded pages to discover the size of the site.

So the server-side code that accepts the form submission has to do it. You might as well use the Google API (it&#039;s SOAP I think) and ask Google to do the word search for you. Then you don&#039;t need to count or scan pages at all - just use the Google results.
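
If you do take the server-side route instead of using Google, a minimal Python sketch of a spider follows. It is an illustration only, not the bclary.com library: crawl_site, fetch_html, robots_checker and LinkCollector are hypothetical names. It assumes the pages are UTF-8 HTML and applies robots.txt rules for the generic * user agent. It stays on the starting host, skips anything robots.txt disallows, and stops after max_pages pages.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse
from urllib.request import urlopen
from urllib.robotparser import RobotFileParser


class LinkCollector(HTMLParser):
    """Collect the href value of every anchor tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def fetch_html(url):
    """Download a page. This runs on the server, so the Same Origin Policy does not apply."""
    with urlopen(url, timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")


def robots_checker(start_url):
    """Return a predicate that applies the site's robots.txt rules for user agent *."""
    robots = RobotFileParser(urljoin(start_url, "/robots.txt"))
    robots.read()

    def allowed(url):
        return robots.can_fetch("*", url)

    return allowed


def crawl_site(start_url, fetch=fetch_html, allowed=None, max_pages=500):
    """Breadth-first crawl of pages on start_url's host; returns the set of URLs visited."""
    if allowed is None:
        allowed = robots_checker(start_url)
    host = urlparse(start_url).netloc
    seen = set()
    queue = deque([start_url])
    while queue:
        if len(seen) >= max_pages:
            break
        url = queue.popleft()
        if url in seen or not allowed(url):
            continue
        seen.add(url)
        try:
            html = fetch(url)
        except OSError:
            continue  # page could not be fetched: it stays counted, but its links are not followed
        collector = LinkCollector()
        collector.feed(html)
        for href in collector.links:
            # Resolve relative links and drop #fragments so each page is counted once.
            link, _fragment = urldefrag(urljoin(url, href))
            # Stay on the same host; links to other sites are ignored.
            if urlparse(link).netloc == host and link not in seen:
                queue.append(link)
    return seen
```

Calling crawl_site on the site's home page URL and taking len() of the result gives a page count. Because the spider only follows links, it will not find pages that nothing on the site links to.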

- N.]]></description>
		<content:encoded><![CDATA[<p>If your web page is to be a good global citizen, then you shouldn&#8217;t scan a site for pages if the site&#8217;s robots.txt tells you not to.</p>
<p>An excellent client-side web spidering library can be found here: <a href="http://www.bclary.com" rel="nofollow">http://www.bclary.com</a>.</p>
<p>However, you&#8217;re pretty stuck client-side because of security. You can&#8217;t script into a web page that belongs to a foreign site (the &#8220;Same Origin Policy&#8221;). That means you can&#8217;t follow all the links in the other site&#8217;s loaded pages to discover the size of the site.</p>
<p>So the server-side code that accepts the form submission has to do it. You might as well use the Google API (it&#8217;s SOAP I think) and ask Google to do the word search for you. Then you don&#8217;t need to count or scan pages at all &#8211; just use the Google results.</p>
<p>- N.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: riverwind</title>
		<link>http://itknowledgeexchange.techtarget.com/itanswers/developing-a-web-application/#comment-41193</link>
		<dc:creator>riverwind</dc:creator>
		<pubDate>Wed, 08 Dec 2004 22:55:16 +0000</pubDate>
		<guid isPermaLink="false">#comment-41193</guid>
		<description><![CDATA[It depends on how you design your web application. At the server level, you can use software such as &#039;spider&#039; to monitor usage. I think the most suitable method is to build the controls and logging into the web application itself, i.e., control at the application level. However, that is quite difficult to do.]]></description>
		<content:encoded><![CDATA[<p>It depends on how you design your web application. At the server level, you can use software such as &#8216;spider&#8217; to monitor usage. I think the most suitable method is to build the controls and logging into the web application itself, i.e., control at the application level. However, that is quite difficult to do.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: bobkberg</title>
		<link>http://itknowledgeexchange.techtarget.com/itanswers/developing-a-web-application/#comment-41194</link>
		<dc:creator>bobkberg</dc:creator>
		<pubDate>Wed, 08 Dec 2004 17:46:50 +0000</pubDate>
		<guid isPermaLink="false">#comment-41194</guid>
		<description><![CDATA[John Brandt already provided the basic answer.

The only other caveat I&#039;d offer is that a spider utility will only show you the web pages that can be reached by following links from the index (or another known) page. Many sites have &quot;landing&quot; pages designed specifically for advertising click-throughs that are never linked from anything but the ads, as well as pages that are meant to remain private.

The other place you can look is the domain&#039;s robots.txt file. If present, it specifies which files, pages, and directories a spider should or should not consider fair game for indexing, so it may provide the information you need.
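
A minimal Python sketch of reading robots.txt with the standard library urllib.robotparser module. The robots.txt text is made up for illustration (against a live site, RobotFileParser.set_url() followed by read() fetches the real file), and listed_paths and make_parser are hypothetical helper names. The Disallow and Allow paths are only hints about what exists, not a full list of pages.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, used instead of fetching a real file.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Disallow: /landing/
"""


def listed_paths(robots_txt):
    """Return the paths named in Allow/Disallow lines, ignoring comments."""
    paths = []
    for line in robots_txt.splitlines():
        line = line.split("#", 1)[0].strip()
        field, sep, value = line.partition(":")
        if sep and field.strip().lower() in ("allow", "disallow") and value.strip():
            paths.append(value.strip())
    return paths


def make_parser(robots_txt):
    """Build a RobotFileParser from robots.txt text without any network access."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


if __name__ == "__main__":
    parser = make_parser(ROBOTS_TXT)
    print(listed_paths(ROBOTS_TXT))  # ['/private/', '/landing/']
    print(parser.can_fetch("*", "http://example.com/index.html"))  # True
    print(parser.can_fetch("*", "http://example.com/private/a.html"))  # False
```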

Bob]]></description>
		<content:encoded><![CDATA[<p>John Brandt already provided the basic answer.</p>
<p>The only other caveat I&#8217;d offer is that a spider utility will only show you the web pages that can be reached by following links from the index (or another known) page. Many sites have &#8220;landing&#8221; pages designed specifically for advertising click-throughs that are never linked from anything but the ads, as well as pages that are meant to remain private.</p>
<p>The other place you can look is the domain&#8217;s robots.txt file. If present, it specifies which files, pages, and directories a spider should or should not consider fair game for indexing, so it may provide the information you need.</p>
<p>Bob</p>
]]></content:encoded>
	</item>
</channel>
</rss>
