<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Science, Technology, and Real Estate</title>
	<atom:link href="http://matt.stampede.org/?feed=rss2" rel="self" type="application/rss+xml" />
	<link>http://matt.stampede.org</link>
	<description>Musings about the world around us with a business twist.</description>
	<lastBuildDate>Mon, 23 Nov 2009 20:24:26 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=abc</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>Emerging Technologies: SSI at SuperComputing &#8216;09</title>
		<link>http://matt.stampede.org/?p=50</link>
		<comments>http://matt.stampede.org/?p=50#comments</comments>
		<pubDate>Mon, 23 Nov 2009 20:23:50 +0000</pubDate>
		<dc:creator>matt</dc:creator>
				<category><![CDATA[Technology]]></category>
		<category><![CDATA[blade]]></category>
		<category><![CDATA[packet capture]]></category>
		<category><![CDATA[ssi]]></category>

		<guid isPermaLink="false">http://matt.stampede.org/?p=50</guid>
		<description><![CDATA[The Intel Server System Infrastructure (SSI) Project has a lofty goal: To standardize the hardware for x86/x86_64 based blade servers and their backplanes. This is an enterprise and academic computing game changer without a doubt, but its current incarnation leaves a big hole in the debuggability and security for applications and operating systems running on [...]]]></description>
			<content:encoded><![CDATA[<p>The <a href="http://ssiforum.org/">Intel Server System Infrastructure (SSI) Project</a> has a lofty goal: to standardize the hardware for x86/x86_64-based blade servers and their backplanes. This is without a doubt a game changer for enterprise and academic computing, but its current incarnation leaves a big hole in the debuggability and security of applications and operating systems running on the blades.
<p />
<p><span id="more-50"></span></p>
<h3>So, what&#8217;s a blade server?</h3>
<p>I&#8217;ll get to exactly what a blade server is shortly, but first an analogy serves well for visualization: think of your local telephone company and the phone line they provide you.  If you want an extra telephone in your house and you&#8217;ve already got the connectors wired, you buy a new phone, connect it to the jack, and you&#8217;ve instantly got a dial tone with no fuss and no pain.  This is exactly how blade servers are supposed to work for the computing industry. Put simply, a blade server (or &#8220;blade&#8221; for short) is a modular computer: a self-contained motherboard, processor, RAM, and storage module that&#8217;s easily plugged into or unplugged from a rack specifically designed to accept it.  It typically has a proprietary connector carrying power, network, and health management functionality that&#8217;s automatically connected and configured when the blade is plugged into the backplane (which is the &#8220;jack&#8221;).
<p />
<p>So, when a company or university needs more computational power because the web-server is bogged down generating pages or because a scientific simulation would be too slow otherwise, they buy an extra blade and plug it in to instantly get another computing resource.  Of course they probably need to configure software on the new computer for it to be useful, but that&#8217;s not important for this discussion.  The take-away point is that a blade gives a no-fuss method for adding computers to your infrastructure, and as a bonus, when a computer inevitably fails, blades give a great way to swap out the failed module for a new one in a matter of seconds.
<p />
<h3>The SSI Project</h3>
<p>A major problem hindering blade adoption has always been the lack of any standard (blades have been around <i>en masse</i> for at least a decade, and yet most people have never heard of them!)  Concretely, <a href="http://www.appro.com/product/greenblade_main.asp">Appro</a> will sell you their own design for a backplane and blade, which is different than the one <a href="http://www-03.ibm.com/systems/bladecenter/">IBM</a> sells, which is different than the one <a href="http://www.dell.com/us/en/business/servers/blade/ct.aspx?refid=blade&#038;cs=04&#038;s=bsd">Dell</a> sells.  Of course, this is a headache for numerous reasons.  First, there is inherent risk in the future cost of any blade you buy; if a vendor decides there isn&#8217;t enough margin in their blade product line and doubles their prices, in general you can&#8217;t seek a third-party blade as a second source to combat the margin surfing. Next, the same vendor may decide to end-of-life the very blades that your backplane accepts, leaving you searching eBay for used parts should you ever need more blades or replacement components. And worst case, what happens if the vendor goes out of business right after you buy your first backplane and single node?  It&#8217;s potentially the Edsel of the computing industry!
<p />
<p>Intel has done something to change all of this.  Their goal is of course to sell more CPUs, and blades are a perfect way for them to do so: since the marginal per-blade upgrade cost is typically much less than the cost of a full computer, people buy more CPUs because they can afford more blades. So, they&#8217;ve pulled together a consortium of <a href="http://ssiforum.org/index.php?option=com_weblinks&#038;view=category&#038;id=61&#038;Itemid=6">juggernauts in the blade industry</a> to design a standard architecture for blade servers and their backplanes, so that most of the drawbacks of blade infrastructure are washed away. If and when vendors adopt the standard, you&#8217;ll be able to cross company lines for sourcing blade servers and backplanes, just like you can cross company lines for hard disks, RAM, workstations, etc. today.  If they pull off such a feat, it will be a landmark event in the computing industry to say the least, an event as significant as Compaq&#8217;s upheaval of the PC market with the <a href="http://oldcomputers.net/compaqi.html">reverse engineering and re-implementation of IBM&#8217;s BIOS</a>, or AMD&#8217;s implementation of the x86 processor line in the i386 and i486 processor days. Let&#8217;s hope they do.
<p />
<h3>The Gaping Hole</h3>
<p>Unfortunately, the SSI picture isn&#8217;t all roses today; the standards committee has inadvertently created a security and debuggability nightmare.
<p />
<p>I&#8217;ve glossed over the networking aspects of blade computing, but further discussion is warranted, because this is the chief problem with the current SSI implementation.  The backplanes for blade servers usually have an integrated network switch of some sort, with Ethernet and InfiniBand incarnations being the most common. The utility here is clear: by including a network switch in the backplane, a single network cable can be connected to the backplane and provide outside-world connectivity for every blade in the rack.  There&#8217;s not a thing wrong with the idea behind this method, but the SSI implementation lacks a way to monitor inter-node traffic, which probably makes the security administrators and MPI application developers among my readers groan.
<p />
<p>What exactly is the problem, in case you didn&#8217;t catch it? (And if you didn&#8217;t catch it, don&#8217;t worry; it can be a subtle point even if you&#8217;re on the periphery of one of the above categories.)  The problem is that you can&#8217;t see anything that the nodes in an SSI backplane say to one another.  In effect, you can only monitor the connections from the backplane to the outside world.
<p />
<p>Consider the following illustrative case for why the current SSI specification is broken: A very common architecture for an internet website running an online store is to run one or a few web-server blades that talk directly to internet shoppers, serving them images, shopping carts, and other pages, with two or three times as many database blades connected to the web-servers and holding current information about item availability, stock, prices, outstanding orders, etc. A typical attack vector is the following: a malicious user breaks into the web-server through a known or newly developed vulnerability over an encrypted (https) link.  The user then directs the web-server to fetch credit card numbers, names, and addresses from the database server, typically through the unencrypted link between the web-server and the database server, then tunnels the information through the encrypted link back to their PC.  With the SSI blade system, a network forensics or capture device would have no way of seeing the unencrypted data leakage, since it would happen exclusively on the blade backplane.  In fact, unless the database query statements are audited and/or an SSL decryptor is used to feed the forensics systems, the company under attack will probably never know. In practice, most corporations have neither an SSL decryptor nor query auditing, since both are expensive, detail-oriented tasks and the need for them is normally mitigated by forensics devices snooping the unencrypted traffic.
<p />
<p>Another concrete example is in order.  Super-computers are typically strung together from many single computers of the same makeup, obviously a prime market for blade servers. The developers of the applications that run on super-computers typically use the <a href="http://www.mcs.anl.gov/research/projects/mpi/">Message Passing Interface (MPI)</a> framework for making the single computers act in parallel and in lock-step as one large super-computer.  MPI programming is unfortunately hard and error prone, which is why super-computer programmers command big salaries.  The quintessential method for debugging MPI programs is to capture the messages that individual computers pass to one another and examine them for errors or other incorrect behaviour.  With a super-computer made of SSI blades, however, this debugging paradigm is completely unavailable. A packet capture appliance has no single point of entry, and thus doesn&#8217;t see the messages that nodes pass to one another. Instead, developers need to debug through other means, like developing a framework for dumping messages to a log on each machine, collecting them, ordering them, and analyzing them, hoping that the framework didn&#8217;t miss a critical component of the message; or perhaps they could run <a href="http://www.tcpdump.org/">tcpdump</a> on each node and hope that the traffic is slow enough for that tool to keep up (which may sound trivial but is in fact a major problem), though in that case they still need a way to collect and coalesce the resulting PCAP files.
<p />
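<p>To give a feel for the &#8220;collect and order&#8221; step of that workaround, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the <code>Message</code> record, the node names, and the sample messages are hypothetical, and it presumes each node&#8217;s log is already sorted by time and that the node clocks are synchronized closely enough for the ordering to mean something.</p>

```python
import heapq
from typing import Iterable, Iterator, NamedTuple


class Message(NamedTuple):
    timestamp: float  # seconds, as recorded on the capturing node
    node: str         # hypothetical node name, e.g. "blade01"
    payload: str      # short summary of the captured message


def merge_node_logs(*logs: Iterable[Message]) -> Iterator[Message]:
    """Merge per-node logs, each already sorted by time, into one timeline."""
    return heapq.merge(*logs, key=lambda m: m.timestamp)


# Hypothetical per-node logs, one list per blade.
blade01 = [Message(1.00, "blade01", "send rank0 to rank1"),
           Message(3.50, "blade01", "recv rank1 to rank0")]
blade02 = [Message(1.20, "blade02", "recv rank0 to rank1"),
           Message(3.10, "blade02", "send rank1 to rank0")]

timeline = list(merge_node_logs(blade01, blade02))
for msg in timeline:
    print(f"{msg.timestamp:6.2f} {msg.node} {msg.payload}")
```

<p>Even this toy version hints at the fragility: if the clocks on two blades drift apart, a receive can appear to happen before its matching send, which is exactly the kind of problem a single capture point on the backplane would avoid.</p>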
<p>Of course, there are many other examples of what is lost without the ability to snoop backplane network traffic, but the idea behind the problem should at least be clear from the scenarios already presented. What&#8217;s needed, then, is a fix.  The SSI specification can be augmented to support a network TAP port, and all of these issues vanish in a blink.  I&#8217;ve personally told the SSI developers about this issue, and it&#8217;s now on their radar.  <a href="http://ssiforum.org/">More feedback</a>, of course, will always help.</p>
<h3>Conclusion</h3>
<p>The SSI platform represents a giant leap forward for the computing industry as a whole, but it introduces a major security and debugging nightmare into environments that already have too many of those things.  A simple change can make the collective lives of every SSI blade user simpler, so they can worry about everything else.</p>
]]></content:encoded>
			<wfw:commentRss>http://matt.stampede.org/?feed=rss2&amp;p=50</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>File Archive: Long Lost Flight Unlimited III (FU3) Patch</title>
		<link>http://matt.stampede.org/?p=43</link>
		<comments>http://matt.stampede.org/?p=43#comments</comments>
		<pubDate>Wed, 08 Jul 2009 02:37:48 +0000</pubDate>
		<dc:creator>matt</dc:creator>
				<category><![CDATA[Aviation]]></category>
		<category><![CDATA[flight simulator]]></category>

		<guid isPermaLink="false">http://matt.stampede.org/?p=43</guid>
		<description><![CDATA[I&#8217;m an avid private pilot in my spare time; now that my coursework is largely complete at the University of Utah, I&#8217;ve had a few spare minutes here and there to fly both on the computer and in the real world. While I was waiting for X-Plane and Microsoft Flight Simulator X to install, update, [...]]]></description>
			<content:encoded><![CDATA[<p>I&#8217;m an avid private pilot in my spare time; now that my coursework is largely complete at the University of Utah, I&#8217;ve had a few spare minutes here and there to fly both on the computer and in the real world. While I was waiting for <a href="http://www.x-plane.com">X-Plane</a> and Microsoft Flight Simulator X to install, update, install add-ons, etc., I dusted off my old Flight Unlimited 3 and Flight Unlimited 2 disks and did an install of the old classic on my MacBook Pro with Wine.</p>
<p>After it was installed, I went searching the internet far and wide for what proved to be a nearly unreachable prize: the 2.0 patch to <a href="http://en.wikipedia.org/wiki/Flight_Unlimited_3">Flight Unlimited 3</a>.  It took several hours of crawling, and ultimately registering for a Finnish website (using Google Translate to help along the way) to find the patch.  To save everyone else the trouble, consider this a silly gift from me to you:</p>
<p><a href='http://matt.stampede.org/wp-content/uploads/2009/07/FU3PATCH.EXE'>FU3PATCH 2.0, US Edition (Download)</a></p>
<p>If you&#8217;ve got a copy of the UK edition or other world editions, please <a href="mailto:matt@stampede.org">send them to me</a> and I&#8217;ll include them here!</p>
]]></content:encoded>
			<wfw:commentRss>http://matt.stampede.org/?feed=rss2&amp;p=43</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Telescope Array, HiRes, and Lasers</title>
		<link>http://matt.stampede.org/?p=36</link>
		<comments>http://matt.stampede.org/?p=36#comments</comments>
		<pubDate>Sun, 21 Jun 2009 20:42:03 +0000</pubDate>
		<dc:creator>matt</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://matt.stampede.org/?p=36</guid>
		<description><![CDATA[<p>I'm doing my PhD research in the <a href="http://www.telescopearray.org">Telescope Array</a> Physics group at the <a href="http://www.physics.utah.edu">University of Utah</a>.</p>
<p>One of the great mysteries of High Energy physics is the origin of the highest energy cosmic rays; the Pierre Auger experiment has given weak evidence that active galactic nuclei are a potential source, but this analysis is far from conclusive. One area of interest which will probably make its way into my PhD thesis involves confirming these results through improved modeling of the data processing for the Telescope Array experiment and its predecessor, HiRes, by correcting what is potentially a serious error that eliminates the ability to confirm Auger's result.</p>
]]></description>
			<content:encoded><![CDATA[<p>I&#8217;m doing my PhD research in the <a href="http://www.telescopearray.org">Telescope Array</a> Physics group at the <a href="http://www.physics.utah.edu">University of Utah</a>.</p>
<p>One of the great mysteries of High Energy physics is the origin of the highest energy cosmic rays; the <a href="http://www.auger.org">Pierre Auger experiment</a> has given weak evidence that active galactic nuclei are a potential source, but this analysis is far from conclusive. One area of interest which I&#8217;m pursuing (and an area which will probably make its way into my thesis) involves confirming these results through improved modeling of the data processing for the Telescope Array experiment and its predecessor, HiRes, by correcting what is potentially a serious analysis error that eliminates the ability to confirm Auger&#8217;s result.</p>
<p>The essence of this error lies in the method used to find and remove events triggered by lasers from the Telescope Array and HiRes data sets. Without publicly revealing too many details until the analysis is complete, it is plausible that correcting this error will lead to a reevaluation of the origins of cosmic rays which <em>could</em> be compatible with the Auger collaboration&#8217;s results.</p>
]]></content:encoded>
			<wfw:commentRss>http://matt.stampede.org/?feed=rss2&amp;p=36</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Statistics are Catalyzing the 3rd Generation of Modern Targeted Advertising</title>
		<link>http://matt.stampede.org/?p=16</link>
		<comments>http://matt.stampede.org/?p=16#comments</comments>
		<pubDate>Wed, 30 Jul 2008 21:17:06 +0000</pubDate>
		<dc:creator>matt</dc:creator>
				<category><![CDATA[Statistics]]></category>
		<category><![CDATA[Technology]]></category>
		<category><![CDATA[advertising]]></category>
		<category><![CDATA[science]]></category>
		<category><![CDATA[third generation advertising]]></category>

		<guid isPermaLink="false">http://matt.stampede.org/?p=16</guid>
		<description><![CDATA[The targeted advertising landscape is starting to go through another major overhaul, and it's certainly no stranger to change. By utilizing computers, statistical analysis, deep packet capture, cross-user correlation, and fast computation technology, companies are poised to cause consumers and advertisers alike to change the way products are sold. Sales will become more efficient (less cost-per-conversion), consumers won't see ads for things they wouldn't be interested in, and markets will take another step toward ultimate Adam Smith style efficiency.]]></description>
			<content:encoded><![CDATA[<p>The targeted advertising landscape is starting to go through another major overhaul, and it&#8217;s certainly no stranger to change.  By utilizing computers, statistical analysis, deep packet capture, cross-user correlation, and fast computation technology, companies are poised to cause consumers and advertisers alike to change the way products are sold.  Sales will become more efficient (less cost-per-conversion), consumers won&#8217;t see ads for things they wouldn&#8217;t be interested in, and markets will take another step toward ultimate Adam Smith style efficiency.<br />
<!--–more–--><span id="more-16"></span><br />
<span style="text-decoration: underline;">Background</span><br />
Modern targeted advertising started in print media; advertisers placed banking ads in the business section of the newspaper and shoe ads in the fashion section. I&#8217;ll call this model &#8220;old media targeted advertising.&#8221;  This style of advertising has a place in the traditional media, but has serious drawbacks, inasmuch as it doesn&#8217;t target an individual consumer&#8217;s tastes but rather takes broad strikes at the entire landscape, hoping to hit a few interested parties.</p>
<p>The first major overhaul of targeted advertising happened with the inception of keyword based internet advertising.  Early pioneers like <a href="http://en.wikipedia.org/wiki/DoubleClick">DoubleClick</a> (1996) and <a href="http://www.yahoo.com/">Yahoo</a> shaped the second generation of ad delivery.  Now instead of simply placing a Financial Services ad in the printed Dow Jones Industrial charts, a Stock Broker could place their ad in-line with an internet article that talks about selecting a Stock Broker.  Not only was this more efficient in terms of targeting the buying consumer, but ad-space became much cheaper &#8212; a print advertising campaign that cost several thousand dollars and hit many unrelated parties became an online campaign that cost a few hundred dollars.</p>
<p>DoubleClick in particular tried to bridge the gap between keyword based online advertising and the new, more intelligent history based advertising. What does that mean, exactly?  DoubleClick advertises in many places, and they collect data each time they serve an advertisement, regardless of the site.  They take this data and try to correlate many events to a single user; they try to track you through each site you visit in your day-to-day life in order to build an advertising profile, then they use this profile to serve you more targeted ads.</p>
<p>As a concrete example, a consumer (let&#8217;s call him Joe) visits <a href="http://www.expedia.com/">Expedia</a>, <a href="http://www.delta.com/">Delta Airlines</a>, and <a href="http://www.kayak.com/">Kayak</a>, presumably looking for airline tickets. Assuming DoubleClick advertises on both Expedia and Kayak, they can try to put two and two together to determine that Joe is planning a trip, and they can serve travel related advertising to Joe on any site that he visits, whether or not the content on the site is related to travel.</p>
<p>By making correlations between Joe&#8217;s site visits and the detected patterns, DoubleClick has been able to increase the effectiveness of their advertising campaigns.  There are, however, some serious limitations to the DoubleClick model.  3rd generation advertising attempts to overcome these limitations.</p>
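<p>A rough Python sketch of that kind of cross-site correlation follows. It is purely illustrative and not a description of DoubleClick&#8217;s actual system: the cookie ids, the site-to-category table, and the two-hit threshold are all assumptions of mine.</p>

```python
from collections import Counter, defaultdict

# Hypothetical mapping from a site in the ad network to an interest category.
SITE_CATEGORY = {
    "expedia.com": "travel",
    "kayak.com": "travel",
    "edmunds.com": "autos",
}

# Hypothetical (cookie_id, site) pairs, recorded each time the network serves an ad.
impressions = [
    ("joe", "expedia.com"),
    ("joe", "kayak.com"),
    ("ann", "edmunds.com"),
]


def build_profiles(events):
    """Count, per user, how often each interest category was seen."""
    profiles = defaultdict(Counter)
    for cookie_id, site in events:
        category = SITE_CATEGORY.get(site)
        if category is not None:
            profiles[cookie_id][category] += 1
    return profiles


def pick_ad_category(profile, min_hits=2):
    """Return the dominant category once it has been seen min_hits times."""
    if not profile:
        return None
    category, hits = profile.most_common(1)[0]
    return category if hits >= min_hits else None


profiles = build_profiles(impressions)
print(pick_ad_category(profiles["joe"]))  # travel
print(pick_ad_category(profiles["ann"]))  # None
```

<p>Note that Joe&#8217;s visit to Delta never shows up in <code>impressions</code>, because the sketch (like the ad network it imitates) only sees the sites on which it serves ads.</p>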
<p><span style="text-decoration: underline;">Present Limitations</span></p>
<ul>
<li> Present advertising software lacks a comprehensive view of a user&#8217;s actions. They see only what they see, and that&#8217;s a very small picture.</li>
</ul>
<p>When DoubleClick&#8217;s software realized that Joe was going to take a trip, they used every opportunity available to try and present him with ads for travel. Unfortunately, DoubleClick&#8217;s internet coverage is quite small; fewer than 1 in 5,000 sites are reachable by DoubleClick.  This means that while DoubleClick may have properly guessed that Joe intended to take a trip, they can&#8217;t see when Joe visits Delta&#8217;s website directly (which doesn&#8217;t have DoubleClick ads) and purchases his tickets.  Not only is DoubleClick&#8217;s ad wasted since Joe doesn&#8217;t need travel arrangements, they&#8217;ve also wasted their opportunity to present him with another more appropriate ad.  No one benefits from this situation.</p>
<ul>
<li>Keyword driven advertising misses the actual content.</li>
</ul>
<p>Perhaps the most well known keyword based advertising system on the internet is <a href="http://www.google.com/adsense">Google&#8217;s AdSense</a>.  AdSense has built an advertising network spanning many sites (incidentally, this network is intimately related to DoubleClick&#8217;s network, as Google purchased DoubleClick in 2007 and has begun to integrate its products with DoubleClick).  These sites are corporate and personal alike, with all types of content and all motivations.  When a content provider signs up for AdSense, they integrate Google advertising into their own content and are paid based on readers&#8217; clicks on the advertising.</p>
<p>When Google delivers advertising based on keywords found on a content provider&#8217;s site, they are doing so with blinders on.  For instance, a web page with investment advice may mention the term &#8220;Real Estate&#8221; tens of times, but not fundamentally talk about buying or selling a home. Nonetheless, AdSense will spot the many uses of the term &#8220;Real Estate&#8221;, and probably serve an ad for a local REALTOR.  This ad doesn&#8217;t really benefit anyone involved, and again the opportunity to serve a better, more targeted ad is lost.  This is the reason that the clicks-per-view for Google AdSense are quite low (0.05 clicks per impression).</p>
<ul>
<li>Conversions aren&#8217;t used to target other advertising.</li>
</ul>
<p>When a user is served an online advertisement and then clicks on the ad to buy a product, everyone benefits.  The advertiser has made a sale, the advertisement content provider (like Google&#8217;s AdSense) has generated revenue from the click, and the consumer gets something they&#8217;re after.  There is unrealized potential for more benefit from this transaction for every party, but the data regarding the transaction ends at a sale.  In the next, or 3rd, generation of targeted advertising, data regarding a sale is sacred and potentially much more valuable than the keywords on the website in which the ad was served.</p>
<p>To explain how this data is valuable, consider a poker tournament.  A good poker player will find a &#8220;tell&#8221; in all of her opponents.  A tell is a twitch, an involuntary twitter, a change in breathing patterns, or something else discernible when a player is bluffing, or playing as if their cards were better than they actually are.  By finding and learning to watch for a tell like this, a good player has given herself an advantage in the game by being able to recognize when the odds for a win are in her favor.  Compare this with the data collected in a sale: the sale is a consumer tell. Advertisers should be able to categorize, classify, and manage the events and content leading up to the click and sale, perhaps even going back weeks or months in browsing history.  Once this tell is discovered, the advertiser can look for the same tell in another consumer&#8217;s patterns to try to predict another purchase, or apply this tell to other consumers that behave similarly.</p>
<p><span style="text-decoration: underline;">The 3rd Generation of Targeted Advertising</span></p>
<p>Advertisers and marketers alike are advancing the advertising landscape to overcome these limitations and soon will take advantage of the tells ready to be found.  Deep packet capture, or the practice of capturing and storing all network traffic for a period of time, is becoming a standard component in the IT management community and has already hit mainstream in the intelligence community.  The idea is simple:  Store everything that happens on a network because you probably won&#8217;t know until long after the traffic has passed which parts are interesting.</p>
<p>The benefits of this data retention to government intelligence agencies should be obvious.  If the FBI arrests a suspected criminal, they could potentially obtain a warrant and see all of the suspect&#8217;s past network traffic which includes website visits, IM conversations, email, and other sensitive data.  The CIA and the NSA may use the same data as a basis for analysis; more on that later.</p>
<p>For IT managers, the benefit of data retention probably isn&#8217;t as immediately obvious, though the benefits are still quite important and ultimately valuable.  Foremost, a company involved in a lawsuit with an employee over wrongful termination would no-doubt like to have a complete history of all network traffic the litigious employee originated to use as evidence in a courtroom.  Similarly, the same company might be a defendant in another lawsuit and could produce network traffic during discovery or as evidence while defending itself in court.</p>
<p>From the technology development and maintenance standpoint, data retention through deep packet capture can provide an IT staff with a forensics toolkit for finding slow points in network infrastructure.  The staff may observe that packet transit times on a particular network are 5 times slower than other networks and use this data to find a faulty switch.  At the application level, the IT staff might develop a model for what standard network traffic looks like and apply the model to new traffic to quickly find problems and security breaches.</p>
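<p>As a minimal sketch of the &#8220;find the slow network&#8221; idea, the Python below compares each network&#8217;s median packet transit time against the median across all networks. The segment names, the sample timings, and the 3x threshold are made up for illustration; a real deployment would derive transit times from the captured packets themselves.</p>

```python
from statistics import median

# Hypothetical packet transit times in milliseconds, grouped by network segment.
transit_ms = {
    "net-a": [0.9, 1.1, 1.0, 1.2],
    "net-b": [1.0, 0.8, 1.1, 0.9],
    "net-c": [5.2, 4.9, 5.5, 5.1],
}


def slow_segments(samples, factor=3.0):
    """Return segments whose median transit time exceeds factor times the baseline.

    The baseline is the median of the per-segment medians, so one bad
    segment does not drag the reference point upward.
    """
    medians = {name: median(times) for name, times in samples.items()}
    baseline = median(medians.values())
    return sorted(name for name, m in medians.items() if m > factor * baseline)


print(slow_segments(transit_ms))  # ['net-c']
```

<p>Using medians rather than means is a deliberate choice here: a handful of retransmitted packets shouldn&#8217;t be enough to flag an otherwise healthy switch.</p>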
<p>But back to the main question, how does this apply to targeted advertising?  The answer is that deep packet capture provides a history and dataset upon which &#8220;tell tracking&#8221; can be built &#8212; that is, it gives enough information to advertisers to fundamentally change the advertising landscape. By installing a capture box at the internet service provider (ISP) level, a 3rd generation advertising company would see all network traffic originating from or destined to that internet provider.  Every single instant message a consumer sends and every web page they visit can be used to build a profile which goes leaps beyond what DoubleClick has done.  Because the advertising company could see all traffic, with and without advertising alike, the amount of data available to find a consumer tell grows exponentially, as does the ability to convert adspace into sales.</p>
<p>Thus it becomes clear that packet capture is a required basis for 3rd generation advertising.  This basis isn&#8217;t sufficient to propel advertising into new efficiencies and conversion ratios though.  There is another critical component to be built on top of the packet capture, in the same way that flour is a critically important component in the final product when baking bread: statistics.</p>
<p>Statistics are the &#8220;how&#8221; for tell tracking. By building statistical profiles for all consumer traffic at once, advertisers can search for (and find) consumer behavior patterns which lead to sales.  Every iota of every data packet will be analyzed by a statistics engine, and each of these iotas will be analyzed in aggregate.  The statistical information generated from this process will be applied to other consumers&#8217; data, and a spider web of interconnected behavior and patterns will be built.</p>
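<p>In its very simplest form, tell tracking could amount to estimating how often a given behavior is followed by a sale. The Python sketch below does only that; the sessions and behavior labels are hypothetical, and a real statistics engine would need far more data and would have to account for combinations of behaviors, timing, and sampling noise.</p>

```python
from collections import Counter

# Hypothetical browsing sessions: (set of observed behaviors, did a sale follow?)
sessions = [
    ({"read_car_review", "visited_loan_calculator"}, True),
    ({"read_car_review"}, False),
    ({"visited_loan_calculator", "searched_dealers"}, True),
    ({"read_news"}, False),
    ({"read_car_review", "searched_dealers"}, True),
]


def sale_rate_by_behavior(data):
    """Estimate P(sale | behavior) as sales with the behavior / sessions with it."""
    seen = Counter()
    sold = Counter()
    for behaviors, sale in data:
        for behavior in behaviors:
            seen[behavior] += 1
            if sale:
                sold[behavior] += 1
    return {behavior: sold[behavior] / seen[behavior] for behavior in seen}


rates = sale_rate_by_behavior(sessions)
for behavior, rate in sorted(rates.items(), key=lambda kv: -kv[1]):
    print(f"{behavior}: {rate:.2f}")
```

<p>A behavior with a high rate across many sessions is a candidate tell; with only five sessions, as here, the numbers are of course meaningless, which is part of why capturing traffic in bulk matters to the argument.</p>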
<p>At this point it&#8217;s easy to see why the government is interested in installing packet capture devices wherever possible and how they might analyze that collected information.  Law enforcement will be made more efficient as will traditional intelligence gathering and anti-terrorist monitoring.  Cyber-warfare attacks will be more easily recognized, predicted, and prepared for.  The same style software will also be a boon to advertisers.</p>
<p>This spider web will allow advertisers to assign probabilities to all types of behavior.  The interesting behaviors include being able to tell with 78.3% confidence that Bob Richards is going to buy a new car in the next 3 days, and then selling Bob&#8217;s information and statistical profile to Ford or GM for hundreds of dollars.  The best part is that as time progresses and more data is collected, the spider web will grow bigger and become more accurate, possibly even to the point of realizing that Bob Richards is going to buy a new car before Bob himself knows it.  And because the advertisers aren&#8217;t looking at keywords to target their advertising, they could give Bob a car ad on a furniture restoration website and have higher conversion rates than placing an ad on <a href="http://www.edmunds.com/">Edmunds</a> today, hoping to find a car buyer.  Markets will take the next step toward ultimate efficiency, advertising will become cheaper on a cost-per-lead basis, and consumers will see more ads that are relevant to them.</p>
<p>So why hasn&#8217;t this happened before now?  Simply put, computers weren&#8217;t fast enough, storage wasn&#8217;t cheap enough, and the industry tends to move in small baby steps without making the giant leap to the next revolution.  The gap is still wide, and an enterprising company stands to create and own a new market which uses statistics to push targeted advertising into its third generation.</p>
]]></content:encoded>
			<wfw:commentRss>http://matt.stampede.org/?feed=rss2&amp;p=16</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>New Utah Real Estate Purchase Contract (REPC)</title>
		<link>http://matt.stampede.org/?p=8</link>
		<comments>http://matt.stampede.org/?p=8#comments</comments>
		<pubDate>Sat, 31 May 2008 22:44:37 +0000</pubDate>
		<dc:creator>matt</dc:creator>
				<category><![CDATA[Real Estate]]></category>
		<category><![CDATA[REPC]]></category>

		<guid isPermaLink="false">http://matt.stampede.org/?p=8</guid>
		<description><![CDATA[Utah&#8217;s draft Real Estate Purchase Contract (REPC) is online and currently in its public comment phase.  I&#8217;ve given the state my feedback on the contract.  I&#8217;m quite happy to see that the contract has language governing e-mail transmissions and other electronic transmissions, but I was disappointed to see that the Real Estate Commission has yet [...]]]></description>
			<content:encoded><![CDATA[<p>Utah&#8217;s <a href="http://www.realestate.utah.gov/repc.html">draft Real Estate Purchase Contract (REPC)</a> is online and currently in its public comment phase.  I&#8217;ve given the state my feedback on the contract.  I&#8217;m quite happy to see that the contract has language governing e-mail transmissions and other electronic transmissions, but I was disappointed to see that the Real Estate Commission has yet to address the confusing verbiage at the top of the contract:</p>
<blockquote><p>Utah law requires real estate licensees to use this form.  Buyer and Seller, however, may agree to alter or delete its provisions or to use a different form.</p></blockquote>
<p><span id="more-8"></span></p>
<p>Unfortunately, this leaves licensees like me stuck in a conundrum:  There are many contracts for the purchase (or eventual purchase) of real estate in which I am a principal (that is, a Buyer or Seller), and which aren&#8217;t appropriate for the REPC.   I&#8217;d like to use some other contract.  Am I allowed to?  The verbiage is unclear and leaves a logical loophole.  I contacted the Utah Association of REALTORs and asked their lawyer this question.  His answer?  In his opinion, acting as a principal gives you the same recognitions that a non-licensed Buyer or Seller has.  He followed this opinion with the phrase, &#8220;just to be safe, you may want to contact the Division of Real Estate&#8221;.</p>
<p>I followed up with the Division of Real Estate&#8217;s principal lawyer.  His answer:  The law is vague, but you must always use the Utah approved REPC, regardless of your status in the transaction.  So: two lawyers who specialize in Real Estate Law, and two different answers.  Can&#8217;t we just clear up the language?  I hate wasting the paper to print a REPC, only to write an addendum that says all of its language replaces and supersedes the REPC.  Can&#8217;t we do a bit better?</p>
]]></content:encoded>
			<wfw:commentRss>http://matt.stampede.org/?feed=rss2&amp;p=8</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>iFolder, Unison, PowerFolder, and multiple-computer synchronization</title>
		<link>http://matt.stampede.org/?p=11</link>
		<comments>http://matt.stampede.org/?p=11#comments</comments>
		<pubDate>Tue, 01 Apr 2008 17:01:37 +0000</pubDate>
		<dc:creator>matt</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://matt.stampede.org/?p=11</guid>
		<description><![CDATA[I regularly use 5 computers a week &#8212; one at my office at the University of Utah, one at my office at Solera Networks, one at my home office, a laptop as a research terminal, and a laptop for Solera work. This presents a serious challenge, inasmuch as there is data which I&#8217;d like to [...]]]></description>
			<content:encoded><![CDATA[<p>I regularly use 5 computers a week &#8212; one at my office at the <a href="http://www.physics.utah.edu/">University of Utah</a>, one at my office at <a href="http://www.soleranetworks.com/">Solera Networks</a>, one at my home office, a laptop as a research terminal, and a laptop for Solera work. This presents a serious challenge, inasmuch as there is data which I&#8217;d like to be present and modifiable in each place without worrying about copying files to-and-fro over and over again.</p>
<p><span id="more-11"></span>There are three basic solutions for a problem like this:</p>
<ul>
<li>Use a web storage and document platform, like Apple&#8217;s <a href="http://www.apple.com/mobileme/">MobileMe</a> or Amazon&#8217;s <a href="http://www.amazon.com/gp/browse.html?node=16427261">S3</a> service, or through a home-rolled storage platform running on <a href="http://stampede.org">my co-located server</a>.</li>
<li>Manually copy files from one computer to the others as required.</li>
<li>Use synchronization software.</li>
</ul>
<p>Because of the way I choose to use computers, there are additional constraints imposed.  First, I use Linux, which means the software I&#8217;m interested in should work in Linux.  Second, and even more important, my laptops only have periodic internet connectivity, though I want access to my documents and files all of the time, which means the documents <span style="text-decoration: underline;">must</span> be cached locally.  Third, I develop applications for many platforms like the GP2X, the iPhone, and the Nintendo DS; I&#8217;d like to build the development environments for each of these targets once, and let the synchronization software distribute the toolchains and source code as appropriate.  This requirement demands that synchronization software be able to handle file attributes (like the executable bit), and be able to work with many, many files.</p>
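<p>The file-attribute requirement is easy to check after a trial synchronization.  The following is only a sketch, assuming GNU <code>find</code> (for <code>-printf</code>) and file names without newlines; the function names <code>list_modes</code> and <code>compare_modes</code> are made up for this illustration and are not part of any of the tools discussed here.  It compares the permission bits of every regular file in an original tree against its synchronized copy.</p>

```shell
#!/bin/bash
# Hypothetical helper: check that a synchronized copy kept the permission
# bits (for example the executable bit) of the original tree.
# Assumes GNU find, which provides -printf.

# Print "octal-mode ./relative/path" for every regular file, sorted by path.
list_modes() {
    (
        cd "$1" || exit 1
        find . -type f -printf '%m %p\n' | sort -k 2
    )
}

# Return 0 when both trees hold the same files with the same modes,
# and 1 otherwise.
compare_modes() {
    original=$(list_modes "$1")
    copy=$(list_modes "$2")
    if [ "$original" = "$copy" ]; then
        echo "modes match"
    else
        echo "modes differ"
        return 1
    fi
}

# Example (paths are placeholders):
#   compare_modes "$HOME/Projects" /mnt/server/Projects
```

<p>A synchronizer that drops the executable bit on a toolchain binary would make <code>compare_modes</code> report a difference, which is exactly the failure this requirement is meant to rule out.</p>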
<p>Unfortunately for me, none of the software I&#8217;ve found fits these needs very well.  The primary contenders that I&#8217;ve studied are: <a href="http://www.ifolder.com/">iFolder</a>, <a href="http://www.powerfolder.com/">PowerFolder</a>, and <a href="http://www.cis.upenn.edu/~bcpierce/unison/">Unison</a>.</p>
<h3>iFolder</h3>
<p>Of the products that I tried, iFolder seemed like an immediate slam dunk.  Not only does iFolder support Linux, it supports Mac OS X and Windows 2000+.  It also stores files on the server with encryption (which is optionally not viewable by even the administrator), it operates in peer-to-peer or centralized server mode, it supports disconnected operations, and it&#8217;s a mature, Novell-backed product, or so it would seem.  Unfortunately, there are some serious flaws which I encountered just during the setup process:</p>
<ul>
<li>iFolder is amazingly hard to compile, and the directions to do so are quite out-dated.</li>
<li>iFolder setup is designed for an old version of SuSE.  None of the file-paths exist on most distributions, including newer versions of openSuSE.</li>
<li>iFolder requires mod-mono and other odd-ball software extensions.</li>
<li>Peer-to-peer client mode doesn&#8217;t actually work.</li>
<li>Clients need big chunks of the server to compile and run.</li>
</ul>
<p>After many hours of trying to overcome these issues and wiping my home file server to install openSuSE, I was able to get an iFolder client and server running, using the server package from the openSuSE build service and home-rolled clients based on the packaging of simias and ifolder-client in the Ubuntu Launchpad.  At least now I&#8217;d have a working solution, even if it meant I had to run openSuSE on my file server (it&#8217;s not all bad, just slow due to its debug kernel and access controls, and it&#8217;s non-uniform, which presents other problems).</p>
<p>It&#8217;s too bad, however, that things weren&#8217;t this easy.  iFolder did work properly and easily on my simple &#8220;school/&#8221; folder, which contains all of my recent school work.  After this folder, I tried to synchronize my &#8220;cabinet/&#8221; folder, which is my digital equivalent of a file cabinet.  iFolder got slower and used more memory, but was able to deal with this folder properly as well.</p>
<p>Finally, I tried to synchronize my &#8220;Projects/&#8221; directory, which contains a few gigabytes worth of toolchains and coding projects.  The result?  iFolder pushed my load average above 10, used all of my available memory and swap, and made the OOM killer go wild on my workstations.  Oy vey!</p>
<p>And thus, we keep looking for tools.</p>
<h3>Unison</h3>
<p>Unison is cross-platform just like iFolder, which is a plus but not a requirement.  Unlike iFolder, Unison doesn&#8217;t have a persistent client which runs continuously watching for a server connection, or watching for changes.  Instead, you must run it manually whenever you want to synchronize, or use cron, scripting, or some other task to make it run periodically and automatically, and to make it support offline operations.  I wrote a simple shell script which does some of this:</p>
<pre>#!/bin/bash
#
# Simple script to check for server connectivity and try and synchronize
# files with unison
# 

# The time to wait in-between synchronizations
SYNC_TIME=10m

# The unison profiles to synchronize
PROFILES="school cabinet Projects"

# The server to check for connectivity.
SERVER=cessnaii.local

while true; do
      for project in $PROFILES; do
          # Only synchronize when the server answers a single ping
          ping -c 1 "$SERVER" &gt;/dev/null 2&gt;&amp;1 &amp;&amp; \
              unison "$project" -batch
      done
      sleep $SYNC_TIME
done</pre>
<p>There are a few problems with this script, but it serves well with some caveats.  The problems:</p>
<ul>
<li>You must have pre-configured the unison profiles.  You can do this with unison-gtk.</li>
<li>You should have ssh key exchange pre-configured.</li>
</ul>
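<p>Both prerequisites can also be handled from the command line instead of unison-gtk.  What follows is a rough sketch under stated assumptions: the remote base directory <code>/srv/sync</code> and the helper name <code>write_profile</code> are hypothetical, and each folder is assumed to live directly under <code>$HOME</code> on the local machine.  It writes a minimal two-root profile of the kind unison reads from its profile directory.</p>

```shell
#!/bin/bash
# Sketch: create a minimal unison profile for one folder.
# Assumptions (not from the original script): folders live under $HOME
# locally and under REMOTE_BASE on the server.

SERVER=cessnaii.local
REMOTE_BASE=/srv/sync    # hypothetical location on the server

# write_profile NAME PROFILE_DIR
# Writes PROFILE_DIR/NAME.prf with a local root and an ssh root.
write_profile() {
    name=$1
    dir=$2
    mkdir -p "$dir" || return 1
    # REMOTE_BASE begins with "/", so the URL gets the double slash
    # unison uses for an absolute path on the remote host.
    printf 'root = %s\nroot = ssh://%s/%s\n' \
        "$HOME/$name" "$SERVER" "$REMOTE_BASE/$name" > "$dir/$name.prf"
}

# Example: one profile per folder, stored in unison's usual directory.
#   for p in school cabinet Projects; do write_profile "$p" "$HOME/.unison"; done
#
# One-time ssh key exchange, so the batch runs never prompt for a password:
#   ssh-keygen -t ed25519
#   ssh-copy-id "$SERVER"
```

<p>With the profiles in place, <code>unison school -batch</code> picks up <code>school.prf</code> by name, which is what the loop above relies on.</p>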
<p>The other problems are general problems with unison &#8212; you must store to a VFS file system, and there is a master copy.  (Alternatives might be storing to davfs, storing to a database, storing to sshfs, and similar.)</p>
<h3>PowerFolder</h3>
<p>PowerFolder is commercial file synchronization software with an open source component branch.  Because PowerFolder is a Java application, it works on any platform that supports Java (in theory).  I did all of my testing with the open source version of PowerFolder, because I didn&#8217;t want to pay a fair amount of money for their commercial offering, which has some limitations (to be discussed shortly).</p>
<p>PowerFolder worked somewhat well, but it suffered from similar problems to iFolder.  I won&#8217;t go into the details because I believe the free software alternatives are better than PowerFolder, but the overview is:</p>
<ul>
<li>PowerFolder didn&#8217;t properly handle Unix file attributes.</li>
<li>PowerFolder&#8217;s memory usage grew and grew on big folders.</li>
<li>PowerFolder didn&#8217;t support offline operation.</li>
<li>Java requires a JVM, which typically requires a 32-bit machine in Intel land (though IcedTea and OpenJDK are correcting that, and GCJ is always a viable alternative.)</li>
</ul>
<h3>The Brave, New World</h3>
<p>In the end, I&#8217;ve decided to write a better synchronizer, one that is smarter and easier than the current offerings, and one that doesn&#8217;t require 64GB of memory to operate on large folders! (I&#8217;ll consider 640k as about the right amount of memory usage.)  I&#8217;m planning to implement this tool as a Qt-based systray applet, with the following properties:</p>
<ul>
<li>Automatically detect server presence, synchronize as required.</li>
<li>Extremely low CPU and memory usage.</li>
<li>Simple GUI configuration tool.</li>
<li>Allow encrypted storage.</li>
<li>Use storage plugins, specifically a plugin for generic filesystems and a plugin for KIO slaves (which will automatically support DAV, SSH, NFS, CIFS, cameras, etc.).</li>
<li>Portable between Windows, Linux/UNIX, and Mac OS X.</li>
<li>Use KDE Wallet and similar services for password storage as required.</li>
</ul>
<p>Stay tuned for more updates!</p>
]]></content:encoded>
			<wfw:commentRss>http://matt.stampede.org/?feed=rss2&amp;p=11</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Left Handed Materials and Redshift</title>
		<link>http://matt.stampede.org/?p=3</link>
		<comments>http://matt.stampede.org/?p=3#comments</comments>
		<pubDate>Thu, 20 Mar 2008 21:22:42 +0000</pubDate>
		<dc:creator>matt</dc:creator>
				<category><![CDATA[Physics]]></category>
		<category><![CDATA[Left-Handed Materials]]></category>
		<category><![CDATA[LHM]]></category>

		<guid isPermaLink="false">http://matt.stampede.org/?p=3</guid>
		<description><![CDATA[In my Spring 2007 Electrodynamics II class at the University of Utah, Dr. Efros mentioned odd possibilities with respect to Left-Handed Materials (LHMs) in our universe. LHMs are a special kind of material (lovingly called a meta-material) which have electromagnetic properties contrary to what we&#8217;re used to experiencing in our every-day world.
Dr. Efros hypothesized about [...]]]></description>
			<content:encoded><![CDATA[<p>In my Spring 2007 Electrodynamics II class at the University of Utah, <a title="Dr. Efros" href="http://www.physics.utah.edu/~efros">Dr. Efros</a> mentioned odd possibilities with respect to <a title="Left Handed Materials (LHMs)" href="http://en.wikipedia.org/wiki/Left-handed_material">Left-Handed Materials (LHMs)</a> in our universe. LHMs are a special kind of material (lovingly called a meta-material) which have electromagnetic properties contrary to what we&#8217;re used to experiencing in our every-day world.</p>
<p>Dr. Efros hypothesized about how these LHMs would affect the matter distribution in the Universe.  He showed how divergent electromagnetic waves (including light) could be re-focused while travelling through the LHM, which would skew the matter distribution seen in the Universe (things would appear closer than they actually are).  He also suggested that he couldn&#8217;t think of a way in which we would know that this is happening.</p>
<p>I believe that we could determine if this is happening, in principle, using standard candles as a yardstick to compare with galactic redshift.  Although light from celestial objects would appear refocused, it would still be subject to redshift effects.  By comparing the two, we could in principle determine if we&#8217;re peering through LHMs.</p>
]]></content:encoded>
			<wfw:commentRss>http://matt.stampede.org/?feed=rss2&amp;p=3</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
