An approach to web development based on nodes


Since its earliest days, the World Wide Web (WWW) has relied on the URL to locate its vast resources. In most cases, once the agent recognizes the scheme (or protocol) part, a nameserver resolves the domain part, and the request is then sent to the appropriate port of a web server for further processing. The web server, in turn, analyzes the request and eventually produces a response.

A fundamental part of the web server’s request analysis is mapping the path to a resource. To do that, the server employs a predefined mechanism which, in its most basic form, is a simple 1-to-1 mapping to a portion of the local filesystem. For example, a URL like http://www.example.com/path/to/image.png would be mapped to a local filesystem path such as /var/www/path/to/image.png (/var/www being the “document root” of the server). If the local path resolves to an existing file, that file is served. Otherwise, an HTTP 404 error is typically returned to the agent.
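
The basic mapping can be sketched in a few lines of PHP. This is only an illustration of the idea, not how any particular web server implements it; the function name map_to_document_root() is made up for this example, and /var/www is the document root assumed above.

```php
<?php
// Hypothetical helper: maps the path part of a URL onto a document root.
// It only builds the candidate path; it does not check that the file exists.
function map_to_document_root($url,$docroot)
    {
    $path=parse_url($url,PHP_URL_PATH); // e.g. "/path/to/image.png"
    if(!is_string($path)) $path='/';    // no path in the URL: fall back to the root
    return rtrim($docroot,'/').'/'.ltrim($path,'/');
    }

echo map_to_document_root('http://www.example.com/path/to/image.png','/var/www')."\n";
// prints /var/www/path/to/image.png
```

A real server also has to normalize the path and reject anything that escapes the document root (for example through .. segments), which this sketch leaves out.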

This implementation is more than sufficient for most cases, because it provides a straightforward web-to-local mapping of both static and dynamic resources. In the latter case, the resource is a custom procedure (i.e. a script) that has to be executed in order to produce the response on the fly. While it executes, the script takes several other parameters into account, some of which extend the resource’s “coordinates” beyond the script’s location. The following are some of the most common techniques used to accomplish that:

  • The PATH_INFO variable, which holds whatever comes after the script, e.g. http://www.example.com/path/to/script.php/extra/path/info
  • The QUERY part of the URL, e.g. http://www.example.com/path/to/script.php?area=extra&segment=path&item=info
  • A combined approach, which carries an actual path in a QUERY argument, e.g. http://www.example.com/path/to/script.php?path=/extra/path/info

Rewrite rules can be applied on the web server in order to provide the user with friendlier, prettier URLs, while the script can still access the several portions of the URL as originally intended. But that’s not the point of this article. The point is, rather, the fact that a portion of the URL sometimes has to be “virtual” and be handled by the script. Also, we are interested in situations where, from the script’s location downwards, the hierarchy remains tree-like — in other words, a namespace. From now on, we will confine the meaning of the request path to this namespace (i.e. whatever follows the path to the script) and represent it as a typical path, regardless of the actual technique chosen.
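
A minimal sketch of that normalization follows. The function name request_segments() and the path query argument are assumptions made for this example; the sketch covers the PATH_INFO technique and the combined one, and returns the request path as an array of segments.

```php
<?php
// Hypothetical helper: reduces the request to an array of path segments.
// $server and $query stand in for $_SERVER and $_GET, so it can be tried without a web server.
function request_segments(array $server,array $query)
    {
    if(isset($server['PATH_INFO'])&&$server['PATH_INFO']!=='')
        $p=$server['PATH_INFO'];  // script.php/extra/path/info
    elseif(isset($query['path'])&&is_string($query['path']))
        $p=$query['path'];        // script.php?path=/extra/path/info
    else
        $p='/';                   // nothing after the script: the namespace root
    $p=trim($p,'/');
    return ($p===''?array():explode('/',$p));
    }

print_r(request_segments(array('PATH_INFO'=>'/extra/path/info'),array()));
print_r(request_segments(array(),array('path'=>'/extra/path/info')));
```

Both calls yield the same three segments (extra, path, info), so the code that handles the namespace does not need to know which technique delivered the path.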

The Node

Consider that script.php has to handle the following namespace, visualized as a PHP array for readability:

$namespace=array(
    'water'=>array(
        'fish'=>array('bass','trout','barracuda'),
        'misc'=>array('seaweed','crab'),
        ),
    'air'=>array(
        'eagle',
        'bee'=>array('bee-hive'),
        ),
    'admin'=>array(
        'users'=>array('alice','bob','charlie'),
        'articles'=>array('bass','trout','barracuda','seaweed','crab','eagle','bee-hive'),
        'categories'=>array(
            'water',
            'air'=>array('bee'),
            ),
        ),
    );

This (remotely…) resembles a part of a simple, imaginary CMS. Each article is accessed through its full category path, since the categories are hierarchical. But when it comes to editing, articles are accessible through an /admin/articles/{id} path. The same goes for users and categories, only in the latter case the hierarchy is retained.

As with all tree structures, the namespace consists of branch and leaf nodes. A leaf node is one with no children, while a branch has at least one. Although this is true structure-wise, there is another, more important, distinction between branch and leaf nodes: a request-wise one. When, for example, /admin/users is requested, the leaf node for the request is users, even though users has children of its own. It must respond with, let’s say, a table of users for the admin to choose from. In other words, it should activate a “content handler” routine and produce the response. From this point on, when we refer to a leaf node, we will mean it in a request-wise sense.

Now, what would happen if /air/bee/bee-hive/print was requested? Since bee-hive has no child named print, the path cannot be resolved any further, so bee-hive is our last chance to handle the request. The unresolved (redundant) portion of the path should still be passed to bee-hive’s “content handler” as a possible set of parameters.
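
To make the request-wise leaf concrete, here is a rough sketch that walks a reduced copy of the namespace array above and reports which node ends up as the leaf and which part of the path is left over. The function names child_of() and resolve_leaf() are invented for this illustration; the node classes defined later in the article do this differently.

```php
<?php
$namespace=array(
    'water'=>array(
        'fish'=>array('bass','trout','barracuda'),
        ),
    'air'=>array(
        'eagle',
        'bee'=>array('bee-hive'),
        ),
    );

// Returns the children of the child called $name, or NULL if there is no such child.
// A child is either a 'name'=>array(...) entry or a plain 'name' value (no children).
function child_of(array $children,$name)
    {
    if(isset($children[$name])&&is_array($children[$name])) return $children[$name];
    foreach($children as $key=>$value)
        if(is_int($key)&&$value===$name) return array();
    return null;
    }

// Walks $path from the root and returns array(leaf name, redundant segments).
function resolve_leaf(array $namespace,array $path)
    {
    $leaf='(root)';
    $children=$namespace;
    while($path)
        {
        $next=child_of($children,$path[0]);
        if($next===null) break; // the rest of the path cannot be resolved
        $leaf=array_shift($path);
        $children=$next;
        }
    return array($leaf,$path);
    }

list($leaf,$rest)=resolve_leaf($namespace,array('air','bee','bee-hive','print'));
echo $leaf.' receives: '.implode('/',$rest)."\n"; // bee-hive receives: print
```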

In order to serve a request, the tree must be traversed from the root node to the leaf. This way, all ancestors of the leaf get the chance to shape the environment in which the content handler will execute. For example, bass will behave differently when accessed through /water/fish than through /admin/articles. This implies that a node should be aware not only of its children but also of its parent. By knowing a node’s parent, we automatically know its parent’s parent and, recursively, we can eventually reach the root (given that the root’s parent will always be NULL).
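
The recursion over parents can be sketched independently of the rest of the design. The SketchNode class below is a throwaway stand-in invented for this example (it is not the node class defined next); it only shows that a parent reference is enough to reach the root and to tell which route a node was reached through.

```php
<?php
// Throwaway stand-in for a node: just a name and a reference to the parent.
class SketchNode
    {
    public $name;
    public $parent;
    function __construct($name,$parent=null)
        {
        $this->name=$name;
        $this->parent=$parent;
        }
    // Follows the parent references up to the node whose parent is NULL.
    function root() { return ($this->parent?$this->parent->root():$this); }
    // Rebuilds the full path by prepending each ancestor's name.
    function path() { return ($this->parent?$this->parent->path().'/'.$this->name:''); }
    }

$root=new SketchNode('');
$public=new SketchNode('bass',new SketchNode('fish',new SketchNode('water',$root)));
$admin=new SketchNode('bass',new SketchNode('articles',new SketchNode('admin',$root)));

echo $public->path()."\n"; // /water/fish/bass
echo $admin->path()."\n";  // /admin/articles/bass
```

Both bass nodes share the same root, yet each one can tell from its ancestors whether it is being viewed or edited.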

Now let’s define a basic, abstract node that encapsulates all the above:

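<?php
abstract class NodeBase
    {

    private $parent=null;   // NULL for the root node
    private $nodes=array(); // children, filled in by traverse()

    final function parent() { return $this->parent; }
    final function root() { return ($this->parent?$this->parent->root():$this); }

    function __construct(NodeBase $parent=NULL)
        {
        $this->parent=$parent;
        }

    // Must return an associative array of child nodes, keyed by path segment.
    abstract function create_my_nodes();

    final function traverse(array $path)
        {
        $this->nodes=$this->create_my_nodes();
        $i=array_shift($path); // next path segment, or NULL if the path is exhausted
        if($i!==null&&isset($this->nodes[$i]))
            $this->nodes[$i]->traverse($path); // there is a child named $i
        else // reached the end of the path, or the remaining cannot be resolved
            {
            if($i!==null) array_unshift($path,$i); // put the unresolved segment back
            $this->leaf_execute($path);
            }
        }

    // The content handler. traverse() passes the unresolved remainder of the path
    // as an extra argument, which an implementation can read with func_get_args().
    abstract function leaf_execute();

    }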

The leaf_execute() method plays the content handler’s role. Both it and create_my_nodes(), which defines the node’s children, are left to the specific node type to implement. The former should output the actual content that the agent receives when the node acts as a leaf, while the latter should return an associative array of the node’s children, keyed by the path segment that selects each child. You probably noticed that create_my_nodes() is not called in the node’s constructor but just before deciding whether the path should be traversed further. This is not only because the children would be useless before that point, but also because creating nodes in the constructor could cause serious problems. We’ll get to that later on.

Other than that, the traverse() method is pretty straightforward: The $path parameter is an array of the remaining path to be followed. If the first element corresponds to a valid child node, that node is recursively traversed, with the rest of the path as a parameter. Otherwise, the current node is considered as the leaf node, and the redundant path, if any, is fed to its content handler.
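
To watch this happen, the following sketch re-implements the same traversal idea in a compact, self-contained form. TraceNode and its $log are made up for the illustration and simplify the real class (children are described by a nested array instead of being created by subclasses); it records each node as it is visited and what the leaf finally receives.

```php
<?php
// Compact stand-in for the traversal described above (not the article's NodeBase).
class TraceNode
    {
    public static $log=array();
    private $name;
    private $spec; // nested array describing this node's children
    function __construct($name,array $spec)
        {
        $this->name=$name;
        $this->spec=$spec;
        }
    function traverse(array $path)
        {
        self::$log[]='visit '.$this->name;
        $i=array_shift($path); // NULL when nothing is left
        if($i!==null&&isset($this->spec[$i]))
            {
            $child=new TraceNode($i,$this->spec[$i]); // the child is created only now
            $child->traverse($path);
            }
        else
            {
            if($i!==null) array_unshift($path,$i); // put the unresolved segment back
            self::$log[]='leaf '.$this->name.' gets ['.implode(',',$path).']';
            }
        }
    }

$spec=array('air'=>array('bee'=>array('bee-hive'=>array())));
$root=new TraceNode('root',$spec);
$root->traverse(array('air','bee','bee-hive','print'));
echo implode("\n",TraceNode::$log)."\n";
```

The log lists root, air, bee and bee-hive in that order, followed by a final entry showing that bee-hive acts as the leaf and receives print as its redundant path.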

Example: A childless root

To demonstrate a simple leaf_execute() implementation, as well as the simplest application possible, suppose the code above is saved in a nodebase.class.php file. Then, consider the following script.php:

<?php
include "nodebase.class.php";

class MyRoot extends NodeBase
    {
    function create_my_nodes() {return array();} // no children for me please!
    function leaf_execute()
        {
        echo "Hello, I'm a childless ".get_class($this);
        }
    }
    
$root=new MyRoot(NULL);
$root->traverse(array());

When script.php is invoked, the agent should receive a message saying “Hello, I’m a childless MyRoot”. Notice that MyRoot::create_my_nodes() returns an empty array, meaning that the node has no children. Moreover, when we instantiate our root node, we give it a NULL parent, as all root nodes should have. Finally, we traverse the “whole thing” using an empty array as the path to be followed, simply because there’s no path to follow.

Example: An infinite tree

This example will demonstrate the creation of NodeBase descendants as well as a namespace. It is also meant to show the reason why a node’s children should be created as late as possible.

Consider the following class definitions, stored in infinite.php:

<?php
class Branch extends NodeBase
    {
    function create_my_nodes()
        {
        return array(
            'branch'=>new Branch($this), // branch over and over
            'leaf'=>new Leaf($this),
            );
        }
    function leaf_execute()
        {
        echo "Hi, I'm a branch";
        }
    }
    
class Leaf extends NodeBase
    {
    function create_my_nodes() {return array();} // a leaf has no children
    function leaf_execute()
        {
        echo "Hi, I'm a leaf";
        }
    }

You can see that a Branch node always has two children: A Leaf and another Branch. Recursively, the latter will also have a Leaf and a Branch, and so on and so forth, to infinity. If we were to construct a node’s children during its own construction, the whole namespace would be created before the root was ever traversed! Apart from the unnecessary creation of a potentially large namespace, in our case the program would be driven into an infinite recursion.
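
The effect of late creation can be checked with a small counting experiment. The classes below are simplified stand-ins written for this example (CountedNode, CountedBranch and CountedLeaf are not part of the article's code): every constructor call increments a counter, and children are only built when a node is actually asked to descend.

```php
<?php
// Simplified stand-ins that count how many nodes get instantiated.
abstract class CountedNode
    {
    public static $created=0;
    function __construct() { self::$created++; }
    abstract function children(); // built on demand, never in the constructor
    function descend(array $path)
        {
        $kids=$this->children();
        $i=array_shift($path);
        if($i!==null&&isset($kids[$i])) return $kids[$i]->descend($path);
        return $this; // this node is the leaf of the request
        }
    }

class CountedBranch extends CountedNode
    {
    function children() { return array('branch'=>new CountedBranch(),'leaf'=>new CountedLeaf()); }
    }

class CountedLeaf extends CountedNode
    {
    function children() { return array(); }
    }

$root=new CountedBranch();
$leaf=$root->descend(array('branch','branch','leaf'));
echo get_class($leaf).' reached after creating '.CountedNode::$created." nodes\n";
// prints CountedLeaf reached after creating 7 nodes
```

Only the nodes along the requested path, plus their immediate siblings, are ever created. Had the children been built in the constructor, new CountedBranch() would never return, because each branch would immediately construct another one.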

Now, consider the following script.php. This time, we will use the PATH_INFO variable to determine the path requested.

<?php
include "nodebase.class.php";
include "infinite.php";

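class MyRoot extends NodeBase
    {
    function create_my_nodes()
        {
        return array('test'=>new Branch($this));
        }
    function leaf_execute()
        {
        echo "The Root Of an Infinite tree!";
        }
    }

$TheRoot=new MyRoot(null);

// PATH_INFO is absent when nothing follows the script name
$p=isset($_SERVER['PATH_INFO'])?$_SERVER['PATH_INFO']:'/';
$p=trim($p,'/');
// explode() on an empty string would yield array(''), so treat "/" as an empty path
$path=($p===''?array():explode('/',$p));
$TheRoot->traverse($path);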

The concept is the same as before. A MyRoot node is created, with a Branch child node named “test”. Moreover, the PATH_INFO variable is used to produce the array that represents the requested path. Valid paths are /, /test, /test/leaf, /test/branch, /test/branch/leaf, /test/branch/branch, /test/branch/branch/leaf and so on.

These two examples merely scratch the surface of this programming technique. There are more examples to come in other articles, maybe including the initial “CMS” one.

Be safe!
