PHP Application server - GSoC proposal

After getting not-so-good feedback when I presented my idea on the PHP dev mailing list (probably because of my “English”), I decided to write a blog post about it, with the main goal of clarifying all the unclear points.

Objectives

  1. A simple working principle.
  2. Painless migration: people should be able to copy existing code, put it on a worker, and have it work.
  3. Easy scalability: just add more workers as needed.
  4. A simple and effortless way to deploy an application across many servers.

Some concepts

  1. Client: The client has a list of workers. It chooses one at random (the list lives on the client itself, as with memcached) and either calls a function or queues a job.
  2. Worker: A C or C++ application with embedded PHP, which listens on a TCP port (probably 25887). The worker accepts requests from anywhere, but only a single IP (that of the master process) can submit and deploy code.
  3. Master process: A PHP web page where the sysadmin can upload PHP code and “deploy” it (submit it to every worker).
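
A minimal sketch of how the client could choose a worker, written in Python only for brevity (the real client would be PHP). The addresses and both function names are made up for illustration; only the port number comes from the list above. Returning the workers in random order is my own assumption about how a client might fall back when a connection fails.

```python
import random

# Hypothetical worker list: the addresses are invented, 25887 is the
# port mentioned above as the probable default.
WORKERS = [("10.0.0.1", 25887), ("10.0.0.2", 25887), ("10.0.0.3", 25887)]


def pick_worker(workers, rng=random):
    """Return one (host, port) pair chosen uniformly at random."""
    if not workers:
        raise ValueError("no workers configured")
    return rng.choice(workers)


def workers_to_try(workers, rng=random):
    """Return every worker in random order, so that a caller can move on
    to the next one if a connection attempt fails (an assumption, not
    something the proposal specifies)."""
    shuffled = list(workers)
    rng.shuffle(shuffled)
    return shuffled
```

Because every client picks independently, no coordinator is needed to spread the load.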

System overview

[Figure: phpserver - system overview]
The figure above shows how the system will look. The clients (PHP scripts that run on the web server) can connect to any worker and either request a function or queue a procedure. For a function the client expects a response; for a procedure it does not. The master process is the only one allowed to submit PHP code to the workers, which gives the system a really easy way to deploy applications.

Advantages

  1. A single point of source code distribution.
  2. There is no central point (a.k.a. manager process): the client speaks directly to a worker, which keeps things really simple and fault tolerant. In traditional models, if the manager crashes, nothing works.
  3. I am planning to add a cache layer on the client, so that calling the same function with the same parameters does not hit the network again, provided the function is cacheable.
  4. It can be used to process large amounts of data in parallel, as Hadoop does.
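
The client-side cache from point 3 could look roughly like the sketch below (Python only for brevity; the real client would be PHP). The class name, the time-to-live and the `remote_call` callback are all assumptions of mine; the proposal does not yet say how cache entries expire.

```python
import time


class ClientCache:
    """Tiny in-process cache keyed by function name and arguments."""

    def __init__(self, ttl_seconds=60.0, clock=time.monotonic):
        self._ttl = ttl_seconds      # assumed expiry policy
        self._clock = clock          # injectable so it can be tested
        self._entries = {}           # key -> (expires_at, value)

    def call(self, name, args, remote_call):
        key = (name, tuple(args))    # arguments must be hashable
        now = self._clock()
        hit = self._entries.get(key)
        if hit is not None and hit[0] > now:
            return hit[1]            # served locally, no network trip
        value = remote_call(name, args)
        self._entries[key] = (now + self._ttl, value)
        return value
```

Only functions marked as cacheable should go through such a cache; anything with side effects has to reach the worker every time.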

Why

The big question: why should I use this to scale if there are plenty of other well-tested methods to scale?

Answering this question without fanaticism is quite hard, but I will try. My goal is to provide an easy, painless and efficient way to scale; those three words are the difference. In theory you will be able to migrate a site by simply cutting code from your application and pasting it into the application server (through the master web admin page). The master web admin page will also be able to generate proxy functions with the same names as the functions that were moved to the appserver, so for your application (on the web server) nothing has changed. Those proxy functions will also have cache support, if the function’s return value can be cached.
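
To illustrate the proxy idea, here is a minimal sketch in Python (for brevity only; the generated proxies would be PHP). `make_proxy` and `fake_remote_call` are hypothetical names, and the fake call stands in for the real network request so that the example runs on its own. Caching is left out to keep it short.

```python
def make_proxy(name, remote_call):
    """Build a local function that forwards its arguments to the appserver
    under the given name."""
    def proxy(*args):
        return remote_call(name, args)
    proxy.__name__ = name  # keep the old name visible to the application
    return proxy


# Stand-in for the network call, used only to make the sketch runnable.
def fake_remote_call(name, args):
    return f"{name} called with {args!r}"


# The application keeps calling Register_User(...) exactly as before.
Register_User = make_proxy("Register_User", fake_remote_call)
```

The calling code does not change at all; only the body behind the name moves to another machine.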

Part of its efficiency would come from the fact that there is no human-readable protocol (no JSON, XML or HTTP), just a method for fast serialization and deserialization that represents the data in a binary format.
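
The wire format is not defined yet, so the following is only a sketch of what a binary request frame could look like (in Python for brevity). The header layout, the kind codes and both function names are assumptions: one byte says whether the request is a function (a reply is expected) or a procedure (queued, no reply), followed by the lengths of the function name and of the already-serialized arguments.

```python
import struct

# Assumed frame layout (not part of the proposal), big-endian:
#   1 byte   kind            (1 = function, 2 = procedure)
#   2 bytes  name length
#   4 bytes  payload length
#   name bytes (UTF-8), then payload bytes (serialized arguments)
HEADER = struct.Struct("!BHI")
KIND_FUNCTION = 1
KIND_PROCEDURE = 2


def encode_request(kind, name, payload):
    """Pack one request into a single binary frame."""
    name_bytes = name.encode("utf-8")
    header = HEADER.pack(kind, len(name_bytes), len(payload))
    return header + name_bytes + payload


def decode_request(frame):
    """Unpack a frame produced by encode_request."""
    kind, name_len, payload_len = HEADER.unpack_from(frame, 0)
    start = HEADER.size
    name_bytes = frame[start:start + name_len]
    payload = frame[start + name_len:start + name_len + payload_len]
    if len(name_bytes) != name_len or len(payload) != payload_len:
        raise ValueError("truncated frame")
    return kind, name_bytes.decode("utf-8"), payload
```

With fixed-size length fields the worker knows exactly how many bytes to read, and nothing has to be parsed as text.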

Will it be secure? SSL, authentication or something?

The answer is no. Part of the simplicity and efficiency is avoiding what is unnecessary: the appserver is supposed to run in a secure private network, so why lose CPU cycles on security checks? Of course any function could receive, for example, a user and a password and check them before executing, but that is outside the appserver’s scope.

What would be the advantage over similar projects?

  1. Speed, since it will be coded in C with threads and embedded PHP.
  2. Fault tolerance: since there is no central point, no worker is indispensable.
  3. Hot-adding of workers (I still need to figure out how to let clients know about new workers).
  4. Caching on the client by default (if a function is configured as returning a cacheable object).

4 Comments

  1. Markus says:

    I’ve read your proposals on the list and I don’t think it’s your English. There’s room for improvement, sure ;-), but I think people understand quite well what you’re proposing and I think the very fundamental problem lies there in itself.

    Some things ahead: if you explain “Some concepts” and provide a graph without labeling all of the concepts, it’s confusing. Apply the Client/Worker terms into the graph.

    I think people find it hard to grok the C-Daemon thing. You’ve just provided architectural concepts and the daemon “looks” heavy weight. Heavy things are not everyone’s friends. If it can be light-weight, provide some insights why and how.

    I also think that many see “C daemon with embedded PHP” and wonder “So, what’s mod_php inside Apache then?”. You don’t explain that.

    Why can’t a Worker be just a well-proven-tested-has-been-around-for-decades-apache-instance instead of some proprietary-looking-from-scratch-written-and-thus-probably-bug-ridden daemon?

    What interchange formats are you proposing?

    Since the dawn of PHP4 in May 2000 there have been zillions of approaches to App-Servers, Distributed Environments, Self-Running-Daemons and whatnot. None of them *really* succeeded.

    Too complex for developers, too many bugs, looked fine on paper but didn’t do well (in whatever aspect) when done, too hard to use, no useful documentation, support, etc.

    So. I’m trying to stay object. For me this concept needs to be fleshed out better, thinking ahead.

    Good luck! :-)

  2. Markus says:

    No way to edit my comments. Oh my!

    The last thing should read “I’m trying to stay neutral.”.

    And something to add about caching: HTTP caching is well known and well understood; just in case you’re proposing a new cache technique which hasn’t seen the light yet you’ve to explain why it’s better to go from scratch.

    If it’s about performance, where are the performance savings?

    Ok, I’ll stop. I’m not the one to convince anyway. I’m just trying to write down what I think people visualize about your idea. I can be wrong, don’t forget that :)

    • César Rodas says:

      Probably you are right, thanks for the guidance.

      I understand the basics of what you’re telling me, but I’m still convinced that this is a good way to scale.

      >I also think that many see “C daemon with embedded PHP” and
      >wonder “So, what’s mod_php inside Apache then?”.
      >You don’t explain that.

      My idea is to split the website into services. I.e., in your Apache (mod_php) you have the basic routines to show the page; then you create an application and put it on the appserver (you write a service) called Register_User. From your Apache you can request the execution of that function, and it returns something: in this case true on success, false otherwise.

      Suppose the Register_User function does a lot of things: it performs a select on an SQL table, looks for a duplicate e-mail, and validates all the data. The whole overhead is incurred on another server, and your mod_php just waits for the answer. Since the Apache machine will be idle while it waits, it will be able to handle other requests.

      My idea is not to replace Apache, nor to create a new kind of web server. My goal is to provide a way to distribute work across machines, in order to have light processes on the front end and heavy processes on other machines.

      Additionally, my idea was to add “procedures” and “functions”: “functions” return something, “procedures” do not (a.k.a. queue something for later execution).

      Is it clear enough?

      P.S.: Regarding the comment editing, it is WP, not my fault this time ;).
      P.S.: Once again, thanks for your comment.

  3. César Rodas says:

    Just updated to version 2 of this post, thanks to Alex for the feedback.
