20071217: Of TDD, algorithms and Code coverage

This evening at the Dojo, we had a session on Kata Chop: implementing binary (dichotomic) search, and proving it correct.
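For reference, here is a minimal sketch of the search the kata asks for. It is not the code written at the Dojo; the class name `Chop` and the convention of returning -1 for a missing value are assumptions made for illustration.

```java
/** Kata Chop sketch: iterative binary search over a sorted int array. */
class Chop {
    /** Returns the index of target in sorted, or -1 if it is absent (assumed convention). */
    static int chop(int target, int[] sorted) {
        int lo = 0;
        int hi = sorted.length - 1;
        // Invariant: if target is present, its index lies in [lo, hi].
        while (lo <= hi) {
            int mid = lo + (hi - lo) / 2; // avoids overflow of (lo + hi)
            if (sorted[mid] == target) {
                return mid;
            }
            if (sorted[mid] < target) {
                lo = mid + 1; // target can only be in the upper half
            } else {
                hi = mid - 1; // target can only be in the lower half
            }
        }
        return -1; // the interval is empty: target is absent
    }

    public static void main(String[] args) {
        System.out.println(chop(5, new int[] {1, 3, 5, 7})); // prints 2
        System.out.println(chop(4, new int[] {1, 3, 5, 7})); // prints -1
    }
}
```

The loop invariant in the comment is the kind of statement a proof of the algorithm would rest on: the interval shrinks at every step, so the loop terminates, and the value is never excluded from it.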

20071215: Asynchronous builder and Scala

Lift has support for interacting with RabbitMQ, a highly scalable asynchronous message queue system written in Erlang. This gives me the idea of rewriting the embryonic Builder system in Scala, with asynchronous communication based on RabbitMQ. The idea is to build software development monitoring nodes that would monitor data gathered at some software development spot and produce some analysis and feedback:

Nodes could be placed at various points in the development network: developer's station, standalone server with notification from SCM, centralized CI system ...

20071019: SFINCS and information policy

A framework for defining privacy policies: http://www.nyu.edu/projects/nissenbaum/papers/ci.pdf

20070930: Parsers for maven shell

To define a more human-friendly language interface to maven, I am looking at various parser generator implementations:

20070914: Maven PM (cont)

Try running StatSVN on repo.

$> wget http://switch.dl.sourceforge.net/sourceforge/statsvn/statsvn-0.3.1.zip
$> unzip statsvn-0.3.1.zip
$> cd maven-src
$> svn log --xml -v > svn.log
$> cd ..
$> java -jar statsvn.jar maven-src/svn.log maven-src

This takes way too long, as it extracts line counts from every revision; I will do some text manipulation on the log in text format instead.

20070914: Internship idea

Build a pure-Java implementation of patch/diff compatible with the various patch formats.

20070912: Ideas for a Dojo

20070907: Notes on Project Management for Maven

Issue resolution time for maven2:

Year  Nb Issues  MRT
2004         32   44
2005        891   35
2006        335   93
2007        392  266

Average age:

Year  Avg. Age
2004        41
2005       227
2006       171
2007       107

20070906: Functional testing

As its name implies, functional testing is all about relating two functions:

As we know from the universality of Turing machines and most other computing languages, this is not merely an abstraction: any piece of software can be interpreted in terms of some function relating inputs to outputs. Or equivalently, any piece of code in any language can be translated into some pure functional language (e.g. Haskell).

Those functions are usually partial and may diverge.

We can easily transform any partial specification into a complete specification by extending it such that it relates the subobject of its domain where it is undefined to \bot. So we have s: A \rightarrow (B \cup \{\bot\}) as the complete type for s.

For any function f:A \rightarrow B, we can extend f's domain to A \cup \{\bot\} by defining f(\bot) = \bot.

Testing needs the definition of some functor T that sends objects to finite sets so that functions are defined on finite sets and can thus be enumerated. It also needs the definition of some oracle object \Omega that is used to construct the natural transformation (?) assessing the equality of the two functions (specification and implementation).
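To make the enumeration idea concrete, here is a small Java sketch under simplifying assumptions: the finite set is given explicitly as a list, the oracle is plain equality of results, and `null` stands in for \bot. The names `FiniteCheck` and `firstCounterexample` are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Objects;
import java.util.Optional;
import java.util.function.Function;

/** Sketch: compare a specification and an implementation over a finite domain. */
class FiniteCheck {
    /**
     * Returns the first input on which impl disagrees with spec, if any.
     * The oracle is plain equality of results; null stands in for "undefined".
     * Inputs in the domain are assumed to be non-null.
     */
    static <A, B> Optional<A> firstCounterexample(
            Function<A, B> spec, Function<A, B> impl, List<A> finiteDomain) {
        for (A input : finiteDomain) {
            if (!Objects.equals(spec.apply(input), impl.apply(input))) {
                return Optional.of(input);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        Function<Integer, Integer> spec = x -> Math.abs(x);
        Function<Integer, Integer> impl = x -> x; // wrong on negative inputs
        List<Integer> domain = Arrays.asList(-2, -1, 0, 1, 2);
        System.out.println(firstCounterexample(spec, impl, domain)); // prints Optional[-2]
    }
}
```

An empty result only says that the two functions agree on the enumerated set; it says nothing about inputs outside it.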

20070903: Functional abstraction for API changes

In classical version numbering schemes where each software release is numbered X.Y.Z.T, the various numbers usually have the following meanings:  

A change in major version number usually implies some form of incompatibility. If the software is a program or library that manipulates persistent data, then file formats, database schemas and exchange protocols are not supposed to be compatible. Lower-version programs are not expected to be able to read or manipulate the data produced by a higher-version program. The converse may also be true, in which case upward compatibility is broken. This latter property is most often what end users expect, so it is usually not a good idea to break it; if this is required, it is best to provide some tools for data conversion.

If the software is a library, a major version increase denotes a change in the Application Programming Interface, which implies that clients of the library will not work, or will behave oddly, with the newest version of the library.

An increase in minor version usually implies the addition of some feature to the software without modification of existing features' behavior. Preserving upward compatibility is a mandatory property of minor version releases.

An increase in patch level denotes bug fixes: the behavior of existing features is corrected to meet requirements. The distinction between a bug and a feature is then based on functionality:

In any case, clients that use the software as expected should not be negatively affected by changes in minor version or patch level.

We can model this versioning scheme using functional abstraction. Let f:A \rightarrow B be a partial function, with domain A and range B, representing the specification. At any point in time, the developed software s is a partial function with domain C and range D such that there are some inclusion functions i:C \rightarrow A and j:D \rightarrow B with f . i = j . s.

Two versions differing only in minor number are then two functions s_1:C_1 \rightarrow D_1 and s_2:C_2 \rightarrow D_2 s.t. i_1 = i_2 . k, for some k: the domain inclusion map of the first version can be factored through the second version's inclusion map. Or equivalently, the domain of s_2 contains the domain of s_1 and s_2 coincides with s_1 on those elements that are in C_1.

Two versions differing in major version are derived from different specifications f and g which need not be related.
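The minor-version condition can be checked mechanically when the functions are finite. The sketch below is an illustration only: it assumes each version is represented as a finite `Map` from inputs to outputs, and the names `VersionCheck` and `extendsCompatibly` are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

/** Sketch: a minor release s2 must extend s1 without changing it on s1's domain. */
class VersionCheck {
    /** True when s2 is defined wherever s1 is defined and agrees with it there. */
    static <A, B> boolean extendsCompatibly(Map<A, B> s1, Map<A, B> s2) {
        for (Map.Entry<A, B> entry : s1.entrySet()) {
            if (!s2.containsKey(entry.getKey())) {
                return false; // the domain of s2 does not contain the domain of s1
            }
            if (!Objects.equals(entry.getValue(), s2.get(entry.getKey()))) {
                return false; // s2 does not coincide with s1 on C_1
            }
        }
        return true;
    }

    public static void main(String[] args) {
        Map<String, Integer> v1_0 = new HashMap<>();
        v1_0.put("a", 1);
        Map<String, Integer> v1_1 = new HashMap<>(v1_0);
        v1_1.put("b", 2); // new feature, old behavior untouched
        System.out.println(extendsCompatibly(v1_0, v1_1)); // prints true
        System.out.println(extendsCompatibly(v1_1, v1_0)); // prints false
    }
}
```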

20070808: How to invoke the Scala compiler programmatically

From the scala mailing list.

        // We currently call the compiler directly
        // To reduce coupling, we could instead use ant and the scalac ant task




        import java.lang.reflect.{Method, Modifier}
        import java.net.{URL, URLClassLoader}
        import scala.tools.nsc.{Global, Settings}
        import scala.tools.nsc.reporters.ConsoleReporter


        {
            // called in the event of a compilation error
            def error(message: String): Nothing = ...


            val settings = new Settings(error)
            settings.outdir.value = classesDir.getPath
            settings.deprecation.value = true // enable detailed deprecation warnings
            settings.unchecked.value = true // enable detailed unchecked warnings


            val reporter = new ConsoleReporter(settings)


            val compiler = new Global(settings, reporter)
            (new compiler.Run).compile(filenames)


            reporter.printSummary
            if (reporter.hasErrors || reporter.WARNING.count > 0)
            {
                ...
            }
        }




        val mainMethod: Method =
        {
            val urls = Array[URL]( classesDir.toURL )


            val loader = new URLClassLoader(urls)


            try
            {
                val clazz: Class = loader.loadClass(...)


                val method: Method = clazz.getMethod("main", Array[Class]( classOf[Array[String]] ))
                if (Modifier.isStatic(method.getModifiers))
                {
                    method
                }
                else
                {
                    ...
                }
            }
            catch
            {
                case cnf: ClassNotFoundException => ...
                case nsm: NoSuchMethodException => ...
            }
        }


        mainMethod.invoke(null, Array[Object]( args ))


20070804: Versioning MQ

Giorgos Keramidas <keramida@ceid.upatras.gr> writes:
> It may be useful to note here that MQ can 'version' the patch sets, and
> 'qcommit' can be used to keep a change log of the patch queue itself.
>
> You can convert the patch queue to a 'patch repository' with:
>
>         hg qinit -c
>
> Then the .hg/patches/ directory is a Mercurial repository too, and you
> can alternate between:
>
>         hg qrefresh -e
>         hg qcommit
>
> This will let you keep your patch queue versioned too, and you will be
> able to track the changes to your patches as they evolve over time.
>
> I often use MQ to keep a stack of patches on top of a snapshot of the
> 'official' source tree at work, and using 'qinit -c' with 'qcommit' lets
> me develop the patches themselves in small steps.  This is very useful
> some times, i.e. when I qrefresh at the wrong moment it's marvellously
> easy to roll back bogus patch changes by popping the stack of patches
> and running:
>
>         hg qpop -a
> 	( cd .hg/patches ; rm -f * ; hg up -C )
> 	hg qpush -a
>
> This restores the patch queue to a 'known good state', the state of the
> last 'hg qcommit' operation, and I can keep massaging the patches until
> they have the precise shape and contents I want them to have.
>
> Maybe something like this, based on 'qinit -c' and 'qcommit' can help
> you track the state of your patches too :-)


20070803: Using patches

While working on a Maven course and creating various maven projects for testing purposes, I started to use Mercurial Queues (MQ) for storing version information. The basic use case is:

# create mercurial repo
hg init
hg add  ...
hg commit ...
# create MQ 
hg qinit
# start working on a patch
hg qnew some-feature.patch
# periodically refresh your patch with changes
# option --git allows tracking of copies
# option -e allows editing of message
hg add ...
hg mv ...
hg rm ...
hg qrefresh --git -e
# then push some new patch when you're done
hg qnew

One can then play with the patch stack:

# go back 
hg qpop
# go forward
hg qpush

The main advantage I see with this scheme is that it helps me focus on one feature at a time: no more commits containing tons of unrelated changes. Of course, it is possible to have some discipline with standard commits, and to mess up an MQ repository, but I personally find it helpful that I am required to give a title to my piece of work and to group things intelligently.

20070803: RDF again

I dropped the idea of doing diffs directly on objects: too complicated, with lots of corner cases if we take into account objects in all their generality. I did manage, however, to hack sommer to convert a standard bean into an RDF graph:

This is crude but it works as a basic POC and could form the basis for the RDF-maven API I dream of, with some improvements:

Bought Semantic Web Primer, G. Antoniou and F. van Harmelen, The MIT Press, 2004.

20070731: Exercise idea

  1. get some raw sources from a SVN repository
  2. create a maven project from them: this implies setting up the POM, splitting sources along the standard layout (taking care of tests), and possibly creating modules if there seems to be a need.

20070731: Object diffs and Maven20

While working on a new Maven2 course, I restarted a project for generic gathering of data in the maven build process. This project has two goals:

  1. start something useful based on the idea I have sometimes expressed that we should have a uniform way to collect data in Maven and make it persistent,
  2. illustrate most common and uncommon things we do with maven and its ecosystem.  

The project's structure shall be the following:

Module | Features | Pedagogic goal
core library | Implements object graph manipulations, diffs and decorations | Illustrate basic maven process: creation of project, standard lifecycle (ie. releases, scm handling, POM features, basic reports)
maven graph api | Based on core, used to annotate Maven POM with build information. Needs to handle build events and POM structure, as well as exploration of standard properties and default structure of a project | Maven internals, basic multi-modules dependencies
maven graph mojos | Plugin for instrumenting maven builds and offering the ability for other plugins to hook in data on the build | Illustrate Maven plugins development, writing Mojos, testing plugins and maven test harness, integration testing
maven graph persistence | Persistence module, based on Hibernate/Derby/PostgreSQL for storing/retrieving a graph structure | DB, code generation and Hibernate modules
build events WS | Web service (JBoss, sar) offering remote access to graph data structure | Illustrate J2EE artifacts handling, profiles (for testing/deployment/production parameters)
reporting Web Application | Web application (JBoss, Jetty) providing on demand graphs, reports and tables about the object graph structure | Illustrate webapp construction, builtin jetty run plugin for inplace testing, web app deployment with cargo ?

Computing diff of 2 objects

We want to compare two objects o1 and o2 of the same type and compute the difference in values of the graphs rooted at o1 and o2. The basic algorithm runs a breadth-first traversal of the two objects in parallel. At each step, we have two property name/class/value triples t1 and t2 within a context. Several cases may arise:

  1. the two properties are identical, which means that either they are primitive types and they are equal, or they are objects and they are identical (strings and wrapper types are considered as primitive). Then recursion stops (even for objects, we do not push these objects' properties as they will obviously be identical...)
  2. the two properties are not identical:
    1. if they are primitives (or wrapper), a delta is emitted,  
    2. if they are objects of different runtime types, a delta is emitted and no recursive processing occurs  
    3. if they are objects of the same runtime type, then:
      1. if a mapping has been recorded for o1's property, and o2's property is consistent with this mapping, no delta is emitted,
      2. if a mapping exists but it is inconsistent, a delta is emitted
      3. if no mapping exists, a mapping is recorded, properties are pushed and exploration continues.  
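The cases above can be sketched in Java as follows. This is a simplified illustration, not the project's code: nested `Map` instances stand in for objects and their properties (so no reflection is involved), anything that is not a `Map` is treated as a primitive, and "different runtime types" is reduced to "one side is an object and the other is not". The names `GraphDiff` and `diff` are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.IdentityHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.Set;

/** Sketch of a parallel breadth-first diff; nested Maps stand in for objects. */
class GraphDiff {
    /** One traversal step: a property path and the two values found there. */
    private static final class Pair {
        final String path;
        final Object left;
        final Object right;

        Pair(String path, Object left, Object right) {
            this.path = path;
            this.left = left;
            this.right = right;
        }
    }

    /** Returns the property paths at which the graphs rooted at o1 and o2 differ. */
    static List<String> diff(Object o1, Object o2) {
        List<String> deltas = new ArrayList<>();
        Map<Object, Object> mapping = new IdentityHashMap<>(); // o1-side object -> o2-side object
        Deque<Pair> queue = new ArrayDeque<>();
        queue.add(new Pair("root", o1, o2));
        while (!queue.isEmpty()) {
            Pair p = queue.poll();
            if (p.left == p.right) {
                continue; // case 1: identical, recursion stops
            }
            boolean leftIsObject = p.left instanceof Map;
            boolean rightIsObject = p.right instanceof Map;
            if (!leftIsObject && !rightIsObject) {
                if (!Objects.equals(p.left, p.right)) {
                    deltas.add(p.path); // case 2.1: unequal primitives
                }
                continue;
            }
            if (leftIsObject != rightIsObject) {
                deltas.add(p.path); // case 2.2: different kinds, no recursion
                continue;
            }
            if (mapping.containsKey(p.left)) {
                if (mapping.get(p.left) != p.right) {
                    deltas.add(p.path); // case 2.3.2: inconsistent mapping
                }
                continue; // case 2.3.1: consistent mapping, nothing to emit
            }
            mapping.put(p.left, p.right); // case 2.3.3: record and explore
            Map<?, ?> l = (Map<?, ?>) p.left;
            Map<?, ?> r = (Map<?, ?>) p.right;
            Set<Object> keys = new LinkedHashSet<Object>(l.keySet());
            keys.addAll(r.keySet());
            for (Object key : keys) {
                queue.add(new Pair(p.path + "." + key, l.get(key), r.get(key)));
            }
        }
        return deltas;
    }

    public static void main(String[] args) {
        Map<String, Object> a = new HashMap<>();
        a.put("name", "x");
        a.put("n", 1);
        Map<String, Object> b = new HashMap<>();
        b.put("name", "y");
        b.put("n", 1);
        System.out.println(diff(a, b)); // prints [root.name]
    }
}
```

Recording the mapping by identity is also what keeps the traversal from looping forever on cyclic graphs: an object already paired is never explored a second time.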

Information storage

Build information about a project could be stored with the POM/artifact in the repository like anything else. Then it would be retrieved at start of build, updated, and put back through install/deploy goals.

20070729: Test, test, test !

I spent quite a few hours this weekend debugging OpenJgraph, or more precisely the DigraphLayeredLayout, which lays out graphs in layers following Sugiyama's algorithm. I was reviving a tool I wrote a couple of years ago for analyzing Struts configuration files, and the actions graph consistently blew up the layout algorithms.

I then started debugging, adding unit tests along the way to check the behavior of various algorithms and methods in the package: graph traversal, connected set management, layout positioning, directed acyclic graph handling of edges... It turns out that the culprit was a tiny method called getOppositeVertex() in class EdgeImpl, which used == instead of equals() to compare its vertices! Identical strings constructed at different times would then be considered different vertices, and incident edges would not consider them opposite because of the difference in identity.
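The difference is easy to reproduce in isolation. The snippet below is not the OpenJgraph code; the class `Vertices` and its two helper methods are hypothetical and only contrast identity with equality on two strings built at different times.

```java
/** Sketch: identity (==) versus equality (equals) on vertex labels. */
class Vertices {
    /** Compares by reference, as the faulty method did. */
    static boolean sameByIdentity(Object a, Object b) {
        return a == b;
    }

    /** Compares by value, which is what vertex comparison needs. */
    static boolean sameByEquality(Object a, Object b) {
        return a == null ? b == null : a.equals(b);
    }

    public static void main(String[] args) {
        // Two distinct String objects with the same contents.
        String v1 = new String("login");
        String v2 = new String("login");
        System.out.println(sameByIdentity(v1, v2)); // prints false
        System.out.println(sameByEquality(v1, v2)); // prints true
    }
}
```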

Once again, I am bitten by the lack of thorough unit tests that would check consistent behavior and express it more precisely.

20070729: Update Git config

Following mail from Junio Hamano, I updated my git config to handle synchronization between desktop and laptop:

From: Junio C Hamano <gitster@pobox.com>
Subject: Re: Newbie problem
To: Insitu <abailly@oqube.com>
Cc: git@vger.kernel.org
Date: Sat, 28 Jul 2007 01:01:49 -0700


Insitu <abailly@oqube.com> writes:


> Now, I want to be able to do:
> lap> git push
> or
> lap> git pull
>
> instead of 
> lap> git push ssh://pc/~/.git
>
> I think I need to reconfigure my remote branches/origin on laptop but
> don't want ot break everything.


The necessary syntax and configuration files are all documented
fairly detailed in the manual pages, but it is a tad hard to
know where to look:


    http://www.kernel.org/pub/software/scm/git/docs/git-fetch.html
    http://www.kernel.org/pub/software/scm/git/docs/git-push.html
    http://www.kernel.org/pub/software/scm/git/docs/git-config.html


If you use recent enough git (post 1.5.0), the recommended way
to keep two boxes in sync is:


On mothership box, in .git/config:


 [remote "origin"]
     url = satellite:.git/
     fetch = +refs/heads/*:refs/remotes/origin/*
     push = refs/heads/*:refs/remotes/origin/*
 [branch "master"]
     remote = origin
     merge = refs/heads/master


On satellite laptop, in .git/config:


 [remote "origin"]
     url = mothership:.git/
     fetch = +refs/heads/*:refs/remotes/origin/*
     push = refs/heads/*:refs/remotes/origin/*
 [branch "master"]
     remote = origin
     merge = refs/heads/master


Then, whenever you start working on the satellite:


	$ git pull


which, while you are on "master" branch, would use 'origin' as
the default remote (thanks to branch.master.remote configuration),
store the copy of mothership's branches in refs/remotes/origin/,
and merges the "master" branch obtained from the mothership to
your "master" branch on the satellite [*1*].  


When you are done working on the satellite:


	$ git push


will push to "origin" by default, which would push all your
branches (thanks to remote.origin.push configuration) to
mothership's refs/remotes/origin/.


When you go back to the mothership, your work done on the
satellite are already pushed into the refs/remote/origin/
tracking branches, so you can merge them in (you can do this
after shutting down your satellite laptop, which is the beauty
of this setup):


	$ git merge origin/master


to merge in the changes you did on the satellite.




[Footnote]


*1* If you prefer to keep a straight history, you may want to
    fetch+rebase instead of pull which is a fetch+merge, in
    which case this step will be:


	$ git fetch
        $ git rebase origin/master

20070720: Clever data structures

From: Dan Weston <westondan@imageworks.com>                         
Subject: Re: [Haskell-cafe] Hints for Euler Problem 11              
To: Ronald Guida <ronguida@mindspring.com>                          
Cc: haskell-cafe@haskell.org                                        
Date: Thu, 19 Jul 2007 20:54:40 -0700                               



Here's my hint, FWIW.                                               


Pick a data structure that makes your life easier, i.e. where horz, 
vert, and diag lines are handled the same way. Instead of a 2D      
structure, use a 1D structure.                                      


Then,

data Dir = Horz | Vert | LL | LR


stride Horz = 1                 
stride Vert = rowLength         
stride LL   = rowLength - 1     
stride LR   = rowLength + 1     


nextItem dir = drop (stride dir)
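The same hint can be sketched in Java, assuming the grid is stored row by row in a flat `int[]`. The class `Strides` and the method `line` are hypothetical names; the caller is assumed to pick a start and count that stay inside the grid, since nothing here guards against a line wrapping around a row edge.

```java
import java.util.Arrays;

/** Sketch: reading horizontal, vertical and diagonal lines from a flat grid by stride. */
class Strides {
    /** Takes count cells from a row-major grid, starting at start and stepping by stride. */
    static int[] line(int[] cells, int start, int stride, int count) {
        int[] out = new int[count];
        for (int i = 0; i < count; i++) {
            out[i] = cells[start + i * stride];
        }
        return out;
    }

    public static void main(String[] args) {
        int rowLength = 3;
        int[] grid = {1, 2, 3,
                      4, 5, 6,
                      7, 8, 9};
        System.out.println(Arrays.toString(line(grid, 0, 1, 3)));             // Horz: [1, 2, 3]
        System.out.println(Arrays.toString(line(grid, 0, rowLength, 3)));     // Vert: [1, 4, 7]
        System.out.println(Arrays.toString(line(grid, 2, rowLength - 1, 3))); // LL:   [3, 5, 7]
        System.out.println(Arrays.toString(line(grid, 0, rowLength + 1, 3))); // LR:   [1, 5, 9]
    }
}
```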

20070717: Assertions and contracts

While reading (again) about assertion usage in Java, it occurred to me that the ban on using assertions for precondition checking of public methods in a DBC setting is due to the possibility that assertions may be disabled at runtime. The specific example given is:

    /**
     * Sets the refresh rate.
     *
     * @param  rate refresh rate, in frames per second.
     * @throws IllegalArgumentException if rate <= 0 or
     *          rate > MAX_REFRESH_RATE.
     */
     public void setRefreshRate(int rate) {
         // Enforce specified precondition in public method
         if (rate <= 0 || rate > MAX_REFRESH_RATE)
             throw new IllegalArgumentException("Illegal rate: " +
         rate);


         setRefreshInterval(1000/rate);
     } 

Here, the parameter is checked to enforce the contract, in the absence of constrained types (one of the strengths of Ada). To me, this is typical of defensive programming: the program is guarded against mistakes made by its caller, which is different from contract-based programming.

The idea of contract-based programming is to share the burden of trust. A contract binds the two parties in a relationship that says:

  1. If caller respects its share of the contract, ie. calls some function or object in the right way or with the right set of parameters, then
  2. Callee will return some sensible (and predictable) value, or leave the world in some well-known state.

This is a logical implication A \rightarrow B, a statement that has the property of being always true if A is false. Recall the truth table:

A B A\rightarrow B
T T T
T F F
F T T
F F T

Thus if 1) is not respected by the client, then the callee can do whatever it pleases: it is no longer bound to respect its share of the contract.
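The truth table can be reproduced with the usual encoding of implication as "not A or B". The class `Contract` and the method `implies` are hypothetical names used only for this illustration.

```java
/** Sketch: material implication, the logical shape of a contract. */
class Contract {
    /** A implies B is false only when A holds and B does not. */
    static boolean implies(boolean a, boolean b) {
        return !a || b;
    }

    public static void main(String[] args) {
        boolean[] values = {true, false};
        for (boolean a : values) {
            for (boolean b : values) {
                System.out.println(a + " " + b + " " + implies(a, b));
            }
        }
    }
}
```

Both rows where A is false yield true: a caller that breaks the precondition cannot make the contract itself false, whatever the callee does.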

In contract-based programming, verification of input parameters for contract enforcement is then not mandatory as it is the responsibility of the client to ensure that its inputs are correct, if it expects a meaningful answer. One can thus rewrite the preceding example as:

    /**
     * Sets the refresh rate.
     *
     * @param  rate refresh rate, in frames per second. Should be > 0 and
     *         <= MAX_REFRESH_RATE.
     */
     public void setRefreshRate(int rate) {
        // Precondition stated as an assertion: only checked when assertions are enabled
        assert rate > 0 && rate <= MAX_REFRESH_RATE : "Illegal rate: " + rate;
        setRefreshInterval(1000/rate);
     } 

The advantages of this formulation  are:

  1. it can be made more efficient at runtime by disabling assertions verification,
  2. it gives the same amount of information to the client, making clear what its share of the contract is,
  3. it can be checked selectively by enabling assertions, for example during integration testing phase,
  4. it does not prevent defensive programming if needed: just enable assertions.  

20070714: Some new books

Bought some books recently on various topics. In no particular order:

Thanks to Raphaël Marvie, I have also read Agile Software Development: Principles, Patterns and Practices, Robert Martin, Prentice-Hall, 2003.

20070714: Utopia

"[...] And many come to believe that its numerous failures prove that education remains a costly task, of incomprehensible complexity, that it is a mysterious alchemy - the search, why not, for the philosopher's stone!" Ivan Illich, Une société sans école, in Oeuvres complètes, Vol.I, Fayard 2003

20070714: On rigor in science

Quite by chance, in the epilogue of La vie rêvée des maths by D. Berlinski, I found the origin of J.L. Borges's quotation about the map at a scale of 1 to 1. It is a very short, one-paragraph text, written with Bioy Casares under a pseudonym and published in the 1954 Spanish edition of A Universal History of Infamy. The text was not included in the first edition (1974) of the Oeuvres complètes in French in the Pléiade, but it is appended to the body of notes. The exact reference (in French) is therefore:

De la rigueur de la science, p.1509 in José Luis Borgès, Oeuvres complètes, Vol.I, Bibliothèque de la Pléiade, Gallimard, Paris, 1993

I reproduce the whole text here:

"...In that Empire, the Art of Cartography was pushed to such Perfection that the Map of a single Province occupied an entire City, and the Map of the Empire an entire Province. In time, these Oversized Maps ceased to give satisfaction, and the Colleges of Cartographers drew up a Map of the Empire that had the Format of the Empire and coincided with it point for point. Less passionate about the Study of Cartography, the Following Generations reflected that this Dilated Map was useless and, not without impiety, abandoned it to the Inclemency of the Sun and of the Winters. In the Deserts of the West, badly damaged Ruins of the Map survive. Animals and Beggars inhabit them. In the whole Country, there is no other trace left of the Geographic Disciplines."

(Suarèz Miranda, Viajes de varones prudentes, livre IV, chap. XIV, Lérida, 1658)

20070712: Live CD for Building software

Goal: Creating a Live CD/Linux distribution for building/CI of Java/native software. The distribution should contain all the necessary/useful things for serving a community of developers for Java software based on maven2:

The distribution is preconfigured so that it boots with all the necessary services built and running. One can start using the distro immediately for accessing svn repo, issue tracking, configuring CI and the like.  

Possible implementations:

$> dd if=/dev/zero of=~/myFileSystem.img bs=1024 count=650000

20070706: Irreducibility of Languages

W. Heisenberg's book, Ordnung der Wirklichkeit (Le manuscrit de 1942, Alia in French) introduced the notion of layers of reality:

This classification represents a process by which the subject becomes closer and closer to the object. It can be lifted to languages in a way different from the classical Chomsky hierarchy. The latter deals with layers of complexity: from simpler languages (eg. regular languages) to more complex languages (eg. natural languages). Another classification is based on the representability relationship existing between languages.

There is a need for different views of problems, hence different languages, none reducible to another (eg. no grand narrative, no common semantics).

20070705: Agile Questions

From the XP mailing list:

1) What is the effect of doing software design on an iterative and evolutionary basis, rather than doing design up-front?

2) How does being part of a "whole team" allow team members to handle external responsibilities? In particular, how can a "customer" or "product owner" be part of a whole team, if they alone are responsible for business success? Also, how do traditional management roles work in the "whole team" environment?

3) How can agile development work with user interaction design and usability evaluation?

4) What is the effect of pair-programming? In particular, how does it affect productivity and communication?

5) How reliant is agile development on co-location of team members, and what can be done to accommodate distance between members?

6) How does test-driven development support agile development? Can all requirements be expressed as tests? How can tests be augmented, deleted, or changed over time?

7) How does agile development accommodate various business contexts? For example: in-house development, outsourced development, consumer product development, ...?

8) How does agile development change over time, after developers and customers have gained experience, and practices have become accommodated into routine?

9) Is agile development, or some practices, suited more to some application domains than others? For example: business IT, embedded systems, e-commerce, ...?

10) How does agile development work with established software infrastructure? For example, legacy systems, established libraries, frameworks, ...?

11) How does agile development work with established management structures? What areas of resistance do agile teams encounter and how are those areas addressed? Is agile development compatible with management needs? How does agile development tackle change management amongst its stakeholders?

20070704: The Compilation Continuum

There is a continuum of transformations from source code to runtime software:

It should be possible for the developer to choose the point of computation for some feature: compile time, deployment time, link time or runtime, with the same syntax.

In C++ template style:  

public int pow(int n, int e) {
    if( e == 1 )
      return n;
    if(e == 0)
      return 1;
    return n * pow(n, e-1);
}


public int doSthing(int n) {
   int k = pow(n,4);
   ...
}

could expand to:


public int doSthing(int n) {
   int k = n * n * n * n;
   ...
}

In macro style, we could have:




   List<String> names = ....
   names = names.map(new Function<String,String>() {
      public String apply(String s) {
         return s.toUpper();
      }
   });

could expand to:

   List<String> names = ....
   List<String> names_ = new ArrayList<String>();
   for(Iterator<String> i = names.iterator();i.hasNext();) 
       names_.add(i.next().toUpper());
   names = names_;

or could be handled by the library containing map().  

A feature/behavior could be provided at various stages in this process.  

20070628: What is software ?

20070618: Coding with automata

Problem:

Hypothesis and Rationale :

Solution:  

Application:  

Future works:

20070524: On coverage

http://blog.objectmentor.com/articles/2007/05/16/100-code-coverage

20070521: On Verification

Recently read an article by R. DeMillo, R. Lipton and A. Perlis about program verification, Social Processes and Proofs of Theorems and Programs. They criticize the totalitarian view of program verification zealots, studying the process of proof and the way mathematicians make progress, and showing that the naive equation programs=theorems and verification=proof is flawed, as it does not truthfully represent the real nature of either mathematics or programming.

Both are social activities, proofs and programs being complex things that are meant to express something, to convey some idea and to induce some belief in the reader. Both are therefore meant to be read by human beings, the latter also being aimed at formal processing by a machine. Formalization is a theoretical possibility that should be left for the gods.

Verification does not work, and never will: formal proofs are intractable except for trivial programs, and they do not take into account the real "life" of programs. Real-world programs change, have inconsistent or incomplete requirements, need finite resources to complete...

From http://lambda-the-ultimate.org/node/2216#comment-32646

20070519: Notes for integrating Muse and Fit

To execute Fit tests from Muse pages, it should be sufficient to subclass the Parse object so that base wiki syntax is parsed instead of HTML syntax. This necessitates some refactoring of the Fit code to allow plugging in a parser, or could be done simply by overriding the FileRunner.process() method.

A simpler method would be to subclass Fitlibrary's fitlibrary.runner.CustomRunner, which provides a hook method makeTables() to produce tables in a Parse object. Running the tests then implies:

  1. parse the Muse files in test mode
  2. for each test file, start execution of a CustomRunner subclass
  3. override makeTables() to retrieve Muse tables

20070518: Web application development sucks (contd.)

While preparing some training material about the Inversion of Control pattern and its implementation in Spring, I attempted to implement as quickly as possible an application simulating an electronic cash dispenser. I first decided to write a Web application, using the cool Wicket framework I recently discovered at ApacheCon. But although I attended the tutorial (about 450 bucks...), I was unable to produce a workable application in a reasonable amount of time.

I fell back to writing a basic CLI client, based on some state-transition pattern, something I understand quite well. I must have some disability that prevents me from fully understanding how to write web applications quickly and efficiently, something that has to do with the fragmentary nature of such applications and the numerous technologies you have to master (or at least gather) to start something.

While surfing I came upon HOP, a Scheme-based language for developing web applications that embeds both client-side and server-side components in a single framework. As far as I can tell, something like this is what I really miss in the Java world.

20070508: On humility (or the lack thereof)

I just read this post from the blog of Jonathan Locke, initiator of the Wicket Web Framework. Whether or not the thesis is true (that the software industry, and big business in general, does not have the ability to produce outstanding tools for lack of global optimization incentives), it always seems odd to me when someone writes such immodest things. I just don't know if this is some special sense of humor or just plain hubris.

20070508: Web application development sucks

During his talk at ApacheConEu, Matt Raible asked the audience what framework they were using. He then polled the attendees according to their preference and asked them whether their particular framework "sucked". The answer happened to be yes most of the time.

My opinion is more generally that web application development itself plain sucks. We still have to produce adequate tools to make this anything but a painful experience and I think that the right tool is not a framework but a language.  

20070507: ApacheCon

I have set up a page collecting notes about the ApacheCon conference I attended last week.

20070414: Notes on post-modernism and its relationship to programming

Following the Notes on Postmodern Programming, J. Noble and R. Biddle, OOPSLA'02, I started writing an article about what I understand of post-modernism and how this applies to programming.

20070329: BDD and Contracts

Behaviour Driven Development uses test cases as executable assets for asserting correctness of code with respect to expected behaviours. The developer writes test cases that describe the expected behaviour of the developed objects or groups of related objects (Domain, related to Domain Driven Design). Test cases generally rely on the mock objects technique:

This technique is lifted at a higher abstraction level with stories and scenarios that represent user acceptance tests or high-level behavioural tests. A story is described by a (role, feature, benefit) triple and implemented by a sequence of scenarios. A scenario is a set of:

It is then obvious that a scenario is also a particular instance of some contract that should be met by the CUT (class under test):

Given a set of scenarios on some object, we could reconstruct a specification by inferring some general properties from the detailed contracts given as test cases. These general properties may take the form of types or logical predicates on the state of the world, yielding some Kripke structure: states of the world as sets of assignments (key-value pairs) and transitions as events. This more abstract representation could be used to check some properties of the inferred specification, e.g. consistency and completeness.
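A minimal sketch of this reading, under stated assumptions: a scenario is taken to be a precondition on the state, an event, and a postcondition (the usual given/when/then split, which these notes do not spell out), the state of the world is a map of key-value assignments, and the code under test is modelled as a transition function. The names `Scenario`, `holdsFor` and `cut` are hypothetical and do not come from any BDD framework.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.BiFunction;
import java.util.function.Predicate;

// A scenario read as a contract: precondition, event, postcondition.
// All names here are illustrative assumptions, not a real BDD API.
class Scenario {
    final Predicate<Map<String, Object>> given;   // predicate on the state before
    final String event;                           // the transition label
    final Predicate<Map<String, Object>> then;    // predicate on the state after

    Scenario(Predicate<Map<String, Object>> given, String event,
             Predicate<Map<String, Object>> then) {
        this.given = given;
        this.event = event;
        this.then = then;
    }

    // Checks the contract against a transition function standing in for the code under test.
    boolean holdsFor(Map<String, Object> state,
                     BiFunction<Map<String, Object>, String, Map<String, Object>> cut) {
        if (!given.test(state)) {
            return true; // precondition not met: the contract says nothing
        }
        // Work on a copy so the caller's state is left untouched.
        Map<String, Object> after = cut.apply(new HashMap<>(state), event);
        return then.test(after);
    }
}
```

Inferring a specification would then amount to generalizing over many such (given, event, then) triples, which this sketch does not attempt.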

20070306: Inductive Graphs in Java

I tried to implement basic elements of inductive graphs in Java as defined by Martin Erwig in his article describing FGL, a Functional Graph Library for Haskell and ML. The notion of inductive graphs may be interesting for defining/traversing graphs incrementally or constructing some comparison function such as a diff between graphs, assuming some canonical order can be found on the elements.

Implementation is straightforward and quite simple using generics. I use a simple numbering trick to mark nodes during iteration over contexts, so that visited edges do not appear twice during a visit of a graph using contexts: the iterator marks each node returned and compares marks before adding an edge to a returned context. This ensures that the iteration's time complexity is linear in the number of edges of the graph.

I used a TreeMap for storing nodes, ensuring natural iteration over contexts is done using natural ordering of the nodes' type.
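A rough sketch of this iteration, not the original code: it assumes unlabelled directed edges, uses a visited set where the text describes a numbering trick, and the names `Context`, `InductiveGraph` and `contexts()` are made up for illustration. Each context carries only the edges linking its node to the part of the graph not yet visited, so every edge shows up exactly once.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// One decomposition step: a node plus its edges into the not-yet-visited rest of the graph.
class Context<N> {
    final N node;
    final List<N> predecessors = new ArrayList<>();
    final List<N> successors = new ArrayList<>();
    Context(N node) { this.node = node; }
}

class InductiveGraph<N extends Comparable<N>> {
    // TreeMaps give iteration in the natural ordering of the node type.
    private final TreeMap<N, Set<N>> out = new TreeMap<>();
    private final TreeMap<N, Set<N>> in = new TreeMap<>();

    void addNode(N n) {
        out.computeIfAbsent(n, k -> new TreeSet<>());
        in.computeIfAbsent(n, k -> new TreeSet<>());
    }

    void addEdge(N from, N to) {
        addNode(from);
        addNode(to);
        out.get(from).add(to);
        in.get(to).add(from);
    }

    // Decomposes the graph into contexts, visiting nodes in natural order.
    List<Context<N>> contexts() {
        List<Context<N>> result = new ArrayList<>();
        Set<N> visited = new TreeSet<>();
        for (N n : out.keySet()) {
            Context<N> ctx = new Context<>(n);
            for (N s : out.get(n)) {
                if (!visited.contains(s)) ctx.successors.add(s);
            }
            for (N p : in.get(n)) {
                // A self-loop is already recorded as a successor, so skip it here.
                if (!visited.contains(p) && !p.equals(n)) ctx.predecessors.add(p);
            }
            visited.add(n); // mark the node once its context has been built
            result.add(ctx);
        }
        return result;
    }
}
```

With a tree-based visited set each membership test costs a logarithmic factor; a per-node mark, as described above, would avoid it.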

Some hints on diff (on graphs of the same type). We construct 3 lists of contexts showing diffs between the left (L) and right (R) graphs:

The algorithm may be:

Are there some interesting properties this diff exhibits? Is it the minimal transformation from one graph to the other? Is it easier to compute than with standard graph representations? Than with a matrix representation? What about generalized graph transformations like DPO?

20070226: Supporting remote coverage

  1. create an artifact from patchwork with instrumented classes
  2. create a module that listens on a network for coverage information
  3. use these artifacts in an integration test (e.g. with cargo or another plugin) that also starts the coverage reporter

20070223: Muse and FIT

Following an idea from Brian Marick's blog and what is done in FitNesse, it occurred to me that Muse may be easily extended with some syntax for generating Fit/FitNesse tests that would be more free-form than tables. This could be done by adding a specialized parser recognizing special tags or combinations, or natural language fragments.

20070216: Two new books

Being in Paris on Thursday, I went to Le Monde en tique, a bookshop dedicated to computer books. Being a compulsive reader and book buyer, I had a hard time choosing among the many interesting books that were available there. I finally chose two "classical" books:

20070214: Extreme Hour lesson

On Monday I ran an Extreme Hour game with my students from the GLSI "Licence Professionnelle" at Lille I. I wrote a short account of this experiment, together with some pictures I shot during the game.

20070213: Understanding Microformats

Microformats are a simple way to add semantics to XHTML/XML (-like) content. In XHTML, a microformat basically defines some standard grammar implemented as pairs of tag and class names that can easily be embedded in any XHTML/XML compliant content: it is in some sense a standardized CSS. One example is the hReview microformat, which normalizes the writing of reviews.

This review of some restaurant:


<div>
 <span>5 stars out of 5 stars</span>
 <h4>Crepes on Cole is awesome</h4>
 <span>Reviewer: <span>Tantek</span> - April 18, 2005</span>
 <blockquote><p>
  Crepes on Cole is one of the best little creperies in San Francisco. 
  Excellent food and service. Plenty of tables in a variety of sizes 
  for parties large and small.  Window seating makes for excellent 
  people watching to/from the N-Judah which stops right outside.  
  I've had many fun social gatherings here, as well as gotten 
  plenty of work done thanks to neighborhood WiFi.
 </p></blockquote>
 <p>Visit date: <span>April 2005</span></p>
 <p>Food eaten: <span>Florentine crepe</span></p>
</div>

could convey microformat information as:

<div class="hreview">
 <span><span class="rating">5</span> out of 5 stars</span>
 <h4 class="summary">Crepes on Cole is awesome</h4>
 <span class="reviewer vcard">Reviewer: <span class="fn">Tantek</span> - 
 <abbr class="dtreviewed" title="20050418T2300-0700">April 18, 2005</abbr></span>
 <div class="description item vcard"><p>
  <span class="fn org">Crepes on Cole</span> is one of the best little 
  creperies in <span class="adr"><span class="locality">San Francisco</span></span>.
  Excellent food and service. Plenty of tables in a variety of sizes 
  for parties large and small.  Window seating makes for excellent 
  people watching to/from the N-Judah which stops right outside.  
  I've had many fun social gatherings here, as well as gotten 
  plenty of work done thanks to neighborhood WiFi.
 </p></div>
 <p>Visit date: <span>April 2005</span></p>
 <p>Food eaten: <span>Florentine crepe</span></p>
</div>

The idea is that information formatted with a microformat can be understood equally easily by a human, using a standard browser or newsfeed reader, and by a machine, using XML/XSLT parsing and transformation.
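To illustrate the machine side, here is a small sketch that pulls hReview fields out of well-formed XHTML using the XPath support shipped with the JDK. The class `HReviewReader` and its `field` method are hypothetical names, and the sketch assumes the input is well-formed XML without namespaces; real-world pages would need a more tolerant parser.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper: reads microformat fields by class name.
class HReviewReader {
    private final Document doc;
    private final XPath xpath = XPathFactory.newInstance().newXPath();

    HReviewReader(String xhtml) {
        this.doc = parse(xhtml);
    }

    private static Document parse(String xhtml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xhtml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("not well-formed XHTML", e);
        }
    }

    // Text content of the first element whose class attribute contains the given token.
    String field(String className) {
        // class attributes are space-separated lists, hence the padded contains() test
        String expr = "string(//*[contains(concat(' ', normalize-space(@class), ' '), ' "
                + className + " ')])";
        try {
            return xpath.evaluate(expr, doc).trim();
        } catch (XPathExpressionException e) {
            throw new IllegalStateException("bad XPath for class " + className, e);
        }
    }
}
```

Calling `field("rating")` or `field("summary")` on the annotated review above would return the rating and the summary text, with no knowledge of the surrounding layout.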

Along the same lines, there are also RDFa from the W3C and eRDF. See http://bnode.org/blog/2007/02/12/comparison-of-microformats-erdf-and-rdfa.

20070209: More on STM

http://patricklogan.blogspot.com/2007/02/misguided-road-not-to-be-travelled.html

A lengthy post on why Software Transactional Memory is evil. Quite convincing, but it is still great fun to implement...

20070208: Patchwork and functional coverage

Thinking about using patchwork as a basis for functional model-based analysis: one would map low level events `(tid, cid, mid, bid)` to high-level events using some kind of rational transduction notation, thus allowing mapping of complex (yet regular) patterns to single letters at the higher model level.

20070204: Patchwork advances

The Patchwork project is rapidly advancing towards being a usable tool. Last week I fixed a lot of bugs and did much refactoring to produce something working from the CLI.

Some Todos:

20070122: Tail call optimization

An article (http://citeseer.ist.psu.edu/schinz01tail.html) about general tail-call optimization on the JVM. It presents several techniques for implementing this feature using Java constructs, based on the trampolining technique:

Some measurements show that performance loss is acceptable, around a factor of 2.
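For reference, a generic trampoline can be sketched as follows. This is the textbook form of the technique, not necessarily the exact encoding used in the article, and the names `Trampoline` and `TailCalls` are made up: a tail call returns a deferred step instead of calling, and a driver loop keeps bouncing until a result comes out, so the stack does not grow.

```java
import java.util.function.Supplier;

// A computation is either finished (done) or a deferred next step (more).
abstract class Trampoline<T> {
    abstract boolean isDone();
    abstract T result();
    abstract Trampoline<T> next();

    static <T> Trampoline<T> done(final T value) {
        return new Trampoline<T>() {
            boolean isDone() { return true; }
            T result() { return value; }
            Trampoline<T> next() { throw new IllegalStateException("already done"); }
        };
    }

    static <T> Trampoline<T> more(final Supplier<Trampoline<T>> step) {
        return new Trampoline<T>() {
            boolean isDone() { return false; }
            T result() { throw new IllegalStateException("not done yet"); }
            Trampoline<T> next() { return step.get(); }
        };
    }

    // The driver loop: bounces from step to step at constant stack depth.
    T run() {
        Trampoline<T> current = this;
        while (!current.isDone()) {
            current = current.next();
        }
        return current.result();
    }
}

class TailCalls {
    // Tail-recursive sum of 1..n; the recursive call is wrapped in a thunk.
    static Trampoline<Long> sum(final long n, final long acc) {
        if (n == 0) {
            return Trampoline.done(acc);
        }
        return Trampoline.more(() -> sum(n - 1, acc + n));
    }
}
```

`TailCalls.sum(1000000L, 0L).run()` completes where the directly recursive version would typically overflow the stack; the price is one small allocation per call, which is in line with the slowdown mentioned above.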

20070118: POPL'07

Received notification of proceedings for POPL'07. Noticed two interesting articles:

  1. Scrap your boilerplate with XPath-like combinators, Ralf Lämmel
  2. Compositional dynamic test generation, Patrice Godefroid

20070117: Agility in French

Added summaries of some agile books, available on the Agility page.

20070117: TDD Controversy

From the agile testing mailing list, a discussion about the impediments to widespread TDD adoption:

Starting post

*Posted by Bob Evans, Agitar Software, creators of an interesting testing tool set for Eclipse.*

After using TDD for several years in a couple of companies and trying to get others to try it out, only to see the majority of the attempts crash and burn, I have come to the conclusion that TDD is too difficult for the majority of developers. I think that while most developers would now agree that unit testing is a good idea, the majority fail and do not adopt it, let alone adopting TDD. I would like to volunteer to take the following position, and develop a point/counterpoint paper with anyone who is interested.

Why is it that most developers cannot succeed with unit testing, let alone TDD, and what is  a better path to improve unit test adoption and by extension TDD adoption? We need more, and smarter, automation tools to handle all the tedium and to let developers focus on the happy path.  

Difficulties:

  1. TDD requires the ability to think much more abstractly than is necessary for plug and  chug development that a lot of developers can get away with currently. TDD requires dropping down to ever smaller baby-steps until some progress can be made. Most developers will give up instead of applying the excruciating patience to learn this skill.
  2. Most engineers who even try it, on their first couple of experiences end up boxing  themselves into corners with tests or designs that become brittle, and then they give up the first time they have a bunch of failing tests. Or, their teammates constantly break the tests and don't repair them. Management won't back the testing up because of the estimated cost: ~20% overhead for test maintenance (which I would argue is a very optimistic estimate.)  
  3. Among engineers who even try it, their code bases are not currently very testable and the process of making things testable is too cumbersome. There is too much tangling to be able to construct objects, and there may not be reasonable ways to assert the post conditions of objects that were constructable.  

I would take the position that while TDD is a great way to build software that also offers regression testing features for future development, our current environment is akin to assembly language development. For it to gain any greater adoption, more high-level tools are required to automate more of the menial steps involved.  

Automation Solutions:

  1. Tools like cruisecontrol, only automatically configured from the IDE project file or the ant/make scripts so that the test runs are automated and transparent.  
  2. Test generator tools that generate all the test cases developers don't, i.e., all the non happy-path test cases - exception cases, negative cases, off-by-one cases, so on. Also these tools should be able to traverse all the nasty dependencies to construct objects to be tested. This is just a constraint solving problem, which is what computers are good for.
  3. Tools that report test failures at a higher level of abstraction, like change impact analysis, to help the developer more quickly see what failed and what caused the failure.  
  4. Tools that allow the developer to more easily manage test suites. For instance, help them repair or remove obsolete tests and like #2, generate new tests for new code, or at least show dropped coverage numbers so that developers can see where the holes are in their test suites.

Summary

Great developers manage to adopt TDD successfully today, better-than-average  developers manage to adopt unit testing on green field development, but most developers  only believe that unit testing is a good idea in theory. To make it possible for more  developers to get the benefit of unit testing, and ultimately for some portion of that to get  the benefit of test driven development, more of the tedium and menial tasks need to be  handled by automated tools. Automation, used appropriately, what could be called  Computer Aided Developer Testing (CADT), is essential to bringing testing to the next  level of developers.

My answer

I found this point of view really interesting and provocative. As a follow-up to Lisa Crispin's reply, I would also note the emphasis on tools for helping TDD adoption growth, which is not a surprise given that Agitar Software offers a testing solution that is somewhat along the lines of points 1) through 4) (no offense intended, just reading between the lines :-)).

I have been bitten enough by this to try some counterpoint arguing on the list. I totally agree with the initial argument of the debate: TDD is difficult to implement and to sell to management on projects with inexperienced/non-agile/legacy code teams. Some comments on the various obstacles encountered on the path to adoption of TDD:

  1. TDD is too abstract: I do not think this is the main problem, and I do not even think this is really a problem. As advocated in the JUnit documentation, programmers love to write tests if these tests are themselves programs. TDD does not require you to think abstractly, it only requires you to think. One could say that this is more than what is expected from the average programmer, but that would deny most of us professionalism and open-mindedness.
  2. People get lost in TDD: Quite true given my own experience. Initially, programmers write either complex test cases, or trivial test cases that do not really provide any insight into the expected behavior of the tested code. If left alone, they can easily get the feeling that TDD is useless or too complicated for them.
  3. Non testable codebase: I would reply that this is the whole point of TDD: to uncover untestable code, then make it more testable, then make it tested. This may not be easy but it always pays off.

The author then argues that what is lacking is appropriate tools that would work on more abstract level, better integrate with IDE, better manage test suites. The analogy is made between assembly and high-level languages programming, hinting at the fact that advanced testing tools would be like modern compilers allowing one to work on a more abstract level.  

Here are some detailed responses:

  1. ??? Continuous integration and standard build tools are already quite commonplace in development process. While not being strictly orthogonal to TDD, they can be used without it, with reduced added benefits of course.  
  2. As a former researcher on test generation issues, I must react to the sentence "this is just a constraint solving problem". I am quite sure the author is well aware of the available literature on the subject, probably more than I am, and to the best of my knowledge, there does not exist any technique or tool that generates interesting test cases. Sampling, random generation, and to some extent CSP analysis can generate a lot of test input data for mundane test cases (limit cases, 0, 1 and the like), but the resulting increase in code quality is somewhat questionable and has been questioned. This also leaves open the issue of test coverage (I prefer the term 'test objective'): a) generating test cases with the intent of achieving a certain coverage number is even more difficult, and b) what is the true impact of coverage measures with respect to overall software quality?
  3. That's a point against TDD (or rather against unit testing). Unit testing code just relieves you of the burden of finding where the problem is located.  
  4. Once again, tools only generate trivial test cases that increase coverage figures but not code quality nor understanding.  

To conclude, I believe, on the contrary, that what prevents the widespread adoption of TDD is the following:

  1. there are already too many tools, and the development environment is getting more and more complex: developers get lost in the management of their tools and are constantly distracted from their main task,
  2. moreover, there is a great lack of 'conceptual integrity' (term borrowed from F.Brooks) in the development environment (hence in the tools it is made of) and in the development process, with conflicting requirements, layers after layers of complex technologies, buzzword design... TDD is not more widely adopted because it is an intimate part of a whole set of practices (and values) that forms XP. I would recommend reading http://www.vanderburg.org/Blog/Software/Development/xpannealed.rdoc for further thoughts on this.  

Tools are, hmm.. well just tools! They can help or hinder development, but they cannot by themselves solve behavioral problems. To be more widespread, TDD needs trained people (which I am, on my modest and local scale, contributing to :-)) and a change in the development process.

20070112: About XP and agility

Started writing an essay about agility and XP methods, from an implementation point-of-view and in a non-agile environment.  

20070111: More projects...

User-friendly implementation of Software Transactional Memory in Java (or Scala ?). Wikipedia's definition emphasizes the lockfree-ness of STM, but DSTM2 is based on locks and its API does not seem very user-friendly.

Ideally, one should be able to write:

atomic {
    newNode->prev = node;
    newNode->next = node->next;
    node->next->prev = newNode;
    node->next = newNode;
}

or, less simple but maybe as readable:

  XAction xa = new XAction();
  xa.start();
  ... // do something
  xa.commit();

One possible implementation I can think of would need bytecode manipulation:

  1. at configuration time (e.g. in a file, a property, prior to the XA system starting), one defines the set of classes that should be monitored for STM. A special classloader is set up that handles these classes and instruments them; all getfield, putfield, getstatic and putstatic instructions are wrapped with method calls that ensure usage of XA code: if there is a current transaction in this thread (stored in a thread local), then the instruction is handled by the XA, else it is handled directly.
  2. at runtime, one simply demarcates transactions using start(), commit() and rollback(). The commit() may throw a conflict exception, indicating that completion of the XA was impossible because a conflict arose.
  3. atomicity is handled using optimistic concurrency control: for each transactionally protected field accessed during a transaction, the XA keeps a log or copy of its value, keeping the changes private to the XA. Upon commit, logged values are compared with the actual values and if a discrepancy is noticed, a conflict has been detected.
  4. an XAFunction interface can be provided that automatically demarcates an XA for a call() method and provides auto-restart.
  5. there may be a need for a cleanup thread, that would detect uncommittable XAs as early as possible and abort the conflicting XA.
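The optimistic part (point 3) can be sketched without any bytecode manipulation, by swapping instrumented fields for explicit cells. Everything below is an assumption made for illustration: `Ref` stands in for an instrumented field, `XAction` exposes `get`/`set` directly instead of intercepting getfield/putfield, the transaction starts at construction instead of through start(), and commits are serialized with a single global lock, which a real STM would try to avoid.

```java
import java.util.IdentityHashMap;
import java.util.Map;

// Hypothetical transactional cell; stands in for an instrumented field.
class Ref<T> {
    volatile T value;
    Ref(T initial) { this.value = initial; }
}

class ConflictException extends RuntimeException {
    ConflictException(String message) { super(message); }
}

class XAction {
    private static final Object COMMIT_LOCK = new Object();
    // Value first seen for each cell read by this transaction.
    private final Map<Ref<?>, Object> readLog = new IdentityHashMap<>();
    // Pending writes, kept private until commit.
    private final Map<Ref<?>, Object> writeLog = new IdentityHashMap<>();

    @SuppressWarnings("unchecked")
    <T> T get(Ref<T> ref) {
        if (writeLog.containsKey(ref)) return (T) writeLog.get(ref);
        if (!readLog.containsKey(ref)) readLog.put(ref, ref.value);
        return (T) readLog.get(ref);
    }

    <T> void set(Ref<T> ref, T newValue) {
        writeLog.put(ref, newValue);
    }

    // Discards all private changes.
    void rollback() {
        readLog.clear();
        writeLog.clear();
    }

    @SuppressWarnings("unchecked")
    void commit() {
        synchronized (COMMIT_LOCK) {
            // Validate: every value read must still be the one currently stored
            // (compared by reference, which errs on the side of reporting a conflict).
            for (Map.Entry<Ref<?>, Object> e : readLog.entrySet()) {
                if (e.getKey().value != e.getValue()) {
                    rollback();
                    throw new ConflictException("value changed since it was read");
                }
            }
            // Publish the private writes.
            for (Map.Entry<Ref<?>, Object> e : writeLog.entrySet()) {
                ((Ref<Object>) e.getKey()).value = e.getValue();
            }
            rollback(); // logs are no longer needed
        }
    }
}
```

An XAFunction as in point 4 would simply wrap a call in a loop that creates a new `XAction`, runs the body, and retries when `commit()` throws `ConflictException`.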

20070111: Continuous testing feedback

It might be useful to have a continuous test runner executing tests in the background and providing feedback on them. This is implemented in http://pag.csail.mit.edu/continuoustesting/ for Eclipse. It might be interesting to adapt it for non-Eclipse situations, e.g. with an Ajax UI and a webapp server pushing events.

I need:

  1. monitoring a set of class files, containing tests: when one class file changes, trigger test execution. It should be possible to filter test executions according to their dependency on classes. Test cases usually depend on concrete classes!
  2. a background thread that launches the tests: a new classloader is made with modified classes and tests are rerun. We do not fork a new process (VM) each time if possible. The test results are broadcast on the fly to interested listeners (observer pattern).
  3. a listener that works as a server for remote Ajax or whatever UI and pushes the results to it
  4. a UI

The reloader could be based on VFS to allow transparent handling of remote and local resources.
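A minimal sketch of the core of points 1 and 2, leaving out file monitoring, classloading and threading: tests are registered with the classes they depend on, a change notification reruns only the dependent tests, and results are pushed to listeners. The names `ContinuousRunner`, `TestListener` and `classChanged` are hypothetical, and a test is reduced here to a boolean-returning closure.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.BooleanSupplier;

// Observer notified of each test result as it becomes available.
interface TestListener {
    void testFinished(String testName, boolean passed);
}

class ContinuousRunner {
    private final Map<String, Set<String>> dependencies = new LinkedHashMap<>();
    private final Map<String, BooleanSupplier> tests = new LinkedHashMap<>();
    private final List<TestListener> listeners = new ArrayList<>();

    // Registers a test together with the names of the classes it depends on.
    void register(String testName, Set<String> dependsOn, BooleanSupplier body) {
        dependencies.put(testName, dependsOn);
        tests.put(testName, body);
    }

    void addListener(TestListener listener) {
        listeners.add(listener);
    }

    // Called when a class file changes: reruns only the dependent tests
    // and returns their names in registration order.
    List<String> classChanged(String className) {
        List<String> rerun = new ArrayList<>();
        for (Map.Entry<String, Set<String>> e : dependencies.entrySet()) {
            if (!e.getValue().contains(className)) continue;
            String name = e.getKey();
            boolean passed;
            try {
                passed = tests.get(name).getAsBoolean();
            } catch (RuntimeException ex) {
                passed = false; // a crashing test counts as a failure
            }
            for (TestListener l : listeners) {
                l.testFinished(name, passed);
            }
            rerun.add(name);
        }
        return rerun;
    }
}
```

The listener of point 3 would be one more `TestListener` that forwards each result to the remote UI.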