2014-03-21
You've built a prototype, and everything is going great. All your dates and times look right, they load and store correctly, everything is spiffy. You have your buddy give it a whirl, and it works great for them too. Then you have a friend in Curaçao test it, and they complain that all the times are wrong - time zones strike again!
But, you've got this covered. You just add an offset to every stored date/time, so you know the origin time zone, and then you get the user's time zone, and voila! You can correct for time zones! Everything is going great, summer turns to fall, the leaves change, the clocks change, and it all falls apart again. Now you're storing dates in various time zones, without DST information, you're adjusting them to the user's time zone, trying to account for DST, trying to find a spot here or there where you forgot to account for offsets...
Don't fall into this trap. UTC is always the answer. It is effectively time-zone-less, as it has an offset of zero and does not observe daylight saving time. It's reliable, it's universal, it's always there when you need it, and you can always convert it to any time you need. Storing a date/time with time zone information is like telling someone your age by giving your birthday and today's date - you're dealing with additional data and additional processing with zero benefit.
When starting a project, you're going to be better off storing all dates as UTC from the get-go; it'll save you innumerable headaches later on. I think it is atrocious that .NET defaults to system-local time for dates; one of the few areas where I think Java has a clearly better design. .NET's date handling in general is a mess, but simply defaulting to local time when you call DateTime.Now encourages developers to exercise bad practices; the exact opposite of the stated goals of the platform, which is to make sure that the easy thing and the correct thing are, in fact, the same thing.
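To illustrate, here's a minimal sketch using Java's java.time API (the zone and formatting choices are just examples): store and pass around an Instant, which is always UTC, and convert to the user's zone only at the presentation layer.

    import java.time.Instant;
    import java.time.ZoneId;
    import java.time.ZonedDateTime;
    import java.time.format.DateTimeFormatter;
    import java.time.format.FormatStyle;

    public class UtcDates {
        public static void main(String[] args) {
            // Store and transmit an Instant: always UTC, no zone, no DST.
            Instant stored = Instant.now();

            // Convert to the user's time zone only at display time.
            ZoneId userZone = ZoneId.of("America/Curacao");
            ZonedDateTime local = stored.atZone(userZone);

            System.out.println(local.format(
                    DateTimeFormatter.ofLocalizedDateTime(FormatStyle.MEDIUM)));
        }
    }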
On a vaguely related note, I've found what is (in my opinion) a rather elegant solution for providing localized date/time data on a website, and it's all wrapped up in a tiny Gist for your use: https://gist.github.com/aprice/7846212
This simple jQuery script goes through elements with a data attribute providing a timestamp in UTC, and replaces the contents (which can be the formatted date in UTC, as a placeholder) with the date/time information in the user's local time zone and localized date/time format. You don't have to ask the user their time zone or date format.
Unfortunately, it looks like most browsers don't take into account customized date/time format settings; for example, on my computer, I have the date format set to yyyy-mm-dd, but Chrome still renders the standard US format of mm/dd/yyyy. However, I think this is a relatively small downside, especially considering that getting around it requires allowing users to customize the date format, complete with a UI and storage mechanism for doing so.
2014-03-13
On Code Comments
I've been seeing a lot of posts lately on code comments; it's a debate that's raged on for ages and will continue to do so, but for some reason it's been popping up in my feeds more than usual the last few days. What I find odd is that all of the posts generally take on the same basic format: "on the gradient of too many to too few comments, you should aim for this balance, in this way, don't use this type of comments, make your code self-documenting." The reasoning is fairly consistent as well: comments get stale, or don't add value, or may lead developers astray if they don't accurately represent the code.
And therein lies the rub: they shouldn't be representing the code at all. Code - clean, self-documenting code - represents itself. It doesn't need a plain-text representative to speak on its behalf unless it's poorly written in the first place.
It may sound like I'm simply suggesting aiming for the "fewer comments" end of the spectrum, but I'm not; there's still an entity that may occasionally need representation in plain text: the developer. Comments are an excellent way to describe intent, which just so happens to take a lot longer to go stale, and is often the missing piece of the puzzle when trying to grok some obscure or obtuse section of code. The code is the content; the comments are the author's footnotes, the director's commentary.
Well-written code doesn't need comments to say what it's doing - which is just as well since, as so many others have pointed out, those comments are highly likely to wind up out-of-sync with what the code is actually doing. However, sometimes - not always, maybe even not often, but sometimes - code needs comments to explain why it's doing whatever it's doing. Sure, you're incrementing Frobulator.Foo, and everybody is familiar with the Frobulator and everybody knows why Foo is important and anyone looking at the code can plainly see you're trying to increment it. But why are you incrementing it? Why are you incrementing it the way you're doing it in this case? What is the intent, separate from its execution? That's where comments can provide value.
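To make that concrete, here's a contrived example (the Frobulator is as fictional here as it is above, and the vendor-API rationale is invented for illustration):

    class Frobulator {
        static int foo;
    }

    class Incrementer {
        void narratesTheCode() {
            // Increment Foo. (Adds nothing a reader couldn't see themselves.)
            Frobulator.foo++;
        }

        void explainsTheIntent() {
            // The vendor's API counts attempts from 1, not 0, so bump Foo
            // before the first call, or the initial attempt is silently
            // skipped. (Why, not what - this is where a comment earns its keep.)
            Frobulator.foo++;
        }
    }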
As a side note (no pun intended), I hope we can all agree that doc comments are a separate beast entirely here. Doc comments provide metadata that can be used by source code analyzers, prediction/suggestion/auto-completion engines, API documentation generators, and the like; they provide value through some technical mechanism and are generally intended for reading somewhere else, not in the source code itself. Because of this, I consider doc comments to be a completely separate entity that just happens to be encoded in comment syntax.
My feelings on doc comments are mixed; generally speaking, I think they're an excellent tool and should be widely used to document any public API. However, there are few things in the world more frustrating than looking up the documentation for a method you don't understand, only to find that the doc comments are there but blank (probably generated or templated), or are there but so out of date that they're missing parameters or the types are wrong. This is the kind of thing that can have developers flipping desks at two in the morning when they're trying to get something done.
2013-12-18
Engineers, Hours, Perks, and Pride
This started as a Google+ post about an article on getting top engineering talent and got way too long, so I'm posting here instead.
I wholeheartedly agree that 18-hour days are just not sustainable. It might work for a brand-new startup cranking out an initial release, understaffed and desperate to be first to market. At that stage, you can expect the kind of passion and dedication that drives a small team to put in those hours and give up their lives to build something new.
Once you've built it, though, the hours become an issue, and the playpen becomes a nuisance. You can't expect people to work 18-hour days forever, or even 12-hour days. People far smarter than I have posited that the most productive time an intellectual worker can put in on a regular basis is 4 to 6 hours per day; after that, productivity and effectiveness plummet, and it only gets worse the longer it goes on.
Foosball isn't a magical sigil protecting engineers from burn-out. Paintball with your coworkers isn't a substitute for drinks with your friends or a night in with your family. An in-house chef sounds great on paper, until you realize that the only reason they'd need to provide breakfast and dinner is if you're expected to be there basically every waking moment of your day.
Burn-out isn't the only concern, either. Engineering is both an art and a science, and like any art, it requires inspiration. Inspiration, in turn, requires experience. The same experience, day-in, day-out - interacting with the same people, in the same place, doing the same things - leaves one's mind stale, devoid of inspiration. Developers get tunnel-vision, and stop bringing new ideas to the table, because they have no source for them. Thinking outside the box isn't possible if you haven't been outside the box in months.
Give your people free coffee. Give them lunch. Give them great benefits. Pay them well. Treat them with dignity and respect. Let them go home and have lives. Let them get the most out of their day, both at work and at home. You'll keep people longer, and those people will be more productive while they're there. And you'll attract more mature engineers, who are more likely to stick around rather than hopping to the next hip startup as soon as the mood strikes them.
There's a certain pride in being up until sunrise cranking out code. There's a certain macho attitude, a one-upmanship, a competition to see who worked the longest and got the least sleep and still came back the next morning. I worked from 8am until 4am yesterday, and I'm here, see how tough I am? It's the geek's equivalent to fitness nuts talking about their morning 10-mile run. The human ego balloons when given the opportunity to endure self-inflicted tortures.
But I'm inclined to prefer an engineer who takes pride in the output, not the struggle to achieve it. I want someone who is stoked that they achieved so much progress, and still left the office at four in the afternoon. Are they slackers, compared to the guy who stayed another twelve hours, glued to his desk? Not if the output is there. It's the product that matters, and if the product is good, and gets done in time, then I'd rather have the engineer that can get it done without killing themselves in the process.
"I did this really cool thing! I had to work late into the night, but caffeine is all I need to keep me going. I kept having to put in hacks and work-arounds, but the important thing is that it's done and it works. I'm a coding GOD!" Your typical young, proud engineer. They're proud of the battle, not the victory; they're proud of how difficult it was.
"I did this really cool thing! Because I had set myself up well with past code, it was a breeze. I was amazed how little time it took. I'm a coding GOD!" That's my kind of developer. That's pride I can agree with. They're proud because of how easy it was.
This might sound like an unfair comparison at first, but think about it. When you're on a 20-hour coding bender, you aren't writing your best code. You're frantically trying to write working code, because you're trying to get it done as fast as you can. Every cut corner, every hack, every workaround makes the next task take that much longer. Long hours breed technical debt, and technical debt slows development, and slow development demands longer hours. It's a vicious cycle that can be extremely difficult to escape, especially once it's been institutionalized and turned into a badge of honor.
2013-12-16
My Personal Project Workflow/Toolset
I do a lot of side projects, and my personal workflow and tooling is something that's constantly evolving. Right now, it looks something like this:
- Prognosticator for tracking features/improvements, measuring the iceberg, and tracking progress
- WorkFlowy for tracking non-development tasks (the most recent addition to the toolset)
- Trac for project documentation, and theoretically for defect tracking, though I've not been good about entering defects in Trac recently; it doesn't seem worth the effort on a one-person project, though with multiple people I think it would be a must
- Trello for cross-cutting all the above and indicating what's next/in progress/recently completed, and for quickly jotting down ideas/defects. Most of the defect tracking actually goes in here on one-man projects right now. This is a lot of duplication and the main source of waste in my current process.
- Bitbucket for source control (I also use Atlassian's excellent SourceTree as a Git/Hg client.)
It's been working well for me; the only issue I have is duplication between the tools, and failing to consistently use Trac for defect tracking. What keeps me in Trello is how quick and easy it is to add items, and the fact that I'm using it as a catch-all - I can put a defect or an idea or a task into it in a couple of seconds; I just have to replicate it to the appropriate place later, which is the problem.
I think the issue boils down to a tension between convenience, a centralized repository for "stuff to be done" (Trello), and dedicated repositories catered to each type of work (Prognosticator, Trac, and WorkFlowy). Trello is excellent for jotting something down quickly, but lacks the specific utility of the dedicated tools.
I think what I'll end up doing is creating a "whiteboard" list in WorkFlowy, and using that instead of Trello to jot down quick notes when I don't have the time to use the individual tools; then I can copy from there to the other tools when I need to. That will allow me to cut Trello down to basically being a Kanban board.
2013-10-18
Pragmatic Prioritization
The typical release scheduling process works something like this:
- Stakeholders build a backlog of features they'd like to see in the product eventually.
- The stakeholders decide among themselves the relative priority of the features in the backlog.
- The development team estimates the development time for each feature.
- The stakeholders set a target feature list and ship date based on the priorities and estimates.
The problem here is primarily in step 2; this step tends to involve a lot of discussion bordering on arguing bordering on in-fighting. Priorities are set at best based on a sense of relative importance, at worst based on emotional attachment. Business value is a vague and nebulous consideration at most.
I propose a new way of looking at feature priorities:
- Stakeholders build a backlog of features they'd like to see in the product eventually.
- The stakeholders estimate the business value of each feature in the backlog.
- The development team estimates the development time for each feature.
- The stakeholders set a target feature list and ship date based on the projected return of each feature - i.e., the estimated business value divided by the estimated development time.
This turns a subjective assessment of relative priorities into an objective estimate of business value, which is used to determine a projected return on investment for each feature. This can then be used to objectively prioritize features and schedule releases.
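As a minimal sketch (in Java, with invented features and numbers), the projected return is simply estimated value over estimated time, and prioritization becomes a sort:

    import java.util.Comparator;
    import java.util.List;

    public class BacklogPrioritizer {
        record Feature(String name, double businessValue, double devWeeks) {
            double projectedReturn() {
                return businessValue / devWeeks; // value per week of effort
            }
        }

        public static void main(String[] args) {
            List<Feature> backlog = List.of(
                    new Feature("CSV export", 20_000, 1.0),
                    new Feature("SSO integration", 60_000, 5.0),
                    new Feature("Dark mode", 5_000, 0.5));

            // Highest projected return first: an objective priority order.
            backlog.stream()
                    .sorted(Comparator.comparingDouble(
                            Feature::projectedReturn).reversed())
                    .forEach(f -> System.out.printf("%-16s %.0f per week%n",
                            f.name(), f.projectedReturn()));
        }
    }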
I've been using this workflow recently for one of my upcoming projects, and I feel like it's helped me to more objectively determine feature priorities, and takes a lot of the fuzziness and hand-waving out of the equation.
Shameless self-promotion: Pragmatic prioritization is a feature of my project scheduling and estimation tool, Rogue Prognosticator.
2013-08-04
Assumptions and Unit Tests
I've written previously on assumptions and how they affect software development. Taking this as a foundation, the value proposition of unit testing becomes much more apparent: it offers a subjective reassurance that certain assumptions are valid. By mechanically testing components for correctness, you're allowing yourself the freedom to safely assume that code which passes its tests is highly unlikely to be the cause of an issue, so long as there is a test in place for the behavior you're using.
This can be a double-edged sword: it's important to remember that a passing test is not a guarantee. Tests are written by developers, and developers are fallible. Test cases may not exercise the behavior in precisely the same way as the code you're troubleshooting. Test cases may even be missing for the particular scenario you're operating under.
By offering a solid foundation of trustworthy assumptions, along with empirical proof as to their validity, you can eliminate many possible points of failure while troubleshooting, allowing you to focus on what remains. You must still take steps to verify that you do have test coverage for the situation you're looking at, in order to have confidence in the test results. If you find missing coverage, you can add a test to fill the gap; this will either pass, eliminating another possible point of failure, or it will fail, indicating a probable source of the issue.
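For example (a sketch using JUnit 5, with an invented class under test): if you're troubleshooting a zero-input case and find no test for it, write one.

    import static org.junit.jupiter.api.Assertions.assertEquals;

    import org.junit.jupiter.api.Test;

    class PercentFormatterTest {
        // Invented class under test: formats a ratio as a percentage string.
        static class PercentFormatter {
            static String format(double ratio) {
                return Math.round(ratio * 100) + "%";
            }
        }

        @Test
        void formatsZeroRatioAsZeroPercent() {
            // The gap we found: no coverage for the zero case. If this passes,
            // the formatter is eliminated as a suspect; if it fails, we've
            // probably found the source of the issue.
            assertEquals("0%", PercentFormatter.format(0.0));
        }
    }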
Just don't take unit test results as gospel; tests must be maintained just like any other code, and just like any other code, they're capable of containing errors and oversights. Trust the results, but not the tests, and learn the difference between the two: results will reliably tell you whether the test you've written passed or failed. It is the test, however, that executes the code and judges passing or failing. The execution may not cover everything you need, and the judgement may be incorrect, or not checking all factors of the result.
2013-06-11
Teaching a Developer to Fish
I write a lot about development philosophy here, and very little about technique. There are reasons for this, and I'd like to explain.
In my experience, often what separates an easy problem from an intractable one is method and mindset. How you approach a problem tends to be more important than the implementation you end up devising to solve it.
Let's say you're given the task of designing a recommendation engine - people like you were interested in X, Y, and Z. Clearly this is an algorithmic problem, and a relatively difficult one at that. How do you solve it?
The algorithm itself isn't significant; as a developer, the algorithm is your output. The process you use to achieve the desired output is what determines how successful you'll be. I could talk about an algorithm I wrote, but that's giving a man a fish. I'd much rather teach a man to fish.
So how do you fish, as it were, for the perfect algorithm? You follow solid practices, you iterate, and you measure. That means you start with a couple of prototypes, you measure the results, you whittle down the candidate solutions until you have a best candidate, and then you refine it until it's as good as it can get. Then you deploy it to production, you continue to measure, and you continue to refine it. If you code well, you can A/B test multiple potential algorithms, in production, and compare the results.
How do you fish for a fix to a defect? You follow solid practices, you iterate, and you measure. You start by visual inspection, checking for code quality, and doing light refactoring to try to simplify the code and eliminate points of failure, to narrow down the possibilities. Often this alone will bring the root cause of the defect to the surface quickly, or even solve it outright. If it doesn't, you add logging, and you watch the results as you recreate the error, trying to recreate it in different ways, to assess the boundaries of the defect; if this is for an edge case, what exactly defines the "edge" that's affected? What happens during each step of execution when it happens? Which code is executing and which code isn't? What parameters are being passed around?
In my experience, logging tends to be a far more effective debugging tool than a step-wise debugger in most cases. With a strong logging framework, you can leave your logging statements in place in production with negligible performance impact (with debug logging disabled), and with fine-grained controls you can turn up verbosity for the code you're inspecting without turning all logging on and destroying the signal-to-noise ratio of your logging output.
You follow solid practices, you iterate, and you measure. If you use the right process, with the right mindset, you'll wind up at the right solution.
That's why I tend to wax philosophical instead of writing about concrete solutions I've implemented. Chances are I wrote the solution to my problem, not your problem; and besides, I'd much rather teach a man to fish than give a man a fish.
2013-06-07
Code Patterns as Microevolution
Code patterns abide by survival of the fittest, within a gene pool of the code base. Patterns reproduce through repetition, sometimes with small mutations along the way. Patterns can even mate, after a fashion, by combining them, taking elements of each to form a new whole. This is the natural evolution of source code.
The first step to taming a code base is to realize the importance of assessing fitness and taking control over what patterns are permitted or encouraged to continue to reproduce. Code reviews are your opportunity to thin the herd, to cull the weak, and allow the strong to flourish.
Team meetings, internal discussions, training sessions, and learning investments are then your opportunity to improve both the quality of new patterns and mutations that emerge, as well as the group's ability to effectively manage the evolution of your source, to correctly identify the weak and the strong, and to have a lasting impact on the overall quality of the product.
If you think about it, the "broken windows" problem could also be viewed as bad genes being allowed to perpetuate. As the bad patterns continue to reproduce, their number grows, and so does their impact on the overall gene pool of your code. Given the opportunity, you want to do everything you can to make sure that it's the good code that's continuing to live on, not the bad.
Consider a new developer joining your project. A new developer will look to existing code as an example to learn from, and as a template for their own work on the project, perpetuating the "genes" already established. That being the case, it seems imperative that you make sure those genes are good ones.
They will also bring their own ideas and perspectives to the process, establishing new patterns and mutating existing ones, bringing new blood into the gene pool. This sort of cross-breeding is tremendously helpful to the overall health of the "code population" - but only if the new blood is healthy, which is why strong hiring practices are so critical.
2013-06-02
Building a Foundation
It's been said that pharmaceutical companies produce drugs for pennies per pill - except the first pill, which costs millions. Things aren't so different in the land of software development: the first usage of some new functionality might take hours, building the foundation and related pieces. But it could be re-used a hundred times trivially, and usually expanded or modified with little effort as well (assuming it was well-written to start with).
This is precisely what you should be aiming for: take the time to build a foundation that will turn complex tasks into trivial ones as you progress. This is the main purpose behind design concepts like the single responsibility principle, the Hollywood principle, encapsulation, DRY, and so on.
This isn't to be confused with big upfront design; in fact, it's especially important to keep these concepts in mind in an agile process, where you're building the architecture as you go. It can be tempting to just hack together what you need at the moment. That's exactly what you should be doing for a prototype, but not for real development. For lasting functionality, you should assemble a foundation to support the functionality you're adding now, and similar functionality in the future.
It can be difficult to balance this against YAGNI - you don't want to build what you don't need, but you want to build what you do need in such a way that it will be reusable. You want to save yourself time in the future, without wasting time now.
To achieve a perfect balance would require an extraordinary fortune teller, of course. Experience will help you get better at determining what foundation will be helpful, though. The more experience you have and the more projects you work on, the better sense you'll have of what can be done now to help out future you.
2013-05-29
My Take on "Collective Ownership"/"Everyone is an Architect"
I love the idea of "collective ownership" in a development project. I love the idea that in a development team, "everyone is an architect". My problem is with the cut-and-dried "Agile" definition of these concepts.
What I've been reading lately is a definition of "collective ownership" that revolves around the idea of distributing responsibility, primarily in order to lift the focus on finger-pointing and blaming. A defect isn't "your fault", it's "our fault", and "we need to fix it." That's all well and good, but distributing blame isn't exactly distributing ownership; and ignoring the source of an issue is a blatant mistake.
The latter point first: identifying the source of an issue is important. I see no need for blame, or calling people out, and certainly no point in trying to use defects as a hard metric in performance analysis. However, a development team isn't a factory; it's a group of individuals who are constantly continuing their education, and honing their craft, and in that endeavor they need the help of their peers and managers to identify their weaknesses so they know what to focus on. "Finding the source of an issue" isn't about placing blame or reprimanding someone, it's about providing a learning opportunity so that a team member can improve, and the team as a whole can improve through the continuing education of each member.
In regard to distributing ownership, it's all too rare to see discussion of distributing ownership in a positive way. I see plenty of people writing about eliminating blame, but very few speaking of a team wherein every member looks at the entire code base and says "I wrote that." And why should they? They didn't write it alone, so they can't make that claim. For the product, they can say "I had a hand in that," surely. But it's unlikely they feel like they had a hand in the development of every component.
That brings us around to the idea that "everyone is an architect." In the Agile sense, this is generally taken to mean that every developer is given relatively free rein to architect the component they're working on at any given moment, without bowing down to The Architect for their product. I like this idea, in a sense - I'm all for every developer doing their own prototyping, their own architecture, learning their own lessons, and writing their own code. Up to a point.
There is a level of architecture that it is necessary for the entire team to agree on. This is where many teams, even Agile teams, tend to fall back on The Architect to keep track of The Big Picture and ensure that All The Pieces Fit Together. This is clearly the opposite of "everyone is an architect". So where's the middle ground?
If a project requires some level of architecture that everyone has to agree on - language, platform, database, ORM, package structure, whatever applies to a given situation - then the only way to have everyone be an architect is design by committee. Panning design by committee has become a cliche at this point, but it has its uses, and I feel this is one of them.
In order to achieve collective ownership, you must have everyone be an architect. In order for everyone to be an architect, and feel like they gave their input into The Product as a whole - or at least had the opportunity to do so - you must make architectural decisions into group discussions. People won't always agree, and that's where the project manager comes in; as a not-an-architect, they should have no bias and no vested interest in what choices are made, only that some decision is made on each issue that requires consideration. Their only job in architectural discussions is to help the group reach a consensus or, barring that, a firm decision.
This is where things too often break down. A senior developer or two, or maybe a project manager with development experience, become de facto architects. They make the calls and pass down their decrees, and quickly everyone learns that if they have an architecture question, they shouldn't try to make their own decision, they shouldn't pose it to the group in a meeting, they should ask The Guy, the architect-pro-tem. Stand-up meetings sink into the doldrums of pointless status updates, and discussion of architecture is left out entirely.
Luckily, every team member can change this. Rather than asking The Guy when a key decision comes up, ask The Group. Even better, throw together a prototype, get some research together, and bring some options with pros and cons to the next stand-up meeting. Every developer can do their part to keep the team involved in architecture, and in ownership, and to slowly shift the culture from having The Architect to having Everyone Is An Architect.
2013-05-10
The Importance of Logging
Add more logging. I'm serious.
Logging is what separates an impossible bug report from an easy one. Logging lets you replace comments with functionality. I'd even go so far as to say good logging separates good developers from great ones.
Try this: replace your inline comments with equivalent logging statements. Run your program and tail the log file. Suddenly, you don't need a step-wise debugger for the vast majority of situations, because you can see, in the log, exactly what the program is doing, what execution path it's taking, where in the source each logging statement comes from, and where execution stopped in the event of a crash.
My general development process focuses on clean, readable, maintainable, refactorable, self-documenting code. The process is roughly like this:
- Block out the overall process, step by step, in comments.
- For any complex step (more than five or ten lines of code), replace the comment with a clearly-named method or function call, and create a stub method/function.
- Implement functionality.
- Give all functions, methods, classes, parameters, properties, and variables clear, concise names, so that the code ends up in some semblance of readable English.
- Use thorough sanity checking, by means of assertions or simple if blocks. When using if blocks, include logging for any failed checks, including what was expected and what was found. These should be warnings.
- Include logging in any error/exception handling code. These should be errors if recoverable, or fatal if not. This is all too often the only logging a developer includes!
- Replace inline comments with equivalent logging statements. These should be debug or info/trace level; major section starts should be higher level, while mid-process statements should be lower level.
- Add logging statements to the start of each method/function. These should also be debug or info/trace level. Use higher-level logging statements for higher-level procedures, and lower-level logging statements for more deeply-nested calls.
- For long-running or resource-intensive processes, particularly long loops, add logging statements at regular intervals to provide progress and resource utilization details.
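A minimal sketch of the result (assuming SLF4J; any leveled logging framework works the same way), showing comments-turned-logging, a sanity-check warning, and interval progress logging:

    import org.slf4j.Logger;
    import org.slf4j.LoggerFactory;

    public class ReportGenerator {
        private static final Logger log =
                LoggerFactory.getLogger(ReportGenerator.class);

        public long generate(int[] records) {
            // Was a comment ("load and total the records"); now it documents
            // the code and reports progress at the same time.
            log.info("Generating report for {} records", records.length);

            if (records.length == 0) {
                // Sanity check with logging: expected non-empty input.
                log.warn("Expected at least one record, found none");
            }

            long total = 0;
            for (int i = 0; i < records.length; i++) {
                total += records[i];
                if (i > 0 && i % 10_000 == 0) {
                    // Progress logging at regular intervals for long loops.
                    log.debug("Processed {} of {} records", i, records.length);
                }
            }
            log.debug("Report total: {}", total);
            return total;
        }
    }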
Make good use of logging levels! Production systems should only output warnings and higher by default, but it should always be possible to enable deeper logging in order to troubleshoot any issues that arise. However, keep the defaults in mind, and ensure that any logging you have in place to catch defects will provide enough information in the production logs to at least begin an investigation.
Your logging messages should be crafted with dual purpose in mind: first, to provide useful, meaningful outputs to the log files during execution (obviously), but also to provide useful, meaningful information to a developer reading the source - i.e., the same purpose served by comments. After a short time with this method you'll find it's very easy to craft a message that serves both purposes well.
Good logging is especially useful in an agile environment employing fast iteration and/or continuous integration. It may not be obvious why at first, but all the advantages of good logging (self-documenting code, ease of maintenance, transparency in execution) do a lot to facilitate agile development by making code easier to work with and easier to troubleshoot.
But wait, there's more! Good logging also makes it a lot easier for new developers to get up to speed on a project. Instead of slogging through code, developers can execute the program with full logging, and see exactly how it runs. They can then review the source code, using the logging statements as waypoints, to see exactly how the code relates to the execution.
If you need a tool for tailing log files, allow me a shameless plug: try out my free log monitor, Rogue Informant. It's been in development for several years now, it's stable, it's cross-platform, and it's completely free to use privately or commercially. It allows you to monitor multiple logs at once, filter and search logs, and float a log monitoring window on top of other applications, to make it easier to watch the log while using the program to see exactly what's going on behind the scenes. Give it a try, and if you find any issues or have feature suggestions, feel free to let me know!
2013-05-06
Sanity Checks: Assumptions and Expectations
Assertions and unit tests are all well and good, but they're too narrow-minded in my eyes. Unit tests are great for, well, testing small units of code to ensure they meet the basic requirements of a software contract - maybe a couple of typical cases, a couple of edge cases, and then additional cases as bugs arise and new test cases are created for them. No matter how many cases you create, however, you'll never have a test case for every possible scenario.
Assertions are excellent for testing in-situ; you can ensure that unacceptable values aren't given to or by a piece of code, even in production (though there is a performance penalty to enabling assertions in production, of course.) I think assertions are excellent, but not specific enough: any assertion that fails is automatically a fatal error, which is great, unless it's not really a fatal error.
That's where the concept of assumptions and expectations come in. What assertions and unit tests really do is test assumptions and expectations. A unit test says "does this code behave correctly when given this data, all assumptions considered?" An assertion says "this code assumes this thing, and will not behave correctly if it gets another, so throw an error."
When documenting an API, it's important to document assumptions and expectations, so users of the API know how to work with your code. Before I go any further, let me define what I mean by these very similar terms: to me, code that assumes something operates as if its assumptions are correct, and will likely fail if its assumptions turn out to be incorrect. Code that expects something operates as if its expectations are met, but will likely still operate correctly even if they aren't. It's not guaranteed to work, or guaranteed to fail; it's likely to work, but someone should probably know about it and look into it.
Therein lies the rub: these are basically two types of assertions, one fatal, one not. What we need is an assertion framework that allows for warning-level assertion failures. What's more, we need an assertion framework that is performant enough to be regularly enabled in production.
So, any code that's happily humming along in production, that says:
Assume.that(percentage).isBetween(0,100);
will fail immediately if percentage is outside those bounds. It's assuming that percentage is between zero and one hundred, and if it assumes wrong, it will likely fail. Since it's always better to fail fast, any case where percentage is outside that range should trigger a fatal error - preferably even if it's running in production.
On the other hand, code that says:
Expect.that(numRows).isLessThan(1000);
will trigger a warning if numRows is over a thousand. It expects numRows to be under a thousand; if it isn't, it can still complete correctly, but it may take longer than normal, or use more memory than normal, or it may simply be that if it got more rows than that, something may be amiss with the query that got the rows or the dataset the rows came from originally. It's not a critical failure, but it's cause for investigation.
Any assumption or expectation that fails should of course be automatically and immediately reported to the development team for investigation. Naturally a failed assumption, being fatal, should take priority over a failed expectation, which is recoverable.
This not only provides greater flexibility than a simple assertion framework, it also provides more explicit self-documenting code.
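Here's a minimal sketch of what such a framework might look like (my own illustration, matching the Assume/Expect calls above; assuming SLF4J for the warning path):

    import org.slf4j.Logger;
    import org.slf4j.LoggerFactory;

    // Fatal path: a failed assumption throws, because the assuming code
    // won't behave correctly anyway - fail fast, even in production.
    final class Assume {
        static Check that(double value) { return new Check(value); }

        static final class Check {
            private final double value;
            Check(double value) { this.value = value; }

            void isBetween(double min, double max) {
                if (value < min || value > max) {
                    throw new IllegalStateException(
                            "Assumed " + value + " in [" + min + ", " + max + "]");
                }
            }
        }
    }

    // Warning path: a failed expectation logs and continues, because the
    // code can still complete correctly - but someone should look into it.
    final class Expect {
        private static final Logger log = LoggerFactory.getLogger(Expect.class);

        static Check that(long value) { return new Check(value); }

        static final class Check {
            private final long value;
            Check(long value) { this.value = value; }

            void isLessThan(long limit) {
                if (value >= limit) {
                    log.warn("Expected value < {}, got {}", limit, value);
                }
            }
        }
    }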
2013-05-03
Be Maxwell's Demon
Source code tends to follow the second law of thermodynamics, with some small differences. In software, as in thermodynamics, systems tend toward entropy: as you continue to develop an application, the source will increase in complexity. In software, as well as in thermodynamics, connected systems tend toward equilibrium: in development, this is known as the "broken windows" theory, and is generally considered to mean that bad code begets bad code. People often discount the fact that good code also begets good code, but this effect is often hidden by the fact that the overall system, as mentioned earlier, tends toward entropy. That means that the effect of broken windows is magnified, and the effect of good examples is diminished.
In thermodynamics, Maxwell's Demon thought experiment is, in reality, impossible - it is purely a thought experiment. However, in software development, we're in luck: any developer can play the demon, and should, at every available opportunity.
Maxwell's demon stands between two connected systems, defeating the second law of thermodynamics by selectively allowing less-energetic particles through only in one direction, and more-energetic particles through only in the other direction, causing the two systems to tend toward opposite ends of the spectrum, rather than naturally tending toward entropy.
By doing peer reviews, you're doing exactly that; you're reducing the natural entropy in the system and preventing it from reaching its natural equilibrium by only letting the good code through, and keeping the bad code out. Over time, rather than tending toward a system where all code is average, you tend toward a system where all code is at the lowest end of the entropic spectrum.
Refactoring serves a similar, but more active role; rather than simply "only letting the good code through", you're actively seeking out the worse code and bringing it to a level that makes it acceptable to the demon. In effect, you're reducing the overall entropy of the system.
If you combine these two effects, you can achieve clean, efficient, effective source. If your review process only allows code through that is as good or better than the average, and your refactoring process is constantly improving the average, then your final code will, over time, tend toward excellence.
Without a demon, any project will be on a continuous slide toward greater and greater entropy. If you're on a development project, and it doesn't have a demon, it needs one. Why not you?
2013-04-24
Real Sprints
Agile methodologies talk about "sprints" - workloads organized into one to four week blocks. You schedule tasks for each sprint, you endeavour to complete all of it by the end of the sprint, then you look back and see how close your expectations (schedule) were to reality (what actually got done).
Wait, wait, back up. When I think of a sprint, I think short and fast. That's what sprinting means. You can't sprint for a month straight; you'll die. That's a marathon, not a sprint.
There are numerous coding competitions out there. Generally, you get around 48 hours, give or take, to build an entire, working, functional game or application. Think about that. You get two days to build a complete piece of software from scratch. Now that's what I call sprinting.
Of course, a 48 hour push is a lot to ask for on a regular basis; sure, your application isn't in a competition, this is the real world, and you need to get real work done on an ongoing basis. You can't expect your developers to camp out in sleeping bags under their desks. But that doesn't mean turning a sprint into a marathon.
The key is instilling urgency while moderating burnout. This is entirely achievable, and can even make development more fun and engaging for the whole team. Since the term sprint has already been thoroughly corrupted, I'll use the term "dash". Consider this weekly schedule:
- Monday: Demo last week's accomplishments for stakeholders, and plan this week's dash. This is a good week to schedule any unavoidable meetings.
- Tuesday and Wednesday: your 48 hours to get it done and working. These are crunch days, and they will probably be pretty exhausting. These don't need to be 18-hour days, but 10 hours wouldn't be unreasonable. Let people get in the zone and stay there as long as they can.
- Thursday: Refactoring and peer reviews. After a run, athletes don't just take a seat and rest; they slow to a jog, then a walk. They stretch. They cool off slowly. Developers, as mental athletes, should do the same.
- Friday: Testing. QA goes through the application with a fine-toothed comb. The developers are browsing the web, playing games, reading books, propping their feet up, and generally being lazy bums, with one exception: they're available at a moment's notice if a QA has any questions or finds any issues. Friday is a good day for your development book club to meet.
By the end of the week, your application should be ready again for Monday's demo, and by Tuesday, everyone should be well-rested and ready for the next dash.
Ouch. That's a tough sell. The developers are only going to spend two days a week implementing features? And one basically slacking off? Balderdash! Poppycock!
Think about it, though. Developers aren't factory workers; they can't churn out X lines of code per hour, 40 hours per week. That's not how it works. A really talented developer might achieve 5 or 6 truly productive hours per day, but at that rate, they'll rapidly burn out. Four hours a day might be sustainable for longer. Now, mind you, in those four hours a day, they'll get more done, better, with fewer defects, than an army of incompetent developers could do in a whole week. But the point stands: you can't run your brain at maximum capacity eight hours straight, five days a week. You just can't - not for long, anyway.
The solution is to plan to push yourself, and to plan to relax, and to keep the cycle going to maximize the effectiveness of those productive hours. It's also crucial not to discount refactoring as not being productive; it sets up the following weeks' work, and reduces the effort required to get the rest of the development done for the rest of the life of the application. It's a critical investment in the future.
Spending a third of your development time on refactoring may seem excessive, and if it were that simple, I'd agree. But if you really push yourself for two days, you can get a lot done - and write a lot of code to be reviewed and refactored. In one day of refactoring, you can learn a lot, get important work done, and still start to cool off from the big dash.
That lazy Friday really lets you relax, improve your craft, and get your product ready for next week, when you get to do it all over again.
Zen Templates Development Journal, Part 2
Having my concept complete (see Part 0) and my simple test case working (see Part 1), I was ready to start on my moderate-complexity test case. This would use more features than the simple test case, and more pages. I really didn't want to have to build a complete site just for the proof of concept, so I decided to use an existing site, and I happened to have one handy: rogue-technologies.com.
The site is currently built in HTML5, using server-side includes for all of the content that remains the same between pages. It seemed like a pretty straightforward process to convert this to my template engine, so I got to work: I started with one page (the home page), and turned it into the master template. I took all of the include directives and replaced them with the content they were including. I replaced all of the variable references with model references using injection or substitution. I ID'd all the elements in the master template that would need to be replaced by child templates. I then made another copy of the homepage, and set it up to derive from the master template.
I didn't want to convert the site to use servlets, since it wasn't really a dynamic site; I just wanted to be able to generate usable HTML from my template files. So I created a new class that would walk a directory, parse the templates, and write the output to files in an output directory. Initially, it set up the model data statically by hand at the start of execution.
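In rough sketch form, that class looked something like this (the ZenTemplates.process call is a stand-in for the engine's real entry point, and the paths and model values are illustrative):

    import java.io.IOException;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.util.Map;
    import java.util.stream.Stream;

    class SiteGenerator {
        public static void main(String[] args) throws IOException {
            Path templateDir = Path.of("site/templates");
            Path outputDir = Path.of("site/output");

            // Model data, set up statically by hand at the start of execution.
            Map<String, Object> model = Map.of("siteName", "Rogue Technologies");

            try (Stream<Path> files = Files.walk(templateDir)) {
                files.filter(p -> p.toString().endsWith(".html"))
                     .forEach(p -> render(p, templateDir, outputDir, model));
            }
        }

        static void render(Path template, Path root, Path outDir,
                           Map<String, Object> model) {
            try {
                // Stand-in for the engine: parse the template (resolving any
                // parent derivation) and apply the model data.
                String html = ZenTemplates.process(template, model);
                Path out = outDir.resolve(root.relativize(template));
                Files.createDirectories(out.getParent());
                Files.writeString(out, html);
            } catch (IOException e) {
                throw new RuntimeException("Failed to render " + template, e);
            }
        }
    }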
All was well, but I needed a way for the child template to add elements to the page, rather than just replacing elements from the parent template. I came up with the idea of appending elements using a data attribute, data-z-append="before:id" or "after:id", to cause an element to be appended to the page either before or after the element from the parent with the specified ID. This worked perfectly, allowing me to add the Google Webmaster Tools meta tag to the homepage.
With this done, I set to work converting the remaining pages. Most of the pages were pretty straightforward, being handled just like the homepage; I dumped the SSI directives, added some appropriate IDs and classes, and all was well. However, the software pages presented some challenges. For one thing, they used a different footer than the rest of the site. It was time to put nested derivation to the test.
I created a software page template, deriving from the master template, that appended the additional footer content. I then had the software pages derive from this template instead of the master template, and - by some stroke of luck - it worked perfectly on the first try. I wasn't out of the woods yet, though.
The software pages also used SSI directives to dynamically insert the file size for downloadable files next to the links to download them. I wasn't going to reimplement this functionality; instead, I was prepared to replace these directives with file size data stored in the model. But I wanted to keep the model data organized, so I needed to support nesting. The software pages also used include directives to embed a Google+ widget; this couldn't be added to the template, as it was embedded in the body content, so it seemed like a perfect case for snippets - which meant I needed to implement snippet support.
Snippet support was pretty easy - find the data attribute, look up the snippet file, parse it as an HTML fragment, and replace the placeholder element with the snippet. Easy to implement, worked pretty well.
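As a sketch, assuming a data-z-snippet attribute holding the snippet file name (the post doesn't name the attribute), the whole pass might look like:

```java
import java.io.File;
import java.io.IOException;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

// Sketch: swap snippet placeholders for parsed HTML fragments.
void handleSnippets(Document outDoc, File snippetDir) throws IOException {
    for (Element placeholder : outDoc.select("[data-z-snippet]")) {
        File snippetFile = new File(snippetDir, placeholder.attr("data-z-snippet"));
        Document fragment = Jsoup.parse(snippetFile, "UTF-8");
        // Assumes the snippet has a single root element.
        placeholder.replaceWith(fragment.body().child(0));
    }
}
```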
Nested properties I thought would be a breeze, as I had assumed they were natively supported by StrSubstitutor. Unfortunately they weren't, so I had to write my own StrLookup. I decided that, since I was already doing some complex property lookups for injection, I'd build a unified model lookup class that could act as my StrLookup and could be used elsewhere. I wanted nested scope support as well, for my project list: each project had an entry in the model consisting of a name, latest version, and so on. I wanted the engine to iterate this list, and for each entry, rather than replacing the entire content of the element with the text value of the model entry, I wanted it to find sub-elements and replace each with the appropriate property of the model entry. This meant I needed nested scoping support.
I implemented this using a scope stack and a recursive lookup. Basically, every time a nested scope was entered (e.g., content injection using an object or map, or iteration over a list), I would push the current scope onto the stack. When the nested scope was exited (i.e., the end of the element that added the scope), I popped the scope off. When iterating a loop, at the start of the iteration, I'd push the current index, and at the end of the iteration, I'd pop it back off.
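A minimal sketch of that scope stack, assuming the model is a tree of Maps; the class and method names here are mine:

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Map;
import org.apache.commons.lang3.text.StrLookup;

// Sketch: a unified model lookup backed by a stack of scopes.
// Innermost scopes win; dotted keys walk nested Maps (e.g. "project.version").
class ModelLookup extends StrLookup<Object> {
    private final Deque<Map<String, Object>> scopes = new ArrayDeque<>();

    void pushScope(Map<String, Object> scope) { scopes.push(scope); }
    void popScope() { scopes.pop(); }

    @Override
    public String lookup(String key) {
        for (Map<String, Object> scope : scopes) { // iterates innermost first
            Object value = resolve(scope, key.split("\\."));
            if (value != null) return value.toString();
        }
        return null; // StrSubstitutor leaves the placeholder untouched
    }

    @SuppressWarnings("unchecked")
    private Object resolve(Object node, String[] path) {
        for (String part : path) {
            if (!(node instanceof Map)) return null;
            node = ((Map<String, Object>) node).get(part);
        }
        return node;
    }
}
```

Pushing the loop item as its own scope is what lets a placeholder inside an iterated element resolve against the current entry before falling back to outer scopes.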
This turned out to be very complex to implement, but after some trial and error, I got it working correctly. I then re-tested against my simple test case, having to fix a couple of minor defects introduced there with the new changes. But, at last, both my simple and moderate test cases were working.
I didn't like the static creation of model data - not very flexible at all - so I decided to swap it out for JSON processing. This introduced a couple of minor bugs, but it wasn't all that difficult to get it all working. The main downside was that it added several additional dependencies, and dependency management was getting more difficult. I wasn't too concerned on that front, though, since I was already planning for the real product to use Maven for dependency tracking; I was just beginning to wish I had used Maven for the prototype as well. Oh well, a lesson for next time. For now, I was ready for my complex test case - I just had to decide what to use.
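The post doesn't name the JSON library, so treat this as just one possible shape for that loader (Gson shown; Jackson would look much the same):

```java
import java.io.FileReader;
import java.io.IOException;
import java.io.Reader;
import java.util.Map;
import com.google.gson.Gson;
import com.google.gson.reflect.TypeToken;

// Sketch: read the model from a JSON file into nested Maps and Lists,
// which the renderer's scoped lookup can then walk.
Map<String, Object> loadModel(String path) throws IOException {
    try (Reader reader = new FileReader(path)) {
        return new Gson().fromJson(reader,
                new TypeToken<Map<String, Object>>() {}.getType());
    }
}
```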
2013-04-22
Zen Templates Development Journal, Part 0
Zen Templates is based on an idea I've been tossing around for about six months. It started with a frustration that there was no way to validate a page written in PHP or JSP as valid HTML without executing it to get the output. It seemed like there had to be a way to accomplish that.
I started out looking into what I knew were global attributes, class and id. I did some research and found that the standard allows any character in a class or id; this includes parens and such, meaning a functional syntax could be used in these attributes, which a parser could then process to render the template.
This seemed practically ideal; I could inject content directly into the document, identifying the injection targets using these custom classes. I toyed with the idea of using this exclusively, but saw a couple of serious shortcomings. For one, sometimes you want to insert dynamic data into element attributes, and I didn't see a good way to handle that without allowing a substitution syntax like that of JSP or ASP. I decided this would be a requirement to do any real work with it.
I also saw the problem of templates. Often each page in a dynamic site is called a template, but I'm referring to the global templates that all pages on a site share, so there is only one place to update the global footer, for example. I had no good solution for this. I started thinking about the idea of each page being a template and sharing a global template - much akin to subclassing in object-oriented programming, one template derives from another.
I started batting around different possibilities for deriving one template from another, and decided on having a function (in a class attribute) to identify the template being derived from, with hooks in the parent template to indicate which parts the "subtemplate" would be expected/allowed to "override".
I let the idea percolate for a while - a few weeks - as other things in life kept me too busy to work on it. Eventually it occurred to me that all these special functions in class attributes were pretty messy, and a poor abstraction for designers. It could potentially interfere with styling. It would produce ugly code. And I was on a semantic markup kick, and it seemed like a perfect opportunity to do something useful with semantic markup.
So I started rebuilding the concept and the current Zen Templates was born (and the name changed from its original, Tabula Obscura.) As I committed to maximizing the use of semantic markup and keeping template files as valid, usable HTML, I reworked old ideas and everything started falling into place. I remembered that the new HTML5 data attributes are global attributes as well, and would give me an additional option for adding data to markup without interfering with classes or ruining the semantics of the document.
I ironed out all the details of how derivation should work; it made semantic sense that a page that derived from another page could be identified by class; and, taking a page from OOP's book, it made sense that an element in the subpage with the same ID as an element in the parent page would override that element, making any element with an ID automatically a hook; somewhat like a subclass overriding methods in the superclass by defining a method with the same signature.
I sorted out the details of content injection as well, thinking that, semantically, it made sense that an element of a certain class would accept data from the model with an identifier matching the class name. Even better, I didn't need a looping syntax; if you try to inject a list of items into an element, it would simply repeat the element for each item in the list. This simplified a lot of syntax I've had to use in the past with JSP or Smarty.
I also wrote out how substitution should work, using a syntax derived from JSP. Leaning on JSP allowed me to answer a lot of little questions easily. I would try to avoid the use of functions in the substitution syntax, because it does make the document messier, and forces more programming tasks on the designer. I conceded that some functions would likely be unavoidable.
When I felt like I had most of the details ironed out, a guiding principle in mind, and a set of rules of thumb to help guide me through questions down the road, I was ready for a prototype. Stay tuned for Part 1!
Zen Templates Development Journal, Part 1
Once my concept was well documented (see Part 0), I was ready to start developing my prototype. I had many questions I needed to answer:
- Is the concept feasible, useful, and superior in some meaningful way to existing solutions?
- What kind of performance could I expect?
- How would derivation work in real-world scenarios? What shortcomings are there in the simple system described in my concept?
- Ditto for content injection and substitution.
- How would I handle model data scoping?
- Would it be better to parse the template file into a DOM Document, or to parse it as a stream?
I started with an extremely simple use case: two templates, one deriving from the other; a couple of model data fields, one of them a list; use of basic derivation, injection, and substitution, with no scope concerns. I built the template files and dummy model data such that I could quickly tell what was working and what wasn't ("this text is from the parent template", "this text is from the child template", "this text shouldn't appear in the final output", etc.). I also built a dead-simple servlet that did nothing but build the model, run the template renderer, and dump the output to the HttpServletResponse.
With this most basic use case in place, I started to work on the template renderer. I started with the state, initialization, and entry point. For the state, I knew I needed a reference to the template file, and I needed a Map for the model data. For initialization, I needed to take in a template file, and initialize the empty model. For convenience, I allowed initialization with a relative file path and a ServletContext, to allow referencing template files located under WEB-INF, so that they couldn't be accessed directly (a good practice borrowed from JSP.) I created accessors for adding data to the model.
The entry point was a function simply named "render". It was 5 lines long, each line calling an unimplemented protected method: loadTemplate, handleDerivation, handleInjection, handleSubstitution, and writeOut. These were the five steps needed for my basic use case.
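Reconstructed from that description - the five method names are the post's own, while the signature is my guess based on the write-out step described below - the entry point would have looked roughly like:

```java
import java.io.IOException;
import javax.servlet.http.HttpServletResponse;

// The five-step entry point; each step is a protected method that started
// out unimplemented and was filled in one at a time.
public void render(HttpServletResponse response) throws IOException {
    loadTemplate();        // parse the template file into inDoc
    handleDerivation();    // recursively merge parent templates into outDoc
    handleInjection();     // inject model data into elements by class name
    handleSubstitution();  // replace ${...} placeholders from the model
    writeOut(response);    // set the content type and write outDoc's HTML
}
```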
I then went to work on building out each of the steps. The easiest was loading the template file from disk into a DOM Document using Jsoup (since XML handlers don't deal well with HTML content). At this point I added two Documents to the renderer's state, inDoc and outDoc. inDoc was the Document parsed from the template file, outDoc was the Document in memory being prepared for output. I followed a basic Applications Hungarian Notation, prefixing references to the input document with "in" and references to the output document with "out".
Since I needed to be able to execute derivation recursively, I decided to do it by creating a new renderer, passing it the parent template, and running only the loadTemplate and handleDerivation methods; then the parent renderer's outDoc became the child's starting outDoc. In this way, if the parent derived from another template, the nested derivation would happen automagically. I would then scan the parent document for IDs that matched elements in the child document, and replace them accordingly. Derivation was done.
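A sketch of that recursion; ZenRenderer and parentPath() are hypothetical names, since the post doesn't give them:

```java
import java.io.IOException;
import org.jsoup.nodes.Element;

// Sketch: derivation by delegating to a renderer for the parent template.
protected void handleDerivation() throws IOException {
    if (parentPath() == null) {          // no parent: output starts as the input
        outDoc = inDoc.clone();
        return;
    }
    ZenRenderer parent = new ZenRenderer(parentPath());
    parent.loadTemplate();
    parent.handleDerivation();           // grandparents merge automagically
    outDoc = parent.outDoc;              // start from the fully-derived parent
    for (Element inEl : inDoc.select("[id]")) {
        Element outEl = outDoc.getElementById(inEl.id());
        if (outEl != null) outEl.replaceWith(inEl.clone()); // child overrides by ID
    }
}
```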
Next up was injection: I started out by iterating over the keys in my model Map, scanning the document for matching class names. Where I found them, I simply replaced the innerHtml of the found element with the toString() value of the model data; if the model data was an array or collection, I would instead iterate the data, duplicating the matched element for each value, and replacing the cloned element's innerHtml with the list item's toString() value. This was enough functionality for my simple test case.
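A minimal sketch of that first injection pass (the names and the lack of error handling are mine):

```java
import java.util.Collection;
import java.util.Map;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

// Sketch: replace matching elements' content with model values;
// lists repeat the matched element once per item.
void handleInjection(Map<String, Object> model, Document outDoc) {
    for (Map.Entry<String, Object> entry : model.entrySet()) {
        for (Element el : outDoc.select("." + entry.getKey())) {
            if (entry.getValue() instanceof Collection) {
                for (Object item : (Collection<?>) entry.getValue()) {
                    Element copy = el.clone();
                    copy.html(item.toString());
                    el.before(copy);   // insert before the placeholder to keep order
                }
                el.remove();           // drop the original placeholder element
            } else {
                el.html(entry.getValue().toString());
            }
        }
    }
}
```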
Reaching the home stretch, I did substitution ridiculously simply, using a very basic regex to find substitution placeholders (${somevariable}) and replacing each with the appropriate value from the model. I knew this solution wouldn't last, but it was enough for this early prototype.
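Something along these lines - a sketch, since the actual pattern in the prototype may have differed:

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch: the dead-simple ${...} substitution pass over the document's HTML.
private static final Pattern PLACEHOLDER = Pattern.compile("\\$\\{([^}]+)\\}");

String substitute(String html, Map<String, Object> model) {
    Matcher m = PLACEHOLDER.matcher(html);
    StringBuffer sb = new StringBuffer();
    while (m.find()) {
        Object value = model.get(m.group(1));
        // Unknown placeholders are left as-is rather than erased.
        m.appendReplacement(sb, Matcher.quoteReplacement(
                value == null ? m.group(0) : value.toString()));
    }
    m.appendTail(sb);
    return sb.toString();
}
```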
Last up was writing the rendered output, and in this case, I allowed passing in an HttpServletResponse to write to. I would set the content type of the response, and dump the HTML of my final Document to the response stream.
I ran it, and somehow, it actually worked. I was shocked, but excited: in the course of a little over an hour, I had a fully working prototype of the most basic functions of my template engine. Not a complete or usable product by any means, but an excellent sign. I made a few tweaks here and there, correcting some minor issues (collection items were being inserted in reverse order, for example), but it was pretty much solid. I also replaced my regex-based substitution mechanism with the StrSubstitutor from commons-lang; this was pretty much a direct swap that worked perfectly.
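With the model already a Map, the commons-lang version collapses to roughly:

```java
import org.apache.commons.lang3.text.StrSubstitutor;

// Same job as the hand-rolled regex pass, courtesy of commons-lang.
String substituted = new StrSubstitutor(model).replace(outDoc.outerHtml());
```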
Time for the next test, my moderate complexity test case.
The Development Stream
I was reading today about GitHub's use of chat bots to handle releases and continuous integration, and I think this is absolutely brilliant. In fact, it occurs to me that using a chat bot, or a set of chat bots, can provide an extremely effective workflow for any continuous-deployment project. Of course, it doesn't necessarily have to be a chat room with chat bots; it can be any sort of stream that can be updated in real-time - it could be a Twitter feed, or a web page, or anything. The sort of setup I envision would work something like this:
Everyone on the engineering team - developers, testers, managers, the whole lot - stays signed in to the stream as long as they're "on duty". Every time code is committed to a global branch - that is, a general-use preproduction or production branch - it shows up in the stream. Then the automated integration tests run, and the results are output to the stream. The commit is deployed to the appropriate environment, and the deployment status is output to the stream. Any issues that occur after deployment are output to the stream as well, for immediate investigation; this includes logged errors, crashes, alerts, assertion failures, and so on. Any time a QA engineer opens a defect against a branch, the ticket summary is output to the stream. The stream history (if it's not already compiled from some set of persistent-storage sources) should be logged and archived for a period of time, maybe 7 to 30 days.
It's very important that the stream be as sparse as possible: no full stack traces with error messages, no full commit messages, just enough information to keep developers informed of what they will need to look into further elsewhere. This sort of live, real-time information stream is crucial to the success of any continuous-deployment environment, in order to keep the whole team abreast of any issues that might be introduced into production, along with when and how they were introduced.
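To make "sparse" concrete: every event, whatever its source, might collapse to a single line, something like this illustrative formatter (the fields are not a prescription):

```java
// Sketch: one terse line per event, e.g. "[deploy] production: build OK".
String formatEvent(String kind, String branch, String summary) {
    String oneLine = summary.replaceAll("\\s+", " ").trim();
    if (oneLine.length() > 80) {
        oneLine = oneLine.substring(0, 77) + "...";  // keep it to one glanceable line
    }
    return String.format("[%s] %s: %s", kind, branch, oneLine);
}
```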
Now, what I've described is a read-only stream: you can't do anything with it. GitHub's system of using an IRC bot allows them to issue commands to the bot to trigger deployments and the like. That could be part of the stream, or it could be part of another tool; as long as the deployment, and its results, are output to the shared stream for all to see. This is part of having the operational awareness necessary to quickly identify and fix issues, and to maintain maximum uptime.
There are a lot of possible solutions for this sort of thing; Campfire looks particularly promising because of its integration with other tools for aggregating instrumentation data. If you have experience with this sort of setup, please post in the comments, I'd love to hear about it!
2013-04-17
Truly Agile Software Development
Truly agile software development has to, by nature, allow for experimentation. In order to quickly assess the best option among a number of choices, the most effective method is empirical evidence: build a proof of concept for each option and use the experience of creating the proof, as well as the results, to determine which option is the best for the given situation.
While unit tests are valuable for regression testing, a test harness that supports progression testing is at least as useful. Agile development methodologies tend to focus on the idea of iterating continuously toward a goal along a known path; but what happens when there's a fork in the road? Is it up to the architect to choose a path? There's no reason to do so when you can take both roads and decide afterward which you prefer.
Any large development project should always start with a proof of concept: a bare-bones, quick-and-dirty working implementation of the key functionality using the proposed backing technologies. It doesn't need to be pretty, or scalable, or extensible, or even maintainable. It just has to work.
Write it, demo it, document what you've learned, and then throw the code away. Then you can write the real thing.
It may seem like a waste of time and effort at first. You'll be tempted to over-engineer, you'll be tempted to refactor, you'll be tempted to keep some or all of the code. Resist the urge.
Why would you do such a thing? If you're practicing agile development, you might think your regular development is fast enough that you don't need a proof. But that's not the point; the point is to learn as much as you can about what you're proposing to do before you go all-in and build an architecture that doesn't fit and that will be a pain to refactor later.
Even if it takes you longer to build the proof, it's still worth it. For one thing, it probably took longer because of the learning curve and the mistakes made along the way, both of which can be avoided in the final version. For another, you've learned what you really need and how the architecture should work, so that when you build the production version you can do it right the first time, with greater awareness of the situation.
Note: Yes, I understand that Scrum calls prototypes "spikes". I think this is rather silly - there are already terms for prototypes, namely, "prototype" or "proof of concept". I'm all for new terms for things that don't have names, but giving new names to things that already have well-known names just seems unnecessary.
2013-04-09
HTML5 Grid Layouts
I have to take issue with the swarm of "responsive grid layout" systems that have been cropping up lately. Yes, they're great for wireframes and prototypes. No argument there. And yes, they take care of a lot of the legwork involved in producing a responsive layout. Great. But in the process, they throw semantic markup and separation of concerns out the window.
The idea of semantic markup is that your document structure, IDs, and classes should describe the content of the document. Separation of concerns, in HTML and CSS, means using classes and IDs to identify what something is (not how it should appear), and using CSS to select that content and determine how it should appear; this allows you to change content without having to change appearance, and vice versa: the concerns of document structure and appearance are kept separate.
That means, as far as I'm concerned, as soon as you put a 'class="two column"' into your HTML, you've lost the game. You've chained your structure to your presentation. Can you change your presentation without modifying the markup? Not any more. All we've achieved is bringing back the days of nested tables for layout, with a pretty CSS face on it. With one dose of "clever" we've traveled back in time 15 years. Only this time, there *are* other ways to do it. There's no excuse. It's just plain laziness.
Is building a truly semantic, responsive, attractive layout possible? Absolutely. Difficult? Yes. Is it worth the effort? In the long run, I think it is - except for those cases mentioned above, prototypes and wireframes, code that's meant to be disposable. But any code that has to be maintained in the future will be hamstrung by these systems.
Web development has made tremendous strides over the last 10 years. It's amazing how far we've come in terms of what can be done and how. Don't take all those advances and use them to regress all the way back to clunky old table-based layouts. Try using them to do something new and interesting instead. There's no reason the idea of software craftsmanship should be missing from the web design world.