Using IntelliJ IDEA

I’ve started using IntelliJ IDEA as an IDE. I’ve always heard good things about it but stuck with Eclipse because it’s pretty good and is the standard used by most clients I’ve worked with.

What got me to try it out was the Groovy/Grails support. I’ve been doing some work with Grails and had heard they had a good plugin. After working with it for a while, I find the Groovy support to be the best of the three major Java IDEs (Eclipse, NetBeans, IDEA). The Maven 2 support is great too; it’s by far the best support for Maven 2 in any IDE. There are some features I would still like them to add (such as a Maven repository index/search feature), but the important things work well (handling Maven dependencies).

In general, I find myself having to fiddle with the UI less than I do with Eclipse (configuring and switching perspectives, resizing panes, etc).

The other plugin I was really interested in was the Mercurial IDEA plugin. This plugin is still early in the development cycle. Unfortunately it gave me an error when starting up – it was compiled for JDK 1.6 and I’m stuck with 1.5 on my Mac. I imported the MercurialIdea plugin source into IDEA, reset it to use the IDEA platform runtime, and rebuilt the plugin without success. No biggie, the Mercurial command line is easy enough to use.

Posted in Software Development, Tools | Leave a comment

OpenID, Yahoo, and other news

Recent OpenID developments…

At the end of this month, Yahoo will be supporting OpenID. Any Yahoo user who chooses to enable the feature will be able to use their Yahoo ID on sites that support OpenID. The announcement doesn’t mention Yahoo accepting OpenIDs from other providers, so this is probably only one-way for now. Put together the population of Yahoo users, who are now getting OpenID support, and AOL users, who have had it for a while, and you have a pretty large chunk of the web population that has OpenIDs (once they enable them, anyway). Now we need more websites to accept OpenIDs from other providers.

I’m waiting for Acegi to support OpenID so I can build support into a webapp I’m working on. Looks like there’s been some recent progress on that front.

Another OpenID-related, if a bit old, tidbit – myOpenID now supports Information Cards as a means of authentication. So when prompted to authenticate to myOpenID, you can present an Information Card and authenticate without using a username or password at all.

Posted in Tools, Web | Leave a comment

Handy Java utility: jarexplorer

Wanted to give kudos to a handy little Java utility I started using: jarexplorer. It’s a Swing-based app that indexes the contents of jar files found in an entire directory structure, and lets you search for classes and files within those jar files. It’s blazingly fast and also provides viewers to look into classes, text files, and images. Double click on the search results and the viewer comes up. Nice to see tools that do one thing and do it well.
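To give a feel for what indexing jars across a directory tree involves, here is a minimal Python sketch. It is not jarexplorer’s code, and the function names (index_jars, search) are made up for illustration; it only relies on the fact that a jar file is a zip archive.

```python
import os
import zipfile


def index_jars(root_dir):
    """Map every entry name found in any .jar under root_dir to the jars containing it."""
    index = {}
    for dirpath, _dirnames, filenames in os.walk(root_dir):
        for filename in filenames:
            if not filename.endswith(".jar"):
                continue
            jar_path = os.path.join(dirpath, filename)
            try:
                with zipfile.ZipFile(jar_path) as jar:
                    for entry in jar.namelist():
                        index.setdefault(entry, []).append(jar_path)
            except zipfile.BadZipFile:
                # Skip files that have a .jar extension but are not valid archives.
                continue
    return index


def search(index, text):
    """Return (entry, jar_paths) pairs whose entry name contains text, ignoring case."""
    needle = text.lower()
    return sorted((entry, jars) for entry, jars in index.items() if needle in entry.lower())
```

Building the index once and searching it in memory is presumably why a tool like this can feel fast: the jars are only opened during indexing, not on every search.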

Posted in Java, Tools | Leave a comment

Using Mercurial

As I posted earlier, I’ve been looking at some of the decentralized version control systems and started using Mercurial. There are several decentralized VCSs, including Bazaar, Git, Mercurial, and others. Out of the decentralized VCSs, Mercurial and Git seemed to have the greatest momentum and adoption rate. Git is being used by the Linux Kernel, Wine project, and others. Mercurial is being adopted by OpenJDK, OpenSolaris, Mozilla, among others. Wide adoption and momentum are important to me because I want to use something that’ll be around several years from now and will be supported by GUI tools, build and continuous integration tools, and IDEs.

After looking into both, I ended up going with Mercurial over Git. Both Mercurial and Git seem to fit my needs (decentralized, fast, and flexible), but Git didn’t seem as well supported on non-Linux platforms and its Java IDE plugins are further behind. Mercurial has binaries available for Mac, Windows, and of course Linux. On the Mac and Windows side, there are a couple of different options for installing Mercurial: using a ports packaging system such as Cygwin on Windows or MacPorts on Mac, or installing natively on the OS. The Eclipse IDE plugin is still early in development, but it sounds like the NetBeans plugin is pretty full featured.

When working with Mercurial, everything is a copy of the repository, including branches and the local repository that developers work out of. Each repository is as fully capable as any other and includes all of the change history. It sounds inefficient to have a full copy of the repository on each workstation, but it turns out that Mercurial repositories are usually smaller than a Subversion working directory. This is partly because Mercurial stores changes to files in a very efficient manner, and partly because Subversion stores a significant amount of extra data in a working directory in order to avoid accessing the server for some operations.

In Mercurial, changes to files are committed to a local repository as a unit of work, called a change set. The change sets can then be pushed to or pulled from other repositories in order to share work with others. It’s then up to you how you organize your work and manage the exchange of change sets between repositories. You can have a single master repository that everyone pushes to and pulls from; you can have multiple masters for better performance and reliability; you can have a hierarchical model where committers receive change sets from the community, then pass them up to module owners, who pass them up to release owners (from what I’ve heard the Linux Kernel is organized in this way); or you can even have no master and have members of the team exchange change sets directly with each other’s repositories (this sounds like too much work though).
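The exchange of change sets between repositories can be pictured with a small toy model in Python. This is only a sketch under heavy simplifying assumptions: the Repository class and its methods are hypothetical, history is a flat list rather than Mercurial’s real graph of change sets, and merging is ignored entirely.

```python
class Repository:
    """Toy repository: an ordered list of change set ids (not Mercurial's data model)."""

    def __init__(self, name):
        self.name = name
        self.changesets = []  # append-only history, oldest first

    def commit(self, changeset_id):
        """Record a unit of work in this repository only."""
        self.changesets.append(changeset_id)

    def pull(self, other):
        """Copy the change sets from other that this repository lacks; return them."""
        known = set(self.changesets)
        missing = [c for c in other.changesets if c not in known]
        self.changesets.extend(missing)
        return missing

    def push(self, other):
        """Send local change sets to other; a push is a pull seen from the other side."""
        return other.pull(self)
```

With a shared “master”, Alice commits locally and pushes, and Bob pulls from the master to receive her work. With no master at all, Bob would call pull on Alice’s repository directly; the operations are the same, only the topology differs.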

Most of all, I was impressed with Mercurial’s attention to details in their design decisions:

  • Mercurial never makes updates to a revision log of a file, just appends to it. This minimizes the opportunities for data getting corrupted; if you never update or delete data, there is very little chance of it being accidentally corrupted beyond repair.
  • Mercurial stores the diffs to files (text and binary) rather than storing the full copy of a file for every revision. It then “replays” the diffs in order to reconstruct the full copy of a revision of a file when needed. This is much more efficient than it may seem, but after a large number of changes it can become expensive. To compensate for this, Mercurial stores a snapshot copy of a file revision for fast access when the chain of diffs becomes too long.
  • Minimizing locking – read and update operations have been ordered in such a manner that locking isn’t necessary for most commands. Mercurial uses locks to ensure only one process writes to a repository at a time, but reads and clone/pull operations don’t lock the repository at all and are not affected by a writer holding the lock.
  • Minimizing filesystem seek operations – Seek operations are relatively expensive, so Mercurial has been designed to minimize this. This is one of the reasons that Mercurial keeps repository metadata in a single directory (.hg at the root) rather than stored with each directory as Subversion does (.svn directories) – having separate metadata directories would require a seek for every directory.
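The first two points (append-only storage, and diffs with occasional snapshots) can be sketched in a few lines of Python. This is a toy model, not Mercurial’s actual revlog format: the RevisionLog class, the max_chain threshold, and the copy/insert delta encoding are all assumptions made for illustration.

```python
import difflib


def make_delta(old, new):
    """Describe new as copy/insert operations against old."""
    ops = []
    matcher = difflib.SequenceMatcher(a=old, b=new, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            ops.append(("copy", i1, i2))
        elif tag in ("replace", "insert"):
            ops.append(("insert", new[j1:j2]))
        # "delete" needs no operation: the deleted range is simply never copied.
    return ops


def apply_delta(old, ops):
    """Rebuild the newer text from old plus the recorded operations."""
    parts = []
    for op in ops:
        if op[0] == "copy":
            parts.append(old[op[1]:op[2]])
        else:
            parts.append(op[1])
    return "".join(parts)


class RevisionLog:
    """Append-only log of one file's revisions (toy model)."""

    def __init__(self, max_chain=3):
        self.entries = []  # ("snapshot", text) or ("delta", ops); never rewritten
        self.max_chain = max_chain  # hypothetical limit on consecutive deltas

    def _chain_length(self):
        """Count the deltas appended since the most recent snapshot."""
        length = 0
        for kind, _payload in reversed(self.entries):
            if kind == "snapshot":
                break
            length += 1
        return length

    def append(self, text):
        """Append a revision and return its revision number."""
        if not self.entries or self._chain_length() >= self.max_chain:
            self.entries.append(("snapshot", text))
        else:
            previous = self.read(len(self.entries) - 1)
            self.entries.append(("delta", make_delta(previous, text)))
        return len(self.entries) - 1

    def read(self, rev):
        """Reconstruct a revision by replaying deltas from the nearest snapshot."""
        start = rev
        while self.entries[start][0] != "snapshot":
            start -= 1
        text = self.entries[start][1]
        for _kind, payload in self.entries[start + 1:rev + 1]:
            text = apply_delta(text, payload)
        return text
```

Because existing entries are never modified, a failed write can at worst leave a bad entry at the end of the log, and the cost of reading any revision is bounded by the longest allowed chain of deltas.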

I’ll follow up later with a post on setting up & working with Mercurial.

Posted in Software Development, Tools | Leave a comment

pyAntTasks moved to Mercurial

I’ve started using the Mercurial version control system lately and switched some projects over to it from Subversion. Since Google Code doesn’t yet support hosting Mercurial repositories, pyAntTasks has stayed on Subversion. Until now that is – I set up a Mercurial repository for pyAntTasks on sharesource here. I’ll also push out any new releases to Subversion on Google Code, so people can still use that as well.

Posted in Python | Leave a comment

This blog is now an OpenID

[update: when I switched to WordPress, I made http://rpstechnologies.net/ron my OpenID page - see that URL instead of this blog for the OpenID headers]

This blog is now my OpenID. OpenID is a decentralized single sign-on system for the web. I’d been hearing about it for a while, and when I looked into it, I liked what I saw, so started using it for Magnolia and Plaxo. Unlike single sign-on systems such as MS Passport, OpenID is completely open in licensing and implementations, and is truly decentralized. Anyone can set up to be an OpenID provider, and there are several options including myOpenID, Verisign, AOL, and open source providers. I also like that it’s a simple system and relatively easy to implement. There are several OpenID libraries, including libraries for Java, Python, Ruby, and PHP.

When using OpenID, a URL is your unique identifier, which you use to sign into websites that support OpenID. Normally the URL is hosted on an OpenID provider such as myOpenID (e.g. http://rsmith847.myopenid.com/). But OpenID also supports delegation, where you can use any web page as your OpenID. For this to work, the web page you want to use as your OpenID has to have some tags in the header that instruct any OpenID consumer (the site you’re trying to log into) to go to your OpenID provider to sign you in. Why this extra level of indirection? So you can use the same web page as your OpenID even if you change OpenID providers. Simon Willison’s blog has clear instructions on how to set up any web page as an OpenID.

So if you look at the source for this web page, you’ll see two tags at the bottom of the header, which point to my OpenID provider (myOpenID at the moment). For some reason Plaxo had problems accepting my web page URL as an OpenID, so I had to fall back to using my myOpenID URL. Magnolia had no problems with it.
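For the curious, here is a rough Python sketch of what a consumer does with those header tags. It assumes the OpenID 1.x delegation convention of link tags with rel="openid.server" and rel="openid.delegate"; the provider endpoint https://provider.example/server is a placeholder, not a real address, and discover_delegation is a name made up for this example.

```python
from html.parser import HTMLParser

# Example page: the two link tags are what turn an ordinary page into an OpenID.
# The server URL is a placeholder; the delegate is the provider-hosted identity.
PAGE = """<html><head>
<title>Example blog</title>
<link rel="openid.server" href="https://provider.example/server">
<link rel="openid.delegate" href="http://rsmith847.myopenid.com/">
</head><body></body></html>"""


class OpenIDLinkParser(HTMLParser):
    """Collect <link rel="openid.*"> tags from a page."""

    def __init__(self):
        super().__init__()
        self.links = {}

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attributes = dict(attrs)
        rel = (attributes.get("rel") or "").lower()
        href = attributes.get("href")
        if rel.startswith("openid.") and href:
            self.links[rel] = href


def discover_delegation(html):
    """Return (server, delegate) from a page's OpenID link tags, or None if absent."""
    parser = OpenIDLinkParser()
    parser.feed(html)
    server = parser.links.get("openid.server")
    if server is None:
        return None
    return server, parser.links.get("openid.delegate")
```

Switching providers then only means editing the two href values; the page URL that you hand out as your OpenID stays the same.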

Posted in Web | Tagged | Leave a comment

Version control systems: Going decentralized

I’ve used a number of version control systems over time, starting out with SCCS and RCS. I used CVS for many years for my own work and for client projects where CVS could be used. For the last 4 years I’ve been using and recommending Subversion, a relatively recent centralized version control system. Subversion is a big improvement over CVS, and I’ve found it to be a good VCS, but it still suffers from problems common to all centralized version control systems.

In a centralized version control system, there’s a single repository that tracks the changes to all files, and any version control operations (check out, commit, merge, etc) must go through the central repository.

Lately I’ve been tracking the decentralized version control systems that have come out and have been getting attention. As the name implies, in a decentralized version control system, there isn’t a single central repository and server responsible for tracking changes to files. In a decentralized version control system, you can have any number of repositories tracking changes to the same set of files. Each repository is as fully capable as the next and able to carry out all of the version control operations (commit, checkout, merge, etc) on the files. No repository is inherently more important than another. The changes to files (change sets) can be pushed and pulled from one repository to another.

Although no repository is inherently more important than another, when teams work with decentralized version control systems, they typically designate one repository as the “master” repository to which all changes ultimately get pushed and from which other repositories can pull changes. Another variation on this is to have two or more “master” repositories which exchange change sets between them. This model of having multiple repositories extends all the way down to the workstation. In most decentralized VCSs, developers work off a local repository, which is just as full fledged and capable as any “master” repository.

There are several real world “itches” I experienced that got me looking into decentralized version control systems:

Working offline

I sometimes work somewhere that doesn’t have a connection to the central Subversion repository – either I’m working somewhere without an internet connection, or don’t have a connection to the internal network where the Subversion repository resides. This is actually not that uncommon for me. With a central repository, this means I can’t rename files in Subversion, commit changes, create a new branch to work on an independent change, merge work, etc. I’m basically limited to editing files without renames/moves. With a decentralized VCS, I could do everything I can do with a central repository while disconnected, because every developer has a full fledged repository they are locally working from. When I’m again connected with the “master” repository, I can push my changes up to it.

Delays in getting central VCS repository set up

Sometimes getting a new repository added to a centralized VCS for a new project can take quite a while. The repository has to be set up, permissions have to be granted, and the person responsible for it may be out, or have a million other things to do. Until the repository is set up, the team has to share source code by some other means (email, etc), which is error prone and captures no versioning information. With a decentralized VCS, a team could start working together using their own repository, then push all of the changes into a “master” repository when it becomes available. The team is able to get work done without being blocked, and the system administrators are able to control access to the authoritative VCS.

Distributed teams with poor network access

When working with distributed teams, the network access for offsite teams has frequently been average to poor. Sometimes connectivity to the central repository goes down altogether, and when connectivity is up, operations such as commits, merges, and updates from the repository are painfully slow. Teams work around this by falling back to sharing code via email (error prone, cumbersome), or holding off sharing changes, which makes for muddled commits and makes integration more difficult.

Moving a repository from one server to another

Sometimes you need to switch a VCS repository from one server to another; maybe the server needed to be used for something else, a more powerful server was needed, or the VCS repositories were being consolidated onto fewer servers. With a centralized VCS, when this happens, the entire team needs to switch over at the same time. This means that any changes that are going on have to be committed (hopefully they don’t break the build), the repository has to be copied over to the new location, the continuous integration process needs to be switched over, and all developers need to switch their workstations to point to the new repository. Needless to say this disrupts development and incurs a real cost.

With a decentralized VCS, the new repository could be set up with a copy of the old repository while the old repository is still being used. The continuous integration process and developers could gradually switch to the new repository when practical (e.g. after completing development of a story). All the while, the new repository can pull any changes that are committed to the old repository, ensuring the new repository is up to date. You could also have changes committed to the new repository flow back to the old repository while it is still in use. Whenever you have migrated all of the processes and people to the new repository, the old repository can be turned off. Everything has switched over without disrupting development.

Repository going down

Unfortunately, VCS servers do go down, more frequently than it seems they should. Usually you just need to restart the server process, but sometimes hardware failures can take the repository down for longer periods. If you are diligent about keeping backups of your repository, maybe you can get back up and running without much delay. The worst case of this I remember was when using ClearCase in a large development team where everyone used virtual views, which was recommended by the ClearCase admins. When the ClearCase server went down, no one could even access their code (since the files are served up from a virtual filesystem by ClearCase), so most of the development team just went home after a couple hours of being down. I’ve heard that ClearCase virtual views aren’t recommended for this reason, but I don’t know enough about the intricacies of ClearCase to say.

If using a decentralized VCS, you can set up one or more mirrors of the main VCS repository, so if one goes down, people can use one of the others without interrupting development. Also, since everyone has a full repository on their workstations, it’s pretty easy to clone one of those repositories and set up a temporary main repository for everyone to push/pull changes to. But for short outages, the development team likely won’t even notice since people mostly work off their local repository and holding off from pushing changes to the main server for a while isn’t a big deal. In short, a decentralized VCS is much more fault tolerant than a centralized VCS.

There are several capable decentralized version control systems available, including Bazaar, Mercurial, SVK, Git, and more. Some of these are being used on major open source projects such as the open source JDK, OpenSolaris, and the Linux kernel.

Since this post is getting lengthy, I’ll follow up with another post on my experience after having switched to a decentralized VCS.

Posted in Tools | Tagged | Leave a comment

iPhone hacks

Well, it looks like hackers have been hard at work on the iPhone since it came out and have managed to hack the iPhone to gain access to the filesystem, alter the iPhone’s internals, and to launch custom programs. This has in turn been used to enable people to install custom ring tones and sounds, use the EDGE network from a laptop, and install and run a simple program on the iPhone. It’s unclear whether Apple can/will disable some of these hacks in the future. These hacks are still really only for techies and you may be voiding your warranty. Also, it looks like some of these hacks break visual voicemail and EDGE network access.

Meanwhile, there are already hundreds of web-based iPhone applications using the web-based development model Apple supports. I’m still hoping Apple will open up the SDK for custom applications.

Posted in General | Leave a comment

Web development for iPhone

Apple has posted guidelines for developing web applications for the iPhone. Apple has not provided an SDK for developing applications, so for now this is the only way to create applications for the iPhone.

The guidelines tell us how to best construct web pages for Safari on the iPhone, how to optimize media served up to the iPhone, what technologies are in (JavaScript, AJAX, most rich media formats), and what technologies are out (Flash, Java applets, plugins). They also tell us how to create links in web pages to the phone, mail, and Google maps applications so that phone numbers in applications can automatically dial a phone number, etc.
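As an illustration of the linking idea, here is a small Python sketch that builds phone and mail links for a page. It assumes the browser hands the standard tel: and mailto: URI schemes to the phone and mail applications; the helper names are made up, and Maps links are left out because their exact format isn’t covered here.

```python
from html import escape
from urllib.parse import quote


def phone_link(number, label=None):
    """Build a tel: link; only digits and a leading-style '+' are kept in the URI."""
    digits = "".join(ch for ch in number if ch.isdigit() or ch == "+")
    return '<a href="tel:{}">{}</a>'.format(digits, escape(label or number))


def mail_link(address, subject=""):
    """Build a mailto: link, percent-encoding the optional subject."""
    href = "mailto:" + quote(address, safe="@")
    if subject:
        href += "?subject=" + quote(subject)
    return '<a href="{}">{}</a>'.format(escape(href), escape(address))
```

A server-side template could call these helpers wherever it renders a contact’s phone number or email address, so the same page works on a desktop browser and on the phone.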

I actually think this is a pretty good development model for many types of applications. Since everything is using standard web technologies (XHTML, JavaScript, DHTML, AJAX), there’s an army of developers that already have the necessary skills, and adapting an existing web application to work well on the iPhone will be much less work than rebuilding it for a custom SDK. Due to AJAX and frameworks such as GWT, the usability gap between desktop UIs and web application UIs has narrowed considerably.

To me, the one big missing piece in this model is the ability to store data offline, along the lines of Google Gears. Google Gears is a browser plugin and includes SQLite, a very capable embedded RDBMS. This would allow applications to be delivered to the iPhone that can work online or offline, storing their data locally on the iPhone, and syncing up with servers when connected. I want my applications to work when there isn’t good EDGE or WiFi network coverage. I would think twice about choosing the iPhone as the platform for that new sales team application if I knew it would only work if the sales guy was in an EDGE-covered area. I would like to be able to use Google Reader to catch up on my RSS feeds even if I’m on the train and I keep losing my net connection. The only downside that comes to mind is that storing application data in this manner will tend to use up the iPhone’s flash memory (4 GB or 8 GB).
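The store-locally, sync-later pattern is easy to sketch. The example below uses Python’s built-in sqlite3 module purely as a stand-in; Gears itself exposes a JavaScript API inside the browser, so none of these function names come from Gears. The send callable is hypothetical and represents whatever uploads one change to the server.

```python
import sqlite3


def open_store(path=":memory:"):
    """Open the local database and create the pending-changes table if needed."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS pending ("
        "id INTEGER PRIMARY KEY AUTOINCREMENT, payload TEXT NOT NULL)"
    )
    return conn


def save_locally(conn, payload):
    """Record a change locally; this works with or without a network connection."""
    with conn:
        conn.execute("INSERT INTO pending (payload) VALUES (?)", (payload,))


def sync(conn, send):
    """Upload pending changes in order and return how many were sent.

    send is a hypothetical callable that returns True on success and raises
    OSError when the network is unavailable. Unsent changes stay queued.
    """
    sent = 0
    rows = conn.execute("SELECT id, payload FROM pending ORDER BY id").fetchall()
    for row_id, payload in rows:
        try:
            ok = send(payload)
        except OSError:
            break  # offline again: stop and retry on the next sync
        if not ok:
            break
        with conn:
            conn.execute("DELETE FROM pending WHERE id = ?", (row_id,))
        sent += 1
    return sent
```

The application always writes to the local table first, and a background task calls sync whenever a connection appears, so the sales guy on the train never notices the dropouts.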

Although I like the idea of developing applications for the iPhone using standard web technologies, I hope Apple will make an iPhone SDK available soon so developers can create applications that can take fuller advantage of the iPhone’s resources. I can’t create a voice memo application as a webapp.

Posted in General | Tagged | 1 Comment

What MDA needs: a user’s perspective

One of my areas of interest in software development is model driven architecture and model driven software development. I’ve successfully applied MDA to software projects and have seen the benefits first hand. Although MDA/MDSD tools and techniques can bring several benefits to software projects, including improving developer productivity, following are some features that I think would greatly increase the adoption of MDA/MDSD in the industry:

Make it easy to write and customize code generation cartridges

After working with MDA/MDSD tools for a while, it’s common to need to customize the code generation cartridges and/or write entirely new ones. The MDA/MDSD tool should make it easy to do so. The model transformation and code generation process should be easy to understand and creating new transformations/generators should be straightforward. Creating new cartridges for AndroMDA 3 is fairly involved and not easy to get up to speed with. The process involves creating UML meta-models, using AndroMDA to generate code from the meta-models, then writing Java code to implement the model to model transformations. AndroMDA 4 looks to be simplifying the process greatly; you still define your meta-models, but the ATL language is used for model to model transformations rather than running generators and writing Java code. openArchitectureWare uses a similar process to AndroMDA 4.

Fast transformation/code generation cycles

Developers rightly don’t have much patience for anything that significantly slows down the code-compile-test cycle. Agile development and refactoring techniques call for continuous code-compile-test cycles. Because in an MDA approach, the model (expressed as UML or some other form) is part of the code, the model will be updated and transformed to code frequently. Practically, this means that MDA tools should support running transformation directly in the modeling tool and IDE, and should support incremental transformations, so the entire application does not have to be regenerated for small changes.
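One simple way to picture an incremental transformation is to fingerprint each model element and regenerate only the elements whose fingerprint changed. The Python sketch below is an illustration only, not how any particular MDA tool works: the model is assumed to be a dict of plain dicts, generate is a hypothetical per-element generator, and dependencies between elements are ignored.

```python
import hashlib
import json


def fingerprint(element):
    """Stable hash of one model element's definition."""
    canonical = json.dumps(element, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def incremental_generate(model, cache, generate):
    """Regenerate output only for model elements that changed since the last run.

    model maps element names to plain dicts, cache maps names to the
    fingerprint seen on the previous run, and generate is the (hypothetical)
    per-element transformation. Returns the names that were regenerated.
    """
    regenerated = []
    for name, element in sorted(model.items()):
        digest = fingerprint(element)
        if cache.get(name) != digest:
            generate(name, element)
            cache[name] = digest
            regenerated.append(name)
    # Forget elements that were removed from the model.
    for name in list(cache):
        if name not in model:
            del cache[name]
    return regenerated
```

A real tool would also have to regenerate elements that depend on a changed element (for example, a class whose superclass changed), which is where most of the difficulty lies.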

Capable open source UML tools

I’ll happily fork over money for a commercial product that works well and is reasonably priced, but there are many people and software development groups that will be much less likely to try something out if there’s a significant price tag involved, especially when you’re not even sure it’ll work for you long term. There’s also a subset of open source developers for whom relying on a commercial tool, regardless of the price, is a non-starter. For MDA to gain wider adoption, a capable open source UML 2.0 tool is needed. That’s not to say there shouldn’t be commercial offerings, just that at least one viable open source option is needed. ArgoUML is the closest candidate I’ve seen.

Support for textual models

Some types of models don’t lend themselves to expression via UML. For these cases, the MDA tool should support models expressed via other formats, such as text-based models (e.g. a declarative language). The MDA tool should be able to combine information from multiple models, some in UML, and some in other formats, during a transformation cycle. The MDA developer can then use the format that best fits the model to be expressed. openArchitectureWare supports non-UML models, including text-based languages. It looks like AndroMDA 4 utilizes EMF, which supports models in various formats.
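To show how little is needed for a text-based model, here is a Python sketch that parses a tiny entity language into a dictionary a generator could consume. The syntax is invented for this example and has nothing to do with the languages openArchitectureWare or EMF actually support.

```python
def parse_model(text):
    """Parse a tiny, made-up entity language into {entity: {attribute: type}}."""
    model = {}
    current = None
    for line_number, raw in enumerate(text.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
        if not line:
            continue
        if line.startswith("entity "):
            current = line[len("entity "):].strip()
            model[current] = {}
        elif ":" in line and current is not None:
            name, type_name = (part.strip() for part in line.split(":", 1))
            model[current][name] = type_name
        else:
            raise ValueError("line {}: cannot parse {!r}".format(line_number, raw))
    return model
```

A model file containing "entity Customer" followed by indented lines such as "name: String" parses to {"Customer": {"name": "String"}}, which could then be merged with information taken from a UML model during the same transformation cycle.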

Standardized meta-models

A meta-model defines the language that is used to model a particular domain. In UML terms, it defines the stereotypes that may be attached to the various classes, attributes, associations, and other elements, as well as the tags that may be used to provide additional information to the MDA transformations. Currently, each MDA tool defines its own, different, meta-model. So although modeling an Entity class for one MDA tool may be very similar to modeling an Entity class for another MDA tool, they are incompatible. Wherever possible, MDA tools should start supporting each other’s meta-models, creating de-facto standards over time. Where an existing meta-model doesn’t meet their needs, MDA tools should attempt to extend an existing meta-model rather than creating a new one. This would allow development teams to more easily port a model from one MDA tool to another and would reassure teams their investment in a model will not be wasted.

Intelligent refactoring

Refactoring refers to techniques for making improvements to code without altering its behavior: “Improving the Design of Existing Code”, as the subtitle of Martin Fowler’s seminal book puts it. In an MDA-based system, refactorings frequently span both the model and the hand-written code. A simple example of this is the “Rename method” refactoring. If a method is renamed in a UML model, when the model-to-code transformation occurs, there very well may be hand-written subclasses that must also rename the method, and other classes that invoke the renamed method that need to be updated. Modern IDEs have built-in support for the most common refactorings, and will apply those related changes for you. MDA tools should be able to detect the change in the model (the refactoring), and allow the developer to automatically apply the corresponding refactoring changes in their IDE. For this to work, MDA tools will likely have to be integrated more closely with IDEs such as Eclipse and NetBeans via plugins.

Finally, one thing MDA tools do not need:

Round tripping

When introducing MDA tools, I frequently get the remark “you know, I wish tool X would support round-tripping”. Round-tripping is a feature found in many modeling tools since the early ’90s, where code could be generated from the UML model, then the code could be modified, and those changes could be synchronized back into the model. Round-tripping and reverse-engineering largely do not apply to MDA. Round-tripping is really only practical if there’s a very close correspondence between the classes being modeled and the code being generated. In UML tools that generate code, there is frequently a single source module that is generated for each class that is modeled. In an MDA transformation, a single class in a UML model may result in several files being generated: a DAO interface, a DAO implementation class, a domain object interface, a domain object implementation class, a Hibernate mapping file, and an SQL DDL file. In this case, how should changes to source code map back to the UML model? What if an attribute is added to the domain interface and class but not to the HBM file? What if an attribute is renamed in one but not the others? I think the request for this feature is really a band-aid for some other problem, such as being too restrictive in who can update the model, code generation cycles that take too long, or unfamiliarity with UML and modeling.
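The one-to-many mapping, and the ambiguity it creates for round-tripping, can be made concrete with a short Python sketch. The file naming scheme and both function names are hypothetical; real cartridges choose their own layouts.

```python
def artifacts_for(entity_name, package="com.example"):
    """List the files a hypothetical cartridge might emit for one modeled entity."""
    base = package.replace(".", "/")
    return [
        "{}/{}Dao.java".format(base, entity_name),      # DAO interface
        "{}/{}DaoImpl.java".format(base, entity_name),  # DAO implementation class
        "{}/{}.java".format(base, entity_name),         # domain object interface
        "{}/{}Impl.java".format(base, entity_name),     # domain object implementation
        "{}/{}.hbm.xml".format(base, entity_name),      # Hibernate mapping file
        "{}/{}.sql".format(base, entity_name.lower()),  # SQL DDL file
    ]


def round_trip_conflicts(attributes_by_artifact):
    """Return attributes that appear in some generated artifacts but not in all."""
    if not attributes_by_artifact:
        return []
    attribute_sets = [set(attrs) for attrs in attributes_by_artifact.values()]
    everywhere = set.intersection(*attribute_sets)
    anywhere = set.union(*attribute_sets)
    return sorted(anywhere - everywhere)
```

If someone adds an email attribute to the domain interface and class but not to the mapping file, round_trip_conflicts reports ["email"], and a round-tripping tool has no principled way to decide whether the model should gain the attribute or the code should lose it.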

Posted in MDSD | Tagged | Leave a comment