Saturday, 6 December 2014

Why I moved to Git from Mercurial

There are stories everywhere about how Git is eating everyone's lunch. For a long time, I happily used Mercurial, primarily because I needed a cross-platform DVCS with a sane command-line interface. I don't think there is much disagreement that the first few iterations of Git were neither very portable nor very friendly on the command line. I have used Mercurial daily very nearly since it came out and I have no complaints. It's that good. So why move?

One word: Magit.

The only Emacs packages for Mercurial I tried (vc.el and Monky) were very basic. Monky, an imitation of Magit, takes a fairly superficial approach, and vc.el, while it has improved, still hasn't moved very far past the days of centralized version control. More importantly, Magit has a killer feature that rightfully matches Git's killer feature: editing the index.

ADHD Coding and the Index

Sometimes, due to what I assume is late-blooming ADHD, I go on development tangents where I fix bugs and/or add little features here and there. When it comes time to commit, I have many changes intermixed with one another and committing them all at the same time is a terrible idea for a clean, readable history.

Git's source editing process works like this: edit -> stage -> commit. Until you stage a change, it cannot be committed. Take a look at the Git book for more information.

What Magit lets you do superbly (through Git, obviously) is stage, within each file, only the changes you want. So if you have 3 blocks of changes within a file, you can stage just one of them without reverting the other 2. Now you can commit related changes together even if you worked on multiple things at the same time.

So when using the ADHD development methodology described above, staging diff hunks is ridiculously helpful and Magit makes it easy to do the right thing even for simpletons: s for stage, u for unstage. I never did try very hard to figure out how to do it with Mercurial but I think I remember hearing the word "plugin" or "queues" before I zoned out.

Projects

I've moved most of my private repositories over to Git, including my JIRA time tracker and my dependency injection library for C++. Switching to Git for continuous builds was easy as pie. The only real change I had to make was the bit of the build that grabs the HEAD revision:

diff --git a/ConfigureWorklogAssistant.cmake b/ConfigureWorklogAssistant.cmake
index 988f6c4..1fb5d36 100644
--- a/ConfigureWorklogAssistant.cmake
+++ b/ConfigureWorklogAssistant.cmake
@@ -32,8 +32,9 @@ macro(define_kimi_product OutputFileVar ProductGUID CompanyName

   set(KIMI_PRODUCT_SOURCE_REVISION)
   execute_process(
-    COMMAND hg head tip --template "{rev}"
+    COMMAND git log -n 1 --format=%h
     OUTPUT_VARIABLE KIMI_PRODUCT_SOURCE_REVISION
+    OUTPUT_STRIP_TRAILING_WHITESPACE
     )

   configure_file(

The future

I am pretty much set with Git for now. The fact that the switch only took a few hours is probably a testament to how similar the two are. However, if Mercurial were to develop an interface as good as Magit, including first-class support for some form of staging, I think I would consider switching back. A DVCS written in Python is too delicious to ignore.

Thursday, 30 October 2014

A walk through a cool code generation trick from Clang

[edit: fixed css]

If you don't know, Clang is an open source C, C++, Objective-C and Objective-C++ front-end for the LLVM compiler. Clang/LLVM contributors are probably some of the brightest software engineers you'll come across so there is always something to learn. Take a look at LLVM weekly to see the kinds of things people are working on. Just in the last week:

  • Improved MSVC/Golang/GCC support/compatibility
  • New intrinsics for the LLVM IR
  • A disassembler for the Hexagon backend
If you want to dig into the architecture of Clang/LLVM, you should read the LLVM chapter from "Architecture of Open Source Applications". In fact, both volumes of the book should be mandatory reading for anyone interested in software design and architecture.

The Clang Philosophy

Figure 1: The Clang Philosophy, summed up by Xzibit

When building Clang itself, you are actually using 3 code generators:
  • The C++ compiler you are using to compile Clang
  • The C++ preprocessor
  • A DSL-to-C++ code generator called tblgen - yes, there is
    literally a custom compiler used to build Clang!
tblgen is run during the build process to generate (usually) C++ code. Although I'm not aware of it generating anything else, it could conceivably be used to do so. The purpose of the program is to help the developers maintain lists of domain-specific information. For example: allowable attributes on a function or types of nodes in the AST.

Here is the example we're going to look at: Attr.td. In particular, we'll follow the lines below into the final C++:

class MSInheritanceAttr : InheritableAttr;

def SingleInheritance : MSInheritanceAttr {
  let Spellings = [Keyword<"__single_inheritance">];
}

During the build, tblgen is given Attr.td, which it parses into an AST and processes:

OS << "#ifndef LAST_MS_INHERITABLE_ATTR\n";
OS << "#define LAST_MS_INHERITABLE_ATTR(NAME)"
" MS_INHERITABLE_ATTR(NAME)\n";
OS << "#endif\n\n";

Record *InhClass = Records.getClass("InheritableAttr");
Record *InhParamClass = Records.getClass("InheritableParamAttr");
Record *MSInheritanceClass = Records.getClass("MSInheritanceAttr");
std::vector<Record*> Attrs = Records.getAllDerivedDefinitions("Attr"),
      NonInhAttrs, InhAttrs, InhParamAttrs, MSInhAttrs;
for (std::vector<Record*>::iterator i = Attrs.begin(), e = Attrs.end();
     i != e; ++i) {
  if (!(*i)->getValueAsBit("ASTNode"))
    continue;

  if ((*i)->isSubClassOf(InhParamClass))
    InhParamAttrs.push_back(*i);
  else if ((*i)->isSubClassOf(MSInheritanceClass))
    MSInhAttrs.push_back(*i);
  else if ((*i)->isSubClassOf(InhClass))
    InhAttrs.push_back(*i);
  else
    NonInhAttrs.push_back(*i);
}

EmitAttrList(OS, "INHERITABLE_PARAM_ATTR", InhParamAttrs);
EmitAttrList(OS, "MS_INHERITABLE_ATTR", MSInhAttrs);
EmitAttrList(OS, "INHERITABLE_ATTR", InhAttrs);
EmitAttrList(OS, "ATTR", NonInhAttrs);

Something you should probably take note of is how easy the code is to read and understand. This is the rule rather than the exception. In any case, the final result of tblgen (in this case) is a header file, AttrList.inc, part of which is reproduced here:

MS_INHERITABLE_ATTR(MultipleInheritance)
MS_INHERITABLE_ATTR(SingleInheritance)
MS_INHERITABLE_ATTR(UnspecifiedInheritance)
LAST_MS_INHERITABLE_ATTR(VirtualInheritance)

I know, it's not very impressive. But check out what I can do now (this probably doesn't actually work):

#include <iostream>

static const char * MsAttrs[] = {
#  define MS_INHERITABLE_ATTR(Name) #Name,
#  define LAST_MS_INHERITABLE_ATTR(Name) #Name
#    include "clang/Basic/AttrList.inc"
#  undef MS_INHERITABLE_ATTR
#  undef LAST_MS_INHERITABLE_ATTR
};

int main() {
  std::cout << "All MS attrs" << std::endl;
  for (auto attr : MsAttrs)
    std::cout << attr << std::endl;
}

By listing a bunch of domain-specific "records" in a manner that can be used by the preprocessor, I can now do whatever I like with those records! I've used this to great effect in my own code (from my JIRA time tracker). Here is a snippet of JiraTaskAttrs.inc which contains a list of attributes for a task:

JIRA_TASK_ATTR(Key)
JIRA_TASK_ATTR(Summary)
JIRA_TASK_ATTR(Description)
JIRA_TASK_ATTR(Type)
JIRA_TASK_ATTR(TypeIconUrl)
JIRA_TASK_ATTR(ProjectKey)
JIRA_TASK_ATTR(ProjectName)
JIRA_TASK_ATTR(ProjectImageUrl)
JIRA_TASK_ATTR(Priority)
JIRA_TASK_ATTR(PriorityIconUrl)
JIRA_TASK_ATTR(Status)
JIRA_TASK_ATTR(StatusIconUrl)
JIRA_TASK_ATTR(Resolution)
JIRA_TASK_ATTR(Security)
JIRA_TASK_ATTR(Assignee)
JIRA_TASK_ATTR(Reporter)
JIRA_TASK_ATTR(Labels)
JIRA_TASK_ATTR(AffectsVersions)
JIRA_TASK_ATTR(FixVersions)
JIRA_TASK_ATTR(Components)
JIRA_TASK_ATTR(TimeEstimated)
JIRA_TASK_ATTR(TimeSpent)
JIRA_TASK_ATTR(TimeRemaining)

Using this header file, I can generate enumerations:

enum TaskAttr{
# define JIRA_TASK_ATTR(Name,...) TaskAttr_##Name,
#   include "JiraTaskAttrs.inc"
# undef JIRA_TASK_ATTR
  NumTaskAttrs
};

Or function definitions:

#define IMPL(Enum)                                                      \
  QString                                                               \
  KJiraRepositoryType::TaskAttr_ ## Enum() const {                      \
    return toQString(model::Task::getAttrColumnName(JiraRepositoryType::TaskAttr_ ## Enum)); \
  }

#define JIRA_TASK_ATTR(Name,...) IMPL(Name)
#include "common/connections/jira/JiraTaskAttrs.inc"

Here is another example from the same project of how I use this trick to enumerate configuration values: Source, C++. I've used this in numerous other places, but this should give you an idea of the types of things that are possible using just the preprocessor and a little forethought.
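
If you want to play with the pattern without any of the surrounding project, here is a minimal, self-contained sketch of the same trick. The attribute list is inlined via a FOR_EACH-style macro instead of living in its own .inc file, and the names are illustrative only:

#include <iostream>

// Normally this list would live in its own file (e.g. JiraTaskAttrs.inc)
// and be pulled in with #include wherever it is needed.
#define FOR_EACH_TASK_ATTR(X) \
  X(Key)                      \
  X(Summary)                  \
  X(Priority)

// Expansion 1: an enumeration of all attributes.
enum TaskAttr {
#define MAKE_ENUM(Name) TaskAttr_##Name,
  FOR_EACH_TASK_ATTR(MAKE_ENUM)
#undef MAKE_ENUM
  NumTaskAttrs
};

// Expansion 2: a parallel table of attribute names.
static const char * TaskAttrNames[] = {
#define MAKE_NAME(Name) #Name,
  FOR_EACH_TASK_ATTR(MAKE_NAME)
#undef MAKE_NAME
};

int main() {
  for (int i = 0; i < NumTaskAttrs; ++i)
    std::cout << i << ": " << TaskAttrNames[i] << std::endl;
}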

Conclusion

A good way to become a better craftsman is to learn from others. Software developers are lucky in that we have grown up sharing the tools of our trade. The open source Clang compiler distills the combined experience of many intelligent software engineers, and as a result there are many tips to be learned that can make you better at your own job. So start reading today!

Saturday, 24 May 2014

Moving from SCons to CMake

TL;DR: I'm happy with CMake + Ninja, even though the language is ugly and the speedup is not that impressive.

I'm not ashamed to admit it: I stayed away from CMake because the language seemed inelegant.


Elegance and beauty can be defined in many ways but context is apparently important. Maybe this is why the most beautiful language in the world is relegated to being used as an extension language for my editor rather than to run the world.

I am currently in the middle of a project moving from SCons to CMake and I am pleasantly surprised at the quick progress. The SCons project has about 3KLOC of SCons build scripts that do everything from generating code for language bindings to building 20-something external C and C++ libraries. My experience with this larger project has also convinced me to move over my JIRA time tracker because, of all the other things I should be doing, the most important is obviously moving over a build system for no particular reason at all.

SCons served the larger project well when it started. Python was an easy language to learn, if not master, so most people could read the code and make small modifications when they needed to. As the project grew and things got more complicated to manage, we started mimicking other build systems to add features that SCons did not have. The most important feature we cribbed was the "use requirement" feature from Boost Build. In a nutshell, it allows you to attach "arbitrary" requirements to a target that you are consuming. So if you use SomeFunkyDll.dll in your project, you may want to require that clients of the library define SOME_FUNKY_DLL when they consume it. Automatically doing this in SCons was a big pain in the rear. I don't need to tell you how we did it because it is not relevant, but I wanted to give you an idea of the types of things we found important.

Another problem the project started having was that once we passed some critical mass of source code, the build slowed down significantly. In the old days, SCons could be slow as molasses because it prioritized correctness over everything else. We tried everything on the "go fast" recommendation list, but even at best a no-op build took a long time, and this resulted in too many sword fights.

Eventually, we got to the point where working on this project was becoming inefficient, so we looked for alternatives. I had always had CMake in the back of my mind, as I had actually written my own makefile generator for another project. There, the architectural separation between the build specification and the actual build actions allowed optimizations at different levels, not unlike a compiler, and let me add many more features than I could have if I had also had to develop the build action logic.

I won't be talking about the larger project yet (maybe ever) but I will talk about my experience moving over Worklog Assistant's build system. Just so that there is a frame of reference:

  • Number of targets in build (including third-party files): ~1K
  • Lines of app code (excluding third-party files): ~20K

That is, it's not a huge project.

A post on build systems can go on forever, but the performance of the build for each of these activities is what actually matters to me when I'm doing my daily work:

  • Clean build
  • Modifying a core header file (aka a file that nearly everything depends on)
  • Modifying any cpp file in the build
  • Modifying an important, but not core header file
  • For some reason this seems to be important when comparing build systems: building when nothing changed. I don't personally care about this metric.
Each of these will have some sort of relative importance and for me, the activities where build performance matters most are modifying non-core source and header files. The rest have minimal importance.

The CMake language, other weirdness and leaky abstractions

My initial aversion to CMake was likely due to the fact that it was clear to me there would be weird conventions and things that worked kind of differently depending on the backend chosen. I was not wrong in this assumption, but I was wrong in how much they would bother me: not much. Don't get me wrong, it's frustrating not completely understanding why changing an add_custom_target to add_custom_command makes things work "better" with generated files, but by and large, there are few such design bugs.

The biggest reaction any decent developer probably has when introduced to CMake is to the language. It just looks bad. However, while the language is neither efficient nor beautiful, it is predictable and regular, which makes it easy to learn. After the initial shock of its aesthetics, you settle into it pretty easily. Thought: perhaps a language that is intellectually unappealing encourages developers to spend as little time in it as possible, making them quite efficient. Did they learn this from Java? Hmm.

Different build systems

I have not launched Visual Studio to do a build since perhaps 2008, but there are many reasons to do so: comfort and familiarity being one, Incredibuild another. I think this is the main reason CMake will eclipse all other C++ build systems: it is a 1-to-N solution. That is, one team's preferred way of fulfilling its requirements can also support the particular preferences of N other teams fulfilling theirs. This is not possible with other build systems.

Third-party library support

What surprised me the most was how often there were CMake modules available to use for $MY_FAVOURITE_LIBRARY. This allowed me to treat the building of third-party code as a black box by simply adding a call to add_subdirectory or adding the appropriate path to CMAKE_MODULE_PATH and adding a call to find_package. CMake would just use the right compiler and the right flags to avoid build incompatibilities. Previously, I'd just write build scripts for the third-party code myself. Win.

A highly unscientific test with lots of numbers to look official

Here is a nice chart:
All tests are run with the C++ compiler that comes with Visual Studio 2013 SP1.
  • scons -j8: A fairly well-optimized SCons build using many of the tips from the SCons wiki, run with 8 parallel jobs. Source lists are built dynamically, using a recursive glob + wildcard filter.
  • cmake + ninja 1: An unoptimized CMake (2.8.12.2) build using the Ninja stable release from September 2013. Source lists are built with a preprocessing step and some elisp to make maintenance of the preprocessed file easier. See gist.
  • cmake + ninja 2: Same as cmake + ninja 1, but configured with CMAKE_LINK_DEPENDS_NO_SHARED=ON
  • cmake 3.0 + ninja: Same as cmake + ninja 1, but using cmake 3.0
  • cmake + ninja latest: Same as cmake + ninja 1, but using Ninja from git
This looks pretty OK! CMake and Ninja beat SCons. But this is only part of the story. Here is the actual data:

Times (lower is better):

Activity                                            | Importance | scons -j8 | cmake + ninja 1 | cmake + ninja 2 | cmake 3.0 + ninja | cmake + ninja latest
Clean build                                         | 5%         | 224       | 279             | 264             | 266               | 261
Touch core header file and rebuild                  | 5%         | 165       | 178             | 174             | 180               | 175
Touch a source file and rebuild                     | 50%        | 46        | 35              | 35              | 35                | 35
Touch an arbitrary non-core header file and rebuild | 40%        | 60        | 68              | 65.5            | 66                | 65
Do-nothing and build                                | 0%         | 10        | 0.5             | 0.5             | 0.2               | 0.44

% speedup over scons -j8 (higher is better):

Activity                                            | cmake + ninja 1 | cmake + ninja 2 | cmake 3.0 + ninja | cmake + ninja latest
Clean build                                         | -24.55%         | -17.86%         | -18.75%           | -16.52%
Touch core header file and rebuild                  | -7.88%          | -5.45%          | -9.09%            | -6.06%
Touch a source file and rebuild                     | 23.91%          | 23.91%          | 23.91%            | 23.91%
Touch an arbitrary non-core header file and rebuild | -13.33%         | -9.17%          | -10.00%           | -8.33%
Do-nothing and build                                | 95.00%          | 95.00%          | 98.00%            | 95.60%
Weighted average (by importance)                    | 5.00%           | 7.12%           | 6.56%             | 7.49%
The Ninja build sees a significant improvement over SCons only when modifying source files; it does terribly when it comes to touching header files. I have my own theories as to why, but running ninja -d stats showed that loading the dependency file was the biggest slowdown.

The weighted average is computed using the "importance" column. I think touching source files should perhaps have a higher importance than I've given it above. Do I really edit header files as often as source files? I don't have the answer to that so as a proxy, I just ran a simple script counting how many header files vs source files were touched in a 6 month period and applied the ratio as a weight. My gut feeling is that I probably spend more time editing source files than header files.
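
To spell out the arithmetic: each "% speedup" figure is relative to the scons -j8 column, e.g. touching a source file goes from 46 to 35, i.e. (46 - 35) / 46 ≈ 23.9%. The weighted average for the cmake + ninja 1 column then works out to:

    0.05 x (-24.55%) + 0.05 x (-7.88%) + 0.50 x 23.91% + 0.40 x (-13.33%) + 0 x 95.00% ≈ 5.00%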

Conclusions

Not as clear-cut a win as I had hoped: a well-optimized SCons build can still be faster for some sizes of projects, and it is possibly more correct. However, the equivalent SCons build does require more maintenance to provide the same features as one in CMake. I found that compiler/toolchain support is much better with CMake than with SCons, and I write less code to do the same (complex) thing. Occasionally, CMake's abstractions leak, making it awkward to do simple things like generating files, which are easy with SCons. On the other hand, CMake makes it easy to add per-file compile options, which are difficult with SCons.

Since the performance for this particular example is roughly equal, I have to choose the system that makes me feel more productive. And by that metric, CMake + Ninja make me "feel" more productive only because Ninja starts building immediately.

What matters even more: the SCons community has always been small, while the CMake community is huge in comparison and only growing. I expect performance to improve and more backend options to become available.

I had about 4 or 5 years of SCons experience that went into the build for this project. This means there are a lot of little performance tricks I applied, and as my experience with CMake is not nearly as long, it isn't a fair comparison. I should be ashamed of expecting a free speedup, but the speedup I did see for an important use case makes me feel as if it was worth it.

On the upside, I no longer have to wait for SCons to start building something after hitting compile.

Have you transitioned to/from CMake? I'd be curious to hear about your experiences.

Interesting links

Monday, 12 May 2014

So I was right about the vector, wrong about the players

You will cringe hard when you read how I wrote about C# many years ago, but I had honestly expected this to happen to C#, not Java. Maybe having C# as an ISO standard actually had some benefits.

Will there be a Heartbleed-like crisis where these ideas will cause a reckoning for the industry? As I understand it, the licensing for implementing the x86/x86_64 ISAs reached monopoly level years ago, actively shutting out competition. The main hardware innovation is now coming from mobile where x86 can't yet compete. PC sales have been leveling out ever since the introduction of iStuff. I don't think the three are correlated, but it is interesting.

PS: If you still want to be friends, do not read the rest of the content on that first link. Oh man.

Thursday, 21 March 2013

Design by committee works slowly

The latest draft for polymorphic lambda expressions, which I advocated for in a post from about 3 thousand years ago, is a step in the right direction. I greatly appreciate the time that the authors are taking to push C++ forward. I know they do it on a volunteer basis and I believe their passion for it makes C++ one of the best languages to use for a variety of projects. On reading the draft, though, I'm still a little underwhelmed.

Lambda expressions are anonymous functions, common in languages with first-class functions like Lisp. Roughly, the language gives you the ability to create functions "at runtime" that can capture data and other state. Once this is possible, anything is possible. You can read more about anonymous functions at Wikipedia.

When programmers use anonymous functions, they are not doing it for a technical reason (e.g., performance). They are doing it for one or all of the following reasons:

  • Lexical locality: The data that the anonymous function will be operating on is somewhere nearby and we just need to do a little transform on it to make it useful to something else which is also nearby.
  • Readability: x => 2*x+y is much easier to read and understand than MyFunctor f(y), where you need to go look up the definition of MyFunctor.

In x => 2*x+y, you can see that the 'y' value must come from somewhere else in the function: capturing data in lexical scope is an important part of anonymous functions. This is the reason why MyFunctor takes in 'y' as a parameter.
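
To make the comparison concrete, here is a minimal C++ sketch of the two forms (MyFunctor is the made-up name from above):

    #include <iostream>

    // Without lambdas: a hand-written functor has to carry 'y' explicitly
    // and is defined away from the point of use.
    struct MyFunctor {
      explicit MyFunctor(double y) : y(y) {}
      double operator()(double x) const { return 2 * x + y; }
      double y;
    };

    int main() {
      double y = 10.0;

      // With a lambda: 'y' is captured from the enclosing lexical scope.
      auto f = [y](double x) { return 2 * x + y; };
      MyFunctor g(y);

      std::cout << f(3) << " == " << g(3) << std::endl; // both print 16
    }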

Anyway, as my post tried to explain, one of the main problems is the ridiculous verbosity implicit in monomorphic lambda expressions. Polymorphic lambdas give that verbosity a chance to be reduced, or even eliminated down to the simplest possible thing. The latest draft, however, requires "auto" on lambda expression parameters.

To recap, C++11 lambda expressions transform a statement like:

    [](double slope, double intercept, double x){ return slope * x + intercept; }

into a function object not completely unlike:

    struct LOL
    {
        double operator()(double slope, double intercept, double x){ return ... ;}
    };

Most lambda expressions will only ever be used with one set of parameter types and in one situation, so it is not hard to understand why this is an acceptable syntax. However, languages like C# have a much more concise syntax for the above case:

   (slope, intercept, x) => slope * x + intercept

The compiler figures out the types since it is a statically typed language and everyone is happy.

Before lambda expressions, in C++, we might have written:

    namespace bl = boost::lambda;
    ...    bl::_1*bl::_2 + bl::_3 ...

My goal for C++ lambda expressions would be to never use any of the Boost lambda libraries again, as useful and awesome as they are. With the new draft, the C++11 version becomes:

    [](auto slope, auto intercept, auto x){ return slope*x + intercept; }

As you can see, the Boost Lambda form above is arguably still preferable to the draft version of polymorphic lambdas just on length alone. The draft version, although longer, is slightly easier to read and understand because of the named parameters. But why can't we spoil ourselves? There aren't too many technical tricks required to automatically turn:

    [](slope,intercept,x){ return slope*x + intercept; }

into the same form behind the scenes.
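
That "same form" is, roughly, a closure type with a templated call operator. A minimal sketch of what the compiler would generate behind the scenes (the name LOL2 is mine, mirroring the earlier example):

    #include <iostream>

    // Roughly the closure type generated for
    // [](auto slope, auto intercept, auto x){ return slope*x + intercept; }
    struct LOL2
    {
        template <typename A, typename B, typename C>
        auto operator()(A slope, B intercept, C x) const -> decltype(slope * x + intercept)
        { return slope * x + intercept; }
    };

    int main()
    {
        LOL2 line;
        std::cout << line(2.0, 1.0, 3.0) << std::endl; // 7, called with doubles
        std::cout << line(2, 1, 3) << std::endl;       // 7, called with ints
    }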

In my humble opinion, the auto adds nothing to readability and actually detracts from it, because I am required to read more to understand what is going on. Multiply this by thousands of expressions and multiple projects and it is just another thing I have to skip over. There is actually very little reason to require auto. With the extension as drafted, it is still easier to use Boost Lambda.

The 5 people who voted "strongly against" making auto optional should rethink their votes. This is the best chance we have of getting it right the second time.

Sunday, 29 May 2011

Learning about Bitcoin or why I'll never use Bitcoin

Bitcoin is quite a promising e-currency. Created by some-guy-we-don't-really-know-or-a-double-agent-of-some-kind-who-is-probably-quite-Bitcoin-rich-now, it has some very useful properties:

  1. Creation of the money is an implicit and transparent agreement between users. That is, there is no centralized issuing authority and there is a finite quantity. Almost like gold.
  2. It is completely electronic and therefore very cheap to transfer. As a result, transaction fees are "low".
  3. Transactions are anonymized, yet completely public, to protect against double-spending.
  4. Some crypto stuff to make sure it is as secure as it can be today.

The main desired outcome of a currency with these rules is autonomy from the somewhat arbitrary influence of centralized planners.

How you are supposed to use Bitcoin

So how do you use Bitcoin (BTC) as a consumer or vendor? Let us assume that you already have some BTC in your account.

  1. Visit place of business
  2. Locate item of interest which costs 0.02 BTC
  3. Go to cashier
  4. Pull out smart phone with your Bitcoin wallet or some kind of link to your Bitcoin wallet
  5. Use QR-code at register to find vendor's payment address. This address will likely be generated at the point of purchase
  6. Send Bitcoin to that address from your Bitcoin wallet
  7. The cashier and network verifies your payment (speed depends on transaction fee) and you go on your way

This is how it would work today if a business accepted BTC. I expect that if I am wrong and Bitcoin does indeed take off, there will be clearing houses to speed up transactions like these. I think these confirmations will necessarily be done outside the network, but eventually the network will also validate them, which will be the final settlement step.

This exchange is appealing for various reasons. My favourite is that the users of the system themselves benefit by confirming transactions. That is, you can make Bitcoin just by verifying transactions.

Bitcoin Wallet

It is probably useful to discuss where Bitcoins are stored. This location, a file on your hard disk, is called a wallet. It consists of a set of private keys that correspond to each address generated as in the above scenario. This is your vault. If it is stolen in unencrypted form, your money is probably as good as gone. But the coolest part is that if the wallet was encrypted and you have a backup, you simply transfer the money to an account in a new wallet before the thieves are able to crack the encryption and, almost by magic, your money is back again.

Anonymous vs Anonymized

Earlier, I said that transactions are anonymized. This is different from them being anonymous, because an anonymizing technology does not imply anonymity. A transaction being anonymous means it is untraceable, which is quite easy to disprove in the BTC world.

Let's start at the beginning. How do you get BTC? There are a couple of ways. One way involves a lot of geekery and stuff that very few people have time for. This is called Bitcoin mining. For most people, just outright buying BTC like they buy USD is the most convenient. Currency is a proxy for labour so it is fine to buy BTC. As the market will continue to be volatile due to the simultaneous debasing of the USD, demand-side pressure as well as the continuous creation of BTC, I would spread out bigger purchases over a few months.

A convenient way to buy BTC is through an exchange. So let us walk through that process:
  1. Create an account with a BTC exchange. I used Bitcoin Market. This requires you to give them two things: an email address and a Bitcoin payment address. Notice how your email address is tied to your BTC address.
  2. Figure out the trade you want to make. I used BMBTC for PPUSD where BM = Bitcoin Market and PP = PayPal.
  3. Execute the trade by making a payment to some email address on PayPal.
When I executed this process, it took a total of 15 minutes for the trade to complete but it was a full hour before the money was in my actual wallet and verified by the network. You must note that this is the equivalent of someone on the other side of the world paying me $10 and someone delivering that $10 to me personally. Not to a bank account, not a promise for $10, but cold hard cash to me personally.

Notice that the process of conveniently buying BTC itself has multiple weak links:
  1. Your email address is tied to a Bitcoin address by Bitcoin Market
  2. Paypal knows who you are definitively through the use of your credit card
  3. Some random dude knows you bought some BTC
To avoid leaking too much information, you can create a new receiving address for every trade and update it on the Bitcoin Market. Note that Bitcoin Market has full trade information and PayPal has amount information. To reduce the risk there, you can use anonymizing email services or a special email just for Bitcoin purchases.

The main point is that once you use a credit card or a personal email address, your anonymity is compromised.

That's not such a big deal, to be honest. After all, you already trust a lot of people with your information online.

De-anonymizing the transactions

If the seller of the BTC were interested in which address bought the BTC through the exchange, s/he would just track the blocks for the specific amount.

When I purchased my BTC, I chose 2 BTC to see how difficult it would be to find in the block explorer. It was pretty easy! Why? Because I knew there would be three related transactions: one for 2, one for 1.99 and one for 0.01 (the exchange's transaction fee). The seller would know this as well.

So all I did was wait for a few blocks to come through the explorer and opened them all up in a browser tab and searched for 1.99. It took less than a minute.

So now, the seller of the BTC has tied my name (through Paypal) to an address.

You may be interested in the actual transaction as currently being confirmed by computers worldwide. Because of this decentralized confirmation, it is now impossible for the seller to re-sell the same BTC to someone else.

Using my Bitcoin or why I'll never use it

Can you figure out what I did with my BTC? Actually, you have all the information you need in this blog post. Once you figure it out, you'll understand why I'll never use it. The first person to add a comment with the right answer and their Bitcoin receiving address will get the remainder of my balance transferred to their Bitcoin address. It's not much, but I probably won't use it...

How to stay anonymous

There are ways to stay anonymous by obfuscating the block chain. However, this is not the right solution. For a currency to be useful, its primitive form must be practically anonymous and not just anonymizing.

How I'd change Bitcoin

My main issues with Bitcoin:
  • Not anonymous: Identity "anchors" are very easy to establish by transacting with people as described above. This leads to a situation where an attacker can find out what you spend your BTC on for their own nefarious purposes.
  • The currency has no decay. That is, it can be hoarded without consequence. I would like BTC to expire so that the currency keeps circulating. This maintains the value of the currency but prevents hoarding. The block chain has enough information to do this. Miners should be interested in this because it means they can continue to mine forever and keep a healthy Bitcoin economy.
I think the anonymity problem is the hardest to solve. I am only concerned with the ability to transfer coin between my own accounts easily without notifying anyone else. If some way could be devised to solve these problems, goodbye centralized currencies.


Tuesday, 26 April 2011

Deconstructing a dependency injection-driven application

I've been using my C++ dependency injection library for a project over the last year and it's gone pretty well. There are a lot of rough edges, but I thought it could be interesting to the 3 of you still subscribed to this blog to deconstruct the stock quote application.

About the application

The example itself is pretty straightforward. You have a choice of 3 stock quote providers: Yahoo!, static and phone. You choose one and ask for a stock quote. Magic happens and your stock quote arrives.

Example session (with some debug output)

Welcome to the DI Stock Quote App. Simplifying and complicating software development since 2010.
Which stock quote service would you like to use?
1: static
2: phone
3: yahoo
Enter your choice (1-3) and press enter: 3
You chose: yahoo
[DICPP]: No scope constructing: di::type_key<YahooStockQuoteService, void>
[DICPP]: Constructing: di::type_key<di::typed_provider<HttpDownloadService>, void>
[DICPP]: Completed constructing: di::type_key<di::typed_provider<HttpDownloadService>, void> with address: 0x100750
Stock symbol (type quit to quit): goog
[DICPP]: No scope constructing: di::type_key<HttpDownloadService, void>
[DICPP]: Constructing: di::type_key<boost::asio::io_service, void>
[DICPP]: Singleton: constructing: di::type_key<boost::asio::io_service, void>
[DICPP]: Completed constructing: di::type_key<boost::asio::io_service, void> with address: 0x1008a0
Current price for goog: 532.82
Stock symbol (type quit to quit): quit

See how the construction of the HTTP service is automatically delayed until actually needed. This is done through a concept called a "provider" which is basically an automatically generated factory.
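
If the term is unfamiliar, here is a tiny, self-contained sketch of the idea. The class and method names here are illustrative shorthand, not the actual dicpp API; think of Guice's Provider<T>:

#include <functional>
#include <iostream>
#include <memory>

// A "provider" is essentially an automatically generated factory: it knows
// how to build T, but does nothing until get() is called.
template <typename T>
class provider
{
public:
    explicit provider(std::function<std::shared_ptr<T>()> factory)
        : factory_(factory) {}

    std::shared_ptr<T> get() const { return factory_(); } // construct on demand

private:
    std::function<std::shared_ptr<T>()> factory_;
};

struct HttpDownloadService
{
    HttpDownloadService() { std::cout << "HttpDownloadService constructed" << std::endl; }
};

int main()
{
    provider<HttpDownloadService> http(
        [] { return std::make_shared<HttpDownloadService>(); });

    std::cout << "Provider created, nothing constructed yet" << std::endl;
    std::shared_ptr<HttpDownloadService> d = http.get(); // constructed here, on first use
    (void)d;
}

In the real library the injector generates these providers for you and the configured scope still applies; the sketch only shows the laziness.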

About Dependency Injection

A really good introduction to the dependency injection technique as implemented by Guice can be found here. It's probably one of my favourite tech talks of all time.

Anyway, to refresh your memory, here are some of the main benefits of the technique used in Guice:

  • Object construction and lifecycle management is mostly handled for you.
  • Less boilerplate.
  • Makes code more testable.
  • Scopes (~object creation/lifecycle) can be customized by the user.

In short: a lot of the time, you no longer need to allocate objects yourself, or pass an object unused down multiple layers of functions or constructors just to use it once, way down deep in some code.

Magic!

I don't really recall how it is done in Guice but in the C++ library linked above, this magic is driven by a type registry which recursively registers constructor arguments as well as user customizations.

In extreme cases, you can initialize an entire application with a few lines of code:

di::registry r;
r.add( r.type<MyApplication>() );
r.construct<shared_ptr<MyApplication>>()->execute();

This constructs the type registry which is a kind of factory. There is a mini-DSL for describing how you want the registry to handle the type. More on this later. In this case, we are asking the registry to "learn" about the MyApplication type as well as all objects that are required for constructing MyApplication.

"Pish-posh", you say. "MyApplication has a 0-arg constructor. I could do that in my sleep."

Would you be surprised if I said that the MyApplication type actually has 3 arguments?

Well, the above is almost what the StockQuote application looks like. Here is the main function for the stock quote example:

di::injector inj;
inj.install( StockQuoteAppModule() );
StockQuoteApp & app = inj.construct<StockQuoteApp&>(); // lifetime
app.execute();

And here is the constructor for the StockQuoteApp type:

DI_CONSTRUCTOR( StockQuoteApp,
                ( boost::shared_ptr<UserInterface> ui,
                  boost::shared_ptr<StockQuoteServiceFactory> factory ));

When we ask the "injector" to construct the StockQuoteApp instance, it automatically creates the UserInterface as well as the StockQuoteServiceFactory instance.

The di::injector type is just a thin wrapper around the registry, so you can treat it as such. The only thing it really provides is a little bit of syntax to allow you to create modules in a manner similar to Guice. The guts of StockQuoteAppModule accept a registry as a parameter and register the various types. You can see the mini-DSL referred to earlier:

void
StockQuoteAppModule::operator()( di::registry & r ) const
{
  // In each module we define the module's root objects, in this case,
  // StockQuoteApp as well as implementations/specializations of any
  // abstract classes. For example, UserInterface is an ABC and we choose
  // the console-based UI here.

  r.add(
    r.type<StockQuoteApp>()
      .in_scope<di::scopes::singleton>() // The reason we can request a reference in the main function!
  );

  r.add(
    r.type<UserInterface>()
      .implementation<ConsoleInterface>()
      .in_scope<di::scopes::singleton>()
  );

  r.add(
    r.type<StockQuoteServiceFactory>()
      .implementation<StaticStockQuoteServiceFactory>()
      .in_scope<di::scopes::singleton>()
  );

  r.add(
    r.type<HttpDownloadService>()
      .implementation<AsioHttpDownloadService>()
  );

  r.add(
    r.type<boost::asio::io_service>()
      .in_scope<di::scopes::singleton>()
  );
}

As you can see, the mini-DSL (ugly, ugly, ugly details) describes a few things:

  • Default implementations for various interface classes. See UserInterface and ConsoleInterface, for example.
  • Life-cycle management. Singleton is mostly used here but you can also have HTTP-session scopes, thread-local scopes or no scopes (as in HttpDownloadService).

What this means is that wherever a type T with a DI_CONSTRUCTOR macro is registered, the registry will use the rules described by the DSL to construct any arguments to T.

Providers

In this library, there is a concept of a type called a provider whose sole responsibility is to construct objects (usually within the constraints of a scope). In the app session above, I pointed out how the HTTP download service is not instantiated until it is actually needed. This is done via a provider. You can see that YahooStockQuoteService has a constructor which accepts a provider, and a function which makes use of it.

That should be enough information to peruse the example itself. Check the README as there are a couple of interesting exercises you can try.

By the way, this requires a Boost checkout with a built version of Boost Build. I apologize if you can't get it to build on checkout, but I haven't really focused on having other people use it!

Comments and thoughts welcome.