DRY, which stands for Don’t Repeat Yourself, is a well-known programming principle. Many developers, however, misunderstand the principle, possessing a false grasp of what DRY is really about. In this blog post we’ll investigate a case in which DRY is misunderstood and applied in the wrong manner, and its consequences. A hypothetical code base written in Golang will be used to demonstrate the idea, but the syntax should be easy enough to understand even without Golang knowledge.

The problem

Imagine we’ve recently joined a startup that operates in the promotion domain. The core product of our team is a piece of software that manages promotion for different vendors in our city. A promotion can be a 10% discount for all household products in our local supermarket, or a free coffee for all KFC orders. Furthermore, a promotion is often attached to a certain condition. For example, a discount may only trigger if the buyer is paying for at least $20.

A promotion can be created with some basic information like promotion type, discount value, associated gift, minimum order value to reach the promotion, a counter of remaining promotion usages, promotion start date and end date. We can model it using the struct below. Note that in a real world production system we should not use float32 to model currency value. Instead use a package like shopspring/decimal for that purpose.

type Promotion struct {
	Type              PromotionType
	Value             float32
	Gift              Gift
	MinimumOrderValue float32
	Start             time.Time
	End               time.Time
	Remaining         int
}

type Gift struct {
	Name     string
	Quantity int
}

func (g Gift) OutOfStock() bool {
	return g.Quantity == 0
}

type PromotionType string

const (
	promoTypeDiscount PromotionType = "discount"
	promoTypeGift     PromotionType = "gift"
)

type Order struct {
	Value float32
}

When our user places an order, we’ll have to check its eligibility for a promotion.

func (p Promotion) IsApplicable(o Order) bool {
	if n := time.Now(); n.Before(p.Start) || n.After(p.End) {
		return false
	}

	if o.Value < p.MinimumOrderValue {
		return false
	}

	if p.Type == promoTypeGift && p.Gift.OutOfStock() {
		return false
	}

	if p.Remaining == 0 {
		return false
	}

	return true
}

So far so good! We launch our amazing product, and it’s a success!

Our product manager comes to see us the next week, and demands that we work on the implementation of the next feature: a voucher. A voucher, in a nutshell, is very similar to a promotion: a user applying a voucher will get some benefits as long as their order matches the requirements. The only difference between them is that a voucher requires the input of a special code, which is generated and distributed to users.

We come up with a solution for Voucher, which heavily resembles Promotion.

type Voucher struct {
	Code              string
	Type              PromotionType
	Value             float32
	Gift              Gift
	MinimumOrderValue float32
	Start             time.Time
	End               time.Time
}

func (v Voucher) IsApplicable(o Order) bool {
	if n := time.Now(); n.Before(v.Start) || n.After(v.End) {
		return false
	}

	if o.Value < v.MinimumOrderValue {
		return false
	}

	if v.Type == promoTypeGift && v.Gift.OutOfStock() {
		return false
	}

	if o.VoucherCode != v.Code {
		return false
	}

	return true
}

type Order struct {
	Value        float32
	VoucherCode  string
}

Wait a minute! The implementation of Voucher looks suspiciously similar to Promotion! There are so many overlapping pieces here: all the fields are the same except for Code, time validation of Voucher is exactly the same as Promotion, both need to compare the order value with its minimum order value, and both need to ensure that the associated gift’s availability. So many things are repeated, and it looks like a case of DRY violation.

The DRY solution

To eliminate the duplications, we introduce a beautiful abstraction called Offer.

type Offer struct {
	Type              PromotionType
	Value             float32
	Gift              Gift
	MinimumOrderValue float32
	Start             time.Time
	End               time.Time
}

func (ofr Offer) IsApplicable(o Order) bool {
	if n := time.Now(); n.Before(ofr.Start) || n.After(ofr.End) {
		return false
	}

	if o.Value < ofr.MinimumOrderValue {
		return false
	}

	if ofr.Type == promoTypeGift && ofr.Gift.OutOfStock() {
		return false
	}

	return true
}

type Promotion struct {
	Offer
	Remaining int
}

func (p Promotion) IsApplicable(o Order) bool {
	if !p.Offer.IsApplicable(o) {
		return false
	}

	if p.Remaining < 1 {
		return false
	}

	return true
}

type Voucher struct {
	Offer
	Code string
}

func (v Voucher) IsApplicable(o Order) bool {
	if !v.Offer.IsApplicable(o) {
		return false
	}

	if o.VoucherCode != v.Code {
		return false
	}

	return true
}

By grouping all common fields into Offer, our code gets much cleaner and with no sight of duplications. All mutual business logics nicely fit within method IsApplicable() of Offer. We successfully defeat the evil duplication, and it’s time to call it a day. We deserve a good rest.

The abomination

Business is booming and our product team has found new ways to monetize. “Hey everyone, we signed a contract with a third party company that provides GaaS a.k.a. Gifts as a Service. They provide an API to check gift availability, which we need to integrate for Voucher. We still need to manage Promotion gifts in house, but offloading Voucher gifts will save us millions!”, says our product manager with sheer excitement!

We frown upon hearing this. Our beautiful abstraction will have to adapt to this change, and it’s hard because method IsApplicable is shared by both Promotion and Voucher. Perhaps if they have their validation logic separated, will it be much easier? No, we shall not falter for the sake of DRY. We must not succumb to duplication even though it means adding an additional condition check in validation logic. We introduce a new field in Offer to determine if the gift should be validated by the third party API.

type Offer struct {
	// other fields
	ExternalGiftValidation bool
}

func (ofr Offer) Validate(o Order) bool {
	// other validation rules

	if ofr.Type == promoTypeGift {
		if ofr.ExternalGiftValidation {
			if isThirdPartyGiftOutOfStock(ofr) {
				return false
			}
		} else if ofr.Gift.OutOfStock() {
			return false
		}
	}

	return true
}

func isThirdPartyGiftOutOfStock(ofr Offer) bool { 
	// validate gift quantity using the third party API
}

ExternalGiftValidation will be set to true upon constructing a Voucher, and false otherwise. The solution works, and everyone is happy. But something is bothering us. The abstraction is kinda clunky now. ExternalGiftValidation is introduced just because we decided to abstract all common fields and validation logic of Voucher and Promotion into Offer. The method to validate a Promotion will be concerned with a condition check that is not relevant to it at all.

Another week passes, and our product manager comes back to us with a new requirement: our top vendors are asking for the ability to restrict Voucher usage to certain week days only. For example, they need their voucher to be usable on Saturday and Sunday only, because theywant to target weekend traffic. Apparently we have to modify the part where time validation happens.

type Offer struct {
	// other fields
	Schedule map[time.Weekday]struct{}
}

func (ofr Offer) IsApplicable(o Order) bool {
	n := time.Now()
	
	if len(ofr.Schedule) > 0 {
		if _, ok := ofr.Schedule[n.Weekday()]; !ok {
			return false
		}
	}

	if n.Before(ofr.Start) || n.After(ofr.End) {
		return false
	}

	// other validation rules

	return true
}

This method now contains another piece of code that only applies to Voucher. What’s even worse is that we have to ensure Promotion will never write to, or read from Schedule because Schedule is a concept for only Voucher. Maybe we could extract Schedule to Promotion only, and validate that field in Promotion’s IsApplicable method but then we have to duplicate n := time.Now(). We all know that duplication is the arch enemy of DRY, so we won’t do that.

Not so long after launching the schedule feature, another request comes in: some vendors demand that Promotion price reduction must only be applied to the order subtotal excluding any kind of taxes and service charges. This means the order value comparison logic of Voucher and Promotion now differ from each other.

type Offer struct {
	// other fields
	ExcludeExtraFees bool
}

func (ofr Offer) IsApplicable(o Order) bool {
	orderValue := o.Value
	if ofr.ExcludeExtraFees {
		orderValue = removeExtraFees(orderValue)
	}

	if orderValue < ofr.MinimumOrderValue {
		return false
	}
}

func removeExtraFees(orderValue float32) float32 {
	// logic to exclude taxes and service charges
	return orderValue
}

We introduce another boolean field to handle a special case for Promotion. When a vendor wants their Promotion to exclude extra fees, we set ExcludeExtraFees to true.

We take a look at the final implementation of the promotion validation logic, which looks kinda messy now because there are so many branching conditions.

func (ofr Offer) IsApplicable(o Order) bool {
	n := time.Now()

	if len(ofr.Schedule) > 0 {	// only applicable to voucher
		if _, ok := ofr.Schedule[n.Weekday()]; !ok {
			return false
		}
	}

	if n.Before(ofr.Start) || n.After(ofr.End) {
		return false
	}

	orderValue := o.Value
	if ofr.ExcludeExtraFees {	// only applicable to promo
		orderValue = removeExtraFees(orderValue)
	}

	if orderValue < ofr.MinimumOrderValue {
		return false
	}

	if ofr.Type == promoTypeGift {
		if ofr.ExternalGiftValidation { // only applicable to voucher
			if isThirdPartyGiftOutOfStock(ofr) {
				return false
			}
		} else if ofr.Gift.OutOfStock() {
			return false
		}
	}

	return true
}

On one gloomy day, we see our product manager stride across the room, his face darkens as he mutters: “We have a big problem, some Promotion is being validated using the third party service while they should not be, and this is causing a cascading chain of failures”. We investigate and find that there has been a mistake in the data retrieval layer: ExternalGiftValidation is set to true when constructing a Promotion! It seems like the more requirements we get, our beautiful abstraction gets uglier and uglier, and more error-prone even with rigorous unit tests. If only there’s a way to make this better, perhaps by keeping the business logic of Voucher and Promotion separated in the first place so that they can grow in different directions?

What DRY really is

DRY is introduced in The Pragmatic Programmer as followed:

Every piece of knowledge must have a single, unambiguous, authoritative representation within a system

But what does a piece of knowledge mean? Too many of us naively understand DRY as “do not duplicate code”. In the 20th anniversary edition of The Pragmatic Programmer the author has to clarify:

Many people took it to refer to code only: they thought that DRY means “don’t copy-and-paste lines of source.”

That is part of DRY, but it’s a tiny and fairly trivial part.

DRY is about the duplication of knowledge, of intent. It’s about expressing the same thing in two different places, possibly in two totally different ways.

So DRY is more about knowledge copying, and less about code copying. It’s very tempting for developers to immediately get rid of code duplications, but there’s an actual greater evil, as quoted by Sandi Metz

Duplication is far more cheaper than the wrong abstraction

In our hypothetical promotion management system, we start with 2 things that look a lot similar at first sight. Voucher and Promotion share so many things, from their structural shape to their behavior patterns. Abstracting these similarities sounds like a good idea at first. As the story goes on as we observe, it turns out to be actually a bad idea as Voucher and Promotion gradually deviate from each other due to unforeseen requirements. In fact, these 2 concepts should be kept separated in the beginning. Though there are some duplications, it is not DRY violation because they represent 2 different things, 2 different pieces of knowledge in our system. These 2 pieces of knowledge may resemble each other in the beginning, but they’re still 2 different things. Even if they turn out to overlap in the future, it’s easier to abstract them later than fixing the issues caused by prematurely abstracting them.

The story in this article is hypothetical, but I’ve witnessed many times in real life when an abstraction is born due DRY being misunderstood. Then the dev team expands this abstraction to fit in more and more special cases, usually with a boolean flag. This process keeps going until the abstraction becomes unmaintainable. One may wonder why this phenomenon happens, but it’s not hard if we think about it. Who in their right mind would suggest breaking down a working piece of code that’s been perfecly serving production traffic? If it works don't touch it is a mantra every developer mumbles in their sleep. Even if we want to correct the abstraction, it takes lots of time to convince the stakeholders of the benefits. If the abstraction is introduced by anothet team member, it may even be harder to argue, because not all people have the courage to admit their mistake, and forgo their ego. It’s usually faster, and less painful to simply create a branching condition to satisfy the new requirements. Consequently, an abomination, where different pieces of logic of different components live in the same abstraction, is born.

By now I hope that we all realize how destructive it could be by misunderstanding DRY, and blindly applying it to our code base. Instead of looking for duplication removal opportunities, we should drive towards open-closed. Sandi Metz has an excellent talk that briefly touches the topic of duplication and abstraction, which I include in the last part of this article. Furthermore, we shouldn’t be afraid to break up an abstraction when it’s no longer useful, and neither should we be scared of some duplications. Not all duplications are a manifestation of DRY violation.

On the other hand, it must be emphasized that I do not promote copying and pasting code freely. A nice tip, also shared by Sandi Metz, is to use the dupe tags whenever we have duplications but are not sure if they should be generalized. Any decent modern IDE should be able to help us locate all the tags quickly. When we gain more knowledge of the domain we’re working in, the good abstractions will naturally reveal themselves. Only then should we revise the dupe tags, and abstracting similar codes where it makes sense.

References