Functions That Know Too Much
I try to avoid writing functions that know too much. What is a “function that knows too much”? If I had to write a Platonic definition, it’s a function which concerns itself too much with the accidental1 qualities of the problem it is trying to solve.
Let’s work through an example
// template.ts
interface Template {
description: string;
user: string;
html: string;
createdAt: Date;
updatedAt: Date;
}
// utils.ts
import { Template } from './templates';
const escapeCurlyBraces = (template: Template): void => {
template.html = template.html.replace(/\{/g, '\\{').replace(/\}/g, '\\}');
};
We can see that this function operates exclusively on the html
string
contained within the Template
interface, but it requires a Template
to be
passed in order to be used. This is not ideal; not only is it a hindrance for
reuse/testing (we’ll probably need to escape curly braces in contexts where we
don’t have a template), since this operates in-place, it’s unclear to the caller
what other fields of Template
get modified.
This is easily remedied. Just write a function that receives a string and returns a string.
const escapeCurlyBraces = (str: string): string => {
return str.replace(/\{/g, '\\{').replace(/\}/g, '\\}');
};
and if we find ourselves doing
template.html = escapeCurlyBraces(template.html)
frequently, we can add the
following function
const escapeTemplateCurlyBracesInPlace = (template: Template): void => {
template.html = escapeCurlyBraces(template.html);
};
Now this function clearly acknowledges that it’s operating on the template in
place and we still have access to the underlying escapeCurlyBraces
whenever we
need it.
Let’s look at a less obvious example
const MAX_REGEX_CHARS = 10_000;
// PROBLEM:
// Given an array of ID prefixes,
// create regexes that match any of the given ID prefixes.
// Each regex can be at most 10K characters long, so split up the
// batches appropriately.
const batchIDPrefixRegexes = (idPrefixes: string[]): RegExp[] => {
const res: RegExp[] = [];
let curLength = 0;
const curParts: string[] = [];
const flush = () => {
if (curParts.length === 0) {
return;
}
// Since these are just prefixes, we want to match from the start
// and use a non-capturing group.
res.push(new RegExp(curParts.map((v) => `(?:^${v})`).join('|')));
curLength = 0;
};
for (const [index, id] of idPrefixes.entries()) {
if (curLength + id.length >= MAX_REGEX_CHARS) {
flush();
}
curParts.push(id);
const isLastPrefix = index === idPrefixes.length - 1;
// The 4 is because of the parens and non-capturing
// specifier. The 5 is with the pipe included.
const additionalChars = isLastPrefix ? 4 : 5;
curLength += additionalChars + id.length;
}
flush();
return res;
};
Let’s enumerate everything this function knows:
- The caller is supplying ID prefixes, so we want to match with the start of an ID
- They want to use non-capturing groups
- Each batch regex can only have
MAX_REGEX_CHARS
characters - They want to receive regexes as a result
As it stands, this function is perfectly fine. It solves the problem it’s trying to solve, it’s easy to use, and it’s readable. It’s important to remember that this is our goal, not writing ostensibly reusable code.
That being said, let’s say that we now have a requirement to perform case-insensitive matching on complete names instead of IDs in some cases. Our first instinct might be to add a parameter to distinguish the two:
const batchRegexes = (inputs: string[], idPrefixes: boolean): RegExp[] => {
// Same as above...
const flush = () => {
if (curParts.length === 0) {
return;
}
if (idPrefixes) {
res.push(new RegExp(curParts.map((v) => `(?:^${v})`).join('|')));
} else {
res.push(
new RegExp(curParts.map((v) => `(?:^${v}\$)`).join('|'), 'i')
);
}
curLength = 0;
};
for (const [index, id] of inputs.entries()) {
// Same as above...
const additionalChars =
(idPrefixes ? 5 : 4) + (index < inputs.length - 1 ? 1 : 0);
// Same as above...
}
};
But now we have a function that masquerades as something general but is actually
just solving two very specific problems in an inflexible way. Moreover, the
caller now has to divine what the difference in behavior is when they pass along
false
for idPrefixes
.
How can we improve this? Let’s zoom in on the branch
if (idPrefixes) {
res.push(new RegExp(curParts.map((v) => `(?:^${v})`).join('|')));
} else {
res.push(new RegExp(curParts.map((v) => `(?:^${v}\$)`).join('|'), 'i'));
}
There are two major additions in the else
branch:
- The regex pattern matches the end of the string as well via
\$
- An option is specified to make the regex case insensitive
If we look back at our prior list of what the function knows, these relate to:
- The caller is supplying ID prefixes, so we want to match with the start of an ID
- They want to use non-capturing groups
- They want to receive regexes as a result
Seeing as how changing the requirement slightly invalidated most of these points, we can make an educated guess that what really matters is the batching aspect of this function.
Let’s rewrite it accordingly:
const batchRegexPatterns = (patterns: string[]): string[] => {
// Same as above...
const flush = () => {
if (curParts.length === 0) {
return;
}
res.push(curParts.map((v) => `(${v})`).join('|'));
curLength = 0;
};
for (const [index, id] of ids.entries()) {
// Same as above...
// Now we just account for the parens and pipe
const additionalChars = index === ids.length - 1 ? 2 : 3;
// Same as above...
}
};
At a glance, it just looks like we got rid of key functionality; that’s because we did. The function no longer takes into account whether we’re working with ID prefixes, or some other type of pattern. And that’s the point.
Now, if somebody wanted to use it for ID prefixes, they can do so as follows:
const idRegexes = batchRegexPatterns(
idPrefixes.map((prefix) => `?:^${prefix}`)
).map((pattern) => new RegExp(pattern));
And as follows for case-insensitive exact matches on names:
const nameRegexes = batchRegexPatterns(names.map((name) => `?:^${name}`)).map(
(pattern) => new RegExp(pattern, 'i')
);
So what does our new function know?
- Each batch regex can only have
MAX_REGEX_CHARS
characters - The caller wants to receive strings with the regex patterns as a result
With these assumptions in place, the function is both simpler and more flexible, and the calling code is more explicit about its behaviour.
Let’s look at another quick example.
const createWidget = async (params: CreateWidgetParams) => {
// You can imagine we're doing a bunch of database round trips here.
await validateCreateWidgetParams(params);
return await db.create(params);
};
Again, this is seemingly innocuous, but there’s one assumption that jumps out at me: we’re only going to be creating one widget at a time. In my experience, this is rarely the case.
This is a function that knows too much. Assuming that we’ll only be creating one
widget at a time is a stronger assumption than assuming we’ll create any
number of them. Sure, you can call this function in a loop, or even wrap many
concurrent invocations in a Promise.all
, but you’ll be paying the cost of the
database round trips for each one. We can argue about the elegance of our
batchRegexPatterns
function above all day, but we can’t argue the relatively
large performance hit we take with createWidget
here for batch use cases.
Conclusion
It pays to evaluate what assumptions your code makes and see if you can modify them to fit your problem better. Doing this in a code review setting is particularly effective. Over time, you get a feel for the right assumptions and you can spend less time refactoring and more time getting stuff done.
-
https://plato.stanford.edu/entries/essential-accidental/ ↩