Grouping
Sentry has an extensive grouping system (public documentation) which can be customized on the client and on the server via fingerprint rules and stack trace rules.
This documentation attempts to explain how the system currently functions and what limitations it presently has.
At the most basic level, when an event comes into Relay, Relay already associates the current version of the grouping configuration with the event. This means that from the very first moment an event enters the infrastructure, the decision on which version of the grouping algorithm to use has been made.
When the event makes its way into the core event processing system ready to be saved and a fingerprint hasn't been set by the client yet, three systems start operating:
- First, stack traces are processed through a system called normalize_stacktraces_for_grouping, where the stack trace rules are applied to make a pass over the "in app" flags of the frames. This for instance allows the active grouping code to mark frames as outside of the app or within the scope of the app. The original value of the in_app flag is preserved in the event as well. This allows the system to "revert" to the original values if grouping rules have to run again (for instance for reprocessing or similar).
- Next the server side fingerprinting rules are run. These have the ability to override the default fingerprint that would otherwise be generated by the grouping code. The output of the fingerprinting code can either be a list of strings that will be hashed into the fingerprint, or it can also include the {{ default }} placeholder, in which case the original grouping code will still run and be folded into the fingerprint to further subdivide the group.
- Finally the actual grouping algorithm runs if and only if the fingerprint has not been set yet or it uses the special {{ default }} value.
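The three steps above can be sketched roughly as follows. This is a minimal illustration and not Sentry's actual implementation: the event layout, the `orig_in_app` key, the use of MD5, and the helpers `hash_parts`, `default_grouping_components` and `compute_fingerprint_hash` are all assumptions made for the example. It assumes the event arrives without a client-side fingerprint.

```python
import hashlib

DEFAULT = "{{ default }}"


def hash_parts(parts):
    """Hash a list of strings into a single hex digest."""
    digest = hashlib.md5()
    for part in parts:
        digest.update(part.encode("utf-8"))
        digest.update(b"\x00")  # separator so ["ab", "c"] differs from ["a", "bc"]
    return digest.hexdigest()


def default_grouping_components(event):
    """Hypothetical stand-in for the grouping algorithm: in-app function names."""
    return [f["function"] for f in event["frames"] if f["in_app"]]


def compute_fingerprint_hash(event, in_app_rule, fingerprint_rule):
    # Step 1: apply stack trace rules and keep the original in_app value
    # so that it can be restored if grouping has to run again.
    for frame in event["frames"]:
        frame.setdefault("orig_in_app", frame.get("in_app"))
        frame["in_app"] = in_app_rule(frame)

    # Step 2: server-side fingerprint rules may override the default.
    fingerprint = fingerprint_rule(event) or [DEFAULT]

    # Step 3: the grouping algorithm only contributes where {{ default }} appears.
    parts = []
    for value in fingerprint:
        if value == DEFAULT:
            parts.extend(default_grouping_components(event))
        else:
            parts.append(value)
    return hash_parts(parts)
```

A fingerprint rule returning `["db-error", "{{ default }}"]` would fold the default components into the hash, while one returning `["custom"]` would skip the grouping algorithm entirely.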
It's important to know that the grouping algorithm can produce more than one fingerprint hash. These hashes are collected and associated with issues. There are two types of hashes that can be created:
- flat hashes: these are traditional hashes where all hashes are equal. If any of these hashes exists in a group it is associated with it and any hash not yet associated with the group is added.
- hierarchical hashes: these are secondary hashes that can be used to subdivide a group in the grouping tab.
Internally in Sentry issues are called "Groups" or "grouped messages" and represented by the Group model.
After the fingerprints have been calculated, Sentry decides whether an already existing group is reused or a new group is created. This is done via the GroupHash model. Most importantly, if a group must be created it will be created immediately. While the data model supports events without issues, the user interface does not really support this for a range of things that a user expects. This means that at the moment 100% of events are associated with a group.
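The lookup for flat hashes can be pictured with a small in-memory sketch. The `GroupStore` class below is a hypothetical stand-in for the GroupHash table, not the real model, and it ignores concurrency and hierarchical hashes.

```python
import itertools


class GroupStore:
    """In-memory stand-in for the GroupHash table: hash -> group id."""

    def __init__(self):
        self.hash_to_group = {}
        self._ids = itertools.count(1)

    def find_or_create_group(self, hashes):
        """Return (group_id, created) for the flat hashes of one event."""
        existing = [self.hash_to_group[h] for h in hashes if h in self.hash_to_group]
        if existing:
            group_id, created = existing[0], False
        else:
            # No hash is known yet, so the group is created immediately.
            group_id, created = next(self._ids), True
        for h in hashes:
            # Attach every hash that is not yet associated with a group.
            self.hash_to_group.setdefault(h, group_id)
        return group_id, created
```

An event with hashes `["a", "b"]` creates a group; a later event with `["b", "c"]` finds that group through `"b"` and adds `"c"` to it.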
When the group has been created or found, the event is associated with that group_id. This means that the event, as it flows further through the system on its way towards Snuba, is associated with that group, which also means that the group is persisted in Snuba along with the event.
Upon group creation, additional code runs such as the triggering of alerts, regression detection and more. It is thus relatively expensive to create a group due to the number of additional actions that can be triggered from it.
The system does not cope particularly well with merges and splits because the events in Snuba are generally considered immutable. When a user triggers a merge, a task is issued that performs this merge. A merge can also be partially undone, but a split fails to fully reconstruct the original state as some information (such as the original >90 day counts) was lost in the process. The merge is also expensive and frequent merges can cause a significant backlog on the Snuba task queue.
Sentry applies some general principles to grouping in an attempt to group the right types of events together. The current system is however restricted in that Sentry has to find "the one group", as an event can only ever belong to one. To understand how Sentry thinks about groups, it's important to realize that finding the right group is fundamentally subjective and in fact depends on where the source of the error is.
Sentry's primary way of grouping is the stack trace. Equivalent stack traces (with a matching error type) indicate the same error. There is however a fundamental variance to stack traces, so actually creating a fingerprint of a stack trace has some challenges associated with it.
A stack trace consists of multiple frames and each frame contributes to the fingerprint. Which parts of it do and which ones do not depends quite a bit on the platform and on the different rules we apply to it.
In Python, as an example, each frame contributes module name, function name and context-line (that is the source code of the line where the frame pointer pointed, with leading and trailing whitespace removed). The motivation here is the following: modules and functions are relatively coarse indicators and a function can often fail from different branches, so taking the source code into account in addition is less likely to over-group. This however also means that when a refactoring changes the source code of a line without changing its functionality, it can cause a new group to be created unnecessarily. The other consequence of this is that we require source code to be available for grouping. This works well in Python as the Python SDK generally submits the source code along, but we cannot use the same rule in C++, for instance, where the availability of the source code is not guaranteed and one release can come with source whereas another one might not.
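As a rough illustration of the Python rule, the sketch below extracts the three values from a frame. The dictionary keys and the function name `python_frame_components` are assumptions for the example and do not mirror Sentry's internal frame representation.

```python
def python_frame_components(frame):
    """Return the values a Python frame would contribute to the fingerprint."""
    return (
        frame["module"],
        frame["function"],
        frame["context_line"].strip(),  # leading and trailing whitespace removed
    )


original = {
    "module": "app.users",
    "function": "get_current_user",
    "context_line": "    user = session['user']",
}
# Re-indenting the line does not change the components.
reindented = dict(original, context_line="        user = session['user']")
# Renaming a variable keeps the behavior but changes the components.
renamed = dict(original, context_line="    user = sess['user']")
```

Here `original` and `reindented` yield identical components, while `renamed` does not, which is the refactoring sensitivity described above.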
For JavaScript, on the other hand, we need to apply different rules. First of all, we currently have challenges with determining a reliable function name due to limitations with the source maps. This can cause minified function names to show up at times, and these are unstable between builds. Because of this we instead feed module name, filename (that is the base name of the path only, converted to lowercase) as well as the context-line. This again is risky as the context-line might slightly change over time. An additional complication with JavaScript is that the browser supplied stack trace is not guaranteed to be particularly stable. Different browsers behave very differently in what the stack trace looks like when browser native functions are involved. For instance the use of Array.forEach can produce vastly different stack traces between browsers. Some will show Array.forEach as a stack frame, some will not. As such the grouping algorithm is trying its very best to consistently discard frames which some other browsers cannot produce. However there are limitations and the same error can produce different stack traces from different browsers.
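The JavaScript rule can be sketched in the same style. Again, the frame keys and the helper name `javascript_frame_components` are hypothetical, and stripping whitespace from the context-line is an assumption carried over from the Python rule.

```python
import posixpath


def javascript_frame_components(frame):
    """Return the values a JavaScript frame would contribute to the fingerprint."""
    return (
        frame.get("module"),
        posixpath.basename(frame["filename"]).lower(),  # base name only, lowercased
        frame["context_line"].strip(),
    )


build_1 = {
    "module": "app/components/App",
    "filename": "/static/js/App.js",
    "function": "a",  # minified name, differs between builds
    "context_line": "  render(state);",
}
build_2 = dict(build_1, function="b", filename="/static/v2/js/App.js")
```

Because the function name is not used, `build_1` and `build_2` produce the same components even though the minified names and directory paths differ.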
The native platforms are the most tricky to get right. Here we are working with limitations of the underlying debug information when creating stack traces. The first level of complication is that a natively compiled language is likely to inline source code into the calling function. On some platforms this will fundamentally change the available information. As an example, a Microsoft compiler will provide the full demangled name for a non inlined function, but it only provides the local function name for an inlined function (that's a very simplified explanation, the actual difference is more nuanced). Different compilers also mangle and format names very differently. This for instance can cause the very same error to fingerprint very differently when compiled and run on Linux vs macOS. Additionally source code is generally not available, thus we are not relying on it. We thus largely only feed function names into the grouping algorithm with a lot of cleaning up being applied (eg: we remove generics, parameters and return values from the demangled function name before feeding it to the grouping code).
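To give a feel for the kind of cleanup involved, here is a deliberately simplified sketch that strips generics, the parameter list and a leading return type from a demangled name. The function `clean_native_function` is hypothetical; the real cleanup is considerably more involved, and this version does not handle operators, anonymous namespaces or function pointer types.

```python
def clean_native_function(name):
    """Drop generics, parameters and return type from a demangled name."""
    kept = []
    depth = 0  # nesting depth inside <...>
    for ch in name:
        if ch == "(" and depth == 0:
            break  # the parameter list and anything after it is dropped
        if ch == "<":
            depth += 1
        elif ch == ">":
            depth = max(depth - 1, 0)
        elif depth == 0:
            kept.append(ch)
    # Whatever precedes the last whitespace is treated as the return type.
    tokens = "".join(kept).split()
    return tokens[-1] if tokens else ""
```

With this sketch, `void foo::bar<int>(int, char)` and `foo::bar(int) const` both reduce to `foo::bar`.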
Because we can only create a single group based on the fingerprint, we try quite hard to eliminate unnecessary noise between stacks. This is to a large degree achieved by removing entire frames from the stack for grouping. There are two ways by which frames are removed: they are either removed from grouping entirely or they are marked as "out of app", which means that they contain code which is unrelated to the application that the developer created. This for instance means that if you are using the Django framework we will mark frames from the framework as not application code, which causes them to be "ignored" for grouping.
The difference between non application code and code entirely removed from grouping is that the former will still create a secondary hash which we also associate with groups. However because that hash contains "more information" than the other, it's generally never used for grouping except for a form of implied merge. To understand this better consider a stack trace with four frames "A1 B1 A2 A3". In the beginning they are all considered in app, thus they all feed into the fingerprint [A1, B1, A2, A3]. At a later point either an SDK update or a change in the configuration on the server marks B1 as not in-app. If the grouping algorithm were to fully ignore the B1 frame now, it would create a new hash which is not found on the existing groups and would create a new group. However because we still create the full hash anyway, the new event that comes in still finds the already existing group. The hashes created are [A1, B1, A2, A3] as well as [A1, A2, A3]. Likewise if you later also remove A3 from in-app it would create the hashes [A1, B1, A2, A3] as well as now [A1, A2], and the first hash would still be a match against an already existing group.
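The "A1 B1 A2 A3" example can be replayed in a few lines. In this sketch tuples of frame names stand in for the real hashes, and `grouping_hashes` is a hypothetical helper that returns both the full-stack hash and the in-app-only hash.

```python
def grouping_hashes(frames):
    """Return hash stand-ins for a list of (frame name, in_app) pairs."""
    full = tuple(name for name, _in_app in frames)
    app_only = tuple(name for name, in_app in frames if in_app)
    return {full, app_only}


# The existing group was created when every frame counted as in-app.
known = grouping_hashes([("A1", True), ("B1", True), ("A2", True), ("A3", True)])

# Later B1 is marked as not in-app, so a second, shorter hash appears.
incoming = grouping_hashes([("A1", True), ("B1", False), ("A2", True), ("A3", True)])

# The full hash is shared, so the new event still lands in the existing group.
matches_existing_group = bool(known & incoming)
```

The overlap comes entirely from the full hash; the new in-app hash is simply added to the group.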
Sentry at the moment feeds the entire stack into grouping. There is a way to limit the number of frames contributed down to a smaller set by setting a maximum number of frames that should be considered. This has a hypothetical advantage when working with different paths that lead to a bug, but has the consequence that very large groups can be created. Creating larger groups is trickier in Sentry as without hierarchical grouping there are no good ways to dive into the different stacks in an issue.
When a stack trace is unavailable the system needs to fall back to some sort of alternative grouping method. The fallback is what we call "message based grouping" and it's a pretty limited method. We take the first line of the message and apply some clean up logic. For instance if numbers are encountered they are replaced by a static placeholder. Same with known timestamps, UUIDs and similar things. However in many cases the source of these strings is impossible to clean up so fallback grouping is very likely to create a high number of independent groups.
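A minimal sketch of this kind of message cleanup is shown below. The regular expressions, the placeholder names and the function `normalize_message` are assumptions chosen for illustration; they are not the patterns Sentry actually uses.

```python
import re

# Order matters: UUIDs and timestamps are replaced before bare numbers.
_RULES = [
    (
        re.compile(
            r"\b[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}"
            r"-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}\b"
        ),
        "<uuid>",
    ),
    (re.compile(r"\b\d{4}-\d{2}-\d{2}[T ]\d{2}:\d{2}:\d{2}(?:\.\d+)?Z?"), "<timestamp>"),
    (re.compile(r"\b\d+\b"), "<int>"),
]


def normalize_message(message):
    """Take the first line of a message and replace volatile values."""
    lines = message.strip().splitlines()
    first_line = lines[0] if lines else ""
    for pattern, placeholder in _RULES:
        first_line = pattern.sub(placeholder, first_line)
    return first_line
```

Messages that differ only in numbers, timestamps or UUIDs normalize to the same string, but free-form values such as user names still produce separate groups, which is the limitation described above.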
Sentry has a general tendency to group different paths towards an issue separately. Even if the grouping algorithm is perfect, it will consider different ways to end up at a bug as independent problems. Take a hypothetical function get_current_user. Let's imagine this function has a bug where it now starts failing with a DataConsistencyError exception. If we consider that this function is used in a lot of different places of the application, each of these callers will now create a different group. We can thus think of grouping as a problem of the source of the error. At any point the question can be asked if the source of the error is "how we call a function" (caller error) or "in the function" (callee error). This decision is impossible to make in a general sense, but over time it can become easier to make this call.
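The get_current_user scenario, and the effect of limiting the number of frames, can be illustrated as follows. The helper `stack_fingerprint` and the caller names are hypothetical, and keeping the innermost frames when a limit is set is an assumption of this sketch.

```python
def stack_fingerprint(frames, max_frames=None):
    """Fingerprint a stack (outermost frame first) as a tuple of names.

    If max_frames is set, only the innermost max_frames frames are used.
    """
    if max_frames:
        frames = frames[-max_frames:]
    return tuple(frames)


stacks = [
    ["view_profile", "get_current_user"],
    ["view_orders", "get_current_user"],
    ["api_me", "authenticate", "get_current_user"],
]

# Every caller path produces its own fingerprint, and thus its own group.
per_path = {stack_fingerprint(s) for s in stacks}

# Considering only the innermost frame collapses them into one group.
callee_only = {stack_fingerprint(s, max_frames=1) for s in stacks}
```

The first set has three entries (one group per caller), the second only one, at the cost of no longer distinguishing the paths.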
Sentry has an experimental grouping system called "hierarchical" grouping where we allow diving into the different paths towards a bug by preferring to over-group and then provide a grouping tab that can show the different paths by which we came to the error. However the limitation of this with the current group system is that once the group has been created, there is no way to split it out, so you can end up with a much larger group than you desired.
The grouping system as implemented relates very closely to the workflow that is established with the groups that are created. It is the creator of the groups and as the creator, it drives a big part of the user experience that derives from it. If it were to create a single issue per event, or a single issue for all events, nothing in Sentry would properly function any more. It is thus our first point of balancing the quality of the workflow. Unfortunately, with the tools available today we are sitting in a pretty tough spot at the time of grouping. If we get it wrong, the user is likely stuck.
The following general paths forward are currently envisioned:
The consequences of making too many groups today are alert spam and the inability to work with multiple issues at once. If Sentry no longer alerted on all new groups and tools existed to work across multiple groups, more opportunities would arise. In particular the grouping algorithm could continue to just fingerprint the stack trace, but a secondary process could come in periodically and sweep up related fingerprints into a larger group. If we take the get_current_user example, the creation of 50 independent groups is not much of an issue if no alerts are fired. If after 5 minutes the system detected that they are in fact all very related (eg: the bug is "in get_current_user"), it could leave the 50 generated groups alone but create a new group that links the other 50 groups, hide/deemphasize the individual 50 groups in the UI and let the user work with the larger group instead.
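One way such a periodic sweep could look is sketched below. This is purely illustrative of the envisioned idea and does not describe existing Sentry behavior: the function `sweep_related_groups`, the choice of the innermost frame as the relatedness signal and the `min_related` threshold are all assumptions.

```python
from collections import defaultdict


def sweep_related_groups(groups, min_related=2):
    """Link groups whose stacks end in the same innermost frame.

    groups maps a group id to its stack (outermost frame first). The result
    maps an innermost frame to the sorted ids of the groups sharing it, for
    every frame shared by at least min_related groups.
    """
    by_callee = defaultdict(list)
    for group_id, stack in groups.items():
        if stack:
            by_callee[stack[-1]].append(group_id)
    return {
        callee: sorted(ids)
        for callee, ids in by_callee.items()
        if len(ids) >= min_related
    }
```

The original groups are left untouched; the returned mapping only describes which groups a larger, linking group would cover.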
We also have the hierarchical grouping prototype which tries to group on fewer inputs. This system has some limitations but it's less likely to create many groups. Unfortunately the user experience is not fleshed out as working with parts of the group is not an option.