CodeFusion

CodeFusion is a tool that helps you code diagnoses and medical phrases using ICD or other WHOFIC classifications.

The tool can read Excel (.xlsx) and tab-separated text (.txt) files, and it matches the text found in these files to codes and URIs.

It is a command-line tool that can be run from a terminal window using parameters that are explained below. Alternatively, you could provide the parameters in a file named parameters.json in the same folder as the executable. A sample parameters.json file is provided together with the executable.

IMPORTANT! It is recommended to have the automated coding reviewed by a human coder, especially when the matching level is not GoodMatch (see MatchLevel in the Output File section below).

Downloading the Tool

The most recent version of the tool is version 1.0.3. The tool can be downloaded from the following links. Please choose the download that matches your computer's operating system and architecture.

After downloading the zip file, you need to unzip it to a folder.

When using the tool on Mac or Linux, you need to make the program file (CodeFusion) executable. You can do this by running the following command in the terminal window:

chmod +x CodeFusion

Running the tool

You can run CodeFusion in two ways. Running with command-line parameters requires using the terminal. Using a parameters file requires editing the parameters in a file located in the same folder as the CodeFusion program and then running CodeFusion. The set of parameters that you can use to configure CodeFusion is the same in both modes.

Running with command line parameters

You need to open a terminal window and navigate to the folder where the executable is located. Then you can run the tool with the following parameters. Most of these parameters are optional. If not provided, the default values will be used. The inputFile is mandatory.

Parameter
Explanation
--inputFile This is the input file to be used. It can be an Excel (.xlsx) or a tab-separated text file (.txt).
--columnNo The column number in the input file that contains the text to be matched. The default is 1.
--fileContainsHeader true if the input file contains a header row. If true, the first row will be ignored during processing. The default is true.
--version The version of the classification to be used. Possible values are 2025, 2024, 2023, etc. for the releases. beta or daily can be used to select the daily generated version that is available in the WHOFIC maintenance platform. If not specified, the latest release version will be used.
--source Specifies whether a particular linearization or the foundation is used during matching. Allowed values are MMS, ICF, ICHI or foundation. If not provided, MMS will be used.
--language The language to be used. The default is English. Two-character language codes can be used to specify the language. You may find available versions from this link
--subtreeFilter Comma-separated list of foundation URIs; only these entities and their descendants are searched. For example, http://id.who.int/icd/entity/118383231 will search only Mental and behavioural disorders and its descendants. If not provided, MMS, ICHI and ICF use a default subtreeFilter that searches the classification without the extension codes. For MMS, the default also excludes the Traditional Medicine and V chapters.
--includeScoreInOutput The score is a value between 0 and 1 that indicates the similarity between the input text and the matched term. If you want to see the score in the output, set this to true. The default is false.
--matchThreshold The minimum score for a match to be included in the output. The default is 0.43.
--exitWhenFinished By default, the program waits for a key press before exiting once it has completed its task. To change this behavior, set this parameter to true. The program will then exit automatically after finishing the task, without waiting for a key press.
--useFreePostcoordinationMatching By default, CodeFusion can find postcoordination combinations suggested in the classification for some of the axes. This setting makes CodeFusion look for all postcoordination combinations. See the Matching Postcoordination combinations section for details.
--mappingMode This parameter enables mapping mode. When enabled, CodeFusion generates a mapping output in addition to the standard output. See Using CodeFusion for Mappings for more details on generating mapping suggestions.
--idColumn This parameter is required if mapping mode is used. It tells CodeFusion which column of the input file contains the identifiers for the external terminology/classification. See Using CodeFusion for Mappings for more details.
--termTypeColumnNo Optional; can be used if mappingMode is true. It is the column number in the input file that contains the term type (e.g. "Title", "Synonym", etc.). If provided, term types are included in the generated mapping file.

Examples

.\CodeFusion.exe --inputFile "c:/111/a.xlsx"

In the example above, the latest release version and MMS will be used. The column number is 1 and the file contains a header row.

.\CodeFusion.exe --inputFile "c:/111/a.xlsx" --columnNo 2 --fileContainsHeader true --version 2022 --source MMS --language es 

Data is in column 2 and the file contains a header row. The version of the classification to be used is the 2022 release and the language is Spanish.
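If you launch CodeFusion from a script, the command line shown above can be assembled from a dictionary of parameters. The following is a minimal Python sketch, not part of the tool itself. The helper name build_args is hypothetical, and the executable name ./CodeFusion assumes Mac or Linux (use CodeFusion.exe on Windows). Booleans are written as true/false, as in the examples above.

```python
def build_args(executable, params):
    """Turn a dict of CodeFusion parameters into a command-line argument list."""
    args = [executable]
    for name, value in params.items():
        # bool must be checked first: the examples write booleans as true/false
        if isinstance(value, bool):
            value = "true" if value else "false"
        args += [f"--{name}", str(value)]
    return args


args = build_args(
    "./CodeFusion",  # assumption: Mac/Linux program name
    {
        "inputFile": "c:/111/a.xlsx",
        "columnNo": 2,
        "fileContainsHeader": True,
        "version": "2022",
        "source": "MMS",
        "language": "es",
    },
)
print(" ".join(args))

# To actually run the tool from Python, pass the list to subprocess:
# import subprocess
# subprocess.run(args, check=True)
```

Passing the arguments as a list avoids quoting problems when the input path contains spaces.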

Running with parameters.json file

If you include a file named parameters.json in the same location as the program, you can run the tool without command-line parameters. The parameters in the parameters.json file will be used. A sample parameters file is shown below (it is also available in the download zip file). The set of parameters is the same as the one explained above.

Note that inputFile must use the forward slash / as the folder separator, even on Windows.

{
  "inputFile": "c:/111/a.xlsx",
  "columnNo": 2,
  "fileContainsHeader": true,
  "version": "2025"
}
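If you generate parameters.json from a script, the forward-slash requirement can be handled automatically. The following Python sketch is only an illustration: the Windows-style path is a made-up example, and the file is assumed to be written to the folder that contains the CodeFusion executable.

```python
import json
from pathlib import PureWindowsPath

# Hypothetical Windows-style path; as_posix() rewrites backslashes as forward slashes.
input_file = PureWindowsPath(r"c:\111\a.xlsx").as_posix()

params = {
    "inputFile": input_file,
    "columnNo": 2,
    "fileContainsHeader": True,
    "version": "2025",
}

text = json.dumps(params, indent=2)
print(text)

# Save the file next to the CodeFusion executable (assumed to be the current folder):
# with open("parameters.json", "w", encoding="utf-8") as f:
#     f.write(text)
```

json.dumps writes Python's True as the JSON literal true, which matches the sample file above.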

An example with more parameters

{
  "inputFile": "c:/111/b.xlsx",
  "columnNo": 2,
  "fileContainsHeader": true,
  "version": "2023",
  "source": "MMS",
  "language": "fr",
  "subtreeFilter": "http://id.who.int/icd/entity/1435254666,http://id.who.int/icd/entity/1630407678",
  "includeScoreInOutput": true,
  "matchThreshold": 0.3
}

Output File

The output file is saved in the same folder as the source file. The name of the output file is the same as the source file with the word checked appended to it. For example, if the source file is c:/111/a.xlsx, the output file will be c:/111/a_checked.xlsx.

If the input is an Excel file, the output will be an Excel file.

If the input is a tab-separated text file, two outputs will be created: a tab-separated text file like the input file, and an Excel file.

The output file contains all of the information in the source file, with additional columns about the matching.

Output columns Explanation
MatchLevel This shows how well the text in the document matches the text found in the classification.
GoodMatch: a very good match. Every word looked up, or an equivalent, is found, and there are no additional words other than words that can be ignored.
GoodMatchWithFreePostcoordination: when the useFreePostcoordinationMatching option is used, CodeFusion can find multiple entities that jointly match the text being looked up. For more information, see Matching Postcoordination combinations.
MatchHasAdditionalWords: all words in the input text were found in the matching text, but the matching text has some additional words.
FlexiMatch: a match found using the flexible search algorithm. Some words used in the input text are not included in the search result.
NoMatch: no match is found with a score above the requested minimum (see matchThreshold).
Score If includeScoreInOutput is used, this field shows the score of the match. The score is a value between 0 and 1. The higher the score, the better the match.
MatchType This shows the type of the match. It is available only if the source is a linearization, i.e. it is not available when using foundation.
Real: the match is from an entity that is in the linearization.
UnderShoreline: the match is found under the linearization boundary.
UnderShoreLineLogicallyDefined: the entity is again found under the linearization boundary, but since it has a logical definition, the system provides a code combination to capture the detail.
PostCoordinationCombination: the result is not found in the foundation but can be represented as a postcoordination combination using suggested or required postcoordination.
BestMatchPhrase The best matching phrase found in the classification.
LinearizationURI The linearization URI of the matching entity. This is not available when the source is foundation.
Code The code in the linearization for the matching entity. This is not available when the source is foundation.
FoundationURI The Foundation URI of the matching entity.
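
Since rows that are not GoodMatch should be reviewed by a human coder, it can help to pull them out of the tab-separated output. The following Python sketch shows one way to do this. It assumes the output file has a header row containing a column named MatchLevel exactly as listed above; the sample rows, the Text column and the placeholder codes are made up for illustration.

```python
import csv
import io


def rows_needing_review(lines):
    """Yield output rows whose MatchLevel is anything other than GoodMatch."""
    reader = csv.DictReader(lines, delimiter="\t")
    for row in reader:
        # Exact comparison: GoodMatchWithFreePostcoordination is flagged too
        if row.get("MatchLevel") != "GoodMatch":
            yield row


# Made-up sample standing in for a real tab-separated output file
sample = io.StringIO(
    "Text\tMatchLevel\tCode\n"
    "first phrase\tGoodMatch\tCODE1\n"
    "second phrase\tFlexiMatch\tCODE2\n"
    "third phrase\tNoMatch\t\n"
)

flagged = list(rows_needing_review(sample))
for row in flagged:
    print(row["Text"], row["MatchLevel"])

# For a real file (the encoding is an assumption):
# with open("c:/111/a_checked.txt", newline="", encoding="utf-8") as f:
#     flagged = list(rows_needing_review(f))
```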

Matching Postcoordination combinations

Default postcoordination matching

When using postcoordination with a linearization such as MMS, suggested matches may include postcoordinated combinations. However, there are certain constraints to consider:

  • These combinations are limited to a specific subset of axes. For MMS, this includes axes like laterality, specific anatomy, course, infectious agent, causing condition, and severity (including stage, grade, etc.).
  • Only the axes listed among the suggested postcoordination options for a given entity are used. These can be viewed when browsing the entity in the ICD browser

Postcoordination combinations are not provided when the source is set to foundation. Please note that CodeFusion always provides matching information at the granularity of the Foundation, even when the source is set to a linearization.

Free Postcoordination Matching

When the useFreePostcoordinationMatching option is set, CodeFusion tries to find any postcoordination combination, without the constraints mentioned above.

Using CodeFusion for Mappings

By default, CodeFusion attempts to match every phrase provided in the input file. However, this method isn't optimal for generating mapping suggestions, as it overlooks concept synonyms from the external system. For example, when mapping an external concept to ICD, a single entity may be represented by multiple synonymous terms in the external system and matching all of them is not necessary to produce a meaningful suggestion.

To leverage synonyms effectively, CodeFusion needs to identify which terms correspond to the same concept. This requires including the identifier of each external concept in a dedicated column within the input file, and specifying the column number using the idColumn parameter.

Enabling the mappingMode parameter prompts CodeFusion to generate an additional output file. This file organizes mappings by the individual concepts from the external terminology. In the mapping output, each concept is represented by a main mapping row followed by several rows detailing the corresponding matches.

IMPORTANT: The output mapping file always presents mappings at the granularity level of the Foundation. However, we recommend specifying a source parameter using a linearization such as MMS. This allows the tool to apply the postcoordination rules defined by the linearization.

Using Foundation as the source will result in mappings that exclude postcoordination.

When generating mapping suggestions, enabling the useFreePostcoordinationMatching option can improve the quality of results—particularly for precoordinated concepts that can only be matched through postcoordinated expressions.
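
As an illustration of the input layout described above, the following Python sketch builds a tab-separated input in which synonyms of the same external concept share an identifier, together with a matching set of parameters. Everything in it is an assumption made for the example: the identifiers, terms, column headers and file name are made up, idColumn is assumed to be a 1-based column number like columnNo, and mappingMode is assumed to take true like the other boolean parameters.

```python
import csv
import io

# Hypothetical external concepts: (id, term, term type).
# Rows that share an id are synonyms of the same concept.
terms = [
    ("EXT-001", "first concept title", "Title"),
    ("EXT-001", "first concept synonym", "Synonym"),
    ("EXT-002", "second concept title", "Title"),
]

buffer = io.StringIO()
writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
writer.writerow(["Id", "Term", "TermType"])  # header row, so fileContainsHeader is true
writer.writerows(terms)
print(buffer.getvalue())

# Parameters that would describe this layout (e.g. for parameters.json)
params = {
    "inputFile": "c:/111/terms.txt",  # hypothetical path
    "fileContainsHeader": True,
    "mappingMode": True,
    "idColumn": 1,          # column holding the external identifiers
    "columnNo": 2,          # column holding the text to be matched
    "termTypeColumnNo": 3,  # optional column holding the term type
    "source": "MMS",        # a linearization, as recommended above
}
```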

Mapping Output File

When CodeFusion is run in mappingMode, it generates the standard output and additionally creates a separate mapping file, which is saved in both tab-separated .txt and Excel formats. The name of the mapping file is the same as the source file with the word -mapping appended to it.

Columns of the mapping output file

Output columns
Explanation
RowType The value in this column is either Mapping, Details or OtherMatches-SameFoundationUri. Rows marked as Mapping contain the actual mappings generated by CodeFusion. The Details rows that follow each Mapping row show the individual term matches that contributed to that mapping. Certain columns in the file are relevant only for Mapping rows (indicated as "M" in this table), while others apply exclusively to Details rows (marked as "D"). The OtherMatches-SameFoundationUri rows (if they exist) show other conflicting matches where concepts with different external system ids match the same foundation entity. In such cases the MappingMatchLevel at the Mapping row reports a conflict
MappedSystemId M This is the identifier from the external terminology/classification that is being mapped.
FoundationUri M Foundation URI of the mapped foundation entity.
MappingMatchLevel M Shows how good the match is. Possible values and their meanings are explained below.
bestScore M Best score among the different term matches.
LabelBeingMathched D Label from the external terminology that is being matched.
BestMatchingPhrase D Best matching phrase in ICD for the label being matched.
MatchLevel D MatchLevel, as explained in the Output File section.
score D As explained in the Output File section.
IsBestMatchATitle D Whether the best matching phrase is a title in ICD or not.
MatchType D As explained in the Output File section.
Code D Linearization code for the mapped linearization entity (e.g. ICD code). Provided only if the source is a linearization such as MMS.

Values for MappingMatchLevel
  • Good Matches Group:
    • GoodMatch: One or more good matches were found for the external entity and they all point to the same foundation entity.
    • GoodMatch_MultiMatchWithSingleSimpleMatch: There are multiple good matches for different synonyms and these matches point to the same foundation entity. Some of them match the entity without postcoordination and some with additional postcoordination. This generally happens when the useFreePostcoordinationMatching option is used, and the additional postcoordination in these cases is generally redundant.
    • GoodMatch_MultiMatchUsingTheSimplest: This is the same as above, but all matches have postcoordination and part of the postcoordination is shared. In this case the simplest combination is used, as all of the matches are good matches. This generally happens when the useFreePostcoordinationMatching option is used, and the additional postcoordination in these cases is generally redundant.
  • OtherMatch: None of the synonyms has a GoodMatch, but some have other types of matches.
  • NoMatch: None of the synonyms has any match.
  • Conflicts: In some cases conflicting matches can be found among the synonyms.
    • ConflictingGoodMatch_MultipleFoundationMatchesForSameExternalConcept: Conflicts where synonyms of the same external concept match different foundation entities.
      • .._SimpleFoundationConcepts: Conflicting matches are all simple concepts.
      • .._PostcoordinatedFoundationConcepts: Conflicting matches are all postcoordinated concepts.
      • .._MixedFoundationConcepts: Some of the conflicting matches are simple concepts and some are postcoordinated concepts.
    • ConflictingGoodMatch_SameFoundationMatchesMultipleExternalConcept: Conflicts where different external concepts match the same foundation entity.
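
The row layout and match levels described above can be processed with a short script. The following Python sketch groups each Mapping row with the rows that follow it and flags mappings whose MappingMatchLevel is outside the Good Matches group. It assumes the tab-separated mapping file has a header row with the column names listed above; the function names and the sample rows are made up for illustration.

```python
import csv
import io


def group_mappings(lines):
    """Group each Mapping row with the Details and other rows that follow it."""
    groups = []
    for row in csv.DictReader(lines, delimiter="\t"):
        if row["RowType"] == "Mapping":
            groups.append({"mapping": row, "details": [], "other": []})
        elif groups and row["RowType"] == "Details":
            groups[-1]["details"].append(row)
        elif groups:  # e.g. OtherMatches-SameFoundationUri rows
            groups[-1]["other"].append(row)
    return groups


def needs_attention(group):
    """True unless the level belongs to the Good Matches group (GoodMatch...)."""
    return not group["mapping"]["MappingMatchLevel"].startswith("GoodMatch")


# Made-up sample standing in for a real mapping output file
sample = io.StringIO(
    "RowType\tMappedSystemId\tMappingMatchLevel\n"
    "Mapping\tEXT-001\tGoodMatch\n"
    "Details\t\t\n"
    "Details\t\t\n"
    "Mapping\tEXT-002\tConflictingGoodMatch_SameFoundationMatchesMultipleExternalConcept\n"
    "Details\t\t\n"
    "OtherMatches-SameFoundationUri\t\t\n"
)

groups = group_mappings(sample)
for group in groups:
    if needs_attention(group):
        print(group["mapping"]["MappedSystemId"], group["mapping"]["MappingMatchLevel"])
```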