Magic [probably] behind Hex-Rays

IDA has become the standard among modern disassemblers used in the reverse engineering community. Hex-Rays is a popular plugin for IDA that further simplifies binary analysis by decompiling native code into C-like pseudocode. However, Hex-Rays's strength goes beyond its decompilation quality. It is its overall seamless integration with the interactive disassembler that makes it an invaluable reversing tool.

This article demonstrates how to use the extensive, but often not self-evident, functionality provided by IDA SDK in order to put together a plugin with Hex-Rays-like capabilities. In fact, many of the possibilities discussed here are probably not used outside of Hex-Rays itself.

When you are “grepping” GitHub hoping to find a real-world usage example of an IDA SDK function.

The scope of this article is limited to the GUI-related features within the IDA SDK, not the decompilation itself. The article assumes the reader has a good knowledge of IDA plugin writing and focuses on the advanced features it aims to introduce. Basic functionality that can be understood from the IDA SDK's headers or examples may be omitted.

Both IDA and IDA SDK used in this article are of version 7.5. The article discusses the fundamental principles to accomplish the given tasks. The complete working example can be found in an associated GitHub repository. The repository may be improved or updated to work with further IDA SDK versions. Examples in this article will not!

What can Hex-Rays do?

Hex-Rays is a native IDA plugin written in C++ by the guys behind IDA itself. As such, it perfectly uses IDA SDK to gain the following capabilities:

  • Syntax highlighted C-like pseudocode in an IDA-native subview.
  • Displayed content associated with the related disassembly addresses.
  • Cursor-context-sensitive actions.
  • Utilization of IDA’s navigation mechanisms.
  • Synchronization with IDA disassembly subview.
  • Modification of IDA disassembly.

Demo

Step by step, we will write a plugin implementing all the interactions listed above.

Only the high-level principles are discussed here; see the associated GitHub repository for the complete source code.

1. Decompiler

Implementing a decompiler itself is outside the scope of this article. Therefore, we are going to mock it. All we need is an interface capable of decompiling an address into a function:

class Decompiler
{
public:
   static Function* decompile(ea_t ea);
};

The demo plugin in the repository is able to decompile only main() and ack() functions from the provided ack.x86.gcc.O0.g.elf binary. Their decompiled code is hardcoded in the decompiler module. This is sufficient for our demonstration purposes.
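
A hardcoded mock of this kind could look roughly like the sketch below. It is not the repository's implementation: ea_t is replaced by a plain integer so the code compiles without the SDK, Function is reduced to a name and an address range, and the addresses are made up for illustration.

```cpp
#include <cstdint>
#include <map>
#include <string>

// Stand-in for IDA's ea_t so the sketch compiles without the SDK.
using ea_t = std::uint64_t;

// Simplified placeholder for the Function class introduced later.
struct Function
{
   std::string name;
   ea_t start;
   ea_t end; // one past the last address
};

class Decompiler
{
public:
   /// Return the hardcoded function containing ea, or nullptr.
   static Function* decompile(ea_t ea)
   {
      // Address ranges are invented for illustration only.
      static std::map<ea_t, Function> known = {
         {0x1000, {"ack", 0x1000, 0x1060}},
         {0x1060, {"main", 0x1060, 0x10c0}},
      };
      // Find the last function starting at or before ea.
      auto it = known.upper_bound(ea);
      if (it == known.begin())
      {
         return nullptr;
      }
      --it;
      return ea < it->second.end ? &it->second : nullptr;
   }
};
```

Keying the table by start address lets any address inside a function resolve to that function, which is what the later place conversion relies on.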

2. YX coordinates

Before we look into function representation, we need to introduce two important building blocks. The first one being YX coordinates:

struct YX
{
   /// lines
   std::size_t y = YX::starting_y;
   /// columns
   std::size_t x = YX::starting_x;

   inline static const std::size_t starting_y = 1;
   inline static const std::size_t starting_x = 0;

   YX(std::size_t y = starting_y, std::size_t x = starting_x);
};

These objects represent a position in an IDA custom viewer. Because we are going to display source code, we index lines (Y) from 1 and columns (X) from 0.
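
Since YX is later used as a std::map key, it needs an ordering. The article does not show one, so the following is only a minimal sketch of how it might be defined, on a reduced YX without the constructor: compare lines first, then columns.

```cpp
#include <cstddef>
#include <tuple>

// Reduced YX: same fields and defaults as in the article.
struct YX
{
   std::size_t y = 1; // lines, indexed from 1
   std::size_t x = 0; // columns, indexed from 0
};

// Order by line first, then by column.
inline bool operator<(const YX& a, const YX& b)
{
   return std::tie(a.y, a.x) < std::tie(b.y, b.x);
}

inline bool operator==(const YX& a, const YX& b)
{
   return a.y == b.y && a.x == b.x;
}
```

With this ordering, all tokens of one line form a contiguous run in a sorted container, which makes per-line queries cheap.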

3. Tokens

Another piece essential to build functions is Token:

struct Token
{
   enum class Kind
   {
      NEW_LINE = 0,
      WHITE_SPACE,
      PUNCTUATION,
      OPERATOR,
      ID_VAR,
      ID_MEM,
      ID_LAB,
      ID_FNC,
      ID_ARG,
      KEYWORD,
      TYPE,
      PREPROCESSOR,
      INCLUDE,
      LITERAL_BOOL,
      LITERAL_INT,
      LITERAL_FP,
      LITERAL_STR,
      LITERAL_SYM,
      LITERAL_PTR,
      COMMENT,
   };

   Kind kind;
   std::string value;
   ea_t ea;

   const std::string& getColorTag() const;
};

It represents one lexical unit in the decompiled source code. Each such unit has a kind (e.g. ID_FNC), value (e.g. main), and an associated disassembly address (e.g. 0x8048577). See this wiki page for the detailed explanation. There is also a method returning an IDA color tag associated with the Token.
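
To illustrate how a color tag ends up in the displayed text, here is a small sketch under stated assumptions. IDA marks colored text with an "on" escape character followed by a color code, and a matching "off" pair after the text (COLOR_ON and COLOR_OFF in lines.hpp). The color codes below are placeholders chosen for readability, not the SDK's values, and colorize() is a hypothetical helper rather than part of the article's Token.

```cpp
#include <string>

enum class Kind { KEYWORD, ID_FNC, LITERAL_INT, OTHER };

// Escape characters IDA uses to switch a color on and off
// (COLOR_ON and COLOR_OFF in lines.hpp).
const char color_on = '\x01';
const char color_off = '\x02';

// Placeholder color codes; a real plugin would use the
// COLOR_* constants from lines.hpp instead.
char colorCode(Kind kind)
{
   switch (kind)
   {
      case Kind::KEYWORD:     return 'k';
      case Kind::ID_FNC:      return 'f';
      case Kind::LITERAL_INT: return 'n';
      default:                return 'd';
   }
}

// Wrap the token text in an on/off pair for its color.
std::string colorize(Kind kind, const std::string& value)
{
   const char code = colorCode(kind);
   std::string out;
   out += color_on;
   out += code;
   out += value;
   out += color_off;
   out += code;
   return out;
}
```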

4. Functions

Finally, the representation of a decompiled function:

class Function
{
public:
   const std::string& getName() const;
   ea_t getStart() const;
   ea_t getEnd() const;
   /// Token at YX.
   const Token* getToken(YX yx) const;

   /// YX of the first token.
   YX min_yx() const;
   /// YX of the last token.
   YX max_yx() const;
   /// YX of the token before the token on the given YX.
   YX prev_yx(YX yx) const;
   /// YX of the token after the token on the given YX.
   YX next_yx(YX yx) const;
   /// [Starting] YX of the token which contains the given YX.
   YX adjust_yx(YX yx) const;
   /// Entire colored line containing the given YX.
   /// I.e. concatenation of all the tokens with y == yx.y
   std::string line_yx(YX yx) const;
   /// Address of the given YX.
   ea_t yx_2_ea(YX yx) const;
   /// Addresses of all the YXs with y == yx.y
   std::set<ea_t> yx_2_eas(YX yx) const;
   /// [The first] YX with the given address.
   YX ea_2_yx(ea_t ea) const;
   /// Is address inside this function?
   bool ea_inside(ea_t ea) const;

   /// Lines with associated addresses.
   std::vector<std::pair<std::string, ea_t>> toLines() const;

private:
   std::string _name;
   ea_t _start;
   ea_t _end;
   std::map<YX, Token> _tokens;
   /// Multiple YXs can be associated with the same address.
   /// This stores the first such YX.
   std::map<ea_t, YX> _ea2yx;
};

A function is a list of tokens with a name, start address, and end address. Tokens are indexed by their [starting] YX coordinates. YX coordinates are indexed by their disassembly addresses. Apart from the simple getters, there are a number of methods dealing with coordinates, addresses, and lines. These will come in quite handy in a moment.
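
As a rough illustration of how two of these helpers might work on top of the two maps, consider the sketch below. It uses free functions with hypothetical names (line_text, build_ea2yx), a reduced Token, and an integer stand-in for ea_t, and it leaves color tags out; the repository's code may differ.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <tuple>

using ea_t = std::uint64_t; // stand-in for IDA's ea_t

struct YX
{
   std::size_t y = 1;
   std::size_t x = 0;
};

inline bool operator<(const YX& a, const YX& b)
{
   return std::tie(a.y, a.x) < std::tie(b.y, b.x);
}

// Reduced token: text and address only.
struct Token
{
   std::string value;
   ea_t ea;
};

using Tokens = std::map<YX, Token>;

// Concatenate all tokens on line y (color tags omitted).
std::string line_text(const Tokens& tokens, std::size_t y)
{
   std::string line;
   // Tokens are sorted by (y, x), so a line is one contiguous run.
   for (auto it = tokens.lower_bound(YX{y, 0});
        it != tokens.end() && it->first.y == y; ++it)
   {
      line += it->second.value;
   }
   return line;
}

// Build the address index, keeping the first YX seen per address.
std::map<ea_t, YX> build_ea2yx(const Tokens& tokens)
{
   std::map<ea_t, YX> index;
   for (const auto& [yx, token] : tokens)
   {
      index.emplace(token.ea, yx); // emplace does not overwrite
   }
   return index;
}
```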

5. Places

Places (i.e. objects derived from the abstract place_t) denote locations of data displayed in viewers. The IDA SDK defines several classes that can be used to display certain data: simpleline_place_t for string lines, idaplace_t for disassembly locations, hexplace_t for hex dump lines, etc.

If none of these is suitable for the user’s application, the plugin author can implement a custom derivation of the class. In our case, we create a demo_place_t class suited for representing locations and displaying tokens of decompiled functions:

class demo_place_t : public place_t
{
private:
   Function* _fnc = nullptr;
   YX _yx;

public:
   demo_place_t(Function* fnc, YX yx);
   YX yx() const;
   std::size_t y() const;
   std::size_t x() const;
   const Token* token() const;
   Function* fnc() const;

// Inherited from place_t.
//
public:
   /// Generate short location description used @ status bar.
   /// use: toea(), yx()
   void idaapi print(qstring* out_buf, void* ud) const override;
   /// Map the location to a number.
   /// use: y()
   uval_t idaapi touval(void* ud) const override;
   /// Clone the location.
   place_t* idaapi clone(void) const override;
   /// Copy the specified location object to the current object.
   void idaapi copyfrom(const place_t* from) override;
   /// Map a number to a location.
   place_t* idaapi makeplace(
      void* ud, 
      uval_t y, 
      int lnnum) const override;
   /// Compare two locations.
   /// use: yx()
   int idaapi compare2(const place_t* t2, void* ud) const override;
   /// Adjust the current location to point @ displayable obj.
   void idaapi adjust(void* ud) override;
   /// Move to the previous displayable location.
   /// use: Function::prev_yx()
   bool idaapi prev(void* ud) override;
   /// Move to the next displayable location.
   /// use: Function::next_yx()
   bool idaapi next(void* ud) override;
   /// Are we at the first displayable object?
   /// use: Function::min_yx()
   bool idaapi beginning(void* ud) const override;
   /// Are we at the last displayable object?
   /// use: Function::max_yx()
   bool idaapi ending(void* ud) const override;
   /// Generate text lines for the current location.
   /// use: Function::line_yx()
   int idaapi generate(
      qstrvec_t* out,
      int* out_deflnnum,
      color_t* out_pfx_color,
      bgcolor_t* out_bgcolor,
      void* ud,
      int maxsize) const override;
   /// Serialize this instance.
   void idaapi serialize(bytevec_t* out) const override;
   /// De-serialize into this instance.
   bool idaapi deserialize(
      const uchar** pptr, 
      const uchar* end) override;
   /// Get the place's ID.
   int idaapi id() const override;
   /// Get this place type name.
   const char* idaapi name() const override;
   /// Map the location to an ea_t.
   /// use: Function::yx_2_ea()
   ea_t idaapi toea() const override;
   /// Rebase the place instance.
   bool idaapi rebase(const segm_move_infos_t&) override;
   /// Visit this place, possibly 'unhiding' a section of text.
   place_t* idaapi enter(uint32*) const override;
   /// Leave this place, possibly 'hiding' a section of text.
   void idaapi leave(uint32) const override;
};

YX is the location component; Function is the data component. Using YX, we can easily extract the information needed to implement the place_t interface from the associated Function. demo_place_t is both a YX-aware and an EA-aware location, something that none of the existing SDK places is.

User-defined places need to be registered. We use the PCF_EA_CAPABLE flag to indicate our place is EA-aware. This will enable some handy out-of-the-box features later. Also, since SDK 7.5, all new plugins should use the PCF_MAKEPLACE_ALLOCATES flag:

static demo_place_t _template(nullptr, YX());
demo_place_t::ID = register_place_class(
   &_template,
   PCF_EA_CAPABLE | PCF_MAKEPLACE_ALLOCATES,
   &PLUGIN
);

Unfortunately, place_t objects are by default considered to be just line cursors (i.e. Y-sensitive). Places that are also X-sensitive need to implement the custom_viewer_adjust_place_t callback and fine-tune the X location. Otherwise, moving the cursor horizontally within a line will change neither the X coordinate nor the corresponding address:

/// Called whenever the user moves the cursor around.
///
/// See custom_viewer_adjust_place_t 
///
void idaapi cv_adjust_place(
   TWidget* v, 
   lochist_entry_t* loc, 
   void* ud)
{
   // Set X coordinate in loc->place()
   // using loc->renderer_info().pos.cx
}
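
The snapping of a raw cursor column to a token can be sketched independently of the SDK. The function below is one possible take on what Function::adjust_yx() does, written as a free function over a reduced token map; it assumes the map is non-empty and keyed by each token's starting YX.

```cpp
#include <cstddef>
#include <iterator>
#include <map>
#include <string>
#include <tuple>

struct YX
{
   std::size_t y = 1;
   std::size_t x = 0;
};

inline bool operator<(const YX& a, const YX& b)
{
   return std::tie(a.y, a.x) < std::tie(b.y, b.x);
}

// Reduced token: only the text matters here.
using Tokens = std::map<YX, std::string>;

// Starting YX of the token that contains yx.
// Assumes tokens is non-empty and keyed by starting YX.
YX adjust_yx(const Tokens& tokens, YX yx)
{
   // First token starting strictly after yx.
   auto it = tokens.upper_bound(yx);
   if (it == tokens.begin())
   {
      // yx precedes everything: snap to the first token.
      return it->first;
   }
   // The token just before it is the one covering yx.
   return std::prev(it)->first;
}
```

In the callback, the cursor column taken from loc->renderer_info().pos.cx would be combined with the place's line, passed through such a function, and the result stored back into the place.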

In cases like these, the SDK also recommends implementing a callback named custom_viewer_get_place_xcoord_t. It determines whether two places are on the same line, which should prevent unnecessary viewer refreshes:

/// Does the line pointed to by pline include pitem,
/// and if so at what X coordinate?
///
/// See custom_viewer_get_place_xcoord_t
///
int idaapi cv_get_place_xcoord(
   TWidget* v,
   const place_t* pline,
   const place_t* pitem,
   void* ud)
{
   // Compare Y coordinates of pline and pitem.
}
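
The comparison itself is small. The sketch below shows the core logic on plain YX values, assuming the convention that a negative result means "pitem is not on this line" and a non-negative result is the X coordinate; check the callback's documentation in kernwin.hpp for the exact return values expected. place_xcoord is a hypothetical helper name.

```cpp
#include <cstddef>

struct YX
{
   std::size_t y = 1;
   std::size_t x = 0;
};

// -1 when item is not on the same line as line,
// otherwise the item's X coordinate.
int place_xcoord(const YX& line, const YX& item)
{
   if (line.y != item.y)
   {
      return -1;
   }
   return static_cast<int>(item.x);
}
```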

6. Code viewer

Now we have everything needed to create a viewer displaying decompiled functions in the plugin's Context::run() method:

ea_t ea = get_screen_ea();
fnc = Decompiler::decompile(ea);

demo_place_t min(fnc, fnc->min_yx());
demo_place_t max(fnc, fnc->max_yx());
demo_place_t cur(fnc, fnc->ea_2_yx(ea));

custViewer = create_custom_viewer(
   title,        // title
   &min,         // minplace
   &max,         // maxplace
   &cur,         // curplace
   nullptr,      // rinfo
   this,         // ud
   &ui_handlers, // handlers
   this,         // cvhandlers_ud
   nullptr       // parent widget
);
codeViewer = create_code_viewer(custViewer);
set_code_viewer_is_source(codeViewer);
display_widget(codeViewer, WOPN_DP_TAB | WOPN_RESTORE);

ui_handlers is a set of custom viewer handlers:

static const custom_viewer_handlers_t ui_handlers(
   cv_keyboard,          // keyboard
   nullptr,              // popup
   nullptr,              // mouse_moved
   nullptr,              // click
   cv_double,            // dblclick
   nullptr,              // current position change
   nullptr,              // close
   nullptr,              // help
   cv_adjust_place,      // adjust_place
   cv_get_place_xcoord,  // get_place_xcoord
   cv_location_changed,  // location_changed
   nullptr               // can_navigate
);

We have already discussed cv_adjust_place and cv_get_place_xcoord. The other handlers will be examined in a moment.

With the code presented so far, we get a plugin capable of displaying colored decompiled functions, reacting to both Y and X movements, and aware of the disassembly addresses associated with the displayed lexical units.

7. Place conversion

The SDK allows us to implement another powerful mechanism for custom places: location conversions. These are used for view synchronization and, if a custom place is PCF_EA_CAPABLE (which ours is), also for address navigation:

/// Register location converter.
///
static demo_place_t _template(nullptr, YX());
static idaplace_t _idaplace;
register_loc_converter(
   _template.name(),
   _idaplace.name(),
   place_converter
);

/// Implement location converter.
///
lecvt_code_t idaapi place_converter(
        lochist_entry_t* dst,
        const lochist_entry_t& src,
        TWidget* view)
{
   // demo_place_t -> idaplace_t
   // name() returns const char*, so compare contents, not pointers.
   if (std::strcmp(src.place()->name(), _template.name()) == 0)
   {
      // set dst->set_place() according to src.place()
   }
   // idaplace_t -> demo_place_t
   else if (std::strcmp(src.place()->name(), _idaplace.name()) == 0)
   {
      auto idaEa = src.place()->toea();
      demo_place_t* current = get_custom_viewer_place(/*...*/);
      if (current->fnc()->ea_inside(idaEa))
      {
         // set new place in the currently displayed function
      }
      else if (Function* fnc = Decompiler::decompile(idaEa))
      {
         // set new place in the newly decompiled function
      }
   }
}
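
The decision made in the idaplace_t -> demo_place_t branch can be isolated from the SDK types. The following is a sketch only: Function is reduced to an address range, ea_t is an integer stand-in, the decompiler is passed in as a callable, and target_function is a hypothetical helper name.

```cpp
#include <cstdint>
#include <functional>

using ea_t = std::uint64_t; // stand-in for IDA's ea_t

// Reduced function: an address range only.
struct Function
{
   ea_t start;
   ea_t end; // one past the last address
   bool ea_inside(ea_t ea) const { return start <= ea && ea < end; }
};

// Pick the function that should display ea: the current one
// if it contains ea, otherwise a freshly decompiled one.
// Returns nullptr when ea cannot be decompiled, in which case
// the converter would refuse the conversion.
Function* target_function(
   Function* current,
   ea_t ea,
   const std::function<Function*(ea_t)>& decompile)
{
   if (current && current->ea_inside(ea))
   {
      return current;
   }
   return decompile(ea);
}
```

Checking the current function first avoids re-running the decompiler for every cursor move inside the function already on screen.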

The converter's only job is to set the new place in the destination entry. If that place belongs to a newly decompiled function, this is not enough: we also need to switch the viewer's content to that function. We do so in the location changed handler:

/// See custom_viewer_location_changed_t
///
void idaapi cv_location_changed(
        TWidget* v,
        const lochist_entry_t* was,
        const lochist_entry_t* now,
        const locchange_md_t& md,
        void* ud)
{
   Context* ctx = static_cast<Context*>(ud);
   auto* oldp = dynamic_cast<const demo_place_t*>(was->place());
   auto* newp = dynamic_cast<const demo_place_t*>(now->place());

   if (oldp->fnc() != newp->fnc())
   {
      demo_place_t min(newp->fnc(), newp->fnc()->min_yx());
      demo_place_t max(newp->fnc(), newp->fnc()->max_yx());
      set_custom_viewer_range(ctx->custViewer, &min, &max);
      ctx->fnc = newp->fnc();
   }
}

Changing the content is as easy as setting a new viewer’s range.

With these additions, we can now synchronize our view with IDA disassembly view.

Also, because our view is EA-capable, IDA automatically enables some more features like goto-address navigation (G) or navigation toolbar synchronization.

8. Context-sensitive actions

We demonstrate these in two different but similar scenarios. First, we want to decompile and display a new function when we double-click on its call. The catch is that we want to do this only for functions and not for other lexical elements:

/// Implement double-click callback.
///
bool idaapi cv_double(TWidget* cv, int shift, void* ud)
{
   demo_place_t* place = get_custom_viewer_place(/*...*/);
   auto* token = place->token();
   if (token->kind != Token::Kind::ID_FNC)
   {
      return false;
   }
   func_t* fnc = find_fnc_by_name(token->value);
   // Jumping to function's start address triggers
   // the place converter - i.e. possible decompilation.
   jumpto(fnc->start_ea, -1, UIJMP_ACTIVATE);
   return true;
}

Second, we want to have different right-click-popup options depending on the kind of token under the cursor:

/// Make our plugin class an event listener.
///
class Context : public plugmod_t, public event_listener_t
{
public:
   virtual ssize_t idaapi on_event(ssize_t code, va_list va) override;
};

/// Hook our class to user interface events.
///
bool idaapi Context::run(size_t)
{
   hook_event_listener(HT_UI, this);
   return true;
}

/// Implement the event listener callback.
///
ssize_t idaapi Context::on_event(ssize_t code, va_list va)
{
   switch (code)
   {
      // Attach our actions to context menu.
      //
      case ui_populating_widget_popup:
      {
         demo_place_t* place = get_custom_viewer_place(/*...*/);
         auto* token = place->token();

         if (token->kind == Token::Kind::ID_FNC)
         {
            // something
         }
         else if (token->kind == Token::Kind::ID_VAR)
         {
            // something else
         }

         break;
      }
   }
   return 0;
}

In both cases, we can ask IDA for the current viewer place. Using its YX, we get the token at these coordinates in the associated function. It is then easy to use the token to implement context-sensitive actions.
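
One way to keep the popup handler small is to map token kinds to the actions they offer. The sketch below shows such a mapping in isolation; the action names and the popup_actions helper are hypothetical. In a real plugin, each returned name would refer to a previously registered action and would be attached to the popup with the SDK's attach_action_to_popup().

```cpp
#include <string>
#include <vector>

enum class Kind { ID_FNC, ID_VAR, OTHER };

// Names of the actions to offer for a token kind.
// The action names are hypothetical.
std::vector<std::string> popup_actions(Kind kind)
{
   switch (kind)
   {
      case Kind::ID_FNC:
         return {"demo:JumpToFunction", "demo:RenameFunction"};
      case Kind::ID_VAR:
         return {"demo:RenameVariable"};
      default:
         return {};
   }
}
```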

9. Navigation

Because we were using the appropriate SDK mechanisms the whole time, we get another great feature right out of the box – navigation.

10. Synchronization highlighting

Remember that cool green corresponding-line highlighting in the example in Section 7? Well that isn’t an out-of-the-box functionality. We implemented it using new view synchronization features brought by IDA SDK 7.5.

It is now possible to query custom viewer’s synchronization group, and interfere in viewer’s line rendering:

/// Implement the event listener callback.
///
ssize_t idaapi Context::on_event(ssize_t code, va_list va)
{
   switch (code)
   {
      // Line rendering.
      //
      case ui_get_lines_rendering_info:
      {
         auto* demoSyncGroup = get_synced_group(custViewer);
         demo_place_t* demoPlace = get_custom_viewer_place(/*...*/);

         // Get callback info.
         lines_rendering_output_t* out = 
               va_arg(va, lines_rendering_output_t*);
         TWidget* view = va_arg(va, TWidget*);
         lines_rendering_input_t* info = 
               va_arg(va, lines_rendering_input_t*);

         // Check view synced with our view is being rendered.
         if (info->sync_group != demoSyncGroup)
         {
            return 0;
         }
         
         // Get all the addresses associated with the current
         // line in our view.
         auto eas = demoPlace->fnc()->yx_2_eas(demoPlace->yx());

         // Fill 'out' with entries highlighting lines with 
         // addresses from 'eas' set.
         
         break;
      }
   }
   return 0;
}
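
The address set used for highlighting can be gathered with a single pass over the tokens of the current line. The sketch below is one possible version of yx_2_eas() as a free function over a reduced token map; ea_t is an integer stand-in, and no_ea plays the role of BADADDR for tokens without an address.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <set>
#include <tuple>

using ea_t = std::uint64_t;  // stand-in for IDA's ea_t
const ea_t no_ea = ~ea_t(0); // stand-in for BADADDR

struct YX
{
   std::size_t y = 1;
   std::size_t x = 0;
};

inline bool operator<(const YX& a, const YX& b)
{
   return std::tie(a.y, a.x) < std::tie(b.y, b.x);
}

// Tokens reduced to their addresses.
using TokenEas = std::map<YX, ea_t>;

// Addresses of all tokens on the same line as yx.
std::set<ea_t> yx_2_eas(const TokenEas& tokens, YX yx)
{
   std::set<ea_t> eas;
   for (auto it = tokens.lower_bound(YX{yx.y, 0});
        it != tokens.end() && it->first.y == yx.y; ++it)
   {
      // Skip tokens that have no associated address.
      if (it->second != no_ea)
      {
         eas.insert(it->second);
      }
   }
   return eas;
}
```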

Note: This highlighting implementation works a bit differently than the one in the Hex-Rays plugin. It collects and highlights all the addresses represented by the current line in our demo viewer. Hex-Rays, on the other hand, seems to highlight whole continuous blocks delimited by the surrounding lines.

Demo plugin highlights only those addresses for which there is an associated token in the decompiled line. This might create discontinuous highlighting.
Hex-Rays highlights a continuous block up to addresses associated with the previous and next decompiled lines.

How to use this article?

Well, however you wish! We have shown how to use advanced IDA SDK features related to places and movements to create a dummy decompilation plugin. Hopefully, the examples in this article, or the full example in our GitHub repository, will help you make your plugins more interactive.

We will use the mechanisms shown here in our upcoming RetDec IDA plugin v1.0 to create a free and open decompilation plugin on par with Hex-Rays. Check out the project if you want to see a complex real-world usage example, or if you just need a free decompiler for IDA or Radare2.