### **IntQuant** Calculates the integer-quantized values of one input data (Tensor) and produces one output data (Tensor). Additionally, takes three floats as input, which define the scale, zero-point and bit-width of the quantization, which may be scalars or tensors with number of dimensions equal to the input data tensor, for e.g. tensor-wise or channel-wise quantization. The attributes narrow and signed define how the bits of the quantization are interpreted, while the attribute rounding_mode defines how quantized values are rounded. Notes: * This operator was previously named `Quant` but is renamed to `IntQuant` to distinguish it from `FloatQuant`. For a transition period, qonnx will transparently handle `Quant` as `IntQuant` for backwards compatibility reasons, but only `IntQuant` should be used for new models. * This operator does not work for binary or bipolar quantization, for this purpose the simpler `BipolarQuant` node exists. #### Version The description of this operator in this document corresponds to `qonnx.custom_ops.general` opset version 1. #### Attributes
signed : int (default is 1)
Defines if the quantization includes a signed bit. E.g. at 8b unsigned=[0, 255] vs signed=[-128, 127].
narrow : int (default is 0)
Defines if the value range should be interpreted as narrow, when signed=1. E.g. at 8b regular=[-128, 127] vs narrow=[-127, 127].
rounding_mode : string (default is "ROUND")
Defines how rounding should be applied during quantization. Avaiable options are ROUND, CEIL, FLOOR, UP, DOWN, HALF_UP, HALF_DOWN. The rounding modes are described in the table bellow. The names of rounding modes can be upper case or lower case.
#### Inputs
X (differentiable) : tensor(float32)
input tensor to quantize
scale : float32, tensor(float32)
The scale factor, either as a global scalar or with a shape matching the number of dimensions of the X tensor
zeropt : float32, tensor(float32)
The zero-point, either as a global scalar or with a shape matching the number of dimensions of the X tensor
bitwidth : int32, float32
The number of bits used by the quantization, must be a positive integer. If float32 dtype is used for convenience, it must still represent an positive integer number of bits.
#### Outputs
Y (differentiable) : tensor(float32)
Output tensor
#### Rounding modes
rounding modes | **Number \ ROUNDING_MODE** | ROUND=HALF_EVEN | CEIL | FLOOR | UP | DOWN | HALF_UP | HALF_DOWN | |----------------------------|-----------------|------|-------|----|------|---------|-----------| | 5.5 | 6 | 6 | 5 | 6 | 5 | 6 | 5 | | 2.5 | 2 | 3 | 2 | 3 | 2 | 3 | 2 | | 1.6 | 2 | 2 | 1 | 2 | 1 | 2 | 2 | | 1.1 | 1 | 2 | 1 | 2 | 1 | 1 | 1 | | 1.0 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | | -1.0 | -1 | -1 | -1 | -1 | -1 | -1 | -1 | | -1.1 | -1 | -1 | -2 | -2 | -1 | -1 | -1 | | -1.6 | -2 | -1 | -2 | -2 | -1 | -2 | -2 | | -2.5 | -2 | -2 | -3 | -3 | -2 | -3 | -2 | | -5.5 | -6 | -5 | -6 | -6 | -5 | -6 | -5 |
#### Examples
IntQuant ```python from onnx import helper import numpy as np # Define node settings and input x = np.random.randn(100).astype(np.float32)*10. scale = np.array(1.) zeropt = np.array(0.) bitwidth = np.array(4) signed = 1 narrow = 0 rounding_mode = "ROUND" # Create node node = helper.make_node( 'IntQuant', domain='finn.custom_op.general', inputs=['x', 'scale', 'zeropt', 'bitwidth'], outputs=['y'], narrow=narrow, signed=signed, rounding_mode=rounding_mode, ) # Execute the same settings with the reference implementation (quant) # See the sample implementation for more details on quant. output_ref = quant(x, scale, zeropt, bitwidth, signed, narrow, rounding_mode) # Execute node and compare expect(node, inputs=[x, scale, zeropt, bitwidth], outputs=[output_ref], name='test_intquant') ```
#### Sample Implementation
IntQuant ```python # SPDX-License-Identifier: Apache-2.0 from __future__ import absolute_import from __future__ import division from __future__ import print_function from __future__ import unicode_literals import numpy as np def quant(inp_tensor, scale, zeropt, bitwidth, signed, narrow, rounding_mode): # Port of IntQuant class from Brevitas: https://bit.ly/2S6qvZJ # Scaling y_int = inp_tensor / scale y_int = y_int + zeropt # Clamping min_int_val = min_int(signed, narrow, bitwidth) max_int_val = max_int(signed, narrow, bitwidth) y_int = np.where(y_int > max_int_val, max_int_val.astype(y_int.dtype), y_int) y_int = np.where(y_int < min_int_val, min_int_val.astype(y_int.dtype), y_int) # Rounding rounding_fx = resolve_rounding_mode(rounding_mode) y_int = rounding_fx(y_int) # Re-scaling out_tensor = y_int - zeropt out_tensor = out_tensor * scale return out_tensor def min_int(signed: bool, narrow_range: bool, bit_width: int) -> int: """Compute the minimum integer representable by a given number of bits. Args: signed (bool): Indicates whether the represented integer is signed or not. narrow_range (bool): Indicates whether to narrow the minimum value represented by 1. bit_width (int): Number of bits available for the representation. Returns: int: Maximum unsigned integer that can be represented according to the input arguments. Examples: >>> min_int(signed=True, narrow_range=True, bit_width=8) int(-127) >>> min_int(signed=False, narrow_range=True, bit_width=8) int(0) >>> min_int(signed=True, narrow_range=False, bit_width=8) int(-128) >>> min_int(signed=False, narrow_range=False, bit_width=8) int(0) """ if signed and narrow_range: value = -(2 ** (bit_width - 1)) + 1 elif signed and not narrow_range: value = -(2 ** (bit_width - 1)) else: value = 0 * bit_width return value def max_int(signed: bool, narrow_range: bool, bit_width: int) -> int: """Compute the maximum integer representable by a given number of bits. Args: signed (bool): Indicates whether the represented integer is signed or not. narrow_range (bool): Indicates whether to narrow the maximum unsigned value represented by 1. bit_width (int): Number of bits available for the representation. Returns: Tensor: Maximum integer that can be represented according to the input arguments. Examples: >>> max_int(signed=True, narrow_range=True, bit_width=8) int(127) >>> max_int(signed=False, narrow_range=True, bit_width=8) int(254) >>> max_int(signed=True, narrow_range=False, bit_width=8) int(127) >>> max_int(signed=False, narrow_range=False, bit_width=8) int(255) """ if not signed and not narrow_range: value = (2 ** bit_width) - 1 elif not signed and narrow_range: value = (2 ** bit_width) - 2 else: value = (2 ** (bit_width - 1)) - 1 return value def resolve_rounding_mode(mode_string): """Resolve the rounding mode string of IntQuant and Trunc ops to the corresponding numpy functions.""" if mode_string == "ROUND": return np.round elif mode_string == "CEIL": return np.ceil elif mode_string == "FLOOR": return np.floor else: raise ValueError(f"Could not resolve rounding mode called: {mode_string}") ```