Documentation ¶
Overview ¶
Package reader contains helpers for reading data and loading to Arrow.
Index ¶
- Constants
- Variables
- func InputMap(a any) (map[string]any, error)
- func TextMarshalerHookFunc() mapstructure.DecodeHookFuncValue
- type DataReader
- func (r *DataReader) Cancel()
- func (r *DataReader) Count() int
- func (r *DataReader) DataSource() DataSource
- func (r *DataReader) Err() error
- func (r *DataReader) InputBufferSize() int
- func (r *DataReader) Mode() int
- func (r *DataReader) Next() bool
- func (r *DataReader) NextBatch(batchSize int) bool
- func (r *DataReader) Opts() []Option
- func (r *DataReader) Peek() (int, int)
- func (r *DataReader) Read(a any) error
- func (r *DataReader) ReadToRecord(a any) (arrow.Record, error)
- func (r *DataReader) RecBufferSize() int
- func (r *DataReader) Record() arrow.Record
- func (r *DataReader) RecordBatch() []arrow.Record
- func (r *DataReader) Release()
- func (r *DataReader) Reset()
- func (r *DataReader) ResetCount()
- func (r *DataReader) Retain()
- func (r *DataReader) Schema() *arrow.Schema
- type DataSource
- type Encoder
- type EncoderConfig
- type Option
Constants ¶
const (
	Manual int = iota
	Scanner
)
const DefaultDelimiter byte = byte('\n')
Variables ¶
var (
	ErrUndefinedInput = errors.New("nil input")
	ErrInvalidInput   = errors.New("invalid input")
)
var (
ErrNullStructData = errors.New("null struct data")
)
Functions ¶
func InputMap ¶
func InputMap(a any) (map[string]any, error)
InputMap takes structured input data and attempts to decode it to map[string]any. Input data can be JSON in string or []byte form, or any other Go data type that can be decoded by mapstructure/v2 (github.com/go-viper/mapstructure/v2).
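The JSON path of this decoding can be sketched with only the standard library; the helper below, toMap, is a hypothetical stand-in for illustration, not the package's implementation (InputMap additionally handles arbitrary Go values via mapstructure/v2):

```go
package main

import (
	"encoding/json"
	"fmt"
)

// toMap is a stdlib-only sketch of the JSON case: it accepts JSON as a
// string or []byte and decodes it to map[string]any.
func toMap(a any) (map[string]any, error) {
	var b []byte
	switch v := a.(type) {
	case string:
		b = []byte(v)
	case []byte:
		b = v
	default:
		return nil, fmt.Errorf("unsupported input type %T", a)
	}
	m := map[string]any{}
	if err := json.Unmarshal(b, &m); err != nil {
		return nil, err
	}
	return m, nil
}

func main() {
	m, err := toMap(`{"name":"arrow","rows":3}`)
	fmt.Println(m["name"], err)
}
```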
func TextMarshalerHookFunc ¶
func TextMarshalerHookFunc() mapstructure.DecodeHookFuncValue
TextMarshalerHookFunc returns a DecodeHookFuncValue that checks for the encoding.TextMarshaler interface and calls the MarshalText function if found.
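A minimal illustration of the encoding.TextMarshaler interface the hook looks for (the Level type here is invented for the example; a hook like TextMarshalerHookFunc would detect this interface on a value and call MarshalText instead of copying the value as-is):

```go
package main

import "fmt"

// Level implements encoding.TextMarshaler by defining MarshalText.
type Level int

func (l Level) MarshalText() ([]byte, error) {
	switch l {
	case 0:
		return []byte("info"), nil
	case 1:
		return []byte("warn"), nil
	}
	return []byte("unknown"), nil
}

func main() {
	b, _ := Level(1).MarshalText()
	fmt.Println(string(b)) // warn
}
```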
Types ¶
type DataReader ¶ added in v0.3.0
type DataReader struct {
// contains filtered or unexported fields
}
func NewReader ¶ added in v0.3.0
func NewReader(schema *arrow.Schema, source DataSource, opts ...Option) (*DataReader, error)
NewReader creates a new DataReader for reading data from a source and converting it to Arrow records.
func (*DataReader) Cancel ¶ added in v0.3.6
func (r *DataReader) Cancel()
Cancel cancels the Reader's io.Reader scan to Arrow.
func (*DataReader) Count ¶ added in v0.3.1
func (r *DataReader) Count() int
Count returns the number of input data items read so far from the input source; it is not the number of records returned by Next() or NextBatch(). This is useful for debugging and monitoring the reading process.
func (*DataReader) DataSource ¶ added in v0.3.1
func (r *DataReader) DataSource() DataSource
DataSource returns the data source type of the Reader. This is useful for debugging and monitoring the reading process.
func (*DataReader) Err ¶ added in v0.3.0
func (r *DataReader) Err() error
Err returns the last error encountered during the reading of data.
func (*DataReader) InputBufferSize ¶ added in v0.3.1
func (r *DataReader) InputBufferSize() int
InputBufferSize returns the size of the input data channel buffer. This is the number of items that can be buffered before blocking on input.
func (*DataReader) Mode ¶ added in v0.3.1
func (r *DataReader) Mode() int
func (*DataReader) Next ¶ added in v0.3.1
func (r *DataReader) Next() bool
Next returns whether a Record can be received from the converted record queue. The user should check Err() after a call to Next that returned false to check if an error took place.
func (*DataReader) NextBatch ¶ added in v0.3.2
func (r *DataReader) NextBatch(batchSize int) bool
NextBatch reports whether a []arrow.Record of the specified batch size can be received from the converted record queue. It still returns true if the queue channel is closed and the final available batch holds fewer than batchSize records. The user should check Err() after a call to NextBatch that returned false to check if an error took place.
func (*DataReader) Opts ¶ added in v0.3.1
func (r *DataReader) Opts() []Option
Opts returns the options used to create the DataReader.
func (*DataReader) Peek ¶ added in v0.3.1
func (r *DataReader) Peek() (int, int)
Peek returns the length of the input data and Arrow Record queues.
func (*DataReader) Read ¶ added in v0.3.1
func (r *DataReader) Read(a any) error
Read loads one datum. If the Reader has an io.Reader, Read is a no-op.
func (*DataReader) ReadToRecord ¶ added in v0.3.0
func (r *DataReader) ReadToRecord(a any) (arrow.Record, error)
ReadToRecord decodes a datum directly to an arrow.Record. The record should be released by the user when done with it.
func (*DataReader) RecBufferSize ¶ added in v0.3.1
func (r *DataReader) RecBufferSize() int
RecBufferSize returns the size of the record channel buffer. This is the number of records that can be buffered before blocking on output.
func (*DataReader) Record ¶ added in v0.3.1
func (r *DataReader) Record() arrow.Record
Record returns the current Arrow record. It is valid until the next call to Next.
func (*DataReader) RecordBatch ¶ added in v0.3.2
func (r *DataReader) RecordBatch() []arrow.Record
RecordBatch returns the current batch of Arrow records. It is valid until the next call to NextBatch.
func (*DataReader) Release ¶ added in v0.3.0
func (r *DataReader) Release()
Release decreases the reference count by 1. When the reference count goes to zero, the memory is freed. Release may be called simultaneously from multiple goroutines.
func (*DataReader) Reset ¶ added in v0.3.1
func (r *DataReader) Reset()
Reset resets a Reader to its initial state.
func (*DataReader) ResetCount ¶ added in v0.3.1
func (r *DataReader) ResetCount()
ResetCount resets the input count to zero.
func (*DataReader) Retain ¶ added in v0.3.0
func (r *DataReader) Retain()
Retain increases the reference count by 1. Retain may be called simultaneously from multiple goroutines.
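Retain and Release follow the usual atomic reference-counting contract: the count starts at 1, and resources are freed when it reaches zero. A stdlib-only sketch of that pattern (not the package's actual code):

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// refCounted sketches the Retain/Release contract. Both methods are
// safe to call simultaneously from multiple goroutines.
type refCounted struct {
	refs    int64
	release func()
}

// Retain increases the reference count by 1.
func (r *refCounted) Retain() { atomic.AddInt64(&r.refs, 1) }

// Release decreases the reference count by 1, freeing resources
// when the count reaches zero.
func (r *refCounted) Release() {
	if atomic.AddInt64(&r.refs, -1) == 0 {
		r.release()
	}
}

func main() {
	freed := false
	r := &refCounted{refs: 1, release: func() { freed = true }}
	r.Retain()
	r.Release()
	fmt.Println(freed) // false: one reference still held
	r.Release()
	fmt.Println(freed) // true: count reached zero
}
```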
func (*DataReader) Schema ¶ added in v0.3.0
func (r *DataReader) Schema() *arrow.Schema
Schema returns the Arrow schema of the Reader.
type DataSource ¶ added in v0.3.0
type DataSource int
const (
	DataSourceGo DataSource = iota
	DataSourceJSON
	DataSourceAvro
)
type Encoder ¶
type Encoder struct {
// contains filtered or unexported fields
}
An Encoder takes structured data and converts it into an interface following the mapstructure tags.
type EncoderConfig ¶
type EncoderConfig struct {
// EncodeHook, if set, is a way to provide custom encoding. It
// will be called before structs and primitive types.
EncodeHook mapstructure.DecodeHookFunc
}
EncoderConfig is the configuration used to create a new encoder.
type Option ¶ added in v0.3.0
type Option func(config)
Option configures an Avro reader/writer.
func WithAllocator ¶ added in v0.3.0
WithAllocator specifies the Arrow memory allocator used while building records.
func WithChunk ¶ added in v0.3.1
WithChunk specifies the chunk size used while reading data to Arrow records.
If n is zero or 1, no chunking will take place and the reader will create one record per row. If n is greater than 1, chunks of n rows will be read.
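The chunking rule above can be sketched as follows (chunkRows is a hypothetical helper for illustration, not part of the package):

```go
package main

import "fmt"

// chunkRows illustrates WithChunk's rule: with n <= 1 every row becomes
// its own record; with n > 1 rows are grouped into chunks of n, with a
// shorter final chunk for any remainder.
func chunkRows(rows []string, n int) [][]string {
	if n <= 1 {
		n = 1
	}
	var chunks [][]string
	for len(rows) > 0 {
		end := n
		if end > len(rows) {
			end = len(rows)
		}
		chunks = append(chunks, rows[:end])
		rows = rows[end:]
	}
	return chunks
}

func main() {
	rows := []string{"r1", "r2", "r3", "r4", "r5"}
	fmt.Println(len(chunkRows(rows, 1))) // 5: one record per row
	fmt.Println(len(chunkRows(rows, 2))) // 3: chunks of 2+2+1
}
```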
func WithContext ¶ added in v0.3.9
WithContext specifies the context used while reading data to Arrow records. Calling reader.Cancel() will cancel the context and stop reading data.
func WithIOReader ¶ added in v0.3.1
WithIOReader provides an io.Reader to the Bodkin Reader, along with a delimiter used to split datums in the data stream. The default delimiter is '\n' if none is provided.
func WithInputBufferSize ¶ added in v0.3.1
WithInputBufferSize specifies the Bodkin Reader's input buffer size.
func WithJSONDecoder ¶ added in v0.3.0
func WithJSONDecoder() Option
WithJSONDecoder specifies whether to use goccy/go-json as the Bodkin Reader's decoder. The default is the Bodkin DataLoader, a linked list of builders that reduces recursive map lookups when loading data.
func WithRecordBufferSize ¶ added in v0.3.1
WithRecordBufferSize specifies the Bodkin Reader's record buffer size.