reader

package
v0.3.13
Published: Aug 29, 2025 License: Apache-2.0 Imports: 24 Imported by: 1

Documentation

Overview

Package reader contains helpers for reading data and loading to Arrow.

Index

Constants

View Source
const (
	Manual int = iota
	Scanner
)
View Source
const DefaultDelimiter byte = byte('\n')

Variables

View Source
var (
	ErrUndefinedInput = errors.New("nil input")
	ErrInvalidInput   = errors.New("invalid input")
)
View Source
var (
	ErrNullStructData = errors.New("null struct data")
)

Functions

func InputMap

func InputMap(a any) (map[string]any, error)

InputMap takes structured input data and attempts to decode it to map[string]any. Input data can be JSON as a string or []byte, or any other Go data type that can be decoded by mapstructure/v2 (github.com/go-viper/mapstructure/v2).
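The JSON half of this behavior can be sketched with the standard library alone. The hypothetical inputMap below covers only the string/[]byte cases; the mapstructure path for arbitrary Go values is omitted:

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// inputMap is a hypothetical, stdlib-only sketch of the JSON path of
// InputMap: it accepts JSON as a string or []byte and decodes it to
// map[string]any; anything else is rejected.
func inputMap(a any) (map[string]any, error) {
	var raw []byte
	switch v := a.(type) {
	case string:
		raw = []byte(v)
	case []byte:
		raw = v
	default:
		return nil, errors.New("invalid input")
	}
	m := map[string]any{}
	if err := json.Unmarshal(raw, &m); err != nil {
		return nil, err
	}
	return m, nil
}

func main() {
	m, err := inputMap(`{"name":"arrow","rows":3}`)
	fmt.Println(m["name"], err)
}
```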

func TextMarshalerHookFunc

func TextMarshalerHookFunc() mapstructure.DecodeHookFuncValue

TextMarshalerHookFunc returns a DecodeHookFuncValue that checks for the encoding.TextMarshaler interface and calls the MarshalText function if found.

Types

type DataReader added in v0.3.0

type DataReader struct {
	// contains filtered or unexported fields
}

func NewReader added in v0.3.0

func NewReader(schema *arrow.Schema, source DataSource, opts ...Option) (*DataReader, error)

NewReader creates a new DataReader for reading data from a source and converting it to Arrow records.

func (*DataReader) Cancel added in v0.3.6

func (r *DataReader) Cancel()

Cancel stops the Reader's scan of its io.Reader input into Arrow records.

func (*DataReader) Count added in v0.3.1

func (r *DataReader) Count() int

Count returns the number of input data items read so far from the input source, not the number of records returned by Next() or NextBatch(). This is useful for debugging and monitoring the reading process.

func (*DataReader) DataSource added in v0.3.1

func (r *DataReader) DataSource() DataSource

DataSource returns the data source type of the Reader. This is useful for debugging and monitoring the reading process.

func (*DataReader) Err added in v0.3.0

func (r *DataReader) Err() error

Err returns the last error encountered during the reading of data.

func (*DataReader) InputBufferSize added in v0.3.1

func (r *DataReader) InputBufferSize() int

InputBufferSize returns the size of the input data channel buffer. This is the number of items that can be buffered before blocking on input.

func (*DataReader) Mode added in v0.3.1

func (r *DataReader) Mode() int

func (*DataReader) Next added in v0.3.1

func (r *DataReader) Next() bool

Next reports whether a Record can be received from the converted record queue. Callers should check Err() after a call to Next that returned false to determine whether an error occurred.

func (*DataReader) NextBatch added in v0.3.2

func (r *DataReader) NextBatch(batchSize int) bool

NextBatch reports whether a []arrow.Record of the specified size can be received from the converted record queue. It still returns true if the queue channel is closed and the final batch of available records is smaller than the specified batch size. Callers should check Err() after a call to NextBatch that returned false to determine whether an error occurred.
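The draining semantics (a short final batch still yields true once the queue closes) can be sketched with a plain channel standing in for the record queue:

```go
package main

import "fmt"

// nextBatch is a hypothetical sketch of NextBatch's draining behavior:
// it receives up to batchSize items from queue and reports whether it got
// any, returning true even for a short final batch after the channel closes.
func nextBatch(queue <-chan int, batchSize int) ([]int, bool) {
	batch := make([]int, 0, batchSize)
	for v := range queue {
		batch = append(batch, v)
		if len(batch) == batchSize {
			break
		}
	}
	return batch, len(batch) > 0
}

func main() {
	queue := make(chan int, 5)
	for i := 1; i <= 5; i++ {
		queue <- i
	}
	close(queue)
	for {
		batch, ok := nextBatch(queue, 2)
		if !ok {
			break
		}
		fmt.Println(batch) // two full batches, then the short batch [5]
	}
}
```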

func (*DataReader) Opts added in v0.3.1

func (r *DataReader) Opts() []Option

Opts returns the options used to create the DataReader.

func (*DataReader) Peek added in v0.3.1

func (r *DataReader) Peek() (int, int)

Peek returns the length of the input data and Arrow Record queues.

func (*DataReader) Read added in v0.3.1

func (r *DataReader) Read(a any) error

Read loads one datum. If the Reader has an io.Reader, Read is a no-op.

func (*DataReader) ReadToRecord added in v0.3.0

func (r *DataReader) ReadToRecord(a any) (arrow.Record, error)

ReadToRecord decodes a datum directly to an arrow.Record. The record should be released by the user when done with it.

func (*DataReader) RecBufferSize added in v0.3.1

func (r *DataReader) RecBufferSize() int

RecBufferSize returns the size of the record channel buffer. This is the number of records that can be buffered before blocking on output.

func (*DataReader) Record added in v0.3.1

func (r *DataReader) Record() arrow.Record

Record returns the current Arrow record. It is valid until the next call to Next.

func (*DataReader) RecordBatch added in v0.3.2

func (r *DataReader) RecordBatch() []arrow.Record

RecordBatch returns the current batch of Arrow records. It is valid until the next call to NextBatch.

func (*DataReader) Release added in v0.3.0

func (r *DataReader) Release()

Release decreases the reference count by 1. When the reference count goes to zero, the memory is freed. Release may be called simultaneously from multiple goroutines.
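Retain and Release follow the standard atomic reference-counting pattern used throughout Arrow. A minimal sketch (not the package's actual implementation):

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// refCounted is a minimal sketch of the Retain/Release contract: atomic
// increments and decrements, with cleanup when the count reaches zero.
// Both methods are safe to call from multiple goroutines.
type refCounted struct {
	refs  int64
	freed bool
}

func (r *refCounted) Retain() { atomic.AddInt64(&r.refs, 1) }

func (r *refCounted) Release() {
	if atomic.AddInt64(&r.refs, -1) == 0 {
		r.freed = true // stand-in for freeing the underlying memory
	}
}

func main() {
	r := &refCounted{refs: 1} // created with one reference
	r.Retain()
	r.Release()
	fmt.Println(r.freed) // one reference still held
	r.Release()
	fmt.Println(r.freed) // last reference dropped: memory freed
}
```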

func (*DataReader) Reset added in v0.3.1

func (r *DataReader) Reset()

Reset resets a Reader to its initial state.

func (*DataReader) ResetCount added in v0.3.1

func (r *DataReader) ResetCount()

ResetCount resets the input count to zero.

func (*DataReader) Retain added in v0.3.0

func (r *DataReader) Retain()

Retain increases the reference count by 1. Retain may be called simultaneously from multiple goroutines.

func (*DataReader) Schema added in v0.3.0

func (r *DataReader) Schema() *arrow.Schema

Schema returns the Arrow schema of the Reader.

type DataSource added in v0.3.0

type DataSource int
const (
	DataSourceGo DataSource = iota
	DataSourceJSON
	DataSourceAvro
)

type Encoder

type Encoder struct {
	// contains filtered or unexported fields
}

An Encoder takes structured data and converts it into an interface following the mapstructure tags.

func New

func New(cfg *EncoderConfig) *Encoder

New returns a new encoder for the configuration.

func (*Encoder) Encode

func (e *Encoder) Encode(input any) (any, error)

Encode takes the input and uses reflection to encode it to an interface based on the mapstructure spec.

type EncoderConfig

type EncoderConfig struct {
	// EncodeHook, if set, is a way to provide custom encoding. It
	// will be called before structs and primitive types.
	EncodeHook mapstructure.DecodeHookFunc
}

EncoderConfig is the configuration used to create a new encoder.

type Option added in v0.3.0

type Option func(config)

Option configures a DataReader.

func WithAllocator added in v0.3.0

func WithAllocator(mem memory.Allocator) Option

WithAllocator specifies the Arrow memory allocator used while building records.

func WithChunk added in v0.3.1

func WithChunk(n int) Option

WithChunk specifies the chunk size used while reading data to Arrow records.

If n is zero or 1, no chunking will take place and the reader will create one record per row. If n is greater than 1, chunks of n rows will be read.
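The chunking rule can be sketched as a plain slicing helper: n <= 1 yields one record per row, n > 1 yields chunks of n rows, with a possibly shorter final chunk (chunkRows is a hypothetical illustration, not the package's code):

```go
package main

import "fmt"

// chunkRows is a hypothetical sketch of WithChunk's rule: with n <= 1 each
// row becomes its own record; with n > 1 rows are grouped into chunks of n,
// and the final chunk may be shorter.
func chunkRows(rows []string, n int) [][]string {
	if n <= 1 {
		n = 1
	}
	var chunks [][]string
	for len(rows) > 0 {
		end := n
		if end > len(rows) {
			end = len(rows)
		}
		chunks = append(chunks, rows[:end])
		rows = rows[end:]
	}
	return chunks
}

func main() {
	rows := []string{"a", "b", "c", "d", "e"}
	fmt.Println(len(chunkRows(rows, 1))) // one record per row
	fmt.Println(len(chunkRows(rows, 2))) // chunks of 2, short final chunk
}
```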

func WithContext added in v0.3.9

func WithContext(ctx context.Context) Option

WithContext specifies the context used while reading data to Arrow records. Calling reader.Cancel() will cancel the context and stop reading data.

func WithIOReader added in v0.3.1

func WithIOReader(r io.Reader, delim byte) Option

WithIOReader provides an io.Reader to the Bodkin Reader, along with a delimiter byte used to split datums in the data stream. The default delimiter is '\n'.

func WithInputBufferSize added in v0.3.1

func WithInputBufferSize(n int) Option

WithInputBufferSize specifies the Bodkin Reader's input buffer size.

func WithJSONDecoder added in v0.3.0

func WithJSONDecoder() Option

WithJSONDecoder specifies the use of goccy/go-json as the Bodkin Reader's decoder. The default is the Bodkin DataLoader, a linked list of builders that reduces recursive map lookups when loading data.

func WithRecordBufferSize added in v0.3.1

func WithRecordBufferSize(n int) Option

WithRecordBufferSize specifies the Bodkin Reader's record buffer size.
