dataquality.loggers package#
Subpackages#
- dataquality.loggers.data_logger package
- Subpackages
- Submodules
- dataquality.loggers.data_logger.base_data_logger module
BaseGalileoDataLogger
BaseGalileoDataLogger.MAX_META_COLS
BaseGalileoDataLogger.MAX_STR_LEN
BaseGalileoDataLogger.MAX_DOC_LEN
BaseGalileoDataLogger.LIMIT_NUM_DOCS
BaseGalileoDataLogger.INPUT_DATA_BASE
BaseGalileoDataLogger.STRING_MAX_SIZE_B
BaseGalileoDataLogger.DATA_FOLDER_EXTENSION
BaseGalileoDataLogger.INPUT_DATA_FILE_EXT
BaseGalileoDataLogger.input_data_path
BaseGalileoDataLogger.input_data_file()
BaseGalileoDataLogger.log_data_sample()
BaseGalileoDataLogger.log_data_samples()
BaseGalileoDataLogger.log_dataset()
BaseGalileoDataLogger.validate_ids_for_split()
BaseGalileoDataLogger.add_ids_to_split()
BaseGalileoDataLogger.log()
BaseGalileoDataLogger.export_df()
BaseGalileoDataLogger.support_embs
BaseGalileoDataLogger.support_data_embs
BaseGalileoDataLogger.apply_column_map()
BaseGalileoDataLogger.upload()
BaseGalileoDataLogger.upload_split()
BaseGalileoDataLogger.create_and_upload_data_embs()
BaseGalileoDataLogger.convert_large_string()
BaseGalileoDataLogger.upload_split_from_in_frame()
BaseGalileoDataLogger.create_in_out_frames()
BaseGalileoDataLogger.process_in_out_frames()
BaseGalileoDataLogger.upload_in_out_frames()
BaseGalileoDataLogger.prob_only()
BaseGalileoDataLogger.validate_and_format()
BaseGalileoDataLogger.validate_labels()
BaseGalileoDataLogger.validate_metadata()
BaseGalileoDataLogger.get_data_logger_attr()
BaseGalileoDataLogger.separate_dataframe()
BaseGalileoDataLogger.validate_kwargs()
BaseGalileoDataLogger.set_tagging_schema()
- dataquality.loggers.data_logger.image_classification module
ImageClassificationDataLogger
ImageClassificationDataLogger.logger_config
ImageClassificationDataLogger.DATA_FOLDER_EXTENSION
ImageClassificationDataLogger.support_data_embs
ImageClassificationDataLogger.log_image_dataset()
ImageClassificationDataLogger.convert_large_string()
ImageClassificationDataLogger.process_in_out_frames()
ImageClassificationDataLogger.upload_split_from_in_frame()
ImageClassificationDataLogger.add_cv_smart_features()
- dataquality.loggers.data_logger.object_detection module
GalileoDataLoggerAttributes
ODCols
ObjectDetectionDataLogger
ObjectDetectionDataLogger.logger_config
ObjectDetectionDataLogger.get_valid_attributes()
ObjectDetectionDataLogger.log_dataset()
ObjectDetectionDataLogger.log_image_samples()
ObjectDetectionDataLogger.convert_large_string()
ObjectDetectionDataLogger.prob_only()
ObjectDetectionDataLogger.create_and_upload_data_embs()
ObjectDetectionDataLogger.process_in_out_frames()
ObjectDetectionDataLogger.support_data_embs
- dataquality.loggers.data_logger.semantic_segmentation module
SemanticSegmentationDataLogger
SemanticSegmentationDataLogger.logger_config
SemanticSegmentationDataLogger.INPUT_DATA_FILE_EXT
SemanticSegmentationDataLogger.log_dataset()
SemanticSegmentationDataLogger.log_image_samples()
SemanticSegmentationDataLogger.export_df()
SemanticSegmentationDataLogger.upload_split_from_in_frame()
SemanticSegmentationDataLogger.support_embs
SemanticSegmentationDataLogger.support_data_embs
- dataquality.loggers.data_logger.tabular_classification module
- dataquality.loggers.data_logger.text_classification module
GalileoDataLoggerAttributes
TextClassificationDataLogger
TextClassificationDataLogger.logger_config
TextClassificationDataLogger.log_data_samples()
TextClassificationDataLogger.log_data_sample()
TextClassificationDataLogger.log_dataset()
TextClassificationDataLogger.get_valid_attributes()
TextClassificationDataLogger.validate_and_format()
TextClassificationDataLogger.validate_logged_labels()
TextClassificationDataLogger.separate_dataframe()
TextClassificationDataLogger.validate_labels()
- dataquality.loggers.data_logger.text_multi_label module
- dataquality.loggers.data_logger.text_ner module
GalileoDataLoggerAttributes
GalileoDataLoggerAttributes.texts
GalileoDataLoggerAttributes.text_token_indices
GalileoDataLoggerAttributes.text_token_indices_flat
GalileoDataLoggerAttributes.gold_spans
GalileoDataLoggerAttributes.ids
GalileoDataLoggerAttributes.split
GalileoDataLoggerAttributes.meta
GalileoDataLoggerAttributes.get_valid()
TextNERDataLogger
TextNERDataLogger.DATA_FOLDER_EXTENSION
TextNERDataLogger.logger_config
TextNERDataLogger.get_valid_attributes()
TextNERDataLogger.log_data_samples()
TextNERDataLogger.log_data_sample()
TextNERDataLogger.log_dataset()
TextNERDataLogger.validate_and_format()
TextNERDataLogger.process_in_out_frames()
TextNERDataLogger.separate_dataframe()
TextNERDataLogger.validate_labels()
TextNERDataLogger.is_valid_span_label()
TextNERDataLogger.set_tagging_schema()
TextNERDataLogger.support_data_embs
TextNERDataLogger.create_and_upload_data_embs()
- Module contents
BaseGalileoDataLogger
BaseGalileoDataLogger.MAX_META_COLS
BaseGalileoDataLogger.MAX_STR_LEN
BaseGalileoDataLogger.MAX_DOC_LEN
BaseGalileoDataLogger.LIMIT_NUM_DOCS
BaseGalileoDataLogger.INPUT_DATA_BASE
BaseGalileoDataLogger.STRING_MAX_SIZE_B
BaseGalileoDataLogger.DATA_FOLDER_EXTENSION
BaseGalileoDataLogger.INPUT_DATA_FILE_EXT
BaseGalileoDataLogger.meta
BaseGalileoDataLogger.input_data_path
BaseGalileoDataLogger.input_data_file()
BaseGalileoDataLogger.log_data_sample()
BaseGalileoDataLogger.log_data_samples()
BaseGalileoDataLogger.log_dataset()
BaseGalileoDataLogger.validate_ids_for_split()
BaseGalileoDataLogger.add_ids_to_split()
BaseGalileoDataLogger.log()
BaseGalileoDataLogger.export_df()
BaseGalileoDataLogger.support_embs
BaseGalileoDataLogger.support_data_embs
BaseGalileoDataLogger.apply_column_map()
BaseGalileoDataLogger.upload()
BaseGalileoDataLogger.upload_split()
BaseGalileoDataLogger.create_and_upload_data_embs()
BaseGalileoDataLogger.convert_large_string()
BaseGalileoDataLogger.upload_split_from_in_frame()
BaseGalileoDataLogger.create_in_out_frames()
BaseGalileoDataLogger.process_in_out_frames()
BaseGalileoDataLogger.upload_in_out_frames()
BaseGalileoDataLogger.prob_only()
BaseGalileoDataLogger.validate_and_format()
BaseGalileoDataLogger.validate_labels()
BaseGalileoDataLogger.validate_metadata()
BaseGalileoDataLogger.get_data_logger_attr()
BaseGalileoDataLogger.separate_dataframe()
BaseGalileoDataLogger.validate_kwargs()
BaseGalileoDataLogger.set_tagging_schema()
BaseGalileoDataLogger.split
BaseGalileoDataLogger.inference_name
- dataquality.loggers.logger_config package
- Subpackages
- Submodules
- dataquality.loggers.logger_config.base_logger_config module
BaseLoggerConfig
BaseLoggerConfig.conditions
BaseLoggerConfig.cur_epoch
BaseLoggerConfig.cur_inference_name
BaseLoggerConfig.cur_split
BaseLoggerConfig.dataloader_random_sampling
BaseLoggerConfig.exception
BaseLoggerConfig.existing_run
BaseLoggerConfig.feature_names
BaseLoggerConfig.finish
BaseLoggerConfig.helper_data
BaseLoggerConfig.idx_to_id_map
BaseLoggerConfig.inference_logged
BaseLoggerConfig.input_data_logged
BaseLoggerConfig.int_labels
BaseLoggerConfig.labels
BaseLoggerConfig.last_epoch
BaseLoggerConfig.logged_input_ids
BaseLoggerConfig.metadata_documents
BaseLoggerConfig.ner_labels
BaseLoggerConfig.observed_labels
BaseLoggerConfig.observed_num_labels
BaseLoggerConfig.remove_embs
BaseLoggerConfig.report_emails
BaseLoggerConfig.tagging_schema
BaseLoggerConfig.tasks
BaseLoggerConfig.test_logged
BaseLoggerConfig.training_logged
BaseLoggerConfig.validation_logged
BaseLoggerConfig.reset()
- dataquality.loggers.logger_config.image_classification module
- dataquality.loggers.logger_config.object_detection module
- dataquality.loggers.logger_config.semantic_segmentation module
- dataquality.loggers.logger_config.tabular_classification module
- dataquality.loggers.logger_config.text_classification module
- dataquality.loggers.logger_config.text_multi_label module
- dataquality.loggers.logger_config.text_ner module
- Module contents
- dataquality.loggers.model_logger package
- Subpackages
- Submodules
- dataquality.loggers.model_logger.base_model_logger module
BaseGalileoModelLogger
BaseGalileoModelLogger.log_file_ext
BaseGalileoModelLogger.log()
BaseGalileoModelLogger.write_model_output()
BaseGalileoModelLogger.set_split_epoch()
BaseGalileoModelLogger.upload()
BaseGalileoModelLogger.get_model_logger_attr()
BaseGalileoModelLogger.convert_logits_to_prob_binary()
BaseGalileoModelLogger.convert_logits_to_probs()
- dataquality.loggers.model_logger.image_classification module
- dataquality.loggers.model_logger.object_detection module
- dataquality.loggers.model_logger.semantic_segmentation module
SemanticSegmentationModelLogger
SemanticSegmentationModelLogger.logger_config
SemanticSegmentationModelLogger.validate_and_format()
SemanticSegmentationModelLogger.local_dep_path
SemanticSegmentationModelLogger.local_proj_run_path
SemanticSegmentationModelLogger.local_contours_path
SemanticSegmentationModelLogger.get_polygon_data()
- dataquality.loggers.model_logger.tabular_classification module
- dataquality.loggers.model_logger.text_classification module
- dataquality.loggers.model_logger.text_multi_label module
- dataquality.loggers.model_logger.text_ner module
GalileoModelLoggerAttributes
GalileoModelLoggerAttributes.gold_emb
GalileoModelLoggerAttributes.gold_spans
GalileoModelLoggerAttributes.gold_conf_prob
GalileoModelLoggerAttributes.gold_loss_prob
GalileoModelLoggerAttributes.gold_loss_prob_label
GalileoModelLoggerAttributes.embs
GalileoModelLoggerAttributes.pred_emb
GalileoModelLoggerAttributes.pred_spans
GalileoModelLoggerAttributes.pred_conf_prob
GalileoModelLoggerAttributes.pred_loss_prob
GalileoModelLoggerAttributes.pred_loss_prob_label
GalileoModelLoggerAttributes.probs
GalileoModelLoggerAttributes.logits
GalileoModelLoggerAttributes.ids
GalileoModelLoggerAttributes.split
GalileoModelLoggerAttributes.epoch
GalileoModelLoggerAttributes.log_helper_data
GalileoModelLoggerAttributes.inference_name
GalileoModelLoggerAttributes.get_valid()
TextNERModelLogger
- Module contents
BaseGalileoModelLogger
BaseGalileoModelLogger.log_file_ext
BaseGalileoModelLogger.embs
BaseGalileoModelLogger.logits
BaseGalileoModelLogger.probs
BaseGalileoModelLogger.ids
BaseGalileoModelLogger.split
BaseGalileoModelLogger.inference_name
BaseGalileoModelLogger.log()
BaseGalileoModelLogger.write_model_output()
BaseGalileoModelLogger.set_split_epoch()
BaseGalileoModelLogger.upload()
BaseGalileoModelLogger.get_model_logger_attr()
BaseGalileoModelLogger.convert_logits_to_prob_binary()
BaseGalileoModelLogger.convert_logits_to_probs()
Submodules#
dataquality.loggers.base_logger module#
- class BaseLoggerAttributes(value)#
Bases:
str, Enum
A collection of all default attributes across all loggers
- texts = 'texts'#
- labels = 'labels'#
- ids = 'ids'#
- split = 'split'#
- meta = 'meta'#
- prob = 'prob'#
- gold_conf_prob = 'gold_conf_prob'#
- gold_loss_prob = 'gold_loss_prob'#
- gold_loss_prob_label = 'gold_loss_prob_label'#
- pred_conf_prob = 'pred_conf_prob'#
- pred_loss_prob = 'pred_loss_prob'#
- pred_loss_prob_label = 'pred_loss_prob_label'#
- gold = 'gold'#
- embs = 'embs'#
- probs = 'probs'#
- logits = 'logits'#
- epoch = 'epoch'#
- aum = 'aum'#
- text_tokenized = 'text_tokenized'#
- gold_spans = 'gold_spans'#
- pred_emb = 'pred_emb'#
- gold_emb = 'gold_emb'#
- pred_spans = 'pred_spans'#
- text_token_indices = 'text_token_indices'#
- text_token_indices_flat = 'text_token_indices_flat'#
- log_helper_data = 'log_helper_data'#
- inference_name = 'inference_name'#
- image = 'image'#
- token_label_str = 'token_label_str'#
- token_label_positions = 'token_label_positions'#
- token_label_offsets = 'token_label_offsets'#
- label = 'label'#
- token_deps = 'token_deps'#
- text = 'text'#
- id = 'id'#
- token_gold_probs = 'token_gold_probs'#
- tokenized_label = 'tokenized_label'#
- input = 'input'#
- target = 'target'#
- generated_output = 'generated_output'#
- input_cutoff = 'input_cutoff'#
- target_cutoff = 'target_cutoff'#
- system_prompts = 'system_prompts'#
- x = 'x'#
- y = 'y'#
- data_x = 'data_x'#
- data_y = 'data_y'#
- static get_valid()#
- Return type:
List[str]
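As a minimal sketch of the pattern described above (a str-based Enum with a static get_valid() that returns a list of strings), the following uses a hypothetical ExampleLoggerAttributes class with only three members. It is not the library's implementation, and the body of get_valid() is an assumption based on its List[str] return type.

```python
from enum import Enum
from typing import List


class ExampleLoggerAttributes(str, Enum):
    """Hypothetical stand-in for BaseLoggerAttributes with three members."""

    texts = "texts"
    labels = "labels"
    ids = "ids"

    @staticmethod
    def get_valid() -> List[str]:
        # Assumption: the valid attributes are the string values of all members.
        return [attr.value for attr in ExampleLoggerAttributes]


valid = ExampleLoggerAttributes.get_valid()
```

Because the enum also subclasses str, each member compares equal to its plain string value, so ExampleLoggerAttributes.ids == "ids" holds.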
- class BaseGalileoLogger#
Bases:
object
An abstract base class that all model loggers and data loggers inherit from
- LOG_FILE_DIR = '/home/runner/.galileo/logs'#
- logger_config: BaseLoggerConfig = BaseLoggerConfig(labels=None, tasks=None, observed_num_labels=None, observed_labels=None, tagging_schema=None, last_epoch=0, cur_epoch=None, cur_split=None, cur_inference_name=None, training_logged=False, validation_logged=False, test_logged=False, inference_logged=False, exception='', helper_data={}, input_data_logged=defaultdict(<class 'int'>, {}), logged_input_ids=defaultdict(<class 'set'>, {}), idx_to_id_map=defaultdict(<class 'list'>, {}), conditions=[], report_emails=[], ner_labels=[], int_labels=False, feature_names=[], metadata_documents=set(), finish=<function BaseLoggerConfig.<lambda>>, existing_run=False, dataloader_random_sampling=False, remove_embs=False)#
- property proj_run: str#
Returns the project and run id
Example
proj-id/run-id
- property write_output_dir: str#
Returns the path to the output directory for the current run
Example
/Users/username/.galileo/logs/proj-id/run-id
- property split_name: str#
Returns the name of the current split
If the split is inference, it will return the name of the inference concatenated to the end of the split name
Example
training
inference_inf-name1
- property split_name_path: str#
Returns the path part of the current split
If the split is inference, it will return the name of the inference run after the split name
Example
training
inference/inf-name1
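The difference between split_name and split_name_path can be sketched as follows. ExampleLogger is a hypothetical class that holds only a split and an inference name. The "_" and "/" separators are taken from the examples above, and the rest is an assumption rather than the library's code.

```python
from typing import Optional


class ExampleLogger:
    """Hypothetical logger holding only a split and an inference name."""

    def __init__(self, split: str, inference_name: Optional[str] = None) -> None:
        self.split = split
        self.inference_name = inference_name

    @property
    def split_name(self) -> str:
        # For inference, append the inference name with an underscore.
        if self.split == "inference":
            return f"{self.split}_{self.inference_name}"
        return self.split

    @property
    def split_name_path(self) -> str:
        # For inference, the inference name becomes a nested path component.
        if self.split == "inference":
            return f"{self.split}/{self.inference_name}"
        return self.split


train = ExampleLogger("training")
inf = ExampleLogger("inference", "inf-name1")
```

For any split other than inference, both properties return the split name unchanged.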
- static get_valid_attributes()#
- Return type:
List[str]
- abstract validate_and_format()#
Validates parameters passed in during logging. Implemented by child classes
- Return type:
None
- set_split_epoch()#
Sets the split for the current logger
If the split is not set, it will use the split set in the logger config
- Return type:
None
- is_valid()#
- Return type:
bool
- classmethod non_inference_logged()#
Return true if training, test, or validation data is logged
If just inference data is logged then append data rather than overwriting. This flag is also used by the api to know which processing jobs to run.
- Return type:
bool
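A rough sketch of this check, assuming it only inspects the training_logged, validation_logged, and test_logged flags that appear in the logger_config default above. ExampleConfig and the free function are hypothetical stand-ins for the real config class and classmethod.

```python
from dataclasses import dataclass


@dataclass
class ExampleConfig:
    """Hypothetical subset of the logger config flags."""

    training_logged: bool = False
    validation_logged: bool = False
    test_logged: bool = False
    inference_logged: bool = False


def non_inference_logged(config: ExampleConfig) -> bool:
    # True as soon as any non-inference split has been logged.
    return any(
        [config.training_logged, config.validation_logged, config.test_logged]
    )


only_inference = ExampleConfig(inference_logged=True)
with_training = ExampleConfig(training_logged=True, inference_logged=True)
```

Under this assumption, a run that has logged only inference data returns False, which is the case where data is appended rather than overwritten.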
- abstract log()#
- Return type:
None
- static validate_task(task_type)#
Raises error if task type is not a valid TaskType
- Return type:
- upload()#
- Return type:
None
- classmethod get_all_subclasses()#
- Return type:
List[Type[TypeVar(T, bound=BaseGalileoLogger)]]
- classmethod get_logger(task_type)#
- Return type:
Type[TypeVar(T, bound=BaseGalileoLogger)]
- classmethod doc()#
- Return type:
None
- classmethod validate_split(split)#
Raises error if split is not a valid Split
- Return type:
str
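One way such a validation could work is sketched below. ExampleSplit is a hypothetical enum whose members mirror the training, validation, test, and inference flags in the logger config. Raising ValueError is an assumption, since the exception type is not stated here.

```python
from enum import Enum


class ExampleSplit(str, Enum):
    """Hypothetical stand-in for the library's Split type."""

    training = "training"
    validation = "validation"
    test = "test"
    inference = "inference"


def validate_split(split: str) -> str:
    # Look the value up in the enum; unknown values are rejected.
    try:
        return ExampleSplit(split).value
    except ValueError:
        valid = [s.value for s in ExampleSplit]
        raise ValueError(f"Split must be one of {valid}, got {split!r}") from None


checked = validate_split("training")
```

Returning the validated value as a plain string matches the str return type listed above.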
- classmethod check_for_logging_failures()#
When a threaded logging call fails, it sets logger_config.exception
If that field is set, this method raises an exception and stops the main process
- Return type:
None
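The pattern described here, where a worker thread records its failure and the main thread later re-raises it, can be sketched as follows. ExampleConfig, threaded_log, and the use of RuntimeError are hypothetical. Only the exception field with its empty-string default comes from the logger_config shown above.

```python
import threading
from dataclasses import dataclass


@dataclass
class ExampleConfig:
    """Hypothetical config with only the exception field."""

    exception: str = ""


config = ExampleConfig()


def threaded_log(fail: bool) -> None:
    # A worker cannot raise into the main thread, so it stores the message.
    try:
        if fail:
            raise RuntimeError("could not write batch")
    except Exception as err:
        config.exception = str(err)


def check_for_logging_failures() -> None:
    # Called from the main thread: surface any stored worker failure.
    if config.exception:
        raise RuntimeError(f"Logging failed in a worker thread: {config.exception}")


worker = threading.Thread(target=threaded_log, args=(True,))
worker.start()
worker.join()
```

Calling check_for_logging_failures() after the worker finishes raises in the main thread, so the failure is not silently lost.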
- classmethod is_hf_dataset(df)#
- Return type:
bool
Module contents#
- class BaseGalileoLogger#
Bases:
object
An abstract base class that all model loggers and data loggers inherit from
- LOG_FILE_DIR = '/home/runner/.galileo/logs'#
- logger_config: BaseLoggerConfig = BaseLoggerConfig(labels=None, tasks=None, observed_num_labels=None, observed_labels=None, tagging_schema=None, last_epoch=0, cur_epoch=None, cur_split=None, cur_inference_name=None, training_logged=False, validation_logged=False, test_logged=False, inference_logged=False, exception='', helper_data={}, input_data_logged=defaultdict(<class 'int'>, {}), logged_input_ids=defaultdict(<class 'set'>, {}), idx_to_id_map=defaultdict(<class 'list'>, {}), conditions=[], report_emails=[], ner_labels=[], int_labels=False, feature_names=[], metadata_documents=set(), finish=<function BaseLoggerConfig.<lambda>>, existing_run=False, dataloader_random_sampling=False, remove_embs=False)#
- split: Optional[str]#
- inference_name: Optional[str]#
- property proj_run: str#
Returns the project and run id
Example
proj-id/run-id
- property write_output_dir: str#
Returns the path to the output directory for the current run
Example
/Users/username/.galileo/logs/proj-id/run-id
- property split_name: str#
Returns the name of the current split
If the split is inference, it will return the name of the inference concatenated to the end of the split name
Example
training
inference_inf-name1
- property split_name_path: str#
Returns the path part of the current split
If the split is inference, it will return the name of the inference run after the split name
Example
training
inference/inf-name1
- static get_valid_attributes()#
- Return type:
List[str]
- abstract validate_and_format()#
Validates parameters passed in during logging. Implemented by child classes
- Return type:
None
- set_split_epoch()#
Sets the split for the current logger
If the split is not set, it will use the split set in the logger config
- Return type:
None
- is_valid()#
- Return type:
bool
- classmethod non_inference_logged()#
Return true if training, test, or validation data is logged
If just inference data is logged then append data rather than overwriting. This flag is also used by the api to know which processing jobs to run.
- Return type:
bool
- abstract log()#
- Return type:
None
- static validate_task(task_type)#
Raises error if task type is not a valid TaskType
- Return type:
- upload()#
- Return type:
None
- classmethod get_all_subclasses()#
- Return type:
List[Type[TypeVar(T, bound=BaseGalileoLogger)]]
- classmethod get_logger(task_type)#
- Return type:
Type[TypeVar(T, bound=BaseGalileoLogger)]
- classmethod doc()#
- Return type:
None
- classmethod validate_split(split)#
Raises error if split is not a valid Split
- Return type:
str
- classmethod check_for_logging_failures()#
When a threaded logging call fails, it sets logger_config.exception
If that field is set, this method raises an exception and stops the main process
- Return type:
None
- classmethod is_hf_dataset(df)#
- Return type:
bool