dataquality.loggers package#
Subpackages#
- dataquality.loggers.data_logger package
- Subpackages
- Submodules
- dataquality.loggers.data_logger.base_data_logger module
  - BaseGalileoDataLogger: MAX_META_COLS, MAX_STR_LEN, MAX_DOC_LEN, LIMIT_NUM_DOCS, INPUT_DATA_BASE, STRING_MAX_SIZE_B, DATA_FOLDER_EXTENSION, INPUT_DATA_FILE_EXT, input_data_path, input_data_file(), log_data_sample(), log_data_samples(), log_dataset(), validate_ids_for_split(), add_ids_to_split(), log(), export_df(), support_embs, support_data_embs, apply_column_map(), upload(), upload_split(), create_and_upload_data_embs(), convert_large_string(), upload_split_from_in_frame(), create_in_out_frames(), process_in_out_frames(), upload_in_out_frames(), prob_only(), validate_and_format(), validate_labels(), validate_metadata(), get_data_logger_attr(), separate_dataframe(), validate_kwargs(), set_tagging_schema()
- dataquality.loggers.data_logger.image_classification module
  - ImageClassificationDataLogger: logger_config, DATA_FOLDER_EXTENSION, support_data_embs, log_image_dataset(), convert_large_string(), process_in_out_frames(), upload_split_from_in_frame(), add_cv_smart_features()
- dataquality.loggers.data_logger.object_detection module
  - GalileoDataLoggerAttributes
  - ODCols
  - ObjectDetectionDataLogger: logger_config, get_valid_attributes(), log_dataset(), log_image_samples(), convert_large_string(), prob_only(), create_and_upload_data_embs(), process_in_out_frames(), support_data_embs
- dataquality.loggers.data_logger.semantic_segmentation module
  - SemanticSegmentationDataLogger: logger_config, INPUT_DATA_FILE_EXT, log_dataset(), log_image_samples(), export_df(), upload_split_from_in_frame(), support_embs, support_data_embs
- dataquality.loggers.data_logger.tabular_classification module
- dataquality.loggers.data_logger.text_classification module
  - GalileoDataLoggerAttributes
  - TextClassificationDataLogger: logger_config, log_data_samples(), log_data_sample(), log_dataset(), get_valid_attributes(), validate_and_format(), validate_logged_labels(), separate_dataframe(), validate_labels()
- dataquality.loggers.data_logger.text_multi_label module
- dataquality.loggers.data_logger.text_ner module
  - GalileoDataLoggerAttributes: texts, text_token_indices, text_token_indices_flat, gold_spans, ids, split, meta, get_valid()
  - TextNERDataLogger: DATA_FOLDER_EXTENSION, logger_config, get_valid_attributes(), log_data_samples(), log_data_sample(), log_dataset(), validate_and_format(), process_in_out_frames(), separate_dataframe(), validate_labels(), is_valid_span_label(), set_tagging_schema(), support_data_embs, create_and_upload_data_embs()
- Module contents
  - BaseGalileoDataLogger: MAX_META_COLS, MAX_STR_LEN, MAX_DOC_LEN, LIMIT_NUM_DOCS, INPUT_DATA_BASE, STRING_MAX_SIZE_B, DATA_FOLDER_EXTENSION, INPUT_DATA_FILE_EXT, meta, input_data_path, input_data_file(), log_data_sample(), log_data_samples(), log_dataset(), validate_ids_for_split(), add_ids_to_split(), log(), export_df(), support_embs, support_data_embs, apply_column_map(), upload(), upload_split(), create_and_upload_data_embs(), convert_large_string(), upload_split_from_in_frame(), create_in_out_frames(), process_in_out_frames(), upload_in_out_frames(), prob_only(), validate_and_format(), validate_labels(), validate_metadata(), get_data_logger_attr(), separate_dataframe(), validate_kwargs(), set_tagging_schema(), split, inference_name
- dataquality.loggers.logger_config package
- Subpackages
- Submodules
- dataquality.loggers.logger_config.base_logger_config module
  - BaseLoggerConfig: conditions, cur_epoch, cur_inference_name, cur_split, dataloader_random_sampling, exception, existing_run, feature_names, finish, helper_data, idx_to_id_map, inference_logged, input_data_logged, int_labels, labels, last_epoch, logged_input_ids, metadata_documents, ner_labels, observed_labels, observed_num_labels, remove_embs, report_emails, tagging_schema, tasks, test_logged, training_logged, validation_logged, reset()
- dataquality.loggers.logger_config.image_classification module
- dataquality.loggers.logger_config.object_detection module
- dataquality.loggers.logger_config.semantic_segmentation module
- dataquality.loggers.logger_config.tabular_classification module
- dataquality.loggers.logger_config.text_classification module
- dataquality.loggers.logger_config.text_multi_label module
- dataquality.loggers.logger_config.text_ner module
- Module contents
- dataquality.loggers.model_logger package
- Subpackages
- Submodules
- dataquality.loggers.model_logger.base_model_logger module
  - BaseGalileoModelLogger: log_file_ext, log(), write_model_output(), set_split_epoch(), upload(), get_model_logger_attr(), convert_logits_to_prob_binary(), convert_logits_to_probs()
- dataquality.loggers.model_logger.image_classification module
- dataquality.loggers.model_logger.object_detection module
- dataquality.loggers.model_logger.semantic_segmentation module
  - SemanticSegmentationModelLogger: logger_config, validate_and_format(), local_dep_path, local_proj_run_path, local_contours_path, get_polygon_data()
- dataquality.loggers.model_logger.tabular_classification module
- dataquality.loggers.model_logger.text_classification module
- dataquality.loggers.model_logger.text_multi_label module
- dataquality.loggers.model_logger.text_ner module
  - GalileoModelLoggerAttributes: gold_emb, gold_spans, gold_conf_prob, gold_loss_prob, gold_loss_prob_label, embs, pred_emb, pred_spans, pred_conf_prob, pred_loss_prob, pred_loss_prob_label, probs, logits, ids, split, epoch, log_helper_data, inference_name, get_valid()
  - TextNERModelLogger
- Module contents
  - BaseGalileoModelLogger: log_file_ext, embs, logits, probs, ids, split, inference_name, log(), write_model_output(), set_split_epoch(), upload(), get_model_logger_attr(), convert_logits_to_prob_binary(), convert_logits_to_probs()
Submodules#
dataquality.loggers.base_logger module#
- class BaseLoggerAttributes(value)#
Bases: str, Enum
A collection of all default attributes across all loggers
- texts = 'texts'#
- labels = 'labels'#
- ids = 'ids'#
- split = 'split'#
- meta = 'meta'#
- prob = 'prob'#
- gold_conf_prob = 'gold_conf_prob'#
- gold_loss_prob = 'gold_loss_prob'#
- gold_loss_prob_label = 'gold_loss_prob_label'#
- pred_conf_prob = 'pred_conf_prob'#
- pred_loss_prob = 'pred_loss_prob'#
- pred_loss_prob_label = 'pred_loss_prob_label'#
- gold = 'gold'#
- embs = 'embs'#
- probs = 'probs'#
- logits = 'logits'#
- epoch = 'epoch'#
- aum = 'aum'#
- text_tokenized = 'text_tokenized'#
- gold_spans = 'gold_spans'#
- pred_emb = 'pred_emb'#
- gold_emb = 'gold_emb'#
- pred_spans = 'pred_spans'#
- text_token_indices = 'text_token_indices'#
- text_token_indices_flat = 'text_token_indices_flat'#
- log_helper_data = 'log_helper_data'#
- inference_name = 'inference_name'#
- image = 'image'#
- token_label_str = 'token_label_str'#
- token_label_positions = 'token_label_positions'#
- token_label_offsets = 'token_label_offsets'#
- label = 'label'#
- token_deps = 'token_deps'#
- text = 'text'#
- id = 'id'#
- token_gold_probs = 'token_gold_probs'#
- tokenized_label = 'tokenized_label'#
- input = 'input'#
- target = 'target'#
- generated_output = 'generated_output'#
- input_cutoff = 'input_cutoff'#
- target_cutoff = 'target_cutoff'#
- system_prompts = 'system_prompts'#
- x = 'x'#
- y = 'y'#
- data_x = 'data_x'#
- data_y = 'data_y'#
- static get_valid()#
- Return type:
List[str]
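Because the class derives from both str and Enum, each member compares equal to its plain string value. The sketch below illustrates this with a hypothetical three-member enum, DemoAttributes. The body of its get_valid() is an assumption about how such a helper could be written, not the library's implementation.

```python
from enum import Enum
from typing import List


class DemoAttributes(str, Enum):
    """Hypothetical stand-in for a str-based attribute enum."""

    texts = "texts"
    labels = "labels"
    ids = "ids"

    @staticmethod
    def get_valid() -> List[str]:
        # Assumed behavior: return the string value of every member.
        return [member.value for member in DemoAttributes]


# str-based members compare equal to their raw string values.
assert DemoAttributes.texts == "texts"
print(DemoAttributes.get_valid())
```

Mixing in str lets attribute names be used directly as dictionary keys or column names without calling .value.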
- class BaseGalileoLogger#
Bases: object
An abstract base class that all model loggers and data loggers inherit from
- LOG_FILE_DIR = '/home/runner/.galileo/logs'#
- logger_config: BaseLoggerConfig = BaseLoggerConfig(labels=None, tasks=None, observed_num_labels=None, observed_labels=None, tagging_schema=None, last_epoch=0, cur_epoch=None, cur_split=None, cur_inference_name=None, training_logged=False, validation_logged=False, test_logged=False, inference_logged=False, exception='', helper_data={}, input_data_logged=defaultdict(<class 'int'>, {}), logged_input_ids=defaultdict(<class 'set'>, {}), idx_to_id_map=defaultdict(<class 'list'>, {}), conditions=[], report_emails=[], ner_labels=[], int_labels=False, feature_names=[], metadata_documents=set(), finish=<function BaseLoggerConfig.<lambda>>, existing_run=False, dataloader_random_sampling=False, remove_embs=False)#
- property proj_run: str#
Returns the project and run id
Example
proj-id/run-id
- property write_output_dir: str#
Returns the path to the output directory for the current run
Example
/Users/username/.galileo/logs/proj-id/run-id
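The two examples above suggest that the output directory is the log directory followed by the project/run pair. A minimal sketch of that composition follows. The function names, their parameters, and the explicit log_dir argument are all hypothetical and chosen only for illustration.

```python
def proj_run(project_id: str, run_id: str) -> str:
    # Assumed format, taken from the documented example "proj-id/run-id".
    return f"{project_id}/{run_id}"


def write_output_dir(log_dir: str, project_id: str, run_id: str) -> str:
    # Assumed layout: <log_dir>/<project_id>/<run_id>.
    return f"{log_dir}/{proj_run(project_id, run_id)}"


print(write_output_dir("/Users/username/.galileo/logs", "proj-id", "run-id"))
```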
- property split_name: str#
Returns the name of the current split
If the split is inference, the inference name is appended to the split name, separated by an underscore
Example
training
inference_inf-name1
- property split_name_path: str#
Returns the path part of the current split
If the split is inference, the inference name follows the split name as a nested path component
Example
training
inference/inf-name1
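The difference between the two properties is only the separator used for inference splits. The following sketch reproduces the documented examples with two hypothetical free functions. The real properties read the split and inference name from the logger instance, and their implementation may differ.

```python
from typing import Optional


def split_name(split: str, inference_name: Optional[str] = None) -> str:
    # Inference splits get the inference name appended with an underscore.
    if split == "inference":
        return f"{split}_{inference_name}"
    return split


def split_name_path(split: str, inference_name: Optional[str] = None) -> str:
    # Inference splits get the inference name as a nested path component.
    if split == "inference":
        return f"{split}/{inference_name}"
    return split


print(split_name("inference", "inf-name1"))
print(split_name_path("inference", "inf-name1"))
```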
- static get_valid_attributes()#
- Return type:
List[str]
- abstract validate_and_format()#
Validates the parameters passed in during logging. Implemented by child classes
- Return type:
None
- set_split_epoch()#
Sets the split for the current logger
If the split is not set, it will use the split set in the logger config
- Return type:
None
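The fallback described for set_split_epoch() can be pictured as follows. This is a simplified sketch that covers only the split. DemoConfig, DemoLogger, and set_split() are hypothetical names, and the real method may also handle the epoch.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DemoConfig:
    """Hypothetical stand-in for the shared logger config."""

    cur_split: Optional[str] = None


class DemoLogger:
    def __init__(self, config: DemoConfig, split: Optional[str] = None) -> None:
        self.config = config
        self.split = split

    def set_split(self) -> None:
        # Fall back to the config's current split only when none was given.
        if self.split is None:
            self.split = self.config.cur_split


logger = DemoLogger(DemoConfig(cur_split="training"))
logger.set_split()
print(logger.split)
```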
- is_valid()#
- Return type:
bool
- classmethod non_inference_logged()#
Returns True if training, test, or validation data has been logged
If only inference data has been logged, data is appended rather than overwritten. This flag is also used by the API to decide which processing jobs to run.
- Return type:
bool
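The check can be read as a simple disjunction over the *_logged flags of the logger config. The sketch below uses a hypothetical DemoConfig dataclass that copies only those four flag names from BaseLoggerConfig. The function body is an assumption, not the library's code.

```python
from dataclasses import dataclass


@dataclass
class DemoConfig:
    """Hypothetical subset of the logger config flags."""

    training_logged: bool = False
    validation_logged: bool = False
    test_logged: bool = False
    inference_logged: bool = False


def non_inference_logged(config: DemoConfig) -> bool:
    # True as soon as any non-inference split has been logged.
    return (
        config.training_logged
        or config.validation_logged
        or config.test_logged
    )


print(non_inference_logged(DemoConfig(inference_logged=True)))
```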
- abstract log()#
- Return type:
None
- static validate_task(task_type)#
Raises an error if the task type is not a valid TaskType
- Return type:
- upload()#
- Return type:
None
- classmethod get_all_subclasses()#
- Return type:
List[Type[TypeVar(T, bound= BaseGalileoLogger)]]
- classmethod get_logger(task_type)#
- Return type:
Type[TypeVar(T, bound= BaseGalileoLogger)]
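One common way to implement this kind of lookup is to walk the class hierarchy recursively and match on a class-level identifier. The sketch below shows that pattern. The task_name attribute and all Demo* classes are hypothetical, and the library may key its loggers differently.

```python
from typing import List, Type


class DemoBase:
    task_name = ""  # hypothetical identifier set by each subclass

    @classmethod
    def get_all_subclasses(cls) -> List[Type["DemoBase"]]:
        # Depth-first walk so that indirect subclasses are included too.
        found: List[Type["DemoBase"]] = []
        for sub in cls.__subclasses__():
            found.append(sub)
            found.extend(sub.get_all_subclasses())
        return found

    @classmethod
    def get_logger(cls, task_name: str) -> Type["DemoBase"]:
        for sub in cls.get_all_subclasses():
            if sub.task_name == task_name:
                return sub
        raise ValueError(f"No logger registered for task {task_name!r}")


class DemoTextLogger(DemoBase):
    task_name = "text_classification"


class DemoMultiLabelLogger(DemoTextLogger):
    task_name = "text_multi_label"


print(DemoBase.get_logger("text_multi_label").__name__)
```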
- classmethod doc()#
- Return type:
None
- classmethod validate_split(split)#
Raises an error if the split is not a valid Split
- Return type:
str
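Validation against an enum is typically done by attempting to construct the member and converting the failure into a readable error. The following sketch assumes a hypothetical DemoSplit enum with four split names. The error type and message are illustrative only.

```python
from enum import Enum


class DemoSplit(str, Enum):
    """Hypothetical set of valid split names."""

    training = "training"
    validation = "validation"
    test = "test"
    inference = "inference"


def validate_split(split: str) -> str:
    try:
        # Enum lookup by value raises ValueError for unknown names.
        return DemoSplit(split).value
    except ValueError:
        valid = [member.value for member in DemoSplit]
        raise ValueError(f"Split must be one of {valid}, got {split!r}") from None


print(validate_split("training"))
```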
- classmethod check_for_logging_failures()#
When a threaded logging call fails, it sets logger_config.exception
If that field is set, this method raises an exception and stops the main process
- Return type:
None
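Exceptions raised inside a worker thread do not propagate to the thread that started it, which is why the failure is recorded on the shared config and checked later. The sketch below shows the general pattern with the standard threading module. DemoConfig, log_in_thread, and the RuntimeError are hypothetical and not the library's actual types.

```python
import threading
from typing import Optional


class DemoConfig:
    """Hypothetical shared config with an exception field."""

    def __init__(self) -> None:
        self.exception = ""


def log_in_thread(config: DemoConfig, payload: Optional[str]) -> None:
    try:
        if payload is None:
            raise ValueError("nothing to log")
    except Exception as err:
        # Record the failure instead of losing it inside the thread.
        config.exception = str(err)


def check_for_logging_failures(config: DemoConfig) -> None:
    # Surface any recorded worker failure in the calling thread.
    if config.exception:
        raise RuntimeError(f"Logging failed in a worker thread: {config.exception}")


config = DemoConfig()
worker = threading.Thread(target=log_in_thread, args=(config, None))
worker.start()
worker.join()
```

Calling check_for_logging_failures(config) after the join raises in the main thread, so a failed background log cannot go unnoticed.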
- classmethod is_hf_dataset(df)#
- Return type:
bool
Module contents#
- class BaseGalileoLogger#
Bases: object
An abstract base class that all model loggers and data loggers inherit from
- LOG_FILE_DIR = '/home/runner/.galileo/logs'#
- logger_config: BaseLoggerConfig = BaseLoggerConfig(labels=None, tasks=None, observed_num_labels=None, observed_labels=None, tagging_schema=None, last_epoch=0, cur_epoch=None, cur_split=None, cur_inference_name=None, training_logged=False, validation_logged=False, test_logged=False, inference_logged=False, exception='', helper_data={}, input_data_logged=defaultdict(<class 'int'>, {}), logged_input_ids=defaultdict(<class 'set'>, {}), idx_to_id_map=defaultdict(<class 'list'>, {}), conditions=[], report_emails=[], ner_labels=[], int_labels=False, feature_names=[], metadata_documents=set(), finish=<function BaseLoggerConfig.<lambda>>, existing_run=False, dataloader_random_sampling=False, remove_embs=False)#
- split: Optional[str]#
- inference_name: Optional[str]#
- property proj_run: str#
Returns the project and run id
Example
proj-id/run-id
- property write_output_dir: str#
Returns the path to the output directory for the current run
Example
/Users/username/.galileo/logs/proj-id/run-id
- property split_name: str#
Returns the name of the current split
If the split is inference, the inference name is appended to the split name, separated by an underscore
Example
training
inference_inf-name1
- property split_name_path: str#
Returns the path part of the current split
If the split is inference, the inference name follows the split name as a nested path component
Example
training
inference/inf-name1
- static get_valid_attributes()#
- Return type:
List[str]
- abstract validate_and_format()#
Validates the parameters passed in during logging. Implemented by child classes
- Return type:
None
- set_split_epoch()#
Sets the split for the current logger
If the split is not set, it will use the split set in the logger config
- Return type:
None
- is_valid()#
- Return type:
bool
- classmethod non_inference_logged()#
Returns True if training, test, or validation data has been logged
If only inference data has been logged, data is appended rather than overwritten. This flag is also used by the API to decide which processing jobs to run.
- Return type:
bool
- abstract log()#
- Return type:
None
- static validate_task(task_type)#
Raises an error if the task type is not a valid TaskType
- Return type:
- upload()#
- Return type:
None
- classmethod get_all_subclasses()#
- Return type:
List[Type[TypeVar(T, bound= BaseGalileoLogger)]]
- classmethod get_logger(task_type)#
- Return type:
Type[TypeVar(T, bound= BaseGalileoLogger)]
- classmethod doc()#
- Return type:
None
- classmethod validate_split(split)#
Raises an error if the split is not a valid Split
- Return type:
str
- classmethod check_for_logging_failures()#
When a threaded logging call fails, it sets logger_config.exception
If that field is set, this method raises an exception and stops the main process
- Return type:
None
- classmethod is_hf_dataset(df)#
- Return type:
bool