Analytics Overview

The Dataloop platform provides an analytics screen where metrics such as active users, annotator performance, total working time, and item annotation time can be tracked at the project, dataset, or task level to support business decisions.

To retrieve metrics from the API, send a JSON payload that specifies the metrics you want to track and the time period to analyze.

Example of Fetching Active Users in a Project

import dtlpy as dl

# Get the project whose metrics you want to query
project = dl.projects.get(project_name='my project')

# Query active users from project creation until now
payload = {
    "startTime": project.created_at,   # mandatory
    "endTime": None,                   # None defaults to the current timestamp
    "context": {"projectId": [project.id]},
    "measures": [{"measureType": "activeUsers"}]
}

# POST the query to the analytics endpoint
success, resp = dl.client_api.gen_request(req_type="post",
                                          path="/analytics/query",
                                          json=payload)
samples = resp.json()
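
Before working with the result, it is worth checking that the request succeeded. The sketch below assumes, as in the example later in this section, that the response is a list with one entry per requested measure and that each entry carries its data under a 'response' key:

# Sketch: basic sanity checks on the query result (assumes "success" is a bool
# and "resp" behaves like a requests.Response, as in the example above)
if not success:
    raise RuntimeError(f'Analytics query failed: {resp.text}')

# One entry is returned per requested measure; the data sits under 'response'
for sample in samples:
    print(sample['response'])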

Understanding the Payload

The JSON payload has four main keys:

  • startTime : A mandatory key representing the time from which the metrics need to be fetched. The API throws an error if this key is missing.
  • endTime : An optional key. If not provided, the API uses the current timestamp by default.
  • context : The dimensions for which the metrics should be fetched. The dimensions that can be passed to the API are listed below; they are optional and vary from query to query depending on the use case. Each dimension must be passed as a list of strings; otherwise, the API returns an error.

    Dimensions for Context

    • userId : string[]
    • orgId : string[]
    • projectId : string[]
    • accountId : string[]
    • datasetId : string[]
    • taskId : string[]
    • assignmentId : string[]
    • itemId : string[]
    • serviceId : string[]
    • podId : string[]
    • modelId : string[]
    • snapshotId : string[]
    • pipelineId : string[]
    • triggerId : string[]
    • pipelineExecutionId : string[]
    • nodeId : string[]
    • ontologyId : string[]
  • measures : Specifies the metrics to fetch. Each measure can include the keys measureType, params, page, pageSize, sortDirection, and timeGranularity; see the example payload after this list.

    TimeGranularity Type

    • SECOND = 'second'
    • MINUTE = 'minute'
    • HOUR = 'hour'
    • DAY = 'day'
    • WEEK = 'week'
    • MONTH = 'month'

    List of Measure Types

    • ANNOTATION_TIMELINE = 'annotationTimeline'
    • ITEM_STATUS_TIMELINE = 'itemStatusTimeline'
    • AVG_ANNOTATION_TIME_PER_LABEL = 'avgAnnotationTimePerLabel'
    • ITEM_ANNOTATION_DURATION = 'itemAnnotationDuration'
    • COUNT_ITEM_IN_ANNOTATION_TIME_BUCKET = 'countItemInAnnotationTimeBucket'
    • AVG_ITEM_ANNOTATION_TIME_PER_ANNOTATOR = 'avgItemAnnotationTimePerAnnotator'
    • ASSIGNMENT_STATS_COMPLETED_STATUS = 'assignmentStatsCompletedStatus'
    • ASSIGNMENT_STATS_ITEM_ACTIVE_TIME_STATS = 'assignmentStatsItemActiveTimeStats'
    • ASSIGNMENT_STATS_ITEM_TOTAL_TIME = 'assignmentStatsItemTotalTime'
    • ASSIGNMENT_STATS_ANNOTATION_ACTION_TIME_STATS = 'assignmentStatsAnnotationActionTimeStats'
    • ASSIGNMENT_STATS_ANNOTATION_CLASSIFY_BULK_STATS = 'assignmentStatsAnnotationClassifyBulkStats'
    • ASSIGNMENT_STATS_ACTIVE_TIME = 'assignmentStatsActiveTime'
    • ASSIGNMENT_STATS_STUDIO_ACTIVE_TIME = 'assignmentStatsStudioActiveTime'
    • ASSIGNMENT_START_TIME = 'assignmentStartTime'
    • ACTIVE_USERS = 'activeUsers'
    • LABELING_COUNTERS = 'labelingCounters'
    • LABELING_ACTIONS_PER_LABEL = 'labelingActionsPerLabel'
    • LABELING_TIME_PER_LABEL = 'labelingTimePerLabel'
    • LABELING_AVG_TIME_PER_LABEL = 'labelingAvgTimePerLabel'
    • USER_STATS_TASK_ACTIVITY_TIME = 'userStatsTaskActivityTime'
    • USER_STATS_ACTIVITY_TIME = 'userStatsActivityTime'
    • USER_STATS_ACTIVITY_TIME_BY_ROLE = 'userStatsActivityTimeByRole'
    • USER_STATS_ACTIVITY_TIME_BY_FIELD = 'userStatsActivityTimeByField'
    • USER_STATS_TOTAL_ACTIVITY_TIME = 'userStatsTotalActivityTime'
    • USER_STATS_STUDIO_TIME = 'userStatsStudioTime'
    • ISSUE_COUNTERS = 'issueCounters'
    • ISSUE_CORRECTION_TIME = 'issueCorrectionTime'
    • ISSUE_RAISE_TIME = 'issueRaiseTime'
    • ISSUE_RESOLVE_TIME = 'issueResolveTime'
    • ISSUE_APPROVAL_TIME = 'issueApprovalTime'
    • ISSUE_TIMELINE = 'issueTimeline'
    • ISSUE_PER_LABEL = 'issuePerLabel'
    • ISSUE_PER_ANNOTATOR = 'issuePerAnnotator'
    • SERVICE_REPLICA_STATUS = 'serviceReplicaStatus'
    • SERVICE_QUEUE_SIZE = 'serviceQueueSize'
    • SERVICE_NUMBER_OF_REPLICAS = 'serviceNumberOfReplicas'
    • SERVICE_USAGE = 'serviceUsage'
    • SERVICE_USAGE_PROJECTS = 'serviceUsageProjects'
    • SNAPSHOT_DATA = 'snapshotData'
    • EXECUTION_OVER_TIME = 'executionOverTime'
    • EXECUTION_DURATION = 'executionDuration'
    • EXECUTION_COUNT_BY_FUNCTIONS = 'executionCountByFunction'
    • EXECUTION_AVG_DURATION_BY_NODE = 'executionAvgDurationByNode'
    • PIPELINE_EXECUTION_AVG_DURATION = 'pipelineExecutionAvgDuration'
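
As an illustration of how these keys fit together, a payload that scopes a measure to a specific task and pages through day-level results could look like the sketch below. The taskId value, the page and pageSize numbers, and the choice of userStatsActivityTime are placeholders chosen for the example, not values prescribed by the API:

# Hypothetical payload combining context dimensions with paging, sorting,
# and time-granularity options (values are placeholders)
payload = {
    "startTime": project.created_at,
    "endTime": None,                          # defaults to the current timestamp
    "context": {
        "projectId": [project.id],            # dimensions are lists of strings
        "taskId": ["<task-id>"]
    },
    "measures": [
        {
            "measureType": "userStatsActivityTime",
            "timeGranularity": ["day"],
            "sortDirection": "descending",
            "page": 0,                        # assumed zero-based paging
            "pageSize": 100
        }
    ]
}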

The following code extracts the countItemInAnnotationTimeBucket metric shown on the analytics screen. In the payload, datasetId and userId are optional parameters.

  • If timeGranularity is not provided: the API defaults to "hour". In the example below, both "hour" and "day" are passed, so the response includes hour-level and day-level data.
  • If neither datasetId nor userId is provided: data is fetched for all items in all datasets of the given projectId.
  • If both parameters are provided: data is extracted specifically for the given dataset and user.
import dtlpy as dl
import pandas as pd

project = dl.projects.get(project_name='my project')
dataset = project.datasets.get(dataset_name='my dataset')

# Count items per annotation-time bucket, scoped to one dataset,
# at both hour and day granularity
payload = {
    "startTime": project.created_at,
    "endTime": None,
    "context": {
        "projectId": [project.id],
        "datasetId": [dataset.id]
    },
    "measures": [
        {
            "measureType": "countItemInAnnotationTimeBucket",
            "sortDirection": "descending",
            "timeGranularity": ["hour", "day"]
        }
    ]
}

success, resp = dl.client_api.gen_request(req_type="post",
                                          path="/analytics/query",
                                          json=payload)
samples = resp.json()

# One result entry is returned per requested time granularity:
# samples[0] holds the hour-level data, samples[1] the day-level data
if samples[0]['response']:
    hour_data = samples[0]['response']
    hour_df = pd.DataFrame.from_dict(data=hour_data)

if samples[1]['response']:
    day_data = samples[1]['response']
    day_df = pd.DataFrame.from_dict(data=day_data)
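
Once built, the DataFrames can be inspected or exported like any other pandas DataFrame. The snippet below assumes the conditions above were met so hour_df and day_df exist; the column names depend on the measure and are not assumed here:

# Quick look at the returned data (column names depend on the measure)
print(hour_df.head())
print(day_df.head())

# Optionally persist the day-level results for reporting
day_df.to_csv('count_items_per_annotation_time_bucket_daily.csv', index=False)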