Selected Publications
-
Flexibility is essential for optimizing crowdworker performance in the digital labor market, and prior research shows that integrating diverse devices can enhance this flexibility. While studies on Amazon Mechanical Turk show the need for tailored workflows and varied device usage and preferences, it remains unclear if these insights apply to other platforms. To explore this, we conducted a survey on another major crowdsourcing platform, Prolific, involving 1,000 workers. Our findings reveal that desktops are still the primary devices for crowdwork, but Prolific workers display more diverse usage patterns and a greater interest in adopting smartwatches, smart speakers, and tablets compared to MTurk workers. While current use of these newer devices is limited, there is growing interest in employing them for future tasks. These results underscore the importance for crowdsourcing platforms to develop platform-specific strategies that promote more flexible and engaging workflows, better aligning with the diverse needs of their crowdworkers.
-
Crowdsourcing platforms have traditionally been designed with a focus on workstation interfaces, restricting the flexibility that crowdworkers need. Recognizing this limitation and the need for more adaptable platforms, prior research has highlighted the diverse work processes of crowdworkers, influenced by factors such as device type and work stage. However, these variables have largely been studied in isolation. Our study is the first to explore the interconnected variabilities among these factors within the crowdwork community. Through a survey involving 150 Amazon Mechanical Turk crowdworkers, we uncovered three distinct groups characterized by their interrelated variabilities in key work aspects. The largest group exhibits a reliance on traditional devices, showing limited interest in integrating smartphones and tablets into their work routines. The second-largest group also primarily uses traditional devices but expresses a desire for supportive tools and scripts that enhance productivity across all devices, particularly smartphones and tablets. The smallest group actively uses and strongly prefers non-workstation devices, especially smartphones and tablets, for their crowdworking activities. We translate our findings into design insights for platform developers, discussing the implications for creating more personalized, flexible, and efficient crowdsourcing environments. Additionally, we highlight the unique work practices of these crowdworker clusters, offering a contrast to those of more traditional and established worker groups.
-
Human feedback plays a critical role in learning and refining reward models for text-to-image generation, but the optimal form this feedback should take for learning an accurate reward function has not been conclusively established. This paper investigates the effectiveness of fine-grained feedback, which captures nuanced distinctions in image quality and prompt alignment, compared to traditional coarse-grained feedback (for example, thumbs up/down or ranking among a set of options). While fine-grained feedback holds promise, particularly for systems catering to diverse societal preferences, we show that demonstrating its superiority to coarse-grained feedback is not automatic. Through experiments on real and synthetic preference data, we surface the complexities of building effective models due to the interplay of model choice, feedback type, and the alignment between human judgment and computational interpretation. We identify key challenges in eliciting and utilizing fine-grained feedback, prompting a reassessment of its assumed benefits and practicality. Our findings call for careful consideration of feedback attributes: for example, fine-grained feedback can lead to worse models for a fixed budget in some settings, yet in controlled settings with known attributes, fine-grained rewards can indeed be more helpful. These results may beckon novel modeling approaches to appropriately unlock the potential value of fine-grained feedback in the wild.
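As a minimal sketch of the two feedback regimes contrasted above (the attribute names and weighting scheme are assumptions for illustration, not the paper's actual reward models):

```python
# Illustrative sketch only: how coarse and fine-grained feedback might be
# aggregated into a scalar reward. Attribute names ("quality", "alignment")
# and the linear weighting are hypothetical, not taken from the paper.

def coarse_reward(pairwise_wins, item):
    """Reward from thumbs-up/down style feedback: the fraction of
    head-to-head comparisons the item won."""
    outcomes = pairwise_wins.get(item, [])
    return sum(outcomes) / len(outcomes) if outcomes else 0.0

def fine_grained_reward(attribute_scores, weights):
    """Reward from per-attribute ratings, combined with weights that must
    themselves be learned or assumed: one source of the extra modeling
    burden the abstract describes."""
    return sum(weights[a] * s for a, s in attribute_scores.items())

# Coarse: image "A" won 2 of 3 comparisons.
print(coarse_reward({"A": [1, 1, 0]}, "A"))

# Fine-grained: separate 0-1 ratings for image quality and prompt alignment.
print(fine_grained_reward({"quality": 0.9, "alignment": 0.4},
                          {"quality": 0.5, "alignment": 0.5}))
```

The fine-grained variant carries more signal per judgment, but only if the weights reflect how annotators actually trade the attributes off, which is exactly the alignment question the experiments probe.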
-
Despite a plethora of research dedicated to designing HITs for non-workstation devices, there is a lack of research looking specifically into workers' perceptions of the suitability of these devices for managing and completing work. In this work, we fill this research gap by conducting an online survey of 148 workers on Amazon Mechanical Turk to explore (1) how crowdworkers currently use their non-workstation devices to complete and manage crowdwork, (2) what challenges they face in using those devices, and (3) to what extent they wish they could use those devices if their concerns were addressed. Our results show that workers unanimously favor using a desktop to complete and manage crowdwork. While workers occasionally use smartphones or tablets, they find their suitability marginal at best and have little interest in smart speakers and smartwatches, viewing them as unsuitable for crowdwork. When investigating the reasons for these views, we find that the key issue is that non-workstation devices lack the tooling necessary to automatically find and accept HITs, tooling that workers view as essential in their efforts to compete with bots in accepting high-paying work. To address this problem, we propose a new paradigm for finding, accepting, and completing crowdwork that puts crowdworkers on equal footing with bots in these tasks. We also describe future research directions for tailoring HITs to non-workstation devices and for definitively answering whether smart speakers and smartwatches have a place in crowdwork.
-
In our era of rapid technological advancement, the research landscape for writing assistants has become increasingly fragmented across various research communities. We seek to address this challenge by proposing a design space as a structured way to examine and explore the multidimensional space of intelligent and interactive writing assistants. Through community collaboration, we explore five aspects of writing assistants: task, user, technology, interaction, and ecosystem. Within each aspect, we define dimensions and codes by systematically reviewing 115 papers, while leveraging the expertise of researchers in various disciplines. Our design space aims to offer researchers and designers a practical tool to navigate, comprehend, and compare the various possibilities of writing assistants, and aid in the design of new writing assistants.
-
Image generation models are poised to become ubiquitous in a range of applications. These models are often fine-tuned and evaluated using human quality judgments that assume a universal standard, failing to consider the subjectivity of such tasks. To investigate how to quantify subjectivity, and the scale of its impact, we measure how assessments differ among human annotators across different use cases. Simulating the effects of ordinarily latent elements of annotators' subjectivity, we contrive a set of motivations (t-shirt graphics, presentation visuals, and phone background images) to contextualize a set of crowdsourcing tasks. Our results show that human evaluations of images vary within individual contexts and across combinations of contexts. Three key factors affecting this subjectivity are image appearance, image alignment with text, and representation of objects mentioned in the text. Our study highlights the importance of taking individual users and contexts into account, both when building and evaluating generative models.
-
Large Language Models (LLMs) are being increasingly utilized in various applications, with code generation being a notable example. While previous research has shown that LLMs can generate both secure and insecure code, the literature does not take into account what factors help generate secure and effective code. Therefore, in this paper we focus on identifying and understanding the conditions and contexts in which LLMs can be effectively and safely deployed in real-world scenarios to generate quality code. We conducted a comparative analysis of four advanced LLMs (GPT-3.5 and GPT-4 via ChatGPT, and Bard and Gemini from Google) using 9 separate tasks to assess each model's code generation capabilities. We contextualized our study to represent the typical use cases of a real-life developer employing LLMs for everyday tasks at work. Additionally, we place an emphasis on security awareness, which is represented through the use of two distinct versions of our developer persona. In total, we collected 61 code outputs and analyzed them across several aspects: functionality, security, performance, complexity, and reliability. These insights are crucial for understanding the models' capabilities and limitations, guiding future development and practical applications in the field of automated code generation.
-
The prevalence and impact of toxic discussions online have made content moderation essential. Automated systems can play a vital role in identifying toxicity and reducing the reliance on human moderators. However, identifying toxic comments for diverse communities continues to present challenges that are addressed in this paper. The two-part goal of this study is to (1) identify intuitive variances from annotator disagreement using quantitative analysis and (2) model the subjectivity of these variances. To achieve our goal, we published a new dataset with expert annotators' annotations and used two other public datasets to identify the subjectivity of toxicity. Leveraging a Large Language Model (LLM), we evaluate the model's ability to mimic diverse viewpoints on toxicity by varying the size of the training data and by using both the same set of annotators as in model training and a separate set of annotators as the test set. We conclude that subjectivity is evident across all annotator groups, demonstrating the shortcomings of majority-rule voting. Moving forward, subjective annotations should serve as ground-truth labels for training models for domains like toxicity in diverse communities.
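A minimal sketch of the study's core contrast (the data shapes and group names below are hypothetical examples, not the published dataset) shows how majority-rule voting collapses the annotator disagreement that per-group labels preserve:

```python
# Illustrative sketch: majority-vote labels vs. per-group "subjective"
# labels for toxicity annotation. Comment IDs and annotator group names
# are hypothetical.
from collections import Counter

annotations = {  # comment_id -> {annotator_group: toxic label (0/1)}
    "c1": {"group_a": 1, "group_b": 1, "group_c": 0},
    "c2": {"group_a": 0, "group_b": 1, "group_c": 1},
}

def majority_label(votes):
    """One 'ground truth' per comment via majority vote: the dissenting
    group's judgment is discarded entirely."""
    return Counter(votes.values()).most_common(1)[0][0]

def per_group_labels(votes):
    """Subjective alternative: keep one label per annotator group, so a
    model can be trained or evaluated against each viewpoint."""
    return dict(votes)

for cid, votes in annotations.items():
    print(cid, majority_label(votes), per_group_labels(votes))
```

Training on `per_group_labels` rather than `majority_label` is one way to realize the abstract's recommendation that subjective annotations serve as ground truth.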
-
Non-workstation devices provide mobility in the sense that they enable work and play in less constrained environments and configurations. Workers have different needs and preferences in terms of work practices that impact work-life balance. One way to increase mobility is to allow for more choice in both workstation and non-workstation devices. This paper investigates whether workers' preferences for using different devices are distinct. To understand whether these differences in preference exist, we use data from an exploratory qualitative survey of 150 crowdworkers from Amazon Mechanical Turk. We first identify the aspects that influence crowdworkers' preferences in using non-workstation devices. Next, our thematic analysis of open-ended responses indicates that different workers have different preferences for multi-device configurations. Finally, we discuss how these differing preferences would require support for different non-workstation devices.
-
There is a growing interest in extending crowdwork beyond traditional desktop-centric design to include mobile devices (e.g., smartphones). However, mobilizing crowdwork remains significantly tedious due to a lack of understanding about the mobile usability requirements of human intelligence tasks (HITs). We present a taxonomy of characteristics that defines the mobile usability of HITs for smartphone devices. The taxonomy is developed based on findings from a study of three consecutive steps. In Step 1, we establish an initial design of our taxonomy through a targeted literature analysis. In Step 2, we verify and extend the taxonomy through an online survey with Amazon Mechanical Turk crowdworkers. Finally, in Step 3 we demonstrate the taxonomy’s utility by applying it to analyze the mobile usability of a dataset of scraped HITs. In this paper, we present the iterative development of the taxonomy, highlighting the observed practices and preferences around mobile crowdwork. We conclude with the implications of our taxonomy for accessibly and ethically mobilizing crowdwork not only within the context of smartphone devices, but beyond them.
-
We present a design fiction, which is set in the near future as significant Mars habitation begins. Our goal in creating this fiction is to address current work-life issues on Earth and Mars in the future. With shelter-in-place measures, established norms of productivity and relaxation have been shaken. The fiction creates an opportunity to explore boundaries between work and life, which are changing with shelter-in-place and will continue to change. Our work includes two primary artifacts: (1) a propaganda recruitment poster and (2) a fictional narrative account. The former paints the work-life on Mars as heroic, fulfilling, and fun. The latter provides a contrast that depicts the lived experience of early Mars inhabitants. Our statement draws from our design fiction in order to reflect on the structure of work, stress identification and management, family and work-family communication, and the role of automation.
-
Crowdsourcing enables users (task requesters) to outsource complex tasks to an unspecified crowd of workers. To guarantee the quality of a crowdsourcing service, it is necessary to select the most appropriate workers to complete the tasks. To this end, the crowdsourcing platform (broker) must conduct mutual matching between tasks and workers based on the task requirements and worker preferences. However, both task requirements and worker preferences may contain sensitive information (e.g., the time and location of a task), which should not be revealed to the broker or other adversaries. In this paper, we propose a secure and efficient task matching scheme that enables the broker to conduct mutual matching between tasks and workers, according to task requirements and worker preferences with multiple keywords, while preserving the privacy of the keywords contained in those requirements and preferences. Specifically, we design a new multi-reader and multi-writer searchable encryption primitive that supports the batch matching of multiple keywords. The security proof shows that our proposed task matching scheme is provably secure in the random oracle model under the Bilinear Diffie-Hellman (BDH) assumption. The performance evaluation shows that our multi-keyword batch matching can significantly reduce computation cost compared to existing methods.
-
University campuses in India lack assistive systems that facilitate smooth navigation for visually impaired persons. This paper proposes the design of a system called Divya-Dristi that helps visually impaired persons navigate familiar environments such as a university campus. The system requires users to carry only an Android-based smartphone. Based on a 3-tier architecture, we aim to develop a conveyable, self-contained system that provides dynamic interactions. The major functional components of this wayfinding system are (a) an Android application for determining the user's position and orientation in space and for generating audio feedback, (b) a cloud-based geo-spatial data store for holding key location information and answering location queries, and (c) a SONAR-based sensor module attachment that helps users avoid possible obstacles during outdoor navigation through audio and haptic alerts. The system has been implemented and found to be convenient for users.
-
What are the dimensions of human intent, and how do writing tools shape and augment their expression? From papyrus to auto-complete, a major turning point was when Alan Turing famously asked, “Can Machines Think?” [30] If so, should we offload aspects of our thinking to machines, and what impact do they have in enabling our intentions? This paper adapts the Authorial Leverage framework [5], from the Intelligent Narrative Technologies literature, for evaluating recent generative model advancements. As access to Large Language Models (LLMs) becomes increasingly widespread, our evaluative frameworks must evolve in step. To do this, we discuss previous expert studies of deep generative models for fiction writers [6, 34] and playwrights [16], and propose both author- and audience-focused directions for furthering our understanding of the Authorial Leverage of LLMs, particularly in the domain of comedy writing.
-
Crowdsourcing enables users (task requesters) to outsource complex tasks to an unspecified crowd of workers. To select the most appropriate workers, the crowdsourcing platform (broker) must conduct mutual matching between task requesters and workers based on the task requirements and worker preferences. However, both task requirements and worker preferences may contain sensitive information (e.g., the time and location of a task), which should not be revealed to the broker or other adversaries. To this end, we propose a privacy-preserving task matching scheme that enables the broker to conduct mutual matching between tasks and workers according to the task requirements and worker preferences while preserving the privacy of the keywords they contain. We first design a privacy-preserving task matching scheme with single-keyword matching for multiple requesters and multiple workers. Specifically, a new searchable encryption primitive is designed to support privacy-preserving equality matching of a single keyword among multiple users. We further propose an efficient and privacy-preserving task matching scheme for conjunctive keyword matching, which not only improves matching efficiency through batch matching but also hides whether a keyword is associated with a task. The security proof shows that our proposed task matching schemes are provably secure in the random oracle model under the Bilinear Diffie-Hellman (BDH) assumption. The performance evaluation shows that the proposed schemes are efficient.
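To illustrate only the matching interface, the toy below blinds keywords with a keyed hash so the broker never sees plaintext. This is not the paper's pairing-based searchable encryption and does not achieve its security guarantees (for instance, a deterministic hash still lets the broker link equal keywords); keyword values and the shared key are hypothetical.

```python
# Toy sketch of broker-side conjunctive keyword matching over blinded
# keywords. NOT the paper's BDH-based scheme; for interface illustration only.
import hashlib
import hmac

SHARED_KEY = b"demo-key"  # hypothetical key shared by requesters and workers

def blind(keyword: str) -> str:
    """Hide a keyword from the broker behind a keyed hash (HMAC-SHA256)."""
    return hmac.new(SHARED_KEY, keyword.encode(), hashlib.sha256).hexdigest()

def batch_match(task_requirements, worker_preferences) -> bool:
    """Conjunctive matching: a worker matches a task if every blinded
    task keyword appears among the worker's blinded preferences."""
    return set(task_requirements) <= set(worker_preferences)

task = [blind("evening"), blind("image-labeling")]
worker = [blind("evening"), blind("image-labeling"), blind("remote")]
print(batch_match(task, worker))  # the broker matches without seeing plaintext
```

The actual scheme replaces the shared-key HMAC with a multi-user searchable encryption primitive, precisely so that no such common key has to be distributed and the linkage leakage above is avoided.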
-
Crowdworkers are drawn to the profession in part due to the flexibility it affords. However, the current design of crowdsourcing platforms limits this flexibility, so it is important to support the overall flexibility of crowdworkers. Incorporating a variety of device types into the workflow plays an important role in supporting this flexibility; however, each device type requires a tailored workflow. The standard workflow of crowdworkers consists of stages of work such as managing and completing tasks. I hypothesize that different devices have unique traits for task completion and task management, and in this dissertation I explore what those traits are. Future work can build upon this research by creating tailored workflows and interfaces to best support each device type. To achieve this, this dissertation introduces four pivotal contributions: (1) understanding the traits of task completion on smartphones, to support a tailored smartphone workflow in crowdwork; (2) understanding crowdworkers' current task completion and task management practices and expectations when working on smartphones, tablets, smart speakers, and smartwatches, to support crowdworkers' flexibility across all of these devices; (3) building on this broad understanding of practices and expectations across devices, identifying the systematic differences among crowdworkers in order to develop customizable support that matches workers' individual need for flexibility on crowdsourcing platforms; and (4) examining another popular crowdsourcing platform, Prolific, to understand the work practices of Prolific workers and to compare Prolific with Amazon MTurk, gaining a comprehensive understanding of the traits that support flexibility in different crowdsourcing environments.