As the use of Artificial Intelligence (AI) grows across sectors, researchers are increasingly studying its relevance, its biases, and other aspects. While AI tools have made many tasks more convenient, there is growing concern about the data used to train them, as it can influence the results they produce.
Manish Gupta, the director of Google Research India, recently addressed these issues during a developer event in Bengaluru hosted by Google. He discussed the improvement of data quality in Indian languages, Google’s efforts to address computing access challenges for Indian researchers, and the differences in handling AI bias between India and Western countries.
During the event, Google also announced that it was opening access to its Pathways Language Model (PaLM), a large language model, to Indian developers. Gupta acknowledged the challenges faced by researchers in Indian institutes regarding the availability of digitized datasets in local languages. However, he highlighted the significant progress made by multinational tech companies collaborating with Indian institutes to digitize large datasets in various Indian languages.
Gupta assured attendees that the data is now open-sourced, meaning it is freely accessible to academic researchers, startups, and large corporations. He emphasized that this is only the beginning, as more Indian language data will be added to their databases over the coming months and into next year, with several tech giants contributing.
Regarding AI bias, Gupta explained that the first step was to understand the issue within a non-Western context. He pointed out that most AI literature on bias, including research on race and gender-based biases, primarily focused on Western contexts until about two years ago. Gupta recognized the significance of societal context in India, where additional biases based on caste, religion, and other factors exist.
The technological gap between language models for Indian languages and those for more established languages like English posed a further challenge in addressing these biases. Google aimed to gain a deeper understanding of these biases and to address the limitations of language models in Indian languages, he added.