Average number of downloads per day, month, and year
To identify how many times and by how many users datasets are downloaded from FORSbase (RQ1), we analyse the download date, user ID, and dataset ID (Table 1, variables 1–3). Concerning the number of downloads, we analyse the full number and share of downloaded datasets as well as unique user-dataset downloads (Table 1, variable 3) to control for downloads of dataset updates and to exclude duplicate downloads. By analysing the unique user-dataset downloads, we can identify whether the same user downloaded the same dataset twice or downloaded two versions of it. Concerning the number of users, we analyse the average number of downloads for the registered users and the active users (Table 1, variable 2). Registered users are those who have registered with FORSbase for archiving and downloading data. The number of registered users was obtained from the archive personnel at the time of the data collection in 2020. Active users are those who downloaded data during the time window of the data collection. Each user is identified in the data by a unique user ID number automatically assigned by the system during registration (Table 1, variable 2).
To identify what type of datasets are downloaded from the archive most often (RQ2), we use the ID of the dataset, the type of dataset (quantitative or qualitative data), and the name of the dataset (Table 1, variables 3, 4, 5). The name of the downloaded dataset (Table 1, variable 5) was also used to study the 10 most downloaded datasets in more detail. For these datasets, descriptive details were traced from the FORSbase online catalogue.
To analyse what roles the users of the archive represent (RQ3), we use the role of the downloading user (Table 1). Originally, users were provided with a list of 11 roles from which they selected the most suitable one. For the analyses, some categories were combined to form a shorter list of seven roles (i.e., student, doctoral student, lecturer/post-doc, professor, other researcher/project manager, teacher, and non-academic).
Finally, to identify for what purposes datasets are downloaded (RQ4), we use information on the use purpose of the data, the research description, and whether a publication is expected (Table 1, variables 7, 8, 9). When downloading datasets from FORSbase, users were asked whether the dataset was downloaded for research or for teaching purposes (Table 1, variable 7). Although these categories did not suit students downloading datasets for their course work, students were forced to choose between the two options. Therefore, for the purposes of this study, a new use purpose type, “studying”, was constructed manually in two steps. First, all users who identified themselves as students were identified from the data (Table 1, variable 6). Second, the coding was assigned by thoroughly reading the research descriptions (Table 1, variable 8) written by the students to find out the purpose of the download. Based on these descriptions, we also categorised the sub-type of studying purpose where possible (e.g., bachelor’s theses, master’s theses). However, the research description was requested only for those downloads where users indicated research (Table 1, variable 7) as the purpose of the download. Consequently, this information is missing for the downloads where users indicated teaching as the purpose. This also applies to students who had selected teaching as the use purpose. These downloads were categorised as studying, as we assume that students do not yet teach but chose teaching because there was no option for studying. Downloads of data for doctoral dissertations were categorised as “research”.
Variable nine (Table 1) was used to study the purpose of research use of the dataset by asking whether the user expected a publication to result from the downloaded dataset. This information was requested only from those downloading data for research purposes. Thus, this information is missing for the downloads where users indicated teaching as the purpose.
For the analysis step, the data were gathered into one dataset and analysed with Stata 16. Given that we analyse the full data, we do not apply inferential statistics. Whenever we are interested in differences between groups, we apply bootstrapped 95 per cent stability intervals to indicate the precision of the estimates. Differences were then also tested using bootstrapping procedures, either with regression models (numbers of downloads per user group) or with tests on the equality of proportions [55] (intention to publish across user groups).
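As an illustration of the stability intervals described above, the following is a minimal Python sketch of a percentile bootstrap for a share. It is not the Stata code used in the analyses: the percentile method, the function name `bootstrap_share_interval`, and the fixed seed are assumptions made for the example. The input figures (638 of 6656 downloads) are those reported for the most downloaded dataset in Table 5.

```python
import random


def bootstrap_share_interval(successes, total, resamples=1000, level=0.95, seed=2020):
    """Percentile bootstrap interval for a share (successes / total).

    Hypothetical helper for illustration; not the authors' Stata procedure.
    """
    rng = random.Random(seed)  # fixed seed only to make the sketch reproducible
    # One record per download: 1 if it concerns the dataset of interest, else 0.
    records = [1] * successes + [0] * (total - successes)
    # Resample the downloads with replacement and recompute the share each time.
    shares = sorted(
        sum(rng.choices(records, k=total)) / total for _ in range(resamples)
    )
    # Cut off the same number of resampled shares in each tail.
    cut = round((1.0 - level) / 2.0 * resamples)
    lower = shares[cut]
    upper = shares[resamples - cut - 1]
    return lower, upper


# 638 of the 6656 downloads concern the most downloaded dataset (Table 5).
low, high = bootstrap_share_interval(638, 6656)
print(f"share = {638 / 6656:.3f}, 95% interval = ({low:.3f}, {high:.3f})")
```

With 1000 resamples, the resulting interval should lie close to the 8.9–10.3 per cent reported in Table 5, although the exact bounds depend on the random seed and on the interval method chosen.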
In February 2020, at the time of our data collection, FORSbase had 6628 registered users. The archive contained 725 datasets, the majority of which were quantitative. Within the time window, which covers 49 months, a total of 6656 downloads were made from FORSbase (Table 2). This results in an average of 136 downloads per month or 5 downloads per day. When incomplete months from 2016 and 2020 are excluded from our dataset, we cover a total of 6593 downloads over 47 months, leading to a mean of 140 downloads per month (range = 40–286, median = 122). From 2017 to 2018, the number of downloads increased by 18 per cent and from 2018 to 2019 by 16 per cent. The downloads per month show high volatility, as can be seen from Fig 2, which shows the downloads per month for fully covered months, i.e., March 2016 to January 2020, and a smoothed moving average. The figure shows an increase in downloads over time with a tendency to stabilise. Note that March, April, October, and November show the highest numbers of downloads while July and August show the lowest, reflecting the beginning of semesters and the semester break, respectively.
Smoothed moving average is calculated using weights as suggested in [ 56 ].
Year | Frequency | Percent | Avg. per month |
---|---|---|---|
2016 | 839 | 12.6 | 84 |
2017 | 1577 | 23.7 | 131 |
2018 | 1860 | 27.9 | 155 |
2019 | 2161 | 32.5 | 180 |
2020 | 219 | 3.3 | n/a |
Total | 6656 | 100.0 |
Notes. * The time window does not cover the full year for 2016 (February 29th–December 31st) and 2020 (January 1st–February 9th).
** Only full months are taken into account. January–February 2016 and January and February 2020 are excluded from the calculations.
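The averages and growth rates reported in the text can be re-derived from the yearly frequencies in Table 2. The short Python sketch below does this arithmetic; the variable names are illustrative, and the figures for fully covered months (6593 downloads over 47 months) are taken from the text rather than computed from the raw log, which is not reproduced here.

```python
from datetime import date

# Yearly download frequencies as reported in Table 2.
downloads_per_year = {2016: 839, 2017: 1577, 2018: 1860, 2019: 2161, 2020: 219}
total_downloads = sum(downloads_per_year.values())

# Full time window: 29 February 2016 to 9 February 2020, both days included.
days_in_window = (date(2020, 2, 9) - date(2016, 2, 29)).days + 1
per_day = total_downloads / days_in_window

# The text states that the full window covers 49 months.
per_month_all = total_downloads / 49

# Fully covered months only (March 2016 to January 2020), figures from the text.
per_month_full = 6593 / 47

# Year-on-year growth in the number of downloads.
growth_2017_2018 = downloads_per_year[2018] / downloads_per_year[2017] - 1
growth_2018_2019 = downloads_per_year[2019] / downloads_per_year[2018] - 1

print(f"per day: {per_day:.1f}")
print(f"per month (all months): {per_month_all:.0f}")
print(f"per month (full months): {per_month_full:.0f}")
print(f"growth 2017-2018: {growth_2017_2018:.0%}")
print(f"growth 2018-2019: {growth_2018_2019:.0%}")
```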
Of the 725 datasets archived in FORSbase, 470 datasets were downloaded at least once, representing 65 per cent of all archived datasets. One fifth of the downloaded datasets were downloaded once and 13 per cent twice. Consequently, 67 per cent were downloaded three times or more (see Table 3). Datasets, however, can be updated, and new versions are released. Users are informed so that they can download the new version. As a result, some datasets are downloaded more often than others. Additionally, users can download the same dataset twice (e.g., on two different workstations). To control for updates and to have a measure that better reflects the number of times a dataset is used (as opposed to downloaded), we identified duplicates, i.e., cases where the same user downloaded the same dataset twice or downloaded two versions of it. Each such case was counted as one unique user-dataset download (see Table 3, columns on the right). Both measures are somewhat imperfect. On the one hand, regarding the full count measure, a dataset that is published quickly and corrected afterwards will score more downloads than one that is not updated. On the other hand, regarding the corrected measure, the same user might download the same data multiple times for different persons, e.g., as teacher and student (a situation that is not compliant with the user agreement), or for different uses. Additionally, the database does not clearly define what a “version” is. It is usually an update of the same dataset, but for some studies a new version adds new waves to an existing dataset, while for other studies each new wave is archived as a new dataset. We did our best to control for the latter and tried to treat a study (and each wave) as a dataset if archived separately.
Number of Downloads | Frequency | Percentage of Total Archived Datasets | Percentage of Datasets Downloaded at Least Once | Frequency of Unique User-Dataset Downloads | Percentage of Unique User-Dataset Downloads | Percentage of Unique User-Datasets Downloaded at Least Once |
---|---|---|---|---|---|---|
0 | 255 | 35.2 | n/a | 255 | 35.2 | n/a |
1 | 101 | 13.9 | 21.5 | 106 | 14.6 | 22.6 |
2 | 55 | 7.6 | 11.7 | 68 | 9.4 | 14.5 |
3+ | 314 | 43.3 | 66.8 | 296 | 40.8 | 63.0 |
Total | 725 | 100.0 | 100.0 | 725 | 100 | 100 |
* Column sums exceeding 100% are due to rounding
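The difference between the two measures in Table 3 can be sketched in a few lines of Python. The log records below are hypothetical and only illustrate the counting rule described in the text (repeated downloads, or downloads of several versions, of the same dataset by the same user count as one unique user-dataset download); the record layout and the function name `count_downloads` are assumptions, not the structure of the FORSbase log.

```python
from collections import Counter

# Hypothetical log records: (user_id, dataset_id, version).
log = [
    (1, "A", 1),
    (1, "A", 2),  # same user, new version of the same dataset
    (1, "A", 2),  # same user, repeated download of the same version
    (2, "A", 2),
    (2, "B", 1),
    (3, "B", 1),
]


def count_downloads(records):
    """Return (all downloads, unique user-dataset downloads) per dataset."""
    all_downloads = Counter(dataset for _, dataset, _ in records)
    # Ignoring the version collapses duplicates and updates into one pair.
    unique_pairs = {(user, dataset) for user, dataset, _ in records}
    unique_downloads = Counter(dataset for _, dataset in unique_pairs)
    return all_downloads, unique_downloads


all_counts, unique_counts = count_downloads(log)
print(dict(all_counts))
print(dict(unique_counts))
```

In this toy log, dataset A has four downloads but only two unique user-dataset downloads, whereas dataset B has two under both measures.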
Table 4 shows that the main download statistics differ only slightly between the two measures. The mean amounts to 9 downloads per dataset (8 if only unique user-dataset downloads are counted), but the distribution is highly skewed, with a first quartile of 0 downloads, a median of 2 downloads, and a third quartile of 6 downloads irrespective of how dataset downloads are counted.
Measure | Mean | 1st Quartile | Median | 3rd Quartile | Min | Max |
---|---|---|---|---|---|---|
All Downloads | 9.2 | 0 | 2 | 6 | 0 | 638 |
Unique User-Dataset Downloads | 8.1 | 0 | 2 | 6 | 0 | 527 |
FORSbase allows the archiving of both quantitative and qualitative data. It has been possible to archive qualitative data only since 2017. Of the 725 datasets, only 15 were archived as qualitative datasets, which corresponds to 2 per cent. Of the 470 datasets that were downloaded at least once, 5 were qualitative datasets (1%). At the level of downloads, the vast majority (98%) of the downloads concerned quantitative datasets. Qualitative datasets were downloaded only 15 times (13 times if we consider only unique user-dataset downloads). Of these five datasets, two were downloaded once, two twice, and one nine times (7 times if only unique user-dataset downloads are counted).
Ten datasets were downloaded more than 100 times (see Table 5 ). Downloads for these 10 datasets represent almost 40 per cent of all downloads from FORSbase in the given time window. FORS was the collector of eight out of the ten most downloaded datasets. The other two datasets were collected by Swiss universities. The most downloaded datasets were all quantitative and either cumulative datasets or single year issues of longitudinal (cross-sectional or panel) surveys collected at regular intervals. Those surveys can be considered social sciences data infrastructures of national or even international importance and are designed for secondary data analysis.
Title of the dataset | Number of downloads | Percentage of total downloads (N = 6656) | Number of unique user-dataset downloads | Percentage of unique user-dataset downloads (N = 5842) | Collector |
---|---|---|---|---|---|
1. SHP Data Waves 1–19 | 638 | 9.6 (8.9–10.3) | 527 | 9.0 (8.3–9.8) | FORS |
2. Selects 2015 Post-electoral study | 400 | 6.0 (5.5–6.6) | 308 | 5.3 (4.7–5.9) | FORS |
3. CCS Wave II, Cumulative Dataset 2013–2018 | 268 | 4.0 (3.6–4.5) | 206 | 3.5 (3.1–4.0) | FORS |
4. CCS Wave I, Cumulative Dataset 2005–2013 | 265 | 4.0 (3.5–4.5) | 212 | 3.6 (3.2–4.1) | FORS |
5. Selects, cumulated file 1971–2015 | 222 | 3.3 (2.9–3.8) | 185 | 3.2 (2.7–3.6) | FORS |
6. Selects 2015 Panel / Rolling cross-section study | 216 | 3.2 (2.8–3.7) | 169 | 2.9 (2.5–3.4) | FORS |
7. TREE, cohort 1 | 213 | 3.2 (2.8–3.7) | 172 | 2.9 (2.5–3.4) | University of Bern |
8. Selects 2015 Candidate survey | 185 | 2.8 (2.4–3.2) | 164 | 2.8 (2.4–3.3) | FORS |
9. VoxIt: standardized post-vote surveys | 124 | 1.9 (1.6–2.2) | 106 | 1.8 (1.5–2.2) | Universities of Geneva and Zurich, FORS |
10. Swiss Volunteering Survey 2016 | 120 | 1.8 (1.5–2.2) | 109 | 1.9 (1.5–2.3) | University of Bern |
Total | 2651 | 39.8 | 2158 | 36.9 |
* Traced from FORSbase online catalogue.
** Bootstrapped 95% stability intervals based on 1000 resamples
*** Column sums differing from cell sums are due to rounding
The most downloaded dataset, SHP Data Waves 1–19, is the annual Swiss Household Panel study, which is based on a random sample of private households in Switzerland and interviews all household members, mainly by telephone. SHP is provided free of charge from FORSbase for the scientific community [57]. The other datasets are related to Swiss elections or popular votes (datasets 2, 3, 4, 5, 6, 9) or to education and civil society (datasets 7, 10).
The fact that the share of the ten most downloaded datasets decreases slightly if duplicates and versions of the same dataset are excluded (Table 5, “Percentage of total downloads” vs. “Percentage of unique user-dataset downloads”) indicates that the most downloaded datasets are updated more often than the other datasets. However, the ranking of the most downloaded datasets does not change substantially, showing that duplicates and versions are spread quite evenly across those highly downloaded datasets. The bootstrapped 95 per cent stability intervals (see Table 5, column 3, in brackets) show that the ranking consists of four parts: a clear leader (dataset 1), a clear second place (dataset 2), a middle group (datasets 3 to 8), and a fourth group formed by datasets 9 and 10.
During the examined time window, 2281 unique users downloaded data from FORSbase. These users are called “active users” in Table 6. In February 2020, there were 6628 registered users in FORSbase. Thus, only a third of the registered users downloaded a dataset during the time window (note that one also needs to register as a user to upload data). About half of the active users downloaded only one dataset during the time window (Table 6, right-hand column). One fifth downloaded two datasets and 28 per cent downloaded three or more datasets. There was a group of heavy users downloading five or more datasets (5% of the registered users and 13% of the active users). At the upper end of the scale, one user downloaded 149 datasets during the time window. The group of 306 users downloading at least five datasets accounted for more than half (51.7%) of all downloads during the time window. On average, considering all registered users, one user downloaded one dataset, while considering only active users, a user downloaded 2.9 datasets.
Number of Downloaded Datasets | Number of Registered Users | Percentage of Registered Users | Percentage of Active Users |
---|---|---|---|
0 | 4347 | 65.6 | n/a |
1 | 1187 | 17.9 | 52.0 |
2 | 457 | 6.9 | 20.0 |
3 | 210 | 3.2 | 9.2 |
4 | 121 | 1.8 | 5.3 |
5+ | 306 | 4.6 | 13.4 |
Total | 6628 | 100.0 | 100.0 |
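The shares and per-user averages discussed for Table 6 follow directly from the user counts in the table and the total of 6656 downloads. The Python sketch below reproduces this arithmetic; the dictionary layout and variable names are illustrative assumptions.

```python
# Number of registered users by number of downloaded datasets (Table 6).
users_by_downloads = {"0": 4347, "1": 1187, "2": 457, "3": 210, "4": 121, "5+": 306}
total_downloads = 6656  # all downloads in the time window (Table 2)

registered_users = sum(users_by_downloads.values())
active_users = registered_users - users_by_downloads["0"]

# Percentages relative to all registered users and to active users only.
pct_registered = {
    group: 100 * n / registered_users for group, n in users_by_downloads.items()
}
pct_active = {
    group: 100 * n / active_users
    for group, n in users_by_downloads.items()
    if group != "0"  # users without downloads are not active users
}

avg_per_registered = total_downloads / registered_users
avg_per_active = total_downloads / active_users

print(f"active users: {active_users} of {registered_users}")
print(f"downloads per registered user: {avg_per_registered:.1f}")
print(f"downloads per active user: {avg_per_active:.1f}")
```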
Looking at unique user-dataset downloads ( Table 7 ), 58 per cent of the active users downloaded only one unique dataset whereas 21 per cent downloaded two and 22 per cent three or more. The group of heavy users (5+ downloaded datasets) amounts to 4 per cent of all registered users and 11 per cent of the active users. The person who downloaded most datasets downloaded 140 unique datasets. If only unique user-dataset downloads are considered, the average is 0.9 downloads per registered user and 2.6 downloads per active user.
Number of Downloaded Datasets | Number of Registered Users Downloading Unique Datasets | Percentage of Registered Users Downloading Unique Datasets | Percentage of Active Users Downloading Unique Datasets |
---|---|---|---|
0 | 4347 | 65.6 | n/a |
1 | 1311 | 19.8 | 57.5 |
2 | 474 | 7.2 | 20.8 |
3 | 160 | 2.4 | 7.0 |
4 | 95 | 1.4 | 4.2 |
5+ | 241 | 3.6 | 10.6 |
Total | 6628 | 100 | 100 |
A clear majority of users downloaded only quantitative datasets (99%); 8 users downloaded both quantitative and qualitative data and 4 users downloaded only qualitative data.
Regarding the role of users, the majority of the downloads were made by users registered as students, while doctoral students, lecturers/postdocs and professors and other researchers were downloading less, and teachers and non-academics the least ( Table 8 ).
User group | Frequency | Percent |
---|---|---|
Student | 3874 | 58.2 |
Doctoral student | 954 | 14.3 |
Lecturer / post-doc | 603 | 9.1 |
Professor | 513 | 7.7 |
Other researcher, project manager | 403 | 6.1 |
Teacher | 196 | 2.9 |
Non-academic | 113 | 1.7 |
Total | 6656 | 100.0 |
Regarding download frequency across user groups, students were more likely to download many datasets compared to scholars, teachers, and non-academics (see Fig 3). Note that, using bootstrapped regression, only the differences between students on the one hand and scholars, teachers, and non-academics on the other were significant. If one takes only unique user-dataset downloads into account, students downloaded significantly more unique datasets than all other groups except for non-academics (as the latter show large variability). However, the user roles are not clear-cut entities, as the same person can indicate a different role for each download. This means that for unique user-dataset downloads only the first role is retained.
Average number of Downloads per user group with bootstrapped 95% stability intervals using 1000 resamples on the basis of a) all downloads and b) only unique user-dataset downloads.
The majority of the downloads were made for studying purposes (see Table 9). Of the downloads made for studying purposes, at least 13 per cent (n = 497) were for a bachelor’s thesis and at least 12 per cent (n = 452) for a master’s thesis (together, 14.3% of all downloads were made for a bachelor’s or master’s thesis). However, these numbers represent minima because not all users described their purpose of download in such detail, and the users who did not describe the purpose in detail might have used the data for a thesis as well.
Purpose of download | Frequency | Percent |
---|---|---|
Studying | 3878 | 58.3 |
Research | 2565 | 38.5 |
Teaching | 213 | 3.2 |
Total | 6656 | 100 |
Almost 40 per cent of the downloads served research purposes. Of the downloads made for research, at least 5 per cent were for a doctoral thesis (2% of the total downloads). However, the real share of downloads for doctoral theses is probably much higher, since more than 14 per cent of the downloads were made by users registered as doctoral students.
Finally, only 3 per cent of the downloads served teaching purposes. This is surprising given that the biggest user group is students, and one would expect that it is the teachers who inform students about the dataset(s) used in the courses. However, users can only indicate one purpose for the download but can of course use the data for many purposes after download. Also, it might mean that some teachers invite students to download the data themselves, while others download the data and distribute them to the students, which would mean that even more users would be students, as the data cover only those students who downloaded the data themselves.
Users downloading datasets were also asked whether they expected to write publications using the downloaded dataset. This was asked only if they indicated that they were using the data for research and not teaching. Also, the question has a high share of non-response (463 or 7% of those who indicated research as the use of the download). Of those who replied to the question, a large majority (77.4%) did not expect to publish and just over one fifth expected to do so. Those downloading the dataset for research purposes were most likely to expect to write a publication (43%). As expected, professors, lecturers/post-docs, and doctoral students expected a publication more often than students (Table 10). Indeed, professors, lecturers/post-docs and, more unexpectedly, non-academics have a similar percentage intending to publish, as the bootstrapped differences between them are not significant. All other groups differ significantly from these three groups and from each other. The relationship between role and intention to publish is quite strong, with a Cramér’s V of 0.43.
User role | Percentage intending to publish | Bootstrapped 95% Stability Intervals |
---|---|---|
Professor (n = 513) | 47.2 | 42.9–51.5 |
Lecturer / post-doc (n = 603) | 48.1 | 44.0–52.3 |
Doctoral student (n = 954) | 40.5 | 37.5–43.5 |
Other researcher, project manager (n = 403) | 31.8 | 27.4–36.4 |
Student (n = 3343) | 7.2 | 6.4–8.1 |
Non-Academic (n = 96) | 52.1 | 42.2–61.8 |
Total (N = 5912) | 22.6 | Cramér’s V = 0.43 |
Note. Bootstrapped stability intervals were calculated using 1000 resamples.
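To make the reported strength of association easier to follow, the sketch below recomputes Cramér’s V from Table 10 in Python. The cell counts are not taken from the original data: they are reconstructed by multiplying each group size by its reported percentage and rounding, which is an assumption of this example, and the function name `cramers_v` is illustrative. The formula used is the standard one, the square root of chi-square divided by N times the smaller of (rows − 1) and (columns − 1).

```python
from math import sqrt

# (group size, percentage intending to publish) per user role, from Table 10.
table10 = {
    "Professor": (513, 47.2),
    "Lecturer / post-doc": (603, 48.1),
    "Doctoral student": (954, 40.5),
    "Other researcher, project manager": (403, 31.8),
    "Student": (3343, 7.2),
    "Non-academic": (96, 52.1),
}


def cramers_v(rows):
    """Cramér's V for a contingency table given as a list of rows of counts."""
    n = sum(sum(row) for row in rows)
    row_totals = [sum(row) for row in rows]
    col_totals = [sum(col) for col in zip(*rows)]
    # Pearson chi-square: squared deviations from the expected cell counts.
    chi2 = sum(
        (observed - rt * ct / n) ** 2 / (rt * ct / n)
        for row, rt in zip(rows, row_totals)
        for observed, ct in zip(row, col_totals)
    )
    k = min(len(rows), len(col_totals)) - 1
    return sqrt(chi2 / (n * k))


# Reconstructed counts (assumption): [intends to publish, does not intend].
counts = []
for size, pct in table10.values():
    intending = round(size * pct / 100)
    counts.append([intending, size - intending])

v = cramers_v(counts)
print(f"Cramér's V = {v:.2f}")
```

Under these reconstructed counts, the result should agree with the reported value of 0.43 up to rounding.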
This study investigated whether there is a demand for open data in the social sciences by examining the use and users of a research data archive. It continued a discussion started by Late and Kekäläinen [15], who studied the use of social science research data archives based on user log data. The results show that there is a demand for research data, as datasets have been downloaded frequently from FORSbase, i.e., on average 145 downloads per month. As in Finland [15], the number of downloads has increased in Switzerland from 2016 to 2019. During the time window of the study, a large majority (65%) of the datasets archived in FORSbase were downloaded at least once. The share of downloaded datasets was similar to the Finnish results (70%) [15].
An overwhelming majority of the downloaded datasets are quantitative. The number of archived qualitative datasets in FORSbase is very low, which explains the low number of downloads. Earlier studies have discussed the obstacles to data sharing and re-use in the social sciences [38–40, 58]. Our results suggest that there might be strong differences in the habit of downloading open data from repositories across different specialisations: in qualitative social sciences, data sharing seems to be far less prominent than in quantitative social sciences. There is little evidence about the re-use of qualitative datasets, and further studies are needed to understand the potential and pitfalls of open data policies for qualitative studies [53, 58]. The lack of data sharing and re-use certainly has several reasons, but ethical issues play an important role [59].
In this study, of the 725 archived datasets, the ten most frequently downloaded ones were investigated in more detail. Each of these datasets was downloaded more than 100 times, the most popular being downloaded more than 600 times. The downloads of these ten datasets amount to almost 40 per cent of all downloads from the archive, which indicates that, similar to publications [60], a small share of datasets gains most of the attention. The same phenomenon was observed by Late and Kekäläinen [15]. The most frequently downloaded datasets share a few properties: all of them are longitudinal or time-series survey data collected not by individual scholars or research groups but by organizations or consortia such as FORS. Also, those datasets stem from local survey projects, and the analysed archive, FORSbase, is the main source for obtaining these data. International longitudinal or time-series datasets were not among the ten most downloaded, even though local versions of these datasets would be available in the archive. Researchers interested in those cross-national datasets are more likely to download the datasets containing data from several countries from the international repository. Again, these results are in line with the study of Late and Kekäläinen [15]. In Finland, the most downloaded datasets were local and national surveys. However, in the Finnish archive, the most downloaded datasets also included large international statistics collected by a single scholar. Qualitative datasets were also downloaded more often from the Finnish archive than from the Swiss archive.
The fact that the most downloaded datasets were collected by prestigious and well-known organizations is in line with the argument raised in earlier studies [5, 9] that scholars’ trust in data is essential for data re-use. However, what is considered trustworthy may differ between disciplines. For social scientists, reputation along with the data selection and cleaning process plays an important role in trust creation [61]. Systematic documentation and the provision of high-quality paradata (i.e., data about the data) are valued by data users [8, 9, 12, 62]. Other factors influencing users’ trust in data archives are recommendations, frequency of use, past experiences, and perceptions of the role of the archive [10]. However, frequently downloaded datasets are probably better known and thus more visible to users. Data findability is another critical point for data re-use that should be supported better [12, 52]. Furthermore, archives can increase their own visibility and prestige by archiving high-quality and well-known datasets and by establishing collection strategies and profiling for certain topics and data types to gain competitive advantage and reputation. However, the value of unused (or non-downloaded) datasets cannot be overlooked, since they may become valuable in the future as needs are difficult to predict (cf. delayed recognition in science [63]).
Earlier studies have not investigated the number of users of data archives, although it can be considered an important metric for evaluating the impact of archives. Our results show that FORSbase was used by more than 2000 unique users, as one third of the registered users downloaded data from FORSbase. Most of them downloaded only one dataset. However, there was a smaller group of heavy users of the archive who downloaded several datasets and accounted for a remarkable share of all downloads. This might be an indication of field-specific differences; in some fields of the social sciences, data can be and are re-used more often. It might also indicate personal differences between the users. Users who have found datasets useful come back to download more relevant data or new versions of the datasets. Indeed, other studies have shown that scholars sharing their data are also more active re-users of data shared by others [12]. Our results show, however, that not all registered users download data, which might indicate that some users of FORSbase use it for archiving, not data retrieval. Late and Kekäläinen [15] showed that users represented several countries, disciplines, and organisations. Our data did not allow for such analyses.
Earlier research has focused mainly on scholars’ data sharing and re-use practices and has shown experienced scholars to be the most active data re-users [12]. Yet, our findings confirm the results of Late and Kekäläinen [15] that students form the largest user group of the data archive. Students should be given special consideration by data archives and service providers, since there is great potential in this user group as future data users and providers. Re-using data is important for developing knowledge creation skills and for socialising into the discipline [48]. Novice users have specific needs for data re-use and are influenced by the experiences of their mentors [8]. Therefore, data archives need to consider carefully what services could be offered specifically to students and what guidance students need. More research, for example on the data management skills of students, is certainly needed. This is relevant not only for students who want to become future academics: data are becoming an important part of many professions in a digitalised society, and skills in data use, management, archiving, and documenting will be relevant competences that students need to learn. Scholars also wish for training in data management skills [64]. The role of data archives, along with data managers and libraries, has been identified as central in fostering such skills [17].
Only three per cent of the downloads served teaching purposes. However, studies by Late and Kekäläinen [15] and Bishop and Kuula-Luumi [53] show a higher share of downloads for teaching purposes from Finnish and UK archives. There might be several reasons for the difference. For one, users of FORSbase can only indicate one use purpose per download, while they could use the data for several purposes. Researchers can download a dataset for a research project and then use this project and the dataset in teaching without re-downloading the data and registering teaching as the purpose. Also, they may ask the students to download the data, for example, in a research methods seminar. The high share of students among the users suggests that teaching is a frequent use of the datasets downloaded from FORSbase. However, an important question for future research is what data re-use means in teaching. Is it mainly a means to teach research methods, or also a way to replicate studies and foster the idea of responsible research already in teaching? Familiarising students with open research infrastructures might be an effective way to promote open science ideals.
More than one third of the downloads were made for research purposes. The share of research use was lower in the study by Late and Kekäläinen [15], covering only one fifth of the total use. In the Swiss archive, somewhat less than half (43%) of the downloads for research were expected to result in a publication. Professors, lecturers, and post-doctoral scholars were most likely to plan to use the dataset for a publication. However, there is little evidence about how often re-used data are actually utilised in publications and for what purposes data are used [65]. Unfortunately, our data provide no further information on research purposes other than publications. Regarding Responsible Research and Innovation, it would be interesting to follow how often data are re-used for validation or replication purposes rather than publication.
Regarding the policy demand for open science and open data, the valorisation of data sharing becomes relevant. Data stewardship is not yet a relevant aspect in academic career development, which might hinder the motivation to share and document data sufficiently [ 36 , 39 ]. However, European guidelines for responsible research assessment have already included data and data sharing as research outputs and activities to be recognized in the evaluation [ 66 ]. Therefore, further efforts should be made to study how (and how often) re-used datasets are cited in publications and how archives guide users to cite data. Data citation practices in social sciences are still evolving since citations are shown to be often incomplete or erroneous [ 15 , 67 – 69 ]. Not all re-used research data are cited, at least not in a formal way [ 15 ]. Developing more formal data citation practices would enable a quantitative evaluation of the impact of data re-use. The challenge is to get scholars to cite data in a systematic way [ 70 ]. This would also serve the need to provide quantitative metrics for evaluating the impact of research infrastructures [ 6 ]. User log data can provide information concerning the number of downloaded data, but for evaluating the impact on research, further studies are needed exploiting, for example, bibliometric methods.
The results provide several practical implications for utilizing user log data for evaluating digital data archive use and as a source of research data. First, it would be important for the archives to define clearly what a data “version” is and to separate updates from new waves that comprise a new dataset. As new versions and updates of the datasets influence user behaviour and the number of downloads and thus, should be taken into consideration when user log data is used in archive evaluation or in research. The most frequently downloaded datasets are characterised by various versions and are updated more often than datasets provided by individual scholars. In our study we decided to analyse both, the full number of downloads and unique downloads to recognize the share of duplicates. The differences were not significant yet existed. Further, our results provide implications for collecting user log data. For example, information collection should cover all kinds of users and use types. In the case of FORSbase, for example, “studying” as a data re-use purpose was not provided. This underlines the importance of user studies for the service providers to truly know who their clients are. Given the relevance of replicational and open research data in science policy and the lack of knowledge on open research data practices, it is also advisable to archives to collect meaningful log data to be able to supplement ethical considerations with empirical evidence on data re-use.
This study comes with limitations: drawing conclusions about data re-use from user log data is somewhat unreliable, since it is likely that not all downloaded datasets are used, and some are used many times or for purposes other than expected. Generalizing findings across organizations may be challenging because download metrics may be contingent on the specific characteristics of the data archive or related organisations [ 4 ]. For example, datasets can be used as course material, possibly leading to hundreds of data downloads [ 15 ]. Additionally, log data cannot provide qualitative insights into data re-use (e.g., why a dataset was selected and how it was used). Still, user log data can give useful insights into the re-use of research data and the users of data archives on the macro level, beyond self-reported data re-use and from the point of view of the archive [ 5 ]. Our findings show that data is downloaded from the archive for various purposes and by various user groups. Thus, studying data re-use based, for example, on citations captures only part of the data re-use. The results of this study provide grounds for future studies in this respect. In addition, we analysed log data from only one archive. However, as our results are in line with a similar study conducted in Finland [ 15 ], we believe the results can be generalised to similar national social science data archives. Future research will show how the frequency of data downloads develops as open data practices become established in the social sciences.
This study contributes to our understanding of the utilization of digital data archives in the realm of social sciences. The findings indicate the demand for social science data, as evidenced by the increasing number of data downloads from a Swiss data archive. However, it is noteworthy that while the majority of the archived datasets were downloaded at least once, a limited set of longitudinal and time-series survey datasets compiled by organizations rather than individual scholars accounted for a substantial share of the downloads. Since the case archive primarily specializes in housing quantitative data, the re-use of qualitative data was marginal. Among the users, students constituted a significant proportion who accessed the archive to acquire data for their educational purposes. Nonetheless, the user base encompassed individuals from diverse roles, including experienced and novice scholars and non-academics. As the findings are in line with previous research [ 15 ], similar patterns are likely to be found across data archives specialised in the social sciences. The increasing availability of digital datasets for re-use may create new data practices within social sciences.
Enriched log data capturing the use of the digital data archive provide a macro-level understanding of the re-use of data from a single archive. To obtain more comprehensive insights into data re-use and evolving data practices within social sciences, future research applying quantitative and qualitative approaches is needed. A future research agenda on data re-use would include comparative studies of different archives (which would presuppose some prior agreement between archives on the collection of metadata), studies into the (epistemological and empirical) meanings and definitions of re-use of research data in social sciences, and studies into the trade-offs between collecting new data and re-using existing data. Data citation practices in social sciences are a particularly important issue. To further develop research infrastructures, user studies are needed to address how users interact with the infrastructures, what obstacles they face and what support they desire.
Acknowledgments.
We thank Dr. Jaana Kekäläinen for her valuable comments on the manuscript.
This research was partially funded by the Academy of Finland ( https://www.aka.fi/en/ ) grant 351247 (EL) and benefitted from a Short Term Scientific Mission of the COST Action CA 15137 ‘European Network for Research Evaluation in the SSH (ENRESSH)’, supported by European Cooperation in Science and Technology ( https://www.cost.eu/ ) (EL, MO). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
ScaDS.AI (Center for Scalable Data Analytics and Artificial Intelligence) Dresden/Leipzig is a center for Data Science, Artificial Intelligence and Big Data with locations in Dresden and Leipzig. It is one of the five new AI centers in Germany funded under the federal government’s AI strategy by the Federal Ministry of Education and Research (BMBF) and the Free State of Saxony. It is established as a permanent research facility at both locations with strong connections to the local universities: the TUD Dresden University of Technology and Leipzig University.
ScaDS.AI Dresden/Leipzig expands the former Big Data competence center ScaDS Dresden/Leipzig, founded in 2014, and combines the AI and data science expertise of the partner institutions to close the gap between the efficient use of mass data, knowledge management and advanced AI. For this reason, the center conducts interdisciplinary research with an international team of over 60 Principal Investigators, more than 180 employees and 4 junior research groups in Dresden and Leipzig.
UC Berkeley researchers launched a pioneering interdisciplinary training program this week that will blend criminal justice and computer science in ways that experts say will help reduce long-standing, systemic inequities in the criminal legal system.
The program, called Computational Research for Equity in the Legal System (CRELS), is being made possible with a $3-million National Science Foundation grant. Launched by a multidisciplinary research team that includes Berkeley’s Division of Social Sciences , Social Science Matrix , D-Lab , College of Computing, Data Science, and Society , Berkeley Institute for Data Science , Institute for the Study of Societal Issues , Human Technology Futures group, Possibility Lab , Eviction Research Network and EPIC Data Lab , the CRELS program will bring together researchers in the social sciences, computer science and statistics. It will equip a new generation of diverse Ph.D. students with the skills needed to tackle problems at the intersection of inequality, criminal legal systems, data science, artificial intelligence and big data.
"This program leverages Berkeley's expertise in social sciences, data science and AI to bring a transformative approach to the study of criminal justice systems,” said David Harding, the grant’s principal investigator and chair of Berkeley’s Department of Sociology. “We will train doctoral students to harness the power of large-scale data to develop novel interventions to reduce inequities in criminal justice systems and foster a more just society.”
This innovative program aligns with the NSF’s Big Ideas, including Harnessing the Data Revolution, Growing Convergence Research and Transforming Education and Career Pathways. It seeks to create a link between these ambitious goals and Berkeley’s faculty expertise in the social sciences, criminal legal systems, data science, and the ethics and social implications of AI. CRELS is supported by a $3-million, 5-year grant from NSF’s NRT Research Traineeship Program .
"We're excited to be at the forefront of this crucial convergence of disciplines,” Berkeley Social Sciences Dean Raka Ray said. “The CRELS program reflects our commitment to using data science and technology to address some of society's most pressing social issues, such as the need for criminal justice system reform.”
BIDS Executive Director Ashish Sahni added that this is a great opportunity to do interdisciplinary work while creating knowledge and providing opportunity.
CRELS seeks to examine the use and misuse of AI within justice institutions. This research could contribute to a broader understanding of the social implications of AI, a topic of increasing importance as AI technologies become more pervasive in our society. The program aims to generate new scientific knowledge and develop novel tools for large-scale data integration and analysis.
The program will not only prepare a workforce capable of addressing complex societal issues with cutting-edge tools, but also contribute to the diversification of the scientific workforce by recruiting and training a broader representation of graduate students in these critical fields and implementing diversity, equity, inclusion and belonging values.
"By actively building these values into program design, CRELS seeks to enrich the data science workforce, creating inclusive intellectual spaces and expanding opportunities for traditionally underrepresented students," Harding said. “Its innovative approach will shape the future of graduate training programs and foster enduring interdisciplinary collaborations among faculty."
This story was first published by Berkeley Social Sciences .
School of Law news Wednesday 17 July 2024
South Asian Heritage Month in the UK was co-founded in 2019 by Jasvir Singh CBE and Dr Binita Kane.
Jasvir Singh CBE and Dr Binita Kane ’s mission was to use the heritage month to ‘deepen people’s understanding of the rich and diverse contributions of South Asian communities to British society’. First celebrated in 2020, it commemorates, marks and celebrates South Asian cultures, histories, and communities. The dates of the month (18 July – 17 August) are designed to span several Independence Days across the region (Maldives, Bhutan, Pakistan, India). It also roughly coincides with Saravan/Sawan, the primary monsoon month during which the region’s habitat undergoes renewal.
Our academic strategy for 2020 to 2030 ‘Universal Values, Global Change’, sets a blueprint for a values-driven university that harnesses expertise in research and education to help shape a better future for humanity. The School of Law is proud to have staff that strengthen the vital link between South Asia and the UK by conducting impactful research that addresses global inequalities, highlights the significant contributions of South Asian countries on the international stage, and makes a tangible difference in the world.
Read on to discover their research profiles.
Professor Subhajit Basu , Professor of Law and Technology, investigates the numerous challenges posed by digital technologies across various societal sectors, including transport, education, healthcare, and social justice. Renowned internationally for pioneering interdisciplinary research, Professor Basu's work particularly emphasizes the Global South. Recognizing the potential for ‘big data’ to control lives, Professor Basu is deeply interested in enhancing consumer information and empowerment. Additionally, he seeks to update legal frameworks to better protect privacy and provide the public with the necessary knowledge to make informed decisions. From 2018 to 2021, he served as the Chair of the British and Irish Law Education Technology Association . In 2020, he was honoured with the Hind Rattan by the Non-Resident Indians Welfare Society of India for outstanding contributions to education and achievements in Information Technology Law.
He is an Adjunct Professor at Parul University , as well as a Visiting Scholar at West Bengal National University of Juridical Sciences . He also became the International Advisor of The Dialogue think tank last year. He sits on the Editorial Advisory Board of two Indian Law School journals: NUJS Journal of Regulatory Studies and NALSAR Law Review .
This year alone, he has delivered seven keynotes or invited talks at conferences organised in India , including a keynote at the prestigious Rajiv Gandhi School of Intellectual Property Law, IIT Kharagpur . This year he also published two articles centring on India:
Evaluating ICT Adoption in the Indian Judiciary: Challenges, Opportunities, and the Impact of the eCourts Project and Silenced Voices: Unravelling India's Dissent Crisis Through Historical and Contemporary Analysis of Free Speech and Suppression .
Professor Basu is a member of the Centre for Business Law and Practice .
You can find him on X: basu_subhajit and LinkedIn: Subhajit Basu.
Dr Sanjay Jain, an outstanding blind scholar who was formerly the Principal of the Indian Law Society College of Law University in Pune, is now based at the National Law School of India University, Bengaluru.
Dr Jain’s publications, which have been quoted by the Indian Supreme Court, include leading works on Indian Constitutional Law as well as on issues of disability and human rights in India. His advice is regularly sought by members of the judiciary, the administration and civil society.
Dr Jain, and his institutions in Pune and Bengaluru, have been collaborating with Professor Anna Lawson and colleagues at the School of Law, University of Leeds, since 2018. In that year, Dr Jain began acting as the Indian partner for the Leeds-based Inclusive Public Space research project. This project, funded by the European Research Council , explores ways in which law can more effectively be used to challenge the disadvantage (to disabled and older pedestrians in particular) caused by inaccessible, exclusionary aspects of city streets. The project shines a light on such barriers and legal initiatives in ten cities across five countries: India being one of them. Dr Jain is the lead author of an extensive report on relevant Indian law and policy and has played a vital role in facilitating and supporting the fieldwork in India.
Besides Dr Jain’s collaboration with Leeds through the Inclusive Public Space project, the Leeds Centre for Disability Studies has supported and co-hosted two international conferences led by Dr Jain during his time at Pune. He will support three events in India in July and August. Dr Jain will be visiting Leeds in September 2024 – when he will take part in the Inclusive Public Space final conference on 16-17 September as well as a number of other events.
Dr Amrita Limbu ’s research delves into the lived experiences of individuals from migrant communities and from low- and moderate-income backgrounds. She is currently a Postdoctoral Research Fellow at the School of Law working on the Making it to the Registers: Documenting Migrant Carers’ Experiences of Registration and Fitness to Practice project with Professor Marie-Andrée Jacob (Primary Investigator) and Dr Priyasha Saksena (Co-Investigator). In this role, she is involved in archival and qualitative research exploring the migrant and refugee health professionals’ experience with professional registration in the UK.
She completed her PhD at the Institute for Culture and Society, Western Sydney University, Australia, on migration and affective family relations across two migration pathways from Nepal: education migration and labour migration.
She is interested in migrant stories and experiences, and experiences of transnational family life owing to migration and living away from her family. Prior to her PhD, she was a researcher at Social Science Baha ’s Centre for the Study of Labour and Mobility (CESLAM) in Kathmandu, Nepal. At CESLAM, she conducted research and fieldwork for several projects focused on labour migration from Nepal to the Persian Gulf and Malaysia.
In 2024 she completed a research project on ‘Migration and the Persistence of Inequality’ as part of the University of Leeds Michael Beverley Innovation Fellowship , to understand the inequality and the continual cycle of intergenerational migration from Nepal to the Persian Gulf countries.
She is part of a University of Reading-led consortium on transnational families – and presented at their symposium Migration, Care and Intersecting Inequalities in June 2024, with ‘Care, inequality, and intergenerational migration: Cultural insights on care and migration in Nepal’. Dr Limbu was the lead author of their policy briefing paper: Impact of COVID-19 on migrant families in the UK , published in March 2024. In May this year she gave a paper at the Britain-Nepal Academic Council Nepal Study Days.
She is a member of the Centre for Law and Social Justice .
Dr Ali Malik is a lecturer in Criminal Justice. Dr Malik leads the project ‘Policing and community resilience in the context of climate change’ , funded by Economic and Social Research Council’s (ESRC) Vulnerability & Policing Futures Research Centre . His current research focuses on the role of police and local governance actors in preparing for and responding to climate disasters and extreme weather events. He is interested in exploring how police and local governance actors perceive, categorise, and track climate vulnerability, and how they leverage community-based actors to inform local emergency planning and disaster response activity. He is also leading a project funded by the University of Leeds’ Research Culture Research Equity, Diversity and Inclusion (REDI) Fund to raise awareness about the impacts of climate change on marginalised communities and public services in the UK through the use of visual (photography) and aural (stories) narratives.
As the holder of the Michael Beverley Innovation Fellowship (Cohort 4, 2023-24), Dr Malik has been involved in fostering collaborative ties with local police forces and national bodies such as HMICS , the College of Policing , and the Police Foundation to garner support for co-produced research examining the impact of climate change on local communities and local police and first responders. Additionally, to develop links with international scholars and researchers in this field, in December 2023, Dr Malik participated in a symposium on Policing the Climate Crisis as part of the Australian and New Zealand Criminology Conference , held in Melbourne.
His book, The Politics of Police Governance: Scottish Police Reform, Localism, and Epistocracy (Policy Press) was launched in May 2024. In the book he developed an innovative framework that synthesised the concept of epistocracy with the broader scholarship on democratic policing, public administration, and police governance and accountability.
He is a Fellow of the Higher Education Academy and co-Deputy Director of the Centre for Criminal Justice Studies . He also appeared in the University of Leeds’ Celebrate Our Staff for the month of May 2024.
Find him on Twitter/X: @DrAliMalik_
Professor Surya Subedi OBE, KC, DCL, is Professor of International Law.
He has published 12 books and more than 60 scholarly articles in all major areas of international law in leading international law journals throughout his academic career. His publications emphasize the promotion of equity in international relations and the advancement of human rights.
At his OBE investiture he was described as having:
"...made a highly distinguished contribution to our understanding of international law, and to its evolution" while his work in international law had "spanned almost every aspect of it – with a special focus on issues ... which make a real difference to people's lives." (British Foreign Secretary)
He has been an advisor to: the British Foreign Secretary (2010-2015); World Conservation Congress of the International Union for Conservation of Nature (2021); and a member of the Task Force on Investment Policy of the World Economic Forum (2015). He served for six years as the UN’s Special Rapporteur for human rights in Cambodia. In Nepal, he assisted the Prime Minister and other political leaders in resolving a 10-year Maoist conflict and in writing a new democratic constitution.
This year, Prime Minister Pushpa Kamal Dahal ‘Prachanda’ commended his achievements as a significant member of the Nepali diaspora. He said:
"I sought Professor Surya Subedi’s assistance while drafting the Constitution of Nepal and reviewing past treaty agreements with India. He played a crucial role in those endeavours. Even in the recent Millennium Challenge Corporation (MCC) agreement, his input provided a middle-path which we embraced." (Prime Minister Pushpa Kamal Dahal ‘Prachanda’)
Between 2015 and 2022, he was Chairman of the Board of Editors of the Asian Journal of International Law, which is published by Cambridge University Press. He is also the editor of a Routledge series of books on ‘Human Rights and International Law’.
He was recently elected a Council Member of the Royal Asiatic Society of Great Britain and Ireland .
Professor Subedi is a member of Centre for Business Law and Practice .
Dr Nazia Yaqub is a lecturer in law, and her research interests span international human rights law, with her publishing record covering family law, child rights, law and religion, Islamic family law and cross-border parental child abduction. Dr Yaqub is a Solicitor of the Supreme Court of England and Wales and previously represented clients in Criminal, Mental Health, Family and Children’s law.
Dr Yaqub’s 2022 book Child Abduction to Islamic Law Countries examines statistical and empirical data she collated to explore how domestic and international law policies should be developed to uphold the rights of abducted children. Dr Yaqub has been invited by the Permanent Bureau of the Hague Conference on Private International Law (HCCH) to share this research later this year with government officials and judges at its Fifth Malta Conference .
Dr Yaqub continues to work on policy developments in this area to prevent abductions and assess the implications of Islamic country accession to the Private International Law treaty, the 1980 Hague Abduction Convention. In this endeavour, she examines whether GPS monitoring can be viewed as a bodyguard rather than a prison guard in reducing the risk of cross-border parental child abduction; this work is to be published in the leading journal, the Modern Law Review . She also received funding as a Michael Beverley Innovation Fellow to disseminate this novel research in video format .
In other projects, Dr Yaqub is working with adoption agencies in community engagement work, to improve adoption law processes for Muslim communities in the UK. And on the legal subject of ‘fam-migration’, she is working with colleagues at the Universities of Liverpool and Birmingham , together with NGOs: Social Workers without Borders and Bid (Bail for immigration detainees) investigating the complex interplay between family and immigration law court processes and decision-making.
Dr Yaqub is a fellow of the Higher Education Academy. She is a member of the Centre for Law and Social Justice and the Centre for Criminal Justice Studies . You can find her on X/Twitter: @DrNazia_Yaqub
The School of Law takes immense pride in counting such brilliant researchers among our staff, reflecting our commitment to using research to tackle some of the most important issues facing the global community today.
The federal government collects taxes to finance various public services. As policymakers and the public weigh key decisions about revenues and expenditures, it is important to examine what the government does with the money it collects.
In fiscal year 2023, the federal government spent $6.1 trillion, amounting to 22.7 percent of the nation’s gross domestic product (GDP). About nine-tenths of the total went toward federal programs; the remainder went toward interest payments on the federal debt. Of that $6.1 trillion, over $4.4 trillion was financed by federal revenues. The remaining amount was financed by borrowing.
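The headline figures above imply two further quantities that the text does not state directly: the approximate amount financed by borrowing and the GDP level consistent with the 22.7 percent share. The short sketch below works them out from the rounded figures in the paragraph, so the results are approximations rather than official values; the variable names are chosen for this example only.

```python
# Rounded figures quoted in the text for fiscal year 2023.
spending_trillion = 6.1   # total federal spending
share_of_gdp = 0.227      # spending as a share of GDP
revenue_trillion = 4.4    # text says "over $4.4 trillion"

# Spending not covered by revenue was financed by borrowing.
# Because revenue is "over" 4.4, this is a rough upper estimate.
borrowing_trillion = spending_trillion - revenue_trillion

# GDP level implied by the spending total and its GDP share.
implied_gdp_trillion = spending_trillion / share_of_gdp

print(f"Borrowing: about ${borrowing_trillion:.1f} trillion")
print(f"Implied GDP: about ${implied_gdp_trillion:.1f} trillion")
```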
As the chart below shows, three major areas of program spending make up the majority of the budget:
Health insurance : Four health insurance programs — Medicare, Medicaid, the Children’s Health Insurance Program (CHIP), and Affordable Care Act (ACA) marketplace health insurance subsidies — together accounted for 24 percent of the budget in 2023, or $1.6 trillion. Roughly half of this amount, or $848 billion, went to Medicare, which in March 2023 provided health coverage to around 65.7 million people who are age 65 or older or have disabilities. The rest of this amount funded the federal costs of Medicaid and CHIP ($633 billion) and ACA subsidy and marketplace costs ($91 billion). Both Medicaid and CHIP require states to pay some of their total costs.
In March 2023, Medicaid and CHIP provided health coverage or long-term care to 93.9 million low-income children, parents, older adults, and people with disabilities. That was significantly higher than the 70.9 million enrollees before the pandemic because of temporary pandemic-related coverage protection, which expired in April 2023. With its expiration, enrollment dropped to 82.8 million by March 2024 and is likely to fall further, though projections are highly uncertain.
In February 2023, 14.3 million of the 15.7 million people enrolled in health insurance through ACA marketplaces received subsidies that lowered their premiums and out-of-pocket costs. Additionally, 20.8 million people opted for ACA marketplace coverage during the 2024 open enrollment period, a significant increase over enrollment in 2023.
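As a quick consistency check on the health insurance figures above, the three component amounts can be summed and compared with the stated total of about $1.6 trillion. This is a minimal sketch using only the rounded amounts quoted in the text; the dictionary keys are labels chosen for this example.

```python
# Component amounts quoted in the text, in billions of dollars.
health_billion = {
    "Medicare": 848,
    "Medicaid and CHIP": 633,
    "ACA subsidies and marketplace costs": 91,
}

total_billion = sum(health_billion.values())
total_trillion = round(total_billion / 1000, 1)

print(f"Sum of components: ${total_billion} billion")
print(f"Rounded: ${total_trillion} trillion")
```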
Three other categories together account for the remaining program spending:
Economic security programs : About 8 percent (or $545 billion) of the 2023 federal budget supported programs that provide aid (other than health insurance or Social Security benefits) to individuals and families facing hardship. Economic security programs include: the refundable portions of the Earned Income Tax Credit and Child Tax Credit, which assist low- and moderate-income working families; programs that provide cash payments to eligible individuals or households, including unemployment insurance and Supplemental Security Income for low-income people who are over age 65 or disabled; various forms of in-kind assistance for low-income people, including the Supplemental Nutrition Assistance Program (formerly known as food stamps), school meals, low-income housing assistance, child care assistance, and help meeting home energy bills; and other programs such as aid for abused or neglected children.
Economic security programs keep millions of people above the poverty line each year. They also reduce, but do not eliminate, racial and ethnic differences in poverty rates.
In addition to program spending, the federal government makes regular interest payments on the money it has borrowed to finance past and current deficits. The net federal debt reached $23.7 trillion by the end of fiscal year 2023 and led to $658 billion in interest payments in 2023, or 10 percent of the budget. Interest costs reflect debt accumulated over the nation’s history — that is, the net impact of deficits and surpluses since 1789 — and therefore result from both revenue levels and program costs, past and present.
While critics often decry “government spending” in the abstract, it is important to determine whether the actual public services and investments that government programs provide are valuable. Federal revenue is used to pay for these services and investments. Consequently, when thinking about the costs that taxes impose, those costs should be weighed against the benefits the nation receives from the expenditure of those funds.
This backgrounder discusses total federal spending and thus does not distinguish between programs financed by general revenue and those financed by dedicated revenue (for example, the payroll taxes that support Social Security). For more information, see Policy Basics: Federal Payroll Taxes .
Our figures for fiscal year 2023 are derived from a database of account-level expenditures accompanying the President’s budget, released by the Office of Management and Budget on March 11, 2024. (Fiscal year 2023 ran from October 1, 2022 to September 30, 2023.)
The broad expenditure categories presented in this paper are constructed from official classifications commonly used by budget agencies. The categories consist of related programs and activities in different functions and subfunctions, as described below.
This category consists of the Medicare function (570), including benefits, administrative costs, and premiums, as well as the “Grants to States for Medicaid” account, the “Children’s health insurance fund” account, the ACA’s “Refundable Premium Tax Credit and Cost Sharing Reductions” account, and the ACA’s “Risk Adjustment Program Payments” account (all in function 550).
This category consists of all expenditures in the Social Security function (650), including both benefits and administrative costs.
This category is the national defense function (050).
This category includes all programs in the income security function (600) except those in the following two subfunctions: federal employees’ retirement and disability (602) and general retirement and disability insurance (601). The latter contains the Pension Benefit Guaranty Corporation and covers programs that provide pension and disability benefits to certain small groups of private sector workers.
This category combines the veterans’ benefits and services function (700) and the federal employee retirement and disability subfunction (602), which is part of the income security function.
This category consists of the net interest function (900).
This category includes all federal expenditures not included in one of the six categories defined above. The subcomponents of this category that are displayed in the chart are defined as follows:
The Center on Budget and Policy Priorities is a nonprofit, nonpartisan research organization and policy institute that conducts research and analysis on a range of government policies and programs. It is supported primarily by foundation grants.
Big Data and Social Science Data Science Methods and Tools for Research and Practice. Ian Foster, Rayid Ghani, Ron S. Jarmin, Frauke Kreuter and Julia Lane. Preface to the 2nd edition.
2. Digital data require specific methods. The abundance and granularity of social media data have empowered and transformed network analysis. This latter technique has been used in sociology (Latour, Citation 2005; Scott, Citation 2012) and can be traced back to the sociometric work of Moreno (Citation 1934), who mapped out likes and dislikes among members of small social groups, such as ...
These data often have major differences in their origins, structure, and attributes compared to the data typically used social science research. Big Data Management: The big data management phase of the BDaaP framework involves both processes and supporting technologies for acquiring, storing, preparing, and retrieving the information for ...
We see at least seven reasons why qualitative research will be essential to 'big data' social science (Fig. 1). Fig. 1: Qualitative research and big data. Seven roles for qualitative research ...
The conclusion cautions against the marginalization of social science in the wake of developments in data-driven research that neglect social theory, established methodology and the contextual ...
Big data presents unprecedented opportunities to understand human behavior on a large scale. It has been increasingly used in social and psychological research to reveal individual differences and group dynamics. There are a few theoretical and methodological challenges in big data research that require attention. In this paper, we highlight four issues, namely data-driven versus theory-driven ...
forms of social research that dominated the 20th century. We claim that the challenges of big data are most pronounced vis-à-vis the canonical quantitative methodologies that have dominated social sciences for decades: they question established statistical techniques as well as key epistemic values and orientations underpinning these approaches.
While sociologists have studied social networks for about one hundred years, recent developments in data, technology, and methods of analysis provide opportunities for social network analysis (SNA) to play a prominent role in the new research world of big data and computational social science (CSS).
From a philosophy-of-social-science perspective on big data, some researchers have discussed a paradigmatic shift toward "new empiricism" (based on a stronger focus on data evidence; Arbia 2021) or "digital positivism" (related to computer-generated evidence about the world; Fuchs 2017). More specifically, Chin-Yee and Upshur (2019) have identified three major philosophical problems ...
Drawing on interviews conducted with researchers at the forefront of big data research, we offer insight into questions of causal versus correlational research, the use of inductive methods, and the utility of theory in the big data age. ... they reassert the importance of fundamental tenets of social science research such as establishing ...
Big Data and Social Science: Data Science Methods and Tools for Research and Practice, Second Edition shows how to apply data science to real-world problems, covering all stages of a data-intensive social science or policy project. Prominent leaders in the social sciences, statistics, and computer science as well as the field of data science provide a unique perspective on how to apply modern ...
Perhaps some social scientists are resistant to social media data analytics because the methods differ from more traditional social science research. However, most Twitter analytic tools enable multiple research methods such as social network analysis, geographic analysis, content analysis, and textual hermeneutics and more.
This edited volume focuses on big data implications for computational social science and humanities from management to usage. The first part of the book covers geographic data, text corpus data, and social media data, and exemplifies their concrete applications in a wide range of fields including anthropology, economics, finance, geography, history, linguistics, political science, psychology ...
These data in conjunction with genome-wide genotype data and social science measures can reveal new insights to important research questions, for example, which genes of interest are subject to social regulation, how the social environment provokes the dynamics, and what social, psychological and biological mechanisms mediate the effects.
People look to academia as the source of innovation, and especially so in the natural and physical sciences. Researchers in biosciences, clinical medicine, physics, and chemistry have always generated new ideas for industry to capitalize on. Generally, innovations coming out of the social sciences w
Summary. Mihály Fazekas, PhD, Assistant Professor at the School of Public Policy, Central European University, discusses using big data for social science research, including new data sources and what they can help achieve, the difference between big data and traditional research methodology, and the collection and analysis of big data.
1. Introduction. Big data is heralded as a powerful new resource for social science research. The excitement around big data emerges from the recognition of the opportunities it may offer to advance our understanding of human behaviour and social phenomena in a way that has never been possible before (see for example Burrows and Savage, 2014, Kitchin, 2014a, Kitchin, 2014b, Manovich, 2011 ...
The project on Big Data and Historical Social Science brings together researchers across a range of disciplines, methods, and research strategies to explore the intersection of classical historical and social science problems with big data. ... Social Science Research Council 300 Cadman Plaza West, 15th Floor Brooklyn, NY 11201, USA. We use ...
Big Data Social Science has three desired goals to better support big data and related research: (1) Expand research support (2) Help build an intellectual community around this work (3) Help expand data science teaching. ... The California Census Research Data Center (CCRDC) will soon be moving into its new home at SSCERT. ...
This paper analyzes the ethics of social science research (SSR) employing big data. We begin by highlighting the research gap found on the intersection between big data ethics, SSR and research ethics. We then discuss three aspects of big data SSR which make it warrant special attention from a research ethics angle: (1) the interpretative character of both SSR and big data, (2) complexities of ...
Research data and data archives in the social sciences. The European Commission defines research infrastructures as "facilities that provide resources and services for research communities to conduct research and foster innovation" (p. 1). Research data archives are thus part of the infrastructure supporting and enabling open science by storing, managing, and disseminating research data ...
Research on AI and Big Data. ScaDS.AI Dresden/Leipzig expands the former Big Data competence center ScaDS Dresden/Leipzig, founded in 2014, and combines the AI and data science expertise of the partner institutions to close the gap between the efficient use of mass data, knowledge management and advanced AI.
UC Berkeley researchers launched a pioneering interdisciplinary training program this week that will blend criminal justice and computer science in ways that experts say will help reduce long-standing, systemic inequities in the criminal legal system. The program, called Computational Research for Equity in the Legal System (CRELS), is being made possible with a $3-million National Science ...
Read on to discover their research profiles. Big data and the Global South. Professor Subhajit Basu, Professor of Law and Technology, investigates the numerous challenges posed by digital technologies across various societal sectors, including transport, education, healthcare, and social justice. Renowned internationally for pioneering ...
Economic security programs: About 8 percent (or $545 billion) of the 2023 federal budget supported programs that provide aid (other than health insurance or Social Security benefits) to individuals and families facing hardship. Economic security programs include: the refundable portions of the Earned Income Tax Credit and Child Tax Credit, which assist low- and moderate-income working families ...